Comparison of Four Support Vector Machine Kernels for Cyberbullying Classification on Indonesian-Language Twitter
Abstract
Cyberbullying on Twitter has become a significant social problem in Indonesia, harming the mental health and well-being of its victims. Given the enormous daily volume of tweets, automated detection of cyberbullying is urgently needed. This study compares the performance of four Support Vector Machine (SVM) kernel functions, namely Linear, Radial Basis Function (RBF), Polynomial, and Sigmoid, for cyberbullying classification of Indonesian-language tweets. The dataset is a publicly available corpus of 13,169 annotated tweets released by Ibrohim and Budi in 2019. The preprocessing pipeline consists of case folding, text cleaning, slang normalization using a colloquial dictionary, stopword removal, and stemming with the Sastrawi library. Text features are extracted using Term Frequency–Inverse Document Frequency (TF-IDF) over unigrams and bigrams, limited to the top 5,000 features. Models are trained on a stratified 80:20 split. Experimental results show that the RBF kernel achieves the highest performance, with an accuracy of 0.8281 and an F1-score of 0.8269, slightly ahead of the Linear kernel (accuracy 0.8258; F1-score 0.8256). The Sigmoid kernel reaches an accuracy of 0.8204, while the Polynomial kernel records the lowest accuracy (0.7674). The Linear kernel is the most efficient option: it has the shortest training time (9.19 seconds) and trails the RBF kernel in accuracy by only 0.0023. These findings can support the development of automated content moderation systems for Indonesian-language platforms.
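To make the comparison concrete, the four kernel functions can be sketched in plain Python using their standard textbook definitions. This is an illustrative sketch, not the study's implementation: the parameter names gamma, coef0, and degree mirror those commonly used in scikit-learn's SVC, and their default values here, as well as the toy vectors, are assumptions, since the abstract does not report the hyperparameters used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return dot(x, y)


def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    # K(x, y) = (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * <x, y> + coef0)
    # Note: this kernel is not positive semidefinite in general.
    return math.tanh(gamma * dot(x, y) + coef0)


if __name__ == "__main__":
    # Two toy TF-IDF-like vectors (hypothetical values, not from the dataset).
    u = [0.5, 0.0, 0.5]
    v = [0.5, 0.5, 0.0]
    kernels = [
        ("linear", linear_kernel),
        ("rbf", rbf_kernel),
        ("polynomial", polynomial_kernel),
        ("sigmoid", sigmoid_kernel),
    ]
    for name, kernel in kernels:
        print(name, kernel(u, v))
```

The Linear kernel needs only a single inner product per pair of tweets, which is cheap on sparse TF-IDF vectors; this is consistent with, though not a full explanation of, the short training time reported for it above.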
References
J. W. Patchin and S. Hinduja, "Bullies move beyond the schoolyard: A preliminary look at cyberbullying," Youth Violence and Juvenile Justice, vol. 4, no. 2, pp. 148–169, 2006.
R. Slonje, P. K. Smith, and A. Frisén, "The nature of cyberbullying, and strategies for prevention," Computers in Human Behavior, vol. 29, no. 1, pp. 26–32, 2013.
P. K. Smith, J. Mahdavi, M. Carvalho, S. Fisher, S. Russell, and N. Tippett, "Cyberbullying: Its nature and impact in secondary school pupils," Journal of Child Psychology and Psychiatry, vol. 49, no. 4, pp. 376–385, 2008.
M. F. Wright, "The role of cyberbullying perpetration in psychological well-being: A longitudinal study," Computers in Human Behavior, vol. 87, pp. 173–180, 2018.
M. O. Ibrohim and I. Budi, "Multi-label hate speech and abusive language detection in Indonesian Twitter," in Proc. Third Workshop on Abusive Language Online, Florence, Italy, 2019, pp. 46–57.
B. Haidar, M. Chamoun, and A. Serhrouchni, "A multilingual system for cyberbullying detection: Arabic content detection using machine learning," Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 6, pp. 275–284, 2017.
H. Rosa, N. Pereira, R. Ribeiro, P. C. Ferreira, J. P. Carvalho, S. Oliveira, L. Coheur, P. Paulino, A. M. Veiga Simão, and I. Trancoso, "Automatic cyberbullying detection: A systematic review," Computers in Human Behavior, vol. 93, pp. 333–345, 2019.
C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA, USA: MIT Press, 2002.
H.-T. Lin and C.-J. Lin, "A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods," Department of Computer Science, National Taiwan University, Tech. Rep., 2003.
T. Joachims, "Text categorization with support vector machines: Learning with many relevant features," in Proc. 10th European Conf. Machine Learning (ECML), Chemnitz, Germany, 1998, pp. 137–142.
N. P. G. Naomi, A. Romadhony, and S. Suyanto, "Indonesian hate speech detection using IndoBERTweet and BiLSTM," in Proc. 10th Int. Conf. Information and Communication Technology (ICoICT), Bandung, Indonesia, 2022, pp. 137–141.
J. P. Haumahu, S. D. H. Permana, and Y. Yaddarabullah, "Fake news classification for Indonesian news using extreme gradient boosting (XGBoost)," IOP Conference Series: Materials Science and Engineering, vol. 1098, no. 5, p. 052081, 2021.
F. Koto, A. Rahimi, J. H. Lau, and T. Baldwin, "IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model for Indonesian NLP," in Proc. 28th Int. Conf. Computational Linguistics (COLING), Barcelona, Spain (Online), 2020, pp. 757–770.
L. Buitinck et al., "API design for machine learning software: Experiences from the scikit-learn project," in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, Prague, Czech Republic, 2013, pp. 108–122.
H. Susanto, A. Bashri, and W. F. Senjaya, "Sastrawi: A library for stemming Indonesian language," GitHub repository, 2014. [Online]. Available: https://github.com/sastrawi/sastrawi
G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.
Pages: 25 - 32