An Improved Toxic Speech Detection on Multimodal Scam Confrontation Data Using LSTM-Based Deep Learning

Gumelar, Agustinus Bimo and Sugiarto, Indar and Purnomo, Mauridhi Hery (2024) An Improved Toxic Speech Detection on Multimodal Scam Confrontation Data Using LSTM-Based Deep Learning. [UNSPECIFIED]

Official URL: https://inass.org/wp-content/uploads/2024/07/20241...

Abstract

Toxic speech has gained substantial attention because of its detrimental effects and its prevalence across online platforms. It often exhibits discernible pronunciation patterns analogous to those of emotions such as happiness or anger. This aspect has been relatively underexplored in prior studies, which predominantly addressed offensive language, hate speech, and sarcasm without considering their emotional properties. Social media platforms have emerged as spaces where individuals share personal encounters with toxic speech that affects their well-being. To address this challenge, our study introduces a novel approach that combines speech and text data within a Long Short-Term Memory (LSTM) framework. Unlike existing methods that primarily focus on text analysis, our approach integrates both speech and text, thereby enhancing the model’s ability to detect toxic content accurately. This multimodal strategy provides a more comprehensive solution to the problem of toxic speech detection. Our collected dataset comprises two-way conversations in the Indonesian language, drawn from online fraud reports and confrontations related to loan scams uploaded to YouTube. Because the videos lack subtitles, homonyms can be ambiguous, so the audio content had to be transcribed to text. Native speakers performed the transcription to ensure that it was correct for Indonesian in the toxic context. In addition, speech features such as pitch, intensity, and speaking rate were used alongside text features, including Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Validation by F1-score yielded 92.73% for text data and 89.09% for speech data. Our proposed approach provided a substantial improvement of approximately 12%-30% compared to previous LSTM models. The performance comparison confirmed that our proposed approach can enhance the accuracy of toxic speech detection.
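
The text features and the evaluation metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify tokenization, the TF-IDF variant, or how the F1-score was averaged, so the formulas below (term frequency normalized by document length, idf = log(N / df), binary F1) are assumptions, and the function names and the toy corpus of placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Return raw term counts for one tokenized utterance."""
    return Counter(tokens)


def tf_idf(corpus):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    Other weighting schemes exist and may give different values.
    """
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = bag_of_words(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy corpus of placeholder tokens (hypothetical, not from the dataset).
corpus = [["a", "b", "a"], ["b", "c"]]
weights = tf_idf(corpus)

# Toy labels: 1 = toxic, 0 = non-toxic (hypothetical).
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In this variant, a term that occurs in every document receives a weight of zero, which is why TF-IDF is often used alongside raw BoW counts rather than as a replacement for them.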

Item Type:	UNSPECIFIED
Uncontrolled Keywords:	Toxic speech detection, Speech pitch, Speech intensity, Bag-of-words, Term frequency-inverse document frequency, Long short-term memory
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Faculty of Industrial Technology > Electrical Engineering Department
Depositing User:	Admin
Date Deposited:	16 Jan 2025 18:27
Last Modified:	09 Sep 2025 04:26
URI:	https://repository.petra.ac.id/id/eprint/21822
