Comparing Two-Stage Classification and Genre-Incorporated Methods for Explicit Lyrics Detection
DOI:
10.20944/preprints202503.1338.v1
Publication Date:
2025-03-20T01:14:06Z
AUTHORS (3)
ABSTRACT
The increasing prevalence of explicit content in song lyrics, particularly in popular genres such as rap and pop, raises concerns about its societal impact, especially on younger listeners. Given the vast amount of music available on streaming platforms, manually filtering explicit content has become impractical, which calls for automated techniques for content moderation. This study addresses the problem with classification algorithms, drawing parallels to non-destructive testing (NDT) techniques used for material inspection, where data-driven methods identify and categorize potential flaws without altering the object. Specifically, it compares two approaches: a two-stage classification method and a genre-incorporated method. The two-stage method takes machine learning predictions on the lyrics and refines them with a dictionary-based approach. This improves recall by identifying explicit content missed by the initial model, but it sacrifices precision. The method achieves an accuracy of 82.51%, precision of 37.64%, recall of 94.41%, and an F1-score of 53.82%. The genre-incorporated approach, on the other hand, integrates genre-specific data to enhance classification accuracy, much as contextual insights are incorporated in NDT for better defect detection. The most effective model, a Random Forest trained on a balanced dataset, achieved an accuracy of 99.52%, precision of 98.40%, recall of 97.37%, and an F1-score of 97.88%. These findings contribute to more efficient content moderation and a safer music experience for vulnerable groups such as children, while drawing on the principles of NDT for more accurate and non-invasive analysis.
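The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the word list `EXPLICIT_TERMS`, the function `two_stage_predict`, and the `model_predict` callable are hypothetical names introduced here, and the sketch assumes the dictionary stage simply flags any lyric that the first-stage model passed but that contains a listed term. Under that assumption the second stage can only add positives, which is consistent with the reported rise in recall and drop in precision. The helper `f1_score` applies the standard harmonic-mean formula so the reported F1 values can be checked against the reported precision and recall.

```python
import re

# Hypothetical placeholder terms; the study's actual dictionary is not given here.
EXPLICIT_TERMS = {"badword", "curse"}


def two_stage_predict(lyrics, model_predict, terms=EXPLICIT_TERMS):
    """Return True if the lyrics are flagged as explicit.

    Stage 1: trust a positive from the model (any callable str -> bool).
    Stage 2 (assumed behaviour): if the model says "clean", flag the lyrics
    anyway when they contain a dictionary term.
    """
    if model_predict(lyrics):
        return True
    tokens = set(re.findall(r"[a-z']+", lyrics.lower()))
    return not tokens.isdisjoint(terms)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in (0, 1])."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A stand-in first-stage model that never flags anything.
    never_flags = lambda text: False
    print(two_stage_predict("just a badword in the chorus", never_flags))
    print(two_stage_predict("a sunny day outside", never_flags))

    # Consistency check of the metrics reported in the abstract.
    print(round(100 * f1_score(0.3764, 0.9441), 2))
    print(round(100 * f1_score(0.9840, 0.9737), 2))
```

Both reported F1-scores (53.82% and 97.88%) agree with the reported precision and recall under this formula.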