NFDI4DS | UHH-SEMS - Publication Details

Comparing automated text classification methods

Sentiment Analysis Intuition

DOI: 10.1016/j.ijresmar.2018.09.009

Publication Date: 2018-10-24T23:17:19Z

AUTHORS (4)

Jochen Hartmann

Juliana Huppertz

Christina Schamp

Mark Heitmann

ABSTRACT

Online social media drive the growth of unstructured text data. Many marketing applications require structuring this data at scales non-accessible to human coding, e.g., to detect communication shifts in sentiment or other researcher-defined content categories. Several methods have been proposed to automatically classify text. This paper compares the performance of ten such approaches (five lexicon-based, five machine learning algorithms) across 41 datasets covering major platforms, various sample sizes, and languages. So far, research relies predominantly on support vector machines (SVM) and Linguistic Inquiry and Word Count (LIWC). Across all tasks we study, either random forest (RF) or naive Bayes (NB) performs best in terms of correctly uncovering human intuition. In particular, RF exhibits consistently high performance for three-class sentiment, NB for small sample sizes. SVM never outperform the remaining methods. All lexicon-based approaches, LIWC in particular, perform poorly compared with machine learning. In some applications, accuracies only slightly exceed chance. Since additional considerations of classification choice are also in favor of RF, our results suggest that marketing research can benefit from considering these alternatives.
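
The contrast the abstract draws between lexicon-based and machine learning classification can be illustrated with a minimal sketch. This is not the authors' code or evaluation setup: the training sentences, the mini-lexicon, and all function names below are hypothetical and invented for illustration, and the naive Bayes variant shown (multinomial with add-one smoothing) is an assumption about one common formulation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration only (not from the paper).
TRAIN = [
    ("great phone love the battery", "pos"),
    ("love this great camera", "pos"),
    ("excellent service very happy", "pos"),
    ("terrible phone hate the battery", "neg"),
    ("awful camera very poor", "neg"),
    ("poor service hate it", "neg"),
]

# Hypothetical mini-lexicon; real lexicons such as LIWC are far larger.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"terrible", "hate", "awful", "poor"}


def lexicon_classify(text):
    """Label a text by counting lexicon hits; ties default to 'pos'."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"


def train_naive_bayes(examples):
    """Collect the counts needed for multinomial naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def nb_classify(text, model):
    """Pick the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(classify, examples):
    """Share of examples whose predicted label matches the human label."""
    return sum(classify(text) == label for text, label in examples) / len(examples)


MODEL = train_naive_bayes(TRAIN)
# Hypothetical held-out examples, also invented for illustration.
TEST = [("love the camera", "pos"), ("hate the battery very poor", "neg")]

if __name__ == "__main__":
    print("lexicon accuracy:", accuracy(lexicon_classify, TEST))
    print("naive Bayes accuracy:", accuracy(lambda t: nb_classify(t, MODEL), TEST))
```

The lexicon classifier needs no training data but depends entirely on its word lists, whereas naive Bayes learns word weights from human-labeled examples. A toy corpus of this size says nothing about relative accuracy; the comparison reported in the abstract rests on 41 datasets.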

REFERENCES (74)

CITATIONS (303)
