PANDORA Talks: Personality and Demographics on Reddit
Demographics
Interpretability
DOI: 10.31234/osf.io/94xcp
Publication Date: 2020-04-09T10:47:11Z
AUTHORS (5)
ABSTRACT
Personality and demographics are important variables in social sciences, while in NLP they can aid in interpretability and removal of societal biases. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models (including the well-established Big 5 model) and demographics (age, gender, and location) for more than 10k users. We showcase the usefulness of this dataset on three experiments, where we leverage the readily available data from other personality models to predict the Big 5 traits, analyze gender classification biases arising from psycho-demographic variables, and carry out a confirmatory and exploratory analysis based on psychological theories. Finally, we present benchmark prediction models for all personality and demographic variables.
CITATIONS (11)