- Natural Language Processing Techniques
- Authorship Attribution and Profiling
- Computational and Text Analysis Methods
- Language, Discourse, Communication Strategies
- Linguistic Variation and Morphology
- Mobile Crowdsensing and Crowdsourcing
- Language and Cultural Evolution
- Fractal and DNA Sequence Analysis
- Digital Communication and Language
- Advanced Text Analysis Techniques
- Anomaly Detection Techniques and Applications
- Music and Audio Processing
- Gender Studies in Language
- Topic Modeling
- Linguistics, Language Diversity, and Identity
University of Edinburgh
2016-2019
The Alan Turing Institute
2019
China Datang Corporation (China)
2017-2018
University of Southern California
2014
Philippa Shoemark, Farhana Ferdousi Liza, Dong Nguyen, Scott Hale, Barbara McGillivray. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Philippa Shoemark, Debnil Sur, Luke Shrimpton, Iain Murray, Sharon Goldwater. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. 2017.
Research that involves human behavior analysis usually requires laborious and costly efforts for obtaining micro-level annotations on a large video corpus. With the emerging paradigm of crowdsourcing, however, these efforts can be considerably reduced. We first present OCTAB (Online Crowdsourcing Tool for Annotations of Behaviors), a web-based annotation tool that allows precise and convenient annotations in videos, directly portable to popular crowdsourcing platforms. As part of OCTAB, we introduce a training module with specialized...
Recent work has proposed using network science to analyse the structure of the mental lexicon by viewing words as nodes in a phonological network, with edges connecting words that differ by a single phoneme. Comparing networks across different languages could provide insights into linguistic typology and the cognitive pressures that shape language acquisition, evolution, and processing. However, previous studies have not considered how statistics gathered from these networks are affected by factors such as size and the distribution of word...
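The network construction described in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the toy lexicon, the phoneme symbols, and the function names are all hypothetical, and "differ by a single phoneme" is assumed here to mean one substitution, insertion, or deletion.

```python
from itertools import combinations


def differ_by_one_phoneme(a, b):
    """Return True if phoneme tuples a and b are exactly one
    substitution, insertion, or deletion apart (an assumed definition)."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Lengths differ by one: deleting one phoneme from the longer
    # sequence must yield the shorter one.
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def build_phonological_network(lexicon):
    """Map each word to the set of words one phoneme away from it."""
    neighbours = {word: set() for word in lexicon}
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        if differ_by_one_phoneme(p1, p2):
            neighbours[w1].add(w2)
            neighbours[w2].add(w1)
    return neighbours


# Hypothetical toy lexicon: word -> tuple of phoneme symbols.
toy_lexicon = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "cut": ("k", "ah", "t"),
    "at": ("ae", "t"),
    "dog": ("d", "ao", "g"),
}

network = build_phonological_network(toy_lexicon)
for word in sorted(network):
    print(word, sorted(network[word]))
```

The pairwise comparison is quadratic in lexicon size, which is fine for a toy example but would need indexing for a full lexicon.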
Sociolinguistic research suggests that speakers modulate their language style in response to their audience. Similar effects have recently been claimed to occur in the informal written context of Twitter, with users choosing less region-specific and non-standard vocabulary when addressing larger audiences. However, these studies have not carefully controlled for the possible confound of topic: that is, tweets addressed to a broad audience might also tend towards topics that engender a more formal style. In addition, it is not clear...
Sociolinguistics is often concerned with how variants of a linguistic item (e.g., nothing vs. nothin') are used by different groups or in different situations. We introduce the task of inducing lexical variables from code-mixed text: that is, identifying equivalence pairs such as (football, fitba) along with their code (football→British, fitba→Scottish). We adapt a framework for gender-biased word pairs to this new task, and present results on three English dialects, using tweets as the code-mixed text. Our system achieves precision of over...