- Topic Modeling
- Sentiment Analysis and Opinion Mining
- Misinformation and Its Impacts
- Natural Language Processing Techniques
- Business Strategies and Management Research
- Handwritten Text Recognition Techniques
- Machine Learning and Data Classification
- Advanced Data Storage Technologies
- Complex Network Analysis Techniques
- Spam and Phishing Detection
- Scientific Computing and Data Management
- Wikis in Education and Collaboration
- ICT in Developing Communities
- Psychology of Moral and Emotional Judgment
- Innovation and Socioeconomic Development
- Cultural Differences and Values
- Linguistic Studies and Language Acquisition
- Innovative Human-Technology Interaction
- Time Series Analysis and Forecasting
- E-Government and Public Services
- Energy and Environment Impacts
- Software Engineering Techniques and Practices
- Computational and Text Analysis Methods
- Mobile Crowdsensing and Crowdsourcing
- Psychological Well-being and Life Satisfaction
Google (United States), 2024
University of Cambridge, 2017-2022
Monash University, 2021
University of Groningen, 2021
Masaryk University, 2021
Czech Academy of Sciences, Institute of Sociology, 2021
Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most...
In this paper, we propose to adapt the four-staged pipeline proposed by Zubiaga et al. (2018) for the Rumor Verification task to the problem of Fake News Detection. We show that the recently released FNC-1 corpus covers two of its steps, namely the Tracking and the Stance Detection task. We identify asymmetry in length in the input to be a key characteristic of the latter step, when adapted to the framework of Fake News Detection, and propose to handle it as a specific type of Cross-Level Stance Detection. Inspired by theories from the field of Journalism Studies, we implement and test architectures...
In recent years, there has been an increasing interest in the application of Artificial Intelligence, and especially Machine Learning, to the field of Sustainable Development (SD). However, until now, NLP has not been systematically applied in this context. In this paper, we show the high potential of NLP to enhance project sustainability. In particular, we focus on the case of community profiling in developing countries, where, in contrast to the developed world, a notable data gap exists. Here, NLP could help to address the cost and time barrier of structuring qualitative...
Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
We introduce 9 guiding principles to integrate Participatory Design (PD) methods in the development of Natural Language Processing (NLP) systems. The adoption of PD methods by NLP will help to alleviate issues concerning the development of more democratic, fairer, less-biased technologies to process natural language data. This short paper is the outcome of an ongoing dialogue between designers and NLP experts and adopts a non-standard format following previous work by Traum (2000); Bender (2013); Abzianidze and Bos (2019). Every section is a guiding principle...
Understanding consumer needs and values is crucial to the sustainable delivery and uptake of energy access projects in Low- and Middle-Income Countries (LMICs). Nevertheless, many projects aim to empower women without first assessing the gendered roles, needs, values, and relations for both men and women in project communities. Neglecting these can be detrimental to the end-users of projects, exacerbating conflict within households rather than empowering vulnerable groups. We propose a value-based approach to elicit varying priorities and assess...
In this work, we focus on the task of open-type relation argument extraction (ORAE): given a corpus, a query entity Q, and a knowledge base relation (e.g., “Q authored notable work with title X”), the model has to extract an argument of non-standard entity type (entities that cannot be extracted by a standard named entity tagger, for example, a book or a work of art) from the corpus. We develop and compare a wide range of neural models for this task, yielding large improvements over a strong baseline obtained with a question answering system. The impact of different sentence...
We present a new challenging news dataset that targets both stance detection (SD) and fine-grained evidence retrieval (ER). With its 3,291 expert-annotated articles, the dataset constitutes a high-quality benchmark for future research in SD and multi-task learning. We provide a detailed description of the corpus collection methodology and carry out an extensive analysis on the sources of disagreement between annotators, observing a correlation between their disagreement and the diffusion of uncertainty around a target in the real world. Our experiments show that the dataset poses...
Stance detection (SD) entails classifying the sentiment of a text towards a given target, and is a relevant sub-task for opinion mining and social media analysis. Recent works have explored knowledge infusion, supplementing the linguistic competence and latent knowledge of large pre-trained language models with structured knowledge graphs (KGs), yet few works have applied such methods to the SD task. In this work, we first perform stance-relevant knowledge probing on Transformers-based pre-trained models in a zero-shot setting, showing these models' latent real-world knowledge about SD targets...
This article describes a dataset of perceived values and socioeconomic indicators collected in rural Ugandan communities. The data were collected through interviews which employed: (1) the User-Perceived Value game, which solicits verbal data using graphical prompts and 'why'-probing; (2) socio-economic surveys, which collect demographic data. The dataset comprises 119 interviews conducted between 2014 and 2015 in seven villages. Interviews were conducted in various settings (e.g. individual/group, women/men/mixed) and in different local languages (which were subsequently translated into...
Stephanie Hirmer, Alycia Leonard, Josephine Tumwesige, Costanza Conforti. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021.
We present a new challenging stance detection dataset, called Will-They-Won't-They (WT-WT), which contains 51,284 tweets in English, making it by far the largest available dataset of the type. All the annotations are carried out by experts; therefore, the dataset constitutes a high-quality and reliable benchmark for future research in stance detection. Our experiments with a wide range of recent state-of-the-art stance detection systems show that the dataset poses a strong challenge to existing models in this domain.
In recent years, there has been an increasing interest in the application of Artificial Intelligence, and especially Machine Learning, to the field of Sustainable Development (SD). However, until now, NLP has not been applied in this context. In this research paper, we show the high potential of NLP applications to enhance the sustainability of projects. In particular, we focus on the case of community profiling in developing countries, where, in contrast to the developed world, a notable data gap exists. In this context, NLP could help to address the cost and time barrier of structuring...