- Natural Language Processing Techniques
- Topic Modeling
- Speech and dialogue systems
- Authorship Attribution and Profiling
- Sentiment Analysis and Opinion Mining
- Multi-Agent Systems and Negotiation
- Mental Health via Writing
- Personality Traits and Psychology
- Spam and Phishing Detection
- Advanced Text Analysis Techniques
- Hate Speech and Cyberbullying Detection
- Misinformation and Its Impacts
- Semantic Web and Ontologies
- Speech Recognition and Synthesis
- Text Readability and Simplification
- Language, Metaphor, and Cognition
- Digital Mental Health Interventions
- Complex Network Analysis Techniques
- Computational and Text Analysis Methods
- Social Media and Politics
- Education and Digital Technologies
- Video Surveillance and Tracking Methods
- Anomaly Detection Techniques and Applications
- Mental Health Research Topics
- Linguistics, Language Diversity, and Identity
Universidade de São Paulo
2014-2023
Universidade Cidade de São Paulo
2023
Hospital Universitário da Universidade de São Paulo
2023
Universidad San Pedro
2017
Intel (United States)
2010
Ibero American University
2010
King's College Hospital
2007
Brazilian Society of Computational and Applied Mathematics
2006
Pontifícia Universidade Católica do Rio Grande do Sul
1998
It is often desirable that referring expressions be chosen in such a way that their referents are easy to identify. This article focuses on referring expressions in hierarchically structured domains, exploring the hypothesis that referring expressions can be improved by including logically redundant information in them if this leads to a significant reduction in the amount of search that is needed to identify the referent. Generation algorithms are presented that implement this idea by including logically redundant information into the generated expression, in certain well-circumscribed situations. To test our hypotheses, and to assess the performance...
Transformer-based language models such as Bidirectional Encoder Representations from Transformers (BERT) are now mainstream in the NLP field, but extensions to languages other than English, to new domains and/or to more specific text genres are still in demand. In this paper we introduce BERTabaporu, a BERT model that has been pre-trained on Twitter data in the Brazilian Portuguese language. The model is shown to outperform the best-known general-purpose model for this language in three Twitter-related tasks, making it a potentially useful resource for Portuguese NLP in general.
Earlier work has suggested that, in hierarchically ordered domains (e.g., a document divided into sections and subsections), referring expressions that are judiciously over-specified to a higher extent than is achieved by existing generation algorithms can make it considerably easier for a hearer to find the referent of the expression. The present paper investigates over-specification in spatial domains, which plays an important role in daily life. We report on an experiment whose aim is (1) to find out whether similar as...
This article presents a method for prompt-based mental health screening from a large and noisy dataset of social media text. Our method uses GPT 3.5 prompting to distinguish publications that may be more relevant to the task, and then uses a straightforward bag-of-words text classifier to predict actual user labels. Results are found to be on par with a BERT mixture of experts classifier, while incurring only a fraction of its training costs.
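The two-stage idea in this abstract (filter posts for relevance, then classify users from a bag of words) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `is_relevant` is a hypothetical keyword stub standing in for the prompt-based filter (no language model is called), the classifier is a plain multinomial naive Bayes chosen only because the abstract does not name one, and the posts and labels are made-up toy data.

```python
import math
from collections import Counter, defaultdict


def is_relevant(post: str) -> bool:
    """Hypothetical stand-in for the prompt-based relevance filter."""
    cues = {"feel", "sleep", "tired", "sad", "anxious"}
    return any(token in cues for token in post.lower().split())


def bag_of_words(posts):
    """Count tokens over the posts of one user that pass the filter."""
    counts = Counter()
    for post in posts:
        if is_relevant(post):
            counts.update(post.lower().split())
    return counts


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed classifier)."""

    def fit(self, bags, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for bag, label in zip(bags, labels):
            self.word_counts[label].update(bag)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, bag):
        total_users = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_users in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_users / total_users)  # log prior
            for word, freq in bag.items():
                if word in self.vocab:  # ignore words unseen in training
                    score += freq * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data: one list of posts per user.
users = {
    "u1": ["i feel sad and tired all day", "great match last night"],
    "u2": ["i feel happy about the new job", "lunch was good"],
    "u3": ["so tired and sad again cannot sleep", "bus is late"],
    "u4": ["i feel great and happy today", "nice weather"],
}
labels = {"u1": "diagnosed", "u2": "control", "u3": "diagnosed", "u4": "control"}

model = NaiveBayes().fit(
    [bag_of_words(users[u]) for u in users],
    [labels[u] for u in users],
)
print(model.predict(bag_of_words(["i feel sad and cannot sleep", "watching tv"])))
```

In this sketch the filter runs before counting, so the classifier only ever sees text that passed the first stage, and training the second stage needs nothing beyond token counts.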
Advances in the Natural Language Processing (NLP) and machine learning fields have led to the development of automated methods for the recognition of personality traits from text available from social media and similar sources. Systems of this kind exploit the close relation between lexical knowledge and personality models (such as the well-known Big Five model) to provide information about the author of an input text in a non-intrusive fashion, and at a low cost. Although now a well-established research topic in the field, the computational recognition of personality from text still leaves a number of questions...
The language employed by an individual when discussing topics of a moral nature (of the kind typically found in, e.g., social media) is revealing not only of the text's affective contents itself, but also of who wrote the text in the first place. Based on these observations, this work intends to illustrate how two kinds of morality-related information may be inferred from text by presenting a number of shallow and deep learning models of stance and moral foundations classification. In doing so, we introduce a novel corpus of texts labelled with...
As in many other natural language processing (NLP) fields, the use of statistical methods is now part of mainstream natural language generation (NLG). In the development of systems of this kind, however, there is the issue of data sparseness, a problem that is particularly evident in the case of morphologically-rich languages such as Portuguese. This work presents a shallow surface realisation system that makes use of factored language models (FLMs) of Portuguese to overcome some of these difficulties. The system combines FLMs trained on a large corpus with a number of NLP...
This paper presents a study on the recognition of personality traits from text in Brazilian Portuguese. Based on the well-known Big Five model of personality, we collected a basic linguistic-computational resource, which can be seen as a parallel corpus of texts and personality inventories, and then use this resource to build supervised models of personality recognition from Facebook status updates.
The inference of politically-oriented information from text data is a popular research topic in Natural Language Processing (NLP) at both text- and author-level. In recent years, studies of this kind have been implemented with the aid of text representations ranging from simple count-based models (e.g., bag-of-words) to sequence-based models built from transformers (e.g., BERT). Despite considerable success, however, we may still ask whether results may be improved further by combining these models with additional text representations. To shed...
Predicting mental health statuses from social media text is a well-known Natural Language Processing (NLP) task. In this work, we focus on the issue of depression and anxiety disorder prediction from Twitter text by comparing a more conventional approach based on engineered features with a data-oriented alternative based on a mixture of specialists using transformer language models. Results from a large corpus of depression/anxiety self-disclosed diagnoses in Portuguese are reported, and a feature importance analysis is carried out to provide...
We introduce a labelled corpus of stances about moral issues for the Brazilian Portuguese language, and present reference results for both the stance recognition and polarity classification tasks. The corpus is built from Twitter data and further expanded with data elicited through crowd sourcing and labelled by their own authors. Put together, the corpus and reference results are expected to be taken as a baseline for further studies in the field of stance recognition from text.
Computational models of hate speech detection and related tasks (e.g., detecting misogyny, racism, xenophobia, homophobia etc.) have emerged as major Natural Language Processing (NLP) research topics in recent years. In the present work, we investigate a range of alternative implementations of three of these tasks (namely, hate speech, aggressive behaviour and target group recognition) by presenting a number of experiments involving different learning methods, including regularised logistic regression, convolutional...
At both semantic and syntactic levels, the generation of referring expressions (REG) involves far more than simply producing 'correct' output strings and, accordingly, remains central to the study and development of Natural Language Generation (NLG) systems. In particular, REG algorithms have to pay regard to humanlikeness, an issue that lies at the very heart of the classic definition of Artificial Intelligence as, e.g., motivated by the Turing test. In this work we present an end-to-end approach to REG that takes humanlikeness into account,...
This paper discusses the computational problem of generating referring expressions (REG) in 3D virtual worlds. We propose a REG algorithm that attempts to make adequate choices of spatial relations for the purpose of disambiguation (as opposed to, e.g., determining the localisation of a previously identified object). The decisions made by the algorithm are based on existing models of spatial reference, and further refined by the use of domain knowledge obtained from a corpus of instructions in virtual environments. The proposed approach is shown to outperform a number...