- Topic Modeling
- Natural Language Processing Techniques
- Text Readability and Simplification
- Authorship Attribution and Profiling
- Text and Document Classification Technologies
- Sentiment Analysis and Opinion Mining
- Spam and Phishing Detection
- Speech and dialogue systems
- Hate Speech and Cyberbullying Detection
- COVID-19 diagnosis using AI
- Misinformation and Its Impacts
- Digital Mental Health Interventions
- Online Learning and Analytics
- Mental Health via Writing
- Domain Adaptation and Few-Shot Learning
- Spanish Linguistics and Language Studies
- Oil and Gas Production Techniques
- COVID-19 and Mental Health
- Linguistics, Language Diversity, and Identity
- Video Analysis and Summarization
- Academic integrity and plagiarism
- Web Application Security Vulnerabilities
- Imbalanced Data Classification Techniques
- Translation Studies and Practices
- Pneumonia and Respiratory Infections
NortonLifeLock (United States)
2023
Universitat Politècnica de València
2012-2023
GfK (Germany)
2017-2018
Polytechnic University of Puerto Rico
2017
In this work we describe the system built for the three English subtasks of SemEval 2016 Task 3 by the Department of Computer Science of the University of Houston (UH) and the Pattern Recognition and Human Language Technology (PRHLT) research center of the Universitat Politècnica de València: UH-PRHLT. Our system represents instances using both lexical and semantic-based similarity measures between text pairs. Our semantic features include the use of distributed representations of words, knowledge graphs generated with the BabelNet multilingual...
We study the problem of building text classifiers with little or no training data, commonly known as zero and few-shot classification. In recent years, an approach based on neural textual entailment models has been found to give strong results on a diverse range of tasks. In this work, we show that with proper pre-training, Siamese Networks that embed texts and labels offer a competitive alternative. These models allow for a large reduction in inference cost: constant in the number of labels rather than linear. Furthermore, we introduce label...
Current approaches to cross-language document retrieval and categorization are based on discriminative methods which represent documents in a low-dimensional vector space. In this paper we propose a shift from the supervised to the knowledge-based paradigm and provide a similarity measure that draws on BabelNet, a large multilingual knowledge resource. Our experiments show state-of-the-art results in cross-lingual categorization.
The polarity classification task aims at automatically identifying whether a subjective text is positive or negative. When the target domain is different from those where the model was trained, we refer to a cross-domain setting. That setting usually implies the use of a domain adaptation method. In this work, we study the single and cross-domain polarity classification tasks from the string kernels perspective. Contrary to classical domain adaptation methods, which employ texts from both domains to detect pivot features, we do not use the target domain for training. Our approach detects the lexical peculiarities that...
In this paper, we present our participation in the EmoContext shared task on detecting emotions in English textual conversations between a human and a chatbot. We propose four neural systems and combine them to further improve the results. We show that our ensemble can successfully distinguish the three emotions (SAD, HAPPY, ANGRY) and separate them from the rest (OTHERS) in a highly-imbalanced scenario. Our best system achieved a 0.77 F1-score and was ranked fourth out of 165 submissions.
Sanja Štajner, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso, Heiner Stuckenschmidt. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2017.
Background: The current COVID-19 pandemic is associated with extensive individual and societal challenges, including challenges to both physical and mental health. To date, the development of mental health problems such as depressive symptoms accompanying population-based federal distancing measures is largely unknown, and opportunities for rapid, effective, and valid monitoring are currently a relevant matter of investigation. Objective: In this study, we aim to investigate, first, the temporal progression of depressive symptoms during the COVID-19 pandemic and,...
This paper presents the overview of the AuTexTification shared task as part of the IberLEF 2023 Workshop in Iberian Languages Evaluation Forum, within the framework of the SEPLN conference. AuTexTification consists of two subtasks: for Subtask 1, participants had to determine whether a text is human-authored or has been generated by a large language model. For Subtask 2, participants had to attribute a machine-generated text to one of six different text generation models. Our dataset contains more than 160,000 texts across two languages (English and Spanish) and five domains (tweets,...
The objective of Native Language Identification is to determine the native language of the author of a text that he or she wrote in another language. By contrast, Language Variety Identification aims at classifying texts representing different varieties of a single language. We postulate that both tasks may be reduced to a single objective, which is to identify the language variety of a text. We design a general approach that combines string kernels and word embeddings, which capture different characteristics of texts. The results of our experiments show that our approach achieves excellent results on both tasks, without any task-specific adaptations.
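As a rough illustration of the string-kernel idea mentioned in the abstract above, the following Python sketch computes a character n-gram (spectrum) kernel between two texts. It is a generic sketch under stated assumptions, not the authors' implementation: the function names `char_ngrams` and `spectrum_kernel`, the n-gram range of 2 to 4, and the cosine-style normalization are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count all character n-grams with lengths n_min..n_max.

    The default range is an illustrative assumption, not a value from the paper.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def spectrum_kernel(a: str, b: str, n_min: int = 2, n_max: int = 4) -> float:
    """Normalized spectrum kernel between two strings.

    Computes the dot product of the two n-gram count vectors and divides it
    by the product of their Euclidean norms, so identical texts score 1.0
    and texts with no shared n-grams score 0.0.
    """
    ca = char_ngrams(a, n_min, n_max)
    cb = char_ngrams(b, n_min, n_max)
    dot = sum(count * cb[gram] for gram, count in ca.items())
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Two spelling varieties of the same sentence share most character n-grams.
    print(spectrum_kernel("the colour of the sea", "the color of the sea"))
```

A kernel matrix built from pairwise values like these could be passed to any kernel-based classifier. How it would be combined with word embeddings is not specified in the abstract, so that step is left out here.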
Paraphrase plagiarism identification represents a very complex task given that plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures are able to extract semantic information from either an external resource or a distributed representation of words, resulting in informative features for training a supervised classifier...