- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Linguistic Studies and Language Acquisition
- Authorship Attribution and Profiling
- Semantic Web and Ontologies
- Speech and dialogue systems
- Software Engineering Research
- Sentiment Analysis and Opinion Mining
- Advanced Text Analysis Techniques
- Multimodal Machine Learning Applications
- Interpreting and Communication in Healthcare
- Biomedical Text Mining and Ontologies
- Linguistics and terminology studies
- Second Language Acquisition and Learning
- Speech Recognition and Synthesis
- Software Engineering Techniques and Practices
- Language, Metaphor, and Cognition
- Hate Speech and Cyberbullying Detection
- Digital Communication and Language
- Reading and Literacy Development
- Algorithms and Data Compression
- Lexicography and Language Studies
- Language and cultural evolution
- Machine Learning in Healthcare
Institute for Computational Linguistics “A. Zampolli”
2016-2025
National Research Council
2009-2024
University of Pisa
2005-2023
University of Bologna
2023
University of Groningen
2014-2021
National Academies of Sciences, Engineering, and Medicine
2010-2020
University of California, Davis
2020
University of Genoa
2018-2019
Istituto Nazionale di Fisica Nucleare, Sezione di Roma I
2018
University of Salerno
2018
Much progress has been made in the field of sentiment analysis in the past years. Researchers have relied on textual data for this task, while only recently they have started investigating approaches to predict sentiments from multimedia content. With the increasing amount of data shared on social media, there is also a rapidly growing interest in approaches that work "in the wild", i.e. that are able to deal with uncontrolled conditions. In this work, we faced the challenge of training a visual sentiment classifier starting from a large set of user-generated and unlabeled...
Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell’Orletta, Henrik Dittmann, Alessandro Lenci, Vito Pirrelli. Proceedings of the 9th Web as Corpus Workshop (WaC-9). 2014.
This work focuses on the analysis of Italian social media messages for disaster management and aims at the detection of messages carrying critical information for the damage assessment task. A main novelty of this study consists in the focus on out-domain and cross-event damage detection, and on the investigation of the most relevant tweet-derived features for these tasks. We devised different experiments by resorting to a wide set of linguistic features qualifying the lexical and grammatical structure of a text, as well as ad-hoc features specifically implemented for these tasks. We investigated the most effective features that...
In this paper we present a comparison between the linguistic knowledge encoded in the internal representations of a contextual Language Model (BERT) and a contextual-independent one (Word2vec). We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that, although BERT is capable of understanding the full context of each word in an input sequence, the implicit knowledge encoded in its aggregated sentence representations is still comparable to that of a contextual-independent model. We also find that BERT is able to encode...
A company that wishes to enter an established market with a new, competitive product is required to analyse the solutions of its competitors. Identifying and comparing the features provided by other vendors might greatly help during the market analysis. However, mining common and variant features from the publicly available documents of competitors is a time-consuming and error-prone task. In this paper, we suggest employing a natural language processing approach based on contrastive analysis to identify commonalities and variabilities from the brochures...
The future evolution of the application of natural language processing technologies in requirements engineering can be viewed from four dimensions: discipline, dynamism, domain knowledge, and datasets.
Cancer cells are characterized by chromosomal instability (CIN), and it is thought that errors in pathways involved in faithful chromosome segregation play a pivotal role in the genesis of CIN. Cohesin forms a large protein ring that binds DNA strands by encircling them. In addition to this central role in chromosome segregation, cohesin is also needed for DNA repair, gene transcription regulation, and chromatin architecture. Though mutations in both cohesin and cohesin-regulator genes have been identified in many human cancers, the contribution of cohesin to cancer...
Mental reconstruction (MRC) and Free Recall (FR) have been recognized for enhancing the quality of witness statements. However, the mechanisms underlying this association remain insufficiently understood. This study explores how the time allocated to MRC and FR, and variations in educational level, influence eyewitness testimonies. Testimony quality is evaluated based on the manually annotated content of the information provided, by experts in testimony assessment, which measures the adherence to the events. This assessment is further complemented by fine-grained...
Biomedical natural language processing (NLP) increasingly relies on large language models and extensive datasets, presenting significant computational challenges. We propose Blue5, a multi-task model based on SciFive that incorporates instance selection (IS) to enable efficient multi-task learning (MTL) on biomedical data. We adapt the E2SC-IS framework for the biomedical domain, integrating a calibrated SVM classifier to reduce computational costs. Our approach achieves an average data reduction of 26.6% across several tasks of the BLUE (Biomedical Language...
Deterministic transition-based Shift/Reduce dependency parsers often make mistakes in the analysis of long-span dependencies (McDonald & Nivre, 2007).
In this paper, we present a crowdsourcing-based approach to model the human perception of sentence complexity. We collect a large corpus of sentences rated with human judgments of complexity for two typologically different languages, Italian and English. We test our approach in two experimental scenarios aimed to investigate the contribution of a wide set of lexical, morpho-syntactic and syntactic phenomena in predicting i) the degree of agreement among annotators, independently from the assigned judgment, and ii)...
In this paper we investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process, and how this knowledge affects its predictions during several classification problems. We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that BERT is able to encode a wide range of linguistic characteristics, but it tends to lose this information when trained on specific downstream tasks. We also find that BERT's capacity to encode different kinds...
The paper investigates the problem of sentence readability assessment, which is modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open issues connected to it, i.e. the corpora to be used for training, and the identification of the most effective features to determine sentence readability. An existing sentence readability assessment tool developed for Italian was specialized at the level of the training corpus and of the learning algorithm. A maximum entropy-based feature selection and ranking algorithm (grafting)...
In this paper, we present the design and construction of the first Italian corpus for automatic and semi-automatic text simplification. In line with current approaches, we propose a new annotation scheme specifically conceived to identify the typology of changes an original sentence undergoes when it is manually simplified. Such an annotation scheme has been applied to two aligned corpora, containing original texts and corresponding simplified versions, selected as representative of different manual simplification strategies and addressing different target reader...