- Topic Modeling
- Natural Language Processing Techniques
- Text and Document Classification Technologies
- Machine Learning and Data Classification
- Authorship Attribution and Profiling
- Sentiment Analysis and Opinion Mining
- Advanced Text Analysis Techniques
- Imbalanced Data Classification Techniques
- Machine Learning and Algorithms
- Neural Networks and Applications
- Scientific Computing and Data Management
- Computational and Text Analysis Methods
- Domain Adaptation and Few-Shot Learning
- Semantic Web and Ontologies
- Artificial Intelligence in Healthcare
- Web Data Mining and Analysis
- Data Analysis with R
- Anomaly Detection Techniques and Applications
- Hate Speech and Cyberbullying Detection
- Image Enhancement Techniques
- Advanced Statistical Methods and Models
- Image and Video Quality Assessment
- Ethics and Social Impacts of AI
- Names, Identity, and Discrimination Research
- Biomedical Text Mining and Ontologies
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo"
2015-2024
Consorzio Roma Ricerche
2020-2024
Consorzio Pisa Ricerche
2016-2024
National Research Council
2024
Universitas Gunung Rinjani
2023
Institute of Scientific and Technical Information of China
2020-2022
Universidad de Granada
2012-2017
Hamad bin Khalifa University
2016
The accuracy of many classification algorithms is known to suffer when the data are imbalanced (i.e., when the distribution of the examples across the classes is severely skewed). Many applications of binary text classification are of this type, with the positive examples of the class of interest far outnumbered by the negative examples. Oversampling (i.e., generating synthetic training examples of the minority class) is an often used strategy to counter this problem. We present a new oversampling method specifically designed for classifying data (such as text) for which the distributional hypothesis holds,...
Algorithms and models are increasingly deployed to inform decisions about people, inevitably affecting their lives. As a consequence, those in charge of developing these models must carefully evaluate their impact on different groups of people and favour group fairness, that is, ensure that groups determined by sensitive demographic attributes, such as race or sex, are not treated unjustly. To achieve this goal, the availability (awareness) of these demographic attributes to those evaluating the impact of these models is fundamental. Unfortunately, collecting and storing these attributes is often in conflict...
Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a "target" domain when the only available training data belongs to a different "source" domain. In this paper we present the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. DCI derives term representations in a vector space common to both domains, where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves...
Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called “prevalence”) of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts. This task is especially important when these texts are tweets, since the final goal of most sentiment classification efforts carried out on Twitter data is actually quantification (and not the classification of individual tweets). It is well-known that solving quantification by means of “classify and count” (i.e., by classifying all unlabelled items by means of a standard...
The Questio de aqua et terra is a cosmological treatise traditionally attributed to Dante Alighieri. However, the authenticity of this text is controversial, due to discrepancies with Dante's established works and to the absence of contemporary references. This study investigates the authenticity of the Questio via computational authorship verification (AV), a class of techniques which combine supervised machine learning and stylometry. We build a family of AV systems and assemble a corpus of 330 13th- and 14th-century Latin texts, which we use to comparatively evaluate...
This survey provides an overview of the challenges of misspellings in natural language processing (NLP). While often unintentional, misspellings have become ubiquitous in digital communication, especially with the proliferation of Web 2.0, user-generated content, and informal text mediums such as social media, blogs, and forums. Even if humans can generally interpret misspelled text, NLP models frequently struggle to handle it: this causes a decline in performance in common tasks like text classification and machine translation. In...
Sentiment Quantification (i.e., the task of estimating the relative frequency of sentiment-related classes, such as Positive and Negative, in a set of unlabelled documents) is an important topic in sentiment analysis, as the study of sentiment-related quantities and trends across a population is often of higher interest than the analysis of individual instances. In this work we propose a method for Cross-Lingual Sentiment Quantification, the task of performing sentiment quantification when training documents are available for a source language...
In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although "supervised term weighting" approaches that use this intuition have been described before, they have failed to show consistent...
Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when “naïvely” classifying each document via its corresponding language-specific classifier. To obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle “multilabel” CLC via funnelling, a new ensemble learning method that we...
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the unlabelled data are not IID, i.e., suffer from dataset shift. To date, quantification methods have mostly been tested only on a special case of dataset shift, i.e., prior probability shift; the relationship between quantification and other types of dataset shift remains, by and large, unexplored. In this work we carry out an experimental analysis of how current quantification algorithms behave...
Image metrics based on the Human Visual System (HVS) play a remarkable role in the evaluation of complex image processing algorithms. However, mimicking the HVS is known to be complex and computationally expensive (both in terms of time and memory), and its usage is thus limited to a few applications and to small input data. All of this makes such metrics not fully attractive in real-world scenarios. To address these issues, we propose Deep Image Quality Metric (DIQM), a deep-learning approach to learn the global image quality feature (mean-opinion-score). DIQM can...
QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation), written in Python. Quantification is the task of training quantifiers via supervised learning, where a quantifier is a predictor that estimates the relative frequencies (a.k.a. prevalence values) of the classes of interest in a sample of unlabelled data. While quantification can be trivially performed by applying a standard classifier to each unlabelled data item and counting how many data items have been assigned to each class, it has been shown that this "classify and count" method is outperformed...
Obtaining high-quality labelled data for training a classifier in a new application domain is often costly. Transfer Learning (a.k.a. “Inductive Transfer”) tries to alleviate these costs by transferring, to the “target” domain of interest, knowledge available from a different “source” domain. In transfer learning the lack of labelled information from the target domain is compensated by the availability at training time of a set of unlabelled examples from the target distribution. Transductive Transfer Learning denotes the transfer learning setting in which the only set of target documents that we are interested in classifying is known and available at training time....
Several disciplines, like the social sciences, epidemiology, sentiment analysis, or market research, are interested in knowing the distribution of the classes in a population rather than the individual labels of the members thereof. Quantification is the supervised machine learning task concerned with obtaining accurate predictors of class prevalence, and to do so particularly in the presence of label shift. The distribution-matching (DM) approaches represent one of the most important families among the quantification methods that have been...
The 3rd International Workshop on Learning to Quantify (LQ 2023) took place on September 18, 2023 in Torino, IT, where it was organised as a satellite event of the 34th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023). Like the main program of the conference, the workshop employed a hybrid format, with all presentations given in presence and attendees participating in presence or online. This report presents a summary of the workshop, briefly summarising the individual works presented,...
Quantification is a supervised learning task that consists in predicting, given a set of classes C and a set D of unlabelled items, the prevalence (or relative frequency) p_c(D) of each class c ∈ C in D. Quantification can in principle be solved by classifying all the unlabelled items and counting how many of them have been attributed to each class. However, this "classify and count" approach has been shown to yield suboptimal quantification accuracy; this has established quantification as a task of its own, and given rise to a number of methods specifically devised for it. We propose a recurrent neural...
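Several of the abstracts above refer to the "classify and count" baseline for estimating class prevalence. As a minimal sketch of that baseline only (not of any of the methods proposed in these papers, and not QuaPy's API), the following estimates p_c(D) by classifying every item in D and dividing the per-class counts by |D|. The function names, the keyword-based toy classifier, and the example texts are all hypothetical and introduced purely for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Sequence


def classify_and_count(
    items: Iterable,
    classify: Callable[[object], Hashable],
    classes: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Estimate the prevalence p_c(D) of each class c by classifying
    every item in D and counting the items assigned to c."""
    predictions = [classify(item) for item in items]
    if not predictions:
        raise ValueError("cannot estimate prevalence on an empty set")
    counts = Counter(predictions)
    # Classes that were never predicted get prevalence 0.0.
    return {c: counts.get(c, 0) / len(predictions) for c in classes}


# Hypothetical stand-in for a trained classifier (an assumption, not a
# real model): label a text Positive if it contains the word "good".
def toy_classifier(text: str) -> str:
    return "Positive" if "good" in text.lower() else "Negative"


# Hypothetical unlabelled sample D.
D = ["Good film", "bad plot", "really good", "dull"]
prevalence = classify_and_count(D, toy_classifier, ["Positive", "Negative"])
```

Because the estimate is just a normalised count of predictions, any systematic bias of the classifier carries over directly into the prevalence estimate, which is the weakness of "classify and count" that the abstracts point out.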
Salud M. Jiménez Zafra, Giacomo Berardi, Andrea Esuli, Diego Marcheggiani, María Teresa Martín-Valdivia, Alejandro Moreo Fernández. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015.