- Natural Language Processing Techniques
- Topic Modeling
- Advanced Text Analysis Techniques
- Semantic Web and Ontologies
- Web Data Mining and Analysis
- Text and Document Classification Technologies
- Data Quality and Management
- Data Management and Algorithms
- Rough Sets and Fuzzy Logic
- Image Retrieval and Classification Techniques
- Sentiment Analysis and Opinion Mining
- Multimodal Machine Learning Applications
- Information Retrieval and Search Behavior
- Linguistics and Discourse Analysis
- Advanced Image and Video Retrieval Techniques
- Complex Network Analysis Techniques
- Service-Oriented Architecture and Web Services
- Speech and Dialogue Systems
- Cultural Insights and Digital Impacts
- Software Engineering Research
- Data Mining Algorithms and Applications
- Canadian Identity and History
- Linguistics and Terminology Studies
- Opinion Dynamics and Social Influence
- Biomedical Text Mining and Ontologies
Commissariat à l'Énergie Atomique et aux Énergies Alternatives
2015-2024
CEA LIST
2012-2024
CEA Paris-Saclay
2011-2024
Université Paris-Saclay
2021-2024
Integra (United States)
2010-2021
Laboratoire des signaux et systèmes
2019-2020
CEA Paris-Saclay - Etablissement de Fontenay-aux-roses
2004-2011
École Polytechnique Fédérale de Lausanne
1998-2002
Kiem-Hieu Nguyen, Xavier Tannier, Olivier Ferret, Romaric Besançon. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.
Whether to retrieve, answer, translate, or reason, multimodality opens up new challenges and perspectives. In this context, we are interested in answering questions about named entities grounded in a visual context using a Knowledge Base (KB). To benchmark this task, called KVQAE (Knowledge-based Visual Question Answering about named Entities), we provide ViQuAE, a dataset of 3.7K questions paired with images. This is the first dataset to cover a wide range of entity types (e.g. persons, landmarks, products). The annotated semi-automatic...
Information Extraction has recently been extended to new areas by loosening the constraints on the strict definition of the extracted information and allowing the design of more open extraction systems. In this domain of unsupervised extraction, we focus on the task of extracting and characterizing a priori unknown relations between a given set of entity types. One of the challenges is to deal with the large amount of candidate relations when extracting them from a corpus. We propose in this paper an approach for filtering such candidates based on heuristics and machine learning models....
Numerous domains have an interest in studying the viewpoints expressed online, be it for marketing, cybersecurity, or research purposes with the rise of computational social sciences. Current stance detection models are usually grounded on the specificities of some platforms. This rigidity is unfortunate since it does not allow the integration of the multitude of signals informing effective detection. We propose the SCSD model, or Sequential Community-based Stance Detection, a semi-supervised ensemble algorithm which considers...
Starting from an ontology of a targeted financial domain corresponding to transaction, performance and management change news, relevant segments of text containing at least one keyword are extracted. The linguistic pattern of each segment is automatically generated to serve initially as a learning model. Each pattern is composed of named entities, keywords and articulation words. Some generic entities like organizations, persons, locations and dates, as well as grammatical annotations, are generated by an automatic tool. During the step, manually annotated...
Stance detection systems often integrate social clues in their algorithms. While the influence of social groups on stance is known, there is no evaluation of how well state-of-the-art community detection algorithms perform in terms of detecting like-minded communities, i.e. communities that share the same stance on a given subject. We used Twitter's interactions to compare the results of such algorithms on datasets about the Scottish Independence Referendum and the US Midterm Elections. Our results show that relying on information diffusion is better for this task and confirm previous...
The French presidential election was one of the main political events of 2017, and triggered a lot of activity on Twitter. The campaign was highly unpredictable and led to the rise of 5 parties instead of the historical bipartite (left-right) confrontation, ranging from far-left to far-right. This dataset paper proposes #Élysée2017fr, a large and complex dataset of 22853 Twitter profiles active during the campaign (from November 2016 to May 2017), their corresponding tweets and retweets, plus the retweet and mention networks related to these profiles. The profiles were manually...
The efficiency of Information Extraction systems is known to be heavily influenced by domain-specific knowledge, but the cost of developing such systems is considerably high. In this article, we consider the problem of event extraction and show that learning word representations from unlabeled data and using them for representing roles makes it possible to outperform previous state-of-the-art models on the MUC-4 data set.
The design of efficient textual similarities is an important issue in the domain of textual data exploration. Textual similarities are, for example, central in document collection structuring (e.g. clustering), or in information retrieval (IR), which relies on the computation of similarities measuring the adequacy between a query and documents. The objective of this paper is to present and compare several similarity measures in the framework of the distributional semantics (DS) model for IR. This model is an extension of the standard vector space model, which further takes the co-frequencies of terms in a given...
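
The contrast drawn in the last abstract, between a standard vector space similarity and one that also uses term co-frequencies, can be illustrated with a minimal sketch. The abstract is truncated and does not give the DS formulation, so everything below is an assumption made for illustration: the helper names (`tf_vector`, `cosine`, `cooccurrence`, `expand`), the document-level co-frequency counts, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cooccurrence(docs):
    """co[t][s] = number of documents that contain both t and s (assumed definition)."""
    co = {}
    for doc in docs:
        terms = set(doc.lower().split())
        for t in terms:
            row = co.setdefault(t, Counter())
            for s in terms:
                row[s] += 1
    return co


def expand(vec, co):
    """Spread each term's weight over its co-occurring terms (hypothetical expansion)."""
    out = Counter()
    for t, w in vec.items():
        # Terms unseen in the corpus fall back to themselves.
        for s, c in co.get(t, {t: 1}).items():
            out[s] += w * c
    return out


# Toy corpus, invented for this example.
docs = [
    "car engine repair",
    "automobile engine repair",
    "banana bread recipe",
]
co = cooccurrence(docs)
query = tf_vector("car")

plain_scores = [cosine(query, tf_vector(d)) for d in docs]
expanded_scores = [cosine(expand(query, co), expand(tf_vector(d), co)) for d in docs]
```

In this toy setting the plain vector space score between the query "car" and the second document is zero because they share no term, while the expanded score is positive because "car" co-occurs with "engine" and "repair". The unrelated third document scores zero in both cases.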