- Topic Modeling
- Natural Language Processing Techniques
- Text and Document Classification Technologies
- Advanced Text Analysis Techniques
- Information Retrieval and Search Behavior
- Semantic Web and Ontologies
- Time Series Analysis and Forecasting
- Algorithms and Data Compression
- Image Retrieval and Classification Techniques
- Biomedical Text Mining and Ontologies
- Face and Expression Recognition
- Rough Sets and Fuzzy Logic
- Advanced Image and Video Retrieval Techniques
- Web Data Mining and Analysis
- Data Management and Algorithms
- Complex Network Analysis Techniques
- Spam and Phishing Detection
- Data Mining Algorithms and Applications
- Bayesian Modeling and Causal Inference
- Machine Learning and Algorithms
- Advanced Graph Neural Networks
- Speech and dialogue systems
- Neural Networks and Applications
- Recommender Systems and Techniques
- Radiomics and Machine Learning in Medical Imaging
Université Grenoble Alpes
2015-2024
Laboratoire d'Informatique de Grenoble
2015-2024
Centre National de la Recherche Scientifique
2014-2024
Institut polytechnique de Grenoble
2017-2024
Laboratoire Interdisciplinaire de Physique
2022
Université Joseph Fourier
2008-2019
Académie de Grenoble
2019
Xerox (France)
1998-2018
Heriot-Watt University
2018
Laboratoire d'Informatique et d'Automatique pour les Systèmes
2011-2018
In statistical relational learning, the link prediction problem is key to automatically understanding the structure of large knowledge bases. As in previous studies, we propose to solve this problem through latent factorization. However, here we make use of complex-valued embeddings. The composition of complex embeddings can handle a large variety of binary relations, among them symmetric and antisymmetric relations. Compared to state-of-the-art models such as Neural Tensor Network and Holographic Embeddings, our approach, based on complex embeddings, is arguably...
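A minimal sketch of how complex-valued embeddings can capture both symmetric and antisymmetric relations. It assumes the trilinear score Re(<w_r, e_s, conj(e_o)>); the function name `complex_score` and all embedding values are hypothetical and chosen only for illustration.

```python
def complex_score(subject, relation, obj):
    """Real part of sum_k relation_k * subject_k * conj(obj_k).

    Assumed scoring function; each argument is a list of Python complex numbers.
    """
    total = sum(r * s * o.conjugate() for r, s, o in zip(relation, subject, obj))
    return total.real


# Hypothetical 2-dimensional entity embeddings.
alice = [1 + 2j, 0.5 - 1j]
bob = [-1 + 1j, 2 + 0.5j]

# A purely real relation vector gives score(s, o) == score(o, s).
symmetric_rel = [2 + 0j, -1 + 0j]
# A purely imaginary relation vector gives score(s, o) == -score(o, s).
antisymmetric_rel = [0 + 1j, 0 - 3j]

for name, rel in [("symmetric", symmetric_rel), ("antisymmetric", antisymmetric_rel)]:
    forward = complex_score(alice, rel, bob)
    backward = complex_score(bob, rel, alice)
    print(f"{name}: forward={forward:+.2f} backward={backward:+.2f}")
```

Because the object embedding enters through its complex conjugate, swapping subject and object changes the sign of the imaginary contribution only. A relation vector with both real and imaginary parts can therefore mix the two behaviours.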
This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a, participants were asked to automatically...
In statistical relational learning, knowledge graph completion deals with automatically understanding the structure of large knowledge graphs (labeled directed graphs) and predicting missing relationships (labeled edges). State-of-the-art embedding models propose different trade-offs between modeling expressiveness and time and space complexity. We reconcile both expressiveness and complexity through the use of complex-valued embeddings and explore the link between such complex-valued embeddings and unitary diagonalization. We corroborate our approach theoretically...
Non-negative Matrix Factorization (NMF, [5]) and Probabilistic Latent Semantic Analysis (PLSA, [4]) have been successfully applied to a number of text analysis tasks such as document clustering. Despite their different inspirations, both methods are instances of multinomial PCA [1]. We further explore this relationship: we first show that PLSA solves the problem of NMF with KL divergence, and then explore the implications of this relationship.
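To make the NMF side of this relationship concrete, here is a minimal pure-Python sketch of NMF under the generalized KL divergence, using the standard multiplicative updates. The abstract does not prescribe this algorithm; the function names (`matmul`, `kl_divergence`, `nmf_kl`) and the toy count matrix are assumptions for illustration, and the sketch assumes every row and column of `V` has at least one positive entry.

```python
import math
import random


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def kl_divergence(V, WH):
    """Generalized KL divergence D(V || WH), with 0 * log(0) taken as 0."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * math.log(v / wh)
            total += wh - v
    return total


def nmf_kl(V, rank, n_iter=50, seed=0):
    """Factorize V ~ W H with multiplicative updates for the KL objective."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialization keeps all denominators non-zero.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = []
    for _ in range(n_iter):
        # Update H: H_kj *= sum_i W_ik (V_ij / WH_ij) / sum_i W_ik
        WH = matmul(W, H)
        ratio = [[V[i][j] / WH[i][j] for j in range(m)] for i in range(n)]
        for k in range(rank):
            col_sum = sum(W[i][k] for i in range(n))
            for j in range(m):
                H[k][j] *= sum(W[i][k] * ratio[i][j] for i in range(n)) / col_sum
        # Update W: W_ik *= sum_j (V_ij / WH_ij) H_kj / sum_j H_kj
        WH = matmul(W, H)
        ratio = [[V[i][j] / WH[i][j] for j in range(m)] for i in range(n)]
        for k in range(rank):
            row_sum = sum(H[k])
            for i in range(n):
                W[i][k] *= sum(ratio[i][j] * H[k][j] for j in range(m)) / row_sum
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history


# Hypothetical term-document counts with two visible "topics".
V = [[3, 1, 0, 0], [2, 2, 0, 1], [0, 0, 4, 2], [0, 1, 3, 3]]
W, H, history = nmf_kl(V, rank=2)
print("KL after first and last iteration:", history[0], history[-1])
```

With these updates the entries of `W H` sum to the entries of `V` after every step, so dividing the reconstruction by that total yields a joint distribution over rows and columns. That normalization is what allows the factorization to be read in PLSA terms.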
We address the problem of categorising documents using kernel-based methods such as Support Vector Machines. Since the work of Joachims (1998), there is ample experimental evidence that SVMs using the standard word frequencies as features yield state-of-the-art performance on a number of benchmark problems. Recently, Lodhi et al. (2002) proposed the use of string kernels, a novel way of computing document similarity based on matching non-consecutive subsequences of characters. In this article, we propose the use of this technique with sequences...
In this paper, we make use of linguistic knowledge to identify certain noun phrases, both in English and French, which are likely to be terms. We then test and compare different statistical scores to select the "good" ones among the candidate terms, and finally propose a method to build correspondences of multi-word units across languages.
LSHTC is a series of challenges which aims to assess the performance of classification systems in large-scale classification with a large number of classes (up to hundreds of thousands). This paper describes the datasets that have been released along the LSHTC series. The paper details the construction of the datasets and the design of the tracks, as well as the evaluation measures that we implemented and a quick overview of the results. All of these datasets are available online, and runs may still be submitted on the online server of the challenges.
We introduce in this paper the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the document and collection levels brings information on the significance of the word for the document. This hypothesis has been exploited in the 2-Poisson mixture models, in the notion of eliteness in BM25, and more recently in DFR models. We show here that, combined with notions related to burstiness, it can lead to simpler and better models.
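The abstract does not spell out a scoring formula, so the following is only a sketch of one way the hypothesis can be turned into a ranking function. It assumes a log-logistic model: a term's weight is the information -log P(X >= t | lambda) = log((lambda + t) / lambda), where t is a length-normalized term frequency and lambda is the term's document-frequency ratio in the collection. All function names, parameters, and toy statistics below are hypothetical.

```python
import math


def normalized_tf(tf, doc_len, avg_doc_len, c=1.0):
    """Length-normalized term frequency (assumed form): tf * log(1 + c * avg_len / len)."""
    return tf * math.log(1.0 + c * avg_doc_len / doc_len)


def log_logistic_score(query_terms, doc_tf, doc_len, avg_doc_len, doc_freq, n_docs, c=1.0):
    """Sum, over query terms present in the document, of log((lambda + t) / lambda)."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # absent terms carry no information in this sketch
        lam = doc_freq[term] / n_docs  # collection-level behaviour of the term
        t = normalized_tf(tf, doc_len, avg_doc_len, c)  # document-level behaviour
        score += math.log((lam + t) / lam)
    return score


# Hypothetical statistics: "embedding" is rare in the collection, "the" is common.
doc_tf = {"embedding": 3, "the": 3}
doc_freq = {"embedding": 10, "the": 900, "kernel": 50}
stats = dict(doc_len=100, avg_doc_len=100, doc_freq=doc_freq, n_docs=1000)

rare = log_logistic_score(["embedding"], doc_tf, **stats)
common = log_logistic_score(["the"], doc_tf, **stats)
print(f"rare term: {rare:.3f}, common term: {common:.3f}")
```

At equal in-document frequency, the rare term receives the larger weight, because its document-level behaviour deviates more from its collection-level behaviour.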
We introduce in this survey the major concepts, models, and algorithms proposed so far to infer causal relations from observational time series, a task usually referred to as causal discovery in time series. To do so, after a description of the underlying concepts and modelling assumptions, we present different methods according to the family of approaches they belong to: Granger causality, constraint-based approaches, noise-based approaches, score-based approaches, logic-based approaches, topology-based approaches, and difference-based approaches. We then evaluate several...
We present a geometric view on bilingual lexicon extraction from comparable corpora, which allows us to re-interpret the methods proposed so far and identify unresolved problems. This motivates three new methods that aim at solving these problems. Empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of extracted lexicons.
The job management system is the HPC middleware responsible for distributing computing power to applications. While such systems generate an ever-increasing amount of data, they are characterized by uncertainties on some parameters, like the job running times. The question raised in this work is: to what extent is it possible or useful to take into account predictions of the job running times to improve the global scheduling?
In recent years, large language models (LLMs) have demonstrated exceptional power in various domains, including information retrieval. Most of the previous practices involve leveraging these models to create a single embedding for each query, each passage, or each document individually, a strategy exemplified and used by the Retrieval-Augmented Generation (RAG) framework. While this method has proven effective, we argue that it falls short of fully capturing the nuanced intricacies of document-level texts due to its reliance...
This paper focuses on exploiting different models and methods in bilingual lexicon extraction, either from parallel or comparable corpora, in specialized domains. First, special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to combine the different models for bilingual lexicon extraction is presented. Our results show that the combination of the models significantly improves results, and that the use of the hierarchical information contained in our thesaurus, UMLS/MeSH, is of primary importance. Lastly,...