- Information Retrieval and Search Behavior
- Topic Modeling
- Recommender Systems and Techniques
- Data Management and Algorithms
- Advanced Database Systems and Queries
- Natural Language Processing Techniques
- Data Quality and Management
- Multimodal Machine Learning Applications
- Text and Document Classification Technologies
- Gene Expression and Cancer Classification
- Cell Image Analysis Techniques
- Digital and Traditional Archives Management
- Software Testing and Debugging Techniques
- Advanced Text Analysis Techniques
- Diverse Musicological Studies
- Radio, Podcasts, and Digital Media
- Mobile Crowdsensing and Crowdsourcing
- Advanced Image and Video Retrieval Techniques
- Educational Technology in Learning
- E-Learning and Knowledge Management
- Data Stream Mining Techniques
- Semantic Web and Ontologies
- Marine Animal Studies Overview
- Web Data Mining and Analysis
- Cultural Heritage Management and Preservation
Universidade da Coruña
2019-2024
Creating test collections for offline retrieval evaluation requires human effort to judge documents' relevance. This expensive activity has motivated much work on methods for constructing benchmarks with lower assessment costs. In this respect, adjudication methods actively decide both which documents experts review and the order in which they review them, to better exploit the assessment budget or to lower it. Researchers evaluate the quality of those methods by measuring the correlation between the known gold ranking of systems under the full collection and the observed...
Null Hypothesis Significance Testing is the de facto tool for assessing effectiveness differences between Information Retrieval systems. Researchers use statistical tests to check whether those differences will generalise to online settings or are just due to the samples observed in the laboratory. Much work has been devoted to studying which test is most reliable when comparing a pair of systems, but real-world IR experiments involve more than two. In the multiple comparisons scenario, testing several systems...
Information Retrieval is an area where evaluation is crucial to validate newly proposed models. As the first step in the evaluation of models, researchers carry out offline experiments on specific datasets. While the field started around ad-hoc search, the number of new tasks is continuously growing. These tasks demand the development of new test collections (documents, information needs, and judgments). The construction of those datasets relies on expensive campaigns like TREC. Due to the size of modern collections, obtaining relevance judgments for each...
Information retrieval systems play a crucial role in addressing users' information needs by aiding their exploration of vast collections of information. This thesis is framed within a critical research aspect: evaluation. In particular, we propose new approaches for creating annotated test collections. Such collections are essential for evaluating systems' effectiveness in controlled experiments. Reflecting real-world conditions accurately in these collections is pivotal to progress in the field. We aim to introduce innovative techniques...
Test collections are an integral part of Information Retrieval (IR) research. They allow researchers to evaluate and compare ranking algorithms in a quick, easy, and reproducible way. However, constructing these datasets requires great effort in manual labelling and logistics, and having only a few human relevance judgements can introduce biases in the comparison. Recent research has explored the use of Large Language Models (LLMs) for labelling documents when building new retrieval test collections. Their strong text-understanding...
Social networks constitute a valuable source for documenting heritage constitution processes or obtaining a real-time snapshot of a cultural research topic. Many researchers use social networks as a thermometer to study these processes, creating, for this purpose, collections that are born-digital archives, potentially reusable, searchable, and of interest to other citizens. However, the retrieval and archiving techniques used within these studies are still semi-manual, a time-consuming task that hinders the reproducibility,...
Offline evaluation of information retrieval systems depends on test collections. These datasets provide researchers with a corpus of documents, topics, and relevance judgements indicating which documents are relevant for each topic. Gathering the latter is costly, requiring human assessors to judge the documents. Therefore, experts usually judge only a portion of the corpus. The most common approach for selecting that subset is pooling. By intelligently choosing which documents to assess, it is possible to optimise the number of positive labels for a given...
Information Retrieval is no longer exclusively about document ranking. New tasks are continuously proposed in this and sibling fields. With this proliferation of tasks, it becomes crucial to have a cheap way of constructing test collections to evaluate the new developments. Building test collections is time- and resource-consuming: it requires obtaining the documents, defining the user needs, and having assessors judge a large number of documents. To reduce the latter, pooling strategies aim to decrease the assessment effort by presenting a sample of documents in the corpus with the maximum...