- Mobile Crowdsensing and Crowdsourcing
- Information Retrieval and Search Behavior
- Human Mobility and Location-Based Analysis
- Advanced Text Analysis Techniques
- Auction Theory and Applications
- Expert finding and Q&A systems
- Topic Modeling
- Geographic Information Systems Studies
- Semantic Web and Ontologies
- Natural Language Processing Techniques
- Open Source Software Innovations
- Advanced Graph Neural Networks
- Data Management and Algorithms
- Smart Parking Systems Research
- Traffic Prediction and Management Techniques
- Software Engineering Research
- Logic, Reasoning, and Knowledge
- Music and Audio Processing
- Personal Information Management and User Behavior
- Experimental Behavioral Economics Studies
- Energy, Environment, Agriculture Analysis
- Science, Research, and Medicine
- Artificial Intelligence in Healthcare and Education
- Bayesian Modeling and Causal Inference
- Data-Driven Disease Surveillance
- University of Udine (2013-2024)
- King's College London (2020-2022)
- Keskuslaboratorio (2022)
- University of Southampton (2017-2020)
Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents in information retrieval evaluation, carrying out a large-scale user study across 18 TREC topics and collecting over 50,000 judgments using crowdsourcing. Our analysis shows that magnitude estimation judgments can be reliably collected via crowdsourcing, are competitive in terms of assessor cost, and are, on...
Crowdsourcing has become a standard methodology to collect manually annotated data, such as relevance judgments, at scale. On crowdsourcing platforms like Amazon MTurk or FigureEight, crowd workers select the tasks they work on based on different dimensions such as task reward and requester reputation. Requesters then receive the work of those workers who self-selected into the tasks and completed them successfully. Several workers, however, preview tasks, begin working on them, and reach varying stages of completion without finally submitting their...
In the context of micro-task crowdsourcing, each task is usually performed by several workers. This allows researchers to leverage measures of agreement among workers on the same task, in order to estimate the reliability of the collected data and to better understand the answering behaviors of participants. While many measures of agreement between annotators have been proposed, they are known to suffer from problems and abnormalities. In this paper, we identify the main limits of existing agreement measures in the crowdsourcing context, both by means of toy examples as well as with real-world...
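As an illustration of why agreement measures can behave oddly on crowdsourced labels, here is a minimal sketch (a toy example in the spirit of the paper, not one of its own measures) that computes raw percentage agreement and Cohen's kappa for two workers whose label distribution is heavily skewed:

```python
# Toy comparison of raw agreement vs. chance-corrected agreement (Cohen's kappa)
# on a skewed label distribution, where kappa collapses despite high agreement.
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which the two workers give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Chance-corrected agreement between two workers."""
    po = percent_agreement(a, b)                               # observed agreement
    ca, cb, n = Counter(a), Counter(b), len(a)
    pe = sum(ca[l] / n * cb[l] / n for l in set(a) | set(b))   # expected by chance
    return (po - pe) / (1 - pe)

# 10 documents, labels 'R' (relevant) / 'N' (not relevant).
w1 = ['R'] * 9 + ['N']
w2 = ['R'] * 10
print(percent_agreement(w1, w2))  # 0.9 -> high raw agreement
print(cohen_kappa(w1, w2))        # 0.0 -> kappa collapses on skewed labels
```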
Crowdsourcing has become an alternative approach to collect relevance judgments at scale, thanks to the availability of crowdsourcing platforms and quality control techniques that allow reliable results to be obtained. Previous work has asked multiple crowd workers to judge a document with respect to a query and has studied how best to aggregate the judgments for the same topic-document pair. This paper addresses an aspect that has been rather overlooked so far: we study how the time available to express a judgment affects its quality. We also discuss the loss of making...
Crowdsourcing is a popular technique to collect large amounts of human-generated labels, such as the relevance judgments used to create information retrieval (IR) evaluation collections. Previous research has shown how collecting high-quality labels from crowdsourcing platforms can be challenging. Existing quality assurance techniques focus on answer aggregation or on the use of gold questions, where ground-truth data allows responses to be checked.
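To make the two quality-assurance families concrete, here is a minimal sketch (assuming a simple `{worker: {item: label}}` layout; the function names and thresholds are illustrative, not from the paper) that filters workers by their accuracy on gold questions and then aggregates the remaining labels per item with a majority vote:

```python
# Gold-question filtering followed by per-item majority-vote aggregation.
from collections import Counter, defaultdict

def worker_gold_accuracy(labels, gold):
    """labels: dict worker -> {item: label}; gold: dict item -> true label."""
    acc = {}
    for worker, answers in labels.items():
        judged_gold = [i for i in answers if i in gold]
        if judged_gold:
            correct = sum(answers[i] == gold[i] for i in judged_gold)
            acc[worker] = correct / len(judged_gold)
    return acc

def aggregate(labels, gold, min_accuracy=0.7):
    acc = worker_gold_accuracy(labels, gold)
    votes = defaultdict(list)
    for worker, answers in labels.items():
        if acc.get(worker, 0.0) >= min_accuracy:       # drop low-quality workers
            for item, label in answers.items():
                if item not in gold:                    # keep only non-gold items
                    votes[item].append(label)
    return {item: Counter(v).most_common(1)[0][0] for item, v in votes.items()}

labels = {"w1": {"g1": "R", "d1": "R", "d2": "N"},
          "w2": {"g1": "N", "d1": "R", "d2": "R"}}
gold = {"g1": "R"}
print(aggregate(labels, gold, min_accuracy=0.5))  # only w1 passes the gold check
```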
Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents in the context of information retrieval evaluation, carrying out a large-scale user study across 18 TREC topics and collecting more than 50,000 judgments. Our analysis shows that, on average, magnitude estimation judgments are rank-aligned with the ordinal judgments made by expert assessors. An advantage for users...
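One practical detail of working with magnitude estimation scores is that each assessor uses their own number range, so scores are usually rescaled onto a common unit before being combined. Below is a minimal sketch of geometric averaging, a normalization commonly used for magnitude estimation data; the `{worker: {doc: score}}` layout is an assumption for illustration, not the paper's own pipeline:

```python
# Geometric averaging: work in log space, recentre each worker's scores on the
# grand mean so that scores from different workers share a common unit.
import math

def normalize_magnitudes(scores):
    """scores: dict worker -> {doc: positive magnitude}. Returns a normalized copy."""
    logs = {w: {d: math.log(s) for d, s in docs.items()} for w, docs in scores.items()}
    worker_mean = {w: sum(v.values()) / len(v) for w, v in logs.items()}
    grand_mean = sum(worker_mean.values()) / len(worker_mean)
    return {w: {d: math.exp(v - worker_mean[w] + grand_mean) for d, v in docs.items()}
            for w, docs in logs.items()}

raw = {"w1": {"d1": 10, "d2": 100}, "w2": {"d1": 1, "d2": 10}}
print(normalize_magnitudes(raw))  # both workers end up on a comparable scale
```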
In Information Retrieval evaluation, the classical approach of adopting binary relevance judgments has been replaced by multi-level relevance judgments and gain-based metrics leveraging such judgment scales. Recent work has also proposed and evaluated unbounded scales by means of Magnitude Estimation (ME) and compared them with bounded ones. While ME brings advantages, like the ability for assessors to always judge the next document as having higher or lower relevance than any of the documents they have judged so far, it also comes with some drawbacks. For example, it is not a...
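For context, here is a minimal sketch of a gain-based metric (nDCG) computed from multi-level relevance judgments, the kind of metric such judgment scales feed into; the toy gains are chosen purely for illustration:

```python
# nDCG: discounted cumulative gain of the ranked list, normalized by the gain of
# the ideal (descending) ordering of the same graded judgments.
import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains, k=None):
    gains = ranked_gains[:k] if k else ranked_gains
    ideal = sorted(ranked_gains, reverse=True)[:k] if k else sorted(ranked_gains, reverse=True)
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Graded judgments (0-3) of the documents in the order the system returned them.
print(ndcg([3, 0, 2, 1], k=4))  # ~0.93
```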
The agreement between relevance assessors is an important but understudied topic in the Information Retrieval literature, because of the limited data available about documents assessed by multiple judges. This issue has gained even more importance recently in light of crowdsourced judgments, where it is customary to gather many labels for each topic-document pair. In a crowdsourcing setting, agreement is often used as a proxy for quality, although without any systematic verification of the conjecture that higher agreement corresponds...
Information Retrieval (IR) researchers have often used existing IR evaluation collections and transformed the relevance scale in which the judgments have been collected, e.g., to use metrics that assume binary relevance like Mean Average Precision. Such transformations are arbitrary (e.g., 0,1 mapped to 0 and 2,3 mapped to 1) and it is assumed that they have no impact on the results of the evaluation. Moreover, crowdsourcing to collect relevance judgments has become a standard methodology. When designing the judgment task, one decision to be made is how granular the relevance scale should be. Then...
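The following minimal sketch illustrates why such transformations are not necessarily harmless: the same ranked list, binarized with two different and equally arbitrary thresholds, yields different Average Precision values (the graded judgments below are made up for illustration):

```python
# Binarize graded judgments with a threshold, then compute Average Precision.
def binarize(judgments, threshold):
    return [1 if j >= threshold else 0 for j in judgments]

def average_precision(binary):
    hits, precisions = 0, []
    for i, rel in enumerate(binary, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / hits if hits else 0.0

graded = [2, 0, 3, 1, 0]                       # graded judgments (0-3) in ranked order
print(average_precision(binarize(graded, 2)))  # 0,1 -> 0 and 2,3 -> 1  => ~0.83
print(average_precision(binarize(graded, 1)))  # only 0 -> 0; 1,2,3 -> 1 => ~0.81
```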
Crowdsourcing, i.e., the outsourcing of tasks typically performed by a few experts to a large crowd as an open call, has been shown to be reasonably effective in many cases, like Wikipedia, the chess match of Kasparov against the world in 1999, and several others. The aim of the present paper is to describe the setup of an experimentation of crowdsourcing techniques applied to the quantification of immunohistochemistry. Fourteen images from MIB1-stained breast specimens were first manually counted by a pathologist and then submitted to a crowdsourcing platform through...
Microtask crowdsourcing platforms are social intelligence systems in which volunteers, called crowdworkers, complete small, repetitive tasks in return for a small fee. Beyond payments, task requesters are considering non-monetary incentives such as points, badges, and other gamified elements to increase performance and improve the crowdworker experience. In this article, we present Qrowdsmith, a platform for gamifying microtask crowdsourcing. To design the system, we explore empirically a range of financial and non-financial incentives and analyse...
There is an important ongoing effort aimed at tackling misinformation and performing reliable fact-checking by employing human assessors at scale, with a crowdsourcing-based approach. Previous studies on the feasibility of crowdsourcing for the misinformation detection task have provided inconsistent results: some of them seem to confirm the effectiveness of the crowd in assessing the truthfulness of statements and claims, whereas others fail to reach a quality level higher than that of automatic machine learning approaches, which are still unsatisfactory. In this paper,...
Crowdsourcing tasks have been widely used to collect a large number of human labels at scale. While some of these tasks are deployed by requesters and performed only once by crowd workers, others require the same worker to perform the task, or a variant of it, more than once, thus participating in a so-called longitudinal study. Despite the prevalence of longitudinal studies in crowdsourcing, there is a limited understanding of the factors that influence worker participation in them across different crowdsourcing marketplaces. We present results from...
We present the Virtual City Explorer (VCE), an online crowdsourcing platform for the collection of rich geotagged information in urban environments. Compared to other volunteered geographic information approaches, which are constrained by the number and availability of mapping enthusiasts on the ground, the VCE uses digital street imagery to allow people to virtually explore a city from anywhere in the world, using a browser or a mobile phone. In addition, contributions are designed as paid microtasks, small jobs that can be carried out...
We investigate the problem of generating natural language summaries from knowledge base triples. Our approach is based on a pointer-generator network which, in addition to generating regular words from a fixed target vocabulary, is able to verbalise triples in several ways. We undertake an automatic and a human evaluation on single- and open-domain summary generation tasks. Both show that our approach significantly outperforms other data-driven baselines.
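For readers unfamiliar with the architecture, here is a minimal sketch (not the paper's implementation; the example triple, attention weights, and probabilities are made up) of the pointer-generator mixture, where the final word distribution blends a generation distribution over a fixed vocabulary with a copy distribution over the source tokens:

```python
# One decoding step of a pointer-generator mixture: p_gen weights the vocabulary
# distribution, (1 - p_gen) weights copying source tokens via attention.
def pointer_generator_step(p_gen, vocab_dist, attention, source_tokens):
    """p_gen: probability of generating from the vocabulary (vs. copying);
    vocab_dist: {word: prob} over the fixed target vocabulary;
    attention: per-source-token attention weights (sum to 1);
    source_tokens: tokens of the input triple, possibly out-of-vocabulary."""
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for token, attn in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * attn
    return final

vocab_dist = {"was": 0.6, "born": 0.3, "in": 0.1}
source = ["Ada_Lovelace", "birthPlace", "London"]   # hypothetical input triple
attention = [0.1, 0.1, 0.8]
dist = pointer_generator_step(0.7, vocab_dist, attention, source)
print(max(dist, key=dist.get))  # "was"; the copied "London" also gets probability mass
```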
Keyphrases are short phrases that best represent a document's content. They can be useful in a variety of applications, including summarization and retrieval models. In this paper, we introduce the first dataset of keyphrases for an Arabic document collection, obtained by means of crowdsourcing. We experimentally evaluate different crowdsourced answer aggregation strategies and validate their performance against expert annotations to assess the quality of our dataset. We report about experimental results, dataset features, and some lessons...
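As a concrete illustration of one simple aggregation strategy of the kind compared in such experiments (a hypothetical frequency-threshold rule, not necessarily the one the paper adopts), consider:

```python
# Keep a keyphrase for a document if at least `min_votes` workers proposed it,
# after light normalization of the submitted phrases.
from collections import Counter

def aggregate_keyphrases(worker_phrases, min_votes=2):
    """worker_phrases: list of keyphrase lists, one per worker, for one document."""
    counts = Counter(p.strip().lower() for phrases in worker_phrases for p in phrases)
    return [phrase for phrase, c in counts.most_common() if c >= min_votes]

workers = [["information retrieval", "crowdsourcing"],
           ["Crowdsourcing", "relevance judgments"],
           ["crowdsourcing ", "information retrieval"]]
print(aggregate_keyphrases(workers))  # ['crowdsourcing', 'information retrieval']
```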