- Semantic Web and Ontologies
- Mobile Crowdsensing and Crowdsourcing
- Topic Modeling
- Video Analysis and Summarization
- Natural Language Processing Techniques
- Recommender Systems and Techniques
- Multimedia Communication and Technology
- Open Education and E-Learning
- Music and Audio Processing
- Intelligent Tutoring Systems and Adaptive Learning
- Service-Oriented Architecture and Web Services
- Advanced Text Analysis Techniques
- Business Process Modeling and Analysis
- Data Stream Mining Techniques
- Biomedical Text Mining and Ontologies
- Web Data Mining and Analysis
- Multi-Agent Systems and Negotiation
- Innovative Teaching and Learning Methods
- Data Quality and Management
- Image Retrieval and Classification Techniques
- Data Visualization and Analytics
- Expert Finding and Q&A Systems
- Web Applications and Data Management
- Speech and Dialogue Systems
- Ethics and Social Impacts of AI

- Google (United States), 2013-2024
- Georgetown University, 2023
- Vrije Universiteit Amsterdam, 2011-2020
- Amsterdam UMC Location VUmc, 2011-2018
- CNI College, 2017-2018
- Leiden University, 2018
- Eindhoven University of Technology, 2002-2011
- University of Amsterdam, 2007-2011
- Centrum Wiskunde & Informatica, 2009
- Tamedia (Switzerland), 2002-2008

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it yields smaller improvements in safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards two key challenges. The first challenge,...
AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated significance in these domains due to its heightened downstream impact on predictions such as cancer detection, wildlife poaching, and loan allocations. Paradoxically, data is the most under-valued and de-glamorised aspect of AI. In this paper, we report on data practices in high-stakes AI, based on interviews with 53 AI practitioners in India, East and West African countries, and the USA. We define, identify, and present empirical evidence...
The increasing availability of (digital) cultural heritage artefacts offers great potential for increased access to art content, but also necessitates tools to help users deal with such an abundance of information. User-adaptive recommender systems aim to present their users with content tailored to their interests. These systems try to adapt to the user based on feedback about which artworks he or she finds interesting. Users need to be able to depend on the system to adapt competently and to find the artworks that are most interesting to them. This paper investigates...
Galleries, Libraries, Archives and Museums (short: GLAMs) around the globe are beginning to explore the potential of crowdsourcing, i.e. outsourcing specific activities to a community through an open call. In this paper, we propose a typology of these activities, based on an empirical study of a substantial number of projects initiated by relevant cultural heritage institutions. We use the Digital Content Life Cycle model to study the relation between the different types of crowdsourcing and the core activities of these organizations. Finally, we focus on two critical...
Big data is having a disruptive impact across the sciences. Human annotation of semantic interpretation tasks is a critical part of big data semantics, but it is based on an antiquated ideal of a single correct truth that needs to be similarly disrupted. We expose seven myths about human annotation, most of which derive from this ideal of truth, and dispel these myths with examples from our research. We propose a new theory of truth, crowd truth, based on the intuition that human interpretation is subjective, and that measuring annotations on the same objects (in our examples, sentences) across a crowd will provide a useful representation...
WebSci '13, May 2-4, 2013, Paris, France (ACM 978-1-4503-1889-1; copyright held by the author/owner(s)). One of the first steps in most web data analytics is creating a human-annotated ground truth, typically based on the assumption that for each instance there is a single right answer. From this assumption it has always followed that ground truth quality can be measured by inter-annotator agreement. We challenge this assumption by observing that, for certain annotation tasks, disagreement reflects semantic ambiguity in the target instances and provides...
Discussing things you care about can be difficult, especially via online platforms, where sharing your opinion leaves you open to the real and immediate threats of abuse and harassment. Due to these threats, people stop expressing themselves and give up on seeking different opinions. Recent research efforts focus on examining the strengths and weaknesses (e.g. potential unintended biases) of using machine learning as a support tool to facilitate a safe space for discussions; for example, through detecting various types of negative...
Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating data-centric algorithms. We aim to foster...
Web 2.0 (the perceived second generation of the World Wide Web, which aims to improve collaboration, information sharing and interoperability) enables increasing access to the digital collections of museums. The expectation is that more people will spend time preparing their visit before actually visiting the museum, look for related information, and reflect on what they have seen or missed after visiting the museum. It can also be expected that curators want to enhance visitors' experiences in the personalized, intensive and engaging way promised by an improved...
In this work we present an in-depth analysis of user behavior on different Social Sharing systems. We consider three popular platforms, Flickr, Delicious and StumbleUpon, and, by combining techniques from social network analysis with semantic analysis, we characterize the tagging behavior as well as the tendency of users to create friendship relationships on these platforms. The aim of our investigation is to see if (and how) the features and goals of a given system are reflected in the behavior of its users and, moreover, whether there is a correlation between the tagging and friendship behavior of users....
In collaborative Web-based platforms, user reputation scores are generally computed according to two orthogonal perspectives: (a) helpfulness-based reputation (HBR) and (b) centrality-based reputation (CBR). In HBR approaches, the most reputable users are those who post the most helpful reviews in the opinion of the members of their community. In CBR approaches, a “who-trusts-whom” network (known as a trust network) is available, and the most reputable users occupy a central position in that network, according to some definition of centrality. The identification of users featuring large HBR and CBR scores is one of the important research...
Many museums are currently providing online access to their collections. State-of-the-art research in the last decade shows that it is beneficial for institutions to provide their datasets as Linked Data in order to achieve easy cross-referencing, interlinking and integration. In this paper, we present the Rijksmuseum linked dataset (accessible at http://datahub.io/dataset/rijksmuseum), along with collection and vocabulary statistics, as well as lessons learned from the process of converting the collection to Linked Data. The version of March 2016 contains...
Large neural models have brought a new challenge to natural language generation (NLG): it has become imperative to ensure the safety and reliability of the output of models that generate freely. To this end, we present an evaluation framework, Attributable to Identified Sources (AIS), stipulating that NLG output pertaining to the external world is to be verified against an independent, provided source. We define AIS and a two-stage annotation pipeline for allowing annotators to evaluate model output according to AIS guidelines. We successfully validate...
In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence. While different communities may use different terminology to discuss similar topics, all of this work is ultimately aimed at developing AI that benefits humanity while being grounded in human rights and ethics, and at reducing the potential harms of AI. In this special interest group, we aim to bring together researchers from academia and industry interested in these topics to map current and future research trends and advance...