NFDI4DS | UHH-SEMS - Publication Details

C Gysel

ORCID: 0000-0003-3433-7317

Publications

About

Contact & Profiles

A5084609544

Research Areas

Topic Modeling
Natural Language Processing Techniques
History of Medicine Studies
Medical and Biological Sciences
Medical History and Innovations
Medicine and Dermatology Studies History
Information Retrieval and Search Behavior
Speech and Dialogue Systems
Speech Recognition and Synthesis
Semantic Web and Ontologies
Advanced Image and Video Retrieval Techniques
Image Retrieval and Classification Techniques
Expert finding and Q&A systems
Text and Document Classification Technologies
Web Data Mining and Analysis
AI in Service Interactions
Advanced Text Analysis Techniques
Dental Development and Anomalies
Advanced Graph Neural Networks
Neural Networks and Applications
Oral and Maxillofacial Pathology
Multimodal Machine Learning Applications
Neurology and Historical Studies
Machine Learning and Algorithms
Social Media and Politics

Apple (United Kingdom)
2021-2025

Apple (United States)
2018-2024

University of Amsterdam
2015-2021

Apple (Israel)
2019-2021

Apple (Germany)
2021

Amsterdam University of the Arts
2015-2018

Learning Latent Vector Spaces for Product Search

OPENALEX - Publications

C Gysel Maarten de Rijke Evangelos Kanoulas

We introduce a novel latent vector space model that jointly learns the latent representations of words, e-commerce products and a mapping between the two without the need for explicit annotations. The power of the model lies in its ability to directly model the discriminative relation between products and a particular word. We compare our method to existing latent vector space models (LSI, LDA and word2vec) and evaluate it as a feature in a learning to rank setting. Our latent vector space model achieves its enhanced performance as it learns better product representations. Furthermore, the mapping from words to products and the representations of words benefit directly from the errors propagated back during...

10.1145/2983323.2983702 preprint EN 2016-10-24

Neural Vector Spaces for Unsupervised Information Retrieval

OPENALEX - Publications

C Gysel Maarten de Rijke Evangelos Kanoulas

We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model...

10.1145/3196826 article EN ACM Transactions on Information Systems 2018-06-26

Unsupervised, Efficient and Semantic Expertise Retrieval

OPENALEX - Publications

C Gysel Maarten de Rijke Marcel Worring

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a...

10.1145/2872427.2882974 preprint EN 2016-04-11

Contextualization of ASR with LLM using phonetic retrieval-based augmentation

OPENALEX - Publications

Zhihong Lei Xingyu Na Mingbin Xu Ernest Pusateri C Gysel and 3 more

10.1109/icassp49660.2025.10888744 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Lexical Query Modeling in Session Search

OPENALEX - Publications

C Gysel Evangelos Kanoulas Maarten de Rijke

Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighing perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field.

10.1145/2970398.2970422 preprint EN 2016-09-09

Synthetic Query Generation using Large Language Models for Virtual Assistants

OPENALEX - Publications

Sonal Sannigrahi Thiago Fraga-Silva Youssef Oualil C Gysel

Virtual Assistants (VAs) are important Information Retrieval platforms that help users accomplish various tasks through spoken commands. The speech recognition system (speech-to-text) uses query priors, trained solely on text, to distinguish between phonetically confusing alternatives. Hence, the generation of synthetic queries that are similar to existing VA usage can greatly improve upon the VA's abilities, especially for use-cases that do not (yet) occur in paired audio/text data. In this paper, we provide a...

10.1145/3626772.3661355 article EN Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

Neural Networks for Information Retrieval

OPENALEX - Publications

Tom Kenter Alexey Borisov C Gysel Mostafa Dehghani Maarten de Rijke and 1 more

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how...

10.1145/3077136.3082062 article EN Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 2017-07-28

Reply With

OPENALEX - Publications

C Gysel Bhaskar Mitra Matteo Venanzi Roy Rosemarin Grzegorz Kukła and 2 more

Email responses often contain items, such as a file or a hyperlink to an external document, that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation, and recommend them for inclusion, to reduce the time and effort involved in composing the response....

10.1145/3132847.3132979 preprint EN 2017-11-06

Studying Topical Relevance with Evidence-based Crowdsourcing

OPENALEX - Publications

Oana Inel Giannis Haralabopoulos Dan Li C Gysel Zoltán Szlávik and 3 more

Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, the task of creating such test collections is laborious due to the large amounts of data that need to be annotated, and due to the intrinsic subjectivity of the task itself. In this paper we study topical relevance from a user perspective by addressing the problems of ambiguity. We compare our approach and results with the established TREC annotation guidelines and results. The comparison is based on a series of crowdsourcing pilots...

10.1145/3269206.3271779 article EN 2018-10-17

Connecting and Comparing Language Model Interpolation Techniques

OPENALEX - Publications

Ernest Pusateri C Gysel Rami Botros Sameer Badaskar Mirko Hannemann and 2 more

In this work, we uncover a theoretical connection between two language model interpolation techniques, count merging and Bayesian interpolation. We compare these techniques as well as linear interpolation in three scenarios with abundant training data per component model. Consistent with prior work, we show that both count merging and Bayesian interpolation outperform linear interpolation. We include the first (to our knowledge) published comparison of count merging and Bayesian interpolation, showing that the two techniques perform similarly. Finally, we argue that other considerations will make Bayesian interpolation the preferred approach in most circumstances.

10.21437/interspeech.2019-1822 article EN Interspeech 2019 2019-09-13

Neural Networks for Information Retrieval

OPENALEX - Publications

Tom Kenter Alexey Borisov C Gysel Mostafa Dehghani Maarten de Rijke and 1 more

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR.

10.1145/3159652.3162009 article EN 2018-02-02

Pytrec_eval

OPENALEX - Publications

C Gysel Maarten de Rijke

We introduce pytrec_eval, a Python interface to the trec_eval information retrieval evaluation toolkit. pytrec_eval exposes the reference implementations of trec_eval within Python as a native extension. We show that pytrec_eval is around one order of magnitude faster than invoking trec_eval as a sub process from within Python. Compared to a native Python implementation of NDCG, pytrec_eval is twice as fast for practically-sized rankings. Finally, we demonstrate its effectiveness in an application where pytrec_eval is combined with Pyndri and the OpenAI Gym, where query expansion is learned using Q-learning.

10.1145/3209978.3210065 preprint EN 2018-06-27

Semantic Entity Retrieval Toolkit

OPENALEX - Publications

C Gysel Maarten de Rijke Evangelos Kanoulas

Unsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously published entity representation models. The toolkit provides a unified interface to different representation learning algorithms and fine-grained parsing configuration, and can be used transparently with GPUs. In addition, users can easily modify existing models or implement their own models in the framework. After model...

10.48550/arxiv.1706.03757 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Determining the Presence of Political Parties in Social Circles

OPENALEX - Publications

C Gysel Bart Goethals Maarten de Rijke

We derive the political climate of the social circles of Twitter users using a weakly-supervised approach. By applying random walks over a sub-sample of Twitter's social graph we infer a distribution indicating the presence of eight Flemish political parties in users' social circles in the months before the 2014 elections. The graph structure is induced through a combination of connection and retweet features and combines information of over a million tweets and 14 million follower connections. We solely exploit the social graph structure and do not rely on tweet content. For validation we compare the affiliation of politically...

10.1609/icwsm.v9i1.14650 article EN Proceedings of the International AAAI Conference on Web and Social Media 2021-08-03

Predicting Entity Popularity to Improve Spoken Entity Recognition by Virtual Assistants

OPENALEX - Publications

C Gysel Manos Tsagkias Ernest Pusateri Ilya Oparin

We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and it subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the overall recognition quality of the system.

10.1145/3397271.3401298 preprint EN 2020-07-25

A Discriminative Entity-Aware Language Model for Virtual Assistants

OPENALEX - Publications

Mandana Saebi Ernest Pusateri Aaksha Meghawat C Gysel

High-quality automatic speech recognition (ASR) is essential for virtual assistants (VAs) to work well. However, ASR often performs poorly on VA requests containing named entities. In this work, we start from the observation that many ASR errors on named entities are inconsistent with real-world knowledge. We extend previous discriminative n-gram language modeling approaches to incorporate real-world knowledge from a Knowledge Graph (KG), using features that capture entity type-entity and entity-entity relationships. We apply our...

10.21437/interspeech.2021-1767 article EN Interspeech 2021 2021-08-27

Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities

OPENALEX - Publications

C Gysel

Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and personalization can be helpful for the accurate recognition of queries. Finally,...

10.1145/3539618.3591849 article EN Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18

Structural Regularities in Text-based Entity Vector Spaces

OPENALEX - Publications

C Gysel Maarten de Rijke Evangelos Kanoulas

Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences. We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for...

10.1145/3121050.3121066 preprint EN 2017-09-29

Space-Efficient Representation of Entity-centric Query Language Models

OPENALEX - Publications

C Gysel Mirko Hannemann Ernest Pusateri Youssef Oualil Ilya Oparin

Virtual assistants make use of automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem, due to the large number of frequently-changing named entities. In addition, resources available for recognition are constrained when ASR is performed on-device. In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion...

10.21437/interspeech.2022-193 article EN Interspeech 2022 2022-09-16

Mix 'n Match

OPENALEX - Publications

C Gysel Maarten de Rijke Evangelos Kanoulas

Two products are substitutes if both can satisfy the same consumer need. Intrinsic incorporation of product substitutability, where substitutability is integrated within latent vector space models, is in contrast to the extrinsic re-ranking of result lists. The fusion of text matching and product substitutability objectives allows latent vector space models to mix and match regularities contained within text descriptions and substitution relations. We introduce a method for intrinsically incorporating product substitutability within latent vector space models for product search that are estimated using gradient descent; it integrates flawlessly with state-of-the-art...

10.1145/3269206.3271668 article EN 2018-10-17

Error-Driven Pruning of Language Models for Virtual Assistants

OPENALEX - Publications

Sashank Gondala Lyan Verwimp Ernest Pusateri Manos Tsagkias C Gysel

Language models (LMs) for virtual assistants (VAs) are typically trained on large amounts of data, resulting in prohibitively large models which require excessive memory and/or cannot be used to serve user requests in real-time. Entropy pruning results in smaller models but with significant degradation of effectiveness in the tail of the user request distribution. We customize entropy pruning by allowing for a keep list of infrequent n-grams that require a more relaxed pruning threshold, and propose three methods to construct the keep list. Each method has its own advantages...

10.1109/icassp39728.2021.9415035 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Semantic Entities

OPENALEX - Publications

C Gysel Maarten de Rijke Marcel Worring

Entity retrieval has seen a lot of interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity in the research community during the TREC Enterprise Track [10]. It has remained relevant ever since, while broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11], and, more generally, to a range of entity retrieval tasks.

10.1145/2810133.2810139 article EN 2015-10-13
