C Gysel

ORCID: 0000-0003-3433-7317
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • History of Medicine Studies
  • Medical and Biological Sciences
  • Medical History and Innovations
  • Medicine and Dermatology Studies History
  • Information Retrieval and Search Behavior
  • Speech and Dialogue Systems
  • Speech Recognition and Synthesis
  • Semantic Web and Ontologies
  • Advanced Image and Video Retrieval Techniques
  • Image Retrieval and Classification Techniques
  • Expert finding and Q&A systems
  • Text and Document Classification Technologies
  • Web Data Mining and Analysis
  • AI in Service Interactions
  • Advanced Text Analysis Techniques
  • Dental Development and Anomalies
  • Advanced Graph Neural Networks
  • Neural Networks and Applications
  • Oral and Maxillofacial Pathology
  • Multimodal Machine Learning Applications
  • Neurology and Historical Studies
  • Machine Learning and Algorithms
  • Social Media and Politics

Apple (United Kingdom)
2021-2025

Apple (United States)
2018-2024

University of Amsterdam
2015-2021

Apple (Israel)
2019-2021

Apple (Germany)
2021

Amsterdam University of the Arts
2015-2018

We introduce a novel latent vector space model that jointly learns the latent representations of words, e-commerce products and a mapping between the two without the need for explicit annotations. The power of the model lies in its ability to directly model the discriminative relation between products and a particular word. We compare our method to existing latent vector space models (LSI, LDA and word2vec) and evaluate it as a feature in a learning to rank setting. Our latent vector space model achieves its enhanced performance as it learns better product representations. Furthermore, the mapping from words to products and the representations of words benefit directly from the errors propagated back during...

10.1145/2983323.2983702 preprint EN 2016-10-24

We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model...

10.1145/3196826 article EN ACM Transactions on Information Systems 2018-06-26
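
The ranking step described in the NVSM abstract above (compose a query representation from word representations, then rank documents by similarity) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy vectors are hand-written rather than learned with gradient descent, the query is composed by simple averaging, and cosine similarity stands in for whatever similarity the model actually uses.

```python
import math

# Hypothetical toy representations; NVSM would learn these from scratch.
WORD_VECTORS = {
    "stock": [0.9, 0.1],
    "market": [0.8, 0.2],
    "football": [0.1, 0.9],
}
DOC_VECTORS = {
    "doc_finance": [1.0, 0.0],
    "doc_sports": [0.0, 1.0],
}


def compose_query(terms):
    """Average the vectors of the known query terms (an assumed composition)."""
    known = [WORD_VECTORS[t] for t in terms if t in WORD_VECTORS]
    if not known:
        raise ValueError("no known query terms")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(terms):
    """Return document ids ordered by decreasing similarity to the query."""
    query = compose_query(terms)
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in DOC_VECTORS.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

Calling `rank(["stock", "market"])` places `doc_finance` first because the averaged query vector points mostly along the first dimension.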

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a...

10.1145/2872427.2882974 preprint EN 2016-04-11

10.1109/icassp49660.2025.10888744 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighing perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field.

10.1145/2970398.2970422 preprint EN 2016-09-09

Virtual Assistants (VAs) are important Information Retrieval platforms that help users accomplish various tasks through spoken commands. The speech recognition system (speech-to-text) uses query priors, trained solely on text, to distinguish between phonetically confusing alternatives. Hence, the generation of synthetic queries that are similar to existing VA usage can greatly improve upon the VA's abilities, especially for use-cases that do not (yet) occur in paired audio/text data. In this paper, we provide a...

10.1145/3626772.3661355 article EN Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how...

10.1145/3077136.3082062 article EN Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 2017-07-28

Email responses often contain items, such as a file or a hyperlink to an external document, that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation, and recommend them for inclusion, to reduce the time and effort involved in composing the response....

10.1145/3132847.3132979 preprint EN 2017-11-06

Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, the task of creating such test collections is laborious due to the large amounts of data that need to be annotated, and due to the intrinsic subjectivity of the task itself. In this paper we study topical relevance from a user perspective by addressing the problems of subjectivity and ambiguity. We compare our approach and results with the established TREC annotation guidelines and results. The comparison is based on a series of crowdsourcing pilots...

10.1145/3269206.3271779 article EN 2018-10-17

In this work, we uncover a theoretical connection between two language model interpolation techniques, count merging and Bayesian interpolation. We compare these techniques as well as linear interpolation in three scenarios with abundant training data per component model. Consistent with prior work, we show that both count merging and Bayesian interpolation outperform linear interpolation. We include the first (to our knowledge) published comparison of count merging and Bayesian interpolation, showing that the two techniques perform similarly. Finally, we argue that other considerations will make Bayesian interpolation the preferred approach in most circumstances.

10.21437/interspeech.2019-1822 article EN Interspeech 2019 2019-09-13
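
To make the three interpolation techniques named in the abstract above concrete, here is a minimal sketch for a single word probability p(w|h). The formulas follow the commonly cited textbook formulations, not necessarily the paper's notation: linear interpolation uses fixed weights, count merging weights each component by its history count c_i(h), and Bayesian interpolation weights each component by its history probability p_i(h). All function and parameter names are hypothetical.

```python
def linear_interpolation(word_probs, lambdas):
    """p(w|h) = sum_i lambda_i * p_i(w|h), with the lambda_i summing to 1."""
    if abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(lam * p for lam, p in zip(lambdas, word_probs))


def count_merging(word_probs, history_counts, betas):
    """p(w|h) = sum_i beta_i c_i(h) p_i(w|h) / sum_i beta_i c_i(h)."""
    weights = [b * c for b, c in zip(betas, history_counts)]
    total = sum(weights)
    if total == 0:
        raise ValueError("history unseen in every component")
    return sum(w * p for w, p in zip(weights, word_probs)) / total


def bayesian_interpolation(word_probs, history_probs, lambdas):
    """p(w|h) = sum_i lambda_i p_i(h) p_i(w|h) / sum_i lambda_i p_i(h)."""
    weights = [lam * ph for lam, ph in zip(lambdas, history_probs)]
    total = sum(weights)
    if total == 0:
        raise ValueError("history has zero probability in every component")
    return sum(w * p for w, p in zip(weights, word_probs)) / total


# Two component models that disagree on p(w|h); the first saw h far more often.
word_probs = [0.2, 0.6]
linear = linear_interpolation(word_probs, [0.5, 0.5])
merged = count_merging(word_probs, history_counts=[90, 10], betas=[1.0, 1.0])
bayes = bayesian_interpolation(word_probs, history_probs=[0.009, 0.001],
                               lambdas=[0.5, 0.5])
```

In this toy setting the history-dependent techniques lean towards the component that observed the history more often, while linear interpolation treats both components equally regardless of the history.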

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR.

10.1145/3159652.3162009 article EN 2018-02-02

We introduce pytrec_eval, a Python interface to the trec_eval information retrieval evaluation toolkit. pytrec_eval exposes the reference implementations of trec_eval within Python as a native extension. We show that pytrec_eval is around one order of magnitude faster than invoking trec_eval as a sub process from within Python. Compared to a native Python implementation of NDCG, pytrec_eval is twice as fast for practically-sized rankings. Finally, we demonstrate its effectiveness in an application where pytrec_eval is combined with Pyndri and the OpenAI Gym, where query expansion is learned using Q-learning.

10.1145/3209978.3210065 preprint EN 2018-06-27
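
For context on the "native Python implementation of NDCG" that the abstract above compares against, a pure-Python NDCG might look like the following sketch. It is not the baseline used in the paper: the linear gain, the log2 rank discount, and the function names are assumptions chosen for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances, judged_relevances=None):
    """Normalize DCG by the DCG of the ideal ordering.

    ranked_relevances: relevance grades in the order the system ranked them.
    judged_relevances: grades of all judged documents for the query; defaults
    to the ranked ones when no separate judgments are supplied.
    """
    if judged_relevances is None:
        judged_relevances = ranked_relevances
    ideal_dcg = dcg(sorted(judged_relevances, reverse=True))
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define NDCG as zero
    return dcg(ranked_relevances) / ideal_dcg
```

A ranking that already lists documents in decreasing order of relevance, such as `ndcg([3, 2, 1])`, scores 1.0; placing the only relevant document second, as in `ndcg([0, 1])`, scores lower.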

Unsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously published entity representation models. The toolkit provides a unified interface to different representation learning algorithms, fine-grained parsing configuration and can be used transparently with GPUs. In addition, users can easily modify existing models or implement their own models in the framework. After model...

10.48550/arxiv.1706.03757 preprint EN other-oa arXiv (Cornell University) 2017-01-01

We derive the political climate of the social circles of Twitter users using a weakly-supervised approach. By applying random walks over a sub-sample of Twitter's social graph we infer a distribution indicating the presence of eight Flemish political parties in users' social circles in the months before the 2014 elections. The graph structure is induced through a combination of connection and retweet features and combines information from a million tweets and 14 million follower connections. We solely exploit the social graph structure and do not rely on tweet content. For validation we compare the affiliation of politically...

10.1609/icwsm.v9i1.14650 article EN Proceedings of the International AAAI Conference on Web and Social Media 2021-08-03

We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and it subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the overall recognition quality of the system.

10.1145/3397271.3401298 preprint EN 2020-07-25

High-quality automatic speech recognition (ASR) is essential for virtual assistants (VAs) to work well. However, ASR often performs poorly on VA requests containing named entities. In this work, we start from the observation that many ASR errors on named entities are inconsistent with real-world knowledge. We extend previous discriminative n-gram language modeling approaches to incorporate real-world knowledge from a Knowledge Graph (KG), using features that capture entity type-entity and entity-entity relationships. We apply our...

10.21437/interspeech.2021-1767 article EN Interspeech 2021 2021-08-27

Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and personalization can be helpful to improve the accurate recognition of spoken information queries. Finally,...

10.1145/3539618.3591849 article EN Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18

Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences. We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for...

10.1145/3121050.3121066 preprint EN 2017-09-29

Virtual assistants make use of automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem, due to the large number of frequently-changing named entities. In addition, resources available for recognition are constrained when ASR is performed on-device. In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion...

10.21437/interspeech.2022-193 article EN Interspeech 2022 2022-09-16

Two products are substitutes if both can satisfy the same consumer need. Intrinsic incorporation of product substitutability (where substitutability is integrated within latent vector space models) is in contrast to the extrinsic re-ranking of result lists. The fusion of text matching and product substitutability objectives allows latent vector space models to mix and match regularities contained within text descriptions and substitution relations. We introduce a method for intrinsically incorporating product substitutability within latent vector space models for product search that are estimated using gradient descent; it integrates flawlessly with state-of-the-art...

10.1145/3269206.3271668 article EN 2018-10-17

Language models (LMs) for virtual assistants (VAs) are typically trained on large amounts of data, resulting in prohibitively large models which require excessive memory and/or cannot be used to serve user requests in real-time. Entropy pruning results in smaller models but with significant degradation of effectiveness in the tail of the user request distribution. We customize entropy pruning by allowing for a keep list of infrequent n-grams that require a more relaxed pruning threshold, and propose three methods to construct the keep list. Each method has its own advantages...

10.1109/icassp39728.2021.9415035 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Entity retrieval has seen a lot of interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity in the research community during the TREC Enterprise Track [10]. It has remained relevant ever since, while broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11], and, more generally, to a range of entity retrieval tasks.

10.1145/2810133.2810139 article EN 2015-10-13