- Topic Modeling
- Natural Language Processing Techniques
- History of Medicine Studies
- Medical and Biological Sciences
- Medical History and Innovations
- Medicine and Dermatology Studies History
- Information Retrieval and Search Behavior
- Speech and Dialogue Systems
- Speech Recognition and Synthesis
- Semantic Web and Ontologies
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Expert Finding and Q&A Systems
- Text and Document Classification Technologies
- Web Data Mining and Analysis
- AI in Service Interactions
- Advanced Text Analysis Techniques
- Dental Development and Anomalies
- Advanced Graph Neural Networks
- Neural Networks and Applications
- Oral and Maxillofacial Pathology
- Multimodal Machine Learning Applications
- Neurology and Historical Studies
- Machine Learning and Algorithms
- Social Media and Politics
Apple (United Kingdom)
2021-2025
Apple (United States)
2018-2024
University of Amsterdam
2015-2021
Apple (Israel)
2019-2021
Apple (Germany)
2021
Amsterdam University of the Arts
2015-2018
We introduce a novel latent vector space model that jointly learns the latent representations of words, e-commerce products and a mapping between the two without the need for explicit annotations. The power of the model lies in its ability to directly model the discriminative relation between products and a particular word. We compare our method to existing latent vector space models (LSI, LDA and word2vec) and evaluate it as a feature in a learning to rank setting. Our latent vector space model achieves its enhanced performance as it learns better product representations. Furthermore, the mapping from words to products and the representations of words benefit directly from the errors propagated back during...
We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model...
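The ranking step described in the abstract above (compose a query representation from word representations, then order documents by similarity) can be illustrated with a minimal sketch. This is not the NVSM implementation: the vectors below are hypothetical hand-picked values rather than learned ones, words and documents are assumed to share one space, and averaging plus cosine similarity are assumptions made for illustration.

```python
import math

# Hypothetical toy vectors; a real model would learn these with gradient descent.
WORD_VECS = {
    "election": [0.9, 0.1],
    "vote": [0.8, 0.2],
    "football": [0.1, 0.9],
}
DOC_VECS = {
    "doc_politics": [1.0, 0.0],
    "doc_sports": [0.0, 1.0],
}


def compose_query(terms):
    """Average the vectors of the known query terms (unknown terms are skipped)."""
    known = [WORD_VECS[t] for t in terms if t in WORD_VECS]
    if not known:
        raise ValueError("query contains no known terms")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(terms):
    """Return document ids ordered by decreasing similarity to the query."""
    query = compose_query(terms)
    return sorted(DOC_VECS, key=lambda d: cosine(query, DOC_VECS[d]), reverse=True)
```

Calling `rank(["election", "vote"])` places `doc_politics` first, because the averaged query vector points mostly along the first dimension.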
We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a...
Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighting perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field of session search.
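As a rough illustration of what a naive term-frequency approach to session search could look like, the sketch below pools all queries of a session and weights each term by its relative frequency. The function name and the whitespace tokenization are assumptions for this example and are not taken from the paper.

```python
from collections import Counter


def session_term_weights(session_queries):
    """Weight each term by its relative frequency across all queries in a session."""
    counts = Counter(
        term for query in session_queries for term in query.lower().split()
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


# Terms repeated across reformulations receive more weight.
weights = session_term_weights(["cheap flights", "cheap flights amsterdam"])
```

The resulting weights could then be used as a weighted query against a lexical retrieval model; that step is left out here.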
Virtual Assistants (VAs) are important Information Retrieval platforms that help users accomplish various tasks through spoken commands. The speech recognition system (speech-to-text) uses query priors, trained solely on text, to distinguish between phonetically confusing alternatives. Hence, the generation of synthetic queries that are similar to existing VA usage can greatly improve upon the VA's abilities, especially for use-cases that do not (yet) occur in paired audio/text data. In this paper, we provide a...
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how...
Email responses often contain items, such as a file or a hyperlink to an external document, that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation, and recommend them for inclusion, to reduce the time and effort involved in composing the response....
Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, the task of creating such test collections is laborious due to the large amounts of data that need to be annotated, and due to the intrinsic subjectivity of the task itself. In this paper we study topical relevance from a user perspective by addressing the problems of subjectivity and ambiguity. We compare our approach and results with the established TREC annotation guidelines and results. The comparison is based on a series of crowdsourcing pilots...
In this work, we uncover a theoretical connection between two language model interpolation techniques, count merging and Bayesian interpolation. We compare these techniques as well as linear interpolation in three scenarios with abundant training data per component model. Consistent with prior work, we show that both count merging and Bayesian interpolation outperform linear interpolation. We include the first (to our knowledge) published comparison of count merging and Bayesian interpolation, showing that the two techniques perform similarly. Finally, we argue that other considerations will make Bayesian interpolation the preferred approach in most circumstances.
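To make the three interpolation techniques named above concrete, the sketch below writes each as a function of per-component probabilities for a single word given a history. The formulas follow the commonly cited definitions (history-independent weights for linear interpolation, history counts for count merging, history probabilities for Bayesian interpolation); the function names, the toy numbers, and the choice to estimate the history probability as count divided by corpus size are assumptions of this sketch, not necessarily the paper's derivation.

```python
def linear_interpolation(word_probs, weights):
    """p(w|h) = sum_i lambda_i * p_i(w|h), with history-independent weights."""
    return sum(lam * p for lam, p in zip(weights, word_probs))


def count_merging(word_probs, history_counts, betas):
    """p(w|h) = sum_i beta_i * c_i(h) * p_i(w|h) / sum_j beta_j * c_j(h)."""
    scaled = [b * c for b, c in zip(betas, history_counts)]
    return sum(s * p for s, p in zip(scaled, word_probs)) / sum(scaled)


def bayesian_interpolation(word_probs, history_probs, weights):
    """p(w|h) = sum_i lambda_i * p_i(h) * p_i(w|h) / sum_j lambda_j * p_j(h)."""
    scaled = [lam * ph for lam, ph in zip(weights, history_probs)]
    return sum(s * p for s, p in zip(scaled, word_probs)) / sum(scaled)


# Hypothetical two-component example.
word_probs = [0.2, 0.5]        # p_i(w|h) from each component model
history_counts = [30, 10]      # c_i(h): how often each corpus saw the history
corpus_sizes = [1000, 100]     # N_i: size of each training corpus
betas = [1.0, 2.0]             # count merging weights

# Assumption: p_i(h) is estimated as c_i(h) / N_i, and lambda_i is proportional to beta_i * N_i.
history_probs = [c / n for c, n in zip(history_counts, corpus_sizes)]
raw = [b * n for b, n in zip(betas, corpus_sizes)]
lambdas = [r / sum(raw) for r in raw]

cm = count_merging(word_probs, history_counts, betas)
bi = bayesian_interpolation(word_probs, history_probs, lambdas)
```

Under these assumptions `cm` and `bi` agree (both are 0.32 up to floating-point error), which is one simple way to see how the two techniques can coincide, whereas linear interpolation ignores how well each component knows the history.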
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR.
We introduce pytrec_eval, a Python interface to the trec_eval information retrieval evaluation toolkit. pytrec_eval exposes the reference implementations of trec_eval within Python as a native extension. We show that pytrec_eval is around one order of magnitude faster than invoking trec_eval as a sub process from within Python. Compared to a native Python implementation of NDCG, pytrec_eval is twice as fast for practically-sized rankings. Finally, we demonstrate its effectiveness in an application where pytrec_eval is combined with Pyndri and the OpenAI Gym where query expansion is learned using Q-learning.
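For readers unfamiliar with the metric mentioned above, here is a minimal plain-Python NDCG sketch of the kind such a comparison might use as a baseline. It does not use pytrec_eval or trec_eval; the function names are hypothetical, and the formulation (gain equal to the relevance grade, discount of log2(rank + 1)) is one common variant assumed for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with gain = relevance and discount = log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(ranked_relevances, judged_relevances):
    """Normalize the DCG of a ranking by the DCG of the ideal ordering of all judgments."""
    ideal = dcg(sorted(judged_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# A ranking that already lists documents in decreasing relevance scores 1.0.
perfect = ndcg([3, 2, 0], [3, 2, 0])
# Placing the only relevant document at rank 2 is penalized by the discount.
swapped = ndcg([0, 1], [1, 0])
```

Here `ranked_relevances` holds the relevance grades of the returned documents in rank order, and `judged_relevances` holds the grades of all judged documents for the query.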
Unsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously published entity representation models. The toolkit provides a unified interface to different representation learning algorithms, fine-grained parsing configuration and can be used transparently with GPUs. In addition, users can easily modify existing models or implement their own models in the framework. After model...
We derive the political climate of the social circles of Twitter users using a weakly-supervised approach. By applying random walks over a sub-sample of Twitter's social graph we infer a distribution indicating the presence of eight Flemish political parties in users' social circles in the months before the 2014 elections. The graph structure is induced through a combination of connection and retweet features and combines information of over a million tweets and 14 million follower connections. We solely exploit the social graph structure and do not rely on tweet content. For validation we compare the affiliation of politically...
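The idea of inferring a party distribution from random walks over a social graph can be sketched with a random walk with restart on a toy graph. This is an illustrative assumption rather than the paper's algorithm: the graph, the party labels, the restart probability and the function name are all hypothetical, and the retweet features mentioned above are not modeled.

```python
def party_distribution(graph, party_of, start, restart=0.15, iterations=50):
    """Random walk with restart from `start`; returns normalized visit mass per party."""
    visit = {node: 0.0 for node in graph}
    visit[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[start] += restart  # with probability `restart`, jump back to the user
        for node, mass in visit.items():
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: send the remaining mass back to the start user.
                nxt[start] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        visit = nxt
    totals = {}
    for node, mass in visit.items():
        party = party_of.get(node)
        if party is not None:
            totals[party] = totals.get(party, 0.0) + mass
    norm = sum(totals.values())
    return {party: mass / norm for party, mass in totals.items()} if norm else {}


# Hypothetical follower graph: user "u" follows "a" and "b", who follow party accounts.
graph = {
    "u": ["a", "b"],
    "a": ["account_x"],
    "b": ["account_x", "account_y"],
    "account_x": [],
    "account_y": [],
}
party_of = {"account_x": "X", "account_y": "Y"}
dist = party_distribution(graph, party_of, "u")
```

In this toy graph, three quarters of the walk mass that reaches a party account lands on party X, so `dist` assigns X three times the weight of Y.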
We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and it subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the overall recognition quality of the system.
High-quality automatic speech recognition (ASR) is essential for virtual assistants (VAs) to work well. However, ASR often performs poorly on VA requests containing named entities. In this work, we start from the observation that many ASR errors on named entities are inconsistent with real-world knowledge. We extend previous discriminative n-gram language modeling approaches to incorporate real-world knowledge from a Knowledge Graph (KG), using features that capture entity type-entity and entity-entity relationships. We apply our...
Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and query personalization can be helpful to improve the accurate recognition of spoken information domain queries. Finally,...
Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences. We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for...
Virtual assistants make use of automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem, due to the large number of frequently-changing named entities. In addition, resources available for recognition are constrained when ASR is performed on-device. In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion...
Two products are substitutes if both can satisfy the same consumer need. Intrinsic incorporation of product substitutability, where substitutability is integrated within latent vector space models, is in contrast to the extrinsic re-ranking of result lists. The fusion of text matching and product substitutability objectives allows latent vector space models to mix and match regularities contained within text descriptions and substitution relations. We introduce a method for intrinsically incorporating product substitutability within latent vector space models for product search that are estimated using gradient descent; it integrates flawlessly with state-of-the-art...
Language models (LMs) for virtual assistants (VAs) are typically trained on large amounts of data, resulting in prohibitively large models which require excessive memory and/or cannot be used to serve user requests in real-time. Entropy pruning results in smaller models but with significant degradation of effectiveness in the tail of the user request distribution. We customize entropy pruning by allowing for a keep list of infrequent n-grams that require a more relaxed pruning threshold, and propose three methods to construct the keep list. Each method has its own advantages...
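The keep-list idea above can be illustrated with a small sketch in which n-grams on the keep list are pruned against a more relaxed (lower) threshold than all other n-grams. The function name, the thresholds and the scores are hypothetical; the scores are assumed to be precomputed per n-gram (for example, the increase in relative entropy caused by removing that n-gram), and computing them is outside the scope of this sketch.

```python
def prune_with_keep_list(ngram_scores, threshold, keep_list, relaxed_threshold):
    """Keep an n-gram if its pruning score reaches the threshold that applies to it.

    N-grams on the keep list are compared against `relaxed_threshold`,
    all other n-grams against the regular `threshold`.
    """
    kept = set()
    for ngram, score in ngram_scores.items():
        limit = relaxed_threshold if ngram in keep_list else threshold
        if score >= limit:
            kept.add(ngram)
    return kept


# Hypothetical precomputed scores: higher means removal would hurt the model more.
scores = {
    ("play", "music"): 5e-7,
    ("call", "mom"): 2e-7,
    ("play", "rare", "artist"): 3e-9,
    ("set", "obscure", "timer"): 2e-9,
}
keep_list = {("play", "rare", "artist")}
kept = prune_with_keep_list(
    scores, threshold=1e-8, keep_list=keep_list, relaxed_threshold=1e-9
)
```

With these numbers the two frequent bigrams survive regular pruning, the keep-listed trigram survives because of the relaxed threshold, and the remaining infrequent trigram is pruned.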
Entity retrieval has seen a lot of interest from the research community over the past decade. Ten years ago, the expertise retrieval task gained popularity in the research community during the TREC Enterprise Track [10]. It has remained relevant ever since, while broadening to social media, to tracking the dynamics of expertise [1-5, 8, 11], and, more generally, to a range of entity retrieval tasks.