- Pleistocene-Era Hominins and Archaeology
- Ancient and Medieval Archaeology Studies
- Geology and Paleoclimatology Research
- Speech Recognition and Synthesis
- Archaeology and ancient environmental studies
- Music and Audio Processing
- Advanced Bandit Algorithms Research
- Image Processing and 3D Reconstruction
- Speech and Audio Processing
- Recommender Systems and Techniques
- Marine and environmental studies
- Natural Language Processing Techniques
- Geological Formations and Processes Exploration
- Information Retrieval and Search Behavior
- Web Data Mining and Analysis
- Machine Learning and Algorithms
- Topic Modeling
- Protein Structure and Dynamics
- Expert finding and Q&A systems
- Optimization and Search Problems
- Machine Learning in Bioinformatics
- Speech and dialogue systems
- Text and Document Classification Technologies
- Forensic Anthropology and Bioarchaeology Studies
- Multimodal Machine Learning Applications
Leiden University
2009-2024
University of Cologne
2014-2021
National Yang Ming Chiao Tung University
2021
Deutsches Archäologisches Institut, Zentrale
2021
Snap (United States)
2019
Zhejiang Financial College
2018
Alibaba Group (China)
2017-2018
Alibaba Group (United States)
2017
Microsoft (United States)
2011-2015
University of Reading
2012-2013
Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. In this work, we model...
Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature. Common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually...
Recommender systems are widely used in online e-commerce applications to improve user engagement and then to increase revenue. A key challenge for recommender systems is providing high quality recommendation to users in "cold-start" situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting...
User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual's history of queries and clicked documents. Previous studies have explored how short-term behavior or long-term behavior can be predictive of relevance. Ours is the first study to assess how short-term (session) behavior and long-term (historic) behavior interact, and how each may be used in isolation or in combination to optimally contribute to gains in relevance through search personalization. Our key findings include: historic behavior provides substantial benefits at...
Minghui Qiu, Feng-Lin Li, Siyu Wang, Xing Gao, Yan Chen, Weipeng Zhao, Haiqing Chen, Jun Huang, Wei Chu. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2017.
In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of timely identifying new items of high quality and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. We maintain profiles of content of interest, in which temporal characteristics of the content, e.g. popularity and freshness, are updated in a real-time manner. We also maintain profiles of users including demographic...
Personalized search systems tailor search results to the current user intent using historic search interactions. This relies on being able to find pertinent information in that user's search history, which can be challenging for unseen queries and for new search scenarios. Building richer models of users' current and historic search tasks can help improve the likelihood of finding relevant content and enhance the relevance and coverage of personalization methods. The task-based approach can be applied to the current user's search history or, as we focus on here, all users' search histories as so-called "groupization" (a variant of personalization whereby other...
Precipitation prediction, such as short-term rainfall prediction, is a very important problem in the field of meteorological service. In practice, most recent studies focus on leveraging radar data or satellite images to make predictions. However, there is another scenario where a set of weather features are collected by various sensors at multiple observation sites. The observations of a site are sometimes incomplete but provide important clues for weather prediction at nearby sites, which are not fully exploited in existing work yet. To solve...
Nowadays, it is a heated topic for many industries to build automatic question-answering (QA) systems. A key solution to these QA systems is to retrieve from a QA knowledge base the most similar question of a given question, which can be reformulated as a paraphrase identification (PI) or a natural language inference (NLI) problem. However, most existing models for PI and NLI have at least two problems: they rely on a large amount of labeled data, which is not always available in real scenarios, and they may not be efficient for industrial...
Accurate estimation of post-click conversion rate (CVR) is critical for building recommender systems, which has long been confronted with sample selection bias and data sparsity issues. Methods in the Entire Space Multi-task Model (ESMM) family leverage the sequential pattern of user actions, i.e. $impression \rightarrow click \rightarrow conversion$, to address the data sparsity issue. However, they still fail to ensure the unbiasedness of CVR estimates. In this paper, we theoretically demonstrate that ESMM suffers from the following...
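The entire-space idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly described ESMM objective, in which the click-and-convert probability is the product of a click probability and a post-click conversion probability, and both losses are computed over all impressions. The function names and the input tuple layout are hypothetical, and this is not the method the paper itself goes on to propose.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p and one 0/1 label y."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def esmm_loss(samples):
    """Average CTR loss plus CTCVR loss over all impressions.

    samples: list of (p_ctr, p_cvr, clicked, converted) tuples, one per
    impression (hypothetical layout chosen for this sketch).
    """
    total = 0.0
    for p_ctr, p_cvr, clicked, converted in samples:
        # impression -> click -> conversion: chain the two probabilities.
        p_ctcvr = p_ctr * p_cvr
        # A conversion only counts when the impression was also clicked.
        total += bce(p_ctr, clicked) + bce(p_ctcvr, clicked * converted)
    return total / len(samples)
```

Note that the CVR prediction is never supervised directly on clicked-only samples here; it is trained only through the product, which is the property the abstract refers to as working over the entire space.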
Search tasks, comprising a series of search queries serving the same information need, have recently been recognized as an accurate atomic unit for modeling user search intent. Most prior research in this area has focused on short-term search tasks within a single search session, and heavily depends on human annotations for supervised classification model learning. In this work, we target the identification of long-term, or cross-session, search tasks (transcending session boundaries) by investigating inter-query dependencies learned from users'...
Search engines train and apply a single ranking model across all users, but searchers' information needs are diverse and cover a broad range of topics. Hence, a single user-independent ranking model is insufficient to satisfy different users' result preferences. Conventional personalization methods learn separate models of user interests and use those to re-rank the results from the generic model. Those methods require significant user history information to learn user preferences, have low coverage in the case of memory-based methods that learn direct associations between query-URL pairs,...
Knowledge-driven conversation approaches have achieved remarkable research attention recently. However, generating an informative response with multiple relevant knowledge without losing fluency and coherence is still one of the main challenges. To address this issue, this paper proposes a method that uses recurrent knowledge interaction among response decoding steps to incorporate appropriate knowledge. Furthermore, we introduce a knowledge copy mechanism using a knowledge-aware pointer network to copy words from external knowledge according...
Early Upper Paleolithic sites in the Danube catchment have been put forward as evidence that the river was an important conduit for modern humans during their initial settlement of Europe. Central to this model is the Carpathian Basin, a region covering most of the Middle Danube. As the archaeological record of this region is still poorly understood, this paper aims to provide a contextual assessment of the Carpathian Basin's geological and paleoenvironmental archives, starting with the late Pleistocene. Subsequently, it compiles early data from synchronic...
Conjoint analysis is one of the most popular market research methodologies for assessing how customers with heterogeneous preferences appraise various objective characteristics in products or services, which provides critical inputs for many marketing decisions, e.g. optimal design of new products and target market selection. Nowadays it becomes practical in e-commercial applications to collect millions of samples quickly. However, the large-scale data sets make traditional conjoint analysis coupled with sophisticated Monte Carlo...
In this paper, we present an end-to-end automatic speech recognition system, which successfully employs subword units in a hybrid CTC-Attention based system. The subword units are obtained by the byte-pair encoding (BPE) compression algorithm. Compared to using words as modeling units, using characters or subword units does not suffer from the out-of-vocabulary (OOV) problem. Furthermore, using subword units further offers a capability in modeling longer context than using characters. We evaluate different systems over the LibriSpeech 1000h dataset. The subword-based system...
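To make the subword idea concrete, here is a minimal sketch of learning BPE merges from a toy word-frequency table and then segmenting a word with them. It follows the usual greedy pair-merging formulation of BPE; the function names, the toy vocabulary, and the tie-breaking rule are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} dict."""
    # Every word starts as a sequence of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken lexicographically (an assumption
        # made here only to keep the sketch deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


def segment(word, merges):
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_freqs, num_merges=3)
```

Because segmentation falls back to single characters when no merge applies, a word absent from the training table (for example `segment("lowest", toy_merges)`) still decomposes into known units, which is the sense in which subword units avoid the OOV problem.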
We propose a CTC alignment-based single step non-autoregressive transformer (CASS-NAT) for speech recognition. Specifically, the CTC alignment contains the information of (a) the number of tokens for decoder input, and (b) the time span of acoustics for each token. This information is used to extract an acoustic representation for each token in parallel, referred to as token-level acoustic embedding, which substitutes the word embedding in the autoregressive transformer (AT) to achieve parallel generation in the decoder. During inference, an error-based alignment sampling method is proposed to be applied to the CTC output...
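The two pieces of information the abstract attributes to a CTC alignment, the token count and each token's time span, can be read off a frame-level label sequence as in the sketch below. It applies the standard CTC collapsing rule (merge repeated labels, drop blanks); the blank id of 0, the function name, and the half-open span convention are assumptions for illustration, and the paper's exact span definition may differ.

```python
from itertools import groupby

BLANK = 0  # assumed id of the CTC blank label


def alignment_to_spans(alignment, blank=BLANK):
    """Collapse a frame-level CTC alignment into (token, start, end) spans.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    Consecutive repeats of a label form one token; blanks are dropped, so
    the same label on both sides of a blank yields two separate tokens.
    """
    spans = []
    frame = 0
    for label, run in groupby(alignment):
        length = len(list(run))
        if label != blank:
            spans.append((label, frame, frame + length))
        frame += length
    return spans


example_alignment = [0, 0, 3, 3, 0, 5, 5, 5, 0, 3]
example_spans = alignment_to_spans(example_alignment)
num_tokens = len(example_spans)  # (a) number of tokens for the decoder input
```

Each span in `example_spans` marks the frames from which a per-token acoustic representation could be pooled in parallel, which is item (b) in the abstract.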
Singing voice conversion is a task to convert a song sung by a source singer to the voice of a target singer. In this paper, we propose using a parallel-data-free, many-to-one voice conversion technique on singing voices. A phonetic posterior feature is first generated by decoding singing voices through a robust Automatic Speech Recognition Engine (ASR). Then, a trained Recurrent Neural Network (RNN) with a Deep Bidirectional Long Short Term Memory (DBLSTM) structure is used to model the mapping from person-independent content to the acoustic features of the target person....
Over the past few years, major web search engines have introduced knowledge bases to offer popular facts about people, places, and things on the entity pane next to regular search results. In addition to information about the entity searched by the user, the entity pane often provides a ranked list of related entities. To keep users engaged, it is important to develop a recommendation model that tailors the related entities to individual user interests. We propose a probabilistic Three-way Entity Model (TEM) that provides personalized recommendation of related entities using three data sources: knowledge base, click log,...