Wei Chu

ORCID: 0000-0002-4595-388X
Research Areas
  • Pleistocene-Era Hominins and Archaeology
  • Ancient and Medieval Archaeology Studies
  • Geology and Paleoclimatology Research
  • Speech Recognition and Synthesis
  • Archaeology and ancient environmental studies
  • Music and Audio Processing
  • Advanced Bandit Algorithms Research
  • Image Processing and 3D Reconstruction
  • Speech and Audio Processing
  • Recommender Systems and Techniques
  • Marine and environmental studies
  • Natural Language Processing Techniques
  • Geological Formations and Processes Exploration
  • Information Retrieval and Search Behavior
  • Web Data Mining and Analysis
  • Machine Learning and Algorithms
  • Topic Modeling
  • Protein Structure and Dynamics
  • Expert finding and Q&A systems
  • Optimization and Search Problems
  • Machine Learning in Bioinformatics
  • Speech and dialogue systems
  • Text and Document Classification Technologies
  • Forensic Anthropology and Bioarchaeology Studies
  • Multimodal Machine Learning Applications

Leiden University
2009-2024

University of Cologne
2014-2021

National Yang Ming Chiao Tung University
2021

Deutsches Archäologisches Institut, Zentrale
2021

Snap (United States)
2019

Zhejiang Financial College
2018

Alibaba Group (China)
2017-2018

Alibaba Group (United States)
2017

Microsoft (United States)
2011-2015

University of Reading
2012-2013

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. In this work, we model...

10.1145/1772690.1772758 preprint EN 2010-04-26

Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature. Common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually...

10.1145/1935826.1935878 preprint EN 2011-02-01

Recommender systems are widely used in online e-commerce applications to improve user engagement and then to increase revenue. A key challenge for recommender systems is providing high quality recommendation to users in "cold-start" situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting...

10.1145/1639714.1639720 article EN 2009-10-23

User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual's history of queries and clicked documents. Previous studies have explored how short-term behavior or long-term behavior can be predictive of relevance. Ours is the first study to assess how short-term (session) behavior and long-term (historic) behavior interact, and how each may be used in isolation or in combination to optimally contribute to gains in relevance through search personalization. Our key findings include: historic behavior provides substantial benefits at...

10.1145/2348283.2348312 article EN Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval 2012-08-12

Minghui Qiu, Feng-Lin Li, Siyu Wang, Xing Gao, Yan Chen, Weipeng Zhao, Haiqing Chen, Jun Huang, Wei Chu. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2017.

10.18653/v1/p17-2079 article EN cc-by 2017-01-01

In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of timely identifying new items of high quality and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. We maintain profiles of content of interest, in which temporal characteristics of the content, e.g. popularity and freshness, are updated in real-time manner. We also maintain profiles of users including demographic...

10.1145/1526709.1526802 article EN 2009-04-20

Personalized search systems tailor search results to the current user intent using historic search interactions. This relies on being able to find pertinent information in that user's search history, which can be challenging for unseen queries and for new search scenarios. Building richer models of users' current and historic search tasks can help improve the likelihood of finding relevant content and enhance the relevance and coverage of personalization methods. The task-based approach can be applied to the current user's search history, or as we focus on here, all users' search histories as so-called "groupization" (a variant of personalization whereby other...

10.1145/2488388.2488511 article EN 2013-05-13

Precipitation prediction, such as short-term rainfall prediction, is a very important problem in the field of meteorological service. In practice, most recent studies focus on leveraging radar data or satellite images to make predictions. However, there is another scenario where a set of weather features are collected by various sensors at multiple observation sites. The observations of a site are sometimes incomplete but provide important clues for weather prediction at nearby sites, which are not fully exploited in existing work yet. To solve...

10.1109/icdm.2017.49 article EN 2017 IEEE International Conference on Data Mining (ICDM) 2017-11-01

Nowadays, it is a heated topic for many industries to build automatic question-answering (QA) systems. A key solution to these QA systems is to retrieve from a QA knowledge base the most similar question of a given question, which can be reformulated as a paraphrase identification (PI) or a natural language inference (NLI) problem. However, most existing models for PI and NLI have at least two problems: they rely on a large amount of labeled data, which is not always available in real scenarios, and they may not be efficient for industrial...

10.1145/3159652.3159685 article EN 2018-02-02

Accurate estimation of post-click conversion rate is critical for building recommender systems, which has long been confronted with sample selection bias and data sparsity issues. Methods in the Entire Space Multi-task Model (ESMM) family leverage the sequential pattern of user actions, i.e. $impression \rightarrow click \rightarrow conversion$, to address the data sparsity issue. However, they still fail to ensure the unbiasedness of CVR estimates. In this paper, we theoretically demonstrate that ESMM suffers from the following...

10.1145/3477495.3531972 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022-07-06
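
The impression, click, conversion chain named in the abstract above can be written as a probability factorization. This is a sketch of the decomposition usually associated with the ESMM family, not a formula quoted from this entry; the symbols $x$ (features), $y$ (click) and $z$ (conversion) are assumed notation.

$$
\underbrace{p(y = 1, z = 1 \mid x)}_{\text{pCTCVR}}
= \underbrace{p(y = 1 \mid x)}_{\text{pCTR}}
\times \underbrace{p(z = 1 \mid y = 1, x)}_{\text{pCVR}}
$$

Under this reading, the two quantities on the impression level (pCTR and pCTCVR) can be learned from all impressions, and pCVR is obtained as the factor linking them.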

Search tasks, comprising a series of search queries serving the same information need, have recently been recognized as an accurate atomic unit for modeling user search intent. Most prior research in this area has focused on short-term search tasks within a single search session, and heavily depends on human annotations for supervised classification model learning. In this work, we target the identification of long-term, or cross-session, search tasks (transcending session boundaries) by investigating inter-query dependencies learned from users'...

10.1145/2488388.2488507 article EN 2013-05-13

Search engines train and apply a single ranking model across all users, but searchers' information needs are diverse and cover a broad range of topics. Hence, a single user-independent ranking model is insufficient to satisfy different users' result preferences. Conventional personalization methods learn separate models of user interests and use those to re-rank the results from the generic model. Those methods require significant user history information to learn user preferences, have low coverage in the case of memory-based methods that learn direct associations between query-URL pairs,...

10.1145/2484028.2484068 article EN Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 2013-07-28

Knowledge-driven conversation approaches have achieved remarkable research attention recently. However, generating an informative response with multiple relevant knowledge without losing fluency and coherence is still one of the main challenges. To address this issue, this paper proposes a method that uses recurrent knowledge interaction among response decoding steps to incorporate appropriate knowledge. Furthermore, we introduce a knowledge copy mechanism using a knowledge-aware pointer network to copy words from external knowledge according...

10.18653/v1/2020.acl-main.6 article EN 2020-01-01

10.1179/0197726114z.00000000045 article EN Lithic Technology 2014-10-09

Early Upper Paleolithic sites in the Danube catchment have been put forward as evidence that the river was an important conduit for modern humans during their initial settlement of Europe. Central to this model is the Carpathian Basin, a region covering most of the Middle Danube. As the archaeological record of this region is still poorly understood, this paper aims to provide a contextual assessment of the Carpathian Basin's geological and paleoenvironmental archives, starting with the late Pleistocene. Subsequently, it compiles early Upper Paleolithic data from the region to provide a synchronic...

10.1007/s10963-018-9115-1 article EN cc-by Journal of World Prehistory 2018-05-29

10.1109/icassp49660.2025.10888243 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Conjoint analysis is one of the most popular market research methodologies for assessing how customers with heterogeneous preferences appraise various objective characteristics in products or services, which provides critical inputs for many marketing decisions, e.g. optimal design of new products and target market selection. Nowadays it becomes practical in e-commercial applications to collect millions of samples quickly. However, the large-scale data sets make traditional conjoint analysis coupled with sophisticated Monte Carlo...

10.1145/1557019.1557138 article EN 2009-06-28

In this paper, we present an end-to-end automatic speech recognition system, which successfully employs subword units in a hybrid CTC-Attention based system. The subword units are obtained by the byte-pair encoding (BPE) compression algorithm. Compared to using words as modeling units, using characters or subword units does not suffer from the out-of-vocabulary (OOV) problem. Furthermore, using subword units further offers a capability in modeling longer context than using characters. We evaluate different systems over the LibriSpeech 1000h dataset. The subword-based system...

10.1109/iscslp.2018.8706675 article EN 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2018-11-01
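
The abstract above says subword units are obtained with byte-pair encoding (BPE). The following is a minimal sketch of generic BPE merge learning on a word-frequency table, not the paper's implementation; the function names `learn_bpe` and `apply_bpe`, the `</w>` end-of-word marker, and the toy counts are all assumptions made for illustration.

```python
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges BPE merge rules from a {word: frequency} dict."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the most frequent pair fused into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            key = tuple(_merge_pair(list(symbols), best))
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


def _merge_pair(symbols, pair):
    """Replace each left-to-right occurrence of pair in symbols by its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def apply_bpe(word, merges):
    """Segment a word (seen or unseen) by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = _merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    merges = learn_bpe({"abab": 3, "abc": 2}, num_merges=1)
    print(merges)
    print(apply_bpe("cab", merges))
```

Because segmentation falls back to single characters when no merge applies, an unseen word is still representable, which is the out-of-vocabulary property the abstract refers to.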

We propose a CTC alignment-based single step non-autoregressive transformer (CASS-NAT) for speech recognition. Specifically, the CTC alignment contains the information of (a) the number of tokens for decoder input, and (b) the time span of acoustics for each token. The information is used to extract an acoustic representation for each token in parallel, referred to as token-level acoustic embedding, which substitutes the word embedding in the autoregressive transformer (AT) to achieve parallel generation in the decoder. During inference, an error-based alignment sampling method is proposed to be applied to the CTC output...

10.1109/icassp39728.2021.9413429 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13
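
The abstract above states that a CTC alignment carries the number of tokens and the time span of each token. A minimal sketch of reading both off a frame-level alignment follows. The function name `token_spans`, the blank index 0, and the choice to leave blank frames unassigned are assumptions for illustration; the entry does not say how the paper treats blank frames.

```python
def token_spans(alignment, blank=0):
    """Collapse a frame-level CTC alignment into (token, start_frame, end_frame) spans.

    Consecutive frames with the same non-blank label form one token; a blank
    between two equal labels separates them into two tokens (standard CTC rule).
    Frame indices are inclusive. Blank frames are not assigned to any token here.
    """
    spans = []
    prev = blank
    for t, label in enumerate(alignment):
        if label != blank:
            if label == prev:
                # Same label as the previous frame: extend the current span.
                tok, start, _ = spans[-1]
                spans[-1] = (tok, start, t)
            else:
                # A new token starts at this frame.
                spans.append((label, t, t))
        prev = label
    return spans


if __name__ == "__main__":
    frames = [0, 3, 3, 0, 3, 5, 5, 5, 0]
    spans = token_spans(frames)
    print(len(spans), spans)
```

The length of the returned list gives the decoder input length, and each span marks which acoustic frames would feed that token's embedding under these assumptions.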

Singing voice conversion is a task to convert a song sung by a source singer to the voice of a target singer. In this paper, we propose using a parallel data free, many-to-one voice conversion technique on singing voices. A phonetic posterior feature is first generated by decoding singing voices through a robust Automatic Speech Recognition Engine (ASR). Then, a trained Recurrent Neural Network (RNN) with a Deep Bidirectional Long Short Term Memory (DBLSTM) structure is used to model the mapping from person-independent content to the acoustic features of the target person....

10.1109/mipr.2019.00059 preprint EN 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) 2019-03-01

Over the past few years, major web search engines have introduced knowledge bases to offer popular facts about people, places, and things on the entity pane next to regular search results. In addition to information about the entity searched by the user, the entity pane often provides a ranked list of related entities. To keep users engaged, it is important to develop a recommendation model that tailors the related entities to individual user interests. We propose a probabilistic Three-way Entity Model (TEM) that provides personalized recommendation of related entities using three data sources: knowledge base, search click log,...

10.1145/2684822.2685304 article EN 2015-01-28