- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Quantum Information and Cryptography
- Quantum Computing Algorithms and Architecture
- Text and Document Classification Technologies
- Quantum Mechanics and Applications
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Internet Traffic Analysis and Secure E-voting
- Network Security and Intrusion Detection
- Web Data Mining and Analysis
- Explainable Artificial Intelligence (XAI)
- Adversarial Robustness in Machine Learning
- Speech Recognition and Synthesis
- Handwritten Text Recognition Techniques
- Advanced Graph Neural Networks
- Advanced Steganography and Watermarking Techniques
- Information Retrieval and Search Behavior
- Expert finding and Q&A systems
- Cryptography and Data Security
- Data Management and Algorithms
- Music and Audio Processing
- Advanced Computational Techniques and Applications
- Advanced Malware Detection Techniques
Borealis (Austria)
2021
China Academy of Launch Vehicle Technology
2021
University of Science and Technology of China
2007-2020
University of Waterloo
2017-2020
East China Normal University
2020
Institute of Electronics
2017-2019
Nankai University
2013-2019
Chinese Academy of Sciences
2017-2019
National Institute of Advanced Industrial Science and Technology
2019
Suzhou Research Institute
2010-2017
Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 2019.
One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to or representative of the documents' content. From the perspective of a question answering system, this might comprise questions the document can potentially answer. Following this observation, we propose a simple method that predicts which queries will be issued for a given document and then expands it with those predictions using a vanilla sequence-to-sequence model, trained using datasets consisting of pairs of query and relevant...
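The expansion step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_predict_queries` is a hypothetical stand-in for the trained sequence-to-sequence model, and `expand_document` and `num_queries` are names assumed here for illustration.

```python
from typing import Callable, List


def toy_predict_queries(doc: str) -> List[str]:
    """Hypothetical stand-in for a trained query-prediction model.

    Emits one "what is X" query per distinct word longer than six characters.
    """
    seen: List[str] = []
    for word in doc.lower().split():
        word = word.strip(".,")
        if len(word) > 6 and word not in seen:
            seen.append(word)
    return [f"what is {w}" for w in seen]


def expand_document(
    doc: str,
    predict_queries: Callable[[str], List[str]],
    num_queries: int = 3,
) -> str:
    """Append up to num_queries predicted queries to the document text."""
    predictions = predict_queries(doc)[:num_queries]
    return " ".join([doc, *predictions])


doc = "Anserini supports retrieval experiments."
expanded = expand_document(doc, toy_predict_queries, num_queries=2)
```

The expanded text would then be indexed in place of the original document, so that a query matching a predicted question also matches the document.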
The advent of deep neural networks pre-trained via language modeling tasks has spurred a number of successful applications in natural language processing. This work explores one such popular model, BERT, in the context of document ranking. We propose two variants, called monoBERT and duoBERT, that formulate the ranking problem as pointwise and pairwise classification, respectively. These models are arranged in a multi-stage architecture to form an end-to-end search system. One major advantage of this design is the ability to trade...
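The pointwise-then-pairwise arrangement described in this abstract can be sketched in a few lines. This is a rough sketch under stated assumptions: `toy_pointwise` and `toy_pairwise` are hypothetical stand-ins for the two BERT classifiers, and summing pairwise preferences is one plausible aggregation, not necessarily the one used in the paper.

```python
from itertools import permutations
from typing import Callable, List


def toy_pointwise(query: str, doc: str) -> float:
    """Hypothetical pointwise scorer: number of query terms present in the document."""
    doc_terms = set(doc.lower().split())
    return float(sum(t in doc_terms for t in query.lower().split()))


def toy_pairwise(query: str, a: str, b: str) -> float:
    """Hypothetical pairwise scorer: 1.0 if a has higher query-term density than b."""
    def density(d: str) -> float:
        return toy_pointwise(query, d) / max(len(d.split()), 1)
    da, db = density(a), density(b)
    return 0.5 if da == db else float(da > db)


def pointwise_stage(query: str, docs: List[str],
                    score: Callable[[str, str], float], k: int) -> List[str]:
    """Score every candidate independently and keep the top k."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def pairwise_stage(query: str, docs: List[str],
                   prefer: Callable[[str, str, str], float]) -> List[str]:
    """Compare every ordered pair and rank by summed preference (assumes unique docs)."""
    totals = {d: 0.0 for d in docs}
    for a, b in permutations(docs, 2):
        totals[a] += prefer(query, a, b)
    return sorted(docs, key=lambda d: totals[d], reverse=True)


docs = [
    "bert ranking with long extra filler text here",
    "bert ranking",
    "cooking recipes",
    "ranking models",
]
candidates = pointwise_stage("bert ranking", docs, toy_pointwise, k=3)
reranked = pairwise_stage("bert ranking", candidates, toy_pairwise)
```

In this sketch the pairwise stage makes k × (k − 1) comparisons, so `k` controls how many candidates reach the more expensive stage.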
Following recent successes in applying BERT to question answering, we explore simple applications to ad hoc document retrieval. This required confronting the challenge posed by documents that are typically longer than the length of input BERT was designed to handle. We address this issue by applying inference on sentences individually, and then aggregating sentence scores to produce document scores. Experiments on TREC microblog and newswire test collections show that our approach is simple yet effective, as we report the highest average precision on these...
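One way to picture the sentence-level aggregation described in this abstract is the sketch below. The abstract does not specify the aggregation formula, so everything here is an assumption for illustration: `overlap_score` is a hypothetical stand-in for BERT inference on a single sentence, and interpolating a first-stage document score with a weighted sum of the top sentence scores (`weights`, `alpha`) is just one plausible scheme.

```python
from typing import Callable, Sequence


def overlap_score(query: str, sentence: str) -> float:
    """Hypothetical stand-in for a sentence-level relevance model.

    Returns the fraction of query terms that occur in the sentence.
    """
    q_terms = set(query.lower().split())
    s_terms = set(sentence.lower().split())
    return len(q_terms & s_terms) / len(q_terms) if q_terms else 0.0


def document_score(
    query: str,
    sentences: Sequence[str],
    doc_score: float,
    weights: Sequence[float] = (1.0, 0.5),
    alpha: float = 0.5,
    scorer: Callable[[str, str], float] = overlap_score,
) -> float:
    """Combine a first-stage document score with the top-scoring sentences.

    Each sentence is scored on its own, so no input exceeds the model's
    length limit; the best len(weights) sentence scores are then weighted,
    summed, and interpolated with doc_score.
    """
    sent_scores = sorted((scorer(query, s) for s in sentences), reverse=True)
    evidence = sum(w * s for w, s in zip(weights, sent_scores))
    return alpha * doc_score + (1 - alpha) * evidence


sentences = [
    "Neural ranking models are popular",
    "The weather is nice",
    "ranking matters",
]
score = document_score("neural ranking", sentences, doc_score=2.0)
```

Here `doc_score` stands for whatever score the first-stage retriever assigned to the whole document; setting `alpha=0` would rank on sentence evidence alone.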
Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, Jimmy Lin. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Zeynep Akkalyoncu Yilmaz, Shengjin Wang, Wei Yang, Haotian Zhang, Jimmy Lin. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations. 2019.
Is neural IR mostly hype? In a recent SIGIR Forum article, Lin expressed skepticism that neural ranking models were actually improving ad hoc retrieval effectiveness in limited data scenarios. He provided anecdotal evidence that authors of neural IR papers demonstrate "wins" by comparing against weak baselines. This paper provides a rigorous evaluation of those claims in two ways: First, we conducted a meta-analysis of papers that have reported experimental results on the TREC Robust04 test collection. We do not find an upward trend over...
Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie Chi Kit Cheung, Simon J.D. Prince, Yanshuai Cao. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
Recently, a simple combination of passage retrieval using off-the-shelf IR techniques and a BERT reader was found to be very effective for question answering directly on Wikipedia, yielding a large improvement over the previous state of the art on a standard benchmark dataset. In this paper, we present a data augmentation technique using distant supervision that exploits positive as well as negative examples. We apply a stage-wise approach to fine-tuning BERT on multiple datasets, starting with data that is "furthest" from the test data and ending...
Learning word embeddings has received a significant amount of attention recently. Often, word embeddings are learned in an unsupervised manner from a large collection of text. The genre of the text typically plays an important role in the effectiveness of the resulting embeddings. How to effectively train word embedding models using data from different domains remains a problem that is underexplored. In this paper, we present a simple yet effective method for learning word embeddings based on text from different domains. We demonstrate the effectiveness of our approach through extensive experiments...
Wei Yang, Luchen Tan, Chunwei Lu, Anqi Cui, Han Li, Xi Chen, Kun Xiong, Muzi Wang, Ming Li, Jian Pei, Jimmy Lin. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers). 2019.
Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have mostly been applied to "standard" ad hoc retrieval tasks over web pages and newswire articles. This paper proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network), a novel neural ranking model specifically designed for ranking short social media posts. We identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain,...
Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers. 2021.
Document summarization has been widely studied for many years. Existing methods mainly use statistical or linguistic information to extract the most informative sentences from the document. However, those methods ignore the relationship between different granularities (i.e., word, sentence, and topic). In fact, the interactions between them can be used in document summarization. In this paper we propose a method based on a heterogeneous graph. The method is first implemented by constructing a graph that reflects the size of granularity...
We tackle the problem of question answering directly on a large document collection, combining simple "bag of words" passage retrieval with a BERT-based reader for extracting answer spans. In the context of this architecture, we present a data augmentation technique using distant supervision to automatically annotate paragraphs as either positive or negative examples to supplement existing training data, which are then used together to fine-tune BERT. We explore a number of details that are critical to achieving high accuracy...
We present Capreolus, a toolkit designed to facilitate end-to-end ad hoc retrieval experiments with neural networks by providing implementations of prominent neural ranking models within a common framework. Our toolkit adopts a standard reranking architecture via tight integration with the Anserini toolkit for candidate document generation using bag-of-words approaches. Using Capreolus, we are able to reproduce Yang et al.'s recent SIGIR 2019 finding that, in a reranking scenario on the test collection from the TREC 2004 Robust Track, many neural models do not...