Wei Yang

ORCID: 0000-0003-1266-048X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Text Analysis Techniques
  • Quantum Information and Cryptography
  • Quantum Computing Algorithms and Architecture
  • Text and Document Classification Technologies
  • Quantum Mechanics and Applications
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Internet Traffic Analysis and Secure E-voting
  • Network Security and Intrusion Detection
  • Web Data Mining and Analysis
  • Explainable Artificial Intelligence (XAI)
  • Adversarial Robustness in Machine Learning
  • Speech Recognition and Synthesis
  • Handwritten Text Recognition Techniques
  • Advanced Graph Neural Networks
  • Advanced Steganography and Watermarking Techniques
  • Information Retrieval and Search Behavior
  • Expert finding and Q&A systems
  • Cryptography and Data Security
  • Data Management and Algorithms
  • Music and Audio Processing
  • Advanced Computational Techniques and Applications
  • Advanced Malware Detection Techniques

Affiliations

Borealis (Austria)
2021

China Academy of Launch Vehicle Technology
2021

University of Science and Technology of China
2007-2020

University of Waterloo
2017-2020

East China Normal University
2020

Institute of Electronics
2017-2019

Nankai University
2013-2019

Chinese Academy of Sciences
2017-2019

National Institute of Advanced Industrial Science and Technology
2019

Suzhou Research Institute
2010-2017

Publications

Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 2019.

10.18653/v1/n19-4013 preprint EN 2019-01-01

One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to or representative of the documents' content. From the perspective of a question answering system, this might comprise questions the document can potentially answer. Following this observation, we propose a simple method that predicts which queries will be issued for a given document and then expands it with those predictions, using a vanilla sequence-to-sequence model trained on datasets consisting of pairs of queries and relevant...

10.48550/arxiv.1904.08375 preprint EN other-oa arXiv (Cornell University) 2019-01-01
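
A minimal sketch of the document expansion idea described above, based only on my reading of the abstract rather than the authors' released code: a sequence-to-sequence model predicts likely queries for a document, and the predictions are appended to the document text before indexing. The checkpoint name below is a placeholder, not a real model.

```python
# Illustrative sketch of query-prediction-based document expansion.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "your-org/query-prediction-model"  # hypothetical fine-tuned seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def expand_document(doc_text: str, num_queries: int = 3) -> str:
    """Append predicted queries to a document before indexing it with BM25."""
    inputs = tokenizer(doc_text, return_tensors="pt", truncation=True, max_length=512)
    outputs = model.generate(
        **inputs,
        max_length=64,
        do_sample=True,          # sampling yields more diverse predicted queries
        top_k=10,
        num_return_sequences=num_queries,
    )
    queries = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    # The expanded text is indexed with a standard inverted index.
    return doc_text + " " + " ".join(queries)
```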

The advent of deep neural networks pre-trained via language modeling tasks has spurred a number of successful applications in natural language processing. This work explores one such popular model, BERT, in the context of document ranking. We propose two variants, called monoBERT and duoBERT, that formulate the ranking problem as pointwise and pairwise classification, respectively. These models are arranged in a multi-stage architecture to form an end-to-end search system. One major advantage of this design is the ability to trade...

10.48550/arxiv.1910.14424 preprint EN other-oa arXiv (Cornell University) 2019-01-01
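
The multi-stage pipeline described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: `mono_score(query, doc)` and `duo_score(query, doc_i, doc_j)` stand in for fine-tuned BERT classifiers producing pointwise relevance and pairwise preference scores, and the cutoff values are placeholders.

```python
# Sketch of monoBERT/duoBERT-style multi-stage reranking.
from itertools import permutations

def multistage_rerank(query, candidates, mono_score, duo_score, k_mono=1000, k_duo=50):
    # Stage 0 (not shown): `candidates` come from a bag-of-words retriever such as BM25.
    # Stage 1: pointwise reranking with monoBERT-style scores.
    mono_ranked = sorted(candidates[:k_mono],
                         key=lambda d: mono_score(query, d), reverse=True)
    top = mono_ranked[:k_duo]
    # Stage 2: pairwise reranking with duoBERT-style scores, aggregated by summation.
    agg = {d: 0.0 for d in top}
    for d_i, d_j in permutations(top, 2):
        agg[d_i] += duo_score(query, d_i, d_j)
    return sorted(top, key=agg.get, reverse=True)
```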

Following recent successes in applying BERT to question answering, we explore simple applications to ad hoc document retrieval. This required confronting the challenge posed by documents that are typically longer than the length of input BERT was designed to handle. We address this issue by applying inference on sentences individually, and then aggregating sentence scores to produce document scores. Experiments on TREC microblog and newswire test collections show that our approach is simple yet effective, as we report the highest average precision on these...

10.48550/arxiv.1903.10972 preprint EN other-oa arXiv (Cornell University) 2019-01-01
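
The sentence-level aggregation step mentioned above reduces to a small interpolation. The sketch below assumes each sentence has already been scored by a BERT relevance classifier; the interpolation weight and per-sentence weights are illustrative stand-ins, not the paper's tuned values.

```python
# Sketch: combine the original retrieval score with the top sentence scores.
def aggregate(doc_score, sentence_scores, alpha=0.5, weights=(1.0, 0.5, 0.25)):
    top = sorted(sentence_scores, reverse=True)[:len(weights)]
    bert_component = sum(w * s for w, s in zip(weights, top))
    return alpha * doc_score + (1 - alpha) * bert_component
```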

Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, Jimmy Lin. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1352 article EN cc-by 2019-01-01

Zeynep Akkalyoncu Yilmaz, Shengjin Wang, Wei Yang, Haotian Zhang, Jimmy Lin. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations. 2019.

10.18653/v1/d19-3004 article EN 2019-01-01

Is neural IR mostly hype? In a recent SIGIR Forum article, Lin expressed skepticism that neural ranking models were actually improving ad hoc retrieval effectiveness in limited data scenarios. He provided anecdotal evidence that authors of neural IR papers demonstrate "wins" by comparing against weak baselines. This paper provides a rigorous evaluation of those claims in two ways: First, we conducted a meta-analysis of papers that have reported experimental results on the TREC Robust04 test collection. We do not find an upward trend over...

10.1145/3331184.3331340 preprint EN 2019-07-18

Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1540 article EN cc-by 2019-01-01

Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie Chi Kit Cheung, Simon J.D. Prince, Yanshuai Cao. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.163 article EN cc-by 2021-01-01

Recently, a simple combination of passage retrieval using off-the-shelf IR techniques and a BERT reader was found to be very effective for question answering directly on Wikipedia, yielding a large improvement over the previous state of the art on a standard benchmark dataset. In this paper, we present a data augmentation technique using distant supervision that exploits positive as well as negative examples. We apply a stage-wise approach to fine-tuning on multiple datasets, starting with data that is "furthest" from the test data and ending...

10.48550/arxiv.1904.06652 preprint EN other-oa arXiv (Cornell University) 2019-01-01
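
The distant-supervision idea above can be sketched with a simple labeling heuristic: retrieved paragraphs that contain the gold answer string become positive examples and the rest become negatives. This is only my reading of the abstract, not the authors' exact procedure, and `retrieve` is a placeholder for any passage retriever (e.g., BM25 over Wikipedia).

```python
# Sketch: distant-supervision labeling of retrieved paragraphs for BERT fine-tuning.
def label_paragraphs(question, answer, retrieve, k=20):
    positives, negatives = [], []
    for paragraph in retrieve(question, k=k):
        if answer.lower() in paragraph.lower():
            positives.append((question, paragraph, 1))   # paragraph contains the answer
        else:
            negatives.append((question, paragraph, 0))   # retrieved but answer-free
    return positives, negatives
```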

Learning word embeddings has received a significant amount of attention recently. Often, word embeddings are learned in an unsupervised manner from a large collection of text. The genre of the text typically plays an important role in the effectiveness of the resulting embeddings. How to effectively train word embedding models using data from different domains remains a problem that is underexplored. In this paper, we present a simple yet effective method for learning word embeddings based on text from different domains. We demonstrate the effectiveness of our approach through extensive experiments...

10.18653/v1/d17-1312 preprint EN cc-by Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing 2017-01-01

Wei Yang, Luchen Tan, Chunwei Lu, Anqi Cui, Han Li, Xi Chen, Kun Xiong, Muzi Wang, Ming Li, Jian Pei, Jimmy Lin. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers). 2019.

10.18653/v1/n19-2008 article EN 2019-01-01

Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have mostly been applied to “standard” ad hoc retrieval tasks over web pages and newswire articles. This paper proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network), a novel ranking model specifically designed for ranking short social media posts. We identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain,...

10.1609/aaai.v33i01.3301232 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers. 2021.

10.18653/v1/2021.naacl-industry.15 article EN cc-by 2021-01-01

Document summarization has been widely studied for many years. Existing methods mainly use statistical or linguistic information to extract the most informative sentences from a document. However, those methods ignore the relationships between different granularities (i.e., word, sentence, and topic). Actually, these interactions can be used in document summarization. In this paper we propose a method based on a heterogeneous graph. The method is first implemented by constructing a graph which reflects the different granularities...

10.1109/fskd.2012.6234047 article EN 2012-05-01

We tackle the problem of question answering directly on a large document collection, combining simple "bag of words" passage retrieval with a BERT-based reader for extracting answer spans. In the context of this architecture, we present a data augmentation technique using distant supervision to automatically annotate paragraphs as either positive or negative examples to supplement existing training data, which are then used together to fine-tune BERT. We explore a number of details that are critical to achieving high accuracy...

10.1145/3366423.3380060 article EN 2020-04-20
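
The retrieve-then-read architecture described above can be sketched as follows. `bm25_search` is a placeholder retriever, and the reader checkpoint is a generic SQuAD-fine-tuned BERT rather than necessarily the model used in the paper; the sketch also omits the retriever/reader score combination the abstract alludes to.

```python
# Sketch: bag-of-words passage retrieval followed by a BERT reader for answer spans.
from transformers import pipeline

reader = pipeline("question-answering",
                  model="bert-large-uncased-whole-word-masking-finetuned-squad")

def answer(question, bm25_search, k=10):
    best = {"score": float("-inf"), "answer": ""}
    for passage in bm25_search(question, k=k):
        span = reader(question=question, context=passage)
        # For simplicity, keep only the reader's span score when picking the answer.
        if span["score"] > best["score"]:
            best = span
    return best["answer"]
```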

We present Capreolus, a toolkit designed to facilitate end-to-end ad hoc retrieval experiments with neural networks by providing implementations of prominent neural ranking models within a common framework. Our toolkit adopts a standard reranking architecture via tight integration with the Anserini toolkit for candidate document generation using bag-of-words approaches. Using it, we are able to reproduce Yang et al.'s recent SIGIR 2019 finding that, in a reranking scenario on the test collection from the TREC 2004 Robust Track, many neural models do not...

10.1145/3336191.3371868 article EN 2020-01-20