Zewen Chi

ORCID: 0000-0003-1615-1885
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Speech Recognition and Synthesis
  • Speech and Dialogue Systems
  • Text Readability and Simplification
  • Advanced Text Analysis Techniques
  • Biomedical Text Mining and Ontologies
  • Data Quality and Management
  • Advanced Neural Network Applications
  • Advanced Graph Neural Networks
  • Image Retrieval and Classification Techniques
  • Recommender Systems and Techniques
  • Video Analysis and Summarization
  • Text and Document Classification Technologies
  • Expert Finding and Q&A Systems
  • Computational and Text Analysis Methods
  • Generative Adversarial Networks and Image Synthesis
  • Sentiment Analysis and Opinion Mining
  • Aesthetic Perception and Analysis
  • Domain Adaptation and Few-Shot Learning
  • Parallel Computing and Optimization Techniques

Beijing Institute of Technology
2018-2024

Beijing Computing Center
2021-2024

Microsoft (Germany)
2023

Microsoft (Finland)
2021-2022

Microsoft Research (India)
2021

Microsoft (United States)
2021

Harbin Institute of Technology
2021

Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, Heyan Huang, Ming Zhou. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

10.18653/v1/2021.naacl-main.280 article EN cc-by Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021-01-01

In this work, we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages. We propose to pretrain the encoder and the decoder of a sequence-to-sequence model under both monolingual and cross-lingual settings. The pre-training objective encourages the model to represent different languages in a shared space, so that we can conduct zero-shot cross-lingual transfer. After the pre-training procedure, we use monolingual data to fine-tune the pre-trained model on downstream NLG tasks. Then the model trained in a single language can be directly evaluated beyond...

10.1609/aaai.v34i05.6256 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
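
The following is a minimal sketch of the general recipe in the abstract above: a toy sequence-to-sequence model pretrained by alternating a monolingual denoising objective with a cross-lingual (translation) objective. The model, data, and objective details are illustrative stand-ins, not the paper's actual setup.

```python
import random
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Minimal encoder-decoder used only to illustrate the two objectives."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.embed(src))
        dec, _ = self.decoder(self.embed(tgt_in), h)
        return self.out(dec)

def denoising_batch(tokens, mask_id=0, p=0.3):
    """Monolingual objective: reconstruct a sentence from a corrupted copy."""
    src = tokens.clone()
    src[torch.rand_like(src, dtype=torch.float) < p] = mask_id
    return src, tokens

model = TinySeq2Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

mono = torch.randint(1, 100, (8, 12))                                   # monolingual sentences
src_l, tgt_l = torch.randint(1, 100, (8, 12)), torch.randint(1, 100, (8, 12))  # toy parallel pair

for step in range(10):
    if random.random() < 0.5:          # monolingual denoising step
        src, tgt = denoising_batch(mono)
    else:                              # cross-lingual (translation) step
        src, tgt = src_l, tgt_l
    logits = model(src, tgt[:, :-1])
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```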

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot,...

10.48550/arxiv.2302.14045 preprint EN other-oa arXiv (Cornell University) 2023-01-01
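
As an illustration of what "arbitrarily interleaved text and images" plus image-caption pairs can look like as training records, here is a hypothetical data layout; the actual Kosmos-1 corpus schema is not specified here.

```python
# Hypothetical record layout for illustration only; field names and file
# references are made up, not taken from the Kosmos-1 training data.
interleaved_document = [
    {"type": "text",  "content": "The photo below was taken during the eclipse."},
    {"type": "image", "content": "eclipse_photo.jpg"},
    {"type": "text",  "content": "Totality lasted about two minutes."},
]
image_caption_pair = {"image": "dog.jpg", "caption": "a golden retriever catching a frisbee"}
```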

Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.427 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded...

10.48550/arxiv.2007.07834 preprint EN other-oa arXiv (Cornell University) 2020-01-01
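
The contrastive idea in the abstract above, treating a bilingual sentence pair as two views of the same meaning, can be sketched as an InfoNCE-style loss with in-batch negatives. This is a simplified illustration; details of the paper's actual task (such as momentum encoders or negative queues) may differ.

```python
import torch
import torch.nn.functional as F

def cross_lingual_contrastive_loss(src_emb, tgt_emb, temperature=0.05):
    """
    InfoNCE-style loss over a batch of bilingual sentence pairs.
    src_emb[i] and tgt_emb[i] encode the same meaning in two languages;
    all other sentences in the batch serve as negatives.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature                     # (B, B) similarity matrix
    labels = torch.arange(src.size(0), device=src.device)    # diagonal pairs are positives
    # symmetric: match source->target and target->source
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

# usage with random stand-ins for sentence encodings
loss = cross_lingual_contrastive_loss(torch.randn(16, 768), torch.randn(16, 768))
```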

The task of table structure recognition aims to recognize the internal structure of a table, which is a key step in making machines understand tables. Currently, there are lots of studies on this task for different file formats such as ASCII text and HTML. It also attracts attention to recognize the table structures in PDF files. However, it is hard for existing methods to accurately recognize complicated tables that contain spanning cells occupying at least two columns or rows. To address the issue, we propose a novel graph neural network for recognizing tables in PDF files, named GraphTSR...

10.48550/arxiv.1908.04729 preprint EN other-oa arXiv (Cornell University) 2019-01-01
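
A rough sketch of the general approach, table cells as graph nodes whose candidate pairwise relations (e.g., same row or same column) are classified by a graph neural network, is shown below. The features, message-passing scheme, and relation set are illustrative assumptions, not GraphTSR's exact design.

```python
import torch
import torch.nn as nn

class CellGraphEdgeClassifier(nn.Module):
    """
    Illustrative GNN for table structure: each table cell is a node with
    geometric/text features; candidate cell pairs (edges) are classified
    as same-row, same-column, or unrelated.
    """
    def __init__(self, feat_dim=16, hidden=64, n_relations=3):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.msg = nn.Linear(hidden, hidden)
        self.edge_head = nn.Linear(2 * hidden, n_relations)

    def forward(self, x, edge_index):
        h = self.node_mlp(x)                       # (N, hidden) node states
        src, dst = edge_index                      # (E,), (E,) endpoints of candidate edges
        # one round of mean message passing over the candidate edges
        agg = torch.zeros_like(h).index_add_(0, dst, self.msg(h[src]))
        deg = torch.zeros(h.size(0), 1).index_add_(0, dst, torch.ones(src.size(0), 1)).clamp(min=1)
        h = h + agg / deg
        # classify each candidate edge from its endpoint representations
        return self.edge_head(torch.cat([h[src], h[dst]], dim=-1))

# toy example: 4 cells, candidate edges between neighbouring cells
x = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 0, 2], [1, 0, 2, 0]])
logits = CellGraphEdgeClassifier()(x, edge_index)   # (4 edges, 3 relation classes)
```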

Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Saksham Singhal, Xian-Ling Mao, Heyan Huang, Xia Song, Furu Wei. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.

10.18653/v1/2021.emnlp-main.125 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.265 article EN cc-by 2021-01-01

Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning of a pretrained language model on a small collection of manually engineered prompts. Then we use reinforcement learning to explore better prompts. We define...

10.48550/arxiv.2212.09611 preprint EN other-oa arXiv (Cornell University) 2022-01-01
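
A minimal sketch of the two-stage recipe described above, supervised fine-tuning on (user input, engineered prompt) pairs followed by reinforcement learning against a reward, is given below. The scoring functions are dummy stand-ins; the paper's actual reward models are not reproduced here.

```python
# Stage 1: supervised fine-tuning data: (user input, manually engineered prompt) pairs.
sft_pairs = [
    ("a cat on a table",
     "a cat on a table, highly detailed, soft lighting, trending on artstation"),
]

def dummy_relevance(user_prompt: str, adapted_prompt: str) -> float:
    # stand-in for a text/image relevance model: word overlap with the user input
    u, a = set(user_prompt.split()), set(adapted_prompt.split())
    return len(u & a) / max(len(u), 1)

def dummy_aesthetic(adapted_prompt: str) -> float:
    # stand-in for an aesthetic scorer applied to images generated from the prompt
    return min(len(adapted_prompt.split()) / 20.0, 1.0)

def reward(user_prompt: str, adapted_prompt: str, alpha: float = 1.0) -> float:
    # Stage 2: the fine-tuned LM samples rewritten prompts and is updated (e.g. with
    # policy gradients) to maximize relevance to the user plus image quality.
    return dummy_relevance(user_prompt, adapted_prompt) + alpha * dummy_aesthetic(adapted_prompt)

print(reward(*sft_pairs[0]))
```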

Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.264 article EN cc-by 2021-01-01

Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for specific tasks or modalities. In this work, we propose to use language models as a general-purpose interface to various foundation models. A collection of pretrained encoders perceive diverse modalities (such as vision and language), and they dock with a language model that plays the role of a universal task layer. We...

10.48550/arxiv.2206.06336 preprint EN other-oa arXiv (Cornell University) 2022-01-01
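
The "language model as universal task layer" idea can be illustrated by projecting a modality encoder's outputs into the LM embedding space and feeding them as a prefix. This is a generic connector sketch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DockedEncoder(nn.Module):
    """
    Illustration only: a modality encoder's outputs are projected into the
    language model's embedding space and consumed as a prefix, so the LM
    handles the downstream task on top of the docked encoder.
    """
    def __init__(self, enc_dim=512, lm_dim=768):
        super().__init__()
        self.connector = nn.Linear(enc_dim, lm_dim)   # maps encoder features to LM space

    def forward(self, encoder_states, text_embeddings):
        prefix = self.connector(encoder_states)              # (B, P, lm_dim)
        return torch.cat([prefix, text_embeddings], dim=1)   # sequence fed to the causal LM

# toy shapes: one image encoded to 4 patch states, 6 text token embeddings
fused = DockedEncoder()(torch.randn(1, 4, 512), torch.randn(1, 6, 768))
print(fused.shape)   # torch.Size([1, 10, 768])
```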

In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. Specifically, we present two pre-training tasks, namely multilingual replaced token detection and translation replaced token detection. Besides, we pretrain the model, named XLM-E, on both multilingual and parallel corpora. Our model outperforms the baseline models on various cross-lingual understanding tasks with much less computation cost. Moreover, analysis shows that XLM-E tends to obtain better cross-lingual transferability.

10.48550/arxiv.2106.16138 preprint EN other-oa arXiv (Cornell University) 2021-01-01
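
The replaced token detection objective mentioned above follows the ELECTRA recipe: a small generator fills in masked positions and a discriminator predicts, per token, whether the token was replaced. A minimal sketch of the discriminator loss (with random stand-ins for model outputs) follows; translation replaced token detection applies the same idea to translation pairs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def replaced_token_detection_loss(discriminator_hidden, input_ids, original_ids, head):
    """
    ELECTRA-style objective: input_ids is the sequence after a generator has
    filled in masked positions; the discriminator predicts, for every token,
    whether it still matches the original text.
    """
    labels = (input_ids != original_ids).float()        # 1 = replaced, 0 = original
    logits = head(discriminator_hidden).squeeze(-1)     # (B, T) per-token scores
    return F.binary_cross_entropy_with_logits(logits, labels)

# toy usage with random stand-ins for the discriminator states and token ids
B, T, H = 2, 8, 32
head = nn.Linear(H, 1)
loss = replaced_token_detection_loss(torch.randn(B, T, H),
                                     torch.randint(0, 50, (B, T)),
                                     torch.randint(0, 50, (B, T)),
                                     head)
```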

Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.acl-long.856 article EN cc-by 2023-01-01

Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and the OPUS-100 corpus with 94 pairs. Surprisingly, the method is...

10.48550/arxiv.2012.15547 preprint EN other-oa arXiv (Cornell University) 2020-01-01
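
One way to approximate the warm-start idea with the Hugging Face Transformers API (not the authors' code) is to initialize a seq2seq model from a pretrained cross-lingual encoder and then fine-tune it on parallel data:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Sketch only: warm-start both encoder and decoder from a pretrained
# cross-lingual encoder checkpoint (downloads weights when run).
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base", "xlm-roberta-base"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
# ... then fine-tune with (source sentence, target sentence) pairs as usual
```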

Pretrained language models have achieved great success in a wide range of natural language processing (NLP) problems, because they learn representations from large-scale text corpora and can adapt to downstream tasks by finetuning them on annotated task data. However, such success relies on both kinds of data, so the lack of training data is a major practical problem for many languages, especially low-resource languages. In this paper, we explore whether a pretrained English model can benefit non-English NLP systems in these scenarios,...

10.1109/taslp.2023.3267618 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2023-04-17

Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs a routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse. In this work, we propose to estimate the routing scores between tokens and experts on a low-dimensional hypersphere. We conduct extensive experiments on cross-lingual language...

10.48550/arxiv.2204.09179 preprint EN other-oa arXiv (Cornell University) 2022-01-01
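
The routing idea can be sketched as follows: project token states to a low dimension, L2-normalize both tokens and expert embeddings, and route by (scaled) cosine similarity. The dimensions and temperature value here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def hypersphere_routing_scores(hidden, expert_embed, proj, temperature=0.07):
    """
    Routing scores on a low-dimensional hypersphere: token states are projected
    to a small dimension, tokens and expert embeddings are L2-normalized, and
    the scaled cosine similarity is used to pick experts.
    """
    tokens = F.normalize(proj(hidden), dim=-1)      # (N, d_low) unit-norm token vectors
    experts = F.normalize(expert_embed, dim=-1)     # (E, d_low) unit-norm expert vectors
    return tokens @ experts.t() / temperature       # (N, E) routing scores

# toy usage: 16 tokens with hidden size 64, 8 experts, 16-dim routing space
proj = torch.nn.Linear(64, 16, bias=False)
expert_embed = torch.nn.Parameter(torch.randn(8, 16))
scores = hypersphere_routing_scores(torch.randn(16, 64), expert_embed, proj)
top1 = scores.argmax(dim=-1)                        # chosen expert per token
```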

Cross-Lingual Summarization (CLS) is a task that extracts important information from a source document and summarizes it into a summary in another language. It is challenging because it requires a system to understand, summarize, and translate at the same time, making it highly related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for machine translation are far more abundant than those for cross-lingual and monolingual summarization. Thus incorporating the machine translation corpus into CLS would be beneficial for its performance. However, present work only...

10.1145/3477495.3532071 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022-07-06

Recently, it has attracted much attention to build reliable named entity recognition (NER) systems using limited annotated data. Nearly all existing works heavily rely on domain-specific resources, such as external lexicons and knowledge bases. However, such resources are often not available, and meanwhile it is difficult and expensive to construct them, which has become a key obstacle to wider adoption. To tackle the problem, in this work, we propose a novel robust domain-adaptive approach, RDANER, for low-resource NER,...

10.1109/icbk50248.2020.00050 preprint EN 2020-08-01

Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several modeling techniques, which can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine...

10.48550/arxiv.2211.13184 preprint EN other-oa arXiv (Cornell University) 2022-01-01
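
A minimal usage sketch, assuming the interface shown in the TorchScale README (torchscale.architecture); check the repository for the current API:

```python
# Assumed interface based on the project README; not an authoritative example.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

config = EncoderConfig(vocab_size=64000)   # other modeling options are set via the config
model = Encoder(config)
print(model)
```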

Open-retrieval conversational machine reading comprehension (OCMRC) simulates real-life interaction scenes. Machines are required to make a decision of "Yes/No/Inquire" or to generate a follow-up question when the decision is "Inquire", based on retrieved rule texts, the user scenario, and the dialogue history. Recent studies try to reduce the information gap between decision-making and question generation, in order to improve the performance of generation. However, the gap still persists because these methods are limited by the pipeline framework, where...

10.18653/v1/2023.acl-long.857 article EN cc-by 2023-01-01