Chenlei Guo

ORCID: 0009-0006-0502-947X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Recommender Systems and Techniques
  • Visual Attention and Saliency Detection
  • Text and Document Classification Technologies
  • Visual perception and processing mechanisms
  • Multimodal Machine Learning Applications
  • Sentiment Analysis and Opinion Mining
  • Machine Learning and Data Classification
  • Expert finding and Q&A systems
  • Information Retrieval and Search Behavior
  • Machine Learning in Healthcare
  • Mobile Crowdsensing and Crowdsourcing
  • Speech Recognition and Synthesis
  • Web Data Mining and Analysis
  • Domain Adaptation and Few-Shot Learning
  • Gaze Tracking and Assistive Technology
  • Text Readability and Simplification
  • Advanced Image and Video Retrieval Techniques
  • Semantic Web and Ontologies
  • AI in Service Interactions
  • Multi-Agent Systems and Negotiation
  • Human Mobility and Location-Based Analysis
  • Image and Signal Denoising Methods

Amazon (United States)
2020-2024

Oregon State University
2022

University of Virginia
2022

Amazon (Germany)
2019-2022

LinkedIn (United States)
2022

Harbin Institute of Technology
2021

Carnegie Mellon University
2009-2010

Fudan University
2006-2009

Salient areas in natural scenes are generally regarded as areas which the human eye will typically focus on, and finding these areas is the key step in object detection. In computer vision, many models have been proposed to simulate the behavior of eyes, such as SaliencyToolBox (STB), Neuromorphic Vision Toolkit (NVT), and others, but they demand high computational cost, and computing useful results mostly relies on their choice of parameters. Although some region-based approaches were proposed to reduce the complexity of feature maps, they were still not...

10.1109/tip.2009.2030969 article EN IEEE Transactions on Image Processing 2009-08-25

Salient areas in natural scenes are generally regarded as the candidates of attention focus in human eyes, which is the key stage in object detection. In computer vision, many models have been proposed to simulate the behavior of eyes, such as SaliencyToolBox (STB), neuromorphic vision toolkit (NVT), etc., but they demand high computational cost and their remarkable results mostly rely on the choice of parameters. Recently a simple and fast approach based on the Fourier transform, called spectral residual (SR), was proposed, which used SR...

10.1109/cvpr.2008.4587715 article EN 2008 IEEE Conference on Computer Vision and Pattern Recognition 2008-06-01

Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that the internal knowledge of the teacher will be transferred into the student; even if the student closely matches the soft-labels, its internal representations may be considerably different. This internal mismatch can undermine the generalization capabilities originally intended to be transferred from the teacher to the student. In...

10.1609/aaai.v34i05.6229 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Dingcheng Li, Zheng Chen, Eunah Cho, Jie Hao, Xiaohu Liu, Fan Xing, Chenlei Guo, Yang Liu. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.

10.18653/v1/2022.naacl-main.398 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01

Today, most of the large-scale conversational AI agents such as Alexa, Siri, or Google Assistant are built using manually annotated data to train the different components of the system, including Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Entity Resolution (ER). Typically, the accuracy of the machine learning models in these components is improved by manually transcribing and annotating data. As the scope of these systems increases to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively costly and time...

10.1609/aaai.v34i08.7022 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

We present a methodology for the automatic identification and delineation of germ-layer components in H&E stained images of teratomas derived from human and nonhuman primate embryonic stem cells. A knowledge and understanding of the biology of these cells may lead to advances in tissue regeneration and repair, the treatment of genetic and developmental syndromes, and drug testing and discovery. As a teratoma is a chaotic organization of tissues from the three primary germ layers, often with multiple tissues, each having complex and unpredictable positions,...

10.1109/isbi.2010.5490168 article EN 2010-04-01

Today, most of the large-scale conversational AI agents such as Alexa, Siri, or Google Assistant are built using manually annotated data to train the different components of the system, including automatic speech recognition (ASR), natural language understanding (NLU), and entity resolution (ER). Typically, the accuracy of the machine learning models in these components is improved by manually transcribing and annotating data. As the scope of these systems increases to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively...

10.1609/aaai.12025 article EN cc-by AI Magazine 2021-12-01

Today, most of the large-scale conversational AI agents such as Alexa, Siri, or Google Assistant are built using manually annotated data to train the different components of the system, including automatic speech recognition (ASR), natural language understanding (NLU), and entity resolution (ER). Typically, the accuracy of the machine learning models in these components is improved by manually transcribing and annotating data. As the scope of these systems increases to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively costly and time...

10.1609/aimag.v42i4.15102 article EN AI Magazine 2022-01-12

Query Rewriting (QR) plays a critical role in large-scale dialogue systems for reducing frictions. When there is an entity error, it imposes extra challenges for the dialogue system to produce satisfactory responses. In this work, we propose KG-ECO: Knowledge Graph enhanced Entity COrrection for query rewriting, an entity correction system with corrupt span detection and retrieval/re-ranking functionalities. To boost the model performance, we incorporate a Knowledge Graph (KG) to provide structural information (neighboring entities encoded by graph...

10.1109/icassp49357.2023.10096826 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Query rewrite (QR) is an emerging component in conversational AI systems, reducing user defect. User defect is caused by various reasons, such as errors in the spoken dialogue system, users' slips of the tongue or their abridged language. Many of the user defects stem from personalized factors, such as the user's speech pattern, dialect, or preferences. In this work, we propose a search-based QR framework, which focuses on the automatic reduction of user defect. We build an index for each user, which encompasses diverse affinity layers to reflect personal...

10.18653/v1/2021.nlp4convai-1.17 article EN cc-by 2021-01-01

Jie Hao, Yang Liu, Xing Fan, Saurabh Gupta, Saleh Soltan, Rakesh Chada, Pradeep Natarajan, Chenlei Guo, Gokhan Tur. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track. 2022.

10.18653/v1/2022.emnlp-industry.48 article EN cc-by 2022-01-01

In this paper, we develop a robust signal space separation (rSSS) algorithm for real-time magnetoencephalography (MEG) data processing. rSSS is based on the spatial signal space separation (SSS) method, and it applies robust regression to automatically detect and remove bad MEG channels so that the results of SSS are not distorted. We extend the existing robust regression algorithm via three important new contributions: 1) a low-rank solver that efficiently performs matrix operations; 2) a subspace iteration scheme that selects bad MEG channels using low-order spherical harmonic functions; 3)...

10.1109/tbme.2010.2043358 article EN IEEE Transactions on Biomedical Engineering 2010-02-19

Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that the internal knowledge of the teacher will be transferred into the student; even if the student closely matches the soft-labels, its internal representations may be considerably different. This internal mismatch can undermine the generalization capabilities originally intended to be transferred from the teacher to the student. In...

10.48550/arxiv.1910.03723 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Query rewriting (QR) is an increasingly important component in voice assistant systems to reduce customer friction caused by errors in a spoken language understanding pipeline. These errors originate from various sources such as Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) modules. In this work, we construct a user interaction graph from their queries using data mined by a Markov Chain Model [1], and introduce a self-supervised pre-training process for learning query embeddings by leveraging...

10.1109/icassp39728.2021.9413840 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

For voice assistants like Alexa, Google Assistant, and Siri, correctly interpreting users’ intentions is of utmost importance. However, users sometimes experience friction with these assistants, caused by errors from different system components or user errors such as slips of the tongue. Users tend to rephrase their queries until they get a satisfactory response. Rephrase detection is used to identify the rephrases and has long been treated as a task with pairwise input, which does not fully utilize the contextual information...

10.18653/v1/2021.emnlp-main.143 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Zhongkai Sun, Yingxue Zhou, Jie Hao, Xing Fan, Yanbin Lu, Chengyuan Ma, Wei Shen, Chenlei Guo. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track. 2023.

10.18653/v1/2023.emnlp-industry.41 article EN cc-by 2023-01-01

Spoken language understanding (SLU) systems in conversational AI agents often experience errors in the form of misrecognitions by automatic speech recognition (ASR) or semantic gaps in natural language understanding (NLU). These errors easily translate to user frustrations, particularly so in recurrent events, e.g. regularly toggling an appliance, calling a frequent contact, etc. In this work, we propose a query rewriting approach by leveraging users' historically successful interactions as a form of memory. We present a neural retrieval model and...

10.48550/arxiv.2011.04748 preprint EN other-oa arXiv (Cornell University) 2020-01-01

In this paper, an attention selection model with visual memory and online learning is proposed, which has three parts: Sensory Mapping (SM), Cognitive Mapping (CM) and Motor Mapping (MM). CM is the novelty of our model, which incorporates visual memory and online learning. In order to mimic visual memory, we put forward an Amnesic Incremental Hierarchical Discriminant Regression (AIHDR) Tree with an amnesic function to guide the deletion of redundant information in the tree. Experimental results show that the AIHDR tree has better performance in retrieval speed and accuracy than IHDR/HDR...

10.1109/ijcnn.2007.4371145 article EN IEEE International Conference on Neural Networks/IEEE ... International Conference on Neural Networks 2007-08-01

Subword tokenization is a commonly used input pre-processing step in most recent NLP models. However, it limits the models’ ability to leverage end-to-end task learning. Its frequency-based vocabulary creation compromises tokenization in low-resource languages, leading models to produce suboptimal representations. Additionally, the dependency on a fixed vocabulary limits the subword models’ adaptability across languages and domains. In this work, we propose a vocabulary-free neural tokenizer by distilling segmentation information from...

10.18653/v1/2022.repl4nlp-1.10 article EN cc-by 2022-01-01

Niranjan Uma Naresh, Ziyan Jiang, Ankit Ankit, Sungjin Lee, Jie Hao, Xing Fan, Chenlei Guo. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track. 2022.

10.18653/v1/2022.emnlp-industry.7 article EN cc-by 2022-01-01

Text Style Transfer (TST) aims to alter the underlying style of the source text to another specific style while keeping the same content. Due to the scarcity of high-quality parallel training data, unsupervised learning has become a trending direction for TST tasks. In this paper, we propose a novel VAE based Text Style Transfer with pivOt Words Enhancement leaRning (VT-STOWER) method, which utilizes a Variational AutoEncoder (VAE) and external style embeddings to learn semantics and style distribution jointly. Additionally, we introduce pivot words learning, which is...

10.48550/arxiv.2112.03154 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Query rewriting (QR) is an increasingly important technique to reduce customer friction caused by errors in a spoken language understanding pipeline, where the errors originate from various sources such as speech recognition errors, language understanding errors, or entity resolution errors. In this work, we first propose a neural-retrieval based approach for query rewriting. Then, inspired by the wide success of pre-trained contextual language embeddings, and also as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train...

10.48550/arxiv.2002.05607 preprint EN cc-by-nc-sa arXiv (Cornell University) 2020-01-01