- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Text Readability and Simplification
- Anomaly Detection Techniques and Applications
- Video Surveillance and Tracking Methods
- Advanced Text Analysis Techniques
- Text and Document Classification Technologies
- Human Pose and Action Recognition
- Speech and Dialogue Systems
- Domain Adaptation and Few-Shot Learning
- Image Enhancement Techniques
- Network Security and Intrusion Detection
- Speech Recognition and Synthesis
- Artificial Immune Systems Applications
- Physical Education and Training Studies
- Educational Technology and Pedagogy
- Image Retrieval and Classification Techniques
- Sports Analytics and Performance
- Sports and Physical Education Research
- American Sports and Literature
- Sentiment Analysis and Opinion Mining
- Sports, Gender, and Society
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
Xi'an Jiaotong University
2021-2023
Badan Penelitian dan Pengembangan Kesehatan
2023
Toyota Technological Institute at Chicago
2018-2022
Meta (United States)
2021
University of Washington
2021
Meta (Israel)
2021
Northwest A&F University
2019-2020
Peking University
2016-2018
Chongqing University of Arts and Sciences
2009
We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without any explicit supervision. The model learns by looking at natural images and reading paired captions. VG-NSL generates constituency parse trees of texts, recursively composes representations for constituents, and matches them with images. We define the concreteness of constituents by their matching scores with images, and use it to guide the parsing of text. Experiments on the MSCOCO data set show that our approach outperforms...
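The concreteness-guided parsing idea can be sketched as a greedy bottom-up composition: repeatedly merge the adjacent pair of constituents with the highest score. This is a minimal illustration, not the authors' implementation; the scoring function and all names here are hypothetical.

```python
def greedy_parse(tokens, score):
    """Greedily build a binary tree: at each step, merge the adjacent
    pair of constituents whose score (e.g., a concreteness/image-matching
    score) is highest."""
    nodes = list(tokens)
    while len(nodes) > 1:
        # index of the best adjacent pair
        i = max(range(len(nodes) - 1), key=lambda i: score(nodes[i], nodes[i + 1]))
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]  # merge into one constituent
    return nodes[0]

def toy_score(left, right):
    """Hypothetical scores standing in for learned concreteness scores."""
    table = {("red", "apple"): 0.9, ("a", ("red", "apple")): 0.8}
    return table.get((left, right), 0.1)
```

With the toy scores above, `greedy_parse(["a", "red", "apple"], toy_score)` first merges "red apple" and then attaches the determiner, mirroring how high-concreteness phrases are composed first.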
Neural networks with tree-based sentence encoders have shown better results on many downstream tasks. Most existing work adopts syntactic parsing trees as the explicit structure prior. To study the effectiveness of different tree structures, we replace the parsing trees with trivial trees (i.e., binary balanced tree, left-branching tree, and right-branching tree) in the encoders. Though trivial trees contain no syntactic information, those encoders get competitive or even better results on all ten tasks we investigated. This surprising result indicates that explicit syntax guidance may not be the main...
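The three trivial tree structures named above are purely positional and can be constructed without any parser. A minimal sketch (function names are my own, not from the paper):

```python
def left_branching(tokens):
    """Left-branching binary tree: ((((w1 w2) w3) w4) ...)."""
    tree = tokens[0]
    for tok in tokens[1:]:
        tree = (tree, tok)
    return tree

def right_branching(tokens):
    """Right-branching binary tree: (w1 (w2 (w3 (w4 ...))))."""
    tree = tokens[-1]
    for tok in reversed(tokens[:-1]):
        tree = (tok, tree)
    return tree

def balanced(tokens):
    """Binary balanced tree: recursively split each span in half."""
    if len(tokens) == 1:
        return tokens[0]
    mid = len(tokens) // 2
    return (balanced(tokens[:mid]), balanced(tokens[mid:]))
```

A tree-based encoder can then compose constituent representations bottom-up over any of these trees, which is how the paper swaps the structure prior while keeping the encoder fixed.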
Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution. While extensive research has focused on functional architectures for representing words and sentences, there is less work on representing arbitrary spans of text within sentences. In this paper, we conduct a comprehensive empirical evaluation of six span representation methods using eight pretrained representation models across a range of tasks, including two that we introduce. We find that,...
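Typical span representation methods pool the token vectors inside the span. A sketch of three common choices (mean pooling, endpoint concatenation, element-wise max) over plain Python lists; the specific six methods the paper evaluates are not reproduced here, and these names are illustrative:

```python
def span_mean(vecs, i, j):
    """Average of token vectors in the half-open span [i, j)."""
    span = vecs[i:j]
    dim = len(span[0])
    return [sum(v[d] for v in span) / len(span) for d in range(dim)]

def span_endpoint(vecs, i, j):
    """Concatenation of the first and last token vectors of the span."""
    return list(vecs[i]) + list(vecs[j - 1])

def span_max(vecs, i, j):
    """Element-wise max over token vectors in the span."""
    span = vecs[i:j]
    return [max(v[d] for v in span) for d in range(len(span[0]))]
```

In practice `vecs` would be the hidden states of a pretrained model; here any list of equal-length vectors works.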
We study a family of data augmentation methods, substructure substitution (SUB2), that generalizes prior methods. SUB2 generates new examples by substituting substructures (e.g., subtrees or subsequences) with others having the same label. This idea can be applied to many structured NLP tasks such as part-of-speech tagging and parsing. For more general tasks (e.g., text classification) which do not have explicitly annotated substructures, we present variations of SUB2 based on spans of parse trees, introducing...
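The same-label substitution idea can be sketched for the simplest structured case, part-of-speech tagging: pick a token in one example and replace it with a token carrying the same tag from another example. This is a minimal single-token illustration (names are hypothetical), not the paper's full subtree-substitution machinery.

```python
import random

def sub2_pos(example, pool, rng):
    """SUB2-style augmentation sketch for POS tagging: replace one
    (word, tag) token with a word that carries the same tag elsewhere
    in the training pool, so the label structure is preserved."""
    idx = rng.randrange(len(example))
    _, tag = example[idx]
    candidates = [(w, t) for ex in pool for (w, t) in ex if t == tag]
    if not candidates:
        return list(example)  # no same-label substructure available
    word, _ = rng.choice(candidates)
    augmented = list(example)
    augmented[idx] = (word, tag)
    return augmented
```

Because the substituted substructure shares the original's label, the augmented example keeps a valid tag sequence by construction.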
Data augmentation is an important component in the robustness evaluation of natural language processing (NLP) models and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe an initial set of 117 transformations and 23 filters for a variety of tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze popular...
Data augmentation is an important method for evaluating the robustness of, and enhancing the diversity of training data for, natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe an initial set of 117 transformations and 23 filters for a variety of NL tasks, annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human...
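The transformation/filter distinction can be illustrated with a toy sketch. This is not the real NL-Augmenter API — the class and method names below are hypothetical — it only shows the concept: a transformation rewrites an example, while a filter selects a data split by a surface feature.

```python
class Transformation:
    """Hypothetical minimal interface: rewrite an input into one or
    more perturbed variants."""
    def generate(self, text):
        raise NotImplementedError

class LowercaseTransformation(Transformation):
    """Toy transformation: lowercase the whole input."""
    def generate(self, text):
        return [text.lower()]

class LengthFilter:
    """Toy filter: keep only examples with at most max_words words,
    yielding a data split defined by a surface feature."""
    def __init__(self, max_words):
        self.max_words = max_words

    def apply(self, text):
        return len(text.split()) <= self.max_words
```

A robustness evaluation would run every example of a test set through a transformation and compare model accuracy before and after, or report accuracy only on the subset a filter selects.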
Haoyue Shi, Luke Zettlemoyer, Sida I. Wang. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
We study the problem of grounding distributional representations of texts in the visual domain, namely visual-semantic embeddings (VSE for short). Beginning with an insightful adversarial attack on VSE embeddings, we show the limitations of current frameworks and image-text datasets (e.g., MS-COCO) both quantitatively and qualitatively. The large gap between the number of possible constitutions of real-world semantics and the size of parallel data, to a large extent, restricts the model from establishing links between textual and visual concepts. We alleviate this by augmenting...
We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing F1 score on the Wall Street Journal (WSJ) development set (1,700 sentences). We introduce strong baselines for them, by training an existing supervised parsing model (Kitaev and Klein, 2018) on the same labeled examples they access. When training on 1,700 examples, or even when using only 50 examples for training and 5 for development, such a few-shot parsing approach can outperform all the unsupervised parsing methods by a significant margin. Few-shot parsing can be further improved by simple data...
Cardiovascular disease (CVD) is a highly significant contributor to loss of quality and quantity of life all over the world. Early detection and prediction are very important for patients' treatment and doctors' diagnoses, and can help reduce mortality. In this paper, we focus on the practical problem of a Chinese hospital dealing with cardiovascular data to make early risk predictions. To better understand prescription advice in Chinese, a basic natural language processing method was used for synonym recognition...
Weakly-supervised Video Anomaly Detection (W-VAD) aims to detect abnormal events in videos given only video-level labels for training. Recent methods relying on multiple instance learning (MIL) and self-training achieve good performance, but they tend to focus on easy abnormal patterns while ignoring hard ones, e.g., an unusual driving trajectory or over-speeding driving. How to detect such hard anomalies is a critical and largely...
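The MIL setup mentioned above is commonly trained with a ranking objective: the highest-scoring segment of an abnormal video (positive bag) should score above the highest-scoring segment of a normal video (negative bag) by a margin. A sketch of that widely used formulation (not necessarily the exact loss of this paper):

```python
def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge ranking loss between bags of segment-level anomaly scores.

    abnormal_scores: scores of segments from a video labeled abnormal.
    normal_scores:   scores of segments from a video labeled normal.
    Only the top-scoring segment in each bag matters, since the anomaly's
    location within the abnormal video is unknown."""
    return max(0.0, margin - max(abnormal_scores) + max(normal_scores))
```

When the abnormal bag's top score clears the normal bag's top score by the margin, the loss is zero; otherwise the gradient pushes the two apart.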
We explore deep clustering of multilingual text representations for unsupervised model interpretation and induction of syntax. As these representations are high-dimensional, out-of-the-box methods like K-means do not work well. Thus, our approach jointly transforms the representations into a lower-dimensional cluster-friendly space and clusters them. We consider two notions of syntax in this work: Part of Speech Induction (POSI) and Constituency Labelling (CoLab). Interestingly, we find that Multilingual BERT (mBERT) contains surprising...
We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts. At the core of G2L2 is a collection of lexicon entries, which map each word to a tuple of a syntactic type and a neuro-symbolic semantic program. For example, the word shiny has the syntactic type of adjective; its neuro-symbolic semantic program has the symbolic form λx. filter(x, SHINY), where the concept SHINY is associated with a neural network embedding, which will be used...
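The λx. filter(x, SHINY) program above can be sketched as follows: the symbolic part is a filter over a set of objects, and the neural part is a learned scoring of each object's embedding against the concept embedding. This is a toy illustration with a hypothetical scoring choice (dot product through a sigmoid), not G2L2's actual executor.

```python
import math

def execute_filter(objects, concept_embedding):
    """Sketch of executing λx. filter(x, CONCEPT): keep objects whose
    embedding matches the concept embedding above a 0.5 threshold."""
    def score(obj):
        dot = sum(a * b for a, b in zip(obj["embedding"], concept_embedding))
        return 1.0 / (1.0 + math.exp(-dot))  # sigmoid of similarity
    return [obj for obj in objects if score(obj) > 0.5]
```

Because the concept embedding is a learnable parameter, the same symbolic program stays fixed while its grounding improves during training.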
Previous research has shown that learning multiple representations for polysemous words can improve the performance of word embeddings on many tasks. However, this leads to another problem: several vectors of a word may actually point to the same meaning, namely pseudo multi-sense. In this paper, we introduce the concept of pseudo multi-sense, and then propose an algorithm to detect such cases. Taking the detected pseudo multi-sense cases into consideration, we try to refine existing embeddings to eliminate their influence. Moreover, we apply our algorithm to previously released multi-sense word embeddings and test...
Unsupervisedly learned representations of polysemous words generate a large number of pseudo multi-senses, since unsupervised methods are overly sensitive to contextual variations. In this paper, we address pseudo multi-sense detection for word embeddings by dimensionality reduction of sense pairs. We propose a novel principal component analysis method, termed Ex-RPCA, designed to detect both pseudo and real multi-senses. With Ex-RPCA, we empirically show that pseudo multi-senses are generated systematically by such methods. Moreover, the result can be improved by a simple linear transformation...
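A crude stand-in for the detection step — not Ex-RPCA itself — is to flag pairs of sense vectors of the same word that are nearly parallel: if two "senses" point in almost the same direction, they likely encode the same meaning. A sketch under that simplifying assumption:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pseudo_multisense_pairs(sense_vectors, threshold=0.9):
    """Flag pairs of sense vectors (of one word) whose cosine similarity
    exceeds the threshold — a heuristic proxy for pseudo multi-sense."""
    flagged = []
    for i in range(len(sense_vectors)):
        for j in range(i + 1, len(sense_vectors)):
            if cosine(sense_vectors[i], sense_vectors[j]) > threshold:
                flagged.append((i, j))
    return flagged
```

Ex-RPCA goes further by analyzing sense pairs in a reduced space rather than thresholding raw similarities, but the input/output contract is the same: sense vectors in, suspect pairs out.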
Deep similarity tracking via two-stream or multiple-stream network architectures has drawn great attention due to its strong capability of extracting discriminative features with balanced accuracy and speed. However, these networks need careful data pairing processing and are usually difficult to update for online visual tracking. In this paper, we propose a simple and effective feature extractor for Single-Stream Deep Similarity learning Tracking, denoted SSDST. Different from the popular architecture,...