- Topic Modeling
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- Text and Document Classification Technologies
- Sentiment Analysis and Opinion Mining
- Human Pose and Action Recognition
- Domain Adaptation and Few-Shot Learning
- Advanced Text Analysis Techniques
- Multimodal Machine Learning Applications
Tsinghua University
2023
Hebei University of Science and Technology
2023
Video Grounding (VG) aims to locate the desired segment from a video given a sentence query. Recent studies have found that current VG models are prone to over-rely on groundtruth moment annotation distribution biases in the training set. To discourage the standard model's behavior of exploiting such temporal biases and to improve the model's generalization ability, we propose multiple negative augmentations in a hierarchical way, including cross-video augmentations at the clip and video levels and self-shuffled augmentations with masks. These can effectively diversify...
New intent discovery is of great value to natural language processing, allowing for a better understanding of user needs and providing friendly services. However, most existing methods struggle to capture the complicated semantics of discrete text representations when limited or no prior knowledge of labeled data is available. To tackle this problem, we propose a novel clustering framework, USNID, u...
Discovering the semantics of multimodal utterances is essential for understanding human language and enhancing human-machine interactions. Existing methods manifest limitations in leveraging nonverbal information to discern complex semantics in unsupervised scenarios. This paper introduces a novel clustering method (UMC), making a pioneering contribution to this field. UMC takes a unique approach to constructing augmentation views of the data, which are then used to perform pre-training to establish well-initialized...