Luting Wang

ORCID: 0000-0001-8317-226X
Research Areas
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Natural Language Processing Techniques
  • Topic Modeling
  • Visual Attention and Saliency Detection
  • Video Surveillance and Tracking Methods
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Neural Networks and Applications
  • Text Readability and Simplification
  • Human Pose and Action Recognition
  • Voice and Speech Disorders
  • Speech Recognition and Synthesis
  • Speech and Audio Processing

Beihang University
2021-2024

Institute of Art
2021-2023

SYSU-CMU International Joint Research Institute
2017

Sun Yat-sen University
2017

The Remote Embodied Referring Expression (REVERIE) is a recently raised task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction. Different from related VLN tasks, the key to REVERIE is to conduct goal-oriented exploration instead of strict instruction-following, due to the lack of step-by-step navigation guidance. In this paper, we propose a novel Cross-modality Knowledge Reasoning (CKR) model to address the unique challenges of this task. CKR, based on...

10.1109/cvpr46437.2021.00308 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Open-vocabulary object detection aims to provide detectors trained on a fixed set of categories with the generalizability to detect objects described by arbitrary text queries. Previous methods adopt knowledge distillation to extract knowledge from Pretrained Vision-and-Language Models (PVLMs) and transfer it to detectors. However, due to the non-adaptive proposal cropping and single-level feature mimicking processes, they suffer from information destruction during knowledge extraction and inefficient knowledge transfer. To remedy these limitations,...

10.1109/cvpr52729.2023.01076 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Given a high-level instruction, the task of Remote Embodied Referring Expression (REVERIE) requires an embodied agent to localise a remote referred object via navigating in an unseen environment. Previous vision-language navigation methods utilise the provided fine-grained instruction as step-by-step guidance to conduct strict instruction-following, while REVERIE aims to achieve efficient goal-oriented exploration according to the command. In this work, we propose Cross-modal Knowledge Reasoning (abbreviated as CKR+)...

10.1109/tpami.2023.3326851 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-10-23

Electrolarynx (EL) is a speaking-aid device that helps laryngectomees, who have had their larynx removed, to generate voice. However, the voice generated by EL is unnatural and unintelligible due to its flat pitch and strong vibration noise. Targeting these challenges, previous works show that electrolaryngeal speech can be enhanced using Gaussian Mixture Model (GMM) based voice conversion (VC). Although effective in improving naturalness, it degrades the intelligibility of the converted speech. To address this issue, we...

10.1109/apsipa.2017.8282244 article EN 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2017-12-01

Existing methods enhance open-vocabulary object detection by leveraging the robust recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the open-vocabulary knowledge biased towards base categories during the transfer from VLMs to detectors. To address these challenges, we propose...

10.48550/arxiv.2407.11335 preprint EN arXiv (Cornell University) 2024-07-15

Transformers have revolutionized the object detection landscape by introducing DETRs, acclaimed for their simplicity and efficacy. Despite their advantages, the substantial size of these models poses significant challenges for practical deployment, particularly in resource-constrained environments. This paper addresses the challenge of compressing DETR by leveraging knowledge distillation, a technique that holds promise for maintaining model performance while reducing size. A critical aspect of DETRs is their reliance on...

10.48550/arxiv.2409.06443 preprint EN arXiv (Cornell University) 2024-09-10

Open-vocabulary object detection aims to provide detectors trained on a fixed set of categories with the generalizability to detect objects described by arbitrary text queries. Previous methods adopt knowledge distillation to extract knowledge from Pretrained Vision-and-Language Models (PVLMs) and transfer it to detectors. However, due to the non-adaptive proposal cropping and single-level feature mimicking processes, they suffer from information destruction during knowledge extraction and inefficient knowledge transfer. To remedy these limitations,...

10.48550/arxiv.2303.05892 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Conventional knowledge distillation (KD) methods for object detection mainly concentrate on homogeneous teacher-student detectors. However, the design of a lightweight detector for deployment is often significantly different from that of a high-capacity detector. Thus, we investigate KD among heterogeneous teacher-student pairs for wide application. We observe that the core difficulty of heterogeneous KD (hetero-KD) is the significant semantic gap between the backbone features of heterogeneous detectors due to their different optimization manners. Conventional homogeneous KD (homo-KD) methods suffer from such a gap and are hard to directly...

10.48550/arxiv.2207.05345 preprint EN other-oa arXiv (Cornell University) 2022-01-01