- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Natural Language Processing Techniques
- Advanced Sensor and Control Systems
- Topic Modeling
- Advanced Image Fusion Techniques
- Image Enhancement Techniques
- Domain Adaptation and Few-Shot Learning
- Industrial Technology and Control Systems
- Phonetics and Phonology Research
- Cryptographic Implementations and Security
- Advanced Decision-Making Techniques
- Coding theory and cryptography
- Music and Audio Processing
- Advanced Image Processing Techniques
- Advanced Algorithms and Applications
- Speech Recognition and Synthesis
- Speech and dialogue systems
- Chaos-based Image/Signal Encryption
- Image and Signal Denoising Methods
- Error Correcting Code Techniques
- Video Analysis and Summarization
- Speech and Audio Processing
- Elevator Systems and Control
Zhejiang University
2024
Beihang University
2020-2024
Institute of Art
2021-2023
Chongqing University
2005-2020
National Ilan University
2018-2019
Shenzhen Institutes of Advanced Technology
2010
Chinese University of Hong Kong
2010
Remote Embodied Referring Expression (REVERIE) is a recently raised task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction. Different from related VLN tasks, the key to REVERIE is to conduct goal-oriented exploration instead of strict instruction-following, due to the lack of step-by-step navigation guidance. In this paper, we propose a novel Cross-modality Knowledge Reasoning (CKR) model to address the unique challenges of this task. CKR, based on...
Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions. For the agent, inferring the long-term navigation target from visual-linguistic clues is crucial for reliable path planning, which, however, has rarely been studied before in the literature. In this article, we propose a Target-Driven Structured Transformer Planner (TD-STP) for long-horizon goal-guided and room layout-aware navigation. Specifically, we devise an Imaginary Scene Tokenization mechanism...
The Vision-and-language Navigation (VLN) task requires an embodied agent to navigate to a remote location following a natural language instruction. Previous methods usually adopt a sequence model (e.g., Transformer and LSTM) as the navigator. In such a paradigm, the sequence model predicts the action at each step through a maintained navigation state, which is generally represented as a one-dimensional vector. However, crucial navigation clues (i.e., object-level environment layout) for embodied navigation are discarded since the maintained vector is essentially unstructured. In this...
Given a high-level instruction, the task of Remote Embodied Referring Expression (REVERIE) requires an embodied agent to localise a remote referred object via navigating in an unseen environment. Previous vision-language navigation methods utilise the provided fine-grained instruction as step-by-step guidance to conduct strict instruction-following, while REVERIE aims to achieve efficient goal-oriented exploration according to the high-level command. In this work, we propose Cross-modal Knowledge Reasoning (abbreviated as CKR+)...
Audio-visual navigation is an audio-targeted wayfinding task where a robot agent is entailed to travel a never-before-seen 3D environment towards the sounding source. In this article, we present ORAN, an omnidirectional audiovisual navigator based on cross-task navigation skill transfer. In particular, ORAN sharpens its two basic abilities for such a challenging task, namely wayfinding and audiovisual information gathering. First, ORAN is trained with a confidence-aware cross-task policy distillation (CCPD) strategy. CCPD transfers the fundamental,...
In this paper, the simplified step-by-step Reed-Solomon (RS) decoding algorithm is adopted to reduce hardware complexity and power consumption. Building on the advantages of the step-by-step RS decoding algorithm, we propose a three-parallel architecture to speed up the decoding rate without the need for recursively solving the error-location polynomial or performing the Forney algorithm. Finally, using TSMC 90nm technology, the proposed design has a working frequency of 286MHz. The gate count and throughput of the chip core are approximately 131K gates and 6.8Gb/s...
This paper discussed the evaluation problem of image scrambling degree (ISD). Inspired by the evaluation method of image texture characteristics, three new metrics for objectively assessing ISD were proposed. The first utilized the energy concentration performance of the Walsh transformation (WT), which took into account the properties that a good ISD measurement should satisfy. The second used the angular second moment (ASM) of the gray level co-occurrence matrix (GLCM). The third combined the entropy of the GLCM with characteristic. Experimental results show...
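As a rough illustration of the two GLCM statistics named in this abstract, the sketch below computes the angular second moment and the entropy of a normalized co-occurrence matrix for a small test image before and after scrambling. It is not the paper's metric: the function names, the single horizontal pixel offset, the 4-level banded test image, and the use of the Arnold cat map as the scrambler are all assumptions made for the example.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def asm(p):
    """Angular second moment: sum of squared GLCM entries."""
    return sum(v * v for row in p for v in row)

def glcm_entropy(p):
    """Shannon entropy (in bits) of the GLCM entries."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)

def arnold_scramble(image):
    """One iteration of the Arnold cat map on a square image (assumed scrambler)."""
    n = len(image)
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            out[(r + c) % n][(r + 2 * c) % n] = image[r][c]
    return out

# Hypothetical 8x8 test image with 4 gray levels arranged in horizontal bands.
plain = [[r // 2] * 8 for r in range(8)]
scrambled = arnold_scramble(plain)

for name, img in (("plain", plain), ("scrambled", scrambled)):
    p = glcm(img, levels=4)
    print(name, "ASM =", round(asm(p), 4), "entropy =", round(glcm_entropy(p), 4))
```

In this toy case the scrambled image has a lower ASM and a higher GLCM entropy than the banded original, which is the kind of behaviour a texture-based scrambling measure could exploit.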
IELS, which abbreviates Interactive English Learning System, is a computer-assisted pronunciation training (CAPT) system for Chinese learners of English whose mother language is Mandarin. The system provides instant feedback on mispronunciations of phoneme, word, and lexical stress, and a score of the student's overall pronunciation quality. The system employs a client-server architecture, in which the client provides a friendly interface and audio I/O function, and the server takes charge of speech processing, including HMM-based speech recognition, SVM-based lexical stress detection and scoring. IELS...
In this paper, we propose a peer group and hybrid vector filter for the removal of impulse noise in color images. The proposed filter employs two rounds of noise detection for the image restoration. Through the processing of the proposed filter, the noise in images can be efficiently removed while preserving edges. When the density of random-value impulse noise is equal to 10%, the peak signal-to-noise ratio (PSNR) is 31 dB. Although the PSNR of the restoration result is not as good as expected when the noise density is more than 70%, the result is still recognizable by humans compared with other existing filters....
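For readers unfamiliar with the PSNR figures quoted in this abstract, the following minimal sketch shows the standard definition, PSNR = 10 log10(peak^2 / MSE), applied to two color images stored as nested lists. The function name, the 8-bit peak value of 255, and the tiny test images are assumptions for illustration; the abstract does not say how the paper's evaluation was implemented.

```python
import math

def psnr(reference, restored, peak=255):
    """PSNR in dB between two equally sized images given as [row][col][channel] lists."""
    squared_error = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        for ref_px, res_px in zip(ref_row, res_row):
            for a, b in zip(ref_px, res_px):
                squared_error += (a - b) ** 2
                count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical 2x2 RGB images: every channel of the restored image is off by 10.
reference = [[(100, 100, 100)] * 2 for _ in range(2)]
restored = [[(110, 110, 110)] * 2 for _ in range(2)]

print(round(psnr(reference, restored), 2))  # MSE = 100, so about 28.13 dB
```

Because the scale is logarithmic, halving the mean squared error raises the PSNR by about 3 dB.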
Single image dehazing algorithms based on the dark channel prior may sometimes fail in sky areas and suffer from darkness and residual artifacts. In this paper, we propose a new method to solve these problems, which is based on fusion confidence. Firstly, the original pixel-wise dark channel prior is used to obtain the dark channel; secondly, we use the confidence to correct the transmittance of the sky area. Experimental results show that the method can obviously eliminate the color cast and distortion phenomenon in sky regions, and the details of the restored image become more distinct after removing