- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Speech and Audio Processing
- Underwater Vehicles and Communication Systems
- Human Pose and Action Recognition
- Indoor and Outdoor Localization Technologies
- Private Equity and Venture Capital
- Engineering Diagnostics and Reliability
- Quantum Computing Algorithms and Architecture
- Face recognition and analysis
- Quantum many-body systems
- Quantum Information and Cryptography
- Robotic Path Planning Algorithms
- Video Analysis and Summarization
- Advanced Image Processing Techniques
- Digital Media Forensic Detection
- Fault Detection and Control Systems
- Emotion and Mood Recognition
- Corporate Finance and Governance
- Image and Signal Denoising Methods
- Millimeter-Wave Propagation and Modeling
- Speech and dialogue systems
- Phonetics and Phonology Research
- Robotic Mechanisms and Dynamics
Xi'an University of Technology, 2024
Westlake University, 2018-2024
Zhejiang University, 2007-2024
National University of Defense Technology, 2023-2024
Zhejiang University of Science and Technology, 2021
Chengdu University of Information Technology, 2020
Sichuan University, 2020
Dalian Ocean University, 2020
Tongji University, 2012
Vision-and-language navigation (VLN) is a challenging task that requires an agent to navigate in real-world environments by understanding natural language instructions and visual information received in real time. Prior works have implemented VLN tasks in continuous environments or on physical robots, all of which use a fixed camera configuration due to the limitations of the datasets, such as a 1.5-m camera height, a 90° horizontal field of view (HFOV), and so on. However, real-life robots with different purposes have multiple camera...
Prior works in vision-and-language navigation (VLN) focus on using long short-term memory (LSTM) to carry the flow of information in either the navigation model (navigator) or the instruction-generating model (speaker). The outstanding capability of LSTM to process intermodal interactions has been widely verified; however, LSTM neglects intramodal interactions, which has a negative effect on both the navigator and the speaker. The performance of the attention-based Transformer is satisfactory in sequence-to-sequence translation domains, but the Transformer structure implemented...
The Vision-and-Language Navigation in Continuous Environments (VLN-CE) task requires an agent to follow a language instruction in a realistic environment. Understanding the environment is crucial, yet current methods are relatively simple and direct, without delving into the interplay between instructions and visual context. Therefore, we propose a novel environment representation. First, an Environment Representation Graph (ERG) is built through object detection to express the environment at the semantic level. Then, relational representations of...
In this paper, a real WiFi fingerprint-based indoor localization system is considered for experiments, including three primary components: the APP on the smartphone, the server, and the embedded algorithm. As is well known, one of the main drawbacks is the labor intensity and time consumption of data collection. This paper proposes an improved graph-based semi-supervised learning (I-GSSL) algorithm to better overcome this problem. Apart from taking advantage of the propagation model, the proposed I-GSSL algorithm can handle the existing out-of-sample...
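The graph-based semi-supervised idea in this abstract can be illustrated with plain label propagation: unlabeled RSS scans receive location estimates as similarity-weighted averages of their neighbors on a graph, while surveyed reference points stay clamped to their known coordinates. This is a minimal sketch of generic graph-based semi-supervised regression under stated assumptions, not the I-GSSL algorithm itself; the Gaussian kernel, the bandwidth sigma, the iteration count, and the function names are hypothetical choices for illustration.

```python
import math


def gaussian_affinity(a, b, sigma):
    """Gaussian (RBF) similarity between two RSS vectors (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def propagate_locations(rss, coords, sigma=10.0, iters=200):
    """Hypothetical helper: estimate (x, y) for unlabeled scans.

    rss    -- list of RSS vectors in dBm, one per scan
    coords -- list of (x, y) for labeled scans, or None for unlabeled ones
    """
    n = len(rss)
    # Fully connected similarity graph without self-loops.
    w = [[0.0 if i == j else gaussian_affinity(rss[i], rss[j], sigma)
          for j in range(n)] for i in range(n)]
    labeled = [c for c in coords if c is not None]
    mean = (sum(c[0] for c in labeled) / len(labeled),
            sum(c[1] for c in labeled) / len(labeled))
    # Start unlabeled nodes at the centroid of the labeled points.
    est = [c if c is not None else mean for c in coords]
    for _ in range(iters):
        new = []
        for i in range(n):
            if coords[i] is not None:
                new.append(coords[i])  # clamp labeled reference points
                continue
            total = sum(w[i])
            if total == 0.0:
                new.append(est[i])  # isolated node: keep current estimate
                continue
            x = sum(w[i][j] * est[j][0] for j in range(n)) / total
            y = sum(w[i][j] * est[j][1] for j in range(n)) / total
            new.append((x, y))
        est = new
    return est
```

For example, a scan whose RSS vector lies halfway between two surveyed points is pulled to the midpoint of their coordinates. Handling genuinely new (out-of-sample) scans would need an extra step, which this sketch does not attempt.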
The emergence of the generative adversarial network (GAN) has promoted great progress in deep learning generation models. In this paper, a GAN is used to remove visual artifacts from compressed video, and a perception enhancement algorithm for HEVC video is proposed. Specifically, after compression, the reconstructed image is output by the GAN generator. It can effectively guide the discriminator to approximate the mapping between the encoded frame and the original frame. The generator loss keeps this mapping, which not only improves quality but also removes...
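A common way to make a GAN generator "keep the mapping" to the original frame is to add a pixel-wise content term to the adversarial term. The sketch below shows that generic pattern on flattened pixel lists; it is an assumption about the general technique, not the loss used in this paper, and the weight lam and the function names are hypothetical.

```python
import math


def bce(pred, target, eps=1e-12):
    """Binary cross-entropy for one discriminator score in (0, 1)."""
    return -(target * math.log(pred + eps)
             + (1.0 - target) * math.log(1.0 - pred + eps))


def generator_loss(disc_scores, restored, original, lam=100.0):
    """Adversarial term (fool the discriminator) plus weighted MSE to the original frame."""
    adv = sum(bce(s, 1.0) for s in disc_scores) / len(disc_scores)
    mse = sum((r - o) ** 2 for r, o in zip(restored, original)) / len(original)
    return adv + lam * mse


def discriminator_loss(real_scores, fake_scores):
    """Push scores toward 1 for original frames and 0 for restored frames."""
    real = sum(bce(s, 1.0) for s in real_scores) / len(real_scores)
    fake = sum(bce(s, 0.0) for s in fake_scores) / len(fake_scores)
    return real + fake
```

The content weight trades off perceptual sharpness (adversarial term) against fidelity to the original frame (MSE term); a larger lam keeps the restored frame closer to the pixel values of the source.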
This paper conducts a set of perceptual experiments together with an F0 acoustic analysis of emotional speech in Mandarin Chinese. Sixty utterances spoken by two actors in five basic emotions (i.e., happiness, fear, anger, sadness, and boredom), as well as in the neutral style, served as stimuli. The experiments showed the following results: (1) For Mandarin Chinese, the identification rates of the emotions differ significantly, ranking Sadness > Happiness > Anger > Fear > Boredom. (2) Perceptual confusion occurs mainly among fear, boredom, and anger. There is...
In this paper, a real WiFi fingerprint-based indoor localization system is considered, in which three primary components, namely the APP on the smartphone, the server, and the embedded algorithm, have been designed. This paper proposes a dedicated data preprocessing algorithm to solve the singular-collection problem. Furthermore, the issue of missing access points (APs) is discussed, and a theoretical analysis is presented for a two-AP scenario. Finally, because of the unequal amount of location information contained in the received...
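To make the missing-AP issue concrete, a typical fingerprint pipeline must turn each scan into a fixed-length vector before matching. The sketch below assumes a floor value of -100 dBm for any AP that was not heard and then estimates position with weighted k-nearest neighbors (WKNN). Both the floor value and the WKNN matcher are common conventions assumed here for illustration; they are not the preprocessing or matching method proposed in this paper.

```python
import math

MISSING_RSS = -100.0  # assumed floor value (dBm) for an AP absent from a scan


def fill_missing(scan, ap_ids):
    """Map a {ap_id: rss} scan onto a fixed AP order, filling absent APs."""
    return [scan.get(ap, MISSING_RSS) for ap in ap_ids]


def wknn(query, fingerprints, k=3, eps=1e-6):
    """Weighted k-nearest-neighbor position estimate.

    fingerprints -- list of (rss_vector, (x, y)) reference entries
    """
    nearest = sorted((math.dist(query, rss), pos) for rss, pos in fingerprints)[:k]
    # Inverse-distance weights; eps avoids division by zero on exact matches.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y
```

The choice of floor value matters: it decides how far a scan with a missing AP lands from reference points that did hear that AP, which is one reason missing APs can bias the estimate.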
The bearing fault is one of the primary factors affecting the safe and stable running of mechanical systems. To guarantee the normal and reliable running of the entire equipment, it is crucial to promptly and accurately monitor the operating conditions of bearings. Conventional diagnosis methods usually depend upon the assumption that training and test data are independent and identically distributed. However, this premise makes it difficult to resolve diagnosis issues under changeable operating conditions. To tackle the aforementioned problem, a novel...
With the increasing demands of applications in virtual reality such as 3D films, human-machine interactions, and agents, analysis of the human face is considered increasingly important as a fundamental step for those tasks. Due to the information provided by an additional dimension, 3D facial reconstruction enables the aforementioned tasks to be achieved with higher accuracy than methods based on 2D facial analysis. The denser the 3D facial model is, the more information it can provide. However, most existing dense 3D facial reconstruction methods require complicated processing and high system cost...
For the problem of robot path planning under the action of an external force field, the directionality of the external force can either promote or hinder the movement of the robot. The continuous method breaks through the limits of traditional algorithms based on grid search, effectively utilizes the force field existing in the space, and improves the numerical implementation to ensure the feasibility of the planned path in a strong field. Based on the Level Set method, ... is defined. The particle tracking problem is transformed into the evolution of a curve. The optimal time is obtained by solving the Hamilton-Jacobi...
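For readers unfamiliar with level set planning in a force or flow field, the standard formulation in the literature (assumed here; the paper's own equations are not shown in this excerpt) evolves a reachability front \phi outward from the start point x_s, where F is the robot's nominal speed and V(x, t) is the external field. The notation below is assumed, not taken from the paper.

```latex
\frac{\partial \phi}{\partial t} + F\,\lvert \nabla \phi \rvert + \mathbf{V}(\mathbf{x}, t) \cdot \nabla \phi = 0,
\qquad \phi(\mathbf{x}, 0) = \lvert \mathbf{x} - \mathbf{x}_s \rvert

T^{*} = \min \{\, t : \phi(\mathbf{x}_f, t) \le 0 \,\}

\frac{d\mathbf{X}}{dt} = \mathbf{V}(\mathbf{X}, t) + F\,\frac{\nabla \phi}{\lvert \nabla \phi \rvert},
\qquad \mathbf{X}(T^{*}) = \mathbf{x}_f, \quad 0 \le t \le T^{*}
```

The first equation is the Hamilton-Jacobi-type level set equation; the earliest time T* at which the zero level set reaches the goal x_f is the optimal travel time, and the optimal path is recovered by integrating the last equation backward in time from the goal.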
Vision-and-Language Navigation in Continuous Environments (VLN-CE) is a navigation task that requires an agent to follow a language instruction in a realistic environment. Understanding the environment is a crucial part of the VLN-CE task, but existing methods are relatively simple and direct in understanding the environment, without delving into the relationship between instructions and visual environments. Therefore, we propose a new environment representation to solve the above problems. First, the Environment Representation Graph...
The vision-and-language navigation (VLN) task requires the agent to navigate by following both natural language instructions and vision. Prior works focus on improving the model or adopting data augmentation to enhance the accuracy and efficiency of navigation. However, few attempts have been made to optimize the training process. In this paper, we propose a cluster-based curriculum learning (CCL) method to schedule appropriate curricula and acquire a more rational training process. Our CCL dynamically segments the tasks in the dataset into three...
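One way to picture "segmenting tasks into three groups" is to cluster a per-task difficulty score with 1-D k-means and train on the clusters from easiest to hardest. The sketch below assumes a scalar difficulty score per task (for example path length or current training loss); the score, the quantile initialization, and the function names are hypothetical, and this is not the paper's exact CCL procedure.

```python
def kmeans_1d(values, k=3, iters=50):
    """Plain 1-D k-means with centroids initialized at evenly spaced quantiles."""
    if k < 2:
        raise ValueError("k must be at least 2")
    s = sorted(values)
    centroids = [s[int(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[idx].append(v)
        new = [sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new == centroids:
            break  # assignments have stabilized
        centroids = new
    return centroids


def build_curriculum(task_scores, k=3):
    """Hypothetical helper: split {task_id: difficulty} into k stages, easiest first."""
    centroids = sorted(kmeans_1d(list(task_scores.values()), k))
    stages = [[] for _ in range(k)]
    for tid, v in task_scores.items():
        idx = min(range(k), key=lambda c: abs(v - centroids[c]))
        stages[idx].append(tid)
    return stages
```

To make the schedule dynamic, the difficulty scores could be recomputed (for example from the current per-task loss) and build_curriculum called again at the start of each epoch.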