- Multimodal Machine Learning Applications
- Robot Manipulation and Learning
- Advanced Image Processing Techniques
- Face recognition and analysis
- Soft Robotics and Applications
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Hand Gesture Recognition Systems
- IoT and Edge/Fog Computing
- Image and Signal Denoising Methods
- Advanced Vision and Imaging
- Traditional Chinese Medicine Studies
- Advanced Image and Video Retrieval Techniques
- Image Processing Techniques and Applications
- Facial Nerve Paralysis Treatment and Research
- Natural Language Processing Techniques
- Autonomous Vehicle Technology and Safety
- Remote Sensing and Land Use
- Face Recognition and Perception
- Molecular Communication and Nanonetworks
- Biomedical Text Mining and Ontologies
- Cryptography and Data Security
- Context-Aware Activity Recognition Systems
- Robotic Mechanisms and Dynamics
- Emotion and Mood Recognition
Ankang University
2024
Fudan University
2024
Lanzhou University of Technology
2023
University of Science and Technology of China
2022
North China Electric Power University
2021
Group Sense (China)
2016-2020
The Sense Innovation and Research Center
2018
Tencent (China)
2017
Deep Convolutional Neural Networks (CNNs) achieve substantial improvements in face detection in the wild. Classical CNN-based face detection methods simply stack successive layers of filters, where an input sample should pass through all layers before reaching a face/non-face decision. Inspired by the fact that, for face detection, deeper layers can discriminate between difficult face/non-face samples while shallower layers can efficiently reject simple non-face samples, we propose the Inside Cascaded Structure, which introduces face/non-face classifiers at different layers within...
Exploiting relationships between visual regions and question words has achieved great success in learning multi-modality features for Visual Question Answering (VQA). However, we argue that existing methods mostly model relations between individual visual regions and words, which are not enough to correctly answer the question. From a human's perspective, answering a visual question requires understanding summarizations of visual and language information. In this paper, we propose the Multi-modality Latent Interaction module (MLI) to tackle this problem. The...
Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have made them powerful tools in embodied navigation, enabling agents to leverage commonsense and spatial reasoning for efficient exploration in unfamiliar environments. Existing LLM-based approaches convert global memory, such as semantic or topological maps, into language descriptions to guide navigation. While this improves efficiency and reduces redundant exploration, the loss of geometric information in language-based...
Data-driven approaches to grasping have advanced significantly in recent years, but they usually require a large amount of training data. To increase the efficiency of grasping data collection, this paper presents a novel grasp training system covering the whole pipeline from data collection to model inference. The system can collect effective grasp samples with a corrective strategy assisted by the antipodal grasp rule, and we design an affordance interpreter network to predict a pixelwise grasp affordance map. We define graspability, ungraspability and background as...
Real-time semantic segmentation is desirable in many robotic applications with limited computation resources. One challenge of semantic segmentation is to deal with object scale variations and leverage the context. How to perform multi-scale context aggregation within a limited computation budget is important. In this paper, firstly, we introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information. On the other...
What is a proper representation for objects in manipulation? What would a human try to perceive when manipulating a new object in a new environment? In fact, instead of focusing on the texture and illumination, a human can infer the "affordance" [36] of the objects from vision. Here "affordance" describes the object's intrinsic property that affords a particular type of manipulation. In this work, we investigate whether such affordance can be learned by a deep neural network. In particular, we propose an Affordance Space Perception Network (ASPN) that takes an image as input...
Instance grasping is a challenging robotic grasping task in which a robot aims to grasp a specified target object in cluttered scenes. In this paper, we propose a novel end-to-end instance grasping method using only monocular workspace and query images, where the workspace image includes several objects and the query image contains only the target object. To effectively extract discriminative features and facilitate the training process, a learning-based method, referred to as Constraint Co-Attention Network (CCAN), is proposed, which consists of a constraint co-attention module...
Interpersonal relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people. Motivated by psychological studies, we investigate whether such fine-grained and high-level relation traits can be characterized and quantified from face images in the wild. We address this challenging problem by first studying a deep network architecture for robust recognition of facial expressions. Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is...
Learning-based robotic grasping methods have achieved substantial progress with the development of deep neural networks. However, the requirement of large-scale training data in the real world limits the application scope of these methods. Given the 3D models of the target objects, we propose a new learning-based grasping approach built on 6D object pose estimation from a monocular RGB image. We aim to leverage both a large-scale synthesized 6D object pose dataset and a small-scale real-world weakly labeled dataset (e.g., marking the number of objects in the image), to reduce the system...
Vision-based robotic manipulation with deep learning methods has achieved substantial advances in the field of automatic agriculture, where it can be deployed and applied in picking, sorting and transporting agricultural products, and so on. Deep reinforcement learning (DRL) is one of the learning methods that can help a robot learn the policy by itself through exploration and exploitation. Training real robots with DRL comes at a great price, which limits its application scope. Some approaches train in simulation and deploy the model to real robots by transferring images...
Face hallucination is a generative task to super-resolve a facial image with low resolution, while human perception of a face heavily relies on identity information. However, previous face hallucination approaches largely ignore facial identity recovery. This paper proposes the Super-Identity Convolutional Neural Network (SICNN) to recover identity information for generating faces close to the real identity. Specifically, we define a super-identity loss to measure the identity difference between a hallucinated face and its corresponding high-resolution face within the hypersphere...
Learning-based robot arm grasping approaches have attracted increasing interest recently. The algorithm needs to accurately locate the grasping point and angle. Existing methods usually require a large amount of training data from physical robotic trials or synthetic samples from simulation. The system can show promising results with pre-defined objects, but the performance may degrade for novel objects without annotation. Inspired by the fact that we have a set of pre-collected data from an external source, and only a small quantity of target...
Migraine is a common, chronic, multifactorial syndrome involving both nervous-system and non-nervous-system disorders. The causes of migraine are still unclear; mental state, diet, endocrine factors, and heredity have been considered to contribute to it. Its pathogenesis and therapy are explored constantly. Now, effective migraine management in modern medicine is based on non-pharmacological treatment as well as acute and preventive medication. Traditional Chinese medicine has developed day by day, which will establish new directions for migrainous...
A cognitive robot usually needs to perform multiple tasks in practice and needs to locate the desired area for each task. Since deep learning has achieved substantial progress in image recognition, to solve this area detection problem, it is straightforward to label a functional area (affordance) image dataset and apply a well-trained deep-model-based classifier on all the potential image regions. However, annotating the functional area is time consuming, and the requirement of a large amount of training data limits the application scope. We observe that the functional areas are related to the surrounding...