Zhanpeng Zhang

ORCID: 0000-0002-1709-4176
Research Areas
  • Multimodal Machine Learning Applications
  • Robot Manipulation and Learning
  • Advanced Image Processing Techniques
  • Face recognition and analysis
  • Soft Robotics and Applications
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Hand Gesture Recognition Systems
  • IoT and Edge/Fog Computing
  • Image and Signal Denoising Methods
  • Advanced Vision and Imaging
  • Traditional Chinese Medicine Studies
  • Advanced Image and Video Retrieval Techniques
  • Image Processing Techniques and Applications
  • Facial Nerve Paralysis Treatment and Research
  • Natural Language Processing Techniques
  • Autonomous Vehicle Technology and Safety
  • Remote Sensing and Land Use
  • Face Recognition and Perception
  • Molecular Communication and Nanonetworks
  • Biomedical Text Mining and Ontologies
  • Cryptography and Data Security
  • Context-Aware Activity Recognition Systems
  • Robotic Mechanisms and Dynamics
  • Emotion and Mood Recognition

Ankang University
2024

Fudan University
2024

Lanzhou University of Technology
2023

University of Science and Technology of China
2022

North China Electric Power University
2021

Group Sense (China)
2016-2020

The Sense Innovation and Research Center
2018

Tencent (China)
2017

Deep Convolutional Neural Networks (CNNs) achieve substantial improvements in face detection in the wild. Classical CNN-based methods simply stack successive layers of filters, where an input sample must pass through all layers before reaching a face/non-face decision. Inspired by the fact that, for face detection, deeper layers can discriminate between difficult samples while shallower layers can efficiently reject simple non-face samples, we propose an Inside Cascaded Structure that introduces classifiers at different layers within...

10.1109/iccv.2017.344 article EN 2017-10-01

Exploiting relationships between visual regions and question words has achieved great success in learning multi-modality features for Visual Question Answering (VQA). However, we argue that existing methods mostly model relations between individual visual regions and words, which are not enough to correctly answer the question. From a human's perspective, answering a visual question requires understanding summarizations of visual and language information. In this paper, we propose the Multi-modality Latent Interaction module (MLI) to tackle this problem. The...

10.1109/iccv.2019.00592 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have made them powerful tools in embodied navigation, enabling agents to leverage commonsense and spatial reasoning for efficient exploration in unfamiliar environments. Existing LLM-based approaches convert global memory, such as semantic or topological maps, into language descriptions to guide navigation. While this improves efficiency and reduces redundant exploration, the loss of geometric information in language-based...

10.48550/arxiv.2502.14254 preprint EN arXiv (Cornell University) 2025-02-19

Data-driven approaches for grasping have shown significant advances recently, but they usually require a large amount of training data. To increase the efficiency of data collection, this paper presents a novel grasp system that covers the whole pipeline from data collection to model inference. The system can collect effective samples with a corrective strategy assisted by the antipodal rule, and we design an affordance interpreter network to predict a pixelwise affordance map. We define graspability, ungraspability, and background as...

10.1109/icra.2019.8793912 article EN 2019 International Conference on Robotics and Automation (ICRA) 2019-05-01

Exploiting relationships between visual regions and question words has achieved great success in learning multi-modality features for Visual Question Answering (VQA). However, we argue that existing methods mostly model relations between individual visual regions and words, which are not enough to correctly answer the question. From a human's perspective, answering a visual question requires understanding summarizations of visual and language information. In this paper, we propose the Multi-modality Latent Interaction module (MLI) to tackle this problem. The...

10.48550/arxiv.1908.04289 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Real-time semantic segmentation is desirable in many robotic applications with limited computation resources. One challenge of semantic segmentation is to deal with object scale variations and leverage the context. How to perform multi-scale context aggregation within a computation budget is important. In this paper, we first introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information. On the other...

10.1109/icra40945.2020.9196599 article EN 2020-05-01

What is a proper representation for objects in manipulation? What would a human try to perceive when manipulating a new object in a new environment? In fact, instead of focusing on texture and illumination, humans can infer "affordance" [36] from vision. Here, affordance describes an object's intrinsic property that affords a particular type of manipulation. In this work, we investigate whether such affordance can be learned by a deep neural network. In particular, we propose an Affordance Space Perception Network (ASPN) that takes an image as input...

10.1109/icra40945.2020.9196783 article EN 2020-05-01

Instance grasping is a challenging robotic task in which a robot aims to grasp a specified target object in cluttered scenes. In this paper, we propose a novel end-to-end instance grasping method using only monocular workspace and query images, where the workspace image includes several objects and the query image contains the target object. To effectively extract discriminative features and facilitate the training process, a learning-based method, referred to as the Constraint Co-Attention Network (CCAN), is proposed, which consists of a constraint co-attention module...

10.1109/icra40945.2020.9197182 article EN 2020-05-01

Interpersonal relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people. Motivated by psychological studies, we investigate whether such fine-grained and high-level traits can be characterized and quantified from face images in the wild. We address this challenging problem by first studying a deep network architecture for robust recognition of facial expressions. Unlike existing models that typically learn from expression labels alone, we devise an effective multitask network that is...

10.48550/arxiv.1609.06426 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Learning-based robotic grasping methods have achieved substantial progress with the development of deep neural networks. However, the requirement for large-scale training data in the real world limits the application scope of these methods. Given the 3D models of the target objects, we propose a new learning-based grasping approach built on 6D object pose estimation from a monocular RGB image. We aim to leverage both a synthesized pose dataset and a small-scale, weakly labeled real-world dataset (e.g., marking the number of objects in the image) to reduce system...

10.1145/3284398.3284408 article EN 2018-11-29

Vision-based robotic manipulation with deep learning has achieved substantial advances in the field of automatic agriculture, where it can be deployed for picking, sorting, and transporting agricultural products, and so on. Deep reinforcement learning (DRL) is one of the learning methods that helps a robot learn a policy by itself through exploration and exploitation. Training real robots with DRL is costly, which limits its application scope. Some approaches train in simulation and deploy the model by transferring images...

10.33440/j.ijpaa.20200302.77 article EN International Journal of Precision Agricultural Aviation 2018-01-01

Face hallucination is a generative task to super-resolve a low-resolution facial image, while human perception of faces relies heavily on identity information. However, previous face hallucination approaches largely ignore identity recovery. This paper proposes the Super-Identity Convolutional Neural Network (SICNN) to recover identity information for generating faces close to the real identity. Specifically, we define a super-identity loss to measure the identity difference between a hallucinated face and its corresponding high-resolution face within the hypersphere...

10.48550/arxiv.1811.02328 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Data-driven approaches for grasping have shown significant advances recently, but they usually require a large amount of training data. To increase the efficiency of data collection, this paper presents a novel grasp system that covers the whole pipeline from data collection to model inference. The system can collect effective samples with a corrective strategy assisted by the antipodal rule, and we design an affordance interpreter network to predict a pixelwise affordance map. We define graspability, ungraspability, and background as...

10.48550/arxiv.1902.06554 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Real-time semantic segmentation is desirable in many robotic applications with limited computation resources. One challenge of semantic segmentation is to deal with object scale variations and leverage the context. How to perform multi-scale context aggregation within a computation budget is important. In this paper, we first introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information. On the other...

10.48550/arxiv.2003.03913 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Learning-based robot arm grasping has attracted increasing interest recently. The algorithm needs to accurately locate the grasping point and angle. Existing methods usually require a large amount of training data from physical robotic trials or synthetic samples from simulation. The system can show promising results with pre-defined objects, but performance may degrade for novel objects without annotation. Inspired by the fact that we have a set of pre-collected data from an external source, and only a small quantity of target...

10.1109/rcar47638.2019.9043954 article EN 2019 IEEE International Conference on Real-time Computing and Robotics (RCAR) 2019-08-01

Migraine is a common, chronic, multifactorial disorder syndrome involving both nervous-system and non-nervous-system disorders. The causes of migraine are still unclear; mental state, diet, endocrine factors, and heredity have been considered to contribute to it. Its pathogenesis and therapeutics are explored constantly. At present, effective migraine treatment in modern medicine is based on non-pharmacological treatment as well as acute and preventive medication. Traditional Chinese medicine has developed day by day, which will establish new directions for migrainous...

10.2991/msetasse-16.2016.382 article EN cc-by-nc 2016-01-01

A cognitive robot usually needs to perform multiple tasks in practice and to locate the desired area for each task. Since deep learning has achieved substantial progress in image recognition, a straightforward way to solve this detection problem is to label a functional area (affordance) dataset and apply a well-trained deep-model-based classifier to all potential regions. However, annotation is time-consuming, and the requirement for a large amount of training data limits the application scope. We observe that these areas are related to the surrounding...

10.1109/icra.2018.8460590 article EN 2018-05-01

10.11925/infotech.2096-3467.2020.1211 article EN Shuju fenxi yu zhishi faxian 2021-05-27