Baocai Yin

ORCID: 0000-0002-8125-4648
Research Areas
  • Computer Graphics and Visualization Techniques
  • 3D Shape Modeling and Analysis
  • Advanced Vision and Imaging
  • Human Pose and Action Recognition
  • Human Motion and Animation
  • Hand Gesture Recognition Systems
  • Video Surveillance and Tracking Methods
  • Traffic Prediction and Management Techniques
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Robotics and Sensor-Based Localization
  • Advanced Numerical Analysis Techniques
  • Evacuation and Crowd Dynamics
  • Anomaly Detection Techniques and Applications
  • Digital Image Processing Techniques
  • Image Retrieval and Classification Techniques
  • Visual Attention and Saliency Detection
  • Face recognition and analysis
  • Advanced Graph Neural Networks
  • Image and Video Stabilization
  • Topic Modeling
  • Simulation and Modeling Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image Processing Techniques

Beijing University of Technology
2009-2024

Dalian University of Technology
2016-2021

Dalian University
2016-2021

National Engineering Research Center for Information Technology in Agriculture
2010

Beijing Polytechnic
1998-2003

10.1007/s00371-022-02485-3 article EN The Visual Computer 2022-04-23

Weakly supervised crowd counting involves the regression of the number of individuals present in an image, using only the total number as the label. However, this task is plagued by two primary challenges: the large variation of head size and the uneven distribution of crowd density. To address these issues, we propose a novel Hypergraph Association Crowd Counting (HACC) framework. Our approach consists of a new multi-scale dilated pyramid module that can efficiently handle the large variation of head size. Further, hypergraph association is used to solve the problem of uneven density...

10.1145/3594670 article EN ACM Transactions on Multimedia Computing Communications and Applications 2023-04-26

Most existing weakly supervised crowd counting methods utilize Convolutional Neural Networks (CNN) or Transformer to estimate the total number of individuals in an image. However, both CNN-based (grid-to-count paradigm) and Transformer-based (sequence-to-count paradigm) methods take images as inputs in a regular form. This approach treats all pixels equally but cannot address the uneven distribution problem within human crowds. This challenge would lead to a decline in the performance of the model. Compared with grid and sequence, graph...

10.1145/3638774 article EN ACM Transactions on Multimedia Computing Communications and Applications 2023-12-27

Crowd counting provides an important foundation for public security and urban management. Due to the existence of small targets and large density variations in crowd images, crowd counting is a challenging task. Mainstream methods usually apply convolution neural networks (CNNs) to regress a density map, which requires annotations of individual persons and counts. Weakly-supervised methods can avoid detailed labeling and only require counts as annotations, but existing methods fail to achieve satisfactory performance because global perspective field...

10.1007/s41095-022-0313-5 article EN cc-by Computational Visual Media 2023-04-02

Although deep networks based methods outperform traditional 3D reconstruction methods, which require multiocular images or class labels to recover the full 3D geometry, they may produce incomplete recovery and unfaithful reconstruction when facing occluded parts of 3D objects. To address these issues, we propose the Depth-preserving Latent Generative Adversarial Network (DLGAN), which consists of an Encoder-Decoder GAN (EDGAN, serving as a generator and a discriminator) and an Extreme Learning Machine (ELM, serving as a classifier) for 3D reconstruction from a monocular depth image...

10.1109/tmm.2020.3017924 article EN IEEE Transactions on Multimedia 2020-08-24

10.1631/fitee.2000068 article EN Frontiers of Information Technology & Electronic Engineering 2021-05-01

We propose two-dimensional pose estimation from a single range image of the human body, using sparse regression with a componentwise clustering feature point representation (CCFPR) model. CCFPR includes primary feature points and secondary feature points. The primary feature points consist of the torso center and five extremal points, and further serve to classify all body pixels as six body components. The secondary feature points are given by the cluster centers of each of the components other than the torso, using K-means clustering. The pose is obtained by learning a projection matrix, which maps to the skeleton based on...

10.1109/tmm.2016.2556859 article EN IEEE Transactions on Multimedia 2016-04-20

Pose tracking from range image sequences remains a difficult task due to strong noise and serious self-occlusion of the human body. Existing work either relies on extremely large, precisely annotated datasets, or on an accurate mesh model and GPU acceleration. In this paper, we propose an unsupervised real-time framework for pose tracking from range image sequences. Our framework consists of a visible hybrid model (VHM), componentwise correspondence optimization (CCO) and dynamic database lookup (DDL). VHM component sphere sets spherical point which exhibits...

10.1109/tmm.2019.2953380 article EN IEEE Transactions on Multimedia 2019-11-13

Recovering the geometry of an object from a single depth image is an interesting yet challenging problem. While previous learning based approaches have demonstrated promising performance, they don’t fully explore the spatial relationships of objects, which leads to unfaithful and incomplete 3D reconstruction. To address these issues, we propose a Spatial Relationship Preserving Adversarial Network (SRPAN) consisting of a 3D Capsule Attention Generative Adversarial Network (3DCAGAN) and a 2D Generative Adversarial Network (2DGAN) for coarse-to-fine 3D reconstruction from a single depth view...

10.1145/3506733 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-03-04

Human poses admit complicated articulations and multigranular similarity. Previous works on learning human pose metric utilize sparse models, which concentrate large weights on highly similar poses and fail to depict an overall structure of poses with multigranular similarity. Moreover, previous works require a large number of similar/dissimilar annotated pairwise poses, which is a tedious task and remains inaccurate due to different subjective judgments of experts. Motivated by graph-based neighbor assignment techniques, we propose an unsupervised model called sparsity...

10.1109/tmm.2018.2859029 article EN IEEE Transactions on Multimedia 2018-07-26

Traffic forecasting is an important part in realising intelligent traffic management, which helps controllers and travellers make effective decisions. However, forecasting accuracy is often affected by missing data due to hardware and software failure. Therefore, accurate traffic prediction based on incomplete data is an important problem as well as a challenge. Though many approaches recover the missing values before prediction, errors from the data‐filling step are likely to cause additional bias in the prediction result. Besides, this tactic is difficult to guarantee...

10.1049/itr2.12200 article EN cc-by-nc-nd IET Intelligent Transport Systems 2022-05-06

This paper presents a Chinese Sign Language Markup Language (CSLML), which is developed for expressive sign language synthesis by introducing features and structure of sign language prosody. The tags of CSLML are divided into two levels: function level and phonetic level. Function level provides abstract information about signed content and prosody, so it facilitates text annotating text‐driven automatic synthetic system adapts to diversified methods, such as motion capture animation or image‐based synthesis, may not be...

10.1002/cav.307 article EN Computer Animation and Virtual Worlds 2009-06-01

Image inpainting is a challenging task in image processing and is widely applied in many areas such as photo editing. Traditional patch-based methods are not effective in dealing with complex or non-repetitive structures. Recently, deep learning-based approaches have shown promising results for image inpainting. However, they usually generate contents with artificial boundaries, distorted structures or blurry textures. To handle this problem, we propose a novel image inpainting method based on wavelet transform attention model...

10.1109/iscas45731.2020.9180927 article EN 2020 IEEE International Symposium on Circuits and Systems (ISCAS) 2020-09-29

In order to show the realistic 3D mesh in geometry image-based compression, in addition to coding the geometry image, the normal-map image is usually required to be coded. But normal-map images are difficult to compress because they capture more details of the original mesh, and have less spatial correlation between pixels than the geometry image. This paper proposes a novel framework to solve this problem, in which we effectively predict the normal-map image based on the geometry image, and also utilize the strong correlation among the three components of the normal-map image to improve the predicting accuracy. We only need to code the residual, which is generated from its...

10.1109/pcs.2012.6213304 article EN Picture Coding Symposium 2012-05-01