Yi Zheng

ORCID: 0000-0001-5547-0956
Research Areas
  • Video Surveillance and Tracking Methods
  • Face recognition and analysis
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Handwritten Text Recognition Techniques
  • Image and Signal Denoising Methods
  • Advanced Data Compression Techniques
  • Gait Recognition and Analysis
  • Image Processing and 3D Reconstruction
  • Geographic Information Systems Studies
  • Natural Language Processing Techniques
  • Advanced Computational Techniques and Applications
  • Robotics and Sensor-Based Localization
  • Vehicle License Plate Recognition
  • Multimodal Machine Learning Applications
  • Energy Efficient Wireless Sensor Networks
  • Underwater Vehicles and Communication Systems
  • Neuroscience and Neural Engineering
  • Underwater Acoustics Research
  • Image Retrieval and Classification Techniques
  • Medical Image Segmentation Techniques
  • Biometric Identification and Security
  • Face and Expression Recognition
  • Photoreceptor and optogenetics research
  • Video Coding and Compression Technologies

National Administration of Surveying, Mapping and Geoinformation of China
2024-2025

Tsinghua University
2025

Fudan University
2020-2023

Shandong Institute of Business and Technology
2016-2022

China University of Mining and Technology
2019-2022

Zhejiang University
2021-2022

Ministry of Education of the People's Republic of China
2021

Boston University
2021

iQIYI (China)
2019-2020

Institute of Oceanographic Instrumentation
2018-2019

Adaptive person re-identification (adaptive ReID) aims to transfer knowledge learned from a labeled source domain to an unlabeled target domain. Pseudo-label-based methods, which alternately generate pseudo labels and optimize the training model, have proven highly effective in this field. However, the generated pseudo labels are inaccurate and cannot reflect the true semantic meaning of the samples. We consider that this inaccuracy stems both from the lagged update of the pseudo labels and from the simple criterion employed by the clustering method. To...

10.1109/iccv48922.2021.00826 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

National fundamental geo-entity construction has been vigorously promoted by the Ministry of Natural Resources of the People's Republic of China. As one of the core products, the geo-entity is a digital abstract expression of the real world in the form of 2 dimensions, which makes it difficult to fully meet the actual needs of analysis, calculation and use in 3-dimensional space. This paper designs the process and key technologies for collecting elevation and height information of geo-entities to upgrade them from 2D to 3D form. The paper elaborates on the implementation, such as...

10.1117/12.3044940 article EN 2025-01-16

The increasing demand for electricity and the imperatives of climate change have made the optimization of power system planning critical for the energy transition and grid efficiency. This study presents an innovative method for inter-regional AC-DC hybrid systems, leveraging the Classification and Regression Tree (CART) algorithm to optimize the operational characteristics of direct current (DC) channels. By designing a closed-loop iteration, precise constraints are considered by the CART algorithm and merged into the model...

10.3390/en18040783 article EN cc-by Energies 2025-02-07

Recent years have witnessed increasing attention in cartoon media, powered by the strong demands of industrial applications. As the first step to understand this media, cartoon face recognition is a crucial but less-explored task with few datasets proposed. In this work, we present a new challenging benchmark dataset, consisting of 389,678 images of 5,013 cartoon characters annotated with identity, bounding box, pose, and other auxiliary attributes. The dataset, named iCartoonFace, is currently the largest-scale, high-quality, rich-annotated, spanning...

10.1145/3394171.3413726 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Recent neural models for video captioning usually employ an attention-based encoder-decoder framework. However, current approaches mainly attend to the motion features and object features of the video when generating the caption, but ignore potentially useful historical information. Besides, exposure bias and vanishing gradient problems always exist in caption generation models. In this paper, we propose a novel framework, named Stacked Multimodal Attention Network (SMAN). It adopts additional visual and textual...

10.1109/tcsvt.2021.3058626 article EN IEEE Transactions on Circuits and Systems for Video Technology 2021-12-23

Person identification in the wild is very challenging due to great variation in poses, face quality, clothes, makeup and so on. Traditional research, such as face recognition, person re-identification, and speaker recognition, often focuses on a single modal of information, which is inadequate to handle all the situations in practice. Multi-modal person identification is a more promising way, in which we can jointly utilize face, head, body, and audio features, and so on. In this paper, we introduce iQIYI-VID, the largest video dataset for multi-modal person identification. It is composed...

10.48550/arxiv.1811.07548 preprint EN other-oa arXiv (Cornell University) 2018-01-01

In person re-identification (Re-ID), the data annotation cost of supervised learning is huge, and it cannot adapt well to complex situations. Therefore, compared with supervised deep learning methods, unsupervised methods are more in line with actual needs. A key to solving Re-ID is to find a standard that can effectively distinguish the difference (distance) between the features of images belonging to different pedestrian identities. However, there are some differences among the images captured by different cameras (such as brightness, angle, etc.). It is known...

10.1145/3501404 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-03-15

A new method for speeding up the integer wavelet transforms constructed by the lifting scheme is proposed. The proposed method packs multiple pixels (wavelet coefficients) in a single word; therefore, it can make use of the 32-bit or 64-bit computational capability of modern computers to accomplish multiple addition/subtraction operations in one instruction cycle. As a result, our method can save 37% of the decomposition/reconstruction time on such machines and requires much less working memory in comparison with the original wavelet transform algorithms.

10.1109/76.889059 article EN IEEE Transactions on Circuits and Systems for Video Technology 2000-01-01
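
The two ideas in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the integer Haar (S) transform as the lifting-based integer wavelet, and a hypothetical layout of two 16-bit lanes per 32-bit word that is valid only for non-negative values whose lane sums stay below 2^16. Handling of signed coefficients and borrows is not reproduced here.

```python
def forward_s_transform(samples):
    """One level of the integer Haar (S) transform, written as lifting steps."""
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    even, odd = samples[0::2], samples[1::2]
    detail = [o - e for o, e in zip(odd, even)]            # predict step
    approx = [e + (d // 2) for e, d in zip(even, detail)]  # update step
    return approx, detail


def inverse_s_transform(approx, detail):
    """Undo the lifting steps in reverse order; exact for integers."""
    even = [a - (d // 2) for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    return [value for pair in zip(even, odd) for value in pair]


LANE_BITS = 16                     # hypothetical lane width (assumption)
LANE_MASK = (1 << LANE_BITS) - 1


def pack(high, low):
    """Place two non-negative 16-bit values into one 32-bit word."""
    return (high << LANE_BITS) | low


def unpack(word):
    """Split a 32-bit word back into its two 16-bit lanes."""
    return (word >> LANE_BITS) & LANE_MASK, word & LANE_MASK


def packed_add(x, y):
    """Add both lanes with a single integer addition (no lane overflow assumed)."""
    return (x + y) & 0xFFFFFFFF


signal = [5, 7, 2, 9, -3, 4]
approx, detail = forward_s_transform(signal)
restored = inverse_s_transform(approx, detail)
lanes = unpack(packed_add(pack(3, 5), pack(10, 20)))
```

Because the inverse subtracts exactly the quantities the forward pass added, `restored` equals `signal` for any integer input, which is the property that makes lifting-based transforms reversible.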

We held the iQIYI Celebrity Video Identification Challenge in ACM Multimedia 2019. The purpose of the challenge was to encourage research on video-based person identification. We released the iQIYI-VID-2019 dataset, which contains 200K videos of 10K celebrities. In this paper, we introduce the organization of the challenge, the evaluation process, and the results.

10.1145/3343031.3356081 article EN Proceedings of the 27th ACM International Conference on Multimedia 2019-10-15

The lifting scheme provides a flexible and easy way of constructing wavelets, which enables us to construct a wavelet according to application needs. Its flexibility allows us to introduce nonlinearity into the wavelet and to modify it based on the signal being processed. Adaptive lifting provides a way of adjusting the filters via the updater U and the predictor P in the lifting stages according to signal characteristics. However, the lack of self-learning ability in existing adaptive lifting is a big shortcoming. In this paper, BP neural networks are introduced into the lifting scheme and used to replace the updater U and the predictor P, respectively. Experiment shows...

10.1109/icacia.2010.5709909 article EN International Conference on Apperceiving Computing and Intelligence Analysis Proceeding 2010-12-01
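
The role of the predictor P and updater U described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `nonlinear_predict` and `halving_update` are hypothetical stand-ins, not the BP neural networks from the paper. The point shown is that a lifting stage stays invertible for any choice of P and U, which is what allows them to be replaced by learned functions.

```python
def lifting_forward(samples, predict, update):
    """One lifting stage: split into even/odd, then predict, then update."""
    even, odd = samples[0::2], samples[1::2]
    detail = [o - p for o, p in zip(odd, predict(even))]    # P acts on even samples
    approx = [e + u for e, u in zip(even, update(detail))]  # U acts on details
    return approx, detail


def lifting_inverse(approx, detail, predict, update):
    """Reverse the stage by undoing the update, then the prediction."""
    even = [a - u for a, u in zip(approx, update(detail))]
    odd = [d + p for d, p in zip(detail, predict(even))]
    return [value for pair in zip(even, odd) for value in pair]


def nonlinear_predict(even):
    """Hypothetical nonlinear predictor (placeholder for a learned P)."""
    return [abs(e) // 2 + 1 for e in even]


def halving_update(detail):
    """Hypothetical updater (placeholder for a learned U)."""
    return [d // 2 for d in detail]


signal = [4, 9, -6, 1, 0, 3, 8, 8]
approx, detail = lifting_forward(signal, nonlinear_predict, halving_update)
restored = lifting_inverse(approx, detail, nonlinear_predict, halving_update)
```

Integer inputs are used so that reconstruction is exact; with floating-point P and U the round trip holds only up to rounding error.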

Understanding the meaning of text in images of natural scenes like highway signs or store front emblems is particularly challenging if the text is foreshortened in the image or the letters are artistically distorted. We introduce a pipeline-based text spotting framework that can both detect and recognize text in various fonts, shapes, and orientations in natural scene images with complicated backgrounds. The main contribution of our work is the text detection component, which we call UHT, short for UNet, Heatmap, and Textfill. UHT uses a UNet to compute heatmaps for candidate...

10.1109/cvprw50498.2020.00278 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01

The Birge-Massart threshold wavelet methods were used to denoise lightning transient electrical signals and were compared with traditional de-noising methods such as the smoothing filter and the FIR digital low-pass filter. The mean square error (MSE) and magnitude error (ME) of the simulation signals were calculated to compare the de-noising effect. A double exponential-decay pulse and Gaussian white noise composed the simulated signals. The measured signals were also de-noised with the methods. The results of the simulated data and the measured ones proved that the wavelet method was better than the traditional ones.

10.1109/fskd.2012.6234045 article EN 2012-05-01
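
The evaluation setup described in the abstract above can be sketched as follows. This is a rough illustration, not the paper's code: the pulse parameters (`decay`, `rise`, `amplitude`), the noise level, and the moving-average window are all assumed values, and the moving average merely stands in for the smoothing-filter baseline. The Birge-Massart wavelet thresholding itself is not implemented here.

```python
import math
import random


def double_exponential_pulse(t, decay=1.0, rise=10.0, amplitude=1.0):
    """Double exponential pulse; all parameter values are assumptions."""
    return amplitude * (math.exp(-decay * t) - math.exp(-rise * t))


def add_gaussian_noise(signal, sigma, rng):
    """Add zero-mean Gaussian white noise with standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def mean_square_error(reference, estimate):
    """MSE between a clean reference and a noisy or de-noised estimate."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def moving_average(signal, width=5):
    """Simple smoothing filter; the window shrinks at the signal edges."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


rng = random.Random(0)                      # fixed seed for repeatability
times = [i * 0.01 for i in range(500)]
clean = [double_exponential_pulse(t) for t in times]
noisy = add_gaussian_noise(clean, sigma=0.05, rng=rng)
smoothed = moving_average(noisy)
mse_noisy = mean_square_error(clean, noisy)
mse_smoothed = mean_square_error(clean, smoothed)
```

Comparing `mse_noisy` with `mse_smoothed` mirrors how the abstract ranks de-noising methods on simulated data, where the clean reference is known.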

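An efficient rendezvous node selection and routing algorithm (RNSRA) for wireless sensor networks with a mobile sink that visits rendezvous nodes to gather data from sensor nodes is proposed. In order to plan an optimal moving tour for the mobile sink and avoid the energy hole problem, we develop the RNSRA to find the optimal rendezvous nodes (RNs) for the mobile sink to visit. The RNSRA can select a set of RNs to act as store points for the mobile sink, and search for the optimal multi-hop path between the source node and the rendezvous node, so that the mobile sink could gather information from the sensor nodes periodically. A fitness function with several factors is calculated to find suitable rendezvous nodes, and artificial bee colony optimization (ABC)...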

10.3837/tiis.2018.10.007 article EN KSII Transactions on Internet and Information Systems 2018-10-31

State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) deep learning model that reads text in images with the help of understanding context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order isolated text regions. A...

10.48550/arxiv.1908.01403 preprint EN other-oa arXiv (Cornell University) 2019-01-01

In recent years, cross-modal retrieval has been a popular research topic in both the fields of computer vision and natural language processing. There is a huge semantic gap between different modalities on account of heterogeneous properties. How to establish the correlation among different modality data faces enormous challenges. In this work, we propose a novel end-to-end framework named Dual Multi-Angle Self-Attention (DMASA) for cross-modal retrieval. Multiple self-attention mechanisms are applied to extract...

10.1002/asi.24373 article EN Journal of the Association for Information Science and Technology 2020-07-16
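
For readers unfamiliar with the self-attention mechanism that the abstract above builds on, here is a minimal Python sketch of generic scaled dot-product self-attention. It is not the DMASA architecture, whose details are not given in the truncated abstract. As a simplifying assumption, the query, key, and value projections are taken to be the identity, so each feature vector attends directly to the others.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, features)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(features)
```

Each row of `weights` sums to one, so every output vector is a weighted average of the input feature vectors.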

Scene text recognition is the task of recognizing character sequences in images of natural scenes. The considerable diversity in the appearance of text in a scene image and potentially highly complex backgrounds make text recognition challenging. Previous approaches employ sequence generators to analyze text regions and, subsequently, compare the candidate character sequences against a language model. In this work, we propose a bimodal framework that simultaneously utilizes visual and linguistic information to enhance recognition performance. Our linguistically aware learning...

10.1145/3394171.3413913 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Bronze inscription is one of the earliest well-established writing systems, dating back to the Shang dynasty in China. Bronze inscription character recognition plays an important role in identification and interpretation, which is traditionally a tough and challenging task. To deal with the class imbalance of the training data in bronze inscription recognition, we propose a method based on few-shot learning. The process consists of three stages. In the first stage, the model is pretrained on a large-scale dataset with a novel negative margin loss. In the second stage, the weights...

10.1117/12.2627100 article EN 2022-02-15