- Video Surveillance and Tracking Methods
- Face Recognition and Analysis
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Handwritten Text Recognition Techniques
- Image and Signal Denoising Methods
- Advanced Data Compression Techniques
- Gait Recognition and Analysis
- Image Processing and 3D Reconstruction
- Geographic Information Systems Studies
- Natural Language Processing Techniques
- Advanced Computational Techniques and Applications
- Robotics and Sensor-Based Localization
- Vehicle License Plate Recognition
- Multimodal Machine Learning Applications
- Energy Efficient Wireless Sensor Networks
- Underwater Vehicles and Communication Systems
- Neuroscience and Neural Engineering
- Underwater Acoustics Research
- Image Retrieval and Classification Techniques
- Medical Image Segmentation Techniques
- Biometric Identification and Security
- Face and Expression Recognition
- Photoreceptor and Optogenetics Research
- Video Coding and Compression Technologies
National Administration of Surveying, Mapping and Geoinformation of China: 2024-2025
Tsinghua University: 2025
Fudan University: 2020-2023
Shandong Institute of Business and Technology: 2016-2022
China University of Mining and Technology: 2019-2022
Zhejiang University: 2021-2022
Ministry of Education of the People's Republic of China: 2021
Boston University: 2021
iQIYI (China): 2019-2020
Institute of Oceanographic Instrumentation: 2018-2019
Adaptive person re-identification (adaptive ReID) aims at transferring learned knowledge from the labeled source domain to the unlabeled target domain. Pseudo-label-based methods that alternately generate pseudo labels and optimize the training model have demonstrated great effectiveness in this field. However, the generated pseudo labels are inaccurate and cannot reflect the true semantic meaning of the samples. We consider that such inaccuracy stems from both the lagged update of the pseudo labels and the simple criterion employed by the clustering method. To...
National fundamental geo-entity construction has been vigorously promoted by the Ministry of Natural Resources of the People's Republic of China. As one of the core products, it is a digital abstract expression of the real world in two-dimensional form, which makes it difficult to fully meet the actual needs of analysis, calculation and use in three-dimensional space. This paper designs the process and key technologies for collecting the elevation and height information of geo-entities to upgrade them from 2D to 3D form. It elaborates on the implementation, such as...
The increasing demand for electricity and the imperatives of climate change have made the optimization of power system planning critical for the energy transition and grid efficiency. This study presents an innovative method for planning inter-regional AC-DC hybrid systems, leveraging the Classification and Regression Tree (CART) algorithm to optimize the operational characteristics of direct current (DC) channels. By designing a closed-loop iteration, precise constraints are considered by the CART algorithm, which is integrated into the model...
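The split criterion that CART uses when growing a regression tree can be sketched in a few lines. This is a minimal illustration, assuming squared-error (variance) impurity and a single numeric feature; `sse`, `best_split` and the toy data are hypothetical, and how the study maps DC-channel operating characteristics onto tree inputs is not described in the abstract.

```python
def sse(values):
    """Sum of squared errors around the mean (the CART regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) of the split x <= threshold that minimizes
    left SSE + right SSE; threshold is None if no split beats the unsplit node."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # equal x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (threshold, cost)
    return best


print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))  # (6.5, 0.0)
```

A full tree applies `best_split` recursively to each side until a stopping rule (maximum depth, minimum leaf size) is met; in practice a library implementation such as scikit-learn's `DecisionTreeRegressor` would normally be used instead.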
Recent years have witnessed increasing attention to cartoon media, powered by the strong demands of industrial applications. As the first step to understand this media, cartoon face recognition is a crucial but less-explored task, with few datasets proposed. In this work, we present a new challenging benchmark dataset, consisting of 389,678 images of 5,013 cartoon characters annotated with identity, bounding box, pose, and other auxiliary attributes. The dataset, named iCartoonFace, is currently the largest-scale, high-quality, rich-annotated, spanning...
Recent neural models for video captioning usually employ an attention-based encoder-decoder framework. However, current approaches mainly attend to the motion features and object features of the video when generating the caption, but ignore potentially useful historical information. Besides, exposure bias and vanishing gradient problems always exist in caption generation models. In this paper, we propose a novel framework, named Stacked Multimodal Attention Network (SMAN). It adopts additional visual and textual...
Person identification in the wild is very challenging due to great variation in poses, face quality, clothes, makeup and so on. Traditional research, such as face recognition, person re-identification, and speaker recognition, often focuses on a single modality of information, which is inadequate to handle all situations in practice. Multi-modal person identification is a more promising way, in which we can jointly utilize face, head, body, and audio features. In this paper, we introduce iQIYI-VID, the largest video dataset for multi-modal person identification. It is composed...
In person re-identification (Re-ID), the data annotation cost of supervised learning is huge, and it cannot adapt well to complex situations. Therefore, compared with deep learning methods, unsupervised methods are more in line with actual needs. The key to solving Re-ID is to find a standard that can effectively distinguish the difference (distance) between features of images belonging to different pedestrian identities. However, there are some differences between images captured by different cameras (such as brightness, angle, etc.). It is known...
A new method for speeding up integer wavelet transforms constructed by the lifting scheme is proposed. The proposed method packs multiple pixels (wavelet coefficients) into a single word; therefore, it can make use of the 32-bit or 64-bit computational capability of modern computers to accomplish several addition/subtraction operations in one instruction cycle. As a result, our method saves 37% of the decomposition/reconstruction time on such machines and requires much less working memory than the original transform algorithms.
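The packing idea can be emulated to show why one wide subtraction performs several narrow ones (a SIMD-within-a-register trick). This is an illustrative sketch, not the authors' implementation: it assumes four signed 16-bit lanes in a 64-bit word, emulates the machine word with Python integers masked to 64 bits, and applies only the predict step of an integer Haar lifting transform. `pack`, `unpack`, `packed_sub` and `haar_predict` are hypothetical names, and results are correct only while every lane result stays within the signed 16-bit range.

```python
LANE_BITS = 16                      # assumed lane width: 4 lanes per 64-bit word
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
WORD_MASK = (1 << 64) - 1           # emulate a 64-bit machine word


def pack(values):
    """Pack LANES signed integers into one 64-bit word."""
    assert len(values) == LANES
    word = 0
    for i, v in enumerate(values):
        word += v << (LANE_BITS * i)    # negative lanes borrow from higher lanes
    return word & WORD_MASK


def unpack(word):
    """Recover signed lane values, undoing the borrows introduced by pack()."""
    values = []
    for _ in range(LANES):
        lane = word & LANE_MASK
        if lane >= 1 << (LANE_BITS - 1):    # sign-extend the 16-bit lane
            lane -= 1 << LANE_BITS
        values.append(lane)
        word = (word - lane) >> LANE_BITS   # exact: the low 16 bits are now zero
    return values


def packed_add(a_word, b_word):
    """One 64-bit addition performs LANES lane-wise additions."""
    return (a_word + b_word) & WORD_MASK


def packed_sub(a_word, b_word):
    """One 64-bit subtraction performs LANES lane-wise subtractions."""
    return (a_word - b_word) & WORD_MASK


def haar_predict(even, odd):
    """Predict step of integer Haar lifting, d[i] = odd[i] - even[i], four at a time.
    Valid only if every d[i] lies in [-2**15, 2**15)."""
    return unpack(packed_sub(pack(odd), pack(even)))


print(haar_predict([10, 200, -5, 1000], [12, 150, 7, -1000]))  # [2, -50, 12, -2000]
```

On real hardware `packed_sub` is a single machine subtraction; the lane width trades the number of lanes per word against the headroom needed so that no intermediate coefficient leaves its signed range.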
We held the iQIYI Celebrity Video Identification Challenge at ACM MULTIMEDIA 2019. The purpose was to encourage research on video-based person identification. We released the iQIYI-VID-2019 dataset, which contains 200K videos of 10K celebrities. In this paper, we introduce the organization of the challenge, the evaluation process, and the results.
The lifting scheme provides a flexible and easy way of constructing wavelets, which enables us to construct wavelets according to application needs. Its flexibility allows us to introduce nonlinearity into the wavelet and to modify it based on the signal being processed. The adaptive lifting scheme provides the possibility of adjusting the filters via the updater U and predictor P in the lifting stages according to the signal characteristics. However, the lack of self-learning ability in the existing adaptive lifting scheme is a big shortcoming. In this paper, BP neural networks are introduced into the lifting scheme and are used to replace the updater U and the predictor P, respectively. Experiment shows...
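The predictor/updater structure of the lifting scheme can be sketched with P and U passed in as ordinary functions, which is exactly the point where a trained model, such as the BP networks described above, could be substituted. This minimal sketch assumes an even-length integer signal and integer Haar-style P and U; `lifting_forward`, `lifting_inverse`, `predict_haar` and `update_haar` are hypothetical names, and no neural network is included.

```python
def predict_haar(even):
    """P: predict each odd sample as its left even neighbour (integer Haar)."""
    return list(even)


def update_haar(detail):
    """U: add floor(d / 2), so each approximation is the floored mean of its pair."""
    return [d // 2 for d in detail]


def lifting_forward(x, P=predict_haar, U=update_haar):
    """One lifting level: split, predict, update. Assumes len(x) is even."""
    even, odd = x[0::2], x[1::2]                          # split
    detail = [o - p for o, p in zip(odd, P(even))]        # predict step
    approx = [e + u for e, u in zip(even, U(detail))]     # update step
    return approx, detail


def lifting_inverse(approx, detail, P=predict_haar, U=update_haar):
    """Undo the update, then the predict, then merge the two phases."""
    even = [a - u for a, u in zip(approx, U(detail))]
    odd = [d + p for d, p in zip(detail, P(even))]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x


signal = [5, 7, 3, 3, 10, 4, 0, 1]
approx, detail = lifting_forward(signal)
print(approx, detail)                            # [6, 3, 7, 0] [2, 0, -6, 1]
print(lifting_inverse(approx, detail) == signal)  # True
```

Because the inverse subtracts exactly the same P and U outputs that the forward transform added, reconstruction is perfect for any deterministic P and U that return integers, which is what makes replacing them with nonlinear or learned functions safe.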
Understanding the meaning of text in images of natural scenes, like highway signs or store front emblems, is particularly challenging if the image is foreshortened or the letters are artistically distorted. We introduce a pipeline-based text spotting framework that can both detect and recognize text in various fonts, shapes, and orientations in scenes with complicated backgrounds. The main contribution of our work is the text detection component, which we call UHT, short for UNet, Heatmap, and Textfill. UHT uses a UNet to compute heatmaps for candidate...
The Birge-Massart threshold wavelet method was used to denoise lightning transient electrical signals and was compared with traditional de-noising methods such as the smoothing filter and the FIR digital low-pass filter. The mean square error (MSE) and magnitude (ME) of the simulated signals were calculated to compare the denoising effect. The simulated signals were composed of a double exponential-decay pulse and Gaussian white noise. Measured signals were also denoised with these methods. The results on both the measured data and the simulated data proved that the Birge-Massart method was better than the traditional ones.
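The simulation setup (a double exponential-decay pulse plus Gaussian white noise, scored by MSE) can be sketched with the standard library. This is an illustrative sketch only: the pulse parameters, noise level and sample spacing are assumptions, the baseline is a simple moving-average smoothing filter, and the Birge-Massart wavelet thresholding itself is not reproduced (MATLAB's Wavelet Toolbox provides it as `wdcbm`).

```python
import math
import random


def double_exponential(t, amplitude=1.0, alpha=0.02, beta=0.5):
    """i(t) = A * (exp(-alpha*t) - exp(-beta*t)) with alpha < beta (illustrative values)."""
    return amplitude * (math.exp(-alpha * t) - math.exp(-beta * t))


def add_white_noise(signal, sigma, seed=0):
    """Add zero-mean Gaussian white noise with standard deviation sigma."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def moving_average(signal, width=5):
    """Centered smoothing filter; the window is truncated at the signal edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def mse(reference, estimate):
    """Mean square error between the clean reference and a (de)noised estimate."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


t = [0.5 * k for k in range(400)]          # sample instants (arbitrary units)
clean = [double_exponential(tk) for tk in t]
noisy = add_white_noise(clean, sigma=0.05)
smoothed = moving_average(noisy)
print("noisy MSE:", mse(clean, noisy), "smoothed MSE:", mse(clean, smoothed))
```

The same `mse` function would score a wavelet-denoised output, so every method can be compared on one simulated signal with a known clean reference.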
An efficient rendezvous node selection and routing algorithm (RNSRA) is proposed for wireless sensor networks with a mobile sink that visits rendezvous nodes to gather data from the sensor nodes. In order to plan an optimal moving tour and avoid the energy hole problem, we develop the RNSRA to find the rendezvous nodes (RNs) for the sink to visit. The RNSRA can select a set of RNs to act as store points for the mobile sink and search for multi-hop paths between the RNs and the source nodes, so that the sink can collect the information periodically. A fitness function of several factors is calculated to select suitable nodes, and artificial bee colony optimization (ABC)...
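The artificial bee colony (ABC) optimizer mentioned above can be sketched generically. This is a minimal version of the standard ABC scheme (employed, onlooker and scout phases) minimizing a toy continuous objective; the paper's encoding of rendezvous nodes and its multi-factor fitness function are not given in the abstract, so `abc_minimize`, `sphere` and all parameter values are hypothetical.

```python
import random


def abc_minimize(f, dim, lower, upper, colony=20, limit=30, iterations=200, seed=0):
    """Minimize f over the box [lower, upper]^dim with a basic artificial bee colony."""
    rng = random.Random(seed)
    n = colony // 2                       # food sources, one employed bee each

    def random_source():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def fitness(cost):                    # usual ABC fitness transform for minimization
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    sources = [random_source() for _ in range(n)]
    costs = [f(s) for s in sources]
    trials = [0] * n
    best_cost, best_x = min(zip(costs, sources))

    def try_neighbour(i):
        k = rng.choice([j for j in range(n) if j != i])   # random partner source
        d = rng.randrange(dim)                            # perturb one coordinate
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        cand[d] = min(max(cand[d], lower), upper)         # stay inside the box
        c = f(cand)
        if c < costs[i]:                                  # greedy replacement
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(iterations):
        for i in range(n):                                # employed-bee phase
            try_neighbour(i)
        fits = [fitness(c) for c in costs]
        total = sum(fits)
        for _ in range(n):                                # onlooker phase (roulette wheel)
            r, acc = rng.uniform(0.0, total), 0.0
            for i, fi in enumerate(fits):
                acc += fi
                if acc >= r:
                    break
            try_neighbour(i)
        for i in range(n):                                # scout phase
            if trials[i] > limit:                         # abandon exhausted sources
                sources[i] = random_source()
                costs[i], trials[i] = f(sources[i]), 0
        cost, x = min(zip(costs, sources))
        if cost < best_cost:
            best_cost, best_x = cost, list(x)
    return best_x, best_cost


def sphere(v):
    """Toy objective with its minimum 0 at the origin."""
    return sum(c * c for c in v)


print(abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0))
```

For rendezvous node selection, a candidate solution would instead encode a set of nodes and the cost would combine the paper's factors; that mapping is left open here.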
State-of-the-art text spotting systems typically aim to detect isolated words or read text word by word in images of natural scenes, and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) deep learning model that reads text in images with the help of understanding context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order text regions. A...
In recent years, cross-modal retrieval has been a popular research topic in both computer vision and natural language processing. There is a huge semantic gap between different modalities on account of their heterogeneous properties, and establishing the correlation among data of different modalities faces enormous challenges. In this work, we propose a novel end-to-end framework named Dual Multi-Angle Self-Attention (DMASA) for cross-modal retrieval. Multiple self-attention mechanisms are applied to extract...
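The basic building block behind such self-attention mechanisms is scaled dot-product attention. The sketch below is the standard single-head form in plain Python, assuming toy projection matrices; it does not reproduce DMASA's multi-angle or dual-branch design, which the abstract does not detail, and `self_attention` and its helpers are hypothetical names.

```python
import math


def softmax(row):
    top = max(row)                          # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain-list matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


tokens = [[1.0, 0.0], [0.0, 1.0]]           # two toy feature vectors (e.g. regions or words)
identity = [[1.0, 0.0], [0.0, 1.0]]         # toy projections; learned in a real model
output, attn = self_attention(tokens, identity, identity, identity)
print(attn)
```

Each row of `attn` is a probability distribution over the sequence, so every output vector is a weighted mix of the value vectors; running several such heads with different projections is one common way to attend to features from multiple "angles".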
Scene text recognition is the task of recognizing character sequences in images of natural scenes. The considerable diversity in the appearance of text in a scene image and the potentially highly complex backgrounds make text recognition challenging. Previous approaches employ sequence generators to analyze text regions and, subsequently, compare the candidate sequences against a language model. In this work, we propose a bimodal framework that simultaneously utilizes visual and linguistic information to enhance recognition performance. Our linguistically aware learning...
Bronze inscription is one of the earliest well-established writing systems, dating back to the Shang dynasty in China. Bronze inscription character recognition plays an important role in the identification and interpretation of bronze inscriptions, which is traditionally a tough and challenging task. To deal with the class imbalance of the training data in bronze inscription recognition, we propose a method based on few-shot learning. The process consists of three stages. In the first stage, the model is pretrained on a large-scale dataset with a novel negative margin loss. In the second stage, the weights...
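A margin-based cosine softmax loss with a negative margin can be sketched for a single sample. The exact loss used in the paper is not given in the abstract; this sketch assumes the common form in which a margin m is subtracted from the target-class cosine before scaling, so choosing m < 0 enlarges the target logit and makes the loss more lenient. `margin_cosine_loss`, `margin` and `scale` are hypothetical names and illustrative values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def margin_cosine_loss(feature, class_weights, label, margin=-0.3, scale=10.0):
    """Cross-entropy over scaled cosine logits with margin m on the true class y:
    loss = -log(exp(s*(cos_y - m)) / (exp(s*(cos_y - m)) + sum_{j != y} exp(s*cos_j))).
    margin < 0 is a negative margin; margin > 0 is the usual additive cosine margin."""
    logits = [scale * cosine(feature, w) for w in class_weights]
    logits[label] = scale * (cosine(feature, class_weights[label]) - margin)
    top = max(logits)                                  # log-sum-exp for stability
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]


weights = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]         # toy class prototypes
feature = [0.9, 0.3]
print(margin_cosine_loss(feature, weights, 0, margin=-0.3))
print(margin_cosine_loss(feature, weights, 0, margin=0.0))
```

During pretraining the loss would be averaged over a batch and minimized with respect to both the feature extractor and the class weights; only the forward computation is shown.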