Fan Wang

ORCID: 0000-0001-7320-1119
Research Areas
  • Particle physics theoretical and experimental studies
  • Quantum Chromodynamics and Particle Interactions
  • High-Energy Particle Collisions Research
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Advanced Image Processing Techniques
  • Neutrino Physics Research
  • Advanced Vision and Imaging
  • Visual Attention and Saliency Detection
  • Gait Recognition and Analysis
  • Computer Graphics and Visualization Techniques
  • Robot Manipulation and Learning
  • Digital Media and Visual Art
  • Image Processing Techniques and Applications
  • Robotics and Sensor-Based Localization
  • 3D Shape Modeling and Analysis
  • Advanced Fiber Optic Sensors
  • Energy Load and Power Forecasting
  • Image Enhancement Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Smart Grid and Power Systems

University of South China
2022-2025

Alibaba Group (United States)
2020-2024

China Southern Power Grid (China)
2023-2024

Amazon (United States)
2023-2024

Alibaba Group (China)
2021-2024

University of Chinese Academy of Sciences
2024

Institute of Biophysics
2024

Chinese Academy of Sciences
2019-2024

Dalian University of Technology
2024

Shandong Normal University
2024

Extracting robust feature representation is one of the key challenges in object re-identification (ReID). Although convolution neural network (CNN)-based methods have achieved great success, they only process one local neighborhood at a time and suffer from information loss on details caused by convolution and downsampling operators (e.g. pooling and strided convolution). To overcome these limitations, we propose a pure transformer-based object ReID framework named TransReID. Specifically, we first encode an image as a sequence...

10.1109/iccv48922.2021.01474 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Locating 3D objects from a single RGB image via Perspective-n-Points (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, so that 2D-3D point correspondences can be partly learned by backpropagating the gradient w.r.t. object pose. Yet, learning the entire set of unrestricted points from scratch fails to converge with existing approaches, since the deterministic pose is inherently non-differentiable. In this...

10.1109/cvpr52688.2022.00280 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Existing semantic segmentation works have been mainly focused on designing effective decoders; however, the computational load introduced by the overall structure has long been ignored, which hinders their applications on resource-constrained hardware. In this paper, we propose a head-free lightweight architecture specifically for semantic segmentation, named Adaptive Frequency Transformer (AFFormer). AFFormer adopts a parallel architecture to leverage prototype representations as specific learnable local descriptions...

10.1609/aaai.v37i1.25126 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

To build a high-quality open-domain chatbot, we introduce the effective training process of PLATO-2 via curriculum learning. There are two stages involved in the learning process. In the first stage, a coarse-grained generation model is trained to learn response generation under the simplified framework of one-to-one mapping. In the second stage, a fine-grained generative model augmented with latent variables and an evaluation model are further trained to generate diverse responses and to select the best response, respectively. PLATO-2 was trained on both Chinese and English data, whose...

10.18653/v1/2021.findings-acl.222 article EN cc-by 2021-01-01

Person Re-identification (ReID) plays a more and more crucial role in recent years with a wide range of applications. Existing ReID methods are suffering from the challenges of misalignment and occlusions, which degrade the performance dramatically. Most methods tackle such challenges by utilizing external tools to locate body parts or exploiting matching strategies. Nevertheless, the inevitable domain gap between the datasets utilized for external tools and the ReID datasets, and the complicated matching process, make these methods unreliable and sensitive to noises. In this paper, we propose Region...

10.1109/tifs.2023.3318956 article EN IEEE Transactions on Information Forensics and Security 2023-09-25

Locating 3D objects from a single RGB image via Perspective-n-Point (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, allowing for partial learning of 2D-3D point correspondences by backpropagating the gradients of the pose loss. Yet, learning the entire correspondences from scratch is highly challenging, particularly for ambiguous pose solutions, where the globally optimal pose is theoretically non-differentiable w.r.t. the points. In this paper, we propose...

10.1109/tpami.2024.3354997 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-01-01

We developed a large multiplexing capacity dense ultra-short (DUS)-FBG array for high spatial resolution distributed sensing applications. With identical central wavelength and low peak reflectivity (−40 dB), all the FBGs share a short length (1 mm) and an extremely small spacing (500 μm). The lower crosstalk of the DUS-FBG array is investigated through both simulation and experiment. Use of the DUS-FBG array interrogated by optical frequency domain reflectometry (OFDR) for distributed temperature and non-uniform strain sensing was conducted. demonstrated over...

10.1364/oe.25.028112 article EN cc-by Optics Express 2017-10-30

Over the past two decades, traditional block-based video coding has made remarkable progress and spawned a series of well-known standards such as MPEG-4, H.264/AVC and H.265/HEVC. On the other hand, deep neural networks (DNNs) have shown their powerful capacity for visual content understanding, feature extraction and compact representation. Some previous works have explored learnt video coding algorithms in an end-to-end manner, which show great potential compared with traditional methods. In this paper, we propose a neural video coding framework (NVC),...

10.1109/tcsvt.2020.3035680 article EN IEEE Transactions on Circuits and Systems for Video Technology 2020-11-03

Extracting robust feature representation is one of the key challenges in object re-identification (ReID). Although convolution neural network (CNN)-based methods have achieved great success, they only process one local neighborhood at a time and suffer from information loss on details caused by convolution and downsampling operators (e.g. pooling and strided convolution). To overcome these limitations, we propose a pure transformer-based object ReID framework named TransReID. Specifically, we first encode an image as a sequence...

10.48550/arxiv.2102.04378 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Vision transformers (ViTs) have been an alternative design paradigm to convolutional neural networks (CNNs). However, the training of ViTs is much harder than CNNs, as it is sensitive to the training parameters, such as learning rate, optimizer and warmup epoch. The reasons for training difficulty are empirically analysed in the paper Early Convolutions Help Transformers See Better, and the authors conjecture that the issue lies with the patchify-stem of ViT models. In this paper, we further investigate this problem and extend the above conclusion: only...

10.1609/aaai.v36i3.20150 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation to leverage the 2D diffusion priors to supervise the generation of 3D models, e.g., NeRF. However, score distillation is prone to suffer the view inconsistency problem, and implicit NeRF modeling can also lead to an arbitrary shape, thus leading to less realistic and uncontrollable 3D generation. In this work, we propose a flexible framework of Points-to-3D to bridge the gap between...

10.1145/3581783.3612232 article EN 2023-10-26

The quadratic computational complexity to the number of tokens limits the practical applications of Vision Transformers (ViTs). Several works propose to prune redundant tokens to achieve efficient ViTs. However, these methods generally suffer from (i) dramatic accuracy drops, (ii) application difficulty in the local vision transformer, and (iii) non-general-purpose networks for downstream tasks. In this work, we propose a novel Semantic Token ViT (STViT), for efficient global and local vision transformers, which can also be revised to serve as backbone...

10.1109/cvpr52729.2023.00600 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Image-text retrieval is a central problem for understanding the semantic relationship between vision and language, and serves as the basis of various visual and language tasks. Most previous works either simply learn coarse-grained representations of the overall image and text, or elaborately establish the correspondence between image regions or pixels and text words. However, the close relations between coarse- and fine-grained representations for each modality are important for image-text retrieval but almost neglected. As a result, such previous works inevitably suffer from low retrieval accuracy or heavy...

10.1109/tip.2023.3286710 article EN IEEE Transactions on Image Processing 2023-01-01

10.1109/tcsvt.2025.3543384 article EN IEEE Transactions on Circuits and Systems for Video Technology 2025-01-01

This paper explores the application of the Orff music concept in mass entertainment. Orff music is an artistic form that is holistic, natural and closely connected to life. Its elemental nature encompasses multiple elements, breaking the excessive emphasis on traditional musical elements and returning to expression and perception. When applied to the mass entertainment experience, it has the advantages of being simple to operate and highly improvisational, enabling people of different age groups to participate easily and enhancing interactive cooperation....

10.32996/jhsss.2025.7.4.4 article EN Journal of Humanities and Social Sciences Studies 2025-04-06

With the advancement of geo-systems and increased availability of satellite data, a plethora of Land-Use Land-Cover (LULC) products have been developed. The existing LULC products primarily relied on time-series satellite imagery to classify land cover by pixel-based classifiers, allowing for local analysis and accurate boundary detection. However, the advent of deep learning has shifted towards the use of patch-based CNN models for generating land cover maps. In this paper, (1) we create a training dataset for China using a voting strategy based on three...

10.1109/tgrs.2023.3285912 article EN IEEE Transactions on Geoscience and Remote Sensing 2023-01-01

Current research on cross-modal retrieval is mostly English-oriented, owing to the availability of a large number of English-oriented human-labeled vision-language corpora. In order to break the limit of non-English labeled data, cross-lingual cross-modal retrieval (CCR) has attracted increasing attention. Most CCR methods construct pseudo-parallel vision-language corpora via Machine Translation (MT) to achieve cross-lingual transfer. However, the translated sentences from MT are generally imperfect in describing the corresponding visual contents. Improperly assuming...

10.1109/tip.2024.3365248 article EN IEEE Transactions on Image Processing 2024-01-01

Video object detection (VOD) has been a rising topic in recent years due to challenges such as occlusion, motion blur, etc. To deal with these challenges, feature aggregation from local or global support frames is verified effective. To exploit better feature aggregation, in this paper, we propose two improvements over previous works: a class-constrained spatial-temporal relation network and a correlation-based feature alignment module. For the class-constrained spatial-temporal relation network, it operates on the region proposals, and learns two kinds...

10.1145/3394171.3413927 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Instance segmentation is an accurate and reliable method to segment adhesive pigs’ images, and is critical for providing health and welfare information on individual pigs, such as body condition score, live weight, and activity behaviors in group-housed pig environments. In this paper, a PigMS R-CNN framework based on mask scoring R-CNN (MS R-CNN) is explored to segment adhesive pig areas in group-pig images, to separate the identification and location of group-housed pigs. The PigMS R-CNN consists of three processes. First, a residual network of 101-layers, combined with the feature pyramid...

10.3390/s21093251 article EN cc-by Sensors 2021-05-07

Motion recognition is a promising direction in computer vision, but the training of video classification models is much harder than images due to insufficient data and considerable parameters. To get around this, some works strive to explore multimodal cues from RGB-D data. Although improving motion recognition to some extent, these methods still face sub-optimal situations in the following aspects: (i) Data augmentation, i.e., the scale of the RGB-D datasets is still limited, and few efforts have been made to explore novel data augmentation strategies for videos; (ii)...

10.1109/tpami.2023.3274783 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-05-11

This paper introduces Amazon Robotic Manipulation Benchmark (ARMBench), a large-scale, object-centric benchmark dataset for robotic manipulation in the context of a warehouse. Automation of operations in modern warehouses requires a robotic manipulator to deal with a wide variety of objects, unstructured storage, and dynamically changing inventory. Such settings pose challenges in perceiving the identity, physical characteristics, and state of objects during manipulation. Existing datasets for robotic manipulation consider a limited set of objects or utilize 3D...

10.1109/icra48891.2023.10160846 article EN 2023-05-29