Fan Ma

ORCID: 0000-0002-4131-1222
Research Areas
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Video Analysis and Summarization
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Image Retrieval and Classification Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Smart Agriculture and AI
  • Video Surveillance and Tracking Methods
  • Functional Brain Connectivity Studies
  • Computer Graphics and Visualization Techniques
  • Human Motion and Animation
  • Anomaly Detection Techniques and Applications
  • Handwritten Text Recognition Techniques
  • Maritime Ports and Logistics
  • 3D Shape Modeling and Analysis
  • Machine Learning and Data Classification
  • Advanced Text Analysis Techniques
  • Evaluation Methods in Various Fields
  • Robotics and Sensor-Based Localization
  • Diabetic Foot Ulcer Assessment and Management
  • Land Use and Ecosystem Services
  • Impact of Light on Environment and Health

Zhejiang University
2024

Huazhong University of Science and Technology
2012-2024

Zhejiang University of Science and Technology
2024

Hefei Institutes of Physical Science
2022-2023

Chinese Academy of Sciences
2022-2023

University of Technology Sydney
2018-2022

Xi'an Jiaotong University
2017

In this paper, we study object detection using a large pool of unlabeled images and only a few labeled images per category, named "few-example object detection". The key challenge consists in generating as many trustworthy training samples as possible from the pool. Using the few training examples as seeds, our method iterates between model training and high-confidence sample selection. In training, easy samples are generated first and then the poorly initialized model undergoes improvement. As the model becomes more discriminative, challenging but reliable samples are selected....

10.1109/tpami.2018.2844853 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-06-07

As an important area in computer vision, object tracking has formed two separate communities that respectively study Single Object Tracking (SOT) and Multiple Object Tracking (MOT). However, current methods in one tracking scenario are not easily adapted to the other due to the divergent training datasets and tracking objects of both tasks. Although UniTrack [45] demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit large-scale tracking datasets for training and performs poorly on single object tracking. In this work,...

10.1109/cvpr52688.2022.00858 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

10.1109/cvpr52733.2024.00651 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Reconstructing perceived images from human brain activity forms a crucial link between human and machine learning through Brain-Computer Interfaces. Early methods primarily focused on training separate models for each individual to account for individual variability in brain activity, overlooking valuable cross-subject commonalities. Recent advancements have explored multisubject methods, but these approaches face significant challenges, particularly in data privacy and in effectively managing individual variability. To overcome these challenges, we...

10.48550/arxiv.2501.14309 preprint EN arXiv (Cornell University) 2025-01-24

Reconstructing perceived images from human brain activity forms a crucial link between human and machine learning through Brain-Computer Interfaces. Early methods primarily focused on training separate models for each individual to account for individual variability in brain activity, overlooking valuable cross-subject commonalities. Recent advancements have explored multisubject methods, but these approaches face significant challenges, particularly in data privacy and in effectively managing individual variability. To overcome these challenges, we...

10.1609/aaai.v39i13.33579 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

A major challenge that arises in Weakly Supervised Object Detection (WSOD) is that only image-level labels are available, whereas WSOD trains instance-level object detectors. A typical approach to WSOD is to 1) generate a series of region proposals for each image and assign the image-level label to all the proposals in that image; 2) train a classifier using all the proposals; and 3) use the classifier to select proposals with high confidence scores as positive instances for another round of training. In this way, the image-level labels are iteratively transferred to instance-level labels.

10.1145/3123266.3123455 article EN Proceedings of the 25th ACM International Conference on Multimedia 2017-10-19
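
The three-step loop described in the weakly supervised detection abstract above can be illustrated with a toy sketch. This is not the authors' method: the 1-D proposal features, the nearest-centroid classifier, and the names `train_centroids`, `score`, and `transfer_labels` are all hypothetical choices made for illustration, and the sketch assumes at least one negative image is present.

```python
from statistics import mean


def train_centroids(features, labels):
    """Fit a toy 1-D nearest-centroid classifier: one mean per class."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    return mean(pos), mean(neg)


def score(feature, centroids):
    """Confidence in [0, 1] that a proposal is positive."""
    pos_c, neg_c = centroids
    d_pos, d_neg = abs(feature - pos_c), abs(feature - neg_c)
    if d_pos + d_neg == 0:
        return 0.5
    # Closer to the positive centroid means a higher score.
    return d_neg / (d_pos + d_neg)


def transfer_labels(images, image_labels, rounds=3, threshold=0.5):
    """Iteratively turn image-level labels into proposal-level labels."""
    # Step 1: every proposal inherits the label of its image.
    proposal_labels = [[y] * len(props) for props, y in zip(images, image_labels)]
    for _ in range(rounds):
        # Step 2: train a classifier on all proposals with their current labels.
        feats = [f for props in images for f in props]
        labs = [l for ls in proposal_labels for l in ls]
        centroids = train_centroids(feats, labs)
        # Step 3: in positive images, keep only confident proposals as positives
        # (the top-scoring proposal is always kept so the image stays positive).
        for i, (props, y) in enumerate(zip(images, image_labels)):
            if y == 0:
                continue
            scores = [score(f, centroids) for f in props]
            best = max(range(len(props)), key=scores.__getitem__)
            proposal_labels[i] = [
                1 if (s >= threshold or j == best) else 0
                for j, s in enumerate(scores)
            ]
    return proposal_labels


# Hypothetical data: two positive images and one negative image.
images = [[0.9, 0.1, 0.2], [0.8, 0.15], [0.1, 0.2, 0.05]]
image_labels = [1, 1, 0]
print(transfer_labels(images, image_labels))
# prints [[1, 0, 0], [1, 0], [0, 0, 0]]
```

In this toy run the background proposals in the positive images are relabeled as negatives after the first round, and the labels stay stable in later rounds.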

Actor and action video segmentation with language queries aims to segment out the expression referred objects in the video. This process requires comprehensive language reasoning and fine-grained video understanding. Previous methods mainly leverage dynamic convolutional networks to match visual and semantic representations. However, the dynamic convolution neglects spatial context when processing each region in the frame and is thus challenging to segment similar objects in complex scenarios. To address such limitation, we construct a context modulated dynamic convolutional network....

10.1609/aaai.v34i07.6895 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Although deep neural networks have been proved effective in many applications, they are data hungry, and training deep models often requires laboriously labeled data. However, when labeled data contain erroneous labels, they often lead to model performance degradation. A common solution is to assign each sample a dynamic weight during optimization, and the weight is adjusted in accordance with the loss. However, those weights are usually unreliable since they are measured by the losses of corrupted labels. Thus, this scheme might impede the discriminative ability of neural networks trained...

10.1109/tnnls.2021.3073248 article EN IEEE Transactions on Neural Networks and Learning Systems 2021-05-07
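
The "common solution" mentioned in the abstract above, weighting each sample according to its loss, can be sketched as follows. This illustrates the baseline scheme that the abstract critiques, not the paper's own method; the exponential weighting, the `temperature` parameter, and the function names are assumptions made for illustration.

```python
import math


def loss_based_weights(losses, temperature=1.0):
    """Assumed scheme: give lower weight to samples with higher loss.

    Weights are a softmax over negative losses, so they sum to 1.
    """
    smallest = min(losses)  # shift for numerical stability
    raw = [math.exp(-(loss - smallest) / temperature) for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_loss(losses, weights):
    """Weighted training objective for one batch."""
    return sum(w * loss for w, loss in zip(weights, losses))


# Hypothetical per-sample losses; the third sample may carry a corrupted label.
batch_losses = [0.1, 0.2, 3.0]
weights = loss_based_weights(batch_losses)
print(weights, weighted_loss(batch_losses, weights))
```

Because the weights are computed from losses measured against possibly corrupted labels, a clean but hard sample is down-weighted just like a mislabeled one, which is the unreliability the abstract points out.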

10.1109/cvpr52733.2024.00030 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

10.1109/cvpr52733.2024.01413 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

10.1007/s11263-022-01600-0 article EN International Journal of Computer Vision 2022-03-19

Inland all-electric ship is receiving more and more attention due to its advantages in terms of low carbon, energy conservation, and flexibility in operation mode. However, its extensive promotion and application are held back by various problems such as long charging time, high cost of energy supplement, environmental factors, etc. In addition, the complexity of the electric ship fleet also poses a challenge to its optimal operation. Hence, an optimal voyage scheduling model of the inland all-electric ship fleet, with the departure time and speed of each ship as decision variables, is firstly...

10.1109/acpee60788.2024.10532657 article EN 2024 9th Asia Conference on Power and Electrical Engineering (ACPEE) 2024-04-11

CLIP-based models have made significant advancements in text-to-image retrieval tasks. However, these models are typically trained on public datasets with all parameters optimized, which limits their ability to generalize and adapt quickly to personalized private datasets. In this paper, we introduce a lightweight federated learning solution, namely Federated Personalized Augmentation Model (FedPAM), to achieve text-to-image retrieval from multiple databases. Specifically, for the query text,...

10.1145/3652583.3657627 article EN 2024-05-30

10.1109/cvpr52733.2024.02545 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

The local climate zone (LCZ) classification system provides a standard urban morphological classification for urban heat island studies and weather modelling. Based on the definition of LCZ, various semi-supervised approaches have been proposed to generate LCZ maps for different cities using available satellite data. Given that the acquisition of training data is labor intensive, it is practical to develop new models that are suitable for any cities without the need for training data/samples. In this study, a novel domain-adaptation co-training approach with...

10.1109/igarss.2017.8127175 article EN 2017-07-01

Recent advances in large video-language models have displayed promising outcomes in video comprehension. Current approaches straightforwardly convert video into language tokens and employ large language models for multi-modal tasks. However, this method often leads to the generation of irrelevant content, commonly known as "hallucination", as the length of the text increases and the impact of the video diminishes. To address this problem, we propose Vista-LLaMA, a novel framework that maintains a consistent distance between all visual tokens and any language tokens, irrespective...

10.48550/arxiv.2312.08870 preprint EN cc-by arXiv (Cornell University) 2023-01-01