Siming Fu

ORCID: 0000-0003-3257-1011
Research Areas
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Generative Adversarial Networks and Image Synthesis
  • COVID-19 diagnosis using AI
  • Medical Image Segmentation Techniques
  • Computer Graphics and Visualization Techniques
  • Image Retrieval and Classification Techniques
  • Network Packet Processing and Optimization
  • 3D Shape Modeling and Analysis
  • Anomaly Detection Techniques and Applications
  • Advanced Image Processing Techniques
  • Image Processing and 3D Reconstruction
  • Machine Learning and ELM
  • Visual Attention and Saliency Detection
  • Remote Sensing and LiDAR Applications
  • Geophysical Methods and Applications
  • International Business and FDI
  • Genomics and Phylogenetic Studies
  • International Arbitration and Investment Law
  • Image and Video Quality Assessment
  • Topic Modeling
  • 3D Surveying and Cultural Heritage
  • Video Surveillance and Tracking Methods
  • Imbalanced Data Classification Techniques

Zhejiang University
2021-2024

State Key Laboratory of Clean Energy Utilization
2021

Change detection in remote sensing imagery is a critical technique for Earth observation, primarily focusing on pixel-level segmentation of change regions between bi-temporal images. The essence lies in determining whether corresponding pixels in the bi-temporal images have changed. In deep learning, the spatial and channel dimensions of feature maps represent different information from the original images. In this study, we found that in change detection tasks, difference information can be computed not only from the spatial dimension of bi-temporal features but also from the channel dimension. Therefore,...

10.48550/arxiv.2501.10905 preprint EN arXiv (Cornell University) 2025-01-18

Customized generation has achieved significant progress in image synthesis, yet personalized video generation remains challenging due to temporal inconsistencies and quality degradation. In this paper, we introduce CustomVideoX, an innovative framework leveraging the video diffusion transformer for personalized video generation from a reference image. CustomVideoX capitalizes on pre-trained video networks by exclusively training the LoRA parameters to extract reference features, ensuring both efficiency and adaptability. To facilitate seamless interaction between...

10.48550/arxiv.2502.06527 preprint EN arXiv (Cornell University) 2025-02-10

Auto-regressive models have made significant progress in the realm of text-to-image synthesis, yet devising an appropriate model architecture and training strategy to achieve a satisfactory level remains an important avenue of exploration. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual...

10.1609/aaai.v39i16.33882 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Personalized text-to-image generation methods can generate customized images based on the reference images, which have garnered wide research interest. Recent methods propose a finetuning-free approach with a decoupled cross-attention mechanism to generate personalized images requiring no test-time finetuning. However, when multiple reference images are provided, the current mechanism encounters the object confusion problem and fails to map each reference image to its corresponding object, thereby seriously limiting its scope of application. To address this problem, in this...

10.1609/aaai.v39i4.32386 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Existing image semantic segmentation methods favor learning consistent representations by extracting long-range contextual features with attention, multi-scale, or graph aggregation strategies. These methods usually treat misclassified and correctly classified pixels equally, hence misleading the optimization process and causing inconsistent intra-class pixel feature representations in the embedding space during learning. In this paper, we propose the auxiliary representation calibration head (RCH), which consists of...

10.1609/aaai.v36i3.20145 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Recently, large-scale pre-trained vision-language models have presented benefits for alleviating class imbalance in long-tailed recognition. However, the long-tailed data distribution can corrupt the representation space, where the distance between head and tail categories is much larger than the distance between two tail categories. This uneven feature space causes the model to exhibit unclear and inseparable decision boundaries on the uniformly distributed test set, which lowers its performance. To address these challenges, we propose category...

10.1145/3581783.3611904 article EN 2023-10-26

Training AI models has always been challenging, especially when there is a need for custom models to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, making it even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) Agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities...

10.48550/arxiv.2311.06622 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

Though diffusion models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies, they don't generalize well on long-tailed datasets due to minority classes lacking and semantic information. To overcome the aforementioned challenges, we first take a closer look at the collapse of tail category patterns under long-tail distributed data and propose an alternative but easy-to-use and effective solution, the Long-Tailed Bias Solver model for image synthesis (LTB-Solver),...

10.2139/ssrn.4822238 preprint EN 2024-01-01

Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized text-to-image applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subject in accordance with the textual descriptions; and secondly, the difficulty in achieving a cohesive representation of multiple subjects in a single image without...

10.48550/arxiv.2406.07209 preprint EN arXiv (Cornell University) 2024-06-11

Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual information, freezing the textual component while fine-tuning the visual component. This methodology...

10.48550/arxiv.2407.07614 preprint EN arXiv (Cornell University) 2024-07-10

Personalized text-to-image generation methods can generate customized images based on the reference images, which have garnered wide research interest. Recent methods propose a finetuning-free approach with a decoupled cross-attention mechanism to generate personalized images requiring no test-time finetuning. However, when multiple reference images are provided, the current mechanism encounters the object confusion problem and fails to map each reference image to its corresponding object, thereby seriously limiting its scope of application. To address this problem, in this...

10.48550/arxiv.2409.17920 preprint EN arXiv (Cornell University) 2024-09-26

Long-tail learning seeks to address the key issue of head classes dominating the learning process under extreme class imbalance in real-world circumstances. Data augmentation, which tries to pack a set of augmentation approaches to increase the size and quality of datasets for model training, has been shown to be a worthwhile research topic. The long-tail problem cannot be solved using current data augmentation techniques. The subject of how to undertake long-tailed data augmentation more effectively is yet unanswered. A diffusion-based method, referred to as DiffuRC, enables...

10.2139/ssrn.4341206 article EN 2023-01-01

Recently, large-scale pre-trained vision-language models have presented benefits for alleviating class imbalance in long-tailed recognition. However, the long-tailed data distribution can corrupt the representation space, where the distance between head and tail categories is much larger than the distance between two tail categories. This uneven feature space causes the model to exhibit unclear and inseparable decision boundaries on the uniformly distributed test set, which lowers its performance. To address these challenges, we propose category...

10.48550/arxiv.2308.12522 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Video surveillance systems are playing increasingly important roles in our everyday lives. To get meaningful information in a timely and accurate manner, it is vital to optimally allocate computation and communication resources for image classification tasks. In this paper, taking face recognition as an example, we propose a novel end-to-edge collaborative computing system based on a multi-exit network to dynamically allocate resources at the front end (the camera sensor) and the back end (the mobile edge server). With the ε-greedy algorithm...

10.1109/icc45041.2023.10279166 article EN ICC 2023 - IEEE International Conference on Communications 2023-05-28

Long-tailed learning aims to tackle the crucial challenge that head classes dominate the training procedure under severe class imbalance in real-world scenarios. However, little attention has been given to how to quantify the dominance severity of head classes in the representation space. Motivated by this, we generalize cosine-based classifiers to a von Mises-Fisher (vMF) mixture model, denoted as the vMF classifier, which enables us to quantitatively measure representation quality upon the hyper-sphere space via calculating the distribution overlap...

10.48550/arxiv.2208.10043 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Data-free quantization has recently been a promising method to perform quantization without access to the original data. However, a drawback of such approaches is the homogenization of synthetic data due to the low efficiency for diverse data generation and the performance collapse of the generator. To alleviate the above issue, we propose a novel Meta-BNS adversarial data-free quantization scheme which consists of a Meta-BNS module and an adversarial exploration module. The Meta-BNS module automatically learns an enhancement coefficient matrix for the function of the BN loss to provide a suitable constraint on the generator. Adversarial...

10.1109/icip46576.2022.9897652 article EN 2022 IEEE International Conference on Image Processing (ICIP) 2022-10-16

While network binarization is a promising method for memory saving and speedup on hardware, it inevitably leads to residual errors in intermediate features, resulting in performance and capability degradation. To alleviate the above issue, we focus on architecture design for a more suitable structure for the extreme-low bit scenario. In this paper, we propose a baseline-auxiliary architecture to compensate the intermediate features via searching auxiliary branches guided by feature similarity and confidence score. The feature maps are reasonably enhanced...

10.1145/3579109.3579132 article EN 2022-12-23