NFDI4DS | UHH-SEMS - Publication Details

A Remote Sensing Image Change Detection Method Integrating Layer Exchange and Channel-Spatial Differences

OPENALEX - Publications

Sijun Dong, F. Zuo, Geng Chen, Siming Fu, Xiaoliang Meng

Change detection in remote sensing imagery is a critical technique for Earth observation, primarily focusing on pixel-level segmentation of change regions between bi-temporal images. The essence lies in determining whether corresponding pixels in the bi-temporal images have changed. In deep learning, the spatial and channel dimensions of feature maps represent different information from the original images. In this study, we found that in change detection tasks, difference information can be computed not only from the spatial dimension of bi-temporal features but also from the channel dimension. Therefore,...

10.48550/arxiv.2501.10905 preprint EN arXiv (Cornell University) 2025-01-18

CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers

Dongmei She, Mushui Liu, Jihong Pang, Jin Wang, Zhen Yang, and 7 more

Customized generation has achieved significant progress in image synthesis, yet personalized video generation remains challenging due to temporal inconsistencies and quality degradation. In this paper, we introduce CustomVideoX, an innovative framework leveraging the video diffusion transformer for personalized video generation from a reference image. CustomVideoX capitalizes on pre-trained video networks by exclusively training the LoRA parameters to extract reference features, ensuring both efficiency and adaptability. To facilitate seamless interaction between...

10.48550/arxiv.2502.06527 preprint EN arXiv (Cornell University) 2025-02-10

LTB-Solver: Long-tailed Bias Solver for image synthesis of diffusion models

Siming Fu, Xiaoxuan He, Haoji Hu

10.1016/j.neucom.2025.129651 article EN Neurocomputing 2025-02-01

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, and 8 more

Auto-regressive models have made significant progress in the realm of text-to-image synthesis, yet devising an appropriate model architecture and training strategy to achieve a satisfactory level remains an important avenue of exploration. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual...

10.1609/aaai.v39i16.33882 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Qihan Huang, Siming Fu, Jinlong Liu, Hao Jiang, Yan Yu, and 1 more

Personalized text-to-image generation methods can generate customized images based on the reference images, which have garnered wide research interest. Recent methods propose a finetuning-free approach with a decoupled cross-attention mechanism to generate personalized images requiring no test-time finetuning. However, when multiple reference images are provided, the current decoupled cross-attention mechanism encounters the object confusion problem and fails to map each reference image to its corresponding object, thereby seriously limiting its scope of application. To address the object confusion problem, in this...

10.1609/aaai.v39i4.32386 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Comparatively investigation of real MSW and biomass tar treatment by a rotating gliding arc

Xiangzhi Kong, Angjian Wu, Siming Fu, Ruiyang Xu, Yucheng Zhao, and 2 more

10.1016/j.fuel.2021.120745 article EN Fuel 2021-04-14

Class semantic enhancement network for semantic segmentation

Siming Fu, Hualiang Wang, Haoji Hu, Xiaoxuan He, Yongwen Long, and 4 more

10.1016/j.jvcir.2023.103924 article EN Journal of Visual Communication and Image Representation 2023-08-18

SemiGMMPoint: Semi-supervised point cloud segmentation based on Gaussian mixture models

Xianwei Zhuang, Hualiang Wang, Xiaoxuan He, Siming Fu, Haoji Hu

10.1016/j.patcog.2024.111045 article EN Pattern Recognition 2024-09-01

Renovate Yourself: Calibrating Feature Representation of Misclassified Pixels for Semantic Segmentation

Hualiang Wang, Huanpeng Chu, Siming Fu, Zuozhu Liu, Haoji Hu

Existing image semantic segmentation methods favor learning consistent representations by extracting long-range contextual features with the attention, multi-scale, or graph aggregation strategies. These methods usually treat the misclassified and correctly classified pixels equally, hence misleading the optimization process and causing inconsistent intra-class pixel feature representations in the embedding space during learning. In this paper, we propose the auxiliary representation calibration head (RCH), which consists of...

10.1609/aaai.v36i3.20145 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition

Xiaoxuan He, Siming Fu, Xinpeng Ding, Yuchen Cao, Hualiang Wang

Recently, large-scale pre-trained vision-language models have presented benefits for alleviating class imbalance in long-tailed recognition. However, the long-tailed data distribution can corrupt the representation space, where the distance between head and tail categories is much larger than the distance between two tail categories. This uneven feature space distribution causes the model to exhibit unclear and inseparable decision boundaries on the uniformly distributed test set, which lowers its performance. To address these challenges, we propose category...

10.1145/3581783.3611904 article EN 2023-10-26

TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System

Haoyuan Li, Hao Jiang, Tianke Zhang, Zhelun Yu, Aoxiong Yin, and 4 more

Training AI models has always been challenging, especially when there is a need for custom models to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, making it even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) Agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities...

10.48550/arxiv.2311.06622 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

LTB-Solver: Long-Tailed Bias Solver for Image Synthesis of Diffusion Models

Xiaoxuan He, Siming Fu, Haoji Hu

Though diffusion models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies, they don't generalize well on long-tailed datasets due to the minority classes lacking and semantic information. To overcome the aforementioned challenges, we first take a closer look at the collapse of tail category patterns under long-tail distributed data and propose an alternative but easy-to-use and effective solution, Long-Tailed Bias Solver model image synthesis (LTB-Solver),...

10.2139/ssrn.4822238 preprint EN 2024-01-01

MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Xiaohua Wang, Siming Fu, Qihan Huang, Wanggui He, Hao Jiang

Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subject in accordance with the textual descriptions; and secondly, the difficulty in achieving a cohesive representation of multiple subjects in a single image without...

10.48550/arxiv.2406.07209 preprint EN arXiv (Cornell University) 2024-06-11

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, and 8 more

Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual information, freezing the textual component while fine-tuning the visual component. This methodology...

10.48550/arxiv.2407.07614 preprint EN arXiv (Cornell University) 2024-07-10

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Qihan Huang, Siming Fu, Jinlong Liu, Hao Jiang, Yan Yu, and 1 more

Personalized text-to-image generation methods can generate customized images based on the reference images, which have garnered wide research interest. Recent methods propose a finetuning-free approach with a decoupled cross-attention mechanism to generate personalized images requiring no test-time finetuning. However, when multiple reference images are provided, the current decoupled cross-attention mechanism encounters the object confusion problem and fails to map each reference image to its corresponding object, thereby seriously limiting its scope of application. To address the object confusion problem, in this...

10.48550/arxiv.2409.17920 preprint EN arXiv (Cornell University) 2024-09-26

AuxBranch: Binarization residual-aware network design via auxiliary branch search

Siming Fu, Huanpeng Chu, Lu Yu, Bo Peng, Zheyang Li, and 2 more

10.1016/j.patcog.2022.109263 article EN Pattern Recognition 2022-12-16

Unlocking the Power of Diffusion Probabilistic Models for Long-Tailed Recognition via Data Synthesis

Siming Fu, Xiaoxuan He, Haoji Hu

Long-tail learning seeks to address the key issue of head classes dominating the learning process under extreme class imbalance in real-world circumstances. Data augmentation, which tries to pack a set of augmentation approaches to increase the size and quality of datasets for model training, has shown to be a worthwhile research topic. The long-tail problem cannot be solved using current data augmentation techniques. The subject of how to undertake long-tailed data augmentation more effectively is yet unanswered. The diffusion-based method, referred to as DiffuRC, enables...

10.2139/ssrn.4341206 article EN 2023-01-01

Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition

Siming Fu, Xiaoxuan He, Xinpeng Ding, Yuchen Cao, Hualiang Wang

Recently, large-scale pre-trained vision-language models have presented benefits for alleviating class imbalance in long-tailed recognition. However, the long-tailed data distribution can corrupt the representation space, where the distance between head and tail categories is much larger than the distance between two tail categories. This uneven feature space distribution causes the model to exhibit unclear and inseparable decision boundaries on the uniformly distributed test set, which lowers its performance. To address these challenges, we propose category...

10.48550/arxiv.2308.12522 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Video Surveillance on Mobile Edge Networks: Exploiting Multi-Exit Network

Yuchen Cao, Siming Fu, Xiaoxuan He, Haoji Hu, Hangguan Shan, and 1 more

Video surveillance systems are playing increasingly important roles in our everyday lives. To get meaningful information in a timely and accurate manner, it is vital to optimally allocate computation and communication resources for image classification tasks. In this paper, taking face recognition as an example, we propose a novel end-to-edge collaborative computing system based on a multi-exit network to dynamically allocate computation at the front end (the camera sensor) and the back end (the mobile edge server). With the ε-greedy algorithm...

10.1109/icc45041.2023.10279166 article EN ICC 2022 - IEEE International Conference on Communications 2023-05-28

Towards Calibrated Hyper-Sphere Representation via Distribution Overlap Coefficient for Long-tailed Learning

Hualiang Wang, Siming Fu, Xiaoxuan He, Hangxiang Fang, Zuozhu Liu, and 1 more

Long-tailed learning aims to tackle the crucial challenge that head classes dominate the training procedure under severe class imbalance in real-world scenarios. However, little attention has been given to how to quantify the dominance severity of head classes in the representation space. Motivated by this, we generalize the cosine-based classifiers to a von Mises-Fisher (vMF) mixture model, denoted as the vMF classifier, which enables us to quantitatively measure the representation quality upon the hyper-sphere space via calculating the distribution overlap...

10.48550/arxiv.2208.10043 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Meta-BNS for Adversarial Data-Free Quantization

Siming Fu, Hualiang Wang, Yuchen Cao, Haoji Hu, Bo Peng, and 2 more

Data-free quantization has recently been a promising method to perform quantization without access to the original data. However, a drawback of such approaches is the homogenization of synthetic data due to the low efficiency for diverse data generation and the performance collapse of the generator. To alleviate the above issue, we propose a novel Meta-BNS adversarial data-free quantization scheme, which consists of a Meta-BNS module and an adversarial exploration module. The Meta-BNS module automatically learns an enhancement coefficient matrix function of the BN loss to provide a suitable constraint on Adversarial...

10.1109/icip46576.2022.9897652 article EN 2022 IEEE International Conference on Image Processing (ICIP) 2022-10-16

Baseline-auxiliary Network Architecture Design Scheme to Compensate for Binarization Residual Errors

Siming Fu, Tian Ni, Haoji Hu

While network binarization is a promising method in memory saving and speedup on hardware, it inevitably leads to residual errors of intermediate features, resulting in performance and capability degradation. To alleviate the above issue, we focus on network architecture design of a more suitable structure for the extreme-low bit scenario. In this paper, we propose a baseline-auxiliary network architecture design to compensate for the residual errors of intermediate features via searching auxiliary branches guided by feature similarity and confidence score. The feature maps are reasonably enhanced...

10.1145/3579109.3579132 article EN 2022-12-23