Zhouxia Wang

ORCID: 0000-0003-4677-5760
Research Areas
  • Advanced Vision and Imaging
  • Generative Adversarial Networks and Image Synthesis
  • Face recognition and analysis
  • Image Retrieval and Classification Techniques
  • Human Pose and Action Recognition
  • Advanced Image Processing Techniques
  • Advanced Neural Network Applications
  • Video Analysis and Summarization
  • Text and Document Classification Technologies
  • Advanced Image and Video Retrieval Techniques
  • Image and Signal Denoising Methods
  • Topic Modeling
  • Computer Graphics and Visualization Techniques
  • Image Enhancement Techniques
  • Facial Nerve Paralysis Treatment and Research
  • Video Coding and Compression Technologies
  • Visual Attention and Saliency Detection
  • Advanced Graph Neural Networks
  • Mental Health via Writing
  • Facial Rejuvenation and Surgery Techniques
  • Advanced Memory and Neural Computing
  • Video Surveillance and Tracking Methods
  • Image Processing and 3D Reconstruction
  • Ferroelectric and Negative Capacitance Devices
  • Human Motion and Animation

University of Hong Kong
2020-2024

Nanyang Technological University
2024

Group Sense (China)
2017-2020

The Sense Innovation and Research Center
2017-2018

Sun Yat-sen University
2017-2018

This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label classification by developing a recurrent memorized-attention module. This module consists of two alternately performed...

10.1109/iccv.2017.58 article EN 2017-10-01
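The attentional-region idea above can be illustrated with a minimal sketch: soft attention scores over the spatial locations of a feature map, followed by attention-weighted pooling and per-label classification. This is not the paper's recurrent memorized-attention module, only a generic spatial-attention baseline; `w_att` and `w_cls` stand in for learned parameters and are stubbed with random values here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(feat, w_att, w_cls):
    """Soft spatial attention followed by per-label classification.

    feat  : (H, W, C) convolutional feature map
    w_att : (C,)      attention scoring vector (hypothetical, normally learned)
    w_cls : (C, L)    per-label classifier weights (hypothetical, normally learned)
    """
    H, W, C = feat.shape
    scores = feat.reshape(-1, C) @ w_att      # (H*W,) one score per location
    alpha = softmax(scores)                   # attention distribution over locations
    pooled = alpha @ feat.reshape(-1, C)      # (C,) attention-weighted feature
    return pooled @ w_cls                     # (L,) per-label logits

rng = np.random.default_rng(0)
logits = attention_pool(rng.normal(size=(7, 7, 16)),
                        rng.normal(size=16),
                        rng.normal(size=(16, 5)))
```

In a recurrent variant, the pooled feature would additionally feed back into the next attention step, which is the direction the paper's module takes.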

Social relationships (e.g., friends, couple, etc.) form the basis of the social network in our daily life. Automatically interpreting such relationships bears great potential for intelligent systems to understand human behavior in depth and to better interact with people at a social level. Human beings interpret social relationships within a group not only based on the individuals alone; the interplay with the contextual information around them also plays a significant role. However, these additional cues are largely overlooked by previous studies. We found that two...

10.24963/ijcai.2018/142 preprint EN 2018-07-01

We observed that recent state-of-the-art results on single-image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding the superior performance on static images, the application of these models to videos is not only computationally intensive, it also suffers from performance degeneration and flicking. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency and to handle severe image quality degradation (e.g. motion blur and occlusion), as well...

10.1109/cvpr.2018.00546 article EN 2018-06-01

Blind face restoration aims to recover a high-quality face image from one with unknown degradations. As a face image contains abundant contextual information, we propose a method, RestoreFormer, which explores fully-spatial attentions to model this contextual information, surpassing existing works that rely on local operators. RestoreFormer has several benefits compared to prior arts. First, unlike the conventional multi-head self-attention in previous Vision Transformers (ViTs), it incorporates a multi-head cross-attention layer to learn interactions between...

10.1109/cvpr52688.2022.01699 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
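The multi-head cross-attention mentioned above can be sketched in a few lines: queries come from one feature set (e.g., degraded-image features) while keys and values come from another (e.g., a high-quality prior), so every spatial position can attend to the prior. This is a generic illustration with identity projections, not RestoreFormer's actual layer, which uses learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query_feat, prior_feat, n_heads=4):
    """Scaled dot-product cross-attention with several heads.

    query_feat : (N, C) flattened spatial features (queries)
    prior_feat : (M, C) flattened prior features (keys and values)
    C must be divisible by n_heads; projections are identity for brevity.
    """
    N, C = query_feat.shape
    d = C // n_heads
    q = query_feat.reshape(N, n_heads, d).transpose(1, 0, 2)   # (h, N, d)
    k = prior_feat.reshape(-1, n_heads, d).transpose(1, 0, 2)  # (h, M, d)
    v = k                                                      # values share the keys here
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))      # (h, N, M)
    out = attn @ v                                             # (h, N, d)
    return out.transpose(1, 0, 2).reshape(N, C)                # merge heads

rng = np.random.default_rng(1)
fused = multi_head_cross_attention(rng.normal(size=(16, 8)),
                                   rng.normal(size=(16, 8)), n_heads=2)
```

The key difference from self-attention is only where `k` and `v` come from; swapping `prior_feat` for `query_feat` recovers ordinary self-attention.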

Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct...

10.1609/aaai.v39i5.32533 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label classification by developing a recurrent memorized-attention module. This module consists of two alternately performed...

10.48550/arxiv.1711.02816 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Blind face restoration aims at recovering high-quality face images from those with unknown degradations. Current algorithms mainly introduce priors to complement high-quality details and achieve impressive progress. However, most of these algorithms ignore the abundant contextual information in the face and its interplay with the priors, leading to sub-optimal performance. Moreover, they pay less attention to the gap between synthetic and real-world scenarios, limiting their robustness and generalization in real-world applications. In this work, we propose RestoreFormer++, which...

10.1109/tpami.2023.3315753 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-09-15

This paper presents a LoRA-free method for stylized image generation that takes a text prompt and style reference images as inputs and produces an output image in a single pass. Unlike existing methods that rely on training a separate LoRA for each style, our method can adapt to various styles with a unified model. However, this poses two challenges: 1) the prompt loses controllability over the generated content, and 2) the output inherits both the semantic and style features of the reference image, compromising its content fidelity. To address these challenges, we introduce...

10.48550/arxiv.2309.01770 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Automatically selecting an exposure bracketing (images exposed differently) is important for obtaining a high-dynamic-range image via multi-exposure fusion. Unlike previous methods that impose many restrictions, such as requiring a camera response function, a sensor noise model, and a stream of preview images with different exposures (not accessible in some scenarios, e.g. mobile applications), we propose a novel deep neural network to automatically select exposure bracketing, named EBSNet, which sufficiently...

10.1109/cvpr42600.2020.00189 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
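For context, the multi-exposure fusion that bracketing selection feeds into can be sketched with a Mertens-style well-exposedness weight: each exposure contributes most where its pixels sit near mid-gray. This is a textbook illustration of the downstream fusion step, not EBSNet itself.

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """Gaussian weight peaking at mid-gray (0.5); img values in [0, 1]."""
    return np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))

def fuse_exposures(stack):
    """Per-pixel weighted average of an exposure stack of shape (K, H, W)."""
    w = well_exposedness(stack) + 1e-8        # tiny epsilon avoids divide-by-zero
    w = w / w.sum(axis=0, keepdims=True)      # normalize weights over exposures
    return (w * stack).sum(axis=0)            # (H, W) fused image

stack = np.stack([np.full((2, 2), 0.1),      # under-exposed frame
                  np.full((2, 2), 0.5),      # well-exposed frame
                  np.full((2, 2), 0.9)])     # over-exposed frame
fused = fuse_exposures(stack)                # dominated by the mid-exposure frame
```

Full pipelines add contrast and saturation weights and blend in a Laplacian pyramid; the well-exposedness term alone already shows why the choice of bracketing matters.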

Social relationships (e.g., friends, couple, etc.) form the basis of the social network in our daily life. Automatically interpreting such relationships bears great potential for intelligent systems to understand human behavior in depth and to better interact with people at a social level. Human beings interpret social relationships within a group not only based on the individuals alone; the interplay with the contextual information around them also plays a significant role. However, these additional cues are largely overlooked by previous studies. We found that two...

10.48550/arxiv.1807.00504 preprint EN other-oa arXiv (Cornell University) 2018-01-01

In the past few years, we witnessed rapid advancement in face super-resolution from very low resolution (VLR) images. However, most previous studies focus on solving this problem without explicitly considering the impact of severe real-life image degradation (e.g. blur and noise). We show that robustly recovering facial details from VLR images is a task beyond the ability of current state-of-the-art methods. In this paper, we borrow ideas from "facial composite" and propose an alternative approach to tackle this problem. We endow...

10.1109/ictai.2019.00079 article EN 2019-11-01

Due to the limitation of event sensors, the spatial resolution of event data is relatively low compared to that of conventional frame-based cameras. However, the low-spatial-resolution events recorded by event cameras are rich in temporal information, which is helpful for image deblurring, while the intensity images captured by frame cameras have high spatial resolution and the potential to promote the quality of events. Considering the complementarity between events and images, an alternately performed model is proposed in this paper to deblur high-resolution images with the help of low-resolution events. This model is composed...

10.3390/electronics11040631 article EN Electronics 2022-02-18
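The temporal richness of events mentioned above is commonly exploited by accumulating signed polarities into a per-pixel intensity-increment map, which deblurring models then consume. A minimal sketch of that accumulation step (timestamps dropped for brevity; this is standard event preprocessing, not the paper's alternating model):

```python
import numpy as np

def accumulate_events(events, shape):
    """Sum signed event polarities into a per-pixel increment map.

    events : iterable of (x, y, polarity) tuples with polarity in {-1, +1}
    shape  : (H, W) resolution of the event sensor
    """
    acc = np.zeros(shape)
    for x, y, p in events:
        acc[y, x] += p          # positive events brighten, negative darken
    return acc

# Two ON events at (0, 0) and one OFF event at (1, 1)
ev = [(0, 0, +1), (0, 0, +1), (1, 1, -1)]
m = accumulate_events(ev, (2, 2))
```

Real pipelines bin events into short time windows (voxel grids) so the temporal ordering survives; the flat sum above is the simplest degenerate case.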

Diffusion models have demonstrated remarkable capability in video generation, which further sparks interest in introducing trajectory control into the generation process. While existing works mainly focus on training-based methods (e.g., conditional adapters), we argue that the diffusion model itself allows decent control over the generated content without requiring any training. In this study, we introduce a tuning-free framework to achieve trajectory-controllable video generation by imposing guidance on both noise construction and...

10.48550/arxiv.2406.16863 preprint EN arXiv (Cornell University) 2024-06-24
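One simple way to impose guidance through noise construction, as described above, is to re-plant the same initial-noise patch along the desired trajectory across frames, so the sampler's correlated starting point biases content toward that path. The following toy sketch illustrates that idea under my own simplifying assumptions; it is not the paper's exact algorithm.

```python
import numpy as np

def trajectory_noise(n_frames, H, W, boxes, seed=0):
    """Build initial video noise whose patch inside a moving box is shared
    across frames (a simplified form of noise-construction guidance).

    boxes : list of (top, left, h, w) per frame; all boxes share h and w.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n_frames, H, W))       # independent base noise
    t0, l0, h, w = boxes[0]
    patch = noise[0, t0:t0 + h, l0:l0 + w].copy()   # anchor-frame patch
    for f, (t, l, _, _) in enumerate(boxes):
        noise[f, t:t + h, l:l + w] = patch          # re-plant along the path
    return noise

# A 2x2 box sliding diagonally across three frames
boxes = [(0, 0, 2, 2), (1, 1, 2, 2), (2, 2, 2, 2)]
z = trajectory_noise(3, 8, 8, boxes)
```

In practice such methods also steer attention maps toward the boxes during sampling; the noise prior alone is the cheapest half of the trick.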

Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct...

10.48550/arxiv.2406.15339 preprint EN arXiv (Cornell University) 2024-06-21

Although deep learning-based image restoration methods have made significant progress, they still struggle with limited generalization to real-world scenarios due to the substantial domain gap caused by training on synthetic data. Existing methods address this issue by improving data synthesis pipelines, estimating degradation kernels, employing deep internal learning, and performing domain adaptation and regularization. Previous methods sought to bridge the domain gap by learning domain-invariant knowledge in either feature or pixel space. However,...

10.48550/arxiv.2406.18516 preprint EN arXiv (Cornell University) 2024-06-26

Recent progress in blind face restoration has resulted in producing high-quality restored results for static images. However, efforts to extend these advancements to video scenarios have been minimal, partly because of the absence of benchmarks that allow a comprehensive and fair comparison. In this work, we first present a fair evaluation benchmark, in which we introduce a Real-world Low-Quality Face Video benchmark (RFV-LQ), evaluate several leading image-based algorithms, and conduct a thorough and systematical analysis...

10.1109/tip.2024.3463414 article EN IEEE Transactions on Image Processing 2024-01-01

We observed that recent state-of-the-art results on single-image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding the superior performance on static images, the application of these models to videos is not only computationally intensive, it also suffers from performance degeneration and flicking. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency and to handle severe image quality degradation (e.g. motion blur and occlusion), as well...

10.48550/arxiv.1712.06316 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Blind face restoration aims at recovering high-quality face images from those with unknown degradations. Current algorithms mainly introduce priors to complement high-quality details and achieve impressive progress. However, most of these algorithms ignore the abundant contextual information in the face and its interplay with the priors, leading to sub-optimal performance. Moreover, they pay less attention to the gap between synthetic and real-world scenarios, limiting their robustness and generalization in real-world applications. In this work, we propose RestoreFormer++, which...

10.48550/arxiv.2308.07228 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement. Accurate control of both camera and object motion is essential for video generation. However, existing works either mainly focus on one type of motion or do not clearly distinguish between the two, limiting their control capabilities and diversity. Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion. The architecture and training strategy of MotionCtrl are...

10.48550/arxiv.2312.03641 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Recovering degraded low-resolution text images is challenging, especially for Chinese text images with complex strokes and severe degradation in real-world scenarios. Ensuring both text fidelity and style realness is crucial for high-quality text image super-resolution. Recently, diffusion models have achieved great success in natural image synthesis and restoration due to their powerful data distribution modeling abilities and data generation capabilities. In this work, we propose an Image Diffusion Model (IDM) to restore text images with realistic styles. For...

10.48550/arxiv.2312.08886 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Automatically selecting an exposure bracketing (images exposed differently) is important for obtaining a high-dynamic-range image via multi-exposure fusion. Unlike previous methods that impose many restrictions, such as requiring a camera response function, a sensor noise model, and a stream of preview images with different exposures (not accessible in some scenarios, e.g. mobile applications), we propose a novel deep neural network to automatically select exposure bracketing, named EBSNet, which sufficiently...

10.48550/arxiv.2005.12536 preprint EN other-oa arXiv (Cornell University) 2020-01-01