Haoning Wu

ORCID: 0009-0001-8717-338X
Research Areas
  • Video Analysis and Summarization
  • Advanced Vision and Imaging
  • Image Processing Techniques and Applications
  • Advanced Image Processing Techniques
  • Human Motion and Animation
  • Multimodal Machine Learning Applications
  • Medical Image Segmentation Techniques
  • Computer Graphics and Visualization Techniques
  • Digital Storytelling and Education
  • Image and Signal Denoising Methods
  • Microfluidic and Capillary Electrophoresis Applications
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Conducting polymers and applications
  • Advanced Image Fusion Techniques
  • Radiomics and Machine Learning in Medical Imaging
  • Imbalanced Data Classification Techniques
  • Artificial Intelligence in Games
  • Sports Analytics and Performance
  • Organic Electronics and Photovoltaics
  • Organic Light-Emitting Diodes Research
  • Electronic and Structural Properties of Oxides
  • 3D Shape Modeling and Analysis
  • Image Retrieval and Classification Techniques

Shanghai Jiao Tong University
2022-2025

Shandong Jiaotong University
2022

Previous super-resolution (SR) approaches often formulate SR as a pixel-wise regression and restoration problem, which leads to blurry, unreal output. Recent works combine an adversarial loss with the pixel-wise loss to train GAN-based models, or introduce normalizing flows into SR problems to generate more realistic images. As another powerful generative approach, the autoregressive (AR) model has received little attention in low-level tasks due to its limitations. Based on the fact that given structural information, textural details...

10.1109/cvpr52688.2022.00195 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
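A minimal sketch (not the paper's method) contrasting the two objectives the abstract discusses: pure pixel-wise regression versus pixel-wise plus adversarial loss for GAN-based SR. The stand-in networks and the weight `lambda_adv` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Sequential(          # stand-in SR generator (x4 upsampling)
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=4, mode="nearest"),
    nn.Conv2d(64, 3, 3, padding=1),
)
discriminator = nn.Sequential(      # stand-in patch discriminator
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),
)

lr = torch.rand(1, 3, 32, 32)       # low-resolution input
hr = torch.rand(1, 3, 128, 128)     # ground-truth high-resolution image

sr = generator(lr)

# (a) pure pixel-wise regression: tends towards blurry, over-smoothed output
loss_pixel = F.l1_loss(sr, hr)

# (b) pixel-wise + adversarial loss, as used by GAN-based SR methods
lambda_adv = 1e-3                   # illustrative weighting, not from the paper
logits_fake = discriminator(sr)
loss_adv = F.binary_cross_entropy_with_logits(
    logits_fake, torch.ones_like(logits_fake))
loss_gan_based = loss_pixel + lambda_adv * loss_adv
print(float(loss_pixel), float(loss_gan_based))
```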

10.1109/cvpr52733.2024.00592 article 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Portrait images typically consist of a salient person against diverse backgrounds. With the development of mobile devices and image processing techniques, users can conveniently capture portrait images anytime and anywhere. However, the quality of these portraits may suffer from degradation caused by unfavorable environmental conditions, subpar photography, or inferior capturing devices. In this paper, we introduce a dual-branch network for portrait image quality assessment (PIQA), which can effectively address how the background influences its...

10.48550/arxiv.2405.08555 preprint EN arXiv (Cornell University) 2024-05-14
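A hedged sketch of a generic dual-branch quality-assessment network in the spirit of the PIQA model above. How the two branches are fed (e.g. salient-person crop vs. background) and how features are fused are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

def small_cnn():
    # tiny stand-in backbone; a real model would use a pretrained encoder
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class DualBranchPIQA(nn.Module):
    def __init__(self):
        super().__init__()
        self.person_branch = small_cnn()      # salient-person region
        self.background_branch = small_cnn()  # background region
        self.head = nn.Sequential(            # fuse and regress a quality score
            nn.Linear(64 + 64, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, person_crop, background):
        f = torch.cat([self.person_branch(person_crop),
                       self.background_branch(background)], dim=1)
        return self.head(f).squeeze(1)        # predicted quality score

model = DualBranchPIQA()
score = model(torch.rand(2, 3, 224, 224), torch.rand(2, 3, 224, 224))
print(score.shape)  # torch.Size([2])
```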

This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020. The challenge involves three tracks to super-resolve an input image for $\times$2, $\times$3 and $\times$4 scaling factors, respectively. The goal is to attract more attention to realistic degradation for the SR task, which is much more complicated and challenging, and contributes to real-world super-resolution applications. In total, 452 participants were registered, and 24 teams...

10.48550/arxiv.2009.12072 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video. Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalization ability due to the motion distribution of training datasets. In this paper, we propose a novel optimization-based VFI method that can adapt to unseen motions at test time. Our method is based on a cycle-consistency adaptation strategy that leverages the characteristics among video frames...

10.48550/arxiv.2306.13933 preprint EN other-oa arXiv (Cornell University) 2023-01-01
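A hedged sketch of test-time adaptation with a cycle-consistency loss, illustrating the general idea the abstract mentions. The interpolator is a placeholder, and the exact cycle used in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVFI(nn.Module):
    """Placeholder network that predicts the middle frame of two inputs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))
    def forward(self, a, b):
        return self.net(torch.cat([a, b], dim=1))

def adapt_at_test_time(vfi_model, f0, f1, f2, steps=5, lr=1e-4):
    """Fine-tune on the test clip itself: interpolating the synthesized
    half-way frames of (f0, f1) and (f1, f2) should reconstruct f1."""
    opt = torch.optim.Adam(vfi_model.parameters(), lr=lr)
    for _ in range(steps):
        mid_01 = vfi_model(f0, f1)          # ~ frame at t = 0.5
        mid_12 = vfi_model(f1, f2)          # ~ frame at t = 1.5
        rec_f1 = vfi_model(mid_01, mid_12)  # cycle back to t = 1.0
        loss = F.l1_loss(rec_f1, f1)        # cycle-consistency loss
        opt.zero_grad(); loss.backward(); opt.step()
    return vfi_model

frames = [torch.rand(1, 3, 64, 64) for _ in range(3)]
adapt_at_test_time(TinyVFI(), *frames)
```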

10.18653/v1/2024.emnlp-main.99 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

The outstanding performance of Large Multimodal Models (LMMs) has made them widely applied in vision-related tasks. However, various corruptions in the real world mean that images will not be as ideal as in simulations, presenting significant challenges for the practical application of LMMs. To address this issue, we introduce R-Bench, a benchmark focused on **Real-world Robustness of LMMs**. Specifically, we: (a) model the complete link from user capture to LMM reception, comprising 33 corruption dimensions,...

10.48550/arxiv.2410.05474 preprint EN arXiv (Cornell University) 2024-10-07
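A hedged sketch of a robustness-evaluation loop in the spirit of R-Bench: apply simple corruptions that mimic the capture-to-reception link, then query a model on both clean and corrupted inputs. The corruption set and the `query_lmm` stub are illustrative assumptions, not the benchmark's API.

```python
import torch
import torch.nn.functional as F

def gaussian_noise(img, sigma=0.1):
    return (img + sigma * torch.randn_like(img)).clamp(0, 1)

def blur(img, k=5):
    kernel = torch.ones(3, 1, k, k) / (k * k)        # depthwise box blur
    return F.conv2d(img, kernel, padding=k // 2, groups=3)

def down_up(img, factor=4):                          # transmission-style loss
    small = F.interpolate(img, scale_factor=1 / factor, mode="bilinear")
    return F.interpolate(small, size=img.shape[-2:], mode="bilinear")

def query_lmm(img, question):
    """Stub standing in for an actual LMM call."""
    return f"answer for tensor of shape {tuple(img.shape)}"

image = torch.rand(1, 3, 224, 224)
question = "What is in the picture?"
for name, corrupt in [("noise", gaussian_noise), ("blur", blur), ("down_up", down_up)]:
    clean_ans = query_lmm(image, question)
    corrupted_ans = query_lmm(corrupt(image), question)
    # a real benchmark would score answer consistency / accuracy here
    print(name, clean_ans == corrupted_ans)
```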

Diffusion models have emerged as frontrunners in text-to-image generation; however, their fixed image resolution during training often leads to challenges in high-resolution generation, such as semantic deviations and object replication. This paper introduces MegaFusion, a novel approach that extends existing diffusion-based text-to-image generation towards efficient higher-resolution generation without additional fine-tuning or extra adaptation. Specifically, we employ an innovative truncate and relay strategy to bridge the denoising...

10.48550/arxiv.2408.11001 preprint EN arXiv (Cornell University) 2024-08-20
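A hedged sketch of a "truncate and relay" style schedule: run part of the denoising trajectory at a low resolution, upsample the intermediate latent, and relay the remaining steps at a higher resolution. The denoiser stub, step rule, and split point are assumptions; MegaFusion's actual strategy may differ in the details.

```python
import torch
import torch.nn.functional as F

def denoise_step(latent, t):
    """Stub for one reverse-diffusion step of a pretrained denoiser."""
    return latent - 0.01 * torch.randn_like(latent) * (t / 50)

def truncate_and_relay(steps=50, switch_at=30, low=64, high=128):
    latent = torch.randn(1, 4, low, low)             # start at low resolution
    for t in reversed(range(steps)):
        if t == switch_at:                           # truncate: stop the low-res pass
            latent = F.interpolate(latent, size=(high, high),
                                   mode="bilinear")  # relay to higher resolution
        latent = denoise_step(latent, t)
    return latent                                    # higher-resolution latent

print(truncate_and_relay().shape)  # torch.Size([1, 4, 128, 128])
```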

Large multimodal models (LMMs) with advanced video analysis capabilities have recently garnered significant attention. However, most evaluations rely on traditional methods such as multiple-choice questions in benchmarks like VideoMME and LongVideoBench, which tend to lack the depth needed to capture the complex demands of real-world users. To address this limitation, and given the prohibitive cost and slow pace of human annotation for such tasks, we introduce VideoAutoArena, an arena-style benchmark inspired by...

10.48550/arxiv.2411.13281 preprint EN arXiv (Cornell University) 2024-11-20
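Arena-style evaluations typically aggregate pairwise model battles into an Elo-style rating. The sketch below shows that generic aggregation as background; it is an assumption, not VideoAutoArena's actual scoring code.

```python
from collections import defaultdict

def update_elo(ratings, winner, loser, k=32):
    """Standard Elo update for one pairwise battle."""
    ra, rb = ratings[winner], ratings[loser]
    expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400))
    ratings[winner] = ra + k * (1.0 - expected_a)   # winner gains
    ratings[loser] = rb - k * (1.0 - expected_a)    # loser drops by the same amount

ratings = defaultdict(lambda: 1000.0)               # every model starts at 1000
battles = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]
for winner, loser in battles:                       # (winner, loser) per battle
    update_elo(ratings, winner, loser)
print(dict(ratings))
```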

Given the remarkable success that large visual language models (LVLMs) have achieved in image perception tasks, the endeavor to make LVLMs perceive the world like humans is drawing increasing attention. Current multi-modal benchmarks primarily focus on facts or specific topic-related knowledge contained within individual images. However, they often overlook the associative relations between multiple images, which require the identification and analysis of similarities among entities or content present...

10.48550/arxiv.2407.17379 preprint EN arXiv (Cornell University) 2024-07-24

Medical image segmentation has recently demonstrated impressive progress with deep neural networks, yet the heterogeneous modalities and scarcity of mask annotations limit the development of segmentation models on unannotated modalities. This paper investigates a new paradigm for leveraging generative models in medical applications: controllably synthesizing data for unannotated modalities, without requiring registered pairs. Specifically, we make the following contributions in this paper: (i) we collect and curate a large-scale radiology image-text...

10.48550/arxiv.2412.04106 preprint EN arXiv (Cornell University) 2024-12-04
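A hedged pipeline sketch of the general paradigm the abstract describes: controllably synthesize images in an unannotated target modality from annotated source data, then train a segmentation model on the synthetic pairs. Both stubs below are placeholders, not the paper's models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def synthesize_target_modality(source_image, source_mask, prompt):
    """Stub generator: would produce a target-modality image whose anatomy
    (and therefore mask) matches the source, conditioned e.g. on text."""
    return source_image + 0.1 * torch.randn_like(source_image), source_mask

segmenter = nn.Sequential(                      # tiny stand-in for a U-Net
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 3, padding=1))             # 2 classes: background / organ
opt = torch.optim.Adam(segmenter.parameters(), lr=1e-3)

for _ in range(3):                              # toy training loop
    src_img = torch.rand(2, 1, 64, 64)          # annotated source modality
    src_mask = torch.randint(0, 2, (2, 64, 64)) # its segmentation mask
    syn_img, syn_mask = synthesize_target_modality(src_img, src_mask, "an MRI scan")
    logits = segmenter(syn_img)                 # train on synthetic target-modality data
    loss = F.cross_entropy(logits, syn_mask)
    opt.zero_grad(); loss.backward(); opt.step()
```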

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently. In this work, we focus on a novel, yet challenging, task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling. We make the following three contributions: (i) to fulfill the task of visual storytelling, we propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module,...

10.48550/arxiv.2306.00973 preprint EN other-oa arXiv (Cornell University) 2023-01-01
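A hedged sketch of the auto-regressive visual-storytelling loop described above: each new frame is generated conditioned on the current text prompt plus previously generated frames as visual context. The generator stub is a placeholder, not StoryGen itself.

```python
import torch

def generate_frame(prompt, context_frames):
    """Stub image generator conditioned on text and prior frames."""
    base = torch.rand(3, 64, 64)
    if context_frames:                           # mix in the visual context
        base = 0.5 * base + 0.5 * torch.stack(context_frames).mean(dim=0)
    return base

storyline = [
    "A small fox wakes up in the forest.",
    "The fox follows a trail of footprints.",
    "The fox finds a friendly rabbit at the end of the trail.",
]

frames = []
for sentence in storyline:                       # auto-regressive over the story
    frame = generate_frame(sentence, frames)     # condition on all prior frames
    frames.append(frame)
print(len(frames), frames[0].shape)              # 3 torch.Size([3, 64, 64])
```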

In recent years, neural radiance fields have exhibited impressive performance in novel view synthesis. However, exploiting complex network structures to achieve a generalizable NeRF usually results in inefficient rendering. Existing methods for accelerating rendering directly employ simpler inference networks or fewer sampling points, leading to unsatisfactory synthesis quality. To address the challenge of balancing speed and quality in NeRF, we propose a framework, NeRF-SDP, which achieves both...

10.1145/3595916.3626380 article EN 2023-12-06
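Background sketch, not NeRF-SDP itself: the standard NeRF volume-rendering quadrature along a ray. The number of sample points `n_samples` is the knob the abstract alludes to; reducing it speeds up rendering but degrades quality. The toy radiance field is an assumption for illustration.

```python
import torch

def render_ray(query_field, ray_o, ray_d, near=2.0, far=6.0, n_samples=64):
    t = torch.linspace(near, far, n_samples)              # sample depths along the ray
    pts = ray_o + t[:, None] * ray_d                       # (n_samples, 3) points
    sigma, rgb = query_field(pts)                          # densities and colors
    delta = torch.full((n_samples,), (far - near) / n_samples)
    alpha = 1.0 - torch.exp(-sigma * delta)                # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
    weights = alpha * trans                                # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)             # composited pixel color

def toy_field(pts):
    """Stub radiance field: density and color from a dummy function."""
    sigma = torch.relu(1.0 - pts.norm(dim=-1))             # a soft sphere at the origin
    rgb = torch.sigmoid(pts)                               # arbitrary per-point color
    return sigma, rgb

color = render_ray(toy_field, torch.tensor([0.0, 0.0, -4.0]),
                   torch.tensor([0.0, 0.0, 1.0]))
print(color)
```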