Yu-Xiong Wang

ORCID: 0000-0003-4414-0198
Research Areas
  • Semantic Web and Ontologies
  • Topic Modeling
  • Image Processing and 3D Reconstruction
  • 3D Shape Modeling and Analysis
  • Categorization, perception, and language
  • 3D Surveying and Cultural Heritage
  • Multi-Agent Systems and Negotiation
  • Data Mining Algorithms and Applications
  • Advanced Memory and Neural Computing
  • Image Retrieval and Classification Techniques
  • CCD and CMOS Imaging Sensors
  • Advanced Vision and Imaging
  • Advanced Image Fusion Techniques
  • 3D Modeling in Geospatial Applications
  • Radiomics and Machine Learning in Medical Imaging
  • Image and Signal Denoising Methods
  • Speech and dialogue systems
  • Generative Adversarial Networks and Image Synthesis
  • Neural Networks and Reservoir Computing
  • Natural Language Processing Techniques
  • Image Enhancement Techniques

University of Illinois Urbana-Champaign
2024

Manhattan College
2016-2022

University of Alabama in Huntsville
2016

A novel switching median filter integrated with a learning-based noise detection method is proposed for the suppression of impulse noise in highly corrupted colour images. Noise detection employs a new machine learning algorithm, called margin setting (MS), to detect noisy pixels. MS achieves this by classifying noisy and clean pixels with a decision surface. It yields very high detection accuracy, i.e. a zero miss rate and a fairly low false-alarm rate over a wide range of noise levels. After detection, a noise-free two-stage (NFTS) scheme is triggered. NFTS corrects the noisy pixels using two stages....

10.1080/13682199.2015.1104068 article EN The Imaging Science Journal 2016-01-02
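The switching idea above — replace a pixel by the window median only when it is flagged as noise — can be sketched as follows. The paper's margin-setting (MS) classifier is not reproduced here; a simple deviation-from-median test stands in for it, and the `threshold` and `window` parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

def switching_median_filter(img, threshold=40, window=3):
    """Sketch of a switching median filter on a single-channel image.

    A pixel is replaced by the median of its local window only if it is
    flagged as impulse noise. The flagging rule here (absolute deviation
    from the local median exceeding `threshold`) is a stand-in for the
    paper's margin-setting classifier.
    """
    pad = window // 2
    padded = np.pad(img, pad, mode='edge')
    out = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            med = np.median(patch)
            if abs(int(img[y, x]) - int(med)) > threshold:
                out[y, x] = med  # replace only flagged pixels
    return out
```

Clean pixels are left untouched, which is what distinguishes a switching median filter from a plain median filter that blurs edges everywhere.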

Being able to carry out complicated vision-language reasoning tasks in 3D space represents a significant milestone in developing household robots and human-centered embodied AI. In this work, we demonstrate that a critical and distinct challenge is situational awareness, which incorporates two key components: (1) The autonomous agent grounds its self-location based on a language prompt. (2) The agent answers open-ended questions from the perspective of its calculated position. To address this challenge, we introduce SIG3D, an...

10.48550/arxiv.2406.07544 preprint EN arXiv (Cornell University) 2024-06-11

This paper proposes Instruct 4D-to-4D, which achieves 4D awareness and spatial-temporal consistency for 2D diffusion models to generate high-quality instruction-guided dynamic scene editing results. Traditional applications of 2D diffusion models in this setting often result in inconsistency, primarily due to their inherent frame-by-frame methodology. Addressing the complexities of extending editing to 4D, our key insight is to treat a 4D scene as a pseudo-3D scene, decoupled into two sub-problems: achieving temporal consistency in video editing and applying these edits to the pseudo-3D scene....

10.48550/arxiv.2406.09402 preprint EN arXiv (Cornell University) 2024-06-13

Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for multi-object generation. In this work, we first show the fundamental reasons for such misalignment by identifying issues related to low attention activation and mask overlaps. We then propose a finetuning framework with two novel objectives, a Separate loss and an Enhance loss, that reduce object...

10.1145/3641519.3657527 article EN 2024-07-12
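The two failure modes named above — mask overlaps and low attention activation — suggest what the corresponding training objectives might penalize. The exact forms of the paper's Separate and Enhance losses are not given in this excerpt, so the functions below are only a plausible reading, operating on per-object cross-attention maps normalized to [0, 1]; the names and formulas are assumptions for illustration.

```python
import numpy as np

def separate_loss(attn_a, attn_b):
    """Hypothetical overlap penalty between two objects' attention maps.

    Elementwise minimum measures how much spatial support the two maps
    share; it is zero when the maps are disjoint.
    """
    return float(np.sum(np.minimum(attn_a, attn_b)))

def enhance_loss(attn):
    """Hypothetical activation penalty for a single attention map.

    Drives the map's peak activation toward 1, countering the
    low-attention-activation failure mode.
    """
    return float(1.0 - attn.max())
```

In a real finetuning loop these terms would be added to the diffusion loss and differentiated through the attention maps; here they serve only to make the two identified failure modes concrete.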

In this paper, we approach an overlooked yet critical task, Graph2Image: generating images from multimodal attributed graphs (MMAGs). This task poses significant challenges due to the explosion in graph size, dependencies among graph entities, and the need for controllability in graph conditions. To address these challenges, we propose a context-conditioned diffusion model called InstructG2I. InstructG2I first exploits the graph structure to conduct informative neighbor sampling by combining personalized PageRank...

10.48550/arxiv.2410.07157 preprint EN arXiv (Cornell University) 2024-10-09
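Personalized PageRank (PPR), mentioned as part of the neighbor-sampling step, scores every node by its relevance to a seed node and decays with graph distance. A minimal power-iteration sketch follows; the function names, the damping factor, and the top-k selection are illustrative assumptions, not details taken from InstructG2I.

```python
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, iters=50):
    """PPR scores for all nodes relative to `seed`, by power iteration.

    adj: dense symmetric adjacency matrix (float), alpha: teleport
    probability back to the seed node.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    # row-normalize to a transition matrix; isolated nodes get zero rows
    P = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    e = np.zeros(n)
    e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = alpha * e + (1 - alpha) * P.T @ r
    return r

def sample_neighbors(adj, seed, k):
    """Pick the k nodes most relevant to `seed` under PPR."""
    scores = personalized_pagerank(adj, seed)
    scores[seed] = -1.0  # exclude the seed itself
    return np.argsort(-scores)[:k]
```

Because PPR mass decays along the graph, this favors structurally close, well-connected neighbors — the kind of "informative" context a graph-conditioned generator would want.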

Complex 3D scene understanding has gained increasing attention, with scene encoding strategies playing a crucial role in this success. However, the optimal encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios. Our evaluation spans seven vision foundation encoders, including image-based,...

10.48550/arxiv.2409.03757 preprint EN arXiv (Cornell University) 2024-09-05

Vision Foundation Models (VFMs) have demonstrated outstanding performance on numerous downstream tasks. However, due to their inherent representation biases originating from different training paradigms, VFMs exhibit advantages and disadvantages across distinct vision tasks. Although amalgamating the strengths of multiple VFMs for downstream tasks is an intuitive strategy, effectively exploiting these biases remains a significant challenge. In this paper, we propose a novel and versatile "Swiss Army Knife" (SAK) solution,...

10.48550/arxiv.2410.14633 preprint EN arXiv (Cornell University) 2024-10-18

This paper proposes ProEdit - a simple yet effective framework for high-quality 3D scene editing guided by diffusion distillation in a novel progressive manner. Inspired by the crucial observation that multi-view inconsistency is rooted in the diffusion model's large feasible output space (FOS), our framework controls the size of the FOS and reduces inconsistency by decomposing the overall editing task into several subtasks, which are then executed progressively on the scene. Within this framework, we design a difficulty-aware subtask decomposition scheduler and an...

10.48550/arxiv.2411.05006 preprint EN arXiv (Cornell University) 2024-11-07

The vision of a broadly capable and goal-directed agent, such as an Internet-browsing agent in the digital world and a household humanoid in the physical world, has rapidly advanced, thanks to the generalization capability of foundation models. Such a generalist agent needs a large and diverse skill repertoire, from finding directions between two travel locations to buying specific items from the Internet. If each skill must be specified manually through a fixed set of human-annotated instructions, the agent's skill repertoire will necessarily be limited due...

10.48550/arxiv.2412.13194 preprint EN arXiv (Cornell University) 2024-12-17

The purpose of this study is to present agile, intelligent, and efficient computer vision architectures, operating on quantum neuromorphic computing, as part of a Space Situational Awareness (SSA) network. Quantum computing, paired with polarimetric Dynamic Vision Sensor p(DVS) principles, would give rise to the next generation of highly engineered systems for SSA, operating at fast speeds with reduced bandwidth, low power, and low memory. A deep-learning network has been designed to classify, with high accuracy, different target...

10.1109/ist55454.2022.9827746 article EN 2022-06-21