Richard Zhang

ORCID: 0000-0003-2507-4674
Research Areas
  • Computer Graphics and Visualization Techniques
  • Image Retrieval and Classification Techniques
  • Advanced Vision and Imaging
  • Generative Adversarial Networks and Image Synthesis
  • Topological and Geometric Data Analysis
  • Medical Image Segmentation Techniques
  • Philosophy, Science, and History
  • Neural dynamics and brain function
  • Cinema and Media Studies
  • Advanced Image and Video Retrieval Techniques
  • Advanced Image Fusion Techniques
  • Multidisciplinary Warburg-centric Studies

Adobe Systems (United States)
2023-2024

Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse, high-quality images. However, directly applying these models to real image editing remains challenging for two reasons. First, it is hard for users to craft a perfect text prompt depicting every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an...

10.1145/3588432.3591513 article EN cc-by 2023-07-19
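One ingredient of text-based editing methods in this vein is an edit direction in text-embedding space, computed as the difference between the mean embeddings of sentences describing the source and target concepts. A minimal sketch, using a hypothetical hash-seeded toy encoder in place of a real text encoder such as CLIP:

```python
import zlib

import numpy as np

def embed(sentence: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a real text encoder (e.g., CLIP); a deterministic,
    # hash-seeded random unit vector per sentence. Illustrative only.
    rng = np.random.default_rng(zlib.crc32(sentence.encode()))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def edit_direction(source_sents, target_sents) -> np.ndarray:
    # Average embeddings within each group, subtract, and renormalize:
    # the unit vector pointing from the source concept to the target concept.
    src = np.mean([embed(s) for s in source_sents], axis=0)
    tgt = np.mean([embed(t) for t in target_sents], axis=0)
    d = tgt - src
    return d / np.linalg.norm(d)

direction = edit_direction(
    ["a photo of a cat", "a cat sitting on a sofa"],
    ["a photo of a dog", "a dog sitting on a sofa"],
)
print(direction.shape)  # (64,)
```

Averaging over several sentences per concept reduces sensitivity to any single prompt's wording; in a real pipeline the direction would steer the conditioning embedding during denoising.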

Model customization introduces new concepts to existing text-to-image models, enabling the generation of these concepts/objects in novel contexts. However, such methods lack accurate camera view control with respect to the object, and users must resort to prompt engineering (e.g., adding "top-view") to achieve coarse control. In this work, we introduce a new task – explicit control of the object viewpoint in diffusion models. This allows us to modify a custom object's properties and generate it in various background scenes via text...

10.1145/3680528.3687564 article EN cc-by 2024-12-03

Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. The generation of a single frame requires the model to process the entire sequence, including the future. We address this limitation by adapting a pretrained bidirectional transformer to a causal transformer that generates frames on-the-fly. To further reduce latency, we extend distribution matching distillation (DMD) to videos, distilling a 50-step diffusion model into a 4-step generator. To enable stable and...

10.48550/arxiv.2412.07772 preprint EN arXiv (Cornell University) 2024-12-10
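The bidirectional-vs-causal distinction above can be made concrete with an attention mask: a causal video transformer lets tokens attend freely within their own frame but only to past frames across time, so a frame can be emitted without processing the future. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def block_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    # Boolean attention mask, True = "may attend". Tokens attend
    # bidirectionally within their own frame, causally across frames.
    n = num_frames * tokens_per_frame
    frame_idx = np.arange(n) // tokens_per_frame  # frame index of each token
    return frame_idx[:, None] >= frame_idx[None, :]

mask = block_causal_mask(num_frames=3, tokens_per_frame=2)
print(mask.astype(int))
# [[1 1 0 0 0 0]
#  [1 1 0 0 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 1 1]
#  [1 1 1 1 1 1]]
```

A fully bidirectional model would use an all-True mask instead, which is why generating any single frame there requires the entire sequence.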

With rapid advancements in virtual reality (VR) headsets, effectively measuring stereoscopic quality of experience (SQoE) has become essential for delivering immersive and comfortable 3D experiences. However, most existing stereo metrics focus on isolated aspects of the viewing experience, such as visual discomfort or image quality, and have traditionally faced data limitations. To address these gaps, we present SCOPE (Stereoscopic COntent Preference Evaluation), a new dataset comprised of real and synthetic images...

10.48550/arxiv.2412.21127 preprint EN arXiv (Cornell University) 2024-12-30