- Generative Adversarial Networks and Image Synthesis
- Human Pose and Action Recognition
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Reinforcement Learning in Robotics
- Computer Graphics and Visualization Techniques
- 3D Shape Modeling and Analysis
- Natural Language Processing Techniques
- Video Analysis and Summarization
- Machine Learning and Data Classification
- Topic Modeling
- Speech and Audio Processing
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
- Neural Networks and Applications
- Speech Recognition and Synthesis
- Remote Sensing and LiDAR Applications
- Advanced Image and Video Retrieval Techniques
- Evolutionary Algorithms and Applications
- Model Reduction and Neural Networks
- Music and Audio Processing
- Analytical Chemistry and Chromatography
Purdue University West Lafayette
2023-2025
Toyota Technological Institute at Chicago
2022
University of Illinois Urbana-Champaign
2016-2021
International University of the Caribbean
2019
Semantic image inpainting is a challenging task where large missing regions have to be filled based on the available visual data. Existing methods which extract information from only a single image generally produce unsatisfactory results due to the lack of high-level context. In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditioning on the available data. Given a trained generative model, we search for the closest encoding of the corrupted image in the latent manifold using our context and prior losses. This encoding is then...
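As a rough sketch of the latent-space search idea, the snippet below optimizes a latent code against a context loss on the known pixels and a discriminator-based prior loss. The generator `G`, discriminator `D`, mask, and weight `lam` are untrained toy stand-ins, not the paper's configuration.

```python
# Minimal sketch: inpainting by searching a generator's latent space.
# G, D, the mask, and lam are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

latent_dim, img_pixels = 64, 32 * 32
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_pixels), nn.Tanh())
D = nn.Sequential(nn.Linear(img_pixels, 256), nn.ReLU(), nn.Linear(256, 1))

corrupted = torch.rand(1, img_pixels) * 2 - 1      # observed image (flattened)
mask = (torch.rand(1, img_pixels) > 0.5).float()   # 1 = known pixel, 0 = missing

z = torch.randn(1, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.1)
lam = 0.1  # weight of the prior term (assumed value)

for step in range(200):
    opt.zero_grad()
    gen = G(z)
    # Context loss: match the generated image to the known pixels only.
    context_loss = (mask * (gen - corrupted)).abs().sum()
    # Prior loss: keep the sample on the generator's manifold, scored by D.
    prior_loss = torch.nn.functional.softplus(-D(gen)).mean()
    loss = context_loss + lam * prior_loss
    loss.backward()
    opt.step()

# Blend: keep known pixels, fill missing ones with the recovered encoding.
inpainted = mask * corrupted + (1 - mask) * G(z).detach()
```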
We address the problem of synthesizing new video frames in an existing video, either in-between existing frames (interpolation) or subsequent to them (extrapolation). This is challenging because video appearance and motion can be highly complex. Traditional optical-flow-based solutions often fail where flow estimation is challenging, while newer neural-network-based methods that hallucinate pixel values directly often produce blurry results. We combine the advantages of these two approaches by training a deep network that learns to synthesize...
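The following toy sketch illustrates the general idea of synthesizing a frame by flowing pixel values from the existing frames instead of hallucinating them directly. The single-layer flow predictor, the blend mask, and all shapes are assumptions made only for illustration.

```python
# Toy frame synthesis: predict a flow field and a blend mask, then warp and
# blend the two input frames. The untrained conv net and shapes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

B, C, H, W = 1, 3, 64, 64
frame0, frame1 = torch.rand(B, C, H, W), torch.rand(B, C, H, W)

# Predict a 2-channel flow and a 1-channel blend mask from the frame pair.
net = nn.Conv2d(2 * C, 3, kernel_size=3, padding=1)
out = net(torch.cat([frame0, frame1], dim=1))
flow, mask = torch.tanh(out[:, :2]), torch.sigmoid(out[:, 2:3])

# Base sampling grid in normalized [-1, 1] coordinates.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
base = torch.stack([xs, ys], dim=-1).unsqueeze(0)   # (B, H, W, 2)
offset = flow.permute(0, 2, 3, 1)                   # (B, H, W, 2)

# Sample frame0 along the flow and frame1 in the opposite direction, then blend.
warp0 = F.grid_sample(frame0, base + offset, align_corners=True)
warp1 = F.grid_sample(frame1, base - offset, align_corners=True)
synthesized = mask * warp0 + (1 - mask) * warp1
print(synthesized.shape)  # torch.Size([1, 3, 64, 64])
```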
A diffusion model learns to predict a vector field of gradients. We propose to apply the chain rule on the learned gradients, and back-propagate the score through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and re-purposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm...
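A minimal sketch of the chain-rule mechanics, assuming a toy differentiable "renderer" and a stand-in score network. It shows only how a 2D score is back-propagated through the renderer's Jacobian and aggregated over viewpoints; it is not the paper's estimation procedure.

```python
# Sketch: chain a 2D score through the Jacobian of a differentiable renderer
# to get gradients on 3D parameters. Renderer and score model are toy stand-ins.
import torch
import torch.nn as nn

theta = torch.randn(8, 8, 8, requires_grad=True)   # e.g., a small voxel grid

def render(voxels, view):
    # Toy differentiable "renderer": project the grid along one axis and
    # roll it to mimic a change of camera viewpoint.
    return torch.roll(voxels.mean(dim=2), shifts=view, dims=0)

score_net = nn.Sequential(nn.Flatten(), nn.Linear(64, 64))  # stand-in for a pretrained score model

opt = torch.optim.Adam([theta], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    total = 0.0
    for view in range(4):                                   # aggregate scores over viewpoints
        img = render(theta, view).reshape(1, 8, 8)
        score = score_net(img).reshape(1, 8, 8).detach()    # 2D score, treated as constant
        # Chain rule: d/d theta <render(theta), score> = J_render^T @ score.
        total = total + (img * score).sum()
    (-total).backward()                                     # ascend the aggregated 3D score
    opt.step()
```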
In this paper, we propose a new generative model for multi-agent trajectory data, focusing on the case of multi-player sports games. Our model leverages graph neural networks (GNNs) and variational recurrent neural networks (VRNNs) to achieve a permutation-equivariant model suitable for sports. On two challenging datasets (basketball and soccer), we show that we are able to produce more accurate trajectory forecasts than previous methods. We assess accuracy using various metrics, such as log-likelihood and "best of N" loss, based on N different samples...
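The sketch below shows only the permutation-equivariant, graph-style update over a set of player states that such a model relies on. The layer sizes are arbitrary assumptions and the variational recurrent components are omitted entirely.

```python
# A minimal permutation-equivariant layer over a set of player states.
import torch
import torch.nn as nn

class EquivariantAgentLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.self_mlp = nn.Linear(dim, dim)
        self.other_mlp = nn.Linear(dim, dim)

    def forward(self, h):                  # h: (num_agents, dim)
        # Each agent combines its own state with a symmetric (mean) summary
        # of the other agents, so permuting agents permutes the output.
        agg = (h.sum(dim=0, keepdim=True) - h) / (h.shape[0] - 1)
        return torch.relu(self.self_mlp(h) + self.other_mlp(agg))

layer = EquivariantAgentLayer(dim=16)
players = torch.randn(10, 16)              # e.g., 10 players on the court
perm = torch.randperm(10)
# Equivariance check: permuting inputs permutes outputs identically.
assert torch.allclose(layer(players)[perm], layer(players[perm]), atol=1e-6)
```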
Audio super-resolution (a.k.a. bandwidth extension) is the challenging task of increasing the temporal resolution of audio signals. Recent deep network approaches have achieved promising results by modeling the task as a regression problem in either the time or frequency domain. In this paper, we introduce the Time-Frequency Network (TFNet), a deep network that utilizes supervision in both the time and frequency domains. We propose a novel model architecture which allows the two domains to be jointly optimized. Results demonstrate that our method outperforms...
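A minimal sketch of joint time- and frequency-domain supervision, assuming a tiny 1-D conv model, arbitrary STFT settings, and an assumed loss weighting; it is not the TFNet architecture itself.

```python
# Sketch: supervise an audio super-resolution model in both domains.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(), nn.Conv1d(16, 1, 9, padding=4))

low_res_upsampled = torch.randn(4, 1, 4096)   # upsampled low-resolution input (assumed)
target = torch.randn(4, 1, 4096)              # high-resolution ground truth

pred = model(low_res_upsampled)

# Time-domain supervision.
time_loss = torch.nn.functional.mse_loss(pred, target)

# Frequency-domain supervision on STFT log-magnitudes.
def logmag(x):
    spec = torch.stft(x.squeeze(1), n_fft=512, hop_length=128,
                      window=torch.hann_window(512), return_complex=True)
    return torch.log1p(spec.abs())

freq_loss = torch.nn.functional.mse_loss(logmag(pred), logmag(target))

loss = time_loss + 0.5 * freq_loss            # joint objective (weight assumed)
loss.backward()
```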
High-level manipulation of facial expressions in images, such as changing a smile to a neutral expression, is challenging because the changes are highly non-linear and vary depending on the appearance of the face. We present a fully automatic approach to editing faces that combines the advantages of flow-based face manipulation with the more recent generative capabilities of Variational Autoencoders (VAEs). During training, our model learns to encode the flow from one expression to another over a low-dimensional latent space. At test time, editing can be done...
Textual grounding is an important but challenging task for human-computer interaction, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep net based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all possible bounding boxes. Hence, our method is able to consider significantly more proposals and doesn't rely on a successful first stage hypothesizing bounding box proposals...
Fine-grained action detection is an important task with numerous applications in robotics and human-computer interaction. Existing methods typically utilize a two-stage approach including extraction of local spatio-temporal features followed by temporal modeling to capture long-term dependencies. While most recent papers have focused on the latter (long-term temporal modeling), here we focus on producing features capable of modeling fine-grained motion more efficiently. We propose a novel locally-consistent deformable...
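For intuition, the sketch below applies torchvision's standard deformable convolution with per-location offsets and adds a simple smoothness penalty as a stand-in for local consistency. The paper's locally-consistent variant and its exact formulation differ from this simplification.

```python
# Sketch: deformable convolution with predicted offsets plus a smoothness
# penalty that encourages nearby sampling offsets to agree (illustrative only).
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformBlock(nn.Module):
    def __init__(self, cin, cout, k=3):
        super().__init__()
        self.offset_pred = nn.Conv2d(cin, 2 * k * k, kernel_size=k, padding=k // 2)
        self.weight = nn.Parameter(torch.randn(cout, cin, k, k) * 0.01)

    def forward(self, x):
        offset = self.offset_pred(x)
        out = deform_conv2d(x, offset, self.weight, padding=1)
        # Local-consistency penalty: neighboring offsets should be similar.
        smooth = (offset[..., 1:] - offset[..., :-1]).abs().mean() + \
                 (offset[..., 1:, :] - offset[..., :-1, :]).abs().mean()
        return out, smooth

block = DeformBlock(8, 16)
feat = torch.randn(2, 8, 32, 32)
out, smooth_penalty = block(feat)
print(out.shape, smooth_penalty.item())   # torch.Size([2, 16, 32, 32])
```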
Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. To train these deep net based approaches, access to large-scale datasets is required; however, constructing such a dataset is time-consuming and expensive. Therefore, we develop a completely unsupervised mechanism...
Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper we study how to use a different weight for every unlabeled example. Manual tuning of those weights -- as done in prior work -- is no longer possible. Instead, we adjust those weights via an algorithm based on the influence function, a measure of a model's dependency on one training example. To make the approach efficient, we propose a fast and effective approximation of the influence function...
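A simplified sketch of influence-style per-example weighting: each unlabeled example's weight is nudged according to how well its gradient aligns with a validation-loss gradient. The Hessian-inverse term of the true influence function is dropped here for brevity, and the model, pseudo-labels, and step size are toy stand-ins.

```python
# Sketch: adjust per-example weights by gradient alignment with a validation loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 1)
unlabeled_x = torch.randn(8, 5)
pseudo_y = torch.randn(8, 1)               # stand-in pseudo-labels (assumed)
val_x, val_y = torch.randn(16, 5), torch.randn(16, 1)
weights = torch.ones(8)                    # per-example weights to adapt

def grads(loss):
    return torch.cat([g.reshape(-1) for g in
                      torch.autograd.grad(loss, model.parameters(), retain_graph=True)])

val_grad = grads(nn.functional.mse_loss(model(val_x), val_y))

for i in range(unlabeled_x.shape[0]):
    ex_loss = nn.functional.mse_loss(model(unlabeled_x[i:i + 1]), pseudo_y[i:i + 1])
    ex_grad = grads(ex_loss)
    # Up-weight examples whose gradient aligns with reducing the validation loss.
    weights[i] = torch.clamp(weights[i] + 0.1 * torch.dot(ex_grad, val_grad), min=0.0)

print(weights)
```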
Sample efficiency and scalability to a large number of agents are two important goals for multi-agent reinforcement learning systems. Recent works got us closer to those goals, addressing non-stationarity of the environment from a single agent's perspective by utilizing a deep net critic which depends on all observations and actions. The critic input concatenates agent observations and actions in a user-specified order. However, since deep nets aren't permutation invariant, a permuted input changes the critic output despite the environment remaining identical. To avoid...
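A minimal permutation-invariant critic sketch: each agent's (observation, action) pair is embedded by a shared network and the embeddings are mean-pooled, so reordering the agents cannot change the value estimate. Layer sizes here are arbitrary assumptions.

```python
# Sketch: a critic that is invariant to the ordering of agents.
import torch
import torch.nn as nn

class PermutationInvariantCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU())
        self.value = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, act):               # obs: (B, N, obs_dim), act: (B, N, act_dim)
        per_agent = self.embed(torch.cat([obs, act], dim=-1))
        return self.value(per_agent.mean(dim=1))   # pool over the agent axis

critic = PermutationInvariantCritic(obs_dim=10, act_dim=4)
obs, act = torch.randn(2, 5, 10), torch.randn(2, 5, 4)
perm = torch.randperm(5)
# Permuting the agents leaves the critic output unchanged.
assert torch.allclose(critic(obs, act), critic(obs[:, perm], act[:, perm]), atol=1e-6)
```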
Exploration is critical for good results in deep reinforcement learning and has attracted much attention. However, existing multi-agent deep reinforcement learning algorithms still use mostly noise-based techniques. Very recently, exploration methods that consider cooperation among multiple agents have been developed. However, they suffer from a common challenge: agents struggle to identify states that are worth exploring, and can hardly coordinate their exploration efforts toward those states. To address this shortcoming, in this paper, we propose cooperative multi-agent exploration (CMAE): agents share...
The task of crafting procedural programs capable of generating structurally valid 3D shapes easily and intuitively remains an elusive goal in computer vision and graphics. Within the graphics community, procedural modeling has shifted to using node graph systems. They allow the artist to create complex animations through visual programming. Being a high-level design tool, they have made procedural modelling more accessible. However, creating those graphs demands expertise and training. We present GeoCode, a novel framework designed to extend...
Speech movements are highly complex and require precise tuning of both the spatial and temporal properties of the oral articulators to support intelligible communication. These properties also make the measurement of speech movements challenging, often requiring extensive physical sensors placed around the mouth and face that are not easily tolerated by certain populations, such as young children. Recent progress in machine learning-based markerless facial landmark tracking technology has demonstrated its potential to provide lip tracking without the need for...
Model immunization is an emerging direction that aims to mitigate the potential risk of misuse associated with open-sourced models and advancing adaptation methods. The idea is to make the released models' weights difficult to fine-tune on certain harmful applications, hence the name "immunized". Recent work on model immunization focuses on the single-concept setting. However, in real-world situations, models need to be immunized against multiple concepts. To address this gap, we propose an algorithm that simultaneously learns a single...
We present a novel approach to perform instance segmentation and counting for densely packed self-similar trees using a top-view RGB image sequence. We propose a solution that leverages pixel content, shape, and self-occlusion. First, we produce an initial over-segmentation of the image sequence and aggregate structural characteristics into a contour graph with temporal information incorporated. Second, leveraging a convolutional network and its inherent local message passing abilities, we merge adjacent tree crown patches into a final set...
Many image restoration problems are ill-posed in nature; hence, beyond the input image, most existing methods rely on a carefully engineered prior, which enforces some local consistency in the recovered image. How tightly the prior assumptions are fulfilled has a big impact on the resulting task performance. To obtain more flexibility, in this work we propose to design the prior in a data-driven manner. Instead of explicitly defining it, we learn it using deep generative models. We demonstrate that the learned prior can be applied to many image restoration problems in a unified...
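As a rough illustration of restoration with a learned generative prior, the sketch below optimizes a latent code so that a known degradation of the generator output matches the observation. The generator, degradation operator, and weights are stand-ins chosen only to show the mechanics.

```python
# Sketch: restore an image by optimizing over a generator's latent space,
# with a known (assumed) degradation operator A applied to the generator output.
import torch
import torch.nn as nn

latent_dim, n_pixels = 32, 16 * 16
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, n_pixels), nn.Tanh())

A = (torch.rand(n_pixels, n_pixels) < 0.3).float() * torch.eye(n_pixels)  # keep ~30% of pixels

clean = torch.rand(1, n_pixels) * 2 - 1
observed = clean @ A + 0.01 * torch.randn(1, n_pixels)

z = torch.zeros(1, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for step in range(300):
    opt.zero_grad()
    # Data term on the degraded reconstruction plus a simple latent prior.
    loss = ((G(z) @ A - observed) ** 2).sum() + 0.01 * (z ** 2).sum()
    loss.backward()
    opt.step()

restored = G(z).detach()   # estimate of the clean image
```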
We propose Chirality Nets, a family of deep nets that is equivariant to the "chirality transform," i.e., the transformation to create a chiral pair. Through parameter sharing and odd and even symmetry, we prove that variants of standard building blocks satisfy the equivariance property, including fully connected layers, convolutional layers, batch-normalization, and LSTM/GRU cells. The proposed layers lead to a more data-efficient representation and a reduction in computation by exploiting symmetry. We evaluate chirality nets on the task of human pose...
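A small sketch of a chirality-equivariant fully connected layer, here built by explicitly symmetrizing an ordinary weight matrix over a toy two-joint chirality transform (swap left/right joints and negate the x coordinate). The pose layout is an assumption, and the paper instead achieves equivariance via parameter sharing with odd/even symmetry rather than this symmetrization trick.

```python
# Sketch: a linear layer made equivariant to a chirality transform P with P^2 = I.
import torch
import torch.nn as nn

def chirality_matrix():
    # Pose layout (assumed): [Lx, Ly, Rx, Ry]; chirality swaps L<->R and negates x.
    P = torch.zeros(4, 4)
    P[0, 2], P[1, 3], P[2, 0], P[3, 1] = -1.0, 1.0, -1.0, 1.0
    return P

class ChiralLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(4, 4) * 0.1)
        self.b = nn.Parameter(torch.randn(4) * 0.1)
        self.P = chirality_matrix()

    def forward(self, x):
        # Symmetrized parameters satisfy f(P x) = P f(x) by construction.
        W_eq = 0.5 * (self.W + self.P @ self.W @ self.P)
        b_eq = 0.5 * (self.b + self.P @ self.b)
        return x @ W_eq.T + b_eq

layer = ChiralLinear()
pose = torch.randn(3, 4)
P = chirality_matrix()
# Equivariance check: transforming the input transforms the output identically.
assert torch.allclose(layer(pose @ P.T), layer(pose) @ P.T, atol=1e-6)
```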
Existing neural operator architectures face challenges when solving multiphysics problems with coupled partial differential equations (PDEs), due to complex geometries, interactions between physical variables, and the lack of large amounts of high-resolution training data. To address these issues, we propose Codomain Attention Neural Operator (CoDA-NO), which tokenizes functions along the codomain or channel space, enabling self-supervised learning or pretraining of multiple PDE systems. Specifically,...
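The sketch below illustrates only the channel-tokenization idea: each physical variable of a discretized function becomes a token and attention runs across those tokens. The projection sizes and the per-token encoder are assumptions, not the CoDA-NO design.

```python
# Sketch: attention across codomain (channel) tokens of a discretized function.
import torch
import torch.nn as nn

class ChannelTokenAttention(nn.Module):
    def __init__(self, n_grid, d_model=64, n_heads=4):
        super().__init__()
        self.to_token = nn.Linear(n_grid, d_model)     # encode each variable's field
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.to_field = nn.Linear(d_model, n_grid)     # decode back to a field

    def forward(self, u):                              # u: (batch, n_vars, n_grid)
        tokens = self.to_token(u)                      # one token per physical variable
        mixed, _ = self.attn(tokens, tokens, tokens)   # attention across variables
        return self.to_field(mixed)

# e.g., 3 coupled variables (such as velocity components and pressure) on 256 grid points.
u = torch.randn(2, 3, 256)
layer = ChannelTokenAttention(n_grid=256)
print(layer(u).shape)                                  # torch.Size([2, 3, 256])
```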