- Generative Adversarial Networks and Image Synthesis
- Human Pose and Action Recognition
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Reinforcement Learning in Robotics
- Computer Graphics and Visualization Techniques
- 3D Shape Modeling and Analysis
- Natural Language Processing Techniques
- Video Analysis and Summarization
- Machine Learning and Data Classification
- Topic Modeling
- Speech and Audio Processing
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
- Neural Networks and Applications
- Speech Recognition and Synthesis
- Remote Sensing and LiDAR Applications
- Advanced Image and Video Retrieval Techniques
- Evolutionary Algorithms and Applications
- Model Reduction and Neural Networks
- Music and Audio Processing
- Analytical Chemistry and Chromatography
Purdue University West Lafayette
2023-2025
Toyota Technological Institute at Chicago
2022
University of Illinois Urbana-Champaign
2016-2021
International University of the Caribbean
2019
Semantic image inpainting is a challenging task where large missing regions have to be filled based on the available visual data. Existing methods which extract information from only a single image generally produce unsatisfactory results due to the lack of high-level context. In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditioning on the available data. Given a trained generative model, we search for the closest encoding of the corrupted image in the latent manifold using our context and prior losses. This encoding is then...
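As a rough sketch of the latent-space search idea, the snippet below optimizes a latent code against a context loss on the known pixels and a discriminator-based prior loss. The generator `G`, discriminator `D`, mask, and weight `lam` are untrained toy stand-ins, not the paper's configuration.

```python
# Minimal sketch: inpainting by searching a generator's latent space.
# G, D, the mask, and lam are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

latent_dim, img_pixels = 64, 32 * 32
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_pixels), nn.Tanh())
D = nn.Sequential(nn.Linear(img_pixels, 256), nn.ReLU(), nn.Linear(256, 1))

corrupted = torch.rand(1, img_pixels) * 2 - 1      # observed image (flattened)
mask = (torch.rand(1, img_pixels) > 0.5).float()   # 1 = known pixel, 0 = missing

z = torch.randn(1, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.1)
lam = 0.1  # weight of the prior term (assumed value)

for step in range(200):
    opt.zero_grad()
    gen = G(z)
    # Context loss: match the generated image to the known pixels only.
    context_loss = (mask * (gen - corrupted)).abs().sum()
    # Prior loss: keep the sample on the generator's manifold, scored by D.
    prior_loss = torch.nn.functional.softplus(-D(gen)).mean()
    loss = context_loss + lam * prior_loss
    loss.backward()
    opt.step()

# Blend: keep known pixels, fill missing ones with the recovered encoding.
inpainted = mask * corrupted + (1 - mask) * G(z).detach()
```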
We address the problem of synthesizing new video frames in an existing video, either in-between existing frames (interpolation) or subsequent to them (extrapolation). This is challenging because video appearance and motion can be highly complex. Traditional optical-flow-based solutions often fail where flow estimation is challenging, while newer neural-network-based methods that hallucinate pixel values directly often produce blurry results. We combine the advantages of these two approaches by training a deep network that learns to synthesize...
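The following toy sketch illustrates the general idea of synthesizing a frame by flowing pixel values from the existing frames instead of hallucinating them directly. The single-layer flow predictor, the blend mask, and all shapes are assumptions made only for illustration.

```python
# Toy frame synthesis: predict a flow field and a blend mask, then warp and
# blend the two input frames. The untrained conv net and shapes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

B, C, H, W = 1, 3, 64, 64
frame0, frame1 = torch.rand(B, C, H, W), torch.rand(B, C, H, W)

# Predict a 2-channel flow and a 1-channel blend mask from the frame pair.
net = nn.Conv2d(2 * C, 3, kernel_size=3, padding=1)
out = net(torch.cat([frame0, frame1], dim=1))
flow, mask = torch.tanh(out[:, :2]), torch.sigmoid(out[:, 2:3])

# Base sampling grid in normalized [-1, 1] coordinates.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
base = torch.stack([xs, ys], dim=-1).unsqueeze(0)   # (B, H, W, 2)
offset = flow.permute(0, 2, 3, 1)                   # (B, H, W, 2)

# Sample frame0 along the flow and frame1 in the opposite direction, then blend.
warp0 = F.grid_sample(frame0, base + offset, align_corners=True)
warp1 = F.grid_sample(frame1, base - offset, align_corners=True)
synthesized = mask * warp0 + (1 - mask) * warp1
print(synthesized.shape)  # torch.Size([1, 3, 64, 64])
```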
A diffusion model learns to predict a vector field of gradients. We propose to apply the chain rule on the learned gradients, and back-propagate the score through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and re-purposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm...
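A minimal sketch of the chain-rule mechanics, assuming a toy differentiable "renderer" and a stand-in score network. It shows only how a 2D score is back-propagated through the renderer's Jacobian and aggregated over viewpoints; it is not the paper's estimation procedure.

```python
# Sketch: chain a 2D score through the Jacobian of a differentiable renderer
# to get gradients on 3D parameters. Renderer and score model are toy stand-ins.
import torch
import torch.nn as nn

theta = torch.randn(8, 8, 8, requires_grad=True)   # e.g., a small voxel grid

def render(voxels, view):
    # Toy differentiable "renderer": project the grid along one axis and
    # roll it to mimic a change of camera viewpoint.
    return torch.roll(voxels.mean(dim=2), shifts=view, dims=0)

score_net = nn.Sequential(nn.Flatten(), nn.Linear(64, 64))  # stand-in for a pretrained score model

opt = torch.optim.Adam([theta], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    total = 0.0
    for view in range(4):                                   # aggregate scores over viewpoints
        img = render(theta, view).reshape(1, 8, 8)
        score = score_net(img).reshape(1, 8, 8).detach()    # 2D score, treated as constant
        # Chain rule: d/d theta <render(theta), score> = J_render^T @ score.
        total = total + (img * score).sum()
    (-total).backward()                                     # ascend the aggregated 3D score
    opt.step()
```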
In this paper, we propose a new generative model for multi-agent trajectory data, focusing on the case of multi-player sports games. Our model leverages graph neural networks (GNNs) and variational recurrent neural networks (VRNNs) to achieve a permutation-equivariant model suitable for sports. On two challenging datasets (basketball and soccer), we show that we are able to produce more accurate trajectory forecasts than previous methods. We assess accuracy using various metrics, such as log-likelihood and "best of N" loss, based on N different samples...
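The sketch below shows only the permutation-equivariant, graph-style update over a set of player states that such a model relies on. The layer sizes are arbitrary assumptions and the variational recurrent components are omitted entirely.

```python
# A minimal permutation-equivariant layer over a set of player states.
import torch
import torch.nn as nn

class EquivariantAgentLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.self_mlp = nn.Linear(dim, dim)
        self.other_mlp = nn.Linear(dim, dim)

    def forward(self, h):                  # h: (num_agents, dim)
        # Each agent combines its own state with a symmetric (mean) summary
        # of the other agents, so permuting agents permutes the output.
        agg = (h.sum(dim=0, keepdim=True) - h) / (h.shape[0] - 1)
        return torch.relu(self.self_mlp(h) + self.other_mlp(agg))

layer = EquivariantAgentLayer(dim=16)
players = torch.randn(10, 16)              # e.g., 10 players on the court
perm = torch.randperm(10)
# Equivariance check: permuting inputs permutes outputs identically.
assert torch.allclose(layer(players)[perm], layer(players[perm]), atol=1e-6)
```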
Audio super-resolution (a.k.a. bandwidth extension) is the challenging task of increasing the temporal resolution of audio signals. Recent deep network approaches have achieved promising results by modeling the task as a regression problem in either the time or frequency domain. In this paper, we introduce the Time-Frequency Network (TFNet), a deep network that utilizes supervision in both the time and frequency domains. We propose a novel model architecture which allows the two domains to be jointly optimized. Results demonstrate that our method outperforms...
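A minimal sketch of joint time- and frequency-domain supervision, assuming a tiny 1-D conv model, arbitrary STFT settings, and an assumed loss weighting; it is not the TFNet architecture itself.

```python
# Sketch: supervise an audio super-resolution model in both domains.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(), nn.Conv1d(16, 1, 9, padding=4))

low_res_upsampled = torch.randn(4, 1, 4096)   # upsampled low-resolution input (assumed)
target = torch.randn(4, 1, 4096)              # high-resolution ground truth

pred = model(low_res_upsampled)

# Time-domain supervision.
time_loss = torch.nn.functional.mse_loss(pred, target)

# Frequency-domain supervision on STFT log-magnitudes.
def logmag(x):
    spec = torch.stft(x.squeeze(1), n_fft=512, hop_length=128,
                      window=torch.hann_window(512), return_complex=True)
    return torch.log1p(spec.abs())

freq_loss = torch.nn.functional.mse_loss(logmag(pred), logmag(target))

loss = time_loss + 0.5 * freq_loss            # joint objective (weight assumed)
loss.backward()
```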
High-level manipulation of facial expressions in images, such as changing a smile to a neutral expression, is challenging because the changes are highly non-linear and vary depending on the appearance of the face. We present a fully automatic approach to editing faces that combines the advantages of flow-based face manipulation with the more recent generative capabilities of Variational Autoencoders (VAEs). During training, our model learns to encode the flow from one expression to another over a low-dimensional latent space. At test time, editing can be done...
Textual grounding is an important but challenging task for human-computer interaction, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep net based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all possible bounding boxes. Hence, our method is able to consider significantly more proposals and doesn't rely on a successful first stage hypothesizing bounding box proposals...
Fine-grained action detection is an important task with numerous applications in robotics and human-computer interaction. Existing methods typically utilize a two-stage approach including extraction of local spatio-temporal features followed by temporal modeling to capture long-term dependencies. While most recent papers have focused on the latter (long-term temporal modeling), here we focus on producing features capable of modeling fine-grained motion more efficiently. We propose a novel locally-consistent deformable...
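For intuition, the sketch below applies torchvision's standard deformable convolution with per-location offsets and adds a simple smoothness penalty as a stand-in for local consistency. The paper's locally-consistent variant and its exact formulation differ from this simplification.

```python
# Sketch: deformable convolution with predicted offsets plus a smoothness
# penalty that encourages nearby sampling offsets to agree (illustrative only).
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformBlock(nn.Module):
    def __init__(self, cin, cout, k=3):
        super().__init__()
        self.offset_pred = nn.Conv2d(cin, 2 * k * k, kernel_size=k, padding=k // 2)
        self.weight = nn.Parameter(torch.randn(cout, cin, k, k) * 0.01)

    def forward(self, x):
        offset = self.offset_pred(x)
        out = deform_conv2d(x, offset, self.weight, padding=1)
        # Local-consistency penalty: neighboring offsets should be similar.
        smooth = (offset[..., 1:] - offset[..., :-1]).abs().mean() + \
                 (offset[..., 1:, :] - offset[..., :-1, :]).abs().mean()
        return out, smooth

block = DeformBlock(8, 16)
feat = torch.randn(2, 8, 32, 32)
out, smooth_penalty = block(feat)
print(out.shape, smooth_penalty.item())   # torch.Size([2, 16, 32, 32])
```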
Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. To train these deep net based approaches, access to large-scale datasets is required; however, constructing such a dataset is time-consuming and expensive. Therefore, we develop a completely unsupervised mechanism...
Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper we study how to use a different weight for every unlabeled example. Manual tuning of those weights -- as done in prior work -- is no longer possible. Instead, we adjust those weights via an algorithm based on the influence function, a measure of a model's dependency on one training example. To make the approach efficient, we propose a fast and effective approximation of the influence function...
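A simplified sketch of influence-style per-example weighting: each unlabeled example's weight is nudged according to how well its gradient aligns with a validation-loss gradient. The Hessian-inverse term of the true influence function is dropped here for brevity, and the model, pseudo-labels, and step size are toy stand-ins.

```python
# Sketch: adjust per-example weights by gradient alignment with a validation loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 1)
unlabeled_x = torch.randn(8, 5)
pseudo_y = torch.randn(8, 1)               # stand-in pseudo-labels (assumed)
val_x, val_y = torch.randn(16, 5), torch.randn(16, 1)
weights = torch.ones(8)                    # per-example weights to adapt

def grads(loss):
    return torch.cat([g.reshape(-1) for g in
                      torch.autograd.grad(loss, model.parameters(), retain_graph=True)])

val_grad = grads(nn.functional.mse_loss(model(val_x), val_y))

for i in range(unlabeled_x.shape[0]):
    ex_loss = nn.functional.mse_loss(model(unlabeled_x[i:i + 1]), pseudo_y[i:i + 1])
    ex_grad = grads(ex_loss)
    # Up-weight examples whose gradient aligns with reducing the validation loss.
    weights[i] = torch.clamp(weights[i] + 0.1 * torch.dot(ex_grad, val_grad), min=0.0)

print(weights)
```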
Sample efficiency and scalability to a large number of agents are two important goals for multi-agent reinforcement learning systems. Recent works got us closer to those goals, addressing non-stationarity of the environment from a single agent's perspective by utilizing a deep net critic which depends on all observations and actions. The critic input concatenates agent observations and actions in a user-specified order. However, since deep nets aren't permutation invariant, a permuted input changes the critic output despite the environment remaining identical. To avoid...
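A minimal permutation-invariant critic sketch: each agent's (observation, action) pair is embedded by a shared network and the embeddings are mean-pooled, so reordering the agents cannot change the value estimate. Layer sizes here are arbitrary assumptions.

```python
# Sketch: a critic that is invariant to the ordering of agents.
import torch
import torch.nn as nn

class PermutationInvariantCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU())
        self.value = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, act):               # obs: (B, N, obs_dim), act: (B, N, act_dim)
        per_agent = self.embed(torch.cat([obs, act], dim=-1))
        return self.value(per_agent.mean(dim=1))   # pool over the agent axis

critic = PermutationInvariantCritic(obs_dim=10, act_dim=4)
obs, act = torch.randn(2, 5, 10), torch.randn(2, 5, 4)
perm = torch.randperm(5)
# Permuting the agents leaves the critic output unchanged.
assert torch.allclose(critic(obs, act), critic(obs[:, perm], act[:, perm]), atol=1e-6)
```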
Exploration is critical for good results in deep reinforcement learning and has attracted much attention. However, existing multi-agent deep reinforcement learning algorithms still use mostly noise-based techniques. Very recently, exploration methods that consider cooperation among multiple agents have been developed. However, they suffer from a common challenge: agents struggle to identify states that are worth exploring, and can hardly coordinate their exploration efforts toward those states. To address this shortcoming, in this paper, we propose cooperative multi-agent exploration (CMAE): agents share...
The task of crafting procedural programs capable of generating structurally valid 3D shapes easily and intuitively remains an elusive goal in computer vision and graphics. Within the graphics community, procedural modeling has shifted to using node graph systems. They allow the artist to create complex animations through visual programming. Being a high-level design tool, they have made procedural modelling more accessible. However, creating those graphs demands expertise and training. We present GeoCode, a novel framework designed to extend...
Speech movements are highly complex and require precise tuning of both the spatial and temporal properties of the oral articulators to support intelligible communication. These properties also make the measurement of speech movements challenging, often requiring extensive physical sensors placed around the mouth and face that are not easily tolerated by certain populations, such as young children. Recent progress in machine learning-based markerless facial landmark tracking technology has demonstrated its potential to provide lip tracking without the need for...
Model immunization is an emerging direction that aims to mitigate the potential risk of misuse associated with open-sourced models and advancing adaptation methods. The idea is to make the released models' weights difficult to fine-tune on certain harmful applications, hence the name "immunized". Recent work on model immunization focuses on the single-concept setting. However, in real-world situations, models need to be immunized against multiple concepts. To address this gap, we propose an algorithm that simultaneously learns a single...
We present a novel approach to perform instance segmentation and counting for densely packed self-similar trees using a top-view RGB image sequence. We propose a solution that leverages pixel content, shape, and self-occlusion. First, we produce an initial over-segmentation of the image sequence and aggregate structural characteristics into a contour graph with temporal information incorporated. Second, leveraging a convolutional network and its inherent local message passing abilities, we merge adjacent tree crown patches into a final set...
Many image restoration problems are ill-posed in nature; hence, beyond the input image, most existing methods rely on a carefully engineered prior, which enforces some local consistency in the recovered image. How tightly the prior assumptions are fulfilled has a big impact on the resulting task performance. To obtain more flexibility, in this work we propose to design the prior in a data-driven manner. Instead of explicitly defining it, we learn it using deep generative models. We demonstrate that the learned prior can be applied to many image restoration problems in a unified...
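As a rough illustration of restoration with a learned generative prior, the sketch below optimizes a latent code so that a known degradation of the generator output matches the observation. The generator, degradation operator, and weights are stand-ins chosen only to show the mechanics.

```python
# Sketch: restore an image by optimizing over a generator's latent space,
# with a known (assumed) degradation operator A applied to the generator output.
import torch
import torch.nn as nn

latent_dim, n_pixels = 32, 16 * 16
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, n_pixels), nn.Tanh())

A = (torch.rand(n_pixels, n_pixels) < 0.3).float() * torch.eye(n_pixels)  # keep ~30% of pixels

clean = torch.rand(1, n_pixels) * 2 - 1
observed = clean @ A + 0.01 * torch.randn(1, n_pixels)

z = torch.zeros(1, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for step in range(300):
    opt.zero_grad()
    # Data term on the degraded reconstruction plus a simple latent prior.
    loss = ((G(z) @ A - observed) ** 2).sum() + 0.01 * (z ** 2).sum()
    loss.backward()
    opt.step()

restored = G(z).detach()   # estimate of the clean image
```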
We propose Chirality Nets, a family of deep nets that is equivariant to the "chirality transform," i.e., the transformation to create a chiral pair. Through parameter sharing and odd and even symmetry, we prove that variants of standard building blocks satisfy the equivariance property, including fully connected layers, convolutional layers, batch-normalization, and LSTM/GRU cells. The proposed layers lead to a more data-efficient representation and a reduction in computation by exploiting symmetry. We evaluate chirality nets on the task of human pose...
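A small sketch of a chirality-equivariant fully connected layer, here built by explicitly symmetrizing an ordinary weight matrix over a toy two-joint chirality transform (swap left/right joints and negate the x coordinate). The pose layout is an assumption, and the paper instead achieves equivariance via parameter sharing with odd/even symmetry rather than this symmetrization trick.

```python
# Sketch: a linear layer made equivariant to a chirality transform P with P^2 = I.
import torch
import torch.nn as nn

def chirality_matrix():
    # Pose layout (assumed): [Lx, Ly, Rx, Ry]; chirality swaps L<->R and negates x.
    P = torch.zeros(4, 4)
    P[0, 2], P[1, 3], P[2, 0], P[3, 1] = -1.0, 1.0, -1.0, 1.0
    return P

class ChiralLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(4, 4) * 0.1)
        self.b = nn.Parameter(torch.randn(4) * 0.1)
        self.P = chirality_matrix()

    def forward(self, x):
        # Symmetrized parameters satisfy f(P x) = P f(x) by construction.
        W_eq = 0.5 * (self.W + self.P @ self.W @ self.P)
        b_eq = 0.5 * (self.b + self.P @ self.b)
        return x @ W_eq.T + b_eq

layer = ChiralLinear()
pose = torch.randn(3, 4)
P = chirality_matrix()
# Equivariance check: transforming the input transforms the output identically.
assert torch.allclose(layer(pose @ P.T), layer(pose) @ P.T, atol=1e-6)
```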
Existing neural operator architectures face challenges when solving multiphysics problems with coupled partial differential equations (PDEs), due to complex geometries, interactions between physical variables, and the lack of large amounts of high-resolution training data. To address these issues, we propose Codomain Attention Neural Operator (CoDA-NO), which tokenizes functions along the codomain or channel space, enabling self-supervised learning or pretraining of multiple PDE systems. Specifically,...
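The sketch below illustrates only the channel-tokenization idea: each physical variable of a discretized function becomes a token and attention runs across those tokens. The projection sizes and the per-token encoder are assumptions, not the CoDA-NO design.

```python
# Sketch: attention across codomain (channel) tokens of a discretized function.
import torch
import torch.nn as nn

class ChannelTokenAttention(nn.Module):
    def __init__(self, n_grid, d_model=64, n_heads=4):
        super().__init__()
        self.to_token = nn.Linear(n_grid, d_model)     # encode each variable's field
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.to_field = nn.Linear(d_model, n_grid)     # decode back to a field

    def forward(self, u):                              # u: (batch, n_vars, n_grid)
        tokens = self.to_token(u)                      # one token per physical variable
        mixed, _ = self.attn(tokens, tokens, tokens)   # attention across variables
        return self.to_field(mixed)

# e.g., 3 coupled variables (such as velocity components and pressure) on 256 grid points.
u = torch.randn(2, 3, 256)
layer = ChannelTokenAttention(n_grid=256)
print(layer(u).shape)                                  # torch.Size([2, 3, 256])
```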