- Domain Adaptation and Few-Shot Learning
- Generative Adversarial Networks and Image Synthesis
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Video Analysis and Summarization
- Computer Graphics and Visualization Techniques
- Multimedia Communication and Technology
- Digital Media Forensic Detection
- Subtitles and Audiovisual Media
- Machine Learning and ELM
- Machine Learning and Data Classification
- Image Processing and 3D Reconstruction
- Artificial Intelligence in Healthcare
- Imbalanced Data Classification Techniques
- ECG Monitoring and Analysis
- Private Equity and Venture Capital
- Digital Holography and Microscopy
- Model Reduction and Neural Networks
- Advanced Optical Imaging Technologies
- Advanced Vision and Imaging
- Photorefractive and Nonlinear Optics
- AI in Cancer Detection
- Face Recognition and Analysis
- Video Surveillance and Tracking Methods
Adobe Systems (United States)
2023-2024
National Institute of Technology Andhra Pradesh
2023
Rajiv Gandhi University of Knowledge Technologies
2023
University of California, Davis
2016-2020
University of California System
2016
Indian Institute of Technology Madras
1992
Large-scale text-to-image generative models have shown a remarkable ability to synthesize diverse, high-quality images. However, directly applying these models to real image editing remains challenging for two reasons. First, it is hard for users to craft a perfect text prompt depicting every visual detail in the input image. Second, while existing methods can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an...
We propose FineGAN, a novel unsupervised GAN framework that disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained categories. To disentangle the factors without supervision, our key idea is to use information theory to associate each factor with a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement to generate realistic and diverse images belonging to fine-grained classes...
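The information-theoretic association above is typically realized, InfoGAN-style, with an auxiliary classifier that tries to recover the sampled latent code from the generated image; minimizing its cross-entropy maximizes a variational lower bound on the mutual information between code and image. A minimal numpy sketch under that assumption (the function name and formulation are illustrative, not FineGAN's actual implementation):

```python
import numpy as np

def mutual_info_lower_bound_loss(q_logits, code_onehot):
    """Cross-entropy of an auxiliary classifier Q that predicts the
    sampled discrete code c from a generated image G(z, c).
    Minimizing this maximizes a lower bound on I(c; G(z, c))."""
    shifted = q_logits - q_logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return -np.mean(np.sum(code_onehot * np.log(probs + 1e-12), axis=1))
```

When Q recovers the code perfectly the loss approaches 0; chance-level predictions give log(K) for K code values.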
Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. However, strongly relying on such co-occurrences risks a model's generalizability, especially when typical co-occurrence patterns are absent. This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations. Our goal is to accurately recognize a category in the absence of its context, without compromising its performance when it co-occurs with context. Our key idea is to decorrelate...
We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training but no other supervision. Through extensive experiments, we...
We propose a novel way of using videos to obtain high-precision object proposals for weakly-supervised object detection. Existing detection approaches use off-the-shelf proposal methods like edge boxes or selective search to obtain candidate boxes. These provide high recall but at the expense of thousands of noisy proposals. Thus, the entire burden of finding the few relevant regions is left to the ensuing mining step. To mitigate this issue, we focus instead on improving the initial proposals. Since we cannot rely on localization annotations, we turn to video...
We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art models. Our method uses pretrained networks but does not require any updates to the diffusion network's parameters. MCM is a small module trained to modulate the diffusion network's predictions during sampling using 2D modalities (e.g., semantic segmentation maps, sketches) that were unseen...
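The modulation idea can be pictured as a FiLM/SPADE-style scale-and-shift applied to the frozen network's intermediate predictions, with the scale and shift produced from the 2D condition; only the small module's weights are trained. A sketch under that assumption (`gamma_w`/`beta_w` are hypothetical names, not MCM's actual parameterization):

```python
import numpy as np

def modulate_features(features, cond, gamma_w, beta_w):
    """Scale-and-shift modulation of frozen diffusion features.
    features: (n, d) intermediate activations (network weights stay fixed)
    cond:     (n, c) encoding of a 2D condition (e.g., a segmentation map)
    gamma_w, beta_w: (c, d) weights of the small trainable module."""
    gamma = cond @ gamma_w   # per-position scale offset
    beta = cond @ beta_w     # per-position shift
    return features * (1.0 + gamma) + beta
```

With zero-initialized module weights the modulation is the identity, so sampling starts from the pretrained model's behavior and the module only learns the deviation the condition requires.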
In recent years, the use of CLIP (Contrastive Language-Image Pre-Training) has become increasingly popular in a wide range of downstream applications, including zero-shot image classification and text-to-image synthesis. Despite being trained on a vast dataset, the model has been found to exhibit biases against certain protected attributes, such as gender and race. While previous research has focused on the impact of these biases on classification, there has been little investigation into their effects on CLIP-based generative tasks. In this paper, we...
We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic...
We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data. We first investigate the issues surrounding the assumptions about uniformity made by InfoGAN, and demonstrate its ineffectiveness on properly imbalanced data. Our key idea is to make the discovery of the discrete latent factor of variation invariant to identity-preserving transformations of real images, and to use that as a signal to learn the appropriate latent distribution representing object identity. Experiments...
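The invariance signal described above can be sketched as a consistency loss between the discrete-code predictions for an image and for an identity-preserving transformation of it (e.g., a crop or color jitter); a symmetric-KL form is one plausible choice, assumed here rather than taken from the paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def invariance_loss(logits_orig, logits_aug):
    """Symmetric KL between the predicted discrete-code distributions
    for an image and its identity-preserving augmentation.
    Zero iff the two predictions agree, so minimizing it makes the
    discovered code invariant to the transformation."""
    p, q = softmax(logits_orig), softmax(logits_aug)
    kl = lambda a, b: np.sum(a * (np.log(a + 1e-12) - np.log(b + 1e-12)), axis=1)
    return np.mean(kl(p, q) + kl(q, p))
```

Because the transformations preserve identity but scramble low-level appearance, a code that survives them is pushed toward encoding identity rather than nuisance factors.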
Generating video background that tailors to foreground subject motion is an important problem for the movie industry and visual effects community. This task involves synthesizing background that aligns with the motion and appearance of the foreground subject, while also complying with the artist's creative intention. We introduce ActAnywhere, a generative model that automates this process, which traditionally requires tedious manual efforts. Our model leverages the power of large-scale video diffusion models, and is specifically tailored for this task. ActAnywhere takes a sequence...
Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for multi-object generation. In this work, we first show the fundamental reasons for such misalignment by identifying issues related to low attention activation and mask overlaps. We then propose a fine-tuning framework with two novel objectives, the Separate loss and the Enhance loss, that reduce object...
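Given the two failure modes named above, plausible forms for the objectives are: a Separate loss that penalizes spatial overlap between the cross-attention maps of different object tokens, and an Enhance loss that pushes each object's peak attention activation up. These are assumed forms for illustration, since the abstract is truncated before the losses are defined:

```python
import numpy as np

def separate_enhance_losses(attn_a, attn_b):
    """attn_a, attn_b: cross-attention maps (values in [0, 1]) for two
    object tokens over the same spatial grid.
    separate: total overlap between the two maps (assumed min-based form).
    enhance:  shortfall of each map's peak activation from 1."""
    separate = np.sum(np.minimum(attn_a, attn_b))
    enhance = (1.0 - attn_a.max()) + (1.0 - attn_b.max())
    return separate, enhance
```

Driving `separate` to zero gives each object its own spatial region, while driving `enhance` to zero guards against the low-activation failure where an object token never claims any region strongly.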
Group portrait editing is highly desirable since users constantly want to add a person, delete a person, or manipulate existing persons. It is also challenging due to the intricate dynamics of human interactions and diverse gestures. In this work, we present GroupDiff, a pioneering effort to tackle group photo editing with three dedicated contributions: 1) Data Engine: Since there is no labeled data for group photo editing, we create a data engine to generate paired data for training. The training data covers the diverse needs of group portrait editing. 2) Appearance Preservation: To keep...
In this paper, we introduce a model designed to improve the prediction of image-text alignment, targeting the challenge of compositional understanding in current visual-language models. Our approach focuses on generating high-quality training datasets for the alignment task by producing mixed-type negative captions derived from positive ones. Critically, we address the distribution imbalance between positive and negative captions to ensure that the model does not depend solely on textual information but also considers the associated images in predicting...
We introduce a high-fidelity portrait shadow removal model that can effectively enhance a portrait image by predicting its appearance under disturbing shadows and highlights. Portrait shadow removal is a highly ill-posed problem where multiple plausible solutions can be found based on a single image. For example, disentangling complex environmental lighting from the original skin color is a non-trivial problem. While existing works have approached this by predicting residuals that propagate local color distributions, such methods are often incomplete and lead...