Krishna Kumar Singh

ORCID: 0000-0002-8066-6835
Research Areas
  • Domain Adaptation and Few-Shot Learning
  • Generative Adversarial Networks and Image Synthesis
  • Multimodal Machine Learning Applications
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Video Analysis and Summarization
  • Computer Graphics and Visualization Techniques
  • Multimedia Communication and Technology
  • Digital Media Forensic Detection
  • Subtitles and Audiovisual Media
  • Machine Learning and ELM
  • Machine Learning and Data Classification
  • Image Processing and 3D Reconstruction
  • Artificial Intelligence in Healthcare
  • Imbalanced Data Classification Techniques
  • ECG Monitoring and Analysis
  • Private Equity and Venture Capital
  • Digital Holography and Microscopy
  • Model Reduction and Neural Networks
  • Advanced Optical Imaging Technologies
  • Advanced Vision and Imaging
  • Photorefractive and Nonlinear Optics
  • AI in cancer detection
  • Face recognition and analysis
  • Video Surveillance and Tracking Methods

Adobe Systems (United States)
2023-2024

National Institute of Technology Andhra Pradesh
2023

Rajiv Gandhi University of Knowledge Technologies
2023

University of California, Davis
2016-2020

University of California System
2016

Indian Institute of Technology Madras
1992

Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse, high-quality images. However, directly applying these models for real image editing remains challenging for two reasons. First, it is hard for users to craft a perfect text prompt depicting every visual detail in the input image. Second, while existing models can introduce desirable changes to certain regions, they often dramatically alter the content in unexpected and unwanted regions. In this work, we propose pix2pix-zero, an...

10.1145/3588432.3591513 article EN cc-by 2023-07-19

We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained categories. To disentangle the factors without supervision, our key idea is to use information theory to associate each factor with a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement to generate realistic and diverse images belonging to fine-grained classes...

10.1109/cvpr.2019.00665 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
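The hierarchy induced by conditioning the latent codes, as described in the abstract above, can be sketched in a few lines. The code dimensions and category counts below are hypothetical illustrations, not FineGAN's published settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 20 parent (shape) categories, each with 5 child
# (appearance) categories, forming a FineGAN-style two-level hierarchy.
N_PARENT, CHILDREN_PER_PARENT = 20, 5
N_CHILD = N_PARENT * CHILDREN_PER_PARENT

def sample_codes(batch):
    """Sample hierarchical latent codes: continuous noise z, a discrete
    child (appearance) code c, and a parent (shape) code p that is tied
    deterministically to c -- the 'specific way' the codes are related."""
    z = rng.standard_normal((batch, 128))     # background / noise code
    c = rng.integers(0, N_CHILD, size=batch)  # child code
    p = c // CHILDREN_PER_PARENT              # parent code implied by c
    return z, c, p

z, c, p = sample_codes(8)
```

Tying each child code to exactly one parent code is the kind of relationship constraint that induces the hierarchy, so that shape is shared across the appearance variants grouped under it.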

Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. However, strongly relying on such co-occurrences risks a model's generalizability, especially when typical co-occurrence patterns are absent. This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations. Our goal is to accurately recognize a category in the absence of its context, without compromising its performance when it co-occurs with context. Our key idea is to decorrelate...

10.1109/cvpr42600.2020.01108 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training but no other supervision. Through extensive experiments, we...

10.1109/cvpr42600.2020.00806 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

We propose a novel way of using videos to obtain high-precision object proposals for weakly-supervised object detection. Existing weakly-supervised detection approaches use off-the-shelf proposal methods like edge boxes or selective search to obtain candidate boxes. These methods provide high recall but at the expense of thousands of noisy proposals. Thus, the entire burden of finding the few relevant regions is left to the ensuing object mining step. To mitigate this issue, we focus instead on improving the initial proposals. Since we cannot rely on localization annotations, we turn to video...

10.1109/cvpr.2019.00964 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art diffusion models. Our method uses pretrained models but does not require any updates to the diffusion network's parameters. MCM is a small module trained to modulate the diffusion network's predictions during sampling using 2D modalities (e.g., semantic segmentation maps, sketches) that were unseen...

10.1145/3588432.3591549 article EN cc-by 2023-07-19
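The idea of a small trainable module that modulates a frozen network's predictions during sampling can be sketched as follows. The scale-and-shift form and the toy denoiser here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def frozen_denoiser(x_t, t):
    """Stand-in for a pretrained diffusion network's noise prediction;
    its parameters are never updated."""
    return 0.9 * x_t  # placeholder dynamics, not a real model

class ModulationModule:
    """Tiny trainable module: derives a per-pixel scale and shift from a
    2D conditioning map (e.g. a segmentation mask) and applies them to
    the frozen network's prediction at each sampling step."""
    def __init__(self, seed=0):
        r = np.random.default_rng(seed)
        self.w_scale = 0.01 * r.standard_normal()  # learnable in practice
        self.w_shift = 0.01 * r.standard_normal()
    def __call__(self, eps, cond):
        scale = 1.0 + self.w_scale * cond
        shift = self.w_shift * cond
        return scale * eps + shift

x_t = rng.standard_normal((64, 64))
cond = (rng.random((64, 64)) > 0.5).astype(float)  # binary mask condition
mcm = ModulationModule()
eps = frozen_denoiser(x_t, t=10)
eps_mod = mcm(eps, cond)  # conditioned prediction; base net untouched
```

Where the conditioning map is zero, the modulation reduces to the identity, so the pretrained model's behavior is preserved outside the conditioned regions.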

In recent years, the use of CLIP (Contrastive Language-Image Pre-Training) has become increasingly popular in a wide range of downstream applications, including zero-shot image classification and text-to-image synthesis. Despite being trained on a vast dataset, the model has been found to exhibit biases against certain protected attributes, such as gender and race. While previous research has focused on the impact of these biases on classification, there has been little investigation into their effects on CLIP-based generative tasks. In this paper, we...

10.1109/wacv57701.2024.00296 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic, and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic...

10.1109/wacv57701.2024.00497 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data. We first investigate the issues surrounding the assumptions about uniformity made by InfoGAN, and demonstrate its ineffectiveness to properly disentangle object identity in imbalanced data. Our key idea is to make the discovery of the discrete latent factor of variation invariant to identity-preserving transformations in real images, and use that as a signal to learn the appropriate latent distribution representing object identity. Experiments...

10.48550/arxiv.1910.01112 preprint EN other-oa arXiv (Cornell University) 2019-01-01
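The invariance signal described above, namely that a discrete latent code should not change under identity-preserving transformations of the same real image, can be illustrated with a toy encoder. The mean-intensity `predict_code` below is a hypothetical stand-in, not the paper's model:

```python
import numpy as np

def predict_code(image, n_codes=10):
    """Hypothetical encoder: assigns a discrete latent code from a crude
    identity feature (here, the image's mean-intensity bucket)."""
    return int(image.mean() * n_codes) % n_codes

def horizontal_flip(image):
    """An identity-preserving transformation: the depicted object
    (and hence its identity code) should be unchanged."""
    return image[:, ::-1]

rng = np.random.default_rng(2)
img = rng.random((32, 32))
# The invariance signal: the same image and its flipped version
# receive the same discrete code.
code_a = predict_code(img)
code_b = predict_code(horizontal_flip(img))
```

A training objective built on this signal would penalize disagreement between `code_a` and `code_b`, rather than assuming a uniform prior over the discrete codes as InfoGAN does.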

Generating video background that tailors to the foreground subject motion is an important problem for the movie industry and visual effects community. This task involves synthesizing background that aligns with the motion and appearance of the foreground subject, while also complying with the artist's creative intention. We introduce ActAnywhere, a generative model that automates this process, which traditionally requires tedious manual efforts. Our model leverages the power of large-scale video diffusion models, and is specifically tailored for this task. ActAnywhere takes a sequence...

10.48550/arxiv.2401.10822 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for multi-object generation. In this work, we first show the fundamental reasons for such misalignment by identifying issues related to low attention activation and mask overlaps. Then we propose a finetuning framework with two novel objectives, the Separate loss and the Enhance loss, that reduce object...

10.1145/3641519.3657527 article EN 2024-07-12
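A minimal sketch of what Separate- and Enhance-style objectives over per-object attention maps might look like. The exact loss forms below are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def separate_loss(attn_a, attn_b):
    """Penalize overlap between two objects' attention maps
    (a sketch of a 'Separate'-style objective)."""
    return float(np.sum(attn_a * attn_b))

def enhance_loss(attn):
    """Encourage high peak activation for an object's attention map
    (a sketch of an 'Enhance'-style objective): low when the max is high."""
    return float(1.0 - attn.max())

a = np.zeros((8, 8)); a[:4, :] = 0.9   # object A attends to the top half
b = np.zeros((8, 8)); b[4:, :] = 0.9   # object B attends to the bottom half
sep = separate_loss(a, b)              # disjoint maps incur no penalty
enh = enhance_loss(a)                  # strong peak incurs a small penalty
```

Minimizing both terms jointly pushes each object toward a distinct, strongly activated region, which is the failure mode (overlapping masks, weak activation) the abstract identifies.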

Group portrait editing is highly desirable since users constantly want to add a person, delete a person, or manipulate existing persons. It is also challenging due to the intricate dynamics of human interactions and the diverse gestures. In this work, we present GroupDiff, a pioneering effort to tackle group photo editing with three dedicated contributions: 1) Data Engine: Since there is no labeled data for group photo editing, we create a data engine to generate paired data for training. The training data covers the diverse needs of group portrait editing. 2) Appearance Preservation: To keep...

10.48550/arxiv.2409.14379 preprint EN arXiv (Cornell University) 2024-09-22

In this paper, we introduce a model designed to improve the prediction of image-text alignment, targeting the challenge of compositional understanding in current visual-language models. Our approach focuses on generating high-quality training datasets for the alignment task by producing mixed-type negative captions derived from positive ones. Critically, we address the distribution imbalance between positive and negative captions to ensure that the alignment model does not depend solely on textual information but also considers the associated images in predicting...

10.48550/arxiv.2410.00905 preprint EN arXiv (Cornell University) 2024-10-01

We introduce a high-fidelity portrait shadow removal model that can effectively enhance a portrait image by predicting its appearance with disturbing shadows and highlights removed. Portrait shadow removal is a highly ill-posed problem where multiple plausible solutions can be found based on a single image. For example, disentangling complex environmental lighting from the original skin color is a non-trivial problem. While existing works have solved this by predicting residuals that propagate the local shadow distribution, such methods are often incomplete and lead...

10.1145/3687903 article EN ACM Transactions on Graphics 2024-11-19