C. V. Jawahar

ORCID: 0000-0001-6767-7057
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Handwritten Text Recognition Techniques
  • Image Retrieval and Classification Techniques
  • Multimodal Machine Learning Applications
  • Natural Language Processing Techniques
  • Advanced Vision and Imaging
  • Video Analysis and Summarization
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Robotics and Sensor-Based Localization
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Vehicle License Plate Recognition
  • Image Processing and 3D Reconstruction
  • Face recognition and analysis
  • Topic Modeling
  • Hand Gesture Recognition Systems
  • Music and Audio Processing
  • Algorithms and Data Compression
  • Face and Expression Recognition
  • Anomaly Detection Techniques and Applications
  • Image Processing Techniques and Applications
  • Speech and Audio Processing
  • Advanced Image Processing Techniques
  • Digital Media Forensic Detection

Indian Institute of Technology Hyderabad
2015-2024

International Institute of Information Technology, Hyderabad
2015-2024

International Institute of Information Technology
2004-2024

Indian Institute of Technology Delhi
2011-2024

Amrita Vishwa Vidyapeetham
2023

Indian Institute of Technology Mandi
2023

International Institute of Islamic Thought
2022

University of Bath
2021

Indian Institute of Technology Kanpur
2011-2019

Chinese University of Hong Kong
2017

We investigate the fine-grained object categorization problem of determining the breed of an animal from an image. To this end, we introduce a new annotated dataset of pets covering 37 different breeds of cats and dogs. The visual problem is very challenging as these animals, particularly cats, are very deformable, and there can be quite subtle differences between breeds. We make a number of contributions: first, a model to classify a pet breed automatically, which combines shape, captured by a part model detecting the pet face, and appearance, captured by a bag-of-words model that...

10.1109/cvpr.2012.6248092 article EN 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012-06-01

The automatic discovery of distinctive parts for an object or scene class is challenging, since it requires simultaneously learning the part appearance and identifying the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address the problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means...

10.1109/cvpr.2013.124 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01
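The incremental mining loop described above can be sketched in a few lines. This is a toy illustration only: a normalized mean template stands in for the Exemplar SVM, descriptors are plain vectors, and the threshold and round count are invented values, not the paper's.

```python
import numpy as np

def mine_part_instances(seed, candidates, thresh=0.8, rounds=3):
    """Grow a set of part instances from a single seed descriptor.

    seed:       (d,) descriptor of the initial part occurrence
    candidates: (n, d) descriptors of candidate patches
    A mean template replaces the Exemplar SVM of the paper; thresh and
    rounds are illustrative hyper-parameters.
    """
    template = seed / np.linalg.norm(seed)
    members = [seed]
    for _ in range(rounds):
        # Cosine similarity of every candidate against the current template
        scores = candidates @ template / np.linalg.norm(candidates, axis=1)
        keep = candidates[scores > thresh]
        if len(keep) == 0:
            break
        # Accepted instances are folded back in before the next round
        members = [seed] + list(keep)
        template = np.mean(members, axis=0)
        template /= np.linalg.norm(template)
    return np.array(members)
```

Each round only admits candidates that already align well with the current model, mirroring the "discover and align reliably before using as training examples" strategy.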

Scene text recognition has gained significant attention from the computer vision community in recent years. Recognizing such text is a challenging problem, even more so than the recognition of scanned documents. In this work, we focus on the problem of recognizing text extracted from street images. We present a framework that exploits both bottom-up and top-down cues. The bottom-up cues are derived from individual character detections in the image. We build a Conditional Random Field model on these detections to jointly model the strength of the detections and the interactions between them. We impose the top-down cues obtained...

10.1109/cvpr.2012.6247990 preprint EN 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012-06-01
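For a left-to-right chain of character detections, minimizing a CRF energy of unary (appearance) plus pairwise (interaction) costs reduces to Viterbi decoding. A minimal sketch with illustrative cost matrices, not the paper's actual energy terms:

```python
import numpy as np

def viterbi_chain(unary, pairwise):
    """Minimize sum_t unary[t, s_t] + sum_t pairwise[s_{t-1}, s_t].

    unary:    (T, S) per-detection character costs (hypothetical values)
    pairwise: (S, S) transition costs, e.g. from character co-occurrence
    Returns the minimum-energy label sequence as a list of label ids.
    """
    T, S = unary.shape
    cost = unary[0].copy()          # best cost ending in each label so far
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # total[p, s] = cost of being in p at t-1, moving to s at t
        total = cost[:, None] + pairwise + unary[t][None, :]
        back[t] = total.argmin(axis=0)
        cost = total.min(axis=0)
    # Trace the best path backwards
    seq = [int(cost.argmin())]
    for t in range(T - 1, 0, -1):
        seq.append(int(back[t, seq[-1]]))
    return seq[::-1]
```

The actual model in the paper is richer (it handles competing, overlapping detections rather than a fixed chain), but the joint unary-plus-pairwise minimization is the same idea.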

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations in 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume...

10.1109/cvpr52688.2022.01842 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

While several datasets for autonomous navigation have become available in recent years, they have tended to focus on structured driving environments. This usually corresponds to well-delineated infrastructure such as lanes, a small number of well-defined categories of traffic participants, low variation in object or background appearance, and strong adherence to traffic rules. We propose IDD, a novel dataset for road scene understanding in unstructured environments where the above assumptions are largely not satisfied. It...

10.1109/wacv.2019.00190 article EN 2019-01-01

In this work, we address the problem of cross-modal retrieval in the presence of multi-label annotations. In particular, we introduce multi-label Canonical Correlation Analysis (ml-CCA), an extension of CCA, for learning shared subspaces that take into account high-level semantic information in the form of multi-label annotations. Unlike CCA, ml-CCA does not rely on explicit pairing between modalities; instead, it uses the multi-label information to establish correspondences. This results in a discriminative subspace which is better suited for cross-modal retrieval tasks. We also present Fast ml-CCA, a computationally...

10.1109/iccv.2015.466 article EN 2015-12-01
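The two-view CCA that ml-CCA extends can be computed in closed form by whitening each view and taking an SVD of the cross-covariance. A minimal numpy sketch of that base algorithm (the multi-label correspondence weighting of ml-CCA itself is not shown; the regularizer value is an assumption):

```python
import numpy as np

def cca(X, Y, k, reg=1e-6):
    """Plain two-view CCA: find k pairs of directions maximizing
    correlation between the projected views.

    X: (n, dx), Y: (n, dy) paired samples.
    Returns projection matrices Wx, Wy and the canonical correlations.
    """
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the correlations
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt[:k].T, s[:k]
```

When the two views share a latent signal, the leading canonical correlation approaches 1; ml-CCA replaces the explicit sample pairing with correspondences induced by shared labels.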

The ICDAR 2019 Challenge on "Scanned receipts OCR and key information extraction" (SROIE) covers important aspects related to the automated analysis of scanned receipts. The SROIE tasks play a key role in many document analysis systems and hold significant commercial potential. Although a lot of work has been published over the years on administrative document analysis, the community has advanced relatively slowly, as most datasets have been kept private. One of the contributions of SROIE is to offer the first standardized dataset of 1000 whole scanned receipt images...

10.1109/icdar.2019.00244 preprint EN 2019-09-01

Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting the high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account for both reasoning...

10.1109/iccv.2019.00439 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

We focus on the problem of wearer's action recognition in first person, a.k.a. egocentric, videos. This problem is more challenging than third-person activity recognition due to the unavailability of the wearer's pose and the sharp movements in the videos caused by the natural head motion of the wearer. Carefully crafted features based on hand and object cues have been shown to be successful on limited targeted datasets. We propose convolutional neural networks (CNNs) for end-to-end learning and classification of wearer's actions. The proposed network makes use of egocentric cues, capturing hand pose and saliency map....

10.1109/cvpr.2016.287 article EN 2016-06-01

Histopathological images contain morphological markers of disease progression that have diagnostic and predictive values. In this study, we demonstrate how a deep learning framework can be used for automatic classification of Renal Cell Carcinoma (RCC) subtypes, and for identification of features that predict survival outcome from digital histopathological images. Convolutional neural networks (CNNs) trained on whole-slide images distinguish clear cell and chromophobe RCC from normal tissue with an accuracy of 93.39%...

10.1038/s41598-019-46718-3 article EN cc-by Scientific Reports 2019-07-19

We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented. We report several baseline results by adopting existing VQA and reading comprehension models. Although the models perform reasonably well on certain types of questions, there is a large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding...

10.1109/wacv48630.2021.00225 article EN 2021-01-01

Road network extraction from satellite images often produces fragmented road segments, leading to road maps unfit for real applications. Pixel-wise classification fails to predict topologically correct and connected road masks due to the absence of connectivity supervision and the difficulty in enforcing topological constraints. In this paper, we propose a task called Orientation Learning, motivated by the human behavior of annotating roads by tracing them at a specific orientation. We also develop a stacked multi-branch...

10.1109/cvpr.2019.01063 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

In this paper, we address the problem of automatically generating human-like descriptions for unseen images, given a collection of images and their corresponding human-generated descriptions. Previous attempts at this task mostly rely on visual clues and corpus statistics, but do not take much advantage of the semantic information inherent in the available image descriptions. Here, we present a generic method which benefits from all three sources (i.e. visual clues, corpus statistics and available descriptions) simultaneously, and is capable of constructing novel...

10.1609/aaai.v26i1.8205 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-09-20

Depth information has been shown to affect the identification of visually salient regions in images. In this paper, we investigate the role of depth in saliency detection in the presence of (i) competing saliencies due to appearance, (ii) depth-induced blur and (iii) centre-bias. Having established through experiments that depth continues to be a significant contributor in the presence of these cues, we propose a 3D-saliency formulation that takes into account the structural features of objects in an indoor setting to identify salient regions at depth levels. The computed saliency is used...

10.5244/c.27.98 article EN 2013-01-01
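The interplay of the three cues above can be illustrated with a toy fusion: an appearance saliency map, a "closer is more salient" depth prior, and a Gaussian centre-bias. The weighting and the Gaussian width are invented for illustration and are not the paper's 3D-saliency formulation:

```python
import numpy as np

def center_bias(h, w, sigma=0.25):
    """Gaussian centre-bias map over an h x w grid (sigma is illustrative)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def depth_saliency(appearance, depth, w_depth=0.5):
    """Toy fusion of appearance saliency with a depth prior and centre-bias.

    appearance: (h, w) appearance saliency in [0, 1]
    depth:      (h, w) depth map (smaller = closer to the camera)
    """
    # Closer regions get a higher prior
    near = 1.0 - (depth - depth.min()) / (np.ptp(depth) + 1e-12)
    s = (1 - w_depth) * appearance + w_depth * near
    return s * center_bias(*appearance.shape)
```

Even this crude combination shows the competition the paper studies: a near object off-centre can lose to a farther object at the image centre depending on the weights.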

The success of deep learning based models has centered around recent architectures and the availability of large-scale annotated data. In this work, we explore these two factors systematically for improving handwritten word recognition in scanned off-line document images. We propose a modified CNN-RNN hybrid architecture with a major focus on effective training using: (i) efficient initialization of the network using synthetic data for pretraining, (ii) image normalization for slant correction and (iii) domain specific...

10.1109/icfhr-2018.2018.00023 article EN 2018-08-01

Concerns about the widespread use of biometric authentication systems are primarily centered around template security, revocability, and privacy. The use of cryptographic primitives to bolster the authentication process can alleviate some of these concerns, as shown by biometric cryptosystems. In this paper, we propose a provably secure and blind protocol, which addresses the user's privacy,...

10.1109/tifs.2010.2043188 article EN IEEE Transactions on Information Forensics and Security 2010-03-02

Template-based object detectors such as the deformable parts model of Felzenszwalb et al. [11] achieve state-of-the-art performance for a variety of object categories, but are still outperformed by simpler bag-of-words models for highly flexible objects such as cats and dogs. In these cases, we propose to use the template-based model to detect a distinctive part of the object class, followed by detecting the rest of the object via segmentation on image-specific information learnt from that part. This approach is motivated by two observations: (i) many object classes...

10.1109/iccv.2011.6126398 article EN International Conference on Computer Vision 2011-11-01

In recent years, the need for semantic segmentation has arisen across several different applications and environments. However, the expense and redundancy of annotation often limits the quantity of labels available for training in any domain, while deployment is easier if a single model works well across domains. In this paper, we pose the novel problem of universal semi-supervised semantic segmentation and propose a solution framework, to meet the dual needs of lower annotation and deployment costs. In contrast to counterpoints such as fine tuning, joint training or unsupervised domain adaptation,...

10.1109/iccv.2019.00536 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

This paper presents a Bag of Visual Words (BoVW) based approach to retrieve similar word images from a large database, efficiently and accurately. We show that a text retrieval system can be adapted to build a word image retrieval solution. This helps in achieving scalability. We demonstrate the method on more than 1 million word images with sub-second retrieval time. We validate the method on four Indian languages, and report a mean average precision of 0.75. We represent a word image as a histogram of the visual words present in the image. Visual words are quantized representations of local regions; for this work,...

10.1109/das.2012.96 article EN 2012-03-01
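The adaptation of a text retrieval pipeline to word images amounts to building visual-word histograms and ranking with a standard tf-idf weighted similarity. A minimal sketch of that pipeline (the quantization of local descriptors into visual-word ids is assumed to have already happened; the idf variant shown is one common choice, not necessarily the paper's):

```python
import numpy as np

def bovw_histograms(assignments, vocab_size):
    """Build one visual-word histogram per word image.

    assignments: list of integer arrays, each holding the visual-word ids
                 assigned to the local regions of one word image.
    """
    H = np.zeros((len(assignments), vocab_size))
    for i, a in enumerate(assignments):
        np.add.at(H[i], a, 1.0)   # accumulate counts per visual word
    return H

def tfidf_retrieve(H, q):
    """Rank database histograms H against a query histogram q.

    Uses tf-idf weighting followed by cosine similarity, as in
    text retrieval engines. Returns database indices, best first.
    """
    df = (H > 0).sum(0)                      # document frequency per word
    idf = np.log((1 + len(H)) / (1 + df))    # smoothed inverse doc frequency
    Hw, qw = H * idf, q * idf
    sims = Hw @ qw / (np.linalg.norm(Hw, axis=1) * np.linalg.norm(qw) + 1e-12)
    return np.argsort(-sims)
```

Because the ranking only touches sparse histograms, the same inverted-index machinery that makes text search fast applies unchanged, which is where the claimed scalability comes from.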

Recognizing text in images taken in the wild is a challenging problem that has received great attention in recent years. Previous methods addressed this problem by first detecting individual characters, and then forming them into words. Such approaches often suffer from weak character detections due to large intra-class variations, even more so than characters from scanned documents. We take a different view of the problem and present a holistic word recognition framework. In this, we represent the scene text image and synthetic images generated...

10.1109/icdar.2013.87 preprint EN 2013-08-01

Recognizing human faces in the wild is emerging as a critically important, and technically challenging, computer vision problem. With a few notable exceptions, most previous works of the last several decades have focused on recognizing faces captured in a laboratory setting. However, with the introduction of databases such as LFW and Pubfigs, the face recognition community is gradually shifting its focus to much more unconstrained settings. Since its introduction, the verification benchmark has been getting a lot of attention from various researchers...

10.1109/ncvpripg.2013.6776225 article EN 2013-12-01

Chinese scene text reading is one of the most challenging problems in computer vision and has attracted great interest. Different from English text, Chinese has more than 6000 commonly used characters, which can be arranged in various layouts with numerous fonts. The signboards in street views are a good choice for Chinese scene text images since they have different backgrounds, fonts and layouts. We organized a competition called ICDAR2019-ReCTS, which mainly focuses on reading Chinese text on signboards. This report presents the final results of the competition. A...

10.1109/icdar.2019.00253 preprint EN 2019-09-01

Deep convolutional features for word images and textual embedding schemes have shown great success in word spotting. In this work, we follow these motivations to propose an End2End framework which jointly learns both the word image and text embeddings using state-of-the-art deep architectures. The three major contributions of this work are: (i) a scheme to jointly learn a common representation for a word image and its label, (ii) building a compact descriptor and demonstrating its utility as an off-the-shelf feature for word spotting, (iii) use of synthetic data as a complementary modality...

10.1109/das.2018.70 article EN 2018-04-01

Humans involuntarily tend to infer parts of a conversation from lip movements when the speech is absent or corrupted by external noise. In this work, we explore the task of lip-to-speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker. Acknowledging the importance of contextual and speaker-specific cues for accurate lip-reading, we take a different path from existing works. We focus on learning accurate lip-sequence-to-speech mappings for individual speakers in unconstrained, large vocabulary settings. To this end, we collect and release a large-scale benchmark...

10.1109/cvpr42600.2020.01381 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01