- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Handwritten Text Recognition Techniques
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Visual Attention and Saliency Detection
- Face Recognition and Analysis
- Topic Modeling
- Image Retrieval and Classification Techniques
- Vehicle License Plate Recognition
- Natural Language Processing Techniques
- Human Pose and Action Recognition
- Video Analysis and Summarization
- Image and Signal Denoising Methods
- Digital Media Forensic Detection
- 3D Shape Modeling and Analysis
- Generative Adversarial Networks and Image Synthesis
- Algorithms and Data Compression
- Face and Expression Recognition
- Advanced Image Processing Techniques
- Management and Marketing Education
- Information Retrieval and Data Mining
- Hand Gesture Recognition Systems
- Intelligent Tutoring Systems and Adaptive Learning
- Image Enhancement Techniques
Indian Institute of Technology Jodhpur
2019-2024
Office for Students
2023
Sinhgad Dental College and Hospital
2023
Indian Institute of Science Bangalore
2018-2019
Indian Institute of Technology Hyderabad
2011-2017
International Institute of Information Technology, Hyderabad
2016
Center for Visual Communication (United States)
2012
Scene text recognition has gained significant attention from the computer vision community in recent years. Recognizing such text is a challenging problem, even more so than the recognition of scanned documents. In this work, we focus on the problem of recognizing text extracted from street images. We present a framework that exploits both bottom-up and top-down cues. The bottom-up cues are derived from individual character detections in the image. We build a Conditional Random Field model on these detections to jointly model the strength of the detections and the interactions between them. We impose top-down cues obtained...
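The joint model over character detections can be illustrated as a chain CRF whose exact minimizer is found by dynamic programming. This is a toy sketch, not the paper's implementation: `infer_word`, its unary score dictionaries, and the bigram cost function are hypothetical stand-ins for the learned detection scores and lexicon-based interaction terms.

```python
# Toy chain-CRF inference over per-position character detections.
# unary: list of dicts {char: detection cost}; bigram_cost(a, b): pairwise
# cost of placing character b right after character a. Lower cost is better.

def infer_word(unary, bigram_cost):
    n = len(unary)
    # best[i][c] = (min cost of labelling positions 0..i ending in c, backpointer)
    best = [{c: (cost, None) for c, cost in unary[0].items()}]
    for i in range(1, n):
        layer = {}
        for c, uc in unary[i].items():
            # cheapest predecessor for character c at position i
            prev, pc = min(
                ((p, best[i - 1][p][0] + bigram_cost(p, c)) for p in best[i - 1]),
                key=lambda t: t[1],
            )
            layer[c] = (uc + pc, prev)
        best.append(layer)
    # backtrack from the cheapest final character
    c = min(best[-1], key=lambda k: best[-1][k][0])
    word = [c]
    for i in range(n - 1, 0, -1):
        c = best[i][c][1]
        word.append(c)
    return "".join(reversed(word))
```

With ambiguous detections, a lexicon-style bigram cost can override a slightly weaker unary score, which is the point of the joint formulation.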
The problem of answering questions about an image is popularly known as visual question answering (or VQA in short). It is a well-established problem in computer vision. However, none of the VQA methods currently utilize the text often present in the image. These "texts in images" provide additional useful cues and facilitate better understanding of the image content. In this paper, we introduce a novel task of visual question answering by reading text in images, i.e., by optical character recognition or OCR. We refer to this problem as OCR-VQA. To facilitate a systematic way of studying this new problem, we introduce a large-scale dataset,...
Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In conventional VQA, one may ask questions about an image which can be answered purely based on its content. For example, given an image with people in it, a typical VQA question may inquire about the number of people in the image. More recently, there is growing interest in answering questions which require commonsense knowledge involving common nouns (e.g., cats, dogs, microphones) present...
Recognizing text in images taken in the wild is a challenging problem that has received great attention in recent years. Previous methods addressed this problem by first detecting individual characters, and then forming them into words. Such approaches often suffer from weak character detections due to large intra-class variations, even more so than characters in scanned documents. We take a different view of the problem and present a holistic word recognition framework. In this, we represent the scene text image and synthetic images generated...
Inspired by the success of MRF models for solving object segmentation problems, we formulate the binarization problem in this framework. We represent the pixels of a document image as random variables in an MRF, and introduce a new energy (or cost) function on these variables. Each variable takes a foreground or background label, and the quality of the labelling is determined by the value of the energy function. We minimize the energy function, i.e. find the optimal binarization, using an iterative graph cut scheme. Our model is robust to variations in colours. We use...
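The energy formulation can be sketched on a tiny grayscale grid. The paper minimizes its energy with an iterative graph cut; as a much simpler stand-in, the sketch below uses iterated conditional modes (ICM), with a squared-distance unary term and a Potts smoothness term. All names and parameter values here are hypothetical, not the paper's.

```python
# Toy MRF binarization via ICM (a simple stand-in for the paper's graph cut).
# img: 2D list of grayscale values in [0, 1]; label 0 = foreground (dark text),
# label 1 = background. Unary: squared distance to the class mean; pairwise:
# Potts penalty `smooth` for each disagreeing 4-connected neighbour.

def binarize(img, fg=0.0, bg=1.0, smooth=0.5, iters=10):
    h, w = len(img), len(img[0])
    means = (fg, bg)
    # initialize each pixel with the cheaper unary label
    lab = [[0 if abs(img[y][x] - fg) < abs(img[y][x] - bg) else 1
            for x in range(w)] for y in range(h)]
    for _ in range(iters):
        changed = False
        for y in range(h):
            for x in range(w):
                costs = []
                for l in (0, 1):
                    u = (img[y][x] - means[l]) ** 2
                    p = sum(smooth for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0))
                            if 0 <= y + dy < h and 0 <= x + dx < w
                            and lab[y + dy][x + dx] != l)
                    costs.append(u + p)
                new = 0 if costs[0] <= costs[1] else 1
                if new != lab[y][x]:
                    lab[y][x], changed = new, True
        if not changed:
            break
    return lab
```

On a patch of dark text pixels with one bright noise pixel, the smoothness term pulls the outlier back to the foreground label, which a per-pixel threshold would not do.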
We present an approach for the text-to-image retrieval problem based on the textual content present in images. Given recent developments in understanding text in images, an appealing way to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being built on state-of-the-art methods, is insufficient, and propose a method where we do not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of the characters in the text query, and then impose spatial...
Text present in images is not merely a set of strings; it provides useful cues about the image. Despite their utility for better image understanding, scene texts are not used in traditional visual question answering (VQA) models. In this work, we present a VQA model which can read scene texts and perform reasoning over a knowledge graph to arrive at an accurate answer. Our proposed model has three mutually interacting modules: i. a proposal module to get word and visual content proposals from the image, ii. a fusion module to fuse these proposals with the question and a knowledge base, and mine relevant facts,...
Computer programming textbooks and software documentation often contain flowcharts to illustrate the flow of an algorithm or procedure. Modern OCR engines tag these flowcharts as graphics and ignore them in further processing. In this paper, we work towards making flowchart images machine-interpretable by converting them into executable Python code. To this end, inspired by the recent success of the natural language to code generation literature, we present a novel transformer-based framework, namely FloCo-T5. Our model is well-suited for...
We present an approach for automatically identifying the script of text localized in scene images. Our approach is inspired by recent advancements in mid-level features. We represent the images using features which are pooled from densely computed local features. Once represented with the proposed feature representation, we use an off-the-shelf classifier to identify the script of the image. Our approach is efficient and requires very little labeled data. We evaluate the performance of our method on the recently introduced CVSI dataset, demonstrating that it can correctly identify the script in 96.70% of the cases. In...
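The pool-then-classify pipeline can be sketched in a few lines: max-pool densely computed local descriptors into one image-level vector, then apply a simple off-the-shelf classifier (here, nearest centroid). The descriptors, script names, and function names below are all hypothetical stand-ins for the learned mid-level features used in the paper.

```python
# Toy script identification: max-pool local descriptors per image, then
# classify with a nearest-centroid classifier in the pooled feature space.

def pool(descriptors):
    # element-wise max over all local descriptors of one image
    return [max(col) for col in zip(*descriptors)]

def centroid(vectors):
    return [sum(col) / len(col) for col in zip(*vectors)]

def fit(train):
    # train: {script_name: [list of per-image descriptor sets]}
    return {s: centroid([pool(d) for d in imgs]) for s, imgs in train.items()}

def predict(model, descriptors):
    v = pool(descriptors)
    # nearest centroid by squared Euclidean distance
    return min(model, key=lambda s: sum((a - b) ** 2 for a, b in zip(v, model[s])))
```

Because all the learning is in the feature representation, the classifier itself can stay this simple, which is what keeps the labeled-data requirement low.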
We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as context. For such a setting, a model must first retrieve relevant images from the pool and then answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). RETVQA is distinctively different from, and more challenging than, the traditionally-studied Visual Question Answering (VQA), where a question is answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and retrieved images using our...
Non-native speakers with limited vocabulary often struggle to name specific objects despite being able to visualize them, e.g., people outside Australia searching for ‘numbats.’ Further, users may want to search for such elusive objects with difficult-to-sketch interactions, e.g., “numbat digging in the ground.” In such common but complex situations, users desire a search interface that accepts composite multimodal queries comprising hand-drawn sketches of “difficult-to-name but easy-to-draw” objects and text describing “difficult-to-sketch...
In this work, we study the one-shot video object localization problem, which aims to localize instances of unseen objects in a target video using a single query image of the object. Toward addressing this challenging problem, we extend a popular and successful object detection method, namely DETR (Detection Transformer), and introduce a novel approach – a query-guided detection transformer for videos (QDETRv). A distinctive feature of QDETRv is its capacity to exploit information from the spatio-temporal context of the video, which significantly aids in precisely...
Massive Open Online Courses (MOOCs) enable easy access to many educational materials, particularly lecture slides, on the web. Searching through them based on user queries becomes an essential problem due to the availability of such vast information. To address this, we present the Lecture Slide Deck Search Engine – a model that supports natural language queries and hand-drawn sketches and performs searches over a large collection of slide images on computer science topics. This search engine is trained using a novel semantic...
Sparse representation based image restoration techniques have been shown to be successful in solving various inverse problems such as denoising, inpainting, and super-resolution on natural images and videos. In this paper, we explore the use of sparse representation based methods specifically to restore degraded document images. While natural images form a very small subset of all possible images, admitting the possibility of sparse representation, document images are significantly more restricted and are expected to be ideally suited for such representation. However, the binary nature of textual...
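The core idea of sparse-representation restoration can be sketched with greedy matching pursuit: approximate a degraded patch with a few dictionary atoms and use the reconstruction from those atoms as the restored patch. Work in this area typically uses OMP with a learned dictionary; the function, dictionary, and sparsity level below are toy stand-ins under that assumption.

```python
# Toy sparse coding via greedy matching pursuit over unit-norm atoms.
# signal: the (degraded) patch as a flat list; atoms: list of unit-norm
# dictionary atoms; k: number of greedy selection steps (sparsity level).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, k=2):
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        i = max(range(len(atoms)), key=lambda j: abs(dot(residual, atoms[j])))
        c = dot(residual, atoms[i])
        coeffs[i] += c
        residual = [r - c * a for r, a in zip(residual, atoms[i])]
    # reconstruction from the selected atoms acts as the restored signal
    recon = [sum(coeffs[j] * atoms[j][t] for j in range(len(atoms)))
             for t in range(len(signal))]
    return coeffs, recon
```

With k smaller than the signal dimension, components of the input that no atom explains (e.g. noise or degradation) are left in the residual and dropped from the reconstruction.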
Interpreting visual relationships is a core aspect of comprehensive video understanding. Given a query relationship as a <subject, predicate, object> triplet and a test video, our objective is to localize the subject and object that are connected via the predicate. With modern visio-lingual understanding capabilities, solving this problem is achievable, provided there are large-scale annotated training examples available. However, annotating for every combination of subject, object, and predicate is cumbersome, expensive, and possibly...
This paper presents a framework for jointly grounding objects that follow certain semantic relationship constraints given in a scene graph. A typical natural scene contains several objects, often exhibiting visual relationships of varied complexities between them. These inter-object relationships provide strong contextual cues towards improving grounding performance compared to the traditional object query-only-based localization task. A scene graph is an efficient and structured way to represent all the objects and their relationships in the image. In an attempt at bridging...