NFDI4DS | UHH-SEMS - Publication Details

R. Manmatha

ORCID: 0000-0003-2315-8583

About

Contact & Profiles

OpenAlex ID: A5055459215

Research Areas

Advanced Image and Video Retrieval Techniques
Image Retrieval and Classification Techniques
Handwritten Text Recognition Techniques
Multimodal Machine Learning Applications
Video Analysis and Summarization
Natural Language Processing Techniques
Image Processing and 3D Reconstruction
Medical Image Segmentation Techniques
Music and Audio Processing
Domain Adaptation and Few-Shot Learning
Topic Modeling
Vehicle License Plate Recognition
Advanced Vision and Imaging
Data Management and Algorithms
Human Pose and Action Recognition
Anomaly Detection Techniques and Applications
Information Retrieval and Search Behavior
Image and Object Detection Techniques
Advanced Text Analysis Techniques
Biomedical Text Mining and Ontologies
Text and Document Classification Technologies
Visual Attention and Saliency Detection
Generative Adversarial Networks and Image Synthesis
Optical measurement and interference techniques
Advanced Image Processing Techniques

Affiliations

Amazon (United States)
2017-2023

Amazon (Germany)
2019-2022

Technion – Israel Institute of Technology
2021

California Institute of Technology
2021

University of Massachusetts Amherst
2008-2017

Amherst College
2001-2014

Universitat Autònoma de Barcelona
2014

Defense Advanced Research Projects Agency
2003

University of Hawaii System
1987

Automatic image annotation and retrieval using cross-media relevance models

OPENALEX - Publications

Jiwoon Jeon Victor Lavrenko R. Manmatha

Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor intensive procedure, and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, we show that probabilistic models...

10.1145/860435.860459 article EN 2003-07-28

ResNeSt: Split-Attention Networks

Hang Zhang Chongruo Wu Zhongyue Zhang Yi Zhu Haibin Lin and 7 more

The ability to learn richer network representations generally boosts the performance of deep learning models. To improve representation-learning in convolutional neural networks, we present a multi-branch architecture, which applies channel-wise attention across different network branches to leverage the complementary strengths of both feature-map attention and multi-path representation. Our proposed Split-Attention module provides a simple and modular computation block that can serve as a drop-in replacement for the popular...

10.1109/cvprw56347.2022.00309 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

Sampling Matters in Deep Embedding Learning

Chao-Yuan Wu R. Manmatha Alexander J. Smola Philipp Krähenbühl

Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that selecting training examples plays an equally important role. We propose distance weighted sampling, which selects more...

10.1109/iccv.2017.309 preprint EN 2017-10-01

Multiple Bernoulli relevance models for image and video annotation

Siwei Feng R. Manmatha Victor Lavrenko

Retrieving images in response to textual queries requires some knowledge of the semantics of the picture. Here, we show how we can do both automatic image annotation and retrieval (using one word queries) from images and videos using a multiple Bernoulli relevance model. The model assumes that a training set of images or videos along with keyword annotations is provided. Multiple keywords are provided for an image, and the specific correspondence between a keyword and an image is not provided. Each image is partitioned into a set of rectangular regions, and a real-valued feature vector is computed over these...

10.1109/cvpr.2004.1315274 article EN 2004-11-12

Word image matching using dynamic time warping

T.M. Rath R. Manmatha

Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on electronic media. Convenient access to a collection requires an index, which is manually created at great labor and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting has been developed: clusters with occurrences of the same word in a collection are established using image matching. By annotating "interesting" clusters, an index...

10.1109/cvpr.2003.1211511 article EN 2003-11-21
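The word-matching idea named in the title above can be illustrated with a small dynamic time warping (DTW) sketch. This is not the authors' implementation: the three per-column features (projection profile, upper and lower word profiles), the squared-Euclidean local cost, and the normalization by n + m are assumptions chosen for illustration, and the helper names `column_features` and `dtw_distance` are hypothetical.

```python
import numpy as np


def column_features(img: np.ndarray) -> np.ndarray:
    """Hypothetical per-column features of a binarized word image (ink > 0).

    Returns shape (width, 3): projection profile, upper profile, lower profile,
    each divided by the image height. Empty columns get upper = lower = 1.0.
    """
    h = img.shape[0]
    ink = img > 0
    has_ink = ink.any(axis=0)
    projection = ink.sum(axis=0) / h                            # fraction of ink pixels per column
    upper = np.where(has_ink, ink.argmax(axis=0), h) / h        # top edge to highest ink pixel
    lower = np.where(has_ink, ink[::-1].argmax(axis=0), h) / h  # bottom edge to lowest ink pixel
    return np.stack([projection, upper, lower], axis=1)


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW cost between two feature sequences, normalized by n + m (an assumption)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = float(np.sum((a[i - 1] - b[j - 1]) ** 2))  # squared Euclidean local cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m]) / (n + m)
```

For example, a word image stretched horizontally by a factor of two (`np.repeat(img, 2, axis=1)`) matches the original with distance 0, because DTW can map each original column onto both of its copies. Practical systems usually add a warping-window constraint (for example a Sakoe-Chiba band) to speed up matching and exclude implausible alignments; it is omitted here for brevity.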

Automatic image annotation and retrieval using cross-media relevance models

Jiwoon Jeon Victor Lavrenko R. Manmatha

10.1145/860458.860459 article EN Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03 2003-01-01

Compressed Video Action Recognition

Chao-Yuan Wu Manzil Zaheer Hexiang Hu R. Manmatha Alexander J. Smola and 1 more

Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and the high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that the superfluous information can be reduced by up to two orders of magnitude by video compression (using H.264, HEVC, etc.), we propose to train a deep network directly on the compressed video. This representation has a higher information density, and we found the training to be easier. In...

10.1109/cvpr.2018.00631 preprint EN 2018-06-01

Word spotting for historical documents

Toni M. Rath R. Manmatha

10.1007/s10032-006-0035-8 article EN International Journal on Document Analysis and Recognition (IJDAR) 2006-12-13

A Novel Word Spotting Method Based on Recurrent Neural Networks

Volkmar Frinken Andreas Fischer R. Manmatha Horst Bunke

Keyword spotting refers to the process of retrieving all instances of a given keyword from a document. In the present paper, a novel keyword spotting method for handwritten documents is described. It is derived from a neural network-based system for unconstrained handwriting recognition. As such, it performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set. The keyword spotting is done using a modification of the CTC Token Passing algorithm in conjunction with a recurrent neural network. We demonstrate that the proposed systems outperform not only a classical...

10.1109/tpami.2011.113 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2011-06-17

DocFormer: End-to-End Transformer for Document Understanding

Srikar Appalaraju Bhavan Jasani Bhargava Urala Kota Yusheng Xie R. Manmatha

We present DocFormer - a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem which aims to understand documents in their varied formats (forms, receipts etc.) and layouts. In addition, DocFormer is pre-trained in an unsupervised fashion using carefully designed tasks which encourage multi-modal interaction. DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer. DocFormer also shares learned spatial embeddings across modalities, which makes it easy for the model...

10.1109/iccv48922.2021.00103 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

SCATTER: Selective Context Attentional Scene Text Recognizer

Ron Litman Oron Anschel Shahar Tsiper Roee Litman Shai Mazor and 1 more

Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block architecture with intermediate supervision during training, which paves the way to successfully train a deep BiLSTM encoder, thus improving the encoding...

10.1109/cvpr42600.2020.01198 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

DocFormerv2: Local Features for Document Understanding

Srikar Appalaraju Peng Tang Qi Dong Nishant Sankaran Yichu Zhou and 1 more

We propose DocFormerv2, a multi-modal transformer for Visual Document Understanding (VDU). The VDU domain entails understanding documents (beyond mere OCR predictions), e.g., extracting information from a form, VQA for documents and other tasks. VDU is challenging as it needs a model to make sense of multiple modalities (visual, language and spatial) to make a prediction. Our approach, termed DocFormerv2, is an encoder-decoder transformer which takes as input vision, language and spatial features. DocFormerv2 is pre-trained with unsupervised tasks employed asymmetrically...

10.1609/aaai.v38i2.27828 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Textfinder: an automatic system to detect and recognize text in images

Victor Wu R. Manmatha Edward M. Riseman

A robust system is proposed to automatically detect and extract text in images from different sources, including video, newspapers, advertisements, stock certificates, photographs, and checks. Text is first detected using multiscale texture segmentation and spatial cohesion constraints, then cleaned up and extracted using a histogram-based binarization algorithm. An automatic performance evaluation scheme is also proposed.

10.1109/34.809116 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 1999-01-01

Word spotting for historical documents

T.M. Rath R. Manmatha

10.1007/s10032-006-0027-8 article EN International Journal on Document Analysis and Recognition (IJDAR) 2006-08-25

Word spotting: a new approach to indexing handwriting

R. Manmatha Chengfeng Han Edward M. Riseman

There are many historical manuscripts written in a single hand which it would be useful to index. Examples include the W.B. DuBois collection at the University of Massachusetts and the early Presidential libraries at the Library of Congress. Since Optical Character Recognition (OCR) does not work well on handwriting, an alternative scheme based on matching the images of the words is proposed for indexing such texts. The current paper deals with the matching aspects of this process. Two different techniques for matching words are discussed. The first method matches...

10.1109/cvpr.1996.517139 article EN 1996-01-01

Holistic word recognition for handwritten historical documents

Victor Lavrenko T.M. Rath R. Manmatha

Most offline handwriting recognition approaches proceed by segmenting words into smaller pieces (usually characters) which are recognized separately. The recognition result of a word is then the composition of the individually recognized parts. Inspired by results in cognitive psychology, researchers have begun to focus on holistic word recognition approaches. Here we present a holistic word recognition approach for single-author historical documents, which is motivated by the fact that for severely degraded documents a segmentation of words into characters will produce very poor results. The quality...

10.1109/dial.2004.1263256 article EN 2004-06-10

Challenges in information retrieval and language modeling

James Allan Jay Aslam Nicholas J. Belkin Chris Buckley Jamie Callan and 31 more

Report of a workshop held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst, September 2002. Authors: James Allan, Jay Aslam, Nicholas J. Belkin, Chris Buckley, Jamie Callan, Bruce Croft, Sue Dumais, Norbert Fuhr, Donna Harman, David J. Harper, Djoerd Hiemstra, Thomas Hofmann, Eduard Hovy, Wessel Kraaij, John Lafferty, Victor Lavrenko, David Lewis, Liz Liddy, R. Manmatha, Andrew McCallum, Jay Ponte, John Prager...

10.1145/945546.945549 article EN ACM SIGIR Forum 2003-04-01

Modeling score distributions for combining the outputs of search engines

R. Manmatha T.M. Rath Fan Feng

In this paper the score distributions of a number of text search engines are modeled. It is shown empirically that the score distributions on a per query basis may be fitted using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Experiments show that the model fits TREC-3 and TREC-4 data not only for probabilistic search engines like INQUERY but also for vector space search engines like SMART for English. We have also used the model to fit the output of other search engines, such as LSI search engines and engines indexing other languages like Chinese.

10.1145/383952.384005 article EN 2001-09-01
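The per-query model described above (exponential for non-relevant scores, normal for relevant scores) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's procedure: it fits both components by maximum likelihood from labeled scores, assumes scores have been shifted to be non-negative, estimates the mixing weight from the label counts, and combines engines by averaging the resulting probabilities of relevance. All function names are hypothetical. Without relevance labels the mixture would have to be fitted another way (for example with expectation-maximization), which is not shown.

```python
import math
from statistics import fmean, pstdev


def fit_score_model(rel_scores, nonrel_scores):
    """Fit the per-query model from labeled scores (assumed non-negative).

    Relevant scores ~ normal(mu, sigma); non-relevant scores ~ exponential(lam).
    Returns (mu, sigma, lam, prior), where prior is the fraction of relevant documents.
    """
    mu = fmean(rel_scores)
    sigma = pstdev(rel_scores)        # MLE standard deviation; needs >1 distinct value
    lam = 1.0 / fmean(nonrel_scores)  # MLE rate of an exponential distribution
    prior = len(rel_scores) / (len(rel_scores) + len(nonrel_scores))
    return mu, sigma, lam, prior


def p_relevant(score, model):
    """Posterior probability of relevance for one score, by Bayes' rule."""
    mu, sigma, lam, prior = model
    rel = prior * math.exp(-0.5 * ((score - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    nonrel = (1 - prior) * lam * math.exp(-lam * score) if score >= 0 else 0.0
    total = rel + nonrel
    return rel / total if total > 0 else 0.0  # both densities underflowed to zero


def combine(score_lists, models):
    """Rank documents by mean probability of relevance across engines.

    score_lists: one {doc_id: score} dict per engine; models: one fitted model per engine.
    Each document is averaged only over the engines that scored it (an assumption).
    """
    combined = {}
    for doc in set().union(*score_lists):
        probs = [p_relevant(s[doc], m) for s, m in zip(score_lists, models) if doc in s]
        combined[doc] = sum(probs) / len(probs)
    return sorted(combined, key=combined.get, reverse=True)
```

Mapping raw scores to probabilities first puts engines with very different score scales on a common footing before they are merged. Note that with a normal and an exponential component the posterior is not monotone in the score: far into the upper tail the normal density decays faster than the exponential one, so the posterior eventually falls again.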

Finding text in images

Victor Wu R. Manmatha Edward M. Riseman

Authors: Victor Wu (Computer Science Department, University of Massachusetts, Amherst, MA), R. Manmatha, Edward M. Riseman. DL '97: Proceedings of the second ACM international conference on Digital libraries, July 1997, pages 3–12. https://doi.org/10.1145/263690.263766. Published 1 July 1997.

10.1145/263690.263766 article EN 1997-01-01

Features for word spotting in historical manuscripts

T.M. Rath R. Manmatha

For the transition from traditional to digital libraries, the large number of handwritten manuscripts that exist pose a great challenge. Easy access to such collections requires an index, which is currently created manually at great cost. Because automatic handwriting recognizers fail on historical manuscripts, the word spotting technique has been developed: the words in a collection are matched as images and grouped into clusters which contain all instances of the same word. By annotating "interesting" clusters, an index that links...

10.1109/icdar.2003.1227662 article EN 2004-02-03

A scale space approach for automatically segmenting words from historical handwritten documents

R. Manmatha Jamie L. Rothfeder

Many libraries, museums, and other organizations contain large collections of handwritten historical documents, for example, the papers of early presidents like George Washington at the Library of Congress. The first step in providing recognition/retrieval tools is to automatically segment handwritten pages into words. State of the art segmentation techniques like the gap metrics algorithm have been mostly developed and tested on highly constrained documents like bank checks and postal addresses. There has been little work on full handwritten pages, and this work has usually...

10.1109/tpami.2005.150 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2005-06-22

Automatic Image Annotation using Deep Learning Representations

Venkatesh N. Murthy Subhransu Maji R. Manmatha

We propose simple and effective models for image annotation that make use of Convolutional Neural Network (CNN) features extracted from an image and word embedding vectors to represent their associated tags. Our first set of models is based on the Canonical Correlation Analysis (CCA) framework, which helps in modeling both views of the data: visual features (CNN features) and textual features (word embedding vectors). Results on all three variants of the CCA models, namely linear CCA, kernel CCA and CCA with k-nearest neighbor (CCA-KNN) clustering, are reported. The best...

10.1145/2671188.2749391 article EN 2015-06-22

A Comprehensive Study of Deep Video Action Recognition

Yi Zhu Xinyu Li Chunhui Liu Mohammadreza Zolfaghari Yuanjun Xiong and 5 more

Video action recognition is one of the representative tasks for video understanding. Over the last decade, we have witnessed great advancements in video action recognition thanks to the emergence of deep learning. But we have also encountered new challenges, including modeling long-range temporal information in videos, high computation costs, and incomparable results due to datasets and evaluation protocol variances. In this paper, we provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition. We first introduce the 17 video action recognition datasets that...

10.48550/arxiv.2012.06567 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Sequence-to-Sequence Contrastive Learning for Text Recognition

Aviad Aberdam Ron Litman Shahar Tsiper Oron Anschel Ron Slossberg and 3 more

We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures and custom...

10.1109/cvpr46437.2021.01505 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
