- Handwritten Text Recognition Techniques
- Image Processing and 3D Reconstruction
- Image Retrieval and Classification Techniques
- Advanced Image and Video Retrieval Techniques
- Digital and Cyber Forensics
- Natural Language Processing Techniques
- Vehicle License Plate Recognition
- Anomaly Detection Techniques and Applications
- Currency Recognition and Detection
- Single-cell and spatial transcriptomics
- Digital Media Forensic Detection
- Cultural Heritage Materials Analysis
- Cell Image Analysis Techniques
- Acute Myeloid Leukemia Research
- Acute Lymphoblastic Leukemia research
- Image and Object Detection Techniques
- Archaeological Research and Protection
- Mobile Agent-Based Network Management
- Microfluidic and Bio-sensing Technologies
- Gene expression and cancer classification
- Bone and Joint Diseases
- Mathematics, Computing, and Information Processing
- 3D Surveying and Cultural Heritage
- Music and Audio Processing
- Advanced Neural Network Applications
TU Wien
2011-2021
University of Vienna
2019
University of Applied Sciences Technikum Wien
2013
Institute of Automation
2010
In this paper a public database for writer retrieval, writer identification and word spotting is presented. The CVL-Database consists of 7 different handwritten texts (1 German and 6 English texts) by 311 different writers. For each text an RGB color image (300 dpi) comprising the handwritten text and the printed text sample is available, as well as a cropped version (only handwritten). A unique ID identifies the writer, whereas the bounding boxes for each single word are stored in an XML file. An evaluation of the best algorithms of the ICDAR and ICFHR writer identification contests has been performed on the CVL-Database.
Purpose: An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point of the advanced use of digitised heritage content. The paper aims to discuss these issues. Design/methodology/approach: This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for...
The cBAD competition aims at benchmarking state-of-the-art baseline detection algorithms. It is in line with previous competitions such as the ICDAR 2013 Handwriting Segmentation Contest. A new, challenging dataset was created to test the behavior of state-of-the-art systems on real world data. Since traditional evaluation schemes are not applicable to the size and modality of this dataset, we present a new one that introduces baselines to measure performance. We received submissions from five different teams for both tracks.
Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets since they either comprise clean data or simple/homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established text line segmentation evaluation schemes such as the Detection Rate...
Minimal residual disease (MRD) as measured by multiparameter flow cytometry (FCM) is an independent and strong prognostic factor in B-cell acute lymphoblastic leukemia (B-ALL). However, reliable flow cytometric detection of MRD strongly depends on operator skills and expert knowledge. Hence, an objective, automated tool for reliable FCM-MRD quantification, able to overcome the technical diversity and analytical subjectivity, would be most helpful. We developed a supervised machine learning approach using a combination...
The ICDAR 2017 Competition on Historical Document Writer Identification is dedicated to recording the most recent advances made in the field of writer identification. The goal of the writer identification task is the retrieval of pages which have been written by the same author. The test dataset used in this competition consists of 3600 handwritten pages originating from the 13th to the 20th century. It contains manuscripts from 720 different writers, where each writer contributed five pages. This paper describes the dataset, as well as the details of the competition. Five...
This paper presents the results of the HDRC 2013 competition for recognition of handwritten digits, organized in conjunction with ICDAR 2013. The general objective of this competition is to identify, evaluate and compare recent developments in character recognition and to introduce a new challenging dataset for benchmarking. We describe the competition details, including the dataset and evaluation measures used, and give a comparative performance analysis of the nine (9) submitted methods along with a short description of the respective methodologies.
We propose a layout analysis method for historical manuscripts that relies on the part-based identification of layout entities. A layout entity, such as letters of the text, initials or headings, is composed of a set of characteristic segments or structures, which is dissimilar for distinct classes in the manuscripts under consideration. This fact is exploited in order to segment a manuscript page into homogeneous regions. Historical documents traditionally involve challenges such as uneven writing support and varying shapes of characters, fluctuating text lines,...
Text recognition in natural scene images is a building block for several computer vision applications such as licence plate recognition, automated translation of street signs, help for visually impaired people or image retrieval. In this work an end-to-end text recognition system is presented. For text detection an AdaBoost ensemble with a modified Local Ternary Pattern (LTP) feature-set and a post-processing stage built upon Maximally Stable Extremal Regions (MSER) is used. The text recognition is done using a deep Convolutional Neural Network (CNN) trained...
This paper presents the results of the HDSRC 2014 competition on handwritten digit string recognition in challenging datasets, organized in conjunction with ICFHR 2014. The general objective of this competition is to identify, evaluate and compare recent developments in Western Arabic digit string recognition of varying length. In addition, the competition introduces two new datasets for benchmarking. We describe the competition details, including the datasets and evaluation measures used, and give a comparative performance analysis of the six (6) participating methods along with a short description of the respective methodologies.
The main problems of Optical Character Recognition (OCR) systems are solved if printed Latin text is considered. Since OCR systems are based upon binary images, their results are poor if the text is degraded. In this paper a codex consisting of ancient manuscripts is investigated. Due to environmental effects the characters of the analyzed codex are washed out, which leads to poor results gained by state of the art binarization methods. Hence, a segmentation free approach based on local descriptors is being developed. Regarding local information allows for recognizing characters that are only...
Text line detection is a pre-processing step for automated document analysis such as word spotting or OCR. It is additionally used for structure analysis or layout analysis. Considering mixed layouts, degraded documents and handwritten documents, text line detection is still challenging. We present a novel approach that targets torn documents having varying layouts and writing. The proposed method is a bottom up approach that fuses words to globally minimize their fusing distance. In order to improve the processing time for further document analysis, text lines are represented by...
Baseline detection is a simplified text-line extraction that typically serves as pre-processing for Automated Text Recognition. The cBAD competition benchmarks state-of-the-art baseline detection algorithms. It is the successor of cBAD 2017 with a larger dataset that contains more diverse document pages. The images together with the manually annotated groundtruth are made publicly available, which allows other teams to benchmark and compare their methods. We could also evaluate the winning method of cBAD 2017 on the newly introduced dataset, which is now the baseline...
Considering printed Latin text, the main issues of Optical Character Recognition (OCR) systems are solved. However, for degraded handwritten document images, basic preprocessing steps such as binarization yield poor results with state-of-the-art methods. In this paper ancient Slavonic manuscripts from the 11th century are investigated. In order to minimize the consequences of false character segmentation, a binarization-free approach based on local descriptors is proposed. Additionally local information allows...
Document reconstruction affects different areas such as archeology, philology and forensics. A reconstruction of fragmented writing materials makes it possible to retrieve and analyze the lost content. Due to the complexity of reconstruction, automated algorithms are necessary. A methodology for the reconstruction of shredded documents is presented in this paper, which recognizes characters at the stripes' borders and matches them subsequently. In order to achieve this, an Optical Character Recognition (OCR) system is exploited that is capable of recognizing partially...
In general, document image analysis methods are pre-processing steps for Optical Character Recognition (OCR) systems. In contrast, the proposed method aims at clustering document snippets, so that an automated clustering of documents can be performed. Therefore, words are classified according to printed text, manuscripts, and noise, where the third class corrects falsely segmented background elements. Having classified text elements, a layout analysis is carried out which groups words into text lines and paragraphs. A back propagation of the class weights - assigned...
Two medieval manuscripts are recorded, investigated and analyzed by philologists in collaboration with computer scientists. Due to mold, air humidity and water the parchment is partially damaged and consequently hard to read. In order to enhance the readability of the text, the manuscript pages are imaged in different spectral bands ranging from 360 to 1000 nm. A registration process is necessary for further image processing methods which combine the information gained by the different spectral bands. Therefore, the images are coarsely aligned using rotationally...
Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. In this paper document analysis is applied to snippets of torn documents to calculate features that can be used for reconstruction. The main intention is to handle snippets of varying size and different contents (e.g. handwritten or printed text). Documents can either be destroyed by the intention to make the printed content unavailable (e.g. business crime) or due to time induced degeneration of ancient documents (e.g. bad storage conditions). Current...
An approach for the detection of decorative elements (such as initials and headlines) and text regions, focused on ancient manuscripts, is presented. Due to their age, ancient manuscripts suffer from degradation and staining as well as ink that has faded out over time. Identifying decorative elements and text regions allows indexing a manuscript and serves as input for Optical Character Recognition (OCR), as it localizes regions of interest within document pages. We propose a robust method inspired by state-of-the-art object recognition methodologies. Scale Invariant Feature...
Acute Myeloid Leukaemia (AML) is a rare type of childhood acute leukaemia. During treatment, the assessment of the number of cancer cells is particularly important to determine the treatment response and consequently adapt the treatment scheme if necessary. Minimal Residual Disease (MRD) is a diagnostic measure based on Flow CytoMetry (FCM) data that captures the amount of blasts in a blood sample and is a clinical tool for planning patients' individual therapy, which requires reliable blast identification. In this work we propose a novel...
MultiSpectral (MS) imaging enriches document digitization by increasing the spectral resolution. We present a methodology which detects a target ink in document images by taking into account this additional information. The proposed method performs a rough foreground estimation to localize possible ink regions. Then, the Adaptive Coherence Estimator (ACE), a target detection algorithm, transforms the MS input space into a single gray-scale image where values close to one indicate ink. A spatial segmentation using GrabCut on the target detection's...
An automated assembling of torn documents (2D) will support philologists, archaeologists and forensic experts. Especially if the amount of fragments is large (up to 1000), a human puzzle solver will not be feasible due to cost and time. Ancient manuscripts may be broken due to bad storage conditions, or documents are manually torn to make the information unreadable. In Germany a project to reconstruct the "Stasi-files" is running for historical investigations. Also disasters like the collapse of the archive of the city of Cologne (Germany), where a part of the archived documents have been...
Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document for further processing. A pre-processing step of document analysis methods is a skew estimation of scanned or photographed documents. Current skew estimation methods require the existence of large text areas, are dependent on the text type and can be limited to a specific angle range. The proposed method is gradient based in combination with a Focused Nearest Neighbor Clustering of interest points and has no limitations regarding...