Markus Diem

ORCID: 0000-0002-5048-5128
Research Areas
  • Handwritten Text Recognition Techniques
  • Image Processing and 3D Reconstruction
  • Image Retrieval and Classification Techniques
  • Advanced Image and Video Retrieval Techniques
  • Digital and Cyber Forensics
  • Natural Language Processing Techniques
  • Vehicle License Plate Recognition
  • Anomaly Detection Techniques and Applications
  • Currency Recognition and Detection
  • Single-cell and spatial transcriptomics
  • Digital Media Forensic Detection
  • Cultural Heritage Materials Analysis
  • Cell Image Analysis Techniques
  • Acute Myeloid Leukemia Research
  • Acute Lymphoblastic Leukemia research
  • Image and Object Detection Techniques
  • Archaeological Research and Protection
  • Mobile Agent-Based Network Management
  • Microfluidic and Bio-sensing Technologies
  • Gene expression and cancer classification
  • Bone and Joint Diseases
  • Mathematics, Computing, and Information Processing
  • 3D Surveying and Cultural Heritage
  • Music and Audio Processing
  • Advanced Neural Network Applications

TU Wien
2011-2021

University of Vienna
2019

University of Applied Sciences Technikum Wien
2013

Institute of Automation
2010

In this paper a public database for writer retrieval, writer identification and word spotting is presented. The CVL-Database consists of 7 different handwritten texts (1 German and 6 English texts) by 311 writers. For each text, an RGB color image (300 dpi) comprising the handwritten text and the printed text sample is available, as well as a cropped version (only handwritten). A unique ID identifies the writer, whereas the bounding boxes for each single word are stored in an XML file. An evaluation of the best algorithms of the ICDAR and ICFHR contests has been performed on the CVL-Database.

10.1109/icdar.2013.117 article EN 2013-08-01

Purpose: An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues. Design/methodology/approach: This paper adopts a case study approach, using the development and delivery of the one openly available platform for...

10.1108/jd-07-2018-0114 article EN Journal of Documentation 2019-07-23

The cBAD competition aims at benchmarking state-of-the-art baseline detection algorithms. It is in line with previous competitions such as the ICDAR 2013 Handwriting Segmentation Contest. A new, challenging dataset was created to test the behavior of state-of-the-art systems on real-world data. Since traditional evaluation schemes are not applicable to the size and modality of this dataset, we present a new one that introduces baselines to measure performance. We received submissions from five different teams for both tracks.

10.1109/icdar.2017.222 article EN 2017-11-01

Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets, since these comprise either clean data or simple/homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established evaluation schemes such as the Detection Rate...

10.1109/das.2018.38 article EN 2018-04-01

Minimal residual disease (MRD) as measured by multiparameter flow cytometry (FCM) is an independent and strong prognostic factor in B-cell acute lymphoblastic leukemia (B-ALL). However, reliable cytometric detection of MRD strongly depends on operator skills and expert knowledge. Hence, an objective, automated tool for FCM-MRD quantification, able to overcome the technical diversity and analytical subjectivity, would be most helpful. We developed a supervised machine learning approach using a combination...

10.1002/cyto.a.23852 article EN Cytometry Part A 2019-07-07

The ICDAR 2017 Competition on Historical Document Writer Identification is dedicated to recording the most recent advances made in the field of writer identification. The goal of the writer identification task is the retrieval of pages which have been written by the same author. The test dataset used in this competition consists of 3600 handwritten pages originating from the 13th to the 20th century. It contains manuscripts from 720 different writers, where each writer contributed five pages. This paper describes the dataset, as well as the details of the competition. Five...

10.1109/icdar.2017.225 article EN 2017-11-01

This paper presents the results of the HDRC 2013 competition for the recognition of handwritten digits, organized in conjunction with ICDAR 2013. The general objective of this competition is to identify, evaluate and compare recent developments in character recognition and to introduce a new challenging dataset for benchmarking. We describe the competition details, including the evaluation measures used, and give a comparative performance analysis of the nine (9) submitted methods along with a short description of the respective methodologies.

10.1109/icdar.2013.287 article EN 2013-08-01

We propose a layout analysis method for historical manuscripts that relies on the part-based identification of layout entities. A layout entity, such as letters of the text, initials or headings, is composed of a set of characteristic segments or structures, which is dissimilar for distinct classes in the manuscripts under consideration. This fact is exploited in order to segment a manuscript page into homogeneous regions. Historical documents traditionally involve challenges such as uneven writing support and varying shapes of characters, fluctuating text lines,...

10.1109/icdar.2011.108 article EN International Conference on Document Analysis and Recognition 2011-09-01

Text recognition in natural scene images underlies several computer vision applications such as licence plate recognition, automated translation of street signs, help for visually impaired people or image retrieval. In this work an end-to-end text recognition system is presented. For detection, an AdaBoost ensemble with a modified Local Ternary Pattern (LTP) feature set and a post-processing stage built upon Maximally Stable Extremal Regions (MSER) is used. The text recognition is done using a deep Convolutional Neural Network (CNN) trained...

10.1109/das.2014.29 article EN 2014-04-01

This paper presents the results of the HDSRC 2014 competition on handwritten digit string recognition in challenging datasets, organized in conjunction with ICFHR 2014. The general objective of this competition is to identify, evaluate and compare recent developments in the recognition of Western Arabic digit strings of varying length. In addition, the competition introduces two new datasets for benchmarking. We describe the competition details, including the evaluation measures used, and give a comparative performance analysis of the six (6) participating methods along with a short description of the respective methodologies.

10.1109/icfhr.2014.136 article EN 2014-09-01

The main problems of Optical Character Recognition (OCR) systems are solved if printed Latin text is considered. Since OCR systems are based upon binary images, their results are poor if the text is degraded. In this paper a codex consisting of ancient manuscripts is investigated. Due to environmental effects, the characters of the analyzed codex are washed out, which leads to poor results gained by state-of-the-art binarization methods. Hence, a segmentation-free approach based on local descriptors is being developed. Regarding local information allows for recognizing characters that are only...

10.1109/icdar.2009.158 article EN 2009-01-01

Text line detection is a pre-processing step for automated document analysis such as word spotting or OCR. It is additionally used for document structure or layout analysis. Considering mixed layouts, degraded documents and handwritten documents, text line detection is still challenging. We present a novel approach that targets torn documents having varying layouts and writing. The proposed method is bottom-up and fuses words so as to globally minimize their fusing distance. In order to improve processing time and further analysis, text lines are represented by...

10.1109/icdar.2013.152 article EN 2013-08-01

Baseline detection is a simplified text-line extraction that typically serves as pre-processing for Automated Text Recognition. The cBAD competition benchmarks state-of-the-art baseline detection algorithms. It is the successor of cBAD 2017, with a larger dataset that contains more diverse document pages. The images, together with the manually annotated ground truth, are made publicly available, which allows other teams to benchmark and compare their methods. We could also evaluate the winning method of cBAD 2017 on the newly introduced dataset, which now serves as a baseline...

10.1109/icdar.2019.00240 article EN 2019-09-01

Considering printed Latin text, the main issues of Optical Character Recognition (OCR) systems are solved. However, for degraded handwritten document images, basic preprocessing steps such as binarization yield poor results with state-of-the-art methods. In this paper ancient Slavonic manuscripts from the 11th century are investigated. In order to minimize the consequences of false character segmentation, a binarization-free approach based on local descriptors is proposed. Additionally, local information allows...

10.1117/12.843532 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2010-01-19

Document reconstruction affects different areas such as archeology, philology and forensics. A reconstruction of fragmented writing materials makes it possible to retrieve and analyze the lost content. Due to the complexity of reconstruction, automated algorithms are necessary. A methodology for the reconstruction of shredded documents is presented in this paper, which recognizes characters at the stripes' borders and matches them subsequently. In order to achieve this, an Optical Character Recognition (OCR) system is exploited that is capable of recognizing partially...

10.1049/ic.2011.0132 article EN 2011-01-01

In general, document image analysis methods are pre-processing steps for Optical Character Recognition (OCR) systems. In contrast, the proposed method aims at clustering document snippets, so that an automated clustering of documents can be performed. Therefore, words are classified according to printed text, manuscripts, and noise, where the third class corrects falsely segmented background elements. Having classified the text elements, a layout analysis is carried out which groups words into text lines and paragraphs. A back propagation of the weights - assigned...

10.1109/icdar.2011.175 article EN International Conference on Document Analysis and Recognition 2011-09-01

Two medieval manuscripts are recorded, investigated and analyzed by philologists in collaboration with computer scientists. Due to mold, air humidity and water, the parchment is partially damaged and consequently hard to read. In order to enhance the readability of the text, the manuscript pages are imaged in different spectral bands ranging from 360 to 1000 nm. A registration process is necessary for further image processing methods which combine the information gained from the different spectral bands. Therefore, the images are coarsely aligned using rotationally...

10.5281/zenodo.41086 article EN European Signal Processing Conference 2008-08-25

Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. In this paper document analysis is applied to snippets of torn documents to calculate features that can be used for reconstruction. The main intention is to handle snippets of varying size and different contents (e.g. handwritten or printed text). Documents are either destroyed by intention to make the content unavailable (e.g. business crime) or due to time-induced degeneration of ancient documents (e.g. bad storage conditions). Current...

10.1145/1815330.1815381 article EN 2010-06-09

An approach for the detection of decorative elements, such as initials and headlines, and of text regions, focused on ancient manuscripts, is presented. Due to their age, the manuscripts suffer from degradation and staining as well as ink that has faded out over time. Identifying these regions allows indexing a manuscript and serves as input for Optical Character Recognition (OCR), as it localizes regions of interest within document pages. We propose a robust method inspired by state-of-the-art object recognition methodologies. Scale Invariant Feature...

10.1109/icfhr.2010.35 article EN 2010-11-01

Acute Myeloid Leukaemia (AML) is a rare type of childhood acute leukaemia. During treatment, the assessment of the number of cancer cells is particularly important to determine the treatment response and consequently adapt the treatment scheme if necessary. Minimal Residual Disease (MRD) is a diagnostic measure based on Flow CytoMetry (FCM) data that captures the amount of blasts in a blood sample and is a clinical tool for planning patients' individual therapy, which requires reliable blast identification. In this work we propose a novel...

10.1109/icpr.2018.8546177 article EN 2018 24th International Conference on Pattern Recognition (ICPR) 2018-08-01

MultiSpectral (MS) imaging enriches document digitization by increasing the spectral resolution. We present a methodology which detects a target ink in document images by taking into account this additional information. The proposed method performs a rough foreground estimation to localize possible ink regions. Then, the Adaptive Coherence Estimator (ACE), a target detection algorithm, transforms the MS input space into a single gray-scale image where values close to one indicate ink. A spatial segmentation using GrabCut on the target detection's...

10.1109/das.2016.39 article EN 2016-04-01

An automated assembling of torn documents (2D) will support philologists, archaeologists and forensic experts. Especially if the amount of fragments is large (up to 1000), a human puzzle solver will not be feasible due to cost and time. Ancient manuscripts may be broken due to bad storage conditions, or documents are manually torn to make the information unreadable. In Germany a project to reconstruct the "Stasi-files" is running for historical investigations. Also disasters like the collapse of the archive of the city of Cologne (Germany), where part of the archived documents have been...

10.1109/vsmm.2009.27 article EN 2009-09-01

Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document for further processing. A pre-processing step of document analysis methods is the skew estimation of scanned or photographed documents. Current skew estimation methods require the existence of large text areas, are dependent on the text type and can be limited to a specific angle range. The proposed method is gradient based in combination with a Focused Nearest Neighbor Clustering of interest points and has no limitations regarding...

10.1109/das.2012.81 article EN 2012-03-01