Jesse de Does

ORCID: 0000-0003-0170-8943
Research Areas
  • Natural Language Processing Techniques
  • Handwritten Text Recognition Techniques
  • Topic Modeling
  • Lexicography and Language Studies
  • Image Processing and 3D Reconstruction
  • Digital Humanities and Scholarship
  • Linguistics, Language Diversity, and Identity
  • Species Distribution and Climate Change
  • Authorship Attribution and Profiling
  • Linguistic Variation and Morphology
  • Linguistics and language evolution
  • Archaeology and Historical Studies
  • linguistics and terminology studies
  • Vehicle License Plate Recognition
  • Speech and dialogue systems
  • Maritime and Coastal Archaeology
  • Computational and Text Analysis Methods
  • Distributed and Parallel Computing Systems
  • Discourse Analysis in Language Studies
  • Linguistic research and analysis
  • Literature, Language, and Rhetoric Studies
  • Mineralogy and Gemology Studies
  • Environmental DNA in Biodiversity Studies
  • Scientific Computing and Data Management
  • Identification and Quantification in Food

Age Institute
2017

Instituut voor de Nederlandse Taal
2002-2015

Institut für Nachhaltige Landbewirtschaftung (Germany)
2014

This paper presents the ParlaMint corpora containing transcriptions of the sessions of the 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project's GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter...

10.1007/s10579-021-09574-0 article EN cc-by Language Resources and Evaluation 2022-02-02

Transcription of historical handwritten documents is a crucial problem for making these documents more accessible to the general public. Currently, a huge amount of historical handwritten documents is being made available by on-line portals worldwide. It is not realistic to obtain the transcription of these documents manually, and therefore automatic techniques have to be used. tranScriptorium is a project that aims at researching modern Handwritten Text Recognition (HTR) technology for transcribing historical handwritten documents. The HTR technology used in tranScriptorium is based on models that are learnt automatically from examples....

10.1145/2595188.2595193 article EN 2014-05-19

The tranScriptorium project aims to develop innovative, efficient and cost-effective solutions for annotating handwritten historical documents using modern, holistic Handwritten Text Recognition (HTR) technology. Three actions are planned in tranScriptorium: i) improve basic image preprocessing and HTR techniques; ii) develop novel indexing and keyword searching approaches; iii) capitalize on new, user-friendly interactive-predictive HTR approaches for computer-assisted operation.

10.1145/2494266.2494294 article EN 2013-09-03

Text line segmentation is the process by which text lines in a document image are localized and extracted. It is an important step in off-line Handwritten Text Recognition (HTR), given that the input of these systems is the text lines to be transcribed. A myriad of solutions to this problem have been proposed in the literature. Although these solutions may differ greatly in what is actually applied to perform the segmentation, they can be classified by the level of precision and detail of the final extracted lines. In this paper we study the influence and real needs of different levels of text line segmentation precision in the HTR task. We...

10.1109/icdar.2015.7333819 article EN 2015-08-01

We report on a case study on OCR of eighteenth century books conducted in the IMPACT project. After introducing the IMPACT project and its approach to lexicon building and deployment, we zoom in on the application of IMPACT tools and data to the Dutch EDBO collection. The results are exemplified by a detailed discussion of various practical options to improve text recognition beyond a baseline of running an uncustomized Finereader 10. In particular, we discuss improved recognition of the long s.

10.1117/12.2008423 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2013-02-04

We present an intelligent sample selection approach to language model adaptation for handwritten text recognition, which exploits a combination of in-domain and out-of-domain data for the construction of language models. In comparison to approaches proposed in the literature, our approach is characterized by a careful consideration of the criteria used for ranking samples and by an innovative approach which iteratively extends the training set. We propose two methods, in which agreement or disagreement (one for each model) guides which samples to add to the training sets. Both are shown to clearly outperform strong...

10.1109/icfhr.2014.65 article EN 2014-09-01

Hand-written text recognition (HTR) is often carried out line-by-line: the decoding of the lines is carried out independently. This approach is known to deteriorate the recognition accuracy of words and characters close to line boundaries. The present study investigates this issue from the point of view of the language modeling component of the HTR system. Obviously, the lack of linguistic context may be one of the reasons for the loss of accuracy, but it is certainly not the only factor in play. We seek to clarify to which extent the problem can be influenced by language modeling. We first discuss how to develop...

10.1109/icdar.2015.7333903 article EN 2015-08-01

The intensified circulation of people, commodities and ideas is one of the characteristics of a globalizing world. To understand the causes and consequences of these circulations, we have to know which people, commodities and ideas circulated when and where, on what scale, and who made them circulate. In our paper we want to present the first results of the CLARIAH Research Pilot on diamonds in Borneo, using the large historical newspaper collection of the KB (Royal Library of the Netherlands), Delpher. So far the diamond industry in Borneo has been a true blind spot in the knowledge of the global...

10.1145/3322905.3322924 article EN 2019-05-08

At the Instituut voor de Nederlandse Taal (Dutch Language Institute), DiaMaNT, a diachronic semantic computational lexicon of Dutch, is being developed, based on the scholarly historical dictionaries of Dutch. The main purpose of this lexicon is to enhance text accessibility and foster research in the development of concepts. This article explores the feasibility of enriching DiaMaNT with an existing semantic classification by linking a subset of the vocabulary of the Dictionary of Old Dutch to A Thesaurus of Old English.

10.1163/18756719-12340240 article EN Amsterdamer Beiträge zur älteren Germanistik 2021-11-26