Jenna Kanerva

ORCID: 0000-0003-4580-5366
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Semantic Web and Ontologies
  • Speech and dialogue systems
  • Multimodal Machine Learning Applications
  • Authorship Attribution and Profiling
  • Biomedical Text Mining and Ontologies
  • Video Analysis and Summarization
  • Advanced Text Analysis Techniques
  • Web Data Mining and Analysis
  • Data Quality and Management
  • Mathematics, Computing, and Information Processing
  • Sentiment Analysis and Opinion Mining
  • Digital Communication and Language
  • Software Engineering Research
  • Digital Games and Media
  • Machine Learning and Algorithms
  • Banana Cultivation and Research
  • Digital Humanities and Scholarship
  • Philosophy, History, and Historiography
  • Algorithms and Data Compression
  • Computational and Text Analysis Methods
  • Italian Fascism and Post-war Society
  • Generative Adversarial Networks and Image Synthesis

University of Turku
2014-2024

Cambridge University Press
2021

Finland University
2017

Information Technology University
2015

Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi...

10.18653/v1/k17-3001 article EN cc-by 2017-01-01

Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model advancing the state of the art across a variety of tasks. While most work on these models has focused on high-resource languages, in particular English, a number of efforts have introduced multilingual models that can be fine-tuned to address tasks in different languages. However, we still lack a thorough...

10.48550/arxiv.1912.07076 preprint EN other-oa arXiv (Cornell University) 2019-01-01

In this paper we describe the TurkuNLP entry at the CoNLL 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. Compared to last year, this year's shared task includes two new main metrics to measure morphological tagging and lemmatization accuracies in addition to syntactic trees. Basing our motivation on these new metrics, we developed an end-to-end parsing pipeline especially focusing on developing a novel state-of-the-art component for lemmatization. Our system reached the highest aggregate...

10.18653/v1/k18-2013 article EN cc-by Proceedings of the اولین کنفرانس بین المللی پیشرفت های نوین در مهندسی عمران 2018-01-01

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies. We apply a rule-based system developed for English and a data-driven system trained on Finnish, Swedish, and Italian. We find that both systems are accurate enough to bootstrap enhanced dependencies in existing UD treebanks. In the case of Italian, the results are even on par with those of a prototype language-specific system.

10.18653/v1/w18-6012 article EN cc-by 2018-01-01
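The rule-based approach mentioned in the abstract can be illustrated with one rule that is typical of enhanced Universal Dependencies: propagating the incoming relation of a first conjunct to the later conjuncts. This is a minimal, hypothetical sketch of that single rule, not the system the paper describes; the edge representation is an assumption.

```python
# Hypothetical sketch of one common enhanced-UD rule: later conjuncts
# inherit the governor relation of the first conjunct. Edges are modeled
# as (head_index, relation, dependent_index) triples for illustration.

def propagate_conjuncts(deps):
    """Return the edge set extended with conjunct-propagation edges."""
    enhanced = list(deps)
    for head, rel, dep in deps:
        if rel == "conj":
            # Share every non-conj incoming edge of the first conjunct.
            for h2, r2, d2 in deps:
                if d2 == head and r2 != "conj":
                    enhanced.append((h2, r2, dep))
    return enhanced

# "dogs and cats sleep": sleep(3) -nsubj-> dogs(0), dogs(0) -conj-> cats(2)
basic = [(3, "nsubj", 0), (0, "conj", 2)]
print(propagate_conjuncts(basic))  # adds the edge (3, "nsubj", 2)
```

A full system would also handle relative clauses and case-marked relation labels; this shows only the core idea of copying edges rather than re-parsing.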

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: diagnostic classification probing the embeddings for a particular syntactic property, a cloze task testing the language modelling ability to fill in gaps in a sentence, and natural language generation producing coherent text fitting a given context. We find that the currently available multilingual model is clearly inferior to its monolingual counterparts, and cannot in many cases substitute...

10.48550/arxiv.1910.03806 preprint EN cc-by arXiv (Cornell University) 2019-01-01

In this paper, we present a novel lemmatization method based on a sequence-to-sequence neural network architecture and morphosyntactic context representation. In the proposed method, our context-sensitive lemmatizer generates the lemma one character at a time from the surface form characters and its morphosyntactic features obtained from a morphological tagger. We argue that a sliding window context representation suffers from sparseness, while in the majority of cases the morphosyntactic features of the word bring enough information to resolve ambiguities while keeping the representation dense and more...

10.1017/s1351324920000224 article EN cc-by Natural Language Engineering 2020-05-27
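The abstract describes feeding the lemmatizer the surface characters together with morphosyntactic tags as atomic input symbols. Here is a hedged sketch of how such a source sequence might be assembled; the tag format and example features are illustrative assumptions, not the paper's exact encoding.

```python
# Illustrative sketch (not the paper's exact format): build the source
# sequence for a character-level seq2seq lemmatizer by concatenating the
# word's characters with its POS tag and morphological features, each
# treated as a single input symbol.

def encode_input(surface, upos, feats):
    """Return the symbol sequence: characters, then tag symbols."""
    tag_symbols = [f"UPOS={upos}"] + [f"{k}={v}" for k, v in sorted(feats.items())]
    return list(surface) + tag_symbols

# Finnish "koira" (dog), nominative singular noun.
src = encode_input("koira", "NOUN", {"Case": "Nom", "Number": "Sing"})
print(src)
# ['k', 'o', 'i', 'r', 'a', 'UPOS=NOUN', 'Case=Nom', 'Number=Sing']
```

Keeping the tags as whole symbols (rather than spelling them out character by character) keeps the input short and lets the decoder condition on morphology when disambiguating the lemma.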

Optical Character Recognition (OCR) systems often introduce errors when transcribing historical documents, leaving room for post-correction to improve text quality. This study evaluates the use of open-weight LLMs for OCR error correction in English and Finnish datasets. We explore various strategies, including parameter optimization, quantization, segment length effects, and continuation methods. Our results demonstrate that while modern LLMs show promise in reducing character error rates (CER) in English, a...

10.48550/arxiv.2502.01205 preprint EN arXiv (Cornell University) 2025-02-03
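The character error rate (CER) metric named in the abstract is the character-level edit distance between a hypothesis and a reference, normalized by reference length. A minimal self-contained sketch, not the study's evaluation code:

```python
# Character error rate: Levenshtein edit distance over characters,
# divided by the length of the reference transcription.

def levenshtein(a, b):
    """Edit distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(hypothesis, reference):
    """Edit distance normalized by reference length."""
    return levenshtein(hypothesis, reference) / len(reference)

print(cer("OCR outpvt", "OCR output"))  # one substitution over 10 chars -> 0.1
```

Post-correction succeeds when the CER of the corrected text is lower than the CER of the raw OCR output against the same ground-truth transcription.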

We performed a zero-shot information extraction study on a historical collection of 89,339 brief Finnish-language interviews of refugee families relocated post-WWII from Finnish Eastern Karelia. Our research objective is two-fold. First, we aim to extract social organizations and hobbies from the free text of the interviews, separately for each family member. These can act as a proxy variable indicating the degree of integration of refugees in their new environment. Second, we evaluate several alternative ways to approach...

10.48550/arxiv.2502.13566 preprint EN arXiv (Cornell University) 2025-02-19

Deep neural language models such as BERT have enabled substantial recent advances in many natural language processing tasks. Due to the effort and computational cost involved in their pre-training, language-specific models are typically introduced only for a small number of high-resource languages such as English. While multilingual models covering large numbers of languages are available, recent work suggests monolingual training can produce better models, and our understanding of the tradeoffs between mono- and multilingual training is incomplete. In this paper, we introduce a simple,...

10.48550/arxiv.2006.01538 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina Mcmillan-major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna...

10.18653/v1/2022.emnlp-demos.27 article EN cc-by 2022-01-01

We present the Finnish PropBank, a resource for semantic role labeling (SRL) of Finnish, based on the Turku Dependency Treebank, whose syntax is annotated in the well-known Stanford Dependencies (SD) scheme. The contribution of this paper consists of a lexicon of verbs and their arguments, as well as predicate-argument annotation of all verb occurrences in the treebank text. We demonstrate that the annotation is of high quality, that the SD scheme is highly compatible with PropBank annotation, and that additional dependencies are clearly beneficial for the annotation. Further, we...

10.1007/s10579-015-9310-y article EN cc-by Language Resources and Evaluation 2015-09-16

Jenna Kanerva, Filip Ginter, Sampo Pyysalo. Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies. 2020.

10.18653/v1/2020.iwpt-1.17 article EN cc-by 2020-01-01

Jörg Tiedemann, Fabienne Cap, Jenna Kanerva, Filip Ginter, Sara Stymne, Robert Östling, Marion Weller-Di Marco. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers. 2016.

10.18653/v1/w16-2326 article EN cc-by 2016-01-01

In this paper, we introduce several vector space manipulation methods that are applied to trained models in a post-hoc fashion, and present an application of these techniques to semantic role labeling for Finnish and English. Specifically, we show that the vectors can be circularly shifted to encode syntactic information and subsequently averaged to produce representations of predicate senses and arguments. Further, it is possible to effectively learn a linear transformation between predicates and their arguments, within the same vector space.

10.3115/v1/w14-1501 article EN cc-by 2014-01-01
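The circular-shift idea in the abstract can be sketched concretely: rotating a word vector by an offset tied to its syntactic role makes role information separable even after averaging. This is an illustration of the general mechanism only; the offsets, dimensions, and role assignments here are arbitrary assumptions, not the paper's configuration.

```python
# Hedged sketch of circular shifting and averaging of word vectors.
# A role-specific rotation offset is applied before averaging, so the
# averaged representation still encodes which role the words filled.

def circular_shift(vec, offset):
    """Rotate a vector's components right by `offset` positions."""
    k = offset % len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)

def role_representation(vectors, offset):
    """Average the role-shifted vectors of words filling the same role."""
    shifted = [circular_shift(v, offset) for v in vectors]
    n = len(shifted)
    return [sum(col) / n for col in zip(*shifted)]

v = [1.0, 2.0, 3.0, 4.0]
print(circular_shift(v, 1))   # [4.0, 1.0, 2.0, 3.0]
print(role_representation([v, [0.0, 0.0, 0.0, 4.0]], 1))
```

Because rotation is invertible, shifting back by the same offset recovers the original orientation, which is what lets shifted-and-averaged vectors be compared within the same space.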