NFDI4DS | UHH-SEMS - Publication Details

Towards novel organic high-Tc superconductors: Data mining using density of states similarity search

OPENALEX - Publications

R. Matthias Geilhufe, Stanislav S. Borysov, Dmytro Kalpakchi, Alexander V. Balatsky

Identifying novel functional materials with desired key properties is an important part of bridging the gap between fundamental research and technological advancement. In this context, high-throughput calculations combined with data-mining techniques have highly accelerated this process in different areas during the past years. The strength of a data-driven approach for materials prediction lies in narrowing down the search space from thousands of materials to a subset of prospective candidates. Recently, the open-access organic materials database OMDB was released...

DOI: 10.1103/physrevmaterials.2.024802 | article | EN | Physical Review Materials | 2018-02-08
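The entry above describes data mining by density of states (DOS) similarity search. As a minimal sketch of the general idea, assuming each material's DOS has already been sampled on a shared energy grid, candidates can be ranked by their similarity to a target DOS. Cosine similarity is our stand-in here; the paper's own similarity measure and data pipeline may differ, and the material names and DOS values below are made up.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two DOS curves sampled on the same energy grid."""
    if len(a) != len(b):
        raise ValueError("DOS vectors must share the same energy grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero DOS carries no shape information
    return dot / (norm_a * norm_b)

def rank_by_dos_similarity(target: list[float],
                           candidates: dict[str, list[float]],
                           top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(target, dos)) for name, dos in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical DOS values on a shared 5-point energy grid (made-up numbers).
target = [0.0, 1.0, 3.0, 1.0, 0.0]
candidates = {
    "material_A": [0.0, 1.1, 2.9, 0.9, 0.0],
    "material_B": [3.0, 1.0, 0.0, 1.0, 3.0],
    "material_C": [0.0, 2.0, 6.0, 2.0, 0.0],
}
ranking = rank_by_dos_similarity(target, candidates, top_k=2)
print([name for name, _ in ranking])  # ['material_C', 'material_A']
```

Cosine similarity ignores overall scale, so material_C, a scaled copy of the target, ranks first; whether scale should be ignored is a modelling choice rather than a given.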

BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset

Dmytro Kalpakchi, Johan Boye

Distractors, i.e. incorrect but preferably plausible answer options, are an important part of constructing multiple-choice questions (MCQs) for reading comprehension assessment. In this paper, we present a new BERT-based method for automatically generating distractors using only a small-scale dataset. We also release such a dataset of Swedish MCQs (used for training the model), and propose a methodology for assessing the generated distractors. Evaluation shows that from a student’s perspective, our method generated one or more plausible distractors for more than 50%...

DOI: 10.18653/v1/2021.inlg-1.43 | article | EN | cc-by | 2021-01-01

Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies

Dmytro Kalpakchi, Johan Boye

We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, deterministic and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature in terms of automatic evaluation metrics and shows good performance in terms of human evaluation.

DOI: 10.1017/s1351324923000037 | article | EN | cc-by | Natural Language Engineering | 2023-02-27
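The Quinductor entries describe question generation from dependency trees. The toy sketch below is not Quinductor; it only illustrates the general idea of turning a Universal Dependencies parse into a question with one hand-written template, where the subject of the root verb becomes the answer. The Token class, the who_question function and the example parse are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Token:
    id: int      # 1-based position, as in CoNLL-U
    form: str
    head: int    # 0 marks the root
    deprel: str  # Universal Dependencies relation label

def subtree(tokens: list[Token], head_id: int) -> str:
    """Collect a token and all its descendants, kept in sentence order."""
    ids = {head_id}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in ids and t.id not in ids:
                ids.add(t.id)
                changed = True
    return " ".join(t.form for t in tokens if t.id in ids)

def who_question(tokens: list[Token]):
    """Toy template: nsubj of the root verb -> ('Who <verb> <object>?', answer)."""
    root = next((t for t in tokens if t.head == 0), None)
    if root is None:
        return None
    # Direct dependents of the root, keyed by relation (last one wins on ties).
    deps = {t.deprel: t for t in tokens if t.head == root.id}
    if "nsubj" not in deps or "obj" not in deps:
        return None  # the template does not apply to this sentence
    answer = subtree(tokens, deps["nsubj"].id)
    question = f"Who {root.form} {subtree(tokens, deps['obj'].id)}?"
    return question, answer

# "Anna wrote a long letter ." with a hand-written UD-style parse.
sentence = [
    Token(1, "Anna", 2, "nsubj"),
    Token(2, "wrote", 0, "root"),
    Token(3, "a", 5, "det"),
    Token(4, "long", 5, "amod"),
    Token(5, "letter", 2, "obj"),
    Token(6, ".", 2, "punct"),
]
print(who_question(sentence))  # ('Who wrote a long letter?', 'Anna')
```

Because the template reads only tree structure, the same code works for any language with a UD parse, though real templates need far more care with word order and agreement.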

Generation and Evaluation of Multiple-choice Reading Comprehension Questions for Swedish

Dmytro Kalpakchi, Johan Boye

Multiple-choice questions (MCQs) provide a widely used means of assessing reading comprehension. The automatic generation of such MCQs is a challenging language-technological problem that also has interesting educational applications. This article presents several methods for automatically producing reading comprehension MCQs from Swedish text. Unlike previous approaches, we construct models to generate the whole MCQ in one go, rather than using a pipeline architecture. Furthermore, we propose a two-stage method...

DOI: 10.3384/nejlt.2000-1533.2024.4886 | article | EN | Northern European Journal of Language Technology | 2024-11-22

Collecting Visually-Grounded Dialogue with A Game Of Sorts

Bram Willemsen, Dmytro Kalpakchi, Gabriel Skantze

An idealized, though simplistic, view of the referring expression production and grounding process in (situated) dialogue assumes that a speaker must merely appropriately specify their expression so that the target referent may be successfully identified by the addressee. However, conversation is collaborative and cannot be aptly characterized as an exchange of minimally-specified expressions. Concerns have been raised regarding assumptions made by prior work on visually-grounded dialogue that reveal an oversimplified view of the referential process. We...

DOI: 10.48550/arxiv.2309.05162 | preprint | EN | cc-by-nc-sa | arXiv (Cornell University) | 2023-01-01

SpaceRefNet: a neural approach to spatial reference resolution in a real city environment

Dmytro Kalpakchi, Johan Boye

Adding interactive capabilities to pedestrian wayfinding systems in the form of spoken dialogue will make them more natural to humans. Such an interactive system needs to continuously understand and interpret the pedestrian’s utterances referring to the spatial context. Achieving this requires the system to identify exophoric referring expressions in the utterances and to link these expressions to geographic entities in the vicinity. This spatial reference resolution problem is difficult, as there are often several dozens of candidate referents. We present a neural network-based approach...

DOI: 10.18653/v1/w19-5949 | article | EN | cc-by | 2019-01-01

Quinductor: a multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies

Dmytro Kalpakchi, Johan Boye

We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, mostly deterministic, and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature and shows good performance in terms of human evaluation.

DOI: 10.48550/arxiv.2103.10121 | preprint | EN | cc-by-nc-nd | arXiv (Cornell University) | 2021-01-01

BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset

Dmytro Kalpakchi, Johan Boye

Distractors, i.e. incorrect but preferably plausible answer options, are an important part of constructing multiple-choice questions (MCQs) for reading comprehension assessment. In this paper, we present a new BERT-based method for automatically generating distractors using only a small-scale dataset. We also release such a dataset of Swedish MCQs (used for training the model), and propose a methodology for assessing the generated distractors. Evaluation shows that from a student's perspective, our method generated one or more plausible distractors for more than 50%...

DOI: 10.48550/arxiv.2108.03973 | preprint | EN | cc-by | arXiv (Cornell University) | 2021-01-01

SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish

Dmytro Kalpakchi, Johan Boye

We present SweCTRL-Mini, a large Swedish language model that can be used for inference and fine-tuning on a single consumer-grade GPU. The model is based on the CTRL architecture by Keskar, McCann, Varshney, Xiong, and Socher (2019), which means that users of SweCTRL-Mini can control the genre of the generated text by inserting special tokens in the generation prompts. SweCTRL-Mini is trained on a subset of the Swedish part of the mC4 corpus and a set of Swedish novels. In this article, we provide (1) a detailed account of the utilized training data and pre-processing steps, to the extent that it is possible to check...

DOI: 10.48550/arxiv.2304.13994 | preprint | EN | cc-by | arXiv (Cornell University) | 2023-01-01

EMBRACE: Evaluation and Modifications for Boosting RACE

Mariia Zyrianova, Dmytro Kalpakchi, Johan Boye

When training and evaluating machine reading comprehension models, it is very important to work with high-quality datasets that are also representative of real-world tasks. This requirement includes, for instance, having questions that are based on texts of different genres and that require generating inferences or reflecting on the material. In this article we turn our attention to RACE, a dataset of English texts and corresponding multiple-choice questions (MCQs). Each MCQ consists of a question and four alternatives (of which one is the correct answer)...

DOI: 10.48550/arxiv.2305.08433 | preprint | EN | cc-by | arXiv (Cornell University) | 2023-01-01
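The EMBRACE abstract describes each RACE item as a question plus four alternatives, exactly one of which is correct; the incorrect alternatives are the distractors that the BERT-based entries above aim to generate. A minimal sketch of that item structure as a validated Python data class follows. The class and field names are our own assumption, not RACE's distribution format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MCQ:
    """One multiple-choice item: a question, four alternatives, one correct."""
    question: str
    alternatives: tuple[str, ...]  # answer options in display order
    correct_index: int             # position of the single correct alternative

    def __post_init__(self) -> None:
        # Enforce the item shape described for RACE: four distinct options.
        if len(self.alternatives) != 4:
            raise ValueError("expected exactly four alternatives")
        if len(set(self.alternatives)) != 4:
            raise ValueError("alternatives must be distinct")
        if not 0 <= self.correct_index < 4:
            raise ValueError("correct_index must point at one of the alternatives")

    @property
    def answer(self) -> str:
        return self.alternatives[self.correct_index]

    @property
    def distractors(self) -> tuple[str, ...]:
        # The incorrect but (ideally) plausible options.
        return tuple(a for i, a in enumerate(self.alternatives) if i != self.correct_index)

    def is_correct(self, chosen_index: int) -> bool:
        return chosen_index == self.correct_index

item = MCQ(
    question="Where does the story take place?",
    alternatives=("In a city", "On a farm", "At sea", "In a forest"),
    correct_index=1,
)
print(item.answer)       # On a farm
print(item.distractors)  # ('In a city', 'At sea', 'In a forest')
```

Validating in __post_init__ means a malformed item fails at construction time, which is useful when auditing or modifying a dataset in bulk.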

QUA-RC: the semi-synthetic dataset of multiple choice questions for assessing reading comprehension in Ukrainian

Mariia Zyrianova, Dmytro Kalpakchi

In this article we present the first dataset of multiple choice questions (MCQs) for assessing reading comprehension in Ukrainian. The dataset is based on texts from the Ukrainian national tests for reading comprehension, and the MCQs themselves are created semi-automatically in three stages. The first stage was to use GPT-3 to generate MCQs zero-shot, the second stage was to select MCQs of sufficient quality and revise the ones with minor errors, whereas the final stage was to expand the set with MCQs written manually. All stages were performed by native speakers of the language, one of whom is also a teacher. The resulting corpus has slightly more than...

DOI: 10.3384/nejlt.2000-1533.2023.4939 | article | EN | Northern European Journal of Language Technology | 2023-11-16

[Re] Learning to Learn By Self-Critique

Isac Arnekvist, Dmytro Kalpakchi

This work is a reproducibility study of the paper of Antoniou and Storkey [2019], published at NeurIPS 2019. Our results are in parts similar to the ones reported in the original paper, supporting the central claim that the proposed novel method, called Self-Critique and Adapt (SCA), improves the performance of MAML++. The additional experiments we conducted on the Caltech-UCSD Birds 200 dataset confirm the superiority of SCA compared to MAML++. In addition, the reproduced paper suggests a high-end version of MAML++ for which we could not reproduce the same results...

DOI: 10.48550/arxiv.1912.00183 | preprint | EN | other-oa | arXiv (Cornell University) | 2019-01-01

Automatically generating question-answer pairs for assessing basic reading comprehension in Swedish

Dmytro Kalpakchi, Johan Boye

This paper presents an evaluation of the quality of automatically generated reading comprehension questions from Swedish text, using the Quinductor method. This method is a light-weight, data-driven but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.

DOI: 10.48550/arxiv.2211.15568 | preprint | EN | cc-by-sa | arXiv (Cornell University) | 2022-01-01

Minor changes make a difference: a case study on the consistency of UD-based dependency parsers

Dmytro Kalpakchi, Johan Boye

Many downstream applications are using dependency trees, and are thus relying on dependency parsers producing correct, or at least consistent, output. However, dependency parsers are trained using machine learning, and are therefore susceptible to unwanted inconsistencies due to biases in the training data. This paper explores the effects of such biases in four languages (English, Swedish, Russian, and Ukrainian) through an experiment where we study the effect of replacing numerals in sentences. We show that such seemingly insignificant changes in the input can cause large differences...

DOI: 10.48550/arxiv.2111.15413 | preprint | EN | cc-by | arXiv (Cornell University) | 2021-01-01
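The last entry studies parser consistency by replacing numerals in sentences. A minimal sketch of such a check, assuming a parser that returns one (form, head, deprel) triple per token, generates numeral-substituted variants and flags those whose tree differs from the parse of the original sentence. The toy_parser below is a hypothetical stand-in with a deliberately planted bias, not one of the parsers evaluated in the paper, and the other function names are ours.

```python
import re

NUMERAL = re.compile(r"^\d+$")  # purely digit tokens only, e.g. "3" or "12"

def numeral_variants(tokens: list[str], replacements: list[str]) -> list[list[str]]:
    """One copy of the sentence per replacement, with every numeral swapped."""
    positions = [i for i, tok in enumerate(tokens) if NUMERAL.match(tok)]
    variants = []
    for rep in replacements:
        variant = list(tokens)
        for i in positions:
            variant[i] = rep
        variants.append(variant)
    return variants

def structure(parse: list[tuple[str, int, str]]) -> list[tuple[int, str]]:
    """Drop word forms and keep only (head, deprel), i.e. the tree itself."""
    return [(head, deprel) for _form, head, deprel in parse]

def inconsistent_variants(tokens, replacements, parser):
    """Variants whose parse tree differs from the tree of the original sentence."""
    base = structure(parser(tokens))
    return [v for v in numeral_variants(tokens, replacements)
            if structure(parser(v)) != base]

def toy_parser(tokens: list[str]) -> list[tuple[str, int, str]]:
    """Hypothetical stand-in for a trained parser, handling only
    subject-verb-numeral-object sentences, with a planted bias:
    it labels the numeral "1" as det instead of nummod."""
    parse = []
    for i, tok in enumerate(tokens):
        if NUMERAL.match(tok):
            rel = "det" if tok == "1" else "nummod"
            parse.append((tok, i + 2, rel))  # attach to the next token (1-based id)
        elif i == 0:
            parse.append((tok, 2, "nsubj"))
        elif i == 1:
            parse.append((tok, 0, "root"))
        else:
            parse.append((tok, 2, "obj"))
    return parse

sentence = ["She", "bought", "3", "books"]
print(inconsistent_variants(sentence, ["5", "1", "12"], toy_parser))
# [['She', 'bought', '1', 'books']]
```

Comparing only (head, deprel) pairs means a variant is flagged only when the tree changes, not merely because the word form differs.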