- Topic Modeling
- Natural Language Processing Techniques
- Text Readability and Simplification
- Multimodal Machine Learning Applications
- Geographic Information Systems Studies
- Machine Learning and Data Classification
- Machine Learning in Materials Science
- Speech and dialogue systems
- Domain Adaptation and Few-Shot Learning
- Language, Metaphor, and Cognition
- Advanced Database Systems and Queries
- Computational and Text Analysis Methods
- Spatial Cognition and Navigation
- Organic and Molecular Conductors Research
- Educational Assessment and Pedagogy
- Crystallography and molecular interactions
- Data Quality and Management
- Semantic Web and Ontologies
- Student Assessment and Feedback
- Educational Technology and Assessment
KTH Royal Institute of Technology
2018-2024
Stockholm University
2018
Identifying novel functional materials with desired key properties is an important part of bridging the gap between fundamental research and technological advancement. In this context, high-throughput calculations combined with data-mining techniques have greatly accelerated this process in different areas during the past years. The strength of a data-driven approach for prediction lies in narrowing down the search space from thousands of materials to a subset of prospective candidates. Recently, the open-access organic materials database OMDB was released...
An important part of constructing multiple-choice questions (MCQs) for reading comprehension assessment is the distractors, the incorrect but preferably plausible answer options. In this paper, we present a new BERT-based method for automatically generating distractors using only a small-scale dataset. We also release such a dataset of Swedish MCQs (used for training the model), and propose a methodology for assessing the generated distractors. Evaluation shows that from a student’s perspective, our method generated one or more plausible distractors for more than 50%...
We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, deterministic, and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature in terms of automatic evaluation metrics and shows good performance in human evaluation.
Multiple-choice questions (MCQs) provide a widely used means of assessing reading comprehension. The automatic generation of such MCQs is a challenging language-technological problem that also has interesting educational applications. This article presents several methods for automatically producing reading comprehension MCQs from Swedish text. Unlike previous approaches, we construct models to generate the whole MCQ in one go, rather than using a pipeline architecture. Furthermore, we propose a two-stage method...
An idealized, though simplistic, view of the referring expression production and grounding process in (situated) dialogue assumes that a speaker must merely appropriately specify their expression so that the target referent may be successfully identified by the addressee. However, conversation is collaborative and cannot be aptly characterized as an exchange of minimally-specified expressions. Concerns have been raised regarding assumptions made in prior work on visually-grounded dialogue, which reveal an oversimplified view of the referential process. We...
Adding interactive capabilities to pedestrian wayfinding systems in the form of spoken dialogue will make them more natural to humans. Such an interactive system needs to continuously understand and interpret the pedestrian’s utterances referring to the spatial context. Achieving this requires the system to identify exophoric referring expressions in the utterances, and to link these expressions to geographic entities in the vicinity. This reference resolution problem is difficult, as there are often several dozen candidate referents. We present a neural network-based approach...
We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, mostly deterministic, and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature and shows good performance in terms of human evaluation.
We present SweCTRL-Mini, a large Swedish language model that can be used for inference and fine-tuning on a single consumer-grade GPU. The model is based on the CTRL architecture by Keskar, McCann, Varshney, Xiong, and Socher (2019), which means that users of SweCTRL-Mini can control the genre of the generated text by inserting special tokens in the generation prompts. SweCTRL-Mini is trained on a subset of the Swedish part of the mC4 corpus and a set of novels. In this article, we provide (1) a detailed account of the utilized training data and pre-processing steps, to the extent that it is possible to check...
When training and evaluating machine reading comprehension models, it is very important to work with high-quality datasets that are also representative of real-world tasks. This requirement includes, for instance, having questions that are based on texts of different genres and that require generating inferences or reflecting on the material. In this article we turn our attention to RACE, a dataset of English texts and corresponding multiple-choice questions (MCQs). Each MCQ consists of a question and four alternatives (of which one is the correct answer)....
In this article we present the first dataset of multiple-choice questions (MCQs) for assessing reading comprehension in Ukrainian. The dataset is based on texts from the Ukrainian national tests for reading comprehension, and the MCQs themselves are created semi-automatically in three stages. The first stage was to use GPT-3 to generate MCQs zero-shot, the second stage was to select MCQs of sufficient quality and revise the ones with minor errors, whereas the final stage was to expand the dataset with MCQs written manually. The dataset is created by native speakers of the Ukrainian language, one of whom is also a language teacher. The resulting corpus has slightly more than...
This work is a reproducibility study of the paper by Antoniou and Storkey [2019], published at NeurIPS 2019. Our results are in parts similar to the ones reported in the original paper, supporting the central claim that the proposed novel method, called Self-Critique and Adapt (SCA), improves the performance of MAML++. The additional experiments we conducted on the Caltech-UCSD Birds 200 dataset confirm the superiority of SCA compared to MAML++. In addition, the reproduced paper suggests a high-end version of MAML++ for which we could not reproduce the same results....
This paper presents an evaluation of the quality of automatically generated reading comprehension questions from Swedish text, using the Quinductor method. This method is a light-weight, data-driven but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.
Many downstream applications use dependency trees, and thus rely on dependency parsers producing correct, or at least consistent, output. However, dependency parsers are trained using machine learning, and are therefore susceptible to unwanted inconsistencies due to biases in the training data. This paper explores the effects of such biases in four languages (English, Swedish, Russian, and Ukrainian) through an experiment where we study the effect of replacing numerals in sentences. We show that such seemingly insignificant changes in the input can cause large differences...
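The numeral-replacement idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the regular expression, and the `parse` callback are assumptions made for illustration, and `parse` stands in for whatever dependency parser is being studied.

```python
import re

# Matches integers and simple decimals such as "3", "2019" or "3.5".
# (Assumption: the original study may define numerals differently.)
NUMERAL = re.compile(r"\d+(?:\.\d+)?")


def numeral_variants(sentence, replacements):
    """Return one copy of `sentence` per replacement, with every numeral swapped.

    Returns an empty list if the sentence contains no numerals.
    """
    if not NUMERAL.search(sentence):
        return []
    # A function is used as the replacement so that backslashes in `rep`
    # are never interpreted as regex escapes.
    return [NUMERAL.sub(lambda _m, r=rep: r, sentence) for rep in replacements]


def count_distinct_outputs(variants, parse):
    """Count how many distinct (hashable) outputs `parse` gives across variants.

    `parse` is a hypothetical callback wrapping a dependency parser; a value
    above 1 means the parser's output changed when only the numeral changed.
    """
    return len({parse(v) for v in variants})


if __name__ == "__main__":
    variants = numeral_variants("She bought 3 apples in 2019.", ["7", "100"])
    for v in variants:
        print(v)
```

In practice, `parse` could return a tuple of (head index, relation label) pairs for each token, so that two variants count as consistent only if their trees are identical.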