Andraž Repar

ORCID: 0000-0001-9664-6145
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Semantic Web and Ontologies
  • Biomedical Text Mining and Ontologies
  • Lexicography and Language Studies
  • Linguistics and Terminology Studies
  • Advanced Text Analysis Techniques
  • Linguistics and Language Evolution
  • Data Visualization and Analytics
  • Speech and Dialogue Systems
  • Text Readability and Simplification
  • Advanced Graph Neural Networks
  • Information Retrieval and Search Behavior

Jožef Stefan Institute
2019-2024

Jožef Stefan International Postgraduate School
2019

Abstract Automatic term extraction (ATE) is a natural language processing task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. In this paper, we treat ATE as a sequence-labeling task and explore the efficacy of XLMR, evaluating cross-lingual and multilingual learning against monolingual learning in the cross-domain context. Additionally, we introduce NOBI, a novel annotation mechanism enabling the labeling of single-word nested terms. Our experiments are conducted on the ACTER...

10.1007/s10994-023-06506-7 article EN cc-by Machine Learning 2024-03-27
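The abstract above frames term extraction as sequence labeling. As a minimal, hypothetical illustration of that framing — plain BIO tagging over term spans, not the paper's NOBI scheme, whose details are not given here — the labeling step might look like this:

```python
def bio_tags(tokens, term_spans):
    """Label tokens with B (term start), I (inside term), O (outside).

    term_spans: list of (start, end) token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end in term_spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags

tokens = ["automatic", "term", "extraction", "eases", "manual", "work"]
print(bio_tags(tokens, [(0, 3)]))  # ['B', 'I', 'I', 'O', 'O', 'O']
```

A token classifier such as XLMR is then trained to predict these labels directly, turning candidate-term identification into per-token prediction.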

Abstract This paper describes TermEnsembler, a bilingual term extraction and alignment system utilizing a novel ensemble learning approach to alignment. In the proposed system, processing starts with monolingual term extraction from a language industry standard file type containing aligned English and Slovenian texts. The two separate term lists are then automatically aligned using an ensemble of seven methods, which are first executed separately and then merged with weights learned by an evolutionary algorithm. In the experiments, the weights were learned on one domain and tested on other...

10.1075/term.00029.rep article EN cc-by-nc Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 2019-07-24
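The ensemble step described above — several alignment methods executed separately, then merged with learned weights — can be sketched roughly as follows. The term pairs, scores, and weight values are invented for illustration; in TermEnsembler the weights are learned by an evolutionary algorithm rather than fixed by hand:

```python
def merge_alignments(scores_per_method, weights):
    """Weighted combination of per-method alignment scores.

    scores_per_method: dict mapping (source_term, target_term) to a list
    of scores, one per alignment method; weights: one weight per method.
    """
    return {pair: sum(w * s for w, s in zip(weights, scores))
            for pair, scores in scores_per_method.items()}

# Toy candidate pairs scored by three hypothetical alignment methods.
scores = {("mreža", "network"): [0.9, 0.7, 0.8],
          ("mreža", "net"):     [0.4, 0.6, 0.3]}
weights = [0.5, 0.3, 0.2]  # illustrative; learned in the actual system
merged = merge_alignments(scores, weights)
print(max(merged, key=merged.get))  # ('mreža', 'network')
```

Ranking candidate pairs by the merged score then yields the final bilingual term alignments.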

Abstract In this paper, we look at the issue of reproducibility and replicability in bilingual terminology alignment (BTA). We propose a set of best practices for NLP papers and analyze several influential BTA papers from this perspective. Next, we present our attempts at replication and reproduction, where we focus on the approach described by Aker et al. (Extracting bilingual terminologies from comparable corpora. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics, vol. 1, pp. 402–411, 2013), who treat term alignment as a binary...

10.1007/s10579-019-09477-1 article EN cc-by Language Resources and Evaluation 2019-11-18

The current dominance of deep neural networks in natural language processing is based on contextual embeddings such as ELMo, BERT, and BERT derivatives. Most existing work focuses on English; in contrast, we present here the first multilingual empirical comparison of two ELMo and several monolingual models using 14 tasks in nine languages. In monolingual settings, our analysis shows that monolingual models generally dominate, with a few exceptions such as the dependency parsing task, where they are not competitive with models trained on large corpora....

10.48550/arxiv.2107.10614 preprint EN cc-by-sa arXiv (Cornell University) 2021-01-01