- Natural Language Processing Techniques
- Linguistics and language evolution
- Linguistics and terminology studies
- Lexicography and Language Studies
- Basque language and culture studies
- Topic Modeling
- Hungarian Social, Economic and Educational Studies
- Linguistics, Language Diversity, and Identity
- European and International Law Studies
- Legal Language and Interpretation
- Hate Speech and Cyberbullying Detection
- Digital Communication and Language
- Linguistic research and analysis
- Freedom of Expression and Defamation
- Gender Studies in Language
- Mathematics, Computing, and Information Processing
- Computational and Text Analysis Methods
- Educational Technology and Assessment
- Semantic Web and Ontologies
- Service-Oriented Architecture and Web Services
- Biomedical Text Mining and Ontologies
- Digital Rights Management and Security
Árni Magnússon Institute for Icelandic Studies
2022-2024
This paper presents the ParlaMint corpora containing transcriptions of the sessions of 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project's GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter...
The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022 and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancements made since the ParlaMint I project and on the compilation of the corpora, including encoding...
In Iceland, the word of the year is chosen annually, both by the Icelandic National Broadcasting Service and by the Árni Magnússon Institute for Icelandic Studies (AMI). We explore the possibility of doing the same, but for a year more than 100 years ago. We try using the same methods as AMI does in our times. This approach has various limitations, which we discuss, and raises many questions, such as how much the texts from journals and periodicals reflect actual language use at the time.