NFDI4DS | UHH-SEMS - Publication Details

Senja Pollak

ORCID: 0000-0002-4380-0863

About

Contact & Profiles

A5074881863

Research Areas

Natural Language Processing Techniques
Topic Modeling
Advanced Text Analysis Techniques
Biomedical Text Mining and Ontologies
Text Readability and Simplification
Semantic Web and Ontologies
Authorship Attribution and Profiling
Sentiment Analysis and Opinion Mining
Hate Speech and Cyberbullying Detection
Lexicography and Language Studies
Spam and Phishing Detection
Information Retrieval and Search Behavior
Misinformation and Its Impacts
Digital Communication and Language
Data Visualization and Analytics
Text and Document Classification Technologies
Linguistics and Language Evolution
Emotion and Mood Recognition
Linguistics and Terminology Studies
Social Media and Politics
Scientific Computing and Data Management
Speech Recognition and Synthesis
Religious, Philosophical, and Educational Studies
Stock Market Forecasting Methods
Speech and Dialogue Systems

Jožef Stefan Institute
2015-2025

University of Edinburgh
2018-2019

Jožef Stefan International Postgraduate School
2015-2017

University of Antwerp
2017

University of Ljubljana
2010-2012

Knowledge graph informed fake news classification via heterogeneous representation ensembles

OPENALEX - Publications

Boshko Koloski Timen Stepišnik Perdih Marko Robnik–Šikonja Senja Pollak Blaž Škrlj

Increasing amounts of freely available data both in textual and relational form offer exploration of richer document representations, potentially improving the model performance and robustness. An emerging problem in the modern era is fake news detection: many easily available pieces of information are not necessarily factually correct, and can lead to wrong conclusions or be used for manipulation. In this work we explore how different document representations, ranging from simple symbolic bag-of-words, to contextual, neural language model-based ones...

10.1016/j.neucom.2022.01.096 article EN cc-by Neurocomputing 2022-01-29

Emotion recognition in low-resource settings: An evaluation of automatic feature selection methods

Fasih Haider Senja Pollak Pierre Albert Saturnino Luz

10.1016/j.csl.2020.101119 article EN Computer Speech & Language 2020-06-02

Supervised and Unsupervised Neural Approaches to Text Readability

Matej Martinc Senja Pollak Marko Robnik–Šikonja

We present a set of novel neural supervised and unsupervised approaches for determining the readability of documents. In the unsupervised setting, we leverage neural language models, whereas in the supervised setting, three different neural classification architectures are tested. We show that the proposed neural unsupervised approach is robust, transferable across languages, and allows adaptation to a specific readability task and data set. By systematic comparison of several neural architectures on a number of benchmark and new labeled readability data sets in two languages, this study also offers a comprehensive analysis of different neural approaches to readability classification. We expose their...

10.1162/coli_a_00398 article EN cc-by-nc-nd Computational Linguistics 2021-03-01

Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations

Qingyu Chen Alexis Allot Robert Leaman Rezarta Islamaj Jingcheng Du and 34 more

The coronavirus disease 2019 (COVID-19) pandemic has been severely impacting global society since December 2019. The related findings such as vaccine and drug development have been reported in biomedical literature, at a rate of about 10 000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200 000 articles with millions of accesses each month by users...

10.1093/database/baac069 article EN cc-by Database 2022-01-01

TNT-KID: Transformer-based neural tagger for keyword identification

Matej Martinc Blaž Škrlj Senja Pollak

With growing amounts of available textual data, development of algorithms capable of automatic analysis, categorization, and summarization of these data has become a necessity. In this research, we present a novel algorithm for keyword identification, that is, an extraction of one or multiword phrases representing key aspects of a given document, called Transformer-Based Neural Tagger for Keyword IDentification (TNT-KID). By adapting the transformer architecture for a specific task at hand and leveraging language...

10.1017/s1351324921000127 article EN cc-by Natural Language Engineering 2021-06-10

Viewpoint detection on LGBT+ reporting using contextual embeddings and qualitative thematic analysis: The use case on the word deep

Matej Martinc Nina Perger Senja Pollak

This article presents an interdisciplinary study combining advanced natural language processing techniques by using contextual embeddings and manual thematic analysis. We analyse Slovenian news articles on LGBT+ topics, focusing on the differences in the connotation of the word deep, whose usage differs the most between the mainstream and conservative media groups, according to the system for automatic measuring of usage changes based on contextual embeddings. At the content level, the analysis shows that, in the mainstream media, the word deep is predominantly used in a conventional...

10.1177/07591063251317085 article EN Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 2025-02-13

Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift

Matej Martinc Petra Kralj Novak Senja Pollak

We propose a new method that leverages contextual embeddings for the task of diachronic semantic shift detection by generating time specific word representations from BERT embeddings. The results of our experiments in the domain specific LiverpoolFC corpus suggest that the proposed method has performance comparable to the current state-of-the-art without requiring any time consuming domain adaptation on large corpora. The results on the newly created Brexit news corpus suggest that the method can be successfully used for the detection of a short-term yearly semantic shift. And lastly, the model also shows promising...

10.48550/arxiv.1912.01072 preprint EN cc-by arXiv (Cornell University) 2019-01-01

Tackling the ADReSS Challenge: A Multimodal Approach to the Automated Recognition of Alzheimer’s Dementia

Matej Martinc Senja Pollak

10.21437/interspeech.2020-2202 article EN Interspeech 2020 2020-10-25

Zero-Shot Learning for Cross-Lingual News Sentiment Classification

Andraž Pelicon Marko Pranjić Dragana Miljković Blaž Škrlj Senja Pollak

In this paper, we address the task of zero-shot cross-lingual news sentiment classification. Given the annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene news, but to news in another language without any training data required. Our system is based on the multilingual BERT model, while we test different approaches for handling long documents and propose a novel technique for sentiment enrichment of the BERT model as an intermediate training step. With the proposed approach,...

10.3390/app10175993 article EN cc-by Applied Sciences 2020-08-29

Temporal Integration of Text Transcripts and Acoustic Features for Alzheimer's Diagnosis Based on Spontaneous Speech

Matej Martinc Fasih Haider Senja Pollak Saturnino Luz

Background: Advances in machine learning (ML) technology have opened new avenues for detection and monitoring of cognitive decline. In this study, a multimodal approach to Alzheimer's dementia detection based on the patient's spontaneous speech is presented. This approach was tested on a standard, publicly available Alzheimer's speech dataset for comparability. The data comprise voice samples from 156 participants (1:1 ratio of Alzheimer's to control), matched by age and gender. Materials and Methods: A recently developed Active Data Representation (ADR)...

10.3389/fnagi.2021.642647 article EN cc-by Frontiers in Aging Neuroscience 2021-06-14

Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?

Hanh Thi Hong Tran Matej Martinc Andraž Repar Nikola Ljubešić Antoine Doucet and 1 more

Automatic term extraction (ATE) is a natural language processing task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. In this paper, we treat ATE as a sequence-labeling task and explore the efficacy of XLMR in evaluating cross-lingual and multilingual learning against monolingual learning in the cross-domain ATE context. Additionally, we introduce NOBI, a novel annotation mechanism enabling the labeling of single-word nested terms. Our experiments are conducted on the ACTER...

10.1007/s10994-023-06506-7 article EN cc-by Machine Learning 2024-03-27

Detecting contrast patterns in newspaper articles by combining discourse analysis and text mining

Senja Pollak Roel Coesemans Walter Daelemans Nada Lavrač

Text mining aims at constructing classification models and finding interesting patterns in large text collections. This paper investigates the utility of applying these techniques to media analysis, more specifically to support discourse analysis of news reports about the 2007 Kenyan elections and post-election crisis in local (Kenyan) and Western (British and US) newspapers. It illustrates how text mining methods can assist discourse analysis by finding contrast patterns which provide evidence for ideological differences between local and international press coverage. Our...

10.1075/prag.21.4.07pol article EN Pragmatics Quarterly Publication of the International Pragmatics Association (IPrA) 2015-02-18

SemEval-2020 Task 3: Graded Word Similarity in Context

Carlos S. Armendariz Matthew Purver Senja Pollak Nikola Ljubešić Matej Ulčar and 2 more

This paper presents the Graded Word Similarity in Context (GWSC) task which asked participants to predict the effects of context on human perception of similarity in English, Croatian, Slovene and Finnish. We received 15 submissions and 11 system description papers. A new dataset (CoSimLex) was created for evaluation in this task: it contains pairs of words, each annotated within two different contexts. Systems beat the baselines by significant margins, but few did well in more than one language or subtask. Almost...

10.18653/v1/2020.semeval-1.3 article EN cc-by 2020-01-01

tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

Blaž Škrlj Matej Martinc Jan Kralj Nada Lavrač Senja Pollak

The use of background knowledge is largely unexploited in text classification tasks. This paper explores word taxonomies as means for constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy-based features, and demonstrate its use on six short text classification problems: prediction of gender, personality type, age, news topics, drug side effects and drug effectiveness. The constructed semantic features, in combination with fast linear classifiers, tested against...

10.1016/j.csl.2020.101104 article EN cc-by Computer Speech & Language 2020-04-29

Investigating cross-lingual training for offensive language detection

Andraž Pelicon Ravi Shekhar Blaž Škrlj Matthew Purver Senja Pollak

Platforms that feature user-generated content (social media, online forums, newspaper comment sections etc.) have to detect and filter offensive speech within large, fast-changing datasets. While many automatic methods have been proposed and achieve good accuracies, most of these focus on the English language, and are hard to apply directly to languages in which few labeled datasets exist. Recent work has therefore investigated the use of cross-lingual transfer learning to solve this problem, training a model...

10.7717/peerj-cs.559 article EN cc-by PeerJ Computer Science 2021-06-25

autoBOT: evolving neuro-symbolic representations for explainable low resource text classification

Blaž Škrlj Matej Martinc Nada Lavrač Senja Pollak

Learning from texts has been widely adopted throughout industry and science. While state-of-the-art neural language models have shown very promising results for text classification, they are expensive to (pre-)train, require large amounts of data and tuning of hundreds of millions or more parameters. This paper explores how automatically evolved text representations can serve as a basis for an explainable, low-resource branch of models with competitive performance that are subject to automated hyperparameter tuning. We...

10.1007/s10994-021-05968-x article EN cc-by Machine Learning 2021-04-14

Textual analysis of corporate sustainability reporting and corporate ESG scores

Urša Ferjančič Riste Ichev Igor Lončarski Syrielle Montariol Andraž Pelicon and 5 more

10.1016/j.irfa.2024.103669 article EN cc-by International Review of Financial Analysis 2024-10-01

Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture

Blaž Škrlj Jan Kralj Nada Lavrač Senja Pollak

Deep neural networks are becoming ubiquitous in text mining and natural language processing, but semantic resources, such as taxonomies and ontologies, are yet to be fully exploited in a deep learning setting. This paper presents an efficient semantic text mining approach, which converts semantic information related to a given set of documents into a set of novel features that are used for learning. The proposed Semantics-aware Recurrent Neural Architecture (SRNA) enables the system to learn simultaneously from the semantic vectors and from the raw text documents. We test...

10.3390/make1020034 article EN cc-by Machine Learning and Knowledge Extraction 2019-04-04

E8-IJS@LT-EDI-ACL2022 - BERT, AutoML and Knowledge-graph backed Detection of Depression

Ilija Tavchioski Boshko Koloski Blaž Škrlj Senja Pollak

Depression is a mental illness that negatively affects a person’s well-being and can, if left untreated, lead to serious consequences such as suicide. Therefore, it is important to recognize the signs of depression early. In the last decade, social media has become one of the most common places to express one’s feelings. Hence, there is a possibility of text processing and applying machine learning techniques to detect possible signs of depression. In this paper, we present our approaches to solving the shared task titled Detecting Signs of Depression from...

10.18653/v1/2022.ltedi-1.36 article EN cc-by 2022-01-01

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

Carlos S. Armendariz Matthew Purver Matej Ulčar Senja Pollak Nikola Ljubešić and 3 more

State of the art natural language processing tools are built on context-dependent word embeddings, but no direct method for evaluating these representations currently exists. Standard tasks and datasets for intrinsic evaluation of embeddings are based on judgements of similarity, but ignore context; standard tasks for word sense disambiguation take account of context but do not provide continuous measures of meaning similarity. This paper describes an effort to build a new dataset, CoSimLex, intended to fill this gap. Building on pairwise...

10.48550/arxiv.1912.05320 preprint EN other-oa arXiv (Cornell University) 2019-01-01