Elaine Zosa

ORCID: 0000-0003-2482-0663
Research Areas
  • Topic Modeling
  • Computational and Text Analysis Methods
  • Natural Language Processing Techniques
  • Web Data Mining and Analysis
  • Advanced Text Analysis Techniques
  • Hate Speech and Cyberbullying Detection
  • Sentiment Analysis and Opinion Mining
  • Genomics and Phylogenetic Studies
  • Language and cultural evolution
  • Digital Marketing and Social Media
  • Pain Management and Placebo Effect
  • Bioinformatics and Genomic Networks
  • Microbial Natural Products and Biosynthesis
  • Digital Humanities and Scholarship
  • Algorithms and Data Compression
  • Expert finding and Q&A systems
  • Recommender Systems and Techniques
  • RNA and protein synthesis mechanisms
  • Protist diversity and phylogeny
  • Biomedical Text Mining and Ontologies
  • Machine Learning in Bioinformatics
  • Speech Recognition and Synthesis
  • Advanced Graph Neural Networks
  • Anxiety, Depression, Psychometrics, Treatment, Cognitive Processes
  • Data Quality and Management

University of Helsinki
2019-2023

Utrecht University
2023

Jožef Stefan Institute
2022

La Rochelle Université
2022

Tieto (Finland)
2021

Queen Mary University of London
2021

Centre National de la Recherche Scientifique
2020

Université Paris-Saclay
2020

Université Paris-Sud
2020

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
2020

Naihui Zhou, Yuxiang Jiang, Timothy Bergquist, Alexandra Lee, Balint Z. Kacsoh, Alex W. Crocker, Kimberley A. Lewis, George P. Georghiou, Huy Nguyen, Md-Nafiz Hamid, L. Taylor Davis, Tunca Doğan, Volkan Atalay, Ahmet Süreyya Rifaioğlu, Alperen Dalkıran, Rengül Çetin-Atalay, Chengxin Zhang, Rebecca L. Hurto, Peter L. Freddolino, Yang Zhang, Prajwal Bhat, Fran Supek, José M. Fernández, Branislava Gemović, Vladimir Perović, Radoslav Davidović, Neven Šumonja, Nevena Veljković, Ehsaneddin Asgari, Mohammad R. K. Mofrad, Giuseppe Profiti, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio, Florian Boecker, Heiko Schoof, Indika Kahanda, Natalie Thurlby, Alice C. McHardy, Alexandre Renaux, Rabie Saidi, Julian Gough, Alex A. Freitas, Magdalena Antczak, Fábio Fabris, Mark N. Wass, Jie Hou, Jianlin Cheng, Zheng Wang, Alfonso E. Romero, Alberto Paccanaro, Haixuan Yang, Tatyana Goldberg, Chenguang Zhao, Liisa Holm, Petri Törönen, Alan Medlar, Elaine Zosa, Itamar Borukhov, Ilya B. Novikov, Angela D. Wilkins, Olivier Lichtarge, Po-Han Chi, Wei-Cheng Tseng, Michal Linial, Peter W. Rose, Christophe Dessimoz, Vedrana Vidulin, Sašo Džeroski, Ian Sillitoe, Sayoni Das, Jonathan Lees, David T. Jones, Cen Wan, Domenico Cozzetto, Rui Fa, Mateo Torres, Alex Warwick Vesztrocy, José Manuel Rodríguez, Michael L. Tress, Marco Frasca, Marco Notaro, Giuliano Grossi, Alessandro Petrini, Matteo Ré, Giorgio Valentini, Marco Mesiti, Daniel B. Roche, Jonas Reeb, David W. Ritchie, Sabeur Aridhi, Seyed Ziaeddin Alborzi, Marie‐Dominique Devignes, Da Chen, Emily Koo, Richard Bonneau, Vladimir Gligorijević, Meet Barot, Hai Fang, Stefano Toppo, Enrico Lavezzo

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for...

10.1186/s13059-019-1835-8 article EN cc-by Genome Biology 2019-11-19
Naihui Zhou, Yuxiang Jiang, Timothy Bergquist, Alexandra Lee, Balint Z. Kacsoh, Alex W. Crocker, Kimberley A. Lewis, George P. Georghiou, Huy Nguyen, Md-Nafiz Hamid, L. Taylor Davis, Tunca Doğan, Volkan Atalay, Ahmet Süreyya Rifaioğlu, Alperen Dalkıran, Rengül Çetin-Atalay, Chengxin Zhang, Rebecca L. Hurto, Peter L. Freddolino, Yang Zhang, Prajwal Bhat, Fran Supek, José M. Fernández, Branislava Gemović, Vladimir Perović, Radoslav Davidović, Neven Šumonja, Nevena Veljković, Ehsaneddin Asgari, Mohammad R. K. Mofrad, Giuseppe Profiti, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio, Florian Boecker, Indika Kahanda, Natalie Thurlby, Alice C. McHardy, Alexandre Renaux, Rabie Saidi, Julian Gough, Alex A. Freitas, Magdalena Antczak, Fábio Fabris, Mark N. Wass, Jie Hou, Jianlin Cheng, Zheng Wang, Alfonso E. Romero, Alberto Paccanaro, Haixuan Yang, Tatyana Goldberg, Chenguang Zhao, Liisa Holm, Petri Törönen, Alan Medlar, Elaine Zosa, Itamar Borukhov, Ilya B. Novikov, Angela D. Wilkins, Olivier Lichtarge, Po-Han Chi, Wei-Cheng Tseng, Michal Linial, Peter W. Rose, Christophe Dessimoz, Vedrana Vidulin, Sašo Džeroski, Ian Sillitoe, Sayoni Das, Jonathan Lees, David T. Jones, Cen Wan, Domenico Cozzetto, Rui Fa, Mateo Torres, Alex Warwick Vesztrocy, José Manuel Rodríguez, Michael L. Tress, Marco Frasca, Marco Notaro, Giuliano Grossi, Alessandro Petrini, Matteo Ré, Giorgio Valentini, Marco Mesiti, Daniel B. Roche, Jonas Reeb, David W. Ritchie, Sabeur Aridhi, Seyed Ziaeddin Alborzi, Marie‐Dominique Devignes, Da Chen, Emily Koo, Richard Bonneau, Vladimir Gligorijević, Meet Barot, Hai Fang, Stefano Toppo, Enrico Lavezzo

The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000...

10.1101/653105 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2019-05-29

The way words are used evolves through time, mirroring the cultural or technological evolution of society. Semantic change detection is the task of detecting and analysing word evolution in textual data, even over short periods of time. In this paper we focus on a new set of methods relying on contextualised embeddings, a type of semantic modelling that has recently revolutionised the NLP field. We leverage the ability of the transformer-based BERT model to generate contextualised embeddings capable of detecting semantic change of words across time. Several approaches are compared in a common setting in order...

10.1145/3366424.3382186 preprint EN Companion Proceedings of the Web Conference 2020 2020-04-20

This paper describes the approaches used by the Discovery Team to solve SemEval-2020 Task 1 - Unsupervised Lexical Semantic Change Detection. The proposed method is based on clustering of BERT contextual embeddings, followed by a comparison of cluster distributions across time. The best results were obtained by an ensemble of this method and static Word2Vec embeddings. According to the official results, our approach proved the best for Latin in Subtask 2.

10.18653/v1/2020.semeval-1.6 article EN cc-by 2020-01-01
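The comparison step described in the abstract above can be illustrated with a minimal sketch. It assumes that each occurrence of a target word has already been assigned to a cluster whose ids are shared across both time periods (for example, by clustering the pooled BERT embeddings of all occurrences), and it uses Jensen-Shannon divergence as the distance between the two periods' cluster distributions. The abstract does not name the distance measure, so the choice of JSD and the helper names below (cluster_distribution, jensen_shannon_divergence) are illustrative assumptions, not the Discovery Team's code.

```python
import math
from collections import Counter


def cluster_distribution(labels, n_clusters):
    """Share of a word's occurrences falling into each cluster in one time period.

    Assumes cluster ids 0..n_clusters-1 are shared across periods
    (hypothetical setup: clusters fitted on embeddings pooled over all periods).
    """
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence in [0, 1] bits: 0 for identical, 1 for disjoint distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical cluster labels for one word's occurrences in two periods.
    early = cluster_distribution([0, 0, 1, 1], n_clusters=3)
    late = cluster_distribution([2, 2, 2, 2], n_clusters=3)
    print(jensen_shannon_divergence(early, late))   # 1.0: no usage cluster in common
    print(jensen_shannon_divergence(early, early))  # 0.0: usage unchanged
```

A higher score indicates a larger shift in how the word is used between the two periods; ranking target words by such a score is one way to approach a graded-change setting like Subtask 2.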

Dynamic topic models (DTMs) capture the evolution of topics and trends in time series data. Current DTMs are applicable only to monolingual datasets. In this paper we present the multilingual dynamic topic model (ML-DTM), a novel topic model that combines DTM with an existing multilingual topic modeling method to capture crosslingual topics that evolve across time. We present results of this model on a parallel German-English corpus of news articles and a comparable corpus of Finnish and Swedish news articles. We demonstrate the capability of ML-DTM to track significant events related to a topic and show that it finds distinct topics and performs as...

10.26615/978-954-452-056-4_159 article EN 2019-10-22

Words with the suffix -ism are reductionist terms that help us navigate complex social issues by using a simple one-word label for them. On one hand they are often associated with political ideologies, but on the other they are present in many domains of language, especially culture, science, and religion. This has not always been the case. This paper studies isms in a historical record of digitized newspapers from 1820 to 1917 published in Finland to find out how the language of isms developed historically. We use diachronic word embeddings...

10.46298/jdmdh.6159 article EN cc-by Journal of Data Mining & Digital Humanities 2020-12-18

This paper presents the results of SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet inaccurate. Such cases of overgeneration put in jeopardy many NLG applications, where correctness is often mission-critical. The shared task was conducted with a newly constructed dataset of 4000 model outputs, each labeled by 5 annotators, spanning 3 NLP tasks: machine translation, paraphrase generation and definition modeling. The shared task was tackled by a total of 58 different users...

10.48550/arxiv.2403.07726 preprint EN arXiv (Cornell University) 2024-03-12

This paper addresses methodological issues in diachronic data analysis for historical research. We apply two families of topic models (LDA and DTM) to a relatively large set of historical newspapers, with the aim of capturing and understanding discourse dynamics. Our case study focuses on newspapers and periodicals published in Finland between 1854 and 1917, but our method can easily be transposed to any diachronic data. Our main contributions are a) a combined sampling, training and inference procedure for applying topic models to huge and imbalanced diachronic text collections;...

10.5617/dhnbpub.11235 article EN Digital Humanities in the Nordic and Baltic Countries Publications 2021-05-14

In this paper, we present the participation of the EMBEDDIA team in SemEval-2022 Task 8 (Multilingual News Article Similarity). We cover several techniques and propose different methods for finding multilingual news article similarity by exploring the dataset in its entirety. We take advantage of the textual content of the articles, the provided metadata (e.g., titles, keywords, topics), the translated articles, the images (those that were available), and knowledge graph-based representations of entities and relations in the articles. We then compute...

10.18653/v1/2022.semeval-1.156 article EN cc-by Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) 2022-01-01

This paper is part of a collaboration between computer scientists and historians aimed at the development of novel methods for historical newspaper analysis. We present a case study of ideological terms ending with the -ism suffix in nineteenth-century Finnish newspapers. We propose a two-step procedure to trace differences in word usages over time: training diachronic embeddings on several time slices and then clustering the embeddings of selected words together with their neighbours to obtain context. The obtained clusters turn out to be useful...

10.26615/978-954-452-059-5_002 article EN 2019-12-15

This paper addresses methodological issues in diachronic data analysis for historical research. We apply two families of topic models (LDA and DTM) to a relatively large set of historical newspapers, with the aim of capturing and understanding discourse dynamics. Our case study focuses on newspapers and periodicals published in Finland between 1854 and 1917, but our method can easily be transposed to any diachronic data. Our main contributions are a) a combined sampling, training and inference procedure for applying topic models to huge and imbalanced diachronic text collections;...

10.48550/arxiv.2011.10428 preprint EN cc-by-nc-nd arXiv (Cornell University) 2020-01-01

This paper presents M3L-Contrast, a novel multimodal multilingual (M3L) neural topic model for comparable data that maps texts from multiple languages and images into a shared topic space. Our model is trained jointly on texts and images and takes advantage of pretrained document and image embeddings to abstract the complexities between different languages and modalities. As a multilingual topic model, it produces aligned language-specific topics, and as a multimodal model, it infers textual representations of semantic concepts in images. We demonstrate that our model is competitive with a zero-shot...

10.48550/arxiv.2211.08057 preprint EN cc-by-sa arXiv (Cornell University) 2022-01-01

Moderation of reader comments is a significant problem for online news platforms. Here, we experiment with models for automatic moderation, using a dataset of comments from a popular Croatian newspaper. Our analysis shows that while comments that violate the moderation rules mostly share common linguistic and thematic features, their content varies across the different sections of the newspaper. We therefore make our models topic-aware, incorporating semantic features from a topic model into the classification decision. Our results show that topic information improves...

10.26615/978-954-452-072-4_185 article EN 2021-01-01

Grounding has been argued to be a crucial component towards the development of more complete and truly semantically competent artificial intelligence systems. The literature has divided into two camps: while some argue that grounding allows for qualitatively different generalizations, others believe it can be compensated by mono-modal data quantity. Limited empirical evidence has emerged for or against either position, which we argue is due to the methodological challenges that come with studying grounding and its effects on NLP systems. In this...

10.48550/arxiv.2310.11938 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Grounding has been argued to be a crucial component towards the development of more complete and truly semantically competent artificial intelligence systems. The literature has divided into two camps: while some argue that grounding allows for qualitatively different generalizations, others believe it can be compensated by mono-modal data quantity. Limited empirical evidence has emerged for or against either position, which we argue is due to the methodological challenges that come with studying grounding and its effects on NLP systems. In this...

10.18653/v1/2023.findings-emnlp.736 article EN cc-by 2023-01-01