- Topic Modeling
- Artificial Intelligence in Law
- Natural Language Processing Techniques
- Comparative and International Law Studies
- Text and Document Classification Technologies
- Legal Language and Interpretation
- Legal Education and Practice Innovations
- Sentiment Analysis and Opinion Mining
- Advanced Text Analysis Techniques
- European and International Law Studies
- Stock Market Forecasting Methods
- Machine Learning and Data Classification
- Auditing, Earnings Management, Governance
- Explainable Artificial Intelligence (XAI)
- Artificial Intelligence in Healthcare and Education
- Semantic Web and Ontologies
- Banking stability, regulation, efficiency
- Medical Imaging and Pathology Studies
- Fibroblast Growth Factor Research
- Imbalanced Data Classification Techniques
- Occupational Health and Safety Research
- Financial Reporting and XBRL
- Domain Adaptation and Few-Shot Learning
- Law, Economics, and Judicial Systems
- Machine Learning in Healthcare
University of Copenhagen
2019-2024
Athens University of Economics and Business
2017-2023
University of Essex
2021-2023
Tilburg University
2023
Utrecht University
2023
Chicago-Kent College of Law
2023
Illinois Institute of Technology
2023
Ludwig-Maximilians-Universität München
2023
Munich Center for Machine Learning
2023
Commonwealth Scientific and Industrial Research Organisation
2022
BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation of guidelines for its adaptation to specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original...
Legal judgment prediction is the task of automatically predicting the outcome of a court case, given a text describing the case’s facts. Previous work on using neural models for this task has focused on Chinese; only feature-based models (e.g., using bags of words and topics) have been considered in English. We release a new English legal judgment prediction dataset, containing cases from the European Court of Human Rights. We evaluate a broad variety of neural models on the new dataset, establishing strong baselines that surpass previous feature-based models in three tasks: (1) binary violation classification; (2)...
We consider Large-Scale Multi-Label Text Classification (LMTC) in the legal domain. We release a new dataset of 57k legislative documents from EUR-LEX, annotated with ∼4.3k EUROVOC labels, which is suitable for LMTC, few- and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with label-wise attention perform better than other current state-of-the-art methods. Domain-specific WORD2VEC and context-sensitive ELMO embeddings further improve performance. We also find that considering only...
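The label-wise attention mentioned in the abstract above can be sketched as follows: each label gets its own attention query that pools the encoder's token representations into a label-specific document vector. This is a minimal illustrative sketch (shapes, dot-product scoring, and variable names are assumptions, not the paper's exact formulation).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def label_wise_attention(H, U):
    """Pool token representations H into one document vector per label.
    H: (num_tokens, dim) token representations (e.g., BIGRU outputs).
    U: (num_labels, dim) one learnable attention query per label.
    Returns: (num_labels, dim), a label-specific document vector each.
    """
    docs = []
    for u in U:                   # one attention distribution per label
        alphas = softmax(H @ u)   # attention weights over tokens
        docs.append(alphas @ H)   # weighted average of token vectors
    return np.stack(docs)

# Toy run: 5 tokens, 4-dim vectors, 3 labels.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))
U = rng.normal(size=(3, 4))
D = label_wise_attention(H, U)
print(D.shape)  # (3, 4): a separate document representation per label
```

In a trained classifier, each row of `D` would be scored against its label (e.g., via a per-label sigmoid), so different labels can attend to different parts of the document.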
Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Katz, Nikolaos Aletras. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
We study how contract element extraction can be automated. We provide a labeled dataset with gold annotations, along with an unlabeled dataset of contracts that can be used to pre-train word embeddings. Both datasets are provided in encoded form to bypass privacy issues. We describe and experimentally compare several methods that use manually written rules and linear classifiers (logistic regression, SVMs) with hand-crafted features, word embeddings, and part-of-speech tag embeddings. The best results are obtained by a hybrid method that combines machine learning...
Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
We introduce MULTI-EURLEX, a new multilingual dataset for topic classification of legal documents. The dataset comprises 65k European Union (EU) laws, officially translated in 23 languages, annotated with multiple labels from the EUROVOC taxonomy. We highlight the effect of temporal concept drift and the importance of chronological, instead of random, splits. We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target). We find that fine-tuning...
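The chronological splits that the abstract above argues for can be sketched in a few lines: documents are partitioned by year so the test set contains only laws published after everything seen in training, exposing temporal concept drift that a random split would hide. The cut-off years and field layout here are invented for illustration.

```python
def chronological_split(docs, train_until, dev_until):
    """Split (year, text) pairs by publication year instead of randomly.
    Everything up to `train_until` is training data, the next window is
    development data, and anything later is test data.
    """
    train = [d for d in docs if d[0] <= train_until]
    dev = [d for d in docs if train_until < d[0] <= dev_until]
    test = [d for d in docs if d[0] > dev_until]
    return train, dev, test

docs = [(1998, "law A"), (2005, "law B"), (2012, "law C"), (2019, "law D")]
train, dev, test = chronological_split(docs, train_until=2005, dev_until=2012)
print(len(train), len(dev), len(test))  # 2 1 1
```

Unlike a random split, a model evaluated this way cannot benefit from label distributions or vocabulary that only appear in later years.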
Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
Law, interpretations of law, legal arguments, agreements, etc. are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this...
In many jurisdictions, the excessive workload of courts leads to high delays. Suitable predictive AI models can assist legal professionals in their work, and thus enhance and speed up the process. So far, Legal Judgment Prediction (LJP) datasets have been released in English, French, and Chinese. We publicly release a multilingual (German, French, Italian), diachronic (2000-2020) corpus of 85K cases from the Federal Supreme Court of Switzerland (FSCS). We evaluate state-of-the-art BERT-based methods including two variants...
The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page multi-paragraph documents are common and they cannot be efficiently encoded by vanilla Transformer-based models. We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers to encode much longer text, namely sparse attention and hierarchical encoding methods. We examine several...
Publicly traded companies are required to submit periodic reports with eXtensive Business Reporting Language (XBRL) word-level tags. Manually tagging the reports is tedious and costly. We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags. Unlike typical entity extraction datasets, FiNER-139 uses a much larger label set of 139 entity types. Most annotated tokens are numeric, with the correct tag per token depending mostly on context, rather than the token itself. We show...
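Framing XBRL tagging as entity extraction, as the abstract above does, typically means assigning a BIO tag to every token. A minimal sketch of that encoding is below; the tokenization, span format, and the label name `Revenues` are illustrative assumptions, not FiNER-139's actual annotation scheme.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token-index spans into BIO tags.
    `end` is exclusive; untagged tokens get the "O" (outside) tag.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # inside the entity
    return tags

tokens = ["Revenue", "was", "$", "1.2", "million", "in", "2021", "."]
spans = [(3, 4, "Revenues")]  # only the numeric token carries the tag
print(spans_to_bio(tokens, spans))
# ['O', 'O', 'O', 'B-Revenues', 'O', 'O', 'O', 'O']
```

The example also shows the point the abstract makes: the tagged token is just a number, so a classifier must rely on the surrounding context ("Revenue was ... million") to pick the right label.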
Lately, propelled by phenomenal advances around the transformer architecture, the legal NLP field has enjoyed spectacular growth. To measure progress, well-curated and challenging benchmarks are crucial. Previous efforts have produced numerous benchmarks for general NLP models, typically based on news or Wikipedia. However, these may not fit specific domains such as law, with its unique lexicons and intricate sentence structures. Even though there is a rising need to build NLP systems in languages other than English,...
Following the hype around OpenAI's ChatGPT conversational agent, the latest milestone in the recent development of Large Language Models (LLMs) that demonstrate emergent, unprecedented zero-shot capabilities, we audit the latest GPT-3.5 model, 'gpt-3.5-turbo', the first available via an API, on the LexGLUE benchmark in a zero-shot fashion, providing examples in a templated instruction-following format. The results indicate that the model achieves an average micro-F1 score of 49.0% across LexGLUE tasks, surpassing the baseline guessing rates. Notably, the model performs exceptionally...
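A "templated instruction-following format" like the one the abstract above describes can be sketched as a simple prompt builder: the task instruction, the allowed labels, and the input text are slotted into a fixed template. The wording and labels below are invented examples, not the exact template or tasks used in the audit.

```python
def build_prompt(task_description, labels, text):
    """Assemble a zero-shot classification prompt from a fixed template.
    The model is asked to answer with exactly one of the given labels.
    """
    label_list = ", ".join(labels)
    return (
        f"{task_description}\n"
        f"Possible labels: {label_list}\n"
        f"Text: {text}\n"
        f"Answer with one label only:"
    )

prompt = build_prompt(
    "Classify the following court case summary.",
    ["violation", "no violation"],
    "The applicant complained about the length of the proceedings.",
)
print(prompt)
```

Constraining the answer to a fixed label set makes the free-text model output easy to map back onto the benchmark's classification labels for scoring.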
We consider the task of detecting contractual obligations and prohibitions. We show that a self-attention mechanism improves the performance of a BILSTM classifier, the previous state of the art for this task, by allowing it to focus on indicative tokens. We also introduce a hierarchical BILSTM, which converts each sentence to an embedding, and then processes the sentence embeddings to classify each sentence. Apart from being faster to train, the hierarchical model outperforms the flat one, even when the latter considers surrounding sentences, because the hierarchical model has a broader discourse view.
We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, the European Union’s public document database, annotated with concepts from EUROVOC, a multidisciplinary thesaurus. The dataset is substantially larger than previous EURLEX datasets and suitable for XMTC, few-shot and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with self-attention outperform the current multi-label state-of-the-art...
This study examines the predictive power of textual information from S-1 filings in explaining initial public offering (IPO) underpricing. The authors' approach differs from previous research because they utilize several machine learning algorithms to predict whether an IPO will be underpriced or not, as well as the magnitude of underpricing. Using a sample of 2,481 US IPOs, they find that textual information can effectively complement financial variables in terms of prediction accuracy, as models that use both sources of data produce more accurate estimates. In...
In this work, we conduct a detailed analysis on the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities, which we define as upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora used as important dimensions in our study. To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark...
Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Schwemer, Anders Søgaard. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Transformer-based language models, such as BERT and its variants, have achieved state-of-the-art performance in several downstream natural language processing (NLP) tasks on generic benchmark datasets (e.g., GLUE, SQUAD, RACE). However, these models have mostly been applied to the resource-rich English language. In this paper, we present GREEK-BERT, a monolingual BERT-based language model for modern Greek. We evaluate its performance on three NLP tasks, i.e., part-of-speech tagging, named entity recognition, and natural language inference, obtaining...