Ilias Chalkidis

ORCID: 0000-0002-0706-7772
Research Areas
  • Topic Modeling
  • Artificial Intelligence in Law
  • Natural Language Processing Techniques
  • Comparative and International Law Studies
  • Text and Document Classification Technologies
  • Legal Language and Interpretation
  • Legal Education and Practice Innovations
  • Sentiment Analysis and Opinion Mining
  • Advanced Text Analysis Techniques
  • European and International Law Studies
  • Stock Market Forecasting Methods
  • Machine Learning and Data Classification
  • Auditing, Earnings Management, Governance
  • Explainable Artificial Intelligence (XAI)
  • Artificial Intelligence in Healthcare and Education
  • Semantic Web and Ontologies
  • Banking stability, regulation, efficiency
  • Medical Imaging and Pathology Studies
  • Fibroblast Growth Factor Research
  • Imbalanced Data Classification Techniques
  • Occupational Health and Safety Research
  • Financial Reporting and XBRL
  • Domain Adaptation and Few-Shot Learning
  • Law, Economics, and Judicial Systems
  • Machine Learning in Healthcare

University of Copenhagen
2019-2024

Athens University of Economics and Business
2017-2023

University of Essex
2021-2023

Tilburg University
2023

Utrecht University
2023

Chicago-Kent College of Law
2023

Illinois Institute of Technology
2023

Ludwig-Maximilians-Universität München
2023

Munich Center for Machine Learning
2023

Commonwealth Scientific and Industrial Research Organisation
2022

BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation of its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original...

10.18653/v1/2020.findings-emnlp.261 article EN cc-by 2020-01-01
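One of the adaptation strategies compared in this line of work is further (continued) pre-training of BERT on in-domain legal text before fine-tuning. Below is a minimal sketch of that step with Hugging Face Transformers; the corpus path `legal_corpus.txt` and all hyperparameters are illustrative placeholders, not the paper's actual configuration.

```python
# Sketch: further pre-training BERT on in-domain (legal) text with masked language
# modelling. Corpus path and hyperparameters are illustrative placeholders.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical corpus of raw legal text, one document per line.
corpus = load_dataset("text", data_files={"train": "legal_corpus.txt"})
tokenized = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                       batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="legal-bert-further-pretrained",
                         per_device_train_batch_size=8, num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
```

The alternative strategies (using the original model out of the box, or pre-training from scratch with a new vocabulary) differ only in which checkpoint and tokenizer are loaded before this step.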

Legal judgment prediction is the task of automatically predicting the outcome of a court case, given a text describing the case's facts. Previous work on using neural models for this task has focused on Chinese; only feature-based models (e.g., using bags of words and topics) have been considered in English. We release a new English legal judgment prediction dataset, containing cases from the European Court of Human Rights. We evaluate a broad variety of neural models on the new dataset, establishing strong baselines that surpass previous feature-based models in three tasks: (1) binary violation classification; (2)...

10.18653/v1/p19-1424 article EN cc-by 2019-01-01

We consider Large-Scale Multi-Label Text Classification (LMTC) in the legal domain. We release a new dataset of 57k legislative documents from EUR-LEX, annotated with ∼4.3k EUROVOC labels, which is suitable for LMTC, few- and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with label-wise attention perform better than other current state of the art methods. Domain-specific WORD2VEC and context-sensitive ELMO embeddings further improve performance. We also find that considering only...

10.18653/v1/p19-1636 article EN cc-by 2019-01-01
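A rough PyTorch sketch of the BIGRU-with-label-wise-attention idea described above: each label gets its own attention distribution over the token states, producing a label-specific document vector that is scored against a label-specific output vector. Dimensions and initialization are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LabelWiseAttentionBiGRU(nn.Module):
    """Sketch of a BiGRU encoder with label-wise attention: every label attends
    over the token representations with its own attention vector."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_labels):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bigru = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # One attention vector and one output weight vector per label.
        self.attn = nn.Linear(2 * hidden_dim, num_labels)
        self.out = nn.Parameter(torch.randn(num_labels, 2 * hidden_dim))
        self.bias = nn.Parameter(torch.zeros(num_labels))

    def forward(self, token_ids):                       # (batch, seq_len)
        h, _ = self.bigru(self.embed(token_ids))        # (batch, seq_len, 2*hidden)
        scores = self.attn(h)                           # (batch, seq_len, num_labels)
        alphas = torch.softmax(scores, dim=1)           # attention over tokens, per label
        # Label-specific document representations: (batch, num_labels, 2*hidden)
        docs = torch.einsum("bsl,bsh->blh", alphas, h)
        logits = (docs * self.out).sum(-1) + self.bias  # (batch, num_labels)
        return logits                                   # pair with BCEWithLogitsLoss
```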

Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Katz, Nikolaos Aletras. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.297 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

We study how contract element extraction can be automated. We provide a labeled dataset with gold annotations, along with an unlabeled dataset of contracts that can be used to pre-train word embeddings. Both datasets are provided in an encoded form to bypass privacy issues. We describe and experimentally compare several extraction methods that use manually written rules and linear classifiers (logistic regression, SVMs) with hand-crafted features, word embeddings, and part-of-speech tag embeddings. The best results are obtained by a hybrid method that combines machine learning...

10.1145/3086512.3086515 article EN 2017-06-12
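As a toy illustration of the feature-based linear baselines mentioned above (not the paper's actual feature set or data), the sketch below tags each token of a contract sentence using hand-crafted window features and logistic regression; the example sentence, labels, and feature names are made up.

```python
# Sketch of a feature-based contract element tagger: each token is classified
# from a sliding-window feature dictionary. Data and features are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),
        "is_digit": tok.isdigit(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Toy training data: tokens labelled as part of a "contract period" element or not.
sentence = "This agreement shall commence on 1 January 2017 .".split()
labels =   ["O", "O", "O", "O", "O", "B-PERIOD", "I-PERIOD", "I-PERIOD", "O"]
X = [token_features(sentence, i) for i in range(len(sentence))]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, labels)
print(clf.predict([token_features(sentence, 5)]))  # predicted tag for the token "1"
```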

Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

10.18653/v1/2021.naacl-main.22 article EN cc-by Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021-01-01

We introduce MULTI-EURLEX, a new multilingual dataset for topic classification of legal documents. The dataset comprises 65k European Union (EU) laws, officially translated in 23 languages, annotated with multiple labels from the EUROVOC taxonomy. We highlight the effect of temporal concept drift and the importance of chronological, instead of random, splits. We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target). We find that fine-tuning...

10.18653/v1/2021.emnlp-main.559 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.482 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.

10.18653/v1/2020.emnlp-main.607 article EN cc-by 2020-01-01

Law, interpretations of law, legal arguments, agreements, etc. are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this...

10.2139/ssrn.3936759 article EN SSRN Electronic Journal 2021-01-01

In many jurisdictions, the excessive workload of courts leads to high delays. Suitable predictive AI models can assist legal professionals in their work, and thus enhance and speed up the process. So far, Legal Judgment Prediction (LJP) datasets have been released in English, French, and Chinese. We publicly release a multilingual (German, French, Italian), diachronic (2000-2020) corpus of 85K cases from the Federal Supreme Court of Switzerland (FSCS). We evaluate state-of-the-art BERT-based methods, including two variants...

10.18653/v1/2021.nllp-1.3 preprint EN cc-by 2021-01-01

The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page multi-paragraph documents are common and they cannot be efficiently encoded by vanilla Transformer-based models. We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers to encode much longer text, namely sparse attention and hierarchical encoding methods. We examine several...

10.18653/v1/2022.findings-emnlp.534 article EN cc-by 2022-01-01
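A minimal sketch of the hierarchical-encoding family of approaches compared in this work: paragraphs are encoded independently with a standard pre-trained encoder, and their [CLS] vectors are then contextualised by a small Transformer over paragraphs. The model choice, layer counts, and mean pooling are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class HierarchicalClassifier(nn.Module):
    """Encode each paragraph with BERT, then contextualise paragraph [CLS] vectors."""
    def __init__(self, encoder_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.paragraph_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # input_ids: (num_paragraphs, seq_len) for a single document
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        context = self.paragraph_encoder(cls.unsqueeze(0))   # (1, num_paragraphs, hidden)
        return self.classifier(context.mean(dim=1))          # document-level logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
paragraphs = ["First paragraph of a long filing.", "Second paragraph of the filing."]
batch = tokenizer(paragraphs, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")
logits = HierarchicalClassifier()(batch["input_ids"], batch["attention_mask"])
```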

Publicly traded companies are required to submit periodic reports with eXtensive Business Reporting Language (XBRL) word-level tags. Manually tagging the reports is tedious and costly. We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags. Unlike typical entity extraction datasets, FiNER-139 uses a much larger label set of 139 entity types. Most annotated tokens are numeric, with the correct tag per token depending mostly on context, rather than the token itself. We show...

10.18653/v1/2022.acl-long.303 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01
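XBRL tagging as described above can be framed as token classification. A hedged sketch of that framing with Hugging Face Transformers follows; the base model, the assumed B-/I-/O label scheme, and the example sentence are illustrative, and the classification head below is untrained.

```python
# Sketch: XBRL tagging framed as token classification over financial sentences.
# Label count assumes a B-/I- scheme over 139 entity types plus "O" (an assumption).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-uncased",
                                                        num_labels=139 * 2 + 1)

sentence = "Revenue increased to $ 12.4 million in fiscal 2021 ."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1)          # per-token label ids (random until fine-tuned)
```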

Lately, propelled by phenomenal advances around the transformer architecture, the legal NLP field has enjoyed spectacular growth. To measure progress, well-curated and challenging benchmarks are crucial. Previous efforts have produced numerous benchmarks for general-purpose NLP models, typically based on news or Wikipedia. However, these may not fit specific domains such as law, with its unique lexicons and intricate sentence structures. Even though there is a rising need to build NLP systems for languages other than English,...

10.18653/v1/2023.findings-emnlp.200 article EN cc-by 2023-01-01

Following the hype around OpenAI's ChatGPT conversational agent, the last straw in the recent development of Large Language Models (LLMs) that demonstrate emergent unprecedented zero-shot capabilities, we audit the latest GPT-3.5 model, 'gpt-3.5-turbo', the first ChatGPT model available via API, on the LexGLUE benchmark in a zero-shot fashion, providing examples in a templated instruction-following format. The results indicate that ChatGPT achieves an average micro-F1 score of 49.0% across LexGLUE tasks, surpassing the baseline guessing rates. Notably, the model performs exceptionally...

10.2139/ssrn.4385460 article EN SSRN Electronic Journal 2023-01-01
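A rough sketch of a templated, instruction-following zero-shot query of the kind described above, using the OpenAI Python client (v1+). The prompt wording, the toy label set, and the case facts are illustrative placeholders, not the templates actually used in the audit.

```python
# Sketch: zero-shot legal classification via a templated instruction prompt.
from openai import OpenAI   # assumes openai>=1.0 and OPENAI_API_KEY in the environment

client = OpenAI()
labels = ["Article 3", "Article 6", "Article 8", "No violation"]   # toy label set
facts = "The applicant complained about the length of civil proceedings ..."

prompt = (
    "You are given the facts of a case from the European Court of Human Rights.\n"
    f"Facts: {facts}\n"
    f"Which of the following labels apply? {', '.join(labels)}\n"
    "Answer with the applicable labels only."
)
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```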

We consider the task of detecting contractual obligations and prohibitions. We show that a self-attention mechanism improves the performance of a BILSTM classifier, the previous state of the art for this task, by allowing it to focus on indicative tokens. We also introduce a hierarchical BILSTM, which converts each sentence to an embedding, and then processes the sentence embeddings to classify each sentence. Apart from being faster to train, the hierarchical model outperforms the flat one, even when the latter considers surrounding sentences, because the hierarchical model has a broader discourse view.

10.18653/v1/p18-2041 article EN cc-by 2018-01-01
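A small PyTorch sketch of the self-attentive BILSTM sentence classifier idea: attention weights over the BiLSTM states pool the sentence into a single vector, letting the classifier focus on indicative tokens. The class set and dimensions are illustrative assumptions, and the hierarchical variant (a second BILSTM over sentence embeddings) is not shown.

```python
import torch
import torch.nn as nn

class AttentiveBiLSTM(nn.Module):
    """Flat BiLSTM sentence classifier with self-attention pooling."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.out = nn.Linear(2 * hidden_dim, num_classes)  # e.g. obligation / prohibition / none

    def forward(self, token_ids):                      # (batch, seq_len)
        h, _ = self.bilstm(self.embed(token_ids))      # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # (batch, seq_len, 1)
        sentence = (weights * h).sum(dim=1)            # attention-weighted sentence vector
        return self.out(sentence)                      # class logits
```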

We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, the European Union's public document database, annotated with concepts from EUROVOC, a multidisciplinary thesaurus. The dataset is substantially larger than previous EURLEX datasets and is suitable for XMTC, few-shot and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with self-attention outperform the current multi-label state-of-the-art...

10.18653/v1/w19-2209 article EN 2019-01-01

This study examines the predictive power of textual information from S-1 filings in explaining initial public offering (IPO) underpricing. The authors' approach differs from previous research because they utilize several machine learning algorithms to predict whether an IPO will be underpriced or not, as well as the magnitude of underpricing. Using a sample of 2,481 US IPOs, they find that textual information can effectively complement financial variables in terms of prediction accuracy, as models that use both sources of data produce more accurate estimates. In...

10.3905/jfds.2023.1.121 article EN The Journal of Financial Data Science 2023-03-14
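To illustrate the general idea of combining textual and financial inputs in one predictive model (a sketch under assumed, made-up feature names and toy data, not the study's actual variables or algorithms), TF-IDF features from filing text can be concatenated with scaled numeric variables inside a single scikit-learn pipeline:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data: two hypothetical IPOs with filing text, numeric variables, and a label.
data = pd.DataFrame({
    "risk_factors_text": ["intense competition and operating losses ...",
                          "stable recurring revenue and profitability ..."],
    "offer_price": [14.0, 21.0],
    "shares_offered_m": [10.5, 4.2],
    "underpriced": [1, 0],        # label: positive first-day return
})

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=5000), "risk_factors_text"),
    ("num", StandardScaler(), ["offer_price", "shares_offered_m"]),
])
model = Pipeline([("features", preprocess), ("clf", GradientBoostingClassifier())])
model.fit(data.drop(columns="underpriced"), data["underpriced"])
```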

In this work, we conduct a detailed analysis on the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities, which we define as upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora used as important dimensions in our study. To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark...

10.18653/v1/2023.acl-long.865 article EN cc-by 2023-01-01

Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Schwemer, Anders Søgaard. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.301 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Transformer-based language models, such as BERT and its variants, have achieved state-of-the-art performance in several downstream natural language processing (NLP) tasks on generic benchmark datasets (e.g., GLUE, SQUAD, RACE). However, these models have mostly been applied to the resource-rich English language. In this paper, we present GREEK-BERT, a monolingual BERT-based language model for modern Greek. We evaluate its performance on three NLP tasks, i.e., part-of-speech tagging, named entity recognition, and natural language inference, obtaining...

10.1145/3411408.3411440 preprint EN 2020-09-01