- Topic Modeling
- Natural Language Processing Techniques
- Speech and dialogue systems
- Geographic Information Systems Studies
- Multimodal Machine Learning Applications
- AI in Service Interactions
- Semantic Web and Ontologies
- Advanced Text Analysis Techniques
- Text Readability and Simplification
- Speech Recognition and Synthesis
- Neural Networks and Applications
- Recommender Systems and Techniques
- Linguistic Variation and Morphology
- Software Engineering Research
- Web Data Mining and Analysis
- Translation Studies and Practices
- Data Management and Algorithms
- Biomedical Text Mining and Ontologies
- Geological Modeling and Analysis
- Cognitive Science and Education Research
- Geochemistry and Geologic Mapping
- Software Testing and Debugging Techniques
- Fractal and DNA sequence analysis
- Sentiment Analysis and Opinion Mining
- Data-Driven Disease Surveillance
Huawei Technologies (United Kingdom)
2020-2023
Huawei Technologies (China)
2021
University of Cambridge
2017-2019
Center for Applied Linguistics
2017-2018
Geographical data can be obtained by converting place names from free-format text into geographical coordinates. The ability to geo-locate events in textual reports represents a valuable source of information for many real-world applications such as emergency response, real-time social media event analysis, understanding location instructions in auto-response systems and more. However, geoparsing is still widely regarded as a challenge because of domain language diversity, place name ambiguity, metonymic...
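The first step described above, resolving place names in text to coordinates, can be sketched as a simple gazetteer lookup. This is a minimal illustration only: the gazetteer, its entries and the matching rule are hypothetical, and real geoparsers (e.g. those built on GeoNames) must handle the ambiguity and metonymy the abstract mentions.

```python
import re

# Hypothetical toy gazetteer; real systems use resources such as GeoNames
# and must disambiguate between many places sharing one name.
GAZETTEER = {
    "london": (51.5074, -0.1278),
    "melbourne": (-37.8136, 144.9631),
    "paris": (48.8566, 2.3522),
}

def geoparse(text):
    """Return (place_name, (lat, lon)) pairs for gazetteer matches in text."""
    found = []
    for token in re.findall(r"[A-Z][a-z]+", text):
        coords = GAZETTEER.get(token.lower())
        if coords:
            found.append((token, coords))
    return found

print(geoparse("Flooding reported near Melbourne and Paris today."))
# → [('Melbourne', (-37.8136, 144.9631)), ('Paris', (48.8566, 2.3522))]
```

Note that this naive matcher would happily geocode a metonymic mention (e.g. "Paris" referring to a government), which is exactly the failure mode the abstract highlights.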
Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of real-world usage, by the lack of distinction between different types of toponyms, which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript...
The purpose of text geolocation is to associate geographic information contained in a document with a set (or sets) of coordinates, either implicitly by using linguistic features and/or explicitly by using metadata combined with heuristics. We introduce a geocoder (location mention disambiguator) that achieves state-of-the-art (SOTA) results on three diverse datasets by exploiting the implicit lexical clues. Moreover, we propose a new method for systematic encoding of GPS coordinates to generate two distinct views of the same text. To this end, Map...
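One common way to encode GPS coordinates systematically is to discretise them into grid cells, which is loosely the spirit of the map-based view mentioned above. The function below is a hedged sketch under assumed parameters (cell size, indexing scheme), not the paper's actual encoding.

```python
def grid_cell(lat, lon, cell_deg=2.0):
    """Map a (lat, lon) pair to a coarse grid-cell index.

    Assumed scheme: shift latitude/longitude to non-negative ranges,
    then flatten (row, col) into a single integer index.
    """
    row = int((lat + 90) // cell_deg)    # 0 .. 89 for 2-degree cells
    col = int((lon + 180) // cell_deg)   # 0 .. 179 for 2-degree cells
    n_cols = int(360 // cell_deg)
    return row * n_cols + col

print(grid_cell(51.5074, -0.1278))  # London → 12689
print(grid_cell(0.0, 0.0))          # equator/meridian → 8190
```

A one-hot or count vector over such cells gives a fixed-size geographic "view" of a document that can be combined with its lexical view.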
We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e. the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation and train on loosely curated pairs of natural language program definitions and code functions....
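The masked-modelling idea in the second stage can be illustrated with a token-masking sketch. The masking rate, mask token and scheme below are assumptions for illustration, not PanGu-Coder's actual preprocessing.

```python
import random

def mask_tokens(tokens, rate=0.15, mask="<mask>", seed=0):
    """Randomly replace a fraction of tokens with a mask symbol;
    the model is then trained to reconstruct the originals."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [mask if rng.random() < rate else t for t in tokens]

code = "def add ( a , b ) : return a + b".split()
print(mask_tokens(code, rate=0.3))
```

During training, the loss is computed only at the masked positions, complementing the left-to-right CLM objective.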
Task-oriented dialogue systems typically rely on large amounts of high-quality training data or require complex handcrafted rules. However, existing datasets are often limited in size considering the complexity of the dialogues. Additionally, conventional training signal inference is not suitable for non-deterministic agent behavior, namely, considering multiple actions as valid in identical dialogue states. We propose the Conversation Graph (ConvGraph), a graph-based representation of dialogues that can be exploited...
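The core idea, merging identical dialogue states so that multiple valid agent actions become alternative outgoing edges rather than conflicting labels, can be sketched with a plain adjacency map. The state/action names below are hypothetical; ConvGraph's actual state representation is richer.

```python
from collections import defaultdict

# state -> set of valid agent actions observed from that state
graph = defaultdict(set)

def add_turn(state, action):
    """Merge a dialogue turn into the graph; identical states share a node."""
    graph[state].add(action)

# Two dialogues reach the same state but the agent acted differently:
add_turn(("inform_cuisine",), "request_area")
add_turn(("inform_cuisine",), "offer_restaurant")

print(sorted(graph[("inform_cuisine",)]))  # → ['offer_restaurant', 'request_area']
```

A supervised learner can then treat any action in the edge set as correct, instead of penalising the model for not reproducing one arbitrary trajectory.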
Named entities are frequently used in a metonymic manner. They serve as references to related entities such as people and organisations. Accurate identification and interpretation of metonymy can be directly beneficial to various NLP applications such as Named Entity Recognition and Geographical Parsing. Until now, metonymy resolution (MR) methods mainly relied on parsers, taggers, dictionaries, external word lists and other handcrafted lexical resources. We show how a minimalist neural approach combined with a novel predicate window method...
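A predicate window, in rough terms, keeps only the few context words immediately surrounding the target entity as input features. The sketch below is an assumed simplification (fixed symmetric window) rather than the paper's exact method.

```python
def predicate_window(tokens, entity_idx, size=2):
    """Return up to `size` tokens on each side of the entity, entity excluded."""
    left = tokens[max(0, entity_idx - size):entity_idx]
    right = tokens[entity_idx + 1:entity_idx + 1 + size]
    return left + right

tokens = "Talks between Moscow and Kiev resumed today".split()
print(predicate_window(tokens, 2))  # → ['Talks', 'between', 'and', 'Kiev']
```

Here "Moscow" is used metonymically (the government, not the city), and the surrounding predicate context carries most of the evidence for that reading.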
The introduction of transformer-based cross-lingual language models brought decisive improvements to multilingual NLP tasks. However, the lack of labelled task data has necessitated a variety of methods that aim to close the gap to high-resource languages. Zero-shot methods in particular often use translated task data as a training signal to bridge the performance gap between the source and target language(s). We introduce XeroAlign, a simple method for task-specific alignment of cross-lingual pretrained transformers such as XLM-R. XeroAlign uses...
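The alignment idea can be illustrated as pulling the pooled representation of a source sentence and its translation together with an auxiliary distance loss. The toy vectors and the plain squared-distance loss below are assumptions for illustration; XeroAlign operates on XLM-R sentence embeddings.

```python
def alignment_loss(src_vec, tgt_vec):
    """Mean squared distance between two pooled sentence embeddings."""
    assert len(src_vec) == len(tgt_vec)
    return sum((s - t) ** 2 for s, t in zip(src_vec, tgt_vec)) / len(src_vec)

# Toy embeddings of an English sentence and its translation:
print(alignment_loss([1.0, 0.0, 0.5], [0.5, 0.0, 0.5]))
```

Minimising such a loss alongside the task loss encourages the encoder to place translations near each other, so a classifier trained on the source language transfers to the target language(s).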
Creating high-quality annotated data for task-oriented dialog (ToD) is known to be notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets covering multiple languages. Therefore, the current datasets are still very scarce and suffer from limitations such as translation-based non-native dialogs with translation artefacts, small scale, or lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets,...
Achieving robust language technologies that can perform well across the world’s many languages is a central goal of multilingual NLP. In this work, we take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue (ToD) systems. We first define new quantitative measures of absolute and relative equivalence in system performance, capturing disparities across languages and within individual languages. Through a series of controlled experiments, we demonstrate that performance disparities depend on a number of factors: the nature of the ToD task at...
Task-oriented personal assistants enable people to interact with a host of devices and services using natural language. One of the challenges of making neural dialogue systems available to more users is the lack of training data for all but a few languages. Zero-shot methods try to solve this issue by acquiring task knowledge in a high-resource language such as English with the aim of transferring it to the low-resource language(s). To this end, we introduce CrossAligner, the principal method of a variety of effective approaches for zero-shot...
Language models (LMs) used as conversational assistants recently became popular tools that help people accomplish a variety of tasks. These typically result from adapting LMs pretrained on general domain text sequences through further instruction-tuning and possibly preference optimisation methods. The evaluation of such LMs would ideally be performed using human judgement, however, this is not scalable. On the other hand, automatic evaluation featuring auxiliary LMs as judges and/or knowledge-based tasks is scalable but...
Code Language Models have been trained to generate accurate solutions, typically with no regard for runtime. On the other hand, previous works that explored execution optimisation have observed corresponding drops in functional correctness. To this end, we introduce Code-Optimise, a framework that incorporates both correctness (passed, failed) and runtime (quick, slow) as learning signals via self-generated preference data. Our framework is both lightweight and robust as it dynamically selects solutions to reduce overfitting while...
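Turning correctness and runtime signals into preference data can be sketched as follows. The pairing rules below (any passing solution preferred to any failing one; quickest passing preferred to slowest passing) are an assumed simplification for illustration, not Code-Optimise's exact procedure.

```python
def preference_pairs(candidates):
    """candidates: list of (solution_id, passed, runtime_seconds).

    Returns (preferred, rejected) id pairs usable for preference optimisation.
    """
    passed = sorted([c for c in candidates if c[1]], key=lambda c: c[2])
    failed = [c for c in candidates if not c[1]]
    pairs = []
    # Correctness signal: passing beats failing.
    for p in passed:
        for f in failed:
            pairs.append((p[0], f[0]))
    # Runtime signal: quickest passing beats slowest passing.
    if len(passed) >= 2:
        pairs.append((passed[0][0], passed[-1][0]))
    return pairs

print(preference_pairs([("a", True, 0.4), ("b", False, 0.1), ("c", True, 0.2)]))
# → [('c', 'b'), ('a', 'b'), ('c', 'a')]
```

Because both signals come from executing the model's own samples, no human annotation is needed to construct the preference dataset.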
The growth in the number of parameters of Large Language Models (LLMs) has led to a significant surge in computational requirements, making them challenging and costly to deploy. Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified by the LLM in parallel. Small models that utilise activations from the LLM currently achieve the fastest decoding speeds. However, we identify several limitations of SD models including the lack of on-policyness during training and partial observability. To address these...
Transfer learning has become the dominant paradigm for many natural language processing tasks. In addition to models being pretrained on large datasets, they can be further trained on intermediate (supervised) tasks that are similar to the target task. For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a (labelled) NLI dataset before fine-tuning with each target subtask. In this work, we explore Gradient Boosted Decision Trees (GBDTs) as an alternative to the commonly used...
We undertake the task of comparing lexicon-based sentiment classification of film reviews with machine learning approaches. We look at existing methodologies and attempt to emulate and improve on them using a 'given' lexicon and a bag-of-words approach. We also utilise syntactical information such as part-of-speech and dependency relations. We will show that a simple approach achieves good results, however machine learning techniques prove to be the superior tool. We also show that more features do not necessarily deliver better performance, as well as elaborate on three further...
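A lexicon / bag-of-words classifier of the kind compared above fits in a few lines. The tiny lexicon here is invented for illustration; the paper's 'given' lexicon is not reproduced.

```python
# Hypothetical miniature sentiment lexicon (word -> polarity score).
LEXICON = {"great": 1, "superb": 1, "dull": -1, "awful": -1}

def classify(review):
    """Sum lexicon scores over the bag of words; sign decides the label."""
    score = sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in review.split())
    return "positive" if score > 0 else "negative"

print(classify("A great cast, superb pacing, never dull."))  # → positive
```

The appeal of this baseline is that it needs no training data at all, which is precisely why supervised machine-learning classifiers, with access to labelled reviews, tend to beat it.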