- Natural Language Processing Techniques
- Topic Modeling
- Semigroups and automata theory
- Speech and dialogue systems
- Syntax, Semantics, Linguistic Variation
- Algorithms and Data Compression
- Semantic Web and Ontologies
- Text Readability and Simplification
- Logic, programming, and type systems
- Speech Recognition and Synthesis
- Linguistics and terminology studies
- Linguistic research and analysis
- Multimodal Machine Learning Applications
- Logic, Reasoning, and Knowledge
- Multi-Agent Systems and Negotiation
- Language, Metaphor, and Cognition
- Advanced Text Analysis Techniques
- Model-Driven Software Engineering Techniques
- DNA and Biological Computing
- Constraint Satisfaction and Optimization
- AI-based Problem Solving and Planning
- Sentiment Analysis and Opinion Mining
- Categorization, perception, and language
- Authorship Attribution and Profiling
- Gender Studies in Language
Heinrich Heine University Düsseldorf
2014-2023
University of Pavia
2023
Hochschule Düsseldorf University of Applied Sciences
2016-2021
Deutsche Nationalbibliothek
2021
Association for Computational Linguistics
2021
Chitose Institute of Science and Technology
2020
McGill University
2018
University of Tübingen
2000-2010
Société Française d'Allergologie
2008
Langues, Textes, Traitements Informatiques, Cognition
2003-2005
This paper describes the HHU-UH-G system submitted to the EMNLP 2016 Second Workshop on Computational Approaches to Code Switching. Our system ranked first place for Arabic (MSA-Egyptian) with an F1-score of 0.83 and second place for Spanish-English with an F1-score of 0.90. The system introduces a novel unified neural network architecture for language identification in code-switched tweets for both Spanish-English and the MSA-Egyptian dialect. The system makes use of word and character level representations to identify code-switching. For the MSA-Egyptian dialect, the system does not rely on any kind of language-specific...
The grammar framework presented in this paper combines Lexicalized Tree Adjoining Grammar (LTAG) with a (de)compositional frame semantics. We introduce elementary constructions as pairs of elementary LTAG trees and decompositional frames. The linking between syntax and semantics can largely be captured by such constructions since, in LTAG, elementary trees represent full argument projections. Substitution and adjunction in the syntax then trigger the unification of the associated semantic frames, which are formally defined as base-labelled feature structures. Moreover,...
The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the scarcity of annotated data and resources in general. Segmentation of words into their constituent parts is an important building block. In this paper, we show how a segmenter can be trained using only 350 annotated tweets with neural networks, without any normalization or use of lexical features or resources. We deal with segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival...
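The character-level sequence labeling view of segmentation mentioned in the abstract above can be illustrated with a minimal sketch. The `B`/`I` tag set, the `+` segment separator, the function names, and the transliterated example word are all assumptions made for illustration; the abstract does not state which label scheme or input encoding the authors used.

```python
from typing import List, Tuple


def word_to_char_labels(segmented: str, sep: str = "+") -> Tuple[List[str], List[str]]:
    """Turn a gold-segmented word into parallel character and label lists.

    Assumed (hypothetical) scheme: "B" marks the first character of a
    segment, "I" marks every other character.
    """
    chars: List[str] = []
    labels: List[str] = []
    for segment in segmented.split(sep):
        for i, ch in enumerate(segment):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def labels_to_segments(chars: List[str], labels: List[str], sep: str = "+") -> str:
    """Rebuild the segmented word from characters and predicted labels."""
    out: List[str] = []
    for ch, label in zip(chars, labels):
        # A "B" label anywhere except the word start opens a new segment.
        if label == "B" and out:
            out.append(sep)
        out.append(ch)
    return "".join(out)


# Made-up transliterated example: conjunction + preposition + noun.
chars, labels = word_to_char_labels("w+b+ktAb")
print(list(zip(chars, labels)))
print(labels_to_segments(chars, labels))
```

Under this framing, a sequence model only has to predict one label per character, and the segmentation is recovered deterministically from the predicted labels.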
This paper presents the first efficient implementation of a weighted deductive CYK parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRSs). LCFRS, an extension of CFG, can describe discontinuities in a straightforward way and is therefore a natural candidate to be used for data-driven parsing. To speed up parsing, we use different context-summary estimates of parse items, some of them allowing for A* parsing. We evaluate our parser with grammars extracted from the German NeGra treebank. Our experiments show that...
We propose a semantic construction method for Feature-Based Tree Adjoining Grammar which is based on the derived tree, compare it with related proposals, and briefly discuss some implementation possibilities.
This paper presents Unsupervised Lexical Frame Induction, Task 2 of the International Workshop on Semantic Evaluation in 2019. Given a set of prespecified syntactic forms in context, the task requires that verbs and their arguments be clustered to resemble semantic frame structures. Results are useful in identifying polysemous words, i.e., those whose frame structures are not easily distinguished, as well as in discerning semantic relations of the arguments. The unsupervised frame induction methods fell into two tracks: A) Verb Clustering based...
Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer. Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). 2017.
Large Language Models (LLMs), with their advanced architectures and training on massive language datasets, contain unexplored knowledge. One method to infer this knowledge is through the use of cloze-style prompts. Typically, these prompts are manually designed because their phrasing impacts retrieval performance, even if the LLM encodes the desired information. In this paper, we study the impact of prompt syntax on the knowledge retrieval capacity of LLMs. We use a template-based approach to paraphrase simple prompts into prompts with a more complex grammatical structure....
This article addresses the problem that the expressive power of tree-adjoining grammars (TAGs) is too limited to deal with certain syntactic phenomena, in particular, with scrambling in free-word-order languages. The TAG variants proposed so far in order to account for scrambling are not entirely satisfying. Therefore, the article introduces an alternative extension of TAG that is based on the notion of node sharing, so-called (restricted) tree-local multicomponent TAG with shared nodes (RSN-MCTAG). The analysis of some German scrambling data is sketched in order to show that this TAG extension can deal with scrambling....
We present SAWT, a web-based tool for the annotation of token sequences with an arbitrary set of labels. The key property of the tool is its simplicity and ease of use for both annotators and administrators. SAWT runs in any modern browser, including browsers on mobile devices, and only has minimal server-side requirements.
We introduce TS-ANNO, an open-source web application for manual creation and for evaluation of parallel corpora for text simplification. TS-ANNO can be used for i) sentence-wise alignment, ii) rating alignment pairs (e.g., w.r.t. grammaticality, meaning preservation, ...), iii) annotating alignment pairs w.r.t. simplification transformations (e.g., lexical substitution, sentence splitting, ...), and iv) manual simplification of complex documents. For evaluation, TS-ANNO calculates inter-annotator agreement of alignments (i) and annotations (ii).
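As a rough illustration of what an inter-annotator agreement computation can look like, the sketch below implements Cohen's kappa for two annotators. This choice of metric is an assumption: the abstract above does not say which agreement measure TS-ANNO computes, and the function name `cohen_kappa` and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items.

    NOTE: an illustrative stand-in, not necessarily the measure that
    TS-ANNO itself uses.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's own label distribution.
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: 1 = "sentence pair is aligned", 0 = "not aligned".
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohen_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (for example "not aligned") dominates the data.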
In this paper, we investigate to which extent contextual neural language models (LMs) implicitly learn syntactic structure. More concretely, we focus on constituent structure as represented in the Penn Treebank (PTB). Using standard probing techniques based on diagnostic classifiers, we assess the accuracy of representing constituents of different categories within the neuron activations of an LM such as RoBERTa. In order to make sure that our probe focuses on syntactic knowledge and not on implicit semantic generalizations, we also...
In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for...