Mustafa Jarrar

ORCID: 0000-0003-4351-4207
Research Areas
  • Semantic Web and Ontologies
  • Natural Language Processing Techniques
  • Topic Modeling
  • Advanced Database Systems and Queries
  • Service-Oriented Architecture and Web Services
  • Text Readability and Simplification
  • Business Process Modeling and Analysis
  • Language, Linguistics, Cultural Analysis
  • Data Quality and Management
  • Text and Document Classification Technologies
  • Authorship Attribution and Profiling
  • Lexicography and Language Studies
  • Biomedical Text Mining and Ontologies
  • Advanced Text Analysis Techniques
  • Speech and Dialogue Systems
  • Big Data and Business Intelligence
  • Logic, Reasoning, and Knowledge
  • Data Management and Algorithms
  • AI-based Problem Solving and Planning
  • Multi-Agent Systems and Negotiation
  • Dispute Resolution and Class Actions
  • Educational Systems and Policies
  • Mathematics, Computing, and Information Processing
  • Asian Culture and Media Studies
  • Hate Speech and Cyberbullying Detection

Vrije Universiteit Brussel
2001-2024

Birzeit University
2013-2024

American University of Beirut
2023

Syrian Virtual University
2023

Doha Institute for Graduate Studies
2023

Lebanese University
2021

University of Milano-Bicocca
2016

University of Cyprus
2004-2009

Victoria University of Bangladesh
2004-2008

Tecnológico de Monterrey
2008

Ontologies in current computer science parlance are computer-based resources that represent agreed domain semantics. Unlike data models, the fundamental asset of ontologies is their relative independence of particular applications, i.e. an ontology consists of relatively generic knowledge that can be reused by different kinds of applications/tasks. The first part of this paper concerns some aspects that help to understand the differences and similarities between ontologies and data models. In the second part we present an ontology engineering framework that supports...

10.1145/637411.637413 article EN ACM SIGMOD Record 2002-12-01

A panoramic survey of natural language processing in the Arab world. Authors: Kareem Darwish (Hamad Bin Khalifa University, Doha, Qatar), Nizar Habash (New York University Abu Dhabi, United Arab Emirates), Mourad Abbas (CRSTDLA, Bouzareah, Algeria), Hend Al-Khalifa (King Saud University, Riyadh, Saudi Arabia), Huseein T. Al-Natsheh (Mawdoo3, Jordan), Houda Bouamor (Carnegie Mellon University), Karim Bouzoubaa (Mohammed V University, Rabat, Morocco)...

10.1145/3447735 article EN Communications of the ACM 2021-03-22

This paper presents preliminary results in building an annotated corpus of the Palestinian Arabic dialect. The corpus consists of about 43K words, stemming from diverse resources. The paper discusses some linguistic facts about the Palestinian dialect, compared with Modern Standard Arabic, especially in terms of morphological, orthographic, and lexical variations, and suggests directions to resolve the challenges these differences pose to the annotation goal. Furthermore, we present two pilot studies that investigate whether existing tools for...

10.3115/v1/w14-3603 article EN cc-by 2014-01-01

We present a formal Arabic wordnet built on the basis of a carefully designed ontology, hereby referred to as the Arabic Ontology. The ontology provides a formal representation of the concepts that the Arabic terms convey, and its content was built with ontological analysis in mind and benchmarked to scientific advances and rigorous knowledge sources as much as this is possible, rather than to only speakers' beliefs as lexicons typically are. A comprehensive evaluation was conducted, thereby demonstrating that the current version of the top-levels of the ontology can top the majority of the Arabic meanings. The ontology consists...

10.3233/ao-200241 article EN Applied Ontology 2020-12-18

This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialect tokens that are manually annotated with 21 entity types including person, organization, location, event and date. More importantly, the corpus is annotated with nested entities instead of the more common flat annotations. The data contains about 75K entities, 22.5% of which are nested. The inter-annotator evaluation of the corpus demonstrated a strong agreement...

10.48550/arxiv.2205.09651 preprint EN cc-by arXiv (Cornell University) 2022-01-01

A Language Model is a term that encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-like fluency and coherence, making them valuable for a wide range of data-related tasks fashioned as pipelines. The capabilities of LLMs in natural language understanding and generation, combined with their scalability, versatility, and state-of-the-art performance, enable innovative applications across...

10.1145/3663741.3664785 article EN 2024-06-01

Accessing or integrating data lexicalized in different languages is a challenge. Multilingual lexical resources play a fundamental role in reducing the language barriers to map concepts lexicalized in different languages. In this paper we present a large-scale study on the effectiveness of automatic translations to support two key cross-lingual ontology mapping tasks: the retrieval of candidate matches and the selection of the correct matches for inclusion in the final alignment. We conduct our experiments using four large gold standards, each one consisting...

10.1613/jair.4789 article EN cc-by Journal of Artificial Intelligence Research 2016-01-25

We present a query formulation language (called MashQL) in order to easily query and fuse structured data on the web. The main novelty of MashQL is that it allows people with limited IT skills to explore and query one (or multiple) data sources without prior knowledge about the schema, structure, vocabulary, or any technical details of these sources. More importantly, to be robust and cover most cases in practice, we do not assume that a data source should have - an offline or inline - schema. This poses several language-design and performance...

10.1109/tkde.2011.41 article EN IEEE Transactions on Knowledge and Data Engineering 2011-02-11

This article is motivated by the importance of building web data mashups. Building on the remarkable success of Web 2.0 mashups, and especially Yahoo Pipes, we generalize the idea of mashups and regard the Internet as a database. Each internet data source is seen as a table, and a mashup is seen as a query on these tables. We assume that web data sources are represented in RDF, and that SPARQL is the query language.

10.1145/1458484.1458499 article EN 2008-10-30

Words in Arabic consist of letters and short vowel symbols called diacritics inscribed atop regular letters. Changing diacritics may change the syntax and semantics of a word, turning it into another. This results in difficulties when comparing words based solely on string matching. Typically, Arabic NLP applications resort to morphological analysis to battle ambiguity originating from this and other challenges. In this article, we introduce three alternative algorithms to compare two words with possibly different diacritics. We propose...

10.1145/3242177 article EN ACM Transactions on Asian and Low-Resource Language Information Processing 2018-12-14

Using pre-trained transformer models such as BERT has proven to be effective in many NLP tasks. This paper presents our work to fine-tune BERT models for Arabic Word Sense Disambiguation (WSD). We treated the WSD task as a sentence-pair binary classification task. First, we constructed a dataset of labeled Arabic context-gloss pairs (~167k pairs) extracted from the Arabic Ontology and the large lexicographic database available at Birzeit University. Each pair was labeled as True or False, and target words in each context were identified and annotated....

10.26615/978-954-452-072-4_005 preprint EN 2021-01-01

Reasoning with ontologies is a challenging task, especially for non-logic experts. When checking whether an ontology contains rules that contradict each other, current description logic reasoners can only provide a list of the unsatisfiable concepts. Figuring out why these concepts are unsatisfiable, which rules cause conflicts, and how to resolve these conflicts is all left to the ontology modeler himself. The problem becomes even more challenging in the case of large or medium size ontologies, because an unsatisfiable concept may cause many of its neighboring concepts to be unsatisfiable....

10.1142/s0218213008004072 article EN International Journal of Artificial Intelligence Tools 2008-08-01