- Natural Language Processing Techniques
- Speech and dialogue systems
- Language, Linguistics, Cultural Analysis
- Topic Modeling
- Spam and Phishing Detection
- Sentiment Analysis and Opinion Mining
- Speech Recognition and Synthesis
- Linguistic Studies and Language Acquisition
- Text and Document Classification Technologies
- Advanced Text Analysis Techniques
- Terrorism, Counterterrorism, and Political Violence
- Phonetics and Phonology Research
- Translation Studies and Practices
- Semantic Web and Ontologies
- Service-Oriented Architecture and Web Services
- Algorithms and Data Compression
- Misinformation and Its Impacts
- Lexicography and Language Studies
- Text Readability and Simplification
- Machine Learning and Algorithms
- Imbalanced Data Classification Techniques
University of Sfax
2015-2025
Le Mans Université
2014-2018
Université Nantes Angers Le Mans
2017
The evolution of information and communication technology has markedly influenced communication between correspondents. This evolution has facilitated the transmission of information and engendered new forms of written communication (email, chat, SMS, comments, etc.). Most of these messages and comments are written in Latin script, also called Arabizi. Moreover, the language used in social media and SMS messaging is characterized by the use of informal and non-standard vocabulary, such as repeated letters for emphasis, typos, abbreviations, and nonlinguistic content like emoticons. Since...
Automatic sentiment analysis has become one of the fastest growing research areas in the Natural Language Processing (NLP) field. Despite its importance, this is the first work towards sentiment analysis at both the aspect and sentence levels for the Tunisian Dialect in the field of supermarkets. Therefore, we experimentally evaluate, in this paper, three deep learning methods, namely convolutional neural networks (CNN), long short-term memory (LSTM), and bi-directional long short-term memory (Bi-LSTM). Both LSTM and Bi-LSTM constitute two major types...
Modern Standard Arabic, as well as Arabic dialect languages, are usually written without diacritics. The absence of these marks constitutes a real problem in the automatic processing of Arabic data by NLP tools. Indeed, writing Arabic without diacritics introduces several types of ambiguity. First, a word without diacritics could have many possible meanings depending on its diacritization. Second, the undiacritized surface forms of an Arabic word might have as many as 200 readings, depending on the complexity of its morphology [12]. In fact, the agglutination property of Arabic might produce forms that can only...
This paper focuses on the problem of hierarchical multi‐label classification of research papers, which is the task of assigning the set of relevant labels for a paper from a hierarchy, using reduced amounts of labelled training data. Specifically, we study leveraging unlabelled data, which are usually plentiful and easy to collect, in addition to the few available labelled ones, in a semi‐supervised learning framework for achieving better performance results. Thus, in this paper, we propose a semi‐supervised approach for hierarchical multi‐label classification of research papers based on the well‐known Co‐training algorithm,...
Detecting radicalization on social networks is crucial to the fight against violent extremism and terrorism. In most cases, online radicalization has clear warning indicators that can be detected at the early stages of the radicalization process. In this paper, we focus on mining radicalization indicators from messages by exploiting structured domain knowledge. More precisely, we propose an approach to automatically annotate messages with concepts from a domain ontology. Annotations are then exploited within an inference phase to identify messages exhibiting a radicalization indicator. We conducted a set of experiments...
Scientific research teams have immense valuable knowledge that needs to be managed. Organizing the scientific contributions of team members constitutes a major challenge for monitoring the team's evolution, discovering members' competences, and facilitating information retrieval processes. However, performing manual annotations is often a time-consuming and labor-intensive task, especially in the case of complex annotation schemas. Currently, existing knowledge management systems focus on ensuring the creation, sharing,...
The absence of diacritical marks in Arabic texts generally leads to morphological, syntactic and semantic ambiguities. This can be more blatant when one deals with under-resourced languages, such as the Tunisian dialect, which suffers from the unavailability of basic tools and linguistic resources, like a sufficient amount of corpora, multilingual dictionaries, and morphological and syntactic analyzers. Thus, the processing of this language faces greater challenges due to the lack of these resources. The automatic diacritization of MSA text is...
In this paper, we describe the process of creating a statistical Language Model (LM) for the Tunisian Dialect. Indeed, this work is part of the realization of an Automatic Speech Recognition (ASR) system for the Tunisian Railway Transport Network. Since our field of work has been limited, there are several words with similar behaviors (semantic, for example) but they do not have the same appearance probability; their class groupings will therefore be possible. For these reasons, we propose to build an n-class LM that is based mainly on the integration of purely...