- Topic Modeling
- Natural Language Processing Techniques
- Authorship Attribution and Profiling
- Spam and Phishing Detection
- Advanced Text Analysis Techniques
- Business Process Modeling and Analysis
- Academic Integrity and Plagiarism
- Semantic Web and Ontologies
- Text Readability and Simplification
- Service-Oriented Architecture and Web Services
- Text and Document Classification Technologies
- Sentiment Analysis and Opinion Mining
- Hate Speech and Cyberbullying Detection
- Handwritten Text Recognition Techniques
- Speech and Dialogue Systems
- Web Data Mining and Analysis
- Digital Communication and Language
- Names, Identity, and Discrimination Research
- Data Quality and Management
- Artificial Intelligence in Healthcare and Education
- Computational and Text Analysis Methods
- Scientometrics and Bibliometrics Research
- Misinformation and Its Impacts
- Innovative Teaching and Learning Methods
- Video Analysis and Summarization
COMSATS University Islamabad
2016-2025
University of Sheffield
2012
The multi-label emotion classification task aims to identify all possible emotions in a written text that best represent the author's mental state. In recent years, it has attracted the attention of researchers due to its potential applications in e-learning, health care, marketing, etc. There is a need for standard benchmark corpora to develop and evaluate methods. The majority of corpora were developed for the English language (monolingual corpora) using tweets. However, the problem has not been explored for code-mixed text, for example, Roman Urdu,...
Most existing studies focus on popular languages like English, Spanish, Chinese, Japanese, and others; however, limited attention has been paid to Urdu despite its having more than 60 million native speakers. In this paper, we develop a deep learning model for the sentiments expressed in this under-resourced language. We develop an open-source corpus of 10,008 reviews from 566 online threads on the topics of sports, food, software, politics, and entertainment. The objectives of this work are two-fold: (a) creation...
Semantic word similarity quantitatively measures how contextually similar two terms are, which is a considerable challenge for computational linguistics. The research community has examined a range of approaches to address this issue. However, most of these approaches have examined a comparatively limited set of languages, especially English. Research on semantic word similarity for South Asian languages, particularly Urdu, is immature. In recent years, transformer-based approaches have proved extremely successful in language processing tasks. The primary aim...
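Similarity scores of this kind are commonly computed as the cosine of the angle between two word vectors. The following is a minimal sketch, assuming each term already has a vector; the toy three-dimensional vectors are made up for illustration and are not taken from any model or from the study above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; a real system would take these from a trained
# (e.g. transformer-based) model.
vectors = {
    "car": [0.9, 0.1, 0.3],
    "automobile": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.0],
}

print(round(cosine_similarity(vectors["car"], vectors["automobile"]), 3))
print(round(cosine_similarity(vectors["car"], vectors["banana"]), 3))
```

A score near 1 indicates nearly identical contextual usage and a score near 0 indicates unrelated usage; the quality of the score depends entirely on how the vectors were learned.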
Due to vast digital data collections and paraphrasing tools, researchers have shown growing interest in Cross-lingual Paraphrase Detection (CLPD). Open-access paraphrasing tools make paraphrasing easier and its detection more challenging. Translation tools further exacerbate the issue by enabling effortless text translation across languages, leading to increased cross-lingual paraphrasing. Most existing CLPD studies focus on European languages, particularly English, while the English-Urdu language pair remains underexplored due to limited standard...
Paraphrasing involves rewording a text to maintain its meaning while using different language. In recent years, there has been growing interest among researchers in automatic paraphrase generation (APG). Previous studies have primarily focused on developing corpora and methods for APG tasks in English and other languages. However, there is a lack of comprehensive, standardized benchmark corpora specifically designed for Urdu. To address this gap, this study introduces two extensive corpora: the Urdu Phrasal Paraphrase...
Text reuse is the act of borrowing text from existing documents to create new texts. Freely available and easily accessible large online repositories are not only making text reuse more common in society but also making it harder to detect. A major hindrance to the development and evaluation of existing/new mono-lingual text reuse detection methods, especially for South Asian languages, is the unavailability of standardized benchmark corpora. Amongst other things, a gold standard corpus enables researchers to directly compare state-of-the-art methods....
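A classic baseline that benchmark corpora of this kind are used to evaluate is word n-gram overlap between a suspicious text and a source text. The sketch below computes a containment score (the share of the suspicious text's word trigrams that also occur in the source). The trigram length, lower-casing, and whitespace tokenisation are assumptions for illustration, and this is not presented as the method of the study above.

```python
def word_ngrams(text, n=3):
    """Set of word n-grams from lower-cased, whitespace-tokenised text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also appear in the source."""
    sus = word_ngrams(suspicious, n)
    if not sus:
        # Texts shorter than n words have no n-grams to compare.
        return 0.0
    return len(sus & word_ngrams(source, n)) / len(sus)


source = "the cat sat on the mat and looked out of the window"
reused = "the cat sat on the mat and then slept"
fresh = "a dog ran across the busy street at noon"

print(containment(reused, source))  # 5 of 7 trigrams are shared
print(containment(fresh, source))   # no trigrams are shared
```

On a gold standard corpus, scores like these would be thresholded and compared against the human labels, which is what allows different methods to be ranked on the same data.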
Gamification has gained much popularity in recent years in the field of education. The use of gaming elements, such as points, badges, and leaderboards, is suggested to increase motivation, which, in turn, leads to improved performance. However, studies suggest that it is not a “one-size-fits-all” approach and needs to be tailored to the environment and the learner's psychology to achieve effective results. This study evaluates gamification in a cultural context where most students come from rote learning...
This study describes a Natural Language Processing (NLP) toolkit, as the first contribution of a larger project, for an under-resourced language: Urdu. In previous studies, standard NLP toolkits have been developed for English and many other languages. There is also a dire need for text processing tools and methods for Urdu, despite it being widely spoken in different parts of the world with a large amount of digital text readily available. This study presents the first version of UNLT (Urdu Natural Language Toolkit), which contains three key tools required for Urdu...
Text reuse occurs when one borrows text (either verbatim or paraphrased) from an earlier written text. A large and increasing amount of digital text is easily and readily available, making text reuse simpler to commit but more difficult to detect. As a result, automatic text reuse detection has attracted the attention of the research community due to the wide variety of applications associated with it. To develop and evaluate methods for text reuse detection, standard evaluation resources are required. In this paper, we propose such a resource, significantly...
The automatic identification of an author's demographic traits, such as gender, age, native language, geographical location, and personality type, from his/her written text is termed author profiling (AP). It currently engages the research community because of its promising applications in security, marketing, forensics, and the detection of bogus accounts on public networks. A variety of benchmark corpora (English text) released by the PAN shared task are used to perform our experiments. This study presents a...
Word Sense Disambiguation (WSD) aims to automatically predict the correct sense of a word used in a given context. All human languages exhibit ambiguity, and resolving this ambiguity can be difficult. Standard benchmark resources are required to develop, compare, and evaluate WSD techniques. These are available for many languages but not for Urdu, despite it being a language with more than 300 million speakers and large volumes of digitally available text. To fill this gap, this study proposes a novel corpus for the Urdu All-Words WSD task. The...
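To illustrate the task itself (not the corpus or methods proposed in the study), the simplified Lesk algorithm, a classic knowledge-based WSD baseline, picks the sense whose dictionary gloss shares the most content words with the surrounding context. The sense inventory, glosses, and stopword list below are hypothetical English examples chosen for this sketch.

```python
# Hypothetical, deliberately tiny stopword list for this sketch.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "at", "from", "that"}


def content_words(text):
    """Lower-cased whitespace tokens with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(context, sense_glosses):
    """Return the sense whose gloss overlaps most with the context.

    context: the sentence containing the ambiguous word.
    sense_glosses: mapping from sense label to a short definition (gloss).
    Ties are resolved in favour of the sense listed first.
    """
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical two-sense inventory for the English word "bank".
senses = {
    "bank/finance": "an institution that accepts deposits of money and lends money",
    "bank/river": "the sloping land beside a river or stream",
}

print(simplified_lesk("she deposited money at the bank", senses))
print(simplified_lesk("they fished from the grassy bank of the river", senses))
```

An All-Words benchmark differs from this toy setting in that every content word in a text must be disambiguated against a full sense inventory, which is why a sense-annotated corpus is needed to evaluate such methods.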
In recent years, many benchmark author profiling corpora have been developed for various genres, including Twitter, social media, blogs, hotel reviews, and e-mail. However, no such standard evaluation resource has been developed for Short Messaging Service (SMS), a popular medium of communication that is very useful for author profiling. The primary aim of this study is to develop a large multilingual (English and Roman Urdu) SMS-based author profiling corpus. The proposed corpus contains 810 profiles, wherein each profile consists of an...
Cross-lingual plagiarism occurs when the source (or original) text(s) is in one language and the plagiarized text is in another language. In recent years, cross-lingual plagiarism detection has attracted the attention of the research community because a large amount of digital text is easily accessible in many languages through online repositories, and machine translation systems are readily available, making it easier to commit cross-lingual plagiarism and harder to detect it. To develop and evaluate cross-lingual plagiarism detection systems, standard evaluation resources are needed. The majority of earlier...
The identification of duplicated and plagiarized passages of text has become an increasingly active area of research. In this paper, we investigate methods for plagiarism detection that aim to identify potential sources from MEDLINE, particularly when the original text has been modified through the replacement of words or phrases. A scalable approach based on Information Retrieval is used to perform candidate document selection (the identification of a subset of potential source documents given a suspicious text) from MEDLINE. Query expansion...
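Candidate document selection can be illustrated with a small TF-IDF retrieval sketch: the suspicious text is treated as a query, and the top-k source documents by cosine similarity are kept for detailed comparison. The toy "abstracts", the function names, and the TF-IDF weighting (with 1 added to the IDF) are assumptions for illustration; the paper's actual retrieval setup and its query expansion are not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-cased whitespace tokenisation (a simplifying assumption)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return a TF-IDF weight dict per document plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # +1 keeps terms that occur in every document from getting zero weight.
    idf = {t: math.log(n_docs / df_t) + 1.0 for t, df_t in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_candidates(query, docs, k=2):
    """Rank source documents against the query and keep the top-k indices."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection carry no weight.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vecs)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [i for _, i in scores[:k]]


sources = [
    "insulin resistance in patients with type 2 diabetes",
    "effects of aspirin on cardiovascular events",
    "gene expression profiling of breast cancer tumours",
]
suspicious = "insulin resistance observed in type 2 diabetes patients"
print(select_candidates(suspicious, sources, k=1))
```

Only the selected candidates would then be passed to a more expensive passage-level comparison, which is what makes this kind of pipeline scale to a collection the size of MEDLINE.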