Tharindu Ranasinghe

ORCID: 0000-0003-3207-3821
Research Areas
  • Hate Speech and Cyberbullying Detection
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Swearing, Euphemism, Multilingualism
  • Advanced Text Analysis Techniques
  • Bullying, Victimization, and Aggression
  • Misinformation and Its Impacts
  • Sentiment Analysis and Opinion Mining
  • Biomedical Text Mining and Ontologies
  • Advanced Malware Detection Techniques
  • Multimodal Machine Learning Applications
  • Web Data Mining and Analysis
  • Recommender Systems and Techniques
  • Authorship Attribution and Profiling
  • Text and Document Classification Technologies
  • Semantic Web and Ontologies
  • Spam and Phishing Detection
  • Software Engineering Research
  • Computational and Text Analysis Methods
  • Advanced Bandit Algorithms Research
  • Teaching and Learning Programming
  • Corporate Identity and Reputation
  • Speech Recognition and Synthesis
  • Adversarial Robustness in Machine Learning

Lancaster University
2024-2025

Aston University
2023-2024

George Mason University
2023

University of Luxembourg
2023

Rochester Institute of Technology
2023

University of Wolverhampton
2019-2022

University of Moratuwa
2019-2022

University of Sheffield
2021

Care International Sri Lanka
2019

Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g. hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partially because most annotated datasets available contain English data. In this paper, we take advantage of the available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions...

10.18653/v1/2020.emnlp-main.470 article EN cc-by 2020-01-01

As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc. With a few notable exceptions, most research on this topic so far has dealt with English. This is mostly due to the availability of language resources for English. To address this shortcoming, this paper presents the first Greek annotated dataset for offensive language identification: the Offensive Greek Tweet...

10.48550/arxiv.2003.07459 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partially because most annotated datasets available contain English data. In this article, we take advantage of the available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions...

10.1145/3457610 article EN ACM Transactions on Asian and Low-Resource Language Information Processing 2021-11-10

Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this paper, we propose a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. Our...

10.18653/v1/2020.coling-main.445 article EN cc-by Proceedings of the 28th International Conference on Computational Linguistics 2020-01-01

The HASOC track is dedicated to the evaluation of technology for finding Offensive Language and Hate Speech. HASOC is creating a multilingual data corpus, mainly for English and under-resourced languages (Hindi and Marathi). This paper presents one HASOC subtrack with two tasks. In 2021, we organized a classification task for English, Hindi, and Marathi. The first task consists of two subtasks: Subtask 1A is a binary classification of tweets into offensive and non-offensive, and Subtask 1B is a fine-grained classification that asks participants to classify tweets as Hate, Profane, or Offensive. Task 2 consists of identifying tweets given additional...

10.1145/3503162.3503176 article EN Forum for Information Retrieval Evaluation 2021-12-13

Calculating Semantic Textual Similarity (STS) is an important research area in natural language processing that plays a significant role in many applications such as question answering, document summarisation, information retrieval, and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural network, which are used here to measure STS. Several variants of the architecture are compared with existing methods.

10.26615/978-954-452-056-4_116 article EN 2019-10-22

Tharindu Ranasinghe, Marcos Zampieri. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 2021.

10.18653/v1/2021.naacl-demos.17 article EN cc-by 2021-01-01

In recent years, the spread of online offensive content has become a great concern, motivating researchers to develop robust systems capable of identifying such content automatically. To carry out a fair evaluation of these systems, several international shared tasks have been organized, providing the community with essential benchmark data and evaluation methods for various languages. Organized since 2019, the HASOC (Hate Speech and Offensive Content Identification) shared task is one of these initiatives. In its fourth iteration, HASOC 2022 included...

10.1145/3574318.3574326 article EN 2022-12-09

The widespread presence of offensive language on social media motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results of several machine learning experiments...

10.26615/978-954-452-072-4_050 article EN 2021-01-01

Transformer-based models such as BERT, XLNET, and XLM-R have achieved state-of-the-art performance across various NLP tasks, including the identification of offensive language and hate speech, an important problem in social media. In this paper, we present fBERT, a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances. We evaluate fBERT's performance on identifying offensive content on multiple English datasets, and we test several thresholds for selecting instances from SOLID. The fBERT model will be...

10.18653/v1/2021.findings-emnlp.154 preprint EN cc-by 2021-01-01

The pervasiveness of offensive content in social media has become an important reason for concern for online platforms. With the aim of improving online safety, a large number of studies applying computational models to identify such content have been published in the last few years, with promising results. The majority of these studies, however, deal with high-resource languages such as English due to the availability of datasets in these languages. Recent work has addressed offensive language identification from a low-resource perspective, exploring data augmentation...

10.3390/info12080306 article EN cc-by Information 2021-07-29

Lexical Simplification (LS) is the task of replacing complex words within a sentence with simpler alternatives while maintaining the sentence's original meaning. LS is the lexical component of Text Simplification (TS) systems, with the aim of improving accessibility to various target populations such as individuals with low literacy or reading disabilities. Prior surveys were published several years before the introduction of transformers, transformer-based large language models (LLMs), and prompt learning that drastically...

10.1007/s10844-024-00882-9 article EN cc-by Journal of Intelligent Information Systems 2024-09-02

Machine translation (MT) is widely used to translate content on social media platforms, aiming to improve accessibility. A great part of the content circulated on social media is user-generated and often contains non-standard spelling, hashtags, and emojis that pose challenges to MT systems. This leads to many mistranslated instances being presented to users of these platforms, hindering their understanding of content written in other languages. In this paper, we investigate the impact of MT on offensive language identification. We also investigate the potential...

10.1007/s13278-024-01398-4 article EN cc-by Social Network Analysis and Mining 2025-01-09

This paper describes a novel research approach to detect the type and target of offensive posts in social media using a capsule network. The input to the network was character embeddings combined with emoji embeddings. The approach was evaluated on all subtasks of SemEval-2019 Task 6: OffensEval: Identifying and Categorizing Offensive Language in Social Media. The evaluation also showed that even though capsule networks have not been commonly used in NLP tasks, they can outperform existing state-of-the-art solutions for offensive language detection in social media.

10.26615/978-954-452-056-4_056 article EN 2019-10-22

This paper presents the team TransQuest's participation in the Sentence-Level Direct Assessment shared task in WMT 2020. We introduce a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. The proposed methods achieve state-of-the-art results, surpassing the results obtained by OpenKiwi, the baseline used in the shared task. We further fine-tune the QE framework by performing ensembling and data augmentation. Our approach is the winning solution in all of the language pairs according to the WMT 2020...

10.48550/arxiv.2010.05318 preprint EN cc-by arXiv (Cornell University) 2020-01-01

In this paper, we describe the team BRUMS entry to OffensEval 2: Multilingual Offensive Language Identification in Social Media in SemEval-2020. The organizers provided participants with annotated datasets containing posts from social media in Arabic, Danish, English, Greek, and Turkish. We present a multilingual deep learning model to identify offensive language in social media. Overall, the approach achieves acceptable evaluation scores while maintaining flexibility between languages.

10.18653/v1/2020.semeval-1.251 article EN cc-by 2020-01-01

Tharindu Weerasooriya, Sujan Dutta, Tharindu Ranasinghe, Marcos Zampieri, Christopher Homan, Ashiqur KhudaBukhsh. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

10.18653/v1/2023.emnlp-main.713 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Identifying informative tweets is an important step when building information extraction systems based on social media. WNUT-2020 Task 2 was organised to distinguish informative tweets from noisy tweets. In this paper, we present our approach to tackling the task objective using transformers. Overall, our approach achieves 10th place in the final rankings, scoring an F1 score of 0.9004 on the test set.

10.18653/v1/2020.wnut-1.49 article EN cc-by 2020-01-01

This paper presents the team BRUMS submission to SemEval-2020 Task 3: Graded Word Similarity in Context. The system utilises state-of-the-art contextualised word embeddings with some task-specific adaptations, including stacked embeddings and average embeddings. Overall, the approach achieves good evaluation scores across all languages while maintaining simplicity. In the final rankings, our approach is ranked within the top 5 solutions for each language and holds 1st position in the Finnish subtask 2.

10.18653/v1/2020.semeval-1.16 article EN cc-by 2020-01-01

The OffensEval shared tasks organized as part of SemEval-2019–2020 were very popular, attracting over 1300 participating teams. The two editions of the shared task helped advance the state of the art in offensive language identification by providing the community with benchmark datasets in Arabic, Danish, English, Greek, and Turkish. The datasets were annotated using the OLID hierarchical taxonomy, which has since become the de facto standard in general offensive language identification research and was widely used beyond OffensEval. We present a survey of OffensEval and related competitions,...

10.1017/s1351324923000517 article EN cc-by Natural Language Engineering 2023-11-01

This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at the VarDial 2021 workshop. The DLI training set includes 16,674 YouTube comments written in Roman script containing code-mixed text with English and one of three South Dravidian languages: Kannada, Malayalam, and Tamil. We submitted results generated using two models, a Naive Bayes classifier with adaptive language models, which has been shown to obtain competitive performance in many language and dialect identification tasks,...

10.48550/arxiv.2103.05552 preprint EN cc-by arXiv (Cornell University) 2021-01-01