Sandip Modha

ORCID: 0000-0003-2427-2433
Research Areas
  • Hate Speech and Cyberbullying Detection
  • Topic Modeling
  • Natural Language Processing Techniques
  • Spam and Phishing Detection
  • Internet Traffic Analysis and Secure E-voting
  • Web Data Mining and Analysis
  • Sentiment Analysis and Opinion Mining
  • Advanced Text Analysis Techniques
  • Advanced Malware Detection Techniques
  • Bullying, Victimization, and Aggression
  • Complex Network Analysis Techniques
  • Public Relations and Crisis Communication
  • Misinformation and Its Impacts
  • Recommender Systems and Techniques
  • Semantic Web and Ontologies
  • Text and Document Classification Technologies
  • Library Science and Information Systems
  • Image Retrieval and Classification Techniques
  • Software Engineering Research
  • Academic integrity and plagiarism
  • Personal Information Management and User Behavior
  • Human Mobility and Location-Based Analysis
  • Seismology and Earthquake Studies
  • Video Analysis and Summarization
  • Handwritten Text Recognition Techniques

University of Milano-Bicocca
2024

Integrated Test Range
2021

Dhirubhai Ambani Institute of Information and Communication Technology
2016-2021

Indian Institute of Chemical Technology
2015-2018

The identification of Hate Speech in Social Media is of great importance and receives much attention in the text classification community. There is a huge demand for research on languages other than English. The HASOC track intends to stimulate development for Hindi, German and English. Three datasets were developed from Twitter and Facebook and made available. Binary classification and more fine-grained subclasses were offered in 3 subtasks. For all subtasks, 321 experiments were submitted. The approaches used most often were LSTM networks processing word embedding...

10.1145/3368567.3368584 article EN 2019-12-12

This paper presents the HASOC track and its two parts. HASOC is dedicated to evaluating technology for finding Offensive Language and Hate Speech. HASOC is creating test collections for languages with few resources and for English for comparison. The first track within HASOC has continued work from 2019 and provided a testbed of Twitter posts for Hindi, German and English. The second track within HASOC has created test resources for Tamil and Malayalam in native and Latin script. Posts were extracted mainly from Youtube and Twitter. Both tracks have attracted much interest and over 40 research groups have participated as...

10.1145/3441501.3441517 article EN Forum for Information Retrieval Evaluation 2020-12-16

The HASOC track is dedicated to the evaluation of technology for finding Offensive Language and Hate Speech. HASOC is creating a multilingual data corpus mainly for English and under-resourced languages (Hindi and Marathi). This paper presents one HASOC subtrack with two tasks. In 2021, we organized the classification task for English, Hindi, and Marathi. The first task consists of two classification tasks; Subtask 1A consists of a binary and fine-grained classification into offensive and non-offensive tweets. Subtask 1B asks to classify the tweets into Hate, Profane and offensive. Task 2 consists of identifying tweets given additional...

10.1145/3503162.3503176 article EN Forum for Information Retrieval Evaluation 2021-12-13

In recent years, the spread of online offensive content has become of great concern, motivating researchers to develop robust systems capable of identifying such content automatically. To carry out a fair evaluation of these systems, several international shared tasks have been organized, providing the community with essential benchmark data and evaluation methods for various languages. Organized since 2019, the HASOC (Hate Speech and Offensive Content Identification) shared task is one of these initiatives. In its fourth iteration, HASOC 2022 included...

10.1145/3574318.3574326 article EN 2022-12-09

With the growth of social media, the spread of hate speech is also increasing rapidly. Social media are widely used in many countries. Hate Speech is also spreading in these countries. This brings a need for multilingual Hate Speech detection algorithms. Much research in this area is dedicated to English at the moment. The HASOC track intends to provide a platform to develop and optimize Hate Speech detection algorithms for Hindi, German and English. The dataset is collected from a Twitter archive and pre-classified by a machine learning system. HASOC has two sub-tasks for all three languages: task...

10.48550/arxiv.2108.05927 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Modeling text in a numerical representation is a prime task for any Natural Language Processing downstream task such as text classification. This paper attempts to study the effectiveness of text representation schemes on the text classification task, such as aggressive text detection, a special case of Hate speech from social media. Aggression levels are categorized into three predefined classes, namely: ‘Non-aggressive’ (NAG), ‘Overtly Aggressive’ (OAG), and ‘Covertly Aggressive’ (CAG). Various text representation schemes based on BoW techniques, word embedding, contextual word embedding, sentence...

10.1080/0952813x.2021.1907792 article EN Journal of Experimental & Theoretical Artificial Intelligence 2021-04-24

This paper presents the participation of team DA-LD-Hildesheim of the Information Retrieval and Language Processing lab at DA-IICT, India in the SemEval-19 OffensEval track. The aim of this shared task is to identify offensive content at a fine-grained level of granularity. The task is divided into three sub-tasks. The system is required to check whether social media posts contain any offensive or profane content or not, whether they are targeted or untargeted towards any entity, and to classify targeted posts into the individual, group or other categories. Social media posts suffer from the data sparsity problem. Therefore,...

10.18653/v1/s19-2103 article EN 2019-01-01

This abstract provides a short overview of the first edition of the shared task on Indian Language Summarization (ILSUM) organized at the 14th Forum for Information Retrieval Evaluation (FIRE 2022). A more detailed discussion is available in the track overview paper. The objective of this shared task was to create benchmark data for text summarization in Indian languages. This edition included three languages: Hindi, Gujarati, and Indian English, which is an officially recognized dialect mainly used in the Indian subcontinent. The shared task saw an enthusiastic response, with registrations from over...

10.1145/3574318.3574328 article EN 2022-12-09

The recent success in language generation capabilities of large language models (LLMs), such as GPT, Bard, Llama etc., can potentially lead to concerns about their possible misuse in inducing mass agitation and communal hatred via generating fake news and spreading misinformation. Traditional means of developing a misinformation ground-truth dataset does not scale well because of the extensive manual effort required to annotate the data. In this paper, we propose an LLM-based approach to creating silver-standard datasets...

10.48550/arxiv.2401.04481 preprint EN cc-by arXiv (Cornell University) 2024-01-01

IIR 2024, the 14th Italian Information Retrieval Workshop, served as the annual event for the Information Retrieval (IR) and Recommender Systems (RS) communities both in Italy and at collaborating research institutions. This year's event spanned two days and featured studies on various topics within IR, RS, and Large Language Models (LLMs). Key focus areas included enhanced retrieval models, personalized information systems, conversational interfaces, user-centric comparative evaluations and metrics, and practical applications in specific fields....

10.1145/3722449.3722464 article EN ACM SIGIR Forum 2024-12-01

The evaluation of content moderation systems requires reliable benchmark data. This task becomes particularly formidable for low-resource languages, where obtaining or curating such data poses significant challenges. Addressing this issue, HASOC 2023 organised various shared tasks focused on identifying offensive content in low-resource languages. This paper reports on hate speech detection in several Indo-Aryan languages (Assamese, Bengali, Gujarati, and Sinhala) as well as a Sino-Tibetan language, Bodo, for which limited...

10.1145/3632754.3633278 article EN 2023-12-15

The wide spread of offensive content online such as hate speech poses a growing societal problem. AI tools are necessary for supporting the moderation process at online platforms. For the evaluation of these identification tools, continuous experimentation with data sets in different languages is necessary. The HASOC track (Hate Speech and Offensive Content Identification) is dedicated to developing benchmark data for this purpose. This paper presents the HASOC subtrack for English, Hindi, and Marathi. The data set was assembled from Twitter. This subtrack has two...

10.48550/arxiv.2112.09301 preprint EN cc-by arXiv (Cornell University) 2021-01-01