Mohammadreza Ebrahimi

ORCID: 0000-0003-1367-3338
Research Areas
  • Advanced Malware Detection Techniques
  • Adversarial Robustness in Machine Learning
  • Cybercrime and Law Enforcement Studies
  • Spam and Phishing Detection
  • Network Security and Intrusion Detection
  • Anomaly Detection Techniques and Applications
  • Digital and Cyber Forensics
  • Privacy-Preserving Technologies in Data
  • Topic Modeling
  • Privacy, Security, and Data Protection
  • Hate Speech and Cyberbullying Detection
  • Social Media and Politics
  • Fault Detection and Control Systems
  • Crime, Illicit Activities, and Governance
  • Terrorism, Counterterrorism, and Political Violence
  • Statistical and Computational Modeling
  • Data Analysis with R
  • Viral Infections and Outbreaks Research
  • Multimodal Machine Learning Applications
  • Parallel Computing and Optimization Techniques
  • Sentiment Analysis and Opinion Mining
  • Gaussian Processes and Bayesian Inference
  • AI in cancer detection
  • Security and Verification in Computing
  • Neural Networks and Applications

University of South Florida
2021-2024

University of Arizona
2018-2020

Concordia University
2016

Dark Net Marketplaces (DNMs), online selling platforms on the dark web, constitute a major component of the underground economy. Due to the anonymity and increasing accessibility of these platforms, they are rich sources of cyber threats such as hacking tools, data breaches, and personal account information. As the number of products offered on DNMs increases, researchers have begun to develop automated machine learning-based threat identification approaches. A challenge in adopting such an approach is that the task typically...

10.1080/07421222.2020.1790186 article EN Journal of Management Information Systems 2020-07-02

International dark web platforms operating within multiple geopolitical regions and languages host a myriad of hacker assets such as malware, hacking tools, tutorials, and malicious source code. Cybersecurity analytics organizations employ machine learning models trained on human-labeled data to automatically detect these assets and bolster their situational awareness. However, the lack of training data is prohibitive when analyzing foreign-language content. In this research note, we adopt computational design...

10.25300/misq/2022/16618 article EN MIS Quarterly 2022-05-19

Automated monitoring of dark web (DW) platforms on a large scale is the first step toward developing proactive Cyber Threat Intelligence (CTI). While there are efficient methods for collecting data from the surface web, large-scale dark web data collection is often hindered by anti-crawling measures. In particular, text-based CAPTCHA serves as the most prevalent and prohibiting type of these measures in the dark web. Text-based CAPTCHA identifies and blocks automated crawlers by forcing the user to enter a combination of hard-to-recognize alphanumeric...

10.1145/3505226 article EN ACM Transactions on Management Information Systems 2022-03-10

Pre-trained Large Language Models (LLMs) are an integral part of modern AI and have led to breakthrough performances in complex tasks. Major companies with expensive infrastructures are able to develop and train these large models, with billions and millions of parameters, from scratch. Third parties, researchers, and practitioners are increasingly adopting these pre-trained models and fine-tuning them on their private data to accomplish downstream tasks. However, it has been shown that an adversary can extract/reconstruct the exact training samples...

10.1109/icdmw58026.2022.00078 article EN 2022 IEEE International Conference on Data Mining Workshops (ICDMW) 2022-11-01

The frequency and costs of cyber-attacks are increasing each year. By the end of 2019, the total cost of data breaches is expected to reach $2.1 trillion through the ever-growing online presence of enterprises and their consumers. The tools to perform these attacks and the breached data can often be purchased within the Dark-net. Many threat actors within this realm use its various platforms to broker, discuss, and strategize cyber-threat assets. To combat these attacks, researchers are developing Cyber-Threat Intelligence (CTI) to proactively monitor hacker...

10.1109/isi.2019.8823501 article EN 2019-07-01

Recent advances in proactive cyber threat intelligence rely on early detection of threats in hacker communities. Dark Net Markets (DNMs) are growing platforms in the hacker community that provide hackers with highly specialized tools and products which may not be found on other platforms. While text classification techniques have been used for English DNMs, the task is hindered for non-English DNMs due to the language barrier and the lack of ground-truth data. Current approaches use monolingual models on machine-translated data to overcome...

10.1109/isi.2018.8587404 article EN 2018-11-01

Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect victims, researchers focus on collecting exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on PII collected from either the dark web or the surface web. Due to the wide exposure of PII on both the dark web and the surface web, collecting from only one of them could result in an underestimation of privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this...

10.1109/isi49825.2020.9280540 article EN 2020-11-09

Anti-malware engines are the first line of defense against malicious software. While widely used, feature engineering-based anti-malware engines are vulnerable to unseen (zero-day) attacks. Recently, deep learning-based static anti-malware detectors have achieved success in identifying unseen attacks without requiring feature engineering and dynamic analysis. However, these detectors are susceptible to malware variants with slight perturbations, known as adversarial examples. Generating effective adversarial examples is useful to reveal the vulnerabilities of such...

10.48550/arxiv.2012.07994 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Recent machine learning- and deep learning-based static malware detectors have shown breakthrough performance in identifying unseen malware variants. As a result, they are increasingly being adopted to lower the cost of dynamic analysis and manual signature identification. Despite their success, studies have shown that they can be vulnerable to adversarial attacks, in which an adversary subtly modifies a known malware executable to fool the detector into recognizing it as a benign file. Automatically crafting these variants at scale is...

10.1109/spw53761.2021.00021 article EN 2021-05-01

Deep Learning (DL)-based malware detectors are increasingly adopted for early detection of malicious behavior in cybersecurity. However, their sensitivity to adversarial malware variants has raised immense security concerns. Generating such variants by the defender is crucial to improving the resistance of DL-based malware detectors against them. This necessity has given rise to an emerging stream of machine learning research, Adversarial Malware example Generation (AMG), which aims to generate evasive variants that preserve the functionality of a given malware. Within...

10.1109/isi53945.2021.9624787 article EN 2021-11-02

Learning predictive models in new domains with scarce training data is a growing challenge in modern supervised learning scenarios. This incentivizes developing domain adaptation methods that leverage the knowledge in a known (source) domain and adapt to a new (target) domain with a different probability distribution. This becomes more challenging when the source and target domains are in heterogeneous feature spaces, known as heterogeneous domain adaptation (HDA). While most HDA methods utilize mathematical optimization to map to a common space, they suffer from low transferability. Neural...

10.1109/tpami.2022.3163338 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-03-29

Cyber threat intelligence (CTI) necessitates automated monitoring of dark web platforms (e.g., Dark Net Markets and carding shops) on a large scale. While there are existing methods for collecting data from the surface web, large-scale dark web data collection is commonly hindered by anti-crawling measures. Text-based CAPTCHA serves as the most prohibitive type of these measures and requires the user to recognize a combination of hard-to-read characters. CAPTCHA patterns are intentionally designed to have additional background noise and variable...

10.1109/isi49825.2020.9280537 article EN 2020-11-09

The regularity of devastating cyber-attacks has made cybersecurity a grand societal challenge. Many cybersecurity professionals are closely examining the international Dark Web to proactively pinpoint potential cyber threats. Despite its potential, the Dark Web contains hundreds of thousands of non-English posts. While machine translation (MT) is the prevailing approach to process non-English text, applying MT on hacker forum text results in mistranslations. In this study, we draw upon Long-Short Term Memory (LSTM), Cross-Lingual Knowledge...

10.1109/spw50608.2020.00021 article EN 2020-05-01

The information privacy of Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to the Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the internet. Current studies are limited to a single platform within either the surface web or the dark web. A comprehensive assessment requires matching exposed PII on...

10.1109/icdmw51313.2020.00072 article EN 2021 International Conference on Data Mining Workshops (ICDMW) 2020-11-01

Empowered by the recent development in Machine Learning (ML), signatureless ML-based malware detectors present promising performance in identifying unseen malware variants and zero days without requiring expensive dynamic analysis. However, it has been recently shown that ML-based malware detectors are vulnerable to adversarial attacks, in which an attacker modifies a known malware executable to trick the detector into recognizing the modified variant as benign. Adversarial malware example generation has become an emerging area in adversarial ML studies creating...

10.1109/icdmw58026.2022.00079 article EN 2022 IEEE International Conference on Data Mining Workshops (ICDMW) 2022-11-01

Despite their recent successes, Transformer-based large language models show surprising failure modes. A well-known example of such failure modes is their inability to length-generalize: solving problem instances at inference time that are longer than those seen during training. In this work, we further explore the root cause of this failure by performing a detailed analysis of model behaviors on the simple parity task. Our analysis suggests that length generalization failures are intricately related to a model's inability to perform random memory accesses within...

10.48550/arxiv.2408.05506 preprint EN arXiv (Cornell University) 2024-08-10

We consider multi-draft speculative sampling, where the proposal sequences are sampled independently from different draft models. At each step, a token-level draft selection scheme takes a list of valid tokens as input and produces an output token whose distribution matches that of the target model. Previous works have demonstrated that the optimal scheme (which maximizes the probability of accepting one of the input tokens) can be cast as a solution to a linear program. In this work we show that the optimal scheme can be decomposed into a two-step solution: in the first step...

10.48550/arxiv.2410.18234 preprint EN arXiv (Cornell University) 2024-10-23

This paper investigates a novel lossy compression framework operating under logarithmic loss, designed to handle situations where the reconstruction distribution diverges from the source distribution. This framework is especially relevant for applications that require joint compression and retrieval, and in scenarios involving distributional shifts due to processing. We show that the proposed formulation extends the classical minimum entropy coupling framework by integrating a bottleneck, allowing for a controlled degree of stochasticity in the coupling. We explore...

10.48550/arxiv.2410.21666 preprint EN arXiv (Cornell University) 2024-10-28

As Internet-based applications become more and more ubiquitous, drug retailing on Dark Net Marketplaces (DNMs) has raised public health and law enforcement concerns due to its highly accessible and anonymous nature. To combat illegal drug transactions on DNMs, authorities often require agents to impersonate DNM customers in order to identify key actors within the community. This process can be costly in time and resources. Research on DNMs has been conducted to provide a better understanding of the characteristics of DNMs and sellers'...

10.1109/isi.2019.8823196 article EN 2019-07-01

Internet users have been exposing an increasing amount of Personally Identifiable Information (PII) on social media. Such exposed PII can be exploited by cybercriminals and cause severe losses to the users. Informing users of their PII exposure in social media is crucial to raise their privacy awareness and encourage them to take protective measures. To this end, advanced techniques are needed to extract users' exposed PII automatically, whereas most existing studies remain manual. While Information Extraction (IE) can be used, Deep Learning (DL)-based IE...

10.1109/isi53945.2021.9624678 article EN 2021-11-02