- Advanced Malware Detection Techniques
- Network Security and Intrusion Detection
- Software Engineering Research
- Authorship Attribution and Profiling
- Topic Modeling
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
- Software Testing and Debugging Techniques
- Spam and Phishing Detection
- Hate Speech and Cyberbullying Detection
- Natural Language Processing Techniques
- Misinformation and Its Impacts
- Digital and Cyber Forensics
- Scientific Computing and Data Management
- Names, Identity, and Discrimination Research
- Software System Performance and Reliability
- Smart Grid Security and Resilience
- Physical Unclonable Functions (PUFs) and Hardware Security
- Online Learning and Analytics
- Security and Verification in Computing
- Reinforcement Learning in Robotics
- Distributed Systems and Fault Tolerance
- Software Reliability and Analysis Research
- Cell Image Analysis Techniques
- Bluetooth and Wireless Communication Technologies
Queen's University
2019-2025
McGill University
2016-2025
Concordia University
2015
Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying those duplicated or known parts. However, it is challenging to design a robust clone search engine, since there exist various compiler optimization options and code obfuscation techniques that make logically similar...
Authorship analysis (AA) is the study of unveiling the hidden properties of authors from textual data. It extracts an author's identity and sociolinguistic characteristics based on the writing styles reflected in the text. The process is essential for various areas, such as cybercrime investigation, psycholinguistics, and political socialization. However, most previous techniques critically depend on a manual feature engineering process. Consequently, the choice of feature set has been shown to be scenario- or...
Assembly code analysis is one of the critical processes for detecting and proving software plagiarism and patent infringements when the source code is unavailable. It is also a common practice to discover exploits and vulnerabilities in existing software. However, it is a manually intensive and time-consuming process even for experienced reverse engineers. An effective and efficient assembly code clone search engine can greatly reduce the effort of this process, since it can identify the cloned parts that have been previously analyzed. The problem...
With the continuous miniaturization of electronic devices and the recent advancements in wireless communication technologies, Unmanned Aerial Vehicles (UAVs), in general, and Small Unmanned Aerial Vehicles (SUAVs, a.k.a. drones), in particular, are becoming progressively used by the civilian sector within the context of a variety of applications, bringing great convenience to the public. However, due to their resource-constrained nature, risky environmental application, and way of communication, drones are not immune from cyberthreats. As a consequence, security...
Finding similar code is important for software engineering, defense of intellectual property, and security, and one of the increasingly common ways adversaries use to defeat detection is through obfuscations such as transformation and scattering the code they wish to hide amongst long sequences. Moving the code far enough apart poses a specific challenge for solutions with localized features (e.g., n-grams) or attention mechanisms, as the parts are distributed beyond the local context window. We introduce a neural network solution pattern...
Aerospace and defense industries are particularly vulnerable to cyber threats given their sensitive nature, significantly extending the consequences of security breaches to the national level. Aerospace vehicles are augmented by cooperative control and by intelligent, connected, and autonomous systems. The risk against such systems is further amplified due to their commonly relying on the MIL-STD-1553 communication bus, developed with a high focus on reliability and fault tolerance, albeit with security as a second priority. MIL-STD-1553 (a.k.a. STANAG 3838 in NATO)...
Software vulnerabilities have been posing tremendous reliability threats to the general public as well as critical infrastructures, and there have been many studies aiming to detect and mitigate software defects at the binary level. Most of the standard practices leverage both static and dynamic analysis, which have several drawbacks like heavy manual workload and high complexity. Existing deep learning-based solutions not only struggle to capture the complex relationships among different variables from raw binary code but also lack explainability...
The Internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, chat logs, etc.) can be easily repudiated. Authorship attribution is the study of identifying the actual author of the given anonymous documents based on the text itself, and for decades, many linguistic stylometry and computational techniques have been extensively studied for this purpose. However, most of the previous research...
Law enforcement faces problems in tracing the true identity of offenders in cybercrime investigations. Most offenders mask their true identity, impersonate people of high authority, or use identity deception and obfuscation tactics to avoid detection and traceability. To address the problem of anonymity, authorship analysis is used to identify individuals by their writing styles without knowing their actual identities. Most authorship studies are dedicated to English due to its widespread use over the Internet, but recent cyber-attacks such as the distribution of Stuxnet indicate...
Users from all over the world increasingly adopt social media for newsgathering, especially during breaking news. Breaking news is an unexpected event that is currently developing. Early stages of breaking news are usually associated with lots of unverified information, i.e., rumors. Efficiently detecting and acting upon rumors in a timely fashion is of high importance to minimize their harmful effects. Yet, not all rumors have the potential to spread in social media. High-engaging rumors are those written in a manner that ensures achievement of the highest prevalence...
Haohan Bo, Steven H. Ding, Benjamin C. M. Fung, Farkhund Iqbal. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
Training a reinforcement learning agent to learn network penetration testing is challenging due to the partially-observable, non-deterministic environment. The large action space leads to extended training time, an issue of particular concern in mission-oriented deployment that requires timely hardening tests. Current solutions for automating penetration testing are divided between reinforcement learning (RL) and AI planning. This work integrates the two paradigms and establishes a neuro-symbolic system through an interactive symbolic logic engine. Two...
MIL-STD-1553 is a communication bus that has been used by many military avionics platforms, such as the F-15 and F-35 fighter jets, for almost 50 years. Recently, it has become clear that the lack of security on MIL-STD-1553 and the requirement for internet communication between planes have revealed numerous potential attack vectors for malicious parties. Prevention of these attacks by modernizing MIL-STD-1553 is not practical due to the military applications and existing far-reaching installations of the bus. We present a software system that can simulate bus transmissions to create easy, replicable, and large...
In recent years, the number of anonymous script-based fileless malware attacks, software copyright disputes, and source code plagiarism issues has increased rapidly. In the literature, automated Code Authorship Analysis (CAA) techniques have been proposed to reduce the manual effort in identifying those attacks and issues. Most CAA techniques aim to solve the task of Authorship Attribution (AA), i.e., identifying the actual author of a source code fragment from a given set of candidate authors. However, in many real-world scenarios, investigators do not have a predefined set of authors...
The impact of crisis events can be devastating in a multitude of ways, many of which are unpredictable due to the suddenness with which they occur. The evolution of social media (for example, Twitter) has given directly affected individuals or those with valuable information a platform to effectively share their stories with the masses. As a result, these platforms have become vast repositories of helpful information for emergency organizations. However, different crisis events often contain event-specific keywords, which results in difficult extraction of useful information with a single...
Most privacy protection studies for textual data focus on removing explicit sensitive identifiers. However, personal writing style, as a strong indicator of authorship, is often neglected. Recent studies, such as SynTF, have shown promising results on privacy-preserving text mining. However, their anonymization algorithm can only output numeric term vectors, which are difficult for the recipients to interpret. We propose a novel text generation model with a two-set exponential mechanism for authorship anonymization. By...