- Adversarial Robustness in Machine Learning
- Advanced Malware Detection Techniques
- Topic Modeling
- Advanced Data Storage Technologies
- Security and Verification in Computing
- Cloud Computing and Resource Management
- Privacy-Preserving Technologies in Data
- Network Security and Intrusion Detection
- Software Engineering Research
- Natural Language Processing Techniques
- Anomaly Detection Techniques and Applications
- Distributed and Parallel Computing Systems
- Software Testing and Debugging Techniques
- Explainable Artificial Intelligence (XAI)
- High-Energy Particle Collisions Research
- Caching and Content Delivery
- Cryptography and Data Security
- Web Application Security Vulnerabilities
- Advanced Graph Neural Networks
- Digital Media Forensic Detection
- Complex Network Analysis Techniques
- Face Recognition and Analysis
- Statistical Mechanics and Entropy
- Generative Adversarial Networks and Image Synthesis
- Digital and Cyber Forensics
Ningbo University
2022-2025
Zhejiang University
2020-2025
Zhejiang University of Science and Technology
2023-2025
Xi'an University of Technology
2023-2024
Guangdong Polytechnic Normal University
2013-2023
Shanxi University
2021-2023
Institute of Theoretical Physics
2021-2023
National University of Defense Technology
2022
State Key Laboratory of Quantum Optics and Quantum Optics Devices
2021-2022
Binzhou University
2022
Pre-trained general-purpose language models have been a dominating component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with a backdoor can be a severe threat to the applications. Most existing attacks in NLP are conducted in the fine-tuning phase by introducing malicious triggers in the targeted class, thus relying greatly on prior knowledge of the fine-tuning task. In this paper, we propose a new approach to map the inputs containing triggers directly to a predefined output representation of the pre-trained NLP models, e.g., for...
Deep neural networks (DNNs) have demonstrated their outperformance in various domains. However, it raises a social concern whether DNNs can produce reliable and fair decisions, especially when they are applied to sensitive domains involving valuable resource allocation, such as education, loan, and employment. It is crucial to conduct fairness testing before DNNs are reliably deployed to sensitive domains, i.e., generating as many instances as possible to uncover fairness violations. However, the existing testing methods are still limited from three aspects:...
Vertical Federated Learning (VFL) is a trending collaborative machine learning model training solution. Existing industrial frameworks employ secure multi-party computation techniques such as homomorphic encryption to ensure data security and privacy. Despite these efforts, studies have revealed that data leakage remains a risk in VFL due to the correlations between the intermediate representations and the raw data. Neural networks can accurately capture these correlations, allowing an adversary to reconstruct the data. This...
Recently, the use of large language models (LLMs) for Verilog code generation has attracted great research interest to enable hardware design automation. However, previous works have shown a gap between the ability of LLMs and the practical demands of hardware description language (HDL) engineering. This gap includes differences in how engineers phrase questions and hallucinations in the code generated. To address these challenges, we introduce HaVen, a novel LLM framework designed to mitigate hallucinations and align Verilog code generation with the practices of HDL engineers. HaVen tackles...
DeepFakes pose a significant threat to our society. One representative DeepFake application is face-swapping, which replaces the identity in a facial image with that of a victim. Although existing methods partially mitigate these risks by degrading the quality of swapped images, they often fail to disrupt the identity transformation effectively. To fill this gap, we propose FaceSwapGuard (FSG), a novel black-box defense mechanism against deepfake face-swapping threats. Specifically, FSG introduces imperceptible...
With the continuous advancement of machine learning, numerous malware detection methods that leverage this technology have emerged, presenting new challenges to the generation of adversarial malware. Existing function-preserving attacks fall short of effectively modifying portable executable (PE) control flow graphs (CFGs), thereby failing to bypass graph neural network (GNN) models that utilize CFGs for detection. To solve this issue, we introduce a novel base modification method called active opcode insertion,...
As the core of IoT devices, firmware is undoubtedly vital. Currently, firmware development heavily depends on third-party components (TPCs), which significantly improves efficiency and reduces cost. Nevertheless, TPCs are not secure, and vulnerabilities in TPCs will in turn influence the security of firmware. Existing works pay less attention to the vulnerabilities caused by TPCs, and we still lack a comprehensive understanding of the impact of TPC vulnerabilities on firmware. To fill this knowledge gap, we design and implement FirmSec, which leverages syntactical...
Vertical federated learning (VFL) is an emerging privacy-preserving paradigm that enables collaboration between companies. These companies have the same set of users but different features. One of them is interested in expanding new business or improving its current service with others' features. For instance, an e-commerce company that wants to improve its recommendation performance can incorporate users' preferences from another corporation, such as a social media company, through VFL. On the other hand, graph data...
In recent years, DeepFake technologies have seen widespread adoption in various domains, including entertainment and film production. However, they have also been maliciously employed for disseminating false information and engaging in video fraud. Existing detection methods often experience significant performance degradation when confronted with unknown forgeries or exhibit limitations in dealing with low-quality images. To address this challenge, we introduce...
The distributed file system, HDFS, is widely deployed as the bedrock for many parallel big data analyses. However, when running multiple applications over the shared file system, the data requests from different processes/executors will unfortunately be served in a surprisingly imbalanced fashion on the storage servers. These imbalanced access patterns among storage nodes arise because (a) unlike a conventional file system that uses striping policies to evenly distribute data among nodes, data-intensive file systems such as HDFS store each data unit, referred to as a chunk file, with...
In this paper, we aim to enable both efficient and accurate approximations on arbitrary sub-datasets of a large dataset. Due to the prohibitive storage overhead of caching offline samples for each sub-dataset, existing sample based systems provide high accuracy results for only a limited number of sub-datasets, such as the popular ones. On the other hand, current online approximation systems, which generate samples at runtime, do not take into account the uneven distribution of a sub-dataset. They work well for a uniform sub-dataset...
The energy allocation strategy is one of the most popular techniques in fuzzing to improve code coverage and vulnerability discovery. The core intuition is that fuzzers should allocate more computational energy to the seed files that have high efficiency to trigger unique paths and crashes after mutation. Existing solutions usually define several properties, e.g., the execution speed, the file size, and the number of triggered edges in the control flow graph, to serve as the key measurements in their allocation logics to estimate the potential of a seed. The efficiency of a property is assumed to be the same...
With the rapid technology evolution of the Internet of Things (IoT) and increasing user needs, IoT device re-using becomes more common nowadays. For instance, more than 300,000 used IoT devices are for sale on Craigslist. During re-using, sensitive data such as credentials and biometrics residing in these devices may face the risk of leakage if a user fails to properly dispose of the data. Thus, a critical security concern is raised: do (or can) users properly dispose of the sensitive data in used IoT devices? To the best of our knowledge, it is still an unexplored problem that desires a systematic study. In...
With the wide application and deployment of cloud computing in enterprises, virtualization developers and security researchers are paying more attention to cloud computing security. The core component of virtualization products is the hypervisor, which is also known as the virtual machine monitor (VMM) and can isolate multiple virtual machines in one host machine. However, compromising the hypervisor can lead to virtual machine escape and elevation of privilege, allowing attackers to gain the permission of code execution in the host. Therefore, the security analysis and vulnerability detection of the hypervisor are critical for...
Despite its tremendous popularity and success in computer vision (CV) and natural language processing, deep learning is inherently vulnerable to adversarial attacks in which adversarial examples (AEs) are carefully crafted by imposing imperceptible perturbations on the clean examples to deceive the target deep neural networks (DNNs). Many defense solutions in CV have been proposed. However, most of them, e.g., adversarial training, suffer from a low generality due to the reliance on limited AEs. Moreover, some of them even have a non-negligible negative impact...
Online microlending, a new financial service, focuses on small loans without any sort of collateral. It provides more flexible and quicker funding for borrowers, as well as higher interest rates of return. For platforms that provide such services, an essential task is to adequately evaluate each loan's risk so as to minimize the possible loss. However, there exists a special group, namely fraud-agents,...
Neural networks have become increasingly popular. Nevertheless, understanding their decision process turns out to be complicated. One vital method to explain a model's behavior is feature attribution, i.e., attributing its decision to pivotal features. Although many algorithms are proposed, most of them aim to improve the faithfulness (fidelity) to the model. However, the real environment contains random noises, which may cause the feature attribution maps to be greatly perturbed for similar images. More seriously, recent works show that...
Mutation-based fuzzing is one of the most popular approaches to discover vulnerabilities in a program. To alleviate the inefficiency of mutation-based fuzzing incurred by the high randomness in the mutation process, multiple solutions are developed in recent years, especially coverage-based fuzzing. They mainly employ adaptive mutation strategies or integrate constraint-solving techniques to make a good exploration of the test cases which trigger unique paths and crashes. However, they lack a fine-grained reusing of fuzzing history to construct these...
In this paper, we study the problem of sub-dataset analysis over distributed file systems, e.g., the Hadoop file system. Our experiments show that the sub-datasets' distribution over HDFS blocks, which is hidden by HDFS, can often cause the corresponding analyses to suffer from a seriously imbalanced or inefficient parallel execution. Specifically, the content clustering of sub-datasets results in some computational nodes carrying out much more workload than others; furthermore, it leads to inefficient sampling of sub-datasets, as programs will read...