- Topic Modeling
- Adversarial Robustness in Machine Learning
- Fault Detection and Control Systems
- Software Engineering Research
- Real-time simulation and control systems
- Natural Language Processing Techniques
- Gastrointestinal disorders and treatments
- Gastric Cancer Management and Outcomes
- Advanced Malware Detection Techniques
- Digital and Cyber Forensics
- Image Retrieval and Classification Techniques
- Sensor Technology and Measurement Systems
- Advanced Neural Network Applications
- Software System Performance and Reliability
- Biomedical Text Mining and Ontologies
- Gastrointestinal Tumor Research and Treatment
- Web Application Security Vulnerabilities
- Ethics and Social Impacts of AI
- Scientific Measurement and Uncertainty Evaluation
- Anomaly Detection Techniques and Applications
Nanjing University
2022-2025
Peking University
2019
Peking University Cancer Hospital
2019
Code search is a technique widely used by developers during software development. It provides semantically similar implementations from a large code corpus to developers based on their queries. Existing techniques leverage deep learning models to construct embedding representations for code snippets and queries, respectively. Features such as abstract syntax trees, control flow graphs, etc., are commonly employed for representing the semantics of code snippets. However, the same structure of these features does not...
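As a rough illustration of the embedding-based retrieval described above, the following minimal sketch ranks code snippets by cosine similarity to a query embedding. The vectors are hand-written placeholders and the names `rank_snippets`, `snippets`, and `query` are hypothetical; in practice the embeddings would come from a learned model, and nothing here reflects the specific method of the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings (assumption: a real system would compute these
# with a trained encoder for code and another for natural-language queries).
snippets = {
    "read_file": [0.9, 0.1, 0.0],
    "sort_list": [0.1, 0.8, 0.3],
    "http_get": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_snippets(query, snippets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Cosine similarity is used here only because it is a common choice for comparing embeddings; other distance measures would fit the same structure.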
(Source) Code summarization aims to automatically generate summaries/comments for given code snippets in the form of natural language. Such summaries play a key role in helping developers understand and maintain source code. Existing techniques can be categorized into extractive methods and abstractive methods. The extractive methods extract a subset of important statements and keywords from the code snippet using retrieval techniques and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity naming, and consequently,...
Deep Neural Networks (DNNs) are becoming an integral part of many real-world applications, such as autonomous driving and financial management. While these models enable autonomy, there are concerns regarding their ethics in decision making. For instance, fairness is an aspect that requires particular attention. A number of fairness testing techniques have been proposed to address this issue, e.g., by generating test cases called individual discriminatory instances for repairing DNNs. Although they...
To explore the intraperitoneal free cancer cell (IFCC) detection value of negative enrichment and immune fluorescence in situ hybridization (NEimFISH) on chromosomes (CEN) 8/17. To verify the reliability of NEimFISH, 29 gastric tumors, their adjacent tissues, and greater omental tissues were tested. Our study then included 105 patients for IFCC. We defined a patient as IFCC-positive if a signal was detected, regardless of the detailed numbers. A comparison of clinicopathological features was conducted among the IFCC groups. We also compared...
In this paper, we study a defense against poisoned encoders in SSL called distillation, which was originally used in supervised learning. Distillation aims to distill knowledge from a given model (a.k.a. the teacher net) and transfer it to another (a.k.a. the student net). Now, we use it to distill benign knowledge from poisoned pre-trained encoders and transfer it to a new encoder, resulting in a clean pre-trained encoder. In particular, we conduct an empirical study on the effectiveness and performance of distillation against poisoned encoders. Using two state-of-the-art backdoor attacks on image encoders and four commonly used image classification datasets,...
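The teacher-to-student transfer described above can be sketched in a toy setting. The example below is only an illustration under stated assumptions: the "teacher" is a fixed hypothetical linear map, the "student" is a 2x2 linear map trained by plain gradient descent, and the loss is the mean squared distance between their embeddings, which is one possible label-free distillation objective. It does not model a backdoor and is not the setup evaluated in the paper.

```python
def teacher_encoder(x):
    """Hypothetical stand-in for a pre-trained encoder (a fixed linear map)."""
    return [2.0 * x[0] + x[1], x[0] - x[1]]


def student_encoder(x, w):
    """Student encoder: a 2x2 linear map with trainable weights w."""
    return [
        w[0][0] * x[0] + w[0][1] * x[1],
        w[1][0] * x[0] + w[1][1] * x[1],
    ]


def distillation_loss(w, inputs):
    """Mean squared distance between student and teacher embeddings."""
    total = 0.0
    for x in inputs:
        t = teacher_encoder(x)
        s = student_encoder(x, w)
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(inputs)


def distill(inputs, steps=200, lr=0.05):
    """Fit the student to the teacher's embeddings by gradient descent."""
    w = [[0.0, 0.0], [0.0, 0.0]]
    n = len(inputs)
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for x in inputs:
            t = teacher_encoder(x)
            s = student_encoder(x, w)
            for i in range(2):
                for j in range(2):
                    # d/dw[i][j] of (s[i] - t[i])^2, averaged over inputs
                    grad[i][j] += 2.0 * (s[i] - t[i]) * x[j] / n
        for i in range(2):
            for j in range(2):
                w[i][j] -= lr * grad[i][j]
    return w


# Assumption: a small set of clean, unlabeled inputs is available.
clean_inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]

initial_loss = distillation_loss([[0.0, 0.0], [0.0, 0.0]], clean_inputs)
student_w = distill(clean_inputs)
final_loss = distillation_loss(student_w, clean_inputs)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.2e}")
```

No labels are needed at any point, which is why an embedding-matching objective is a natural fit for the self-supervised setting; whether the student also inherits unwanted behavior depends on the data used for distillation, which this sketch does not examine.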
Self-supervised learning models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective in self-supervised learning often involve noticeable triggers, like colored patches, which are vulnerable to human inspection. In this paper, we propose an imperceptible and effective backdoor attack against self-supervised models. We first find that existing triggers designed for supervised learning are not as effective in compromising self-supervised models. We then identify that this ineffectiveness is attributed to the overlap in distributions between the backdoor samples and the augmented samples used in self-supervised learning. Building on this insight, we design an attack using...
Self-supervised learning (SSL) is increasingly attractive for pre-training encoders without requiring labeled data. Downstream tasks built on top of those pre-trained encoders can achieve nearly state-of-the-art performance. The encoders pre-trained by SSL, however, are vulnerable to backdoor attacks, as demonstrated by existing studies. Numerous backdoor mitigation techniques have been designed for downstream task models. However, their effectiveness is impaired and limited when adapted to pre-trained encoders, due to the lack of label information during pre-training. To...
Language models for code (CodeLMs) have emerged as powerful tools for code-related tasks, outperforming traditional methods and standard machine learning approaches. However, these models are susceptible to security vulnerabilities, drawing increasing research attention from domains such as software engineering, artificial intelligence, and cybersecurity. Despite the growing body of research focused on the security of CodeLMs, a comprehensive survey in this area remains absent. To address this gap, we systematically review 67 relevant...
Text-to-image diffusion models have shown an impressive ability to generate high-quality images from input textual descriptions. However, concerns have been raised about the potential for these models to create content that infringes on copyrights or depicts disturbing subject matter. Removing specific concepts from these models is a promising solution to this problem. However, existing methods for concept removal do not work well in practical but challenging scenarios where concepts need to be continuously removed. Specifically, these methods lead to poor alignment...
(Source) Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language. Such summaries play a key role in helping developers understand and maintain source code. Existing techniques can be categorized into extractive methods and abstractive methods. The extractive methods extract a subset of important statements and keywords from the code snippet using retrieval techniques and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity naming,...