Xueqi Cheng

ORCID: 0009-0009-3990-4414
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Graph Neural Networks
  • Advanced Malware Detection Techniques
  • Network Security and Intrusion Detection
  • Internet Traffic Analysis and Secure E-voting
  • Privacy-Preserving Technologies in Data
  • Data Quality and Management
  • Multimodal Machine Learning Applications
  • Digital and Cyber Forensics
  • Persona Design and Applications
  • Caching and Content Delivery
  • Cryptography and Data Security
  • Topological and Geometric Data Analysis
  • Peer-to-Peer Network Technologies
  • Domain Adaptation and Few-Shot Learning
  • Complex Network Analysis Techniques
  • Text and Document Classification Technologies
  • Sentiment Analysis and Opinion Mining
  • Web Data Mining and Analysis
  • Law, AI, and Intellectual Property
  • Machine Learning and Data Classification
  • Library Science and Information Systems
  • Adversarial Robustness in Machine Learning
  • Digital Rights Management and Security

Institute of Computing Technology
2007-2025

University of Chinese Academy of Sciences
2022-2025

Vanderbilt University
2024-2025

Chinese Academy of Sciences
2010-2024

Beijing Institute of Technology
2012

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited. In this paper, we present a novel data augmentation method for neural machine translation. Different from previous methods that randomly drop, swap or replace words with other words in a sentence, we softly augment a randomly chosen word in a sentence by a contextual mixture of multiple related words. More accurately, we replace the one-hot representation of a word by a distribution (provided by a language model) over...

10.18653/v1/p19-1555 preprint EN cc-by 2019-01-01
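The soft augmentation described above, replacing a word's one-hot vector with a language-model distribution over related words, amounts to feeding the model an expectation of embeddings. A minimal NumPy sketch, with all names and sizes as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 8, 4
E = rng.normal(size=(vocab, dim))        # embedding table

def soft_embed(word_probs, E):
    """Soft word: probability-weighted mixture of embeddings
    instead of a single one-hot lookup."""
    return word_probs @ E

# A language model's distribution concentrated on words 2 and 5
p = np.zeros(vocab)
p[2], p[5] = 0.6, 0.4
mixed = soft_embed(p, E)                 # lies between E[2] and E[5]

# With a one-hot distribution we recover the ordinary embedding
onehot = np.eye(vocab)[3]
assert np.allclose(soft_embed(onehot, E), E[3])
```

The one-hot case degenerating to a plain embedding lookup is what makes this a strict generalization of standard input representations.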

Recommender systems (RS) are effective tools for mitigating information overload and have seen extensive applications across various domains. However, a single focus on utility goals proves to be inadequate in addressing real-world concerns, leading to increasing attention on fairness-aware and diversity-aware RS. While most existing studies explore fairness and diversity independently, we identify strong connections between these two domains. In this survey, we first discuss each of them individually and then dive...

10.1145/3664928 article EN ACM Transactions on Intelligent Systems and Technology 2024-05-21

Stance detection aims to identify whether the author of a text is in favor of, against, or neutral toward a given target. The main challenge of this task is two-fold: few-shot learning resulting from varying targets, and the lack of contextual information about the targets. Existing works mainly focus on solving the second issue by designing attention-based models or introducing noisy external knowledge, while the first issue remains under-explored. In this paper, inspired by the potential capability of pre-trained language models (PLMs) serving as...

10.1145/3477495.3531979 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022-07-06

10.7544/issn1000-1239.2015.20131340 article EN Journal of Computer Research and Development 2015-01-01

Retrieval-augmented generation (RAG) systems often suffer from performance degradation when encountering noisy or irrelevant documents, driving researchers to develop sophisticated training strategies to enhance their robustness against such retrieval noise. However, as large language models (LLMs) continue to advance, the necessity of these complex training methods is increasingly questioned. In this paper, we systematically investigate whether robust training remains necessary as model capacity grows. Through...

10.48550/arxiv.2502.11400 preprint EN arXiv (Cornell University) 2025-02-16

10.1145/3701551.3707418 article EN 2025-02-26

The knowledge within large language models (LLMs) may become outdated quickly. While in-context editing (ICE) is currently the most effective method for knowledge editing (KE), it is constrained by the black-box modeling of LLMs and thus lacks interpretability. Our work aims to elucidate the superior performance of ICE on KE by analyzing the impacts of new knowledge on token-wise distributions. We observe that despite a significant boost in the logits of the new knowledge, performance is still hindered by stubborn knowledge. Stubborn knowledge refers to facts that have gained excessive...

10.48550/arxiv.2405.11613 preprint EN arXiv (Cornell University) 2024-05-19
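The token-wise distribution analysis described above can be illustrated with a toy comparison of next-token distributions before and after an edit context is supplied; the vocabulary size and logit values below are invented for illustration:

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def kl(p, q):
    """KL divergence D(p || q) between two token distributions."""
    return float(np.sum(p * np.log(p / q)))

# Toy next-token logits over a 5-word vocabulary; index 2 = the edited fact
base_logits = np.array([2.0, 1.0, 0.5, 0.2, 0.1])   # model without edit context
ice_logits  = np.array([1.0, 0.8, 3.0, 0.2, 0.1])   # after in-context edit

p_base, p_ice = softmax(base_logits), softmax(ice_logits)
boost = p_ice[2] - p_base[2]        # probability gained by the new fact
shift = kl(p_ice, p_base)           # overall distribution shift
```

A large logit boost for the edited token that still fails to dominate the distribution is, roughly, the "stubborn knowledge" situation the abstract describes.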

10.1109/icde60146.2024.00254 article EN 2024 IEEE 40th International Conference on Data Engineering (ICDE) 2024-05-13

In the evolution process of the World Wide Web, the contents of web pages play important roles because of their direct effect on linking preference. In this paper, we propose a model which combines vertex connectivity and content similarity in a proportional manner. Analytical solutions indicate that our model exhibits a power-law degree distribution with a variable exponent determined by the weight of content similarity. The distribution of connected pairs shows a similar trend, with similar pages tending to be linked together. Simulation results show that our model yields remarkably...

10.1109/wi.2007.33 article EN IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) 2007-11-01
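The growth model described above, with attachment probability mixing vertex connectivity and content similarity, can be simulated in a few lines; the scalar "content" feature and mixing weight `lam` are illustrative assumptions, not the paper's exact formulation:

```python
import random

def grow_network(n, lam, seed=0):
    """Toy growth model: a new node attaches to node i with probability
    proportional to lam * degree[i] + (1 - lam) * content_similarity."""
    rng = random.Random(seed)
    content = [rng.random() for _ in range(n)]   # scalar content feature
    degree = [0] * n
    edges = [(0, 1)]                             # two-node seed graph
    degree[0] = degree[1] = 1
    for t in range(2, n):
        weights = []
        for i in range(t):
            sim = 1.0 - abs(content[t] - content[i])   # similarity in [0, 1]
            weights.append(lam * degree[i] + (1.0 - lam) * sim)
        target = rng.choices(range(t), weights=weights)[0]
        edges.append((t, target))
        degree[t] += 1
        degree[target] += 1
    return edges, degree

edges, degree = grow_network(200, lam=0.5)
```

Setting `lam=1.0` recovers pure preferential attachment, while `lam=0.0` links purely by content, which is the proportional trade-off the abstract analyzes.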

Private information leakage has been discovered in many applications and has become a serious and challenging problem in cyber security. Previous works follow two categories of road map: one focuses on the outbound network traffic of the application, while the other dives into the information flow inside the application. We incorporate application dynamic behavior analysis and present an abstract model called Privacy Petri Net (PPN), which is more applicable to various applications and more meaningful to users. We apply our approach to both real-world regular applications and malware. The...

10.1109/iscc.2012.6249324 article EN 2012 IEEE Symposium on Computers and Communications (ISCC) 2012-07-01

Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue. However, for the persona-based dialogue generation task, consistency and coherence are also key factors, which pose great challenges to language models. Existing works mainly focus on valuable data filtering, model structure modifying, or objective function designing, while their improvements are limited and hard to generalize to all types of pre-trained language models. In this work, we find that language models can produce consistent and coherent responses...

10.18653/v1/2023.acl-long.553 article EN cc-by 2023-01-01

Recent years have witnessed the remarkable success of applying graph machine learning (GML) to node/graph classification and link prediction. However, edge classification, a task that enjoys numerous real-world applications such as social network analysis and cybersecurity, has not seen significant advancement. To address this gap, our study pioneers a comprehensive approach to edge classification. We identify a novel 'Topological Imbalance Issue', which arises from the skewed distribution of edges across different classes,...

10.48550/arxiv.2406.11685 preprint EN arXiv (Cornell University) 2024-06-17

As Large Language Models (LLMs) become an important way of information seeking, there have been increasing concerns about the unethical content LLMs may generate. In this paper, we conduct a rigorous evaluation of LLMs' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses. Our attack methodology is inspired by psychometric principles in cognitive and social psychology. We propose three attack approaches, i.e., Disguise, Deception, and Teaching,...

10.48550/arxiv.2406.14023 preprint EN arXiv (Cornell University) 2024-06-20

The black-box nature of large language models (LLMs) poses challenges in interpreting results, impacting issues such as data intellectual property protection and hallucination tracing. Training data attribution (TDA) methods are considered effective solutions to address these challenges. Most recent TDA methods rely on influence functions, assuming the model achieves minimized empirical risk. However, achieving this criterion is difficult, and sourcing accuracy can be compromised by fitting errors during...

10.48550/arxiv.2410.01285 preprint EN arXiv (Cornell University) 2024-10-02
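The influence-function family of TDA methods mentioned above scores each training point via a Hessian-weighted gradient product; for ridge regression this is tractable in closed form. A simplified sketch, where the squared loss and regularizer are illustrative choices rather than the paper's setup:

```python
import numpy as np

def influence_scores(X, y, x_test, y_test, reg=1e-2):
    """influence(z_i) = -grad_test^T H^{-1} grad_i for squared loss.
    A negative score means upweighting point i would reduce the test loss
    (a helpful training point)."""
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + n * reg * np.eye(d), X.T @ y)  # ridge fit
    H = (X.T @ X) / n + reg * np.eye(d)                          # loss Hessian
    g_test = (x_test @ w - y_test) * x_test                      # test-loss gradient
    G = (X @ w - y)[:, None] * X                                 # per-point gradients
    return -G @ np.linalg.solve(H, g_test)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])
scores = influence_scores(X, y, X[0], y[0])   # attribute one test point
```

The minimized-empirical-risk assumption the abstract questions enters exactly at the fitted `w`: if the model has not converged, the gradients and Hessian above no longer describe the true optimum.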

As Large Language Models (LLMs) grow increasingly powerful, ensuring their safety and alignment with human values remains a critical challenge. Ideally, LLMs should provide informative responses while avoiding the disclosure of harmful or sensitive information. However, current approaches, which rely heavily on refusal strategies, such as training models to completely reject prompts or applying coarse filters, are limited by their binary nature. These methods either fully deny access to information or grant...

10.48550/arxiv.2410.02684 preprint EN arXiv (Cornell University) 2024-10-03

Empirical evidence suggests that LLMs exhibit spontaneous cross-lingual alignment. Our findings suggest that although LLMs also demonstrate promising cross-lingual alignment in Information Extraction (IE), there remains a significant imbalance across languages, revealing an underlying deficiency in IE alignment. To address this issue, we propose AlignXIE, a powerful code-based LLM that significantly enhances cross-lingual IE alignment through two strategies. Firstly, AlignXIE formulates IE in different languages, especially non-English ones, as code generation tasks,...

10.48550/arxiv.2411.04794 preprint EN arXiv (Cornell University) 2024-11-07
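Formulating extraction as code generation, as described above, typically means rendering the schema as a class and asking the model to emit an instantiation. A hypothetical Python illustration; the `Person` schema, the example sentence's entities, and the sandbox-free `exec` are our assumptions, not AlignXIE's actual prompt format:

```python
from dataclasses import dataclass

# Hypothetical schema: the extraction target becomes a class definition,
# and each extracted record becomes code the model is asked to generate.
@dataclass
class Person:
    name: str
    employer: str

def parse_extraction(code: str):
    """Evaluate a generated instantiation against the schema.
    (A real system would sandbox or statically parse the generated code.)"""
    scope = {"Person": Person}
    exec(code, scope)
    return scope["result"]

# What a code-generation IE model might emit for one sentence:
generated = 'result = Person(name="Marie Curie", employer="University of Paris")'
entity = parse_extraction(generated)
```

Because the schema class is language-agnostic, the same generated code format can serve extraction in any input language, which is one intuition behind casting multilingual IE as code generation.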

10.18653/v1/2024.emnlp-main.698 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01