Yu Cheng

ORCID: 0000-0003-4258-0499
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Adversarial Robustness in Machine Learning
  • Dark Matter and Cosmic Phenomena
  • Cosmology and Gravitation Theories
  • Particle physics theoretical and experimental studies
  • Speech and dialogue systems
  • Speech and Audio Processing
  • Computational Physics and Python Applications
  • Anomaly Detection Techniques and Applications
  • Scientific Research and Discoveries
  • Sentiment Analysis and Opinion Mining
  • Speech Recognition and Synthesis
  • Text Readability and Simplification
  • Advanced Graph Neural Networks
  • Recommender Systems and Techniques
  • Bacillus and Francisella bacterial research
  • Neural Networks and Applications
  • Psychological and Temporal Perspectives Research
  • Machine Learning in Healthcare
  • Neuroscience and Music Perception
  • Artificial Intelligence in Games
  • Relativity and Gravitational Theory
  • Quantum Mechanics and Applications
  • Adaptive Dynamic Programming Control

Ningbo University
2022-2025

Institute of Psychology, Chinese Academy of Sciences
2025

Microsoft Research (United Kingdom)
2022-2023

Shanghai Jiao Tong University
2023

Microsoft (Finland)
2023

Chinese University of Hong Kong
2023

Peking University
2022

Qingdao University
2022

Alibaba Group (United States)
2022

Suqian University
2022

Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks. However, the high demand for computing resources in training such models hinders their application in practice. In order to alleviate this resource hunger in large-scale model training, we propose a Patient Knowledge Distillation approach to compress an original large model (teacher) into an equally-effective lightweight shallow network (student). Different from previous knowledge distillation methods, which only use...

10.48550/arxiv.1908.09355 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Recent studies show that pre-trained language models (LMs) are vulnerable to textual adversarial attacks. However, existing attack methods either suffer from low success rates or fail to search efficiently in the exponentially large perturbation space. We propose an efficient and effective framework, SemAttack, to generate natural adversarial text by constructing different semantic perturbation functions. In particular, SemAttack optimizes the generated perturbations constrained on generic semantic spaces, including typo space, knowledge space...

10.18653/v1/2022.findings-naacl.14 article EN cc-by Findings of the Association for Computational Linguistics: NAACL 2022 2022-01-01

Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods have been proposed to learn incremental updates of the pre-trained weights in a parameter-efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence,...

10.48550/arxiv.2303.10512 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Linear sequence modeling approaches, such as linear attention, provide advantages like linear-time training and constant-memory inference over sequence lengths. However, existing sequence parallelism (SP) methods are either not optimized for the right-product-first feature of linear attention or use a ring-style communication strategy, which results in lower computation parallelism and limits their scalability for longer sequences in distributed systems. In this paper, we introduce LASP-2, a new SP method to enhance both communication and computation parallelism when...

10.48550/arxiv.2502.07563 preprint EN arXiv (Cornell University) 2025-02-11
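
The "right-product-first" feature mentioned in the LASP-2 abstract above can be illustrated with a minimal sketch. It assumes unnormalized, non-causal linear attention, where the output is Q·Kᵀ·V and associativity allows the small Kᵀ·V product to be computed first. The helper names (`matmul`, `transpose`, `left_product_first`, `right_product_first`) and the toy matrices are hypothetical illustrations, not taken from the paper, and this is not the LASP-2 method itself.

```python
# Illustrative only: plain-Python matrices, no softmax, no causal mask.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def left_product_first(q, k, v):
    """(Q K^T) V: builds an n x n intermediate, quadratic in sequence length n."""
    return matmul(matmul(q, transpose(k)), v)


def right_product_first(q, k, v):
    """Q (K^T V): builds a d x d intermediate, independent of sequence length n."""
    return matmul(q, matmul(transpose(k), v))


# Toy inputs (assumed values): n = 3 tokens, d = 2 features, integers for exact equality.
q = [[1, 0], [0, 1], [1, 1]]
k = [[1, 2], [3, 4], [5, 6]]
v = [[1, 0], [0, 1], [1, 1]]

same_result = left_product_first(q, k, v) == right_product_first(q, k, v)
state = matmul(transpose(k), v)  # the d x d summary reused by every query row
```

Both orderings give the same output here; the difference is only in the size of the intermediate, which is the usual reason linear attention is described as linear-time in sequence length.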

With the global aging population, an increasing number of researchers are interested in the intertemporal choice issues faced by older adults. Previous studies have examined how age-related differences in time perception affect intertemporal choices. However, the impact of timing strategy on decision-making among older adults remains unclear. This study was designed to examine the influence of timing while also exploring possible mechanisms. We manipulated timing preferences through priming in two experiments (Experiment 1, n = 160; Experiment 2,...

10.1080/13825585.2025.2459626 article EN Aging Neuropsychology and Cognition 2025-02-21

https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as the fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal models...

10.31219/osf.io/ac4sp preprint EN 2023-12-13

The seesaw mechanism with three right-handed neutrinos has one as a well-motivated dark matter candidate if stable, and the other two can explain the baryon asymmetry via the thermal leptogenesis scenario. We explore the possibility of introducing additional particles to make the right-handed neutrino dark matter in equilibrium and freeze out through a forbidden annihilation channel. Nowadays in the Universe, this forbidden channel can be reactivated by a strong gravitational potential such as the supermassive black hole in our galaxy center. The Fermi-LAT gamma ray data...

10.1103/physrevd.107.123013 article EN cc-by Physical Review D 2023-06-12

We introduce Zoomer, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs. Zoomer is designed for tackling two challenges presented by the massive user data at Taobao: low training/serving efficiency due to the huge scale of the graphs, and low recommendation quality due to information overload, which distracts the recommendation model from specific user intentions. Zoomer achieves this by introducing a key concept, Region of Interests (ROI) in GNNs for recommendations, i.e.,...

10.1109/icde53745.2022.00212 article EN 2022 IEEE 38th International Conference on Data Engineering (ICDE) 2022-05-01

Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then interact them with the query for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new...

10.18653/v1/2022.findings-emnlp.41 article EN cc-by 2022-01-01

Xuxi Chen, Tianlong Chen, Weizhu Chen, Ahmed Hassan Awadallah, Zhangyang Wang, Yu Cheng. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.acl-long.456 article EN cc-by 2023-01-01

Subword tokenization schemes are the dominant technique used in current NLP models. However, such schemes can be rigid, and tokenizers built on one corpus may not adapt well to other parallel corpora. It has also been observed that in multilingual corpora, subword tokenization schemes oversegment low-resource languages, leading to a drop in translation performance. An alternative to subword tokenizers is byte-based tokenization, i.e., tokenization into byte sequences using the UTF-8 encoding scheme. Byte tokens often represent inputs at a sub-character granularity,...

10.18653/v1/2023.acl-long.397 article EN cc-by 2023-01-01
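
The byte-based tokenization described in the abstract above can be sketched in a few lines. This is a minimal illustration that assumes each UTF-8 byte value (0 to 255) is used directly as a token id; the function names `byte_tokenize` and `byte_detokenize` are hypothetical and do not come from the paper.

```python
def byte_tokenize(text: str) -> list[int]:
    """Map a string to its UTF-8 byte values, one token id per byte."""
    return list(text.encode("utf-8"))


def byte_detokenize(tokens: list[int]) -> str:
    """Invert byte_tokenize by decoding the byte values as UTF-8."""
    return bytes(tokens).decode("utf-8")


# ASCII characters take one byte each.
ascii_tokens = byte_tokenize("NLP")   # [78, 76, 80]

# A non-ASCII character is split across several bytes, so a single byte
# token represents the input at sub-character granularity.
accent_tokens = byte_tokenize("é")    # [195, 169]

round_trip = byte_detokenize(byte_tokenize("féeton"))
```

The vocabulary is fixed at 256 entries regardless of language, at the cost of longer sequences for text outside the ASCII range.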

We propose a new scenario in which both the dark matter freeze-out in the early Universe and its possible annihilation for indirect detection around a supermassive black hole are enhanced by a Breit-Wigner resonance. With the mediator mass larger than the total initial dark matter mass, this annihilation is almost forbidden at late times. Thus, the stringent cosmic microwave background constraints do not apply. However, a supermassive black hole can accelerate the dark matter particles to reactivate this resonant annihilation, whose subsequent decay to photons leaves a unique signal. The running...

10.48550/arxiv.2309.12043 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Adversarial training is so far the most effective strategy in defending against adversarial examples. However, it suffers from high computational costs due to the iterative adversarial attacks in each training step. Recent studies show that it is possible to achieve fast Adversarial Training by performing a single-step attack with random initialization. However, such an approach still lags behind state-of-the-art adversarial training algorithms on both stability and model robustness. In this work, we develop a new understanding towards Fast Adversarial Training, by viewing...

10.48550/arxiv.2010.01278 preprint EN other-oa arXiv (Cornell University) 2020-01-01

With the development of social science and technology, artificial intelligence has been applied to many fields, and artificial intelligence translation has provided great help for language learners. This paper analyzes the necessity of English learning, explores the influence of artificial intelligence translation on it, and proposes optimized learning modes for the people involved.

10.1155/2022/7755297 article EN cc-by Discrete Dynamics in Nature and Society 2022-01-01

A new $U(1)_X$ gauge boson $X$ primarily interacting with a dark sector can have renormalizable kinetic mixing with the standard model (SM) $U(1)_Y$ gauge boson $Y$. This mixing, besides introducing interactions of the dark photon with SM particles, also modifies the interactions among SM particles. The modified interactions can be cast into the oblique $S$, $T$ and $U$ parameters. We find that with the $X$ mass larger than the $Z$ mass, the kinetic mixing effects can reduce the tension of the W mass excess problem reported recently by CDF from a $7\sigma$ deviation to within $3\sigma$ compared with the theory prediction. If there is...

10.48550/arxiv.2204.10156 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks. Multimodal instruction tuning has emerged as a successful strategy for achieving zero-shot generalization by fine-tuning pre-trained models through instructions. As MLLMs grow in complexity and size, the need for parameter-efficient fine-tuning methods like Low-Rank Adaption (LoRA), which fine-tunes with a minimal set of...

10.48550/arxiv.2402.15896 preprint EN arXiv (Cornell University) 2024-02-24

In this paper, we revisit the féeton (gauge boson of $U(1)_{B-L}$ symmetry) dark matter scenario, and first point out that the $U(1)$ gauge symmetry can be a linear combination of the $B-L$ and the SM hypercharge symmetries. With the redefinition of the charge of the fermions, the coupling between the féeton and the electron can be enhanced. After showing the parameter space required from DM stability and cosmic production, we discuss the potential for verifying them in direct detection experiments. The results show that future experiments, such as SuperCDMS, have...

10.48550/arxiv.2410.12554 preprint EN arXiv (Cornell University) 2024-10-16

The traffic sign recognition (TSR) system is essential for autonomous vehicles and is vulnerable to security threats from adversarial attacks. The existing adversarial attacks on TSR are invasive and suffer from poor concealment and high computational complexity, and thus have low feasibility in real-world scenarios. This paper proposes a non-invasive modulated LED illumination-based adversarial attack scheme. By generating luminance flashes imperceptible to human eyes through fast intensity modulation of lighting such as streetlights and exploiting...

10.1109/trustcom56396.2022.00139 article EN 2022-12-01

Electroweak precision observables are fundamentally important for testing the standard model (SM) or its extensions. The influences on them from new physics within the electroweak sector can be expressed in terms of the oblique parameters $S$, $T$, $U$. The W mass excess anomaly recently reported by CDF modifies these parameters in a significant way. By performing a global fit with the CDF measurement data, we obtain $S=0.03 \pm 0.03$, $T=0.06 \pm 0.02$ and $U=0.16 \pm 0.03$ (or $S=0.14$, $T=0.24$, $U=0$), which is significantly away from zero as the SM would...

10.48550/arxiv.2208.06760 preprint EN other-oa arXiv (Cornell University) 2022-01-01

The seesaw mechanism with three right-handed neutrinos has one as a well-motivated dark matter candidate if stable, and the other two can explain the baryon asymmetry via the thermal leptogenesis scenario. We explore the possibility of introducing additional particles to make the right-handed neutrino dark matter in equilibrium and freeze out through a forbidden annihilation channel. Nowadays in the Universe, this forbidden channel can be reactivated by a strong gravitational potential such as the supermassive black hole in our galaxy center. The Fermi-LAT gamma ray data...

10.48550/arxiv.2304.02997 preprint EN cc-by arXiv (Cornell University) 2023-01-01