Fangcheng Fu

ORCID: 0000-0003-1658-0380
Research Areas
  • Nuclear physics research studies
  • Nuclear Physics and Applications
  • Advanced Neural Network Applications
  • Stochastic Gradient Optimization Techniques
  • Privacy-Preserving Technologies in Data
  • Natural Language Processing Techniques
  • Topic Modeling
  • Advanced Image and Video Retrieval Techniques
  • Nuclear reactor physics and engineering
  • Atomic and Subatomic Physics Research
  • Advanced Graph Neural Networks
  • Parallel Computing and Optimization Techniques
  • Cryptography and Data Security
  • Complex Network Analysis Techniques
  • Advanced Data Storage Technologies
  • Machine Learning and Data Classification
  • Astronomical and nuclear sciences
  • Data Stream Mining Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Graph Theory and Algorithms
  • Quantum Chromodynamics and Particle Interactions
  • Neural Networks and Applications
  • Speech Recognition and Synthesis
  • Caching and Content Delivery
  • Neutrino Physics Research

Peking University
2005-2025

Hangzhou DAC Biotech (China)
2025

Software (Spain)
2023-2024

Hudson Institute
2020

John Wiley & Sons (United Kingdom)
2020

Tencent (China)
2019

Institute of Modern Physics
2009-2015

Chinese Academy of Sciences
2006-2015

China Institute of Atomic Energy
2013

State Key Laboratory of Nuclear Physics and Technology
2009

The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by advancements in model algorithms, the increasing scale of foundation models, and the availability of ample high-quality datasets. While AIGC has achieved remarkable performance, it still faces several challenges, such as the difficulty of maintaining up-to-date and long-tail knowledge, the risk of data leakage, and the high costs associated with training and inference. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address...

10.48550/arxiv.2402.19473 preprint EN arXiv (Cornell University) 2024-02-29
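The sketch below is a rough illustration of the retrieve-then-generate pattern that RAG names. It is a minimal sketch, not a method from the survey. It ranks passages by bag-of-words cosine similarity and assembles them into a prompt. The scoring scheme and the helpers `tokenize`, `cosine`, `retrieve`, and `build_prompt` are hypothetical stand-ins. The call to an actual generator model is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs (stand-in for a real embedder).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Retrieval step: score every passage against the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    # Augmentation step: prepend retrieved passages as context for the generator.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the context below.\n{context}\nQuestion: {query}\nAnswer:"


corpus = [
    "Paris is the capital of France.",
    "The target was natural lead.",
    "Berlin is the capital of Germany.",
]
top = retrieve("What is the capital of France?", corpus, k=1)
print(top)  # ['Paris is the capital of France.']
print(build_prompt("What is the capital of France?", top))
```

Production RAG systems typically replace the term-frequency scorer with dense embeddings and an approximate nearest-neighbour index. The two-step structure stays the same.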

Shotgun metagenomics has become a pivotal technology in microbiome research, enabling in-depth analysis of microbial communities at both high-resolution taxonomic and functional levels. This approach provides valuable insights into diversity, interactions, and their roles in health and disease. However, the complexity of data processing and the need for reproducibility pose significant challenges to researchers. To address these challenges, we developed EasyMetagenome, a user-friendly pipeline that supports...

10.1002/imt2.70001 article EN cc-by iMeta 2025-02-14

To address the challenge of explosive big data, distributed machine learning (ML) has drawn the interest of many researchers. Since ML algorithms trained by stochastic gradient descent (SGD) involve communicating gradients through the network, it is important to compress the transferred gradients. A category of low-precision algorithms can significantly reduce the size of gradients, at the expense of some precision loss. However, existing methods are not suitable for cases where the gradients are sparse and nonuniformly distributed. In this paper, we...

10.1145/3183713.3196894 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25
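The sketch below shows the kind of low-precision gradient compression the abstract above refers to. It is a generic uniform quantizer applied to the nonzero entries of a sparse gradient, not the method proposed in the paper. The function names and the fixed-range, round-to-nearest scheme are assumptions chosen for illustration. It also makes the stated weakness visible: one large outlier widens `step` for every other value.

```python
import numpy as np


def quantize_sparse_gradient(indices, values, num_bits=8):
    """Uniformly quantize the nonzero entries of a sparse gradient (generic baseline).

    Returns (indices, codes, lo, step); each value is approximated by lo + code * step.
    """
    if not 1 <= num_bits <= 16:
        raise ValueError("num_bits must be in [1, 16]")
    values = np.asarray(values, dtype=np.float64)
    lo, hi = float(values.min()), float(values.max())
    levels = (1 << num_bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    code_dtype = np.uint8 if num_bits <= 8 else np.uint16
    # Round each value to the nearest of the evenly spaced levels in [lo, hi].
    codes = np.round((values - lo) / step).astype(code_dtype)
    return np.asarray(indices, dtype=np.int64), codes, lo, step


def dequantize(codes, lo, step):
    # Receiver side: map integer codes back to approximate gradient values.
    return lo + codes.astype(np.float64) * step


idx = [3, 17, 42, 1000]
vals = [-0.5, 0.01, 0.02, 0.5]
idx_q, codes, lo, step = quantize_sparse_gradient(idx, vals, num_bits=8)
approx = dequantize(codes, lo, step)
print(codes)                          # one byte per nonzero value
print(np.max(np.abs(approx - vals)))  # bounded by step / 2
```

The indices travel uncompressed here. Only the values shrink, from 8 bytes each (float64) to 1 byte. The rounding error stays within half a quantization step.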

With the ever-evolving concerns about privacy protection, vertical federated learning (FL), where participants own non-overlapping features for the same set of instances, is becoming a heated topic, since it enables multiple enterprises to strengthen machine learning models collaboratively with privacy guarantees. Nevertheless, to achieve privacy preservation, vertical FL algorithms involve complicated training routines and time-consuming cryptography operations, leading to slow training speed.

10.1145/3448016.3457241 article EN Proceedings of the 2021 International Conference on Management of Data 2021-06-09

Due to rising concerns about privacy protection, how to build machine learning (ML) models over different data sources with security guarantees is gaining popularity. Vertical federated learning (VFL) describes such a case, where ML models are built upon the private data of participating parties that own disjoint features for the same set of instances, which fits many real-world collaborative tasks. Nevertheless, we find that existing solutions for VFL either support limited kinds of input or suffer from potential data leakage during...

10.1145/3514221.3526127 article EN Proceedings of the 2022 International Conference on Management of Data 2022-06-10

Gradient boosting decision tree (GBDT) is one of the most popular machine learning models, widely used in both academia and industry. Although GBDT is supported by existing systems such as XGBoost, LightGBM, and MLlib, a system bottleneck appears when the dimensionality of the data becomes high. When we tried to support our industrial partner on datasets with dimensionality up to 330K, we observed suboptimal performance for all of the aforementioned systems. In this paper, we ask "Can we build a scalable training system whose...

10.1145/3183713.3196892 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25
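The entry above says cost grows with data dimensionality, and the sketch below shows why. It is a minimal sketch of the standard histogram-based split search used by many GBDT systems, not the system proposed in the paper. Each feature needs its own gradient and Hessian histogram, so the work scales with the number of features. The function `best_split`, its parameters, and the pre-binned input layout are assumptions. The gain formula is the common second-order one, with L2 regularization `reg_lambda` and no minimum-gain penalty.

```python
import numpy as np


def best_split(bins, grad, hess, num_bins, reg_lambda=1.0):
    """Histogram-based split search over pre-binned features.

    bins: (n_samples, n_features) int array of bin ids in [0, num_bins); num_bins >= 2.
    Returns (gain, feature, threshold); samples with bin <= threshold go left.
    """
    grad = np.asarray(grad, dtype=np.float64)
    hess = np.asarray(hess, dtype=np.float64)
    G, H = grad.sum(), hess.sum()
    best = (-np.inf, None, None)
    for f in range(bins.shape[1]):  # one histogram per feature: cost grows with dimensionality
        g_hist = np.bincount(bins[:, f], weights=grad, minlength=num_bins)
        h_hist = np.bincount(bins[:, f], weights=hess, minlength=num_bins)
        # Prefix sums give left-child statistics for every candidate threshold.
        g_left = np.cumsum(g_hist)[:-1]
        h_left = np.cumsum(h_hist)[:-1]
        g_right, h_right = G - g_left, H - h_left
        gain = 0.5 * (g_left**2 / (h_left + reg_lambda)
                      + g_right**2 / (h_right + reg_lambda)
                      - G**2 / (H + reg_lambda))
        t = int(np.argmax(gain))
        if gain[t] > best[0]:
            best = (float(gain[t]), f, t)
    return best


bins = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
grad = [-1.0, -1.0, 1.0, 1.0]
hess = [1.0, 1.0, 1.0, 1.0]
print(best_split(bins, grad, hess, num_bins=2))  # feature 1 separates the gradients, threshold 0
```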

The elastic scattering of ${}^{8}$B by a ${}^{\mathrm{nat}}$Pb target was measured at an incident energy of 170.3 MeV. Special care was taken with the limited intensity and broad profile of the secondary beam. The angular distribution of the differential cross section shows that the Coulomb-nuclear interference peak (CNIP) is not suppressed in this system, in contrast to what is observed for neutron halo nuclei on heavy targets at energies around the Coulomb barrier. Analyses were performed both in terms of the optical model using...

10.1103/physrevc.87.044613 article EN Physical Review C 2013-04-29

Vertical federated learning (VFL) is an emerging paradigm that allows different parties (e.g., organizations or enterprises) to collaboratively build machine learning models with privacy protection. In the training phase, VFL only exchanges intermediate statistics, i.e., forward activations and backward derivatives, across parties to compute model gradients. Nevertheless, due to its geo-distributed nature, VFL training usually suffers from low WAN bandwidth. In this paper, we introduce CELU-VFL, a novel and efficient framework...

10.14778/3547305.3547316 article EN Proceedings of the VLDB Endowment 2022-06-01

Recent advancements in Large Language Models (LLMs) have led to increasingly diverse requests, accompanied by varying resource (compute and memory) demands to serve them. However, this in turn degrades the cost-efficiency of LLM serving, as common practices primarily rely on homogeneous GPU resources. In response to this problem, this work conducts a thorough study of serving LLMs over heterogeneous GPU resources on cloud platforms. The rationale is that different GPU types exhibit distinct compute and memory characteristics,...

10.48550/arxiv.2502.00722 preprint EN arXiv (Cornell University) 2025-02-02

Nowadays, Large Language Models (LLMs) are trained using extended context lengths to foster more creative applications. However, long-context training poses great challenges given the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long-context training, existing frameworks have adopted strategies such as recomputation and various forms of parallelism. Nevertheless, these techniques rely on redundant...

10.1145/3709703 article EN Proceedings of the ACM on Management of Data 2025-02-10

Recent developments in large language models (LLMs) have demonstrated their remarkable proficiency in a range of tasks. Compared to in-house homogeneous GPU clusters, deploying LLMs in cloud environments with diverse types of GPUs is crucial for addressing the GPU shortage problem and for being more cost-effective. However, the diversity of networks and GPU types in the cloud makes it difficult to achieve high-performance serving. In this work, we propose ThunderServe, a cost-efficient LLM serving system for heterogeneous cloud environments. We...

10.48550/arxiv.2502.09334 preprint EN arXiv (Cornell University) 2025-02-13

Vertical federated learning (VFL) trains a model when the features of data samples are scattered over multiple clients. To improve efficiency, a promising approach is to find a coreset and use it as a smaller training set. However, existing methods produce large coresets when there are many clients, and they have long running times. To address these problems, we propose HaCore for efficient coreset construction in the VFL setting. HaCore first employs locality-sensitive hashing (LSH) to map data samples to bit signatures locally on clients, then merges local...

10.1609/aaai.v39i21.34409 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11
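The sketch below illustrates the first step described above, where clients hash their own feature columns into bit signatures. It is a minimal sketch using sign-random-projection LSH (SimHash-style), and it is not HaCore itself. How HaCore merges local signatures is cut off in the abstract. Here the merge is assumed to be a per-sample concatenation followed by grouping identical signatures into buckets, with one weighted representative per bucket. The helpers `local_signatures`, `merge_and_bucket`, and `coreset_from_buckets` are hypothetical. The sketch also assumes sample IDs are already aligned across clients.

```python
from collections import defaultdict

import numpy as np


def local_signatures(features, num_bits, seed):
    # Runs on one client: sign-random-projection LSH over that client's own columns.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], num_bits))
    return (features @ planes) >= 0  # bool array, shape (n_samples, num_bits)


def merge_and_bucket(per_client_bits):
    # Assumed merge: concatenate each sample's bits across clients, group equal signatures.
    merged = np.concatenate(per_client_bits, axis=1)
    buckets = defaultdict(list)
    for i, row in enumerate(merged):
        buckets[row.tobytes()].append(i)
    return list(buckets.values())


def coreset_from_buckets(buckets):
    # One representative per bucket, weighted by how many samples it stands for.
    return [b[0] for b in buckets], [len(b) for b in buckets]


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
X[1] = X[0]  # a duplicated sample must land in the same bucket
client_a, client_b = X[:, :2], X[:, 2:]  # feature-partitioned, as in VFL
bits = [local_signatures(client_a, 8, seed=1), local_signatures(client_b, 8, seed=2)]
buckets = merge_and_bucket(bits)
reps, weights = coreset_from_buckets(buckets)
print(reps, weights)
```

Only bit signatures leave each client, never raw feature values. This is one motivation for hashing locally before merging.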

Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual effort to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel...

10.1109/tkde.2024.3370614 article EN IEEE Transactions on Knowledge and Data Engineering 2024-02-27

Due to the recent success of diffusion models, text-to-image generation is becoming increasingly popular and has a wide range of applications. Among them, image editing, or continuous generation, attracts a lot of attention and can potentially improve the quality of generated images. It is common for users to want to slightly edit an image by making minor modifications to their input textual descriptions over several rounds of inference. However, such an editing process suffers from low inference efficiency in many...

10.1609/aaai.v38i15.29599 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Split learning (SL) is one of the paradigms used in the vertical federated learning (VFL) setting, where two or more parties build models over feature-partitioned data. However, to protect the private labels of one party, random noise is needed to perturb the backward derivatives (i.e., gradients w.r.t. forward activations), which incurs a privacy-utility tradeoff. In this work, we introduce PROJPERT, a novel algorithm that searches for the optimal "perturbation knobs" for label protection in SL-based VFL. We formulate...

10.1109/tkde.2024.3349863 article EN IEEE Transactions on Knowledge and Data Engineering 2024-01-04
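The sketch below shows why the entry above says the backward derivatives need perturbation. In a two-party split-learning step with binary cross-entropy, the derivative sent back to the non-label party is $(\sigma(z) - y)/n$. Its sign alone reveals every label. The sketch is minimal and assumes the non-label party's activations are used directly as logits. It perturbs them with isotropic Gaussian noise, a generic choice that is not PROJPERT's optimized perturbation. `label_party_backward` and `noise_scale` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def label_party_backward(activations, labels, noise_scale, rng):
    """Label party's backward pass in a two-party split-learning step.

    activations: logits received from the non-label party's bottom model.
    Returns d(mean binary cross-entropy)/d(activations), optionally perturbed.
    """
    grad = (sigmoid(activations) - labels) / len(labels)
    if noise_scale > 0:
        # Generic isotropic Gaussian perturbation (an assumption, not PROJPERT).
        grad = grad + rng.normal(0.0, noise_scale, size=grad.shape)
    return grad


rng = np.random.default_rng(0)
z = rng.standard_normal(8)
y = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.float64)

clean = label_party_backward(z, y, noise_scale=0.0, rng=rng)
print(np.array_equal(clean < 0, y == 1))  # True: the gradient's sign reveals every label

noisy = label_party_backward(z, y, noise_scale=0.1, rng=rng)
print(np.mean((noisy < 0) == (y == 1)))  # sign agreement with labels tends to drop as noise grows
```

Larger `noise_scale` hides more of the label signal but also degrades the gradient used for training. That is the privacy-utility tradeoff the abstract describes.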

The first spectroscopic study of the $\beta$ decay of $^{21}\mathrm{N}$ is carried out based on $\beta$-$n$, $\beta$-$\gamma$, and $\beta$-$n$-$\gamma$ coincidence measurements. The neutron-rich nuclei are produced by fragmentation of an $E/A = 68.8$ MeV $^{26}\mathrm{Mg}$ primary beam on a thick $^{9}\mathrm{Be}$ target and implanted into a thin plastic scintillator that also plays the role of a detector. The time of flight of the emitted neutrons following...

10.1103/physrevc.80.054315 article EN Physical Review C 2009-11-24

With the commissioning of the Cooler Storage Ring at the Heavy Ion Research Facility in Lanzhou (HIRFL-CSR), a pilot experiment operating CSRe in isochronous mode, to test the power of HIRFL-CSR for measuring the masses of short-lived nuclei, was performed in December 2007. The transition point $\gamma_t$ is 1.395, which corresponds to an energy of about 368 MeV/u for ions with mass-to-charge ratio $A/q = 2$. Fragments of $^{36}$Ar were injected into CSRe, and their revolution frequencies were measured with a fast time pick-up detector with a thin foil circulating...

10.1142/s0218301309012380 article EN International Journal of Modern Physics E 2009-02-01
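The two numbers quoted above are consistent with each other, as a quick check shows. This is a worked check, not a result from the paper. The isochronous condition sets the ions' Lorentz factor to $\gamma = \gamma_t$. Approximating the mass per nucleon by the atomic mass unit, $m_u c^2 \approx 931.494$ MeV, gives the kinetic energy per nucleon:

$$
\frac{E_k}{A} \approx (\gamma_t - 1)\, m_u c^2 = (1.395 - 1) \times 931.494\ \text{MeV} \approx 367.9\ \text{MeV/u} \approx 368\ \text{MeV/u}.
$$

The approximation ignores the small difference between the actual mass per nucleon and $m_u$.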

Gradient boosting decision tree (GBDT) is a widely used machine learning algorithm in both data analytics competitions and real-world industrial applications. Further, driven by the rapid increase in data volume, efforts have been made to train GBDT in a distributed setting to support large-scale workloads. However, we find it surprising that existing systems manage the training dataset in different ways, yet none of them has studied the impact of data management. To that end, this paper aims to study the pros and cons of data management methods...

10.14778/3342263.3342273 article EN Proceedings of the VLDB Endowment 2019-07-01

A network motif is a kind of frequently occurring subgraph that reflects local topology in graphs. Although network motifs have been studied in graph analytics, e.g., for social and biological networks, it is still unclear whether they are useful for analyzing online transaction networks generated by applications such as instant messaging and e-commerce. In this work, we analyze online transaction networks from the perspective of network motifs. We define vertex features based on size-2 and size-3 motifs, and introduce motif-based centrality measurements. We further design...

10.1145/3534678.3539096 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12
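The sketch below illustrates vertex features built from size-2 and size-3 motifs, as mentioned above. It is a minimal sketch that assumes an undirected, unweighted simple graph. The paper's transaction networks may be directed or weighted, and its exact motif definitions are not given in the abstract. Per vertex, it counts incident edges (size-2 motifs), triangles containing the vertex, and open wedges (two-edge paths) centered on it. `motif_features` and the feature names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def motif_features(edges):
    """Per-vertex counts of size-2 and size-3 motifs in an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    features = {}
    for v, nbrs in adj.items():
        degree = len(nbrs)  # size-2 motifs (edges) touching v
        # A neighbor pair (a, b) closes a triangle with v when a and b are adjacent.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        features[v] = {
            "edges": degree,
            "triangles": triangles,  # closed size-3 motifs containing v
            "open_wedges": degree * (degree - 1) // 2 - triangles,  # open paths centered at v
        }
    return features


print(motif_features([(0, 1), (1, 2), (2, 0), (2, 3)])[2])
# {'edges': 3, 'triangles': 1, 'open_wedges': 2}
```

Enumerating neighbor pairs costs $O(d^2)$ per vertex of degree $d$. This is fine for a sketch, but high-degree hubs in real transaction networks would call for a faster triangle-counting method.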