NFDI4DS | UHH-SEMS - Publication Details

FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters

OPENALEX - Publications

Wenqi Jiang Zhenhao He Shuai Zhang Kai Zeng Feng Liang and 6 more

We present FleetRec, a high-performance and scalable recommendation inference system within tight latency constraints. FleetRec takes advantage of heterogeneous hardware including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory to different types of hardware and bridging their connections by high-speed network, FleetRec gains the best of both worlds, and can naturally scale out by adding nodes to the cluster. Experiments on three production models up to 114 GB show that FleetRec outperforms optimized...

10.1145/3447548.3467139 article EN 2021-08-12

AtRec: Accelerating Recommendation Model Training on CPUs

OPENALEX - Publications

Siqi Wang Tianyu Feng Hailong Yang Xin You Bangduo Chen and 3 more

The popularity of recommendation models and the enhanced AI processing capability of CPUs have provided massive performance opportunities to deliver satisfactory experiences to a large number of users. Unfortunately, existing model training methods fail to achieve high efficiency due to unique challenges such as dynamic shape and parallelism. To address the above limitations, we comprehensively study the distinctive characteristics and discover several unexploited optimization opportunities. To exploit these opportunities, we propose...

10.1109/tpds.2024.3381186 article EN IEEE Transactions on Parallel and Distributed Systems 2024-03-25

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

OPENALEX - Publications

Wenqi Jiang Zhenhao He Shuai Zhang Thomas B. Preußer Kai Zeng and 7 more

Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done in about tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking...

10.48550/arxiv.2010.05894 preprint EN other-oa arXiv (Cornell University) 2020-01-01

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

OPENALEX - Publications

Wenqi Jiang Zhenhao He Shuai Zhang Thomas B. Preußer Kai Zeng and 7 more

Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done in about tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking...

10.3929/ethz-b-000470540 article EN 2021-03-15

Minions: Accelerating Large Language Model Inference with Adaptive and Collective Speculative Decoding

OPENALEX - Publications

Siqi Wang Hailong Yang X. L. Wang Tongxuan Liu Pengbo Wang and 8 more

Large language models (LLMs) have recently attracted surging interest due to their outstanding capabilities across various domains. However, enabling efficient LLM inference is challenging due to its autoregressive decoding, which generates tokens only one at a time. Although research works apply pruning or quantization to speed up LLM inference, they typically require fine-tuning the LLM, incurring significant time and economic costs. Meanwhile, speculative decoding has been proposed to use small speculative models (SSMs) to accelerate the inference of...

10.48550/arxiv.2402.15678 preprint EN arXiv (Cornell University) 2024-02-23

Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving

OPENALEX - Publications

Xin You Hailong Yang Siqi Wang Tao Peng Chen Ding and 6 more

10.1109/tc.2024.3449749 article EN IEEE Transactions on Computers 2024-08-28

An Algorithm for Decision Tree Construction Based on Degree of Rough Classification

OPENALEX - Publications

Qiongsheng Zhang Wu Ming-quan Tongxuan Liu Xiaowei Chen

In the process of decision tree construction, property division standards directly affect classification results. Aimed at the weakness of ID3 in nicety grading, we provide the concept of degree of rough classification as the selection criterion for the separation property. The method takes into account the nicety grading and the dependency between condition attributes and decision attributes. Compared with the traditional method based on entropy, the experiment proved that the decision tree constructed by our method effectively improves...

10.1109/aici.2010.246 article EN 2010-10-01