NFDI4DS | UHH-SEMS - Publication Details

Zhenyuan Ruan

ORCID: 0009-0006-7851-680X

About

Contact & Profiles

A5069608616

Research Areas

Cloud Computing and Resource Management
Parallel Computing and Optimization Techniques
Advanced Data Storage Technologies
Distributed and Parallel Computing Systems
Interconnection Networks and Systems
Caching and Content Delivery
Genomics and Phylogenetic Studies
Advanced Database Systems and Queries
Advanced Memory and Neural Computing
Algorithms and Data Compression
Semiconductor Lasers and Optical Devices
IoT and Edge/Fog Computing
Evolutionary Algorithms and Applications
Embedded Systems Design Techniques

IIT@MIT
2023

Moscow Institute of Thermal Technology
2020

University of California, Los Angeles
2018-2019

UCLA Health
2019

Microsoft Research (India)
2017

Microsoft Research (United Kingdom)
2017

KV-Direct

OPENALEX - Publications

Bojie Li Zhenyuan Ruan Wencong Xiao Yuanwei Lu Yongqiang Xiong and 3 more

Performance of in-memory key-value store (KVS) continues to be of great importance as modern KVS goes beyond the traditional object-caching workload and becomes a key infrastructure to support distributed main-memory computation in data centers. Recent years have witnessed a rapid increase of network bandwidth in data centers, shifting the bottleneck of most KVS from the network to the CPU. RDMA-capable NIC partly alleviates the problem, but the primitives provided by the RDMA abstraction are rather limited. Meanwhile, programmable NICs become...

10.1145/3132747.3132756 article EN 2017-10-12

Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: A Race Between FPGA and GPU

Licheng Guo Jason Lau Zhenyuan Ruan Peng Wei Jason Cong

In genome sequencing, it is a crucial but time-consuming task to detect potential overlaps between any pair of the input reads, especially those that are ultra-long. The state-of-the-art overlapping tool Minimap2 outperforms other popular tools in speed and accuracy. It has a single computing hot-spot, chaining, that takes 70% of the time and needs to be accelerated. There are several issues for hardware acceleration because of the nature of chaining. First, the original computation pattern is poorly parallelizable and a direct...

10.1109/fccm.2019.00027 article EN 2019-04-01

ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA

Zhenyuan Ruan Tong He Bojie Li Peipei Zhou Jason Cong

In recent years we have witnessed the emergence of the FPGA in many high-performance systems. This is due to the FPGA's high reconfigurability and improved user-friendly programming environment. OpenCL, supported by major FPGA vendors, is a high-level programming platform that liberates hardware developers from having to deal with complex and error-prone HDL development. While OpenCL exposes a GPU-like programming model, which is well-suited for compute-intensive tasks, in many state-of-art systems that deploy FPGA, we observe that the workloads are streaming-like,...

10.1109/fccm.2018.00011 article EN 2018-04-01

Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-memory Computing Framework

Peipei Zhou Zhenyuan Ruan Zhenman Fang Megan Shand David Roazen and 1 more

In conventional Hadoop MapReduce applications, I/O used to play a heavy role in the overall system performance. More recently, a study from the Apache Spark community, the state-of-the-art in-memory cluster computing framework, reports that I/O is no longer the bottleneck and has a marginal performance impact on applications like SQL processing. However, we observe that simply replacing HDDs with SSDs can yield over 10x performance improvement for certain stages in large-scale production-quality genome processing. Therefore, one key question...

10.1109/ispass.2018.00011 article EN 2018-04-01

Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter

Yuanwei Lu Guo Chen Zhenyuan Ruan Wencong Xiao Bojie Li and 4 more

Limited by the small on-chip memory, hardware-based transport typically implements a go-back-N loss recovery mechanism, which costs very little memory but is well known to perform poorly even under a small packet loss ratio. We present MELO, an efficient selective retransmission mechanism for hardware-based transport, which consumes only a constant amount of memory regardless of the number of concurrent connections. Specifically, MELO employs an architectural separation between data and metadata storage and uses a shared bits pool allocation mechanism to reduce the memory footprint. By...

10.1145/3106989.3106993 article EN 2017-07-17

Unleashing True Utility Computing with Quicksand

Zhenyuan Ruan Shihang Li K. Fan Marcos K. Aguilera Adam Belay and 2 more

Today's clouds are inefficient: their utilization of resources like CPUs, GPUs, memory, and storage is low. This inefficiency occurs because applications consume resources at variable rates and ratios, while clouds offer resources at fixed rates and ratios. This mismatch of offering and consumption styles prevents fully realizing the utility computing vision.

10.1145/3593856.3595893 article EN 2023-06-22

Analyzing and Modeling In-Storage Computing Workloads On EISC — An FPGA-Based System-Level Emulation Platform

Zhenyuan Ruan Tong He Jason Cong

Storage drive technology has made continuous improvements over the last decade, shifting the bottleneck of the data processing system from the storage drive to the host/drive interconnection. To overcome this “data movement wall,” people have proposed in-storage computing (ISC) architectures which add a computing unit directly into the storage drive. Rather than moving data from the drive to the host, ISC offloads computation from the host to the drive, thereby alleviating the interconnection bottleneck. Though existing work shows the effectiveness of ISC under some specific workloads, they...

10.1109/iccad45719.2019.8942135 article EN 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2019-11-01
