Zhibin Yu

ORCID: 0000-0001-8067-9612
Research Areas
  • Parallel Computing and Optimization Techniques
  • Cloud Computing and Resource Management
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Distributed and Parallel Computing Systems
  • Embedded Systems Design Techniques
  • IoT and Edge/Fog Computing
  • Software System Performance and Reliability
  • Advanced Memory and Neural Computing
  • Data Stream Mining Techniques
  • Graph Theory and Algorithms
  • Advanced Neural Network Applications
  • Low-power high-performance VLSI design
  • Data Management and Algorithms
  • Network Packet Processing and Optimization
  • Network Security and Intrusion Detection
  • Distributed systems and fault tolerance
  • Blockchain Technology Applications and Security
  • Ferroelectric and Negative Capacitance Devices
  • Numerical Methods and Algorithms
  • Service-Oriented Architecture and Web Services
  • Caching and Content Delivery
  • Advanced Computational Techniques and Applications
  • Simulation Techniques and Applications
  • Software Engineering Research

Shenzhen Institutes of Advanced Technology
2015-2024

Chinese Academy of Sciences
2013-2024

University of Chinese Academy of Sciences
2016-2024

Huawei Technologies (China)
2023-2024

Electronics and Telecommunications Research Institute
2023

Sejong University
2023

Inje University
2023

Shenzhen University
2019

Shenzhen Institute of Information Technology
2019

University Town of Shenzhen
2019

Cloud computing with large-scale datacenters provides great convenience and cost-efficiency for end users. However, the resource utilization of cloud datacenters is very low, wasting a huge amount of infrastructure investment as well as the energy needed to operate it. To improve utilization, cloud providers usually co-locate workloads of different types on shared resources, but such sharing leaves quality of service (QoS) unguaranteed. In fact, improving resource utilization (IRU) while guaranteeing QoS at the same time in clouds has long been a dilemma, which we name the IRU-QoS curse. To tackle this...

10.1145/3267809.3267830 article EN 2018-09-28

Hadoop is a widely-used implementation framework of the MapReduce programming model for large-scale data processing. Its performance, however, is significantly affected by the settings of its configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming, if practical at all. This paper proposes an approach, called RFHOC, to automatically tune the configuration parameters for optimized performance of a given application running on a given cluster. RFHOC constructs two ensembles of performance models using a random-forest approach for the map and reduce stage...

10.1109/tpds.2015.2449299 article EN IEEE Transactions on Parallel and Distributed Systems 2015-07-22

In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly important because they typically achieve more than 10× speedups over traditional On-Disk cluster Computing (ODC) frameworks for iterative and interactive applications. Like ODC, IMC frameworks run the same given programs repeatedly on a cluster with a similar input dataset size each time. It is challenging to build a performance model for an IMC program because: 1) the performance of such a program is sensitive to the size of the input dataset, which is known to be difficult to incorporate into a performance model due to its complex effects on performance; 2)...

10.1145/3173162.3173187 article EN 2018-03-19

In this paper we tackle the problem of virtual machine (VM) placement onto physical servers to jointly optimize two objective functions. The first is to minimize the total energy spent within a cloud due to the servers that are commissioned to satisfy the computational demands of the VMs. The second is to minimize the network overhead incurred due to: (a) communicational dependencies between VMs, and (b) VM migrations performed for the transition from an old assignment scheme to a new one. We study different methodologies for solving the aforementioned problem. One approach...

10.1109/icpp.2013.54 article EN 2013-10-01
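The energy side of the placement problem above reduces to packing VM demands onto as few active servers as possible. The sketch below uses first-fit-decreasing bin packing as a stand-in heuristic; it is not the paper's algorithm, ignores the network-overhead objective entirely, and the demands and capacity are invented.

```python
def first_fit_decreasing(vm_demands, server_capacity):
    """Pack VM CPU demands onto the fewest servers (energy proxy:
    fewer commissioned servers => less static power draw). The
    paper's second objective would additionally need the VM
    communication graph and the previous assignment scheme."""
    servers = []     # residual capacity per commissioned server
    placement = {}   # vm name -> server index
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(servers):
            if free >= demand:       # first server that still fits
                servers[i] -= demand
                placement[vm] = i
                break
        else:                        # no fit: commission a new server
            servers.append(server_capacity - demand)
            placement[vm] = len(servers) - 1
    return placement, len(servers)

demands = {"vm1": 60, "vm2": 50, "vm3": 40, "vm4": 30, "vm5": 20}
placement, n_servers = first_fit_decreasing(demands, server_capacity=100)
print(n_servers, placement)
```

Here total demand is 200 against a capacity of 100 per server, and the heuristic reaches the optimal two servers; in general first-fit-decreasing is only an approximation.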

Recently, page-table self-replication (PTSR) has been proposed to reduce the NUMA effect of page-table accesses for large-memory workloads on servers. However, PTSR may improve or hurt the performance of an application, depending on its own characteristics and those of its co-located applications. This is hard for users to know, yet current systems allow PTSR to be enabled or disabled only manually by users.

10.1145/3620665.3640369 article EN 2024-04-22

Recently, big data has evolved into a buzzword from academia to industry all over the world. Benchmarks are important tools for evaluating an IT system. However, benchmarking big data systems is much more challenging than ever before. First, big data systems are still in their infant stage and consequently not well understood. Second, they are complicated compared to previous systems such as single-node computing platforms. While some researchers have started to design benchmarks for big data systems, they do not consider the redundancy between benchmarks...

10.1109/bigdata.2013.6691707 article EN 2013-10-01

To effectively design a computer system for the worst-case power consumption scenario, architects often use hand-crafted maximum power consuming benchmarks at the assembly language level. These stressmarks, also called power viruses, are very tedious to generate and require significant domain knowledge. In this paper, we propose SYMPO, an automatic SYstem level Max POwer virus generation framework, which maximizes the power consumption of the CPU and memory using a genetic algorithm and an abstract workload generation framework. For a set of three ISAs, we show...

10.1145/1854273.1854282 article EN 2010-09-11
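The genetic-algorithm loop behind a power-virus generator like the one above can be sketched in miniature. The "power model" here is a made-up concave function over abstract workload knobs (stand-ins for properties such as ILP or memory-access intensity); real SYMPO evaluates candidates on a power estimation infrastructure, and none of the constants below come from the paper.

```python
import random

# Toy "power model": an abstract workload is a vector of knob settings,
# each in [0, 1]. This invented function simply peaks at a particular
# knob mix, so the GA has a well-defined maximum (4.0) to climb toward.
PEAK = [0.9, 0.7, 0.3, 0.8]

def power(workload):
    return sum(1.0 - abs(w - p) for w, p in zip(workload, PEAK))

def evolve(pop_size=30, generations=60, seed=1):
    """Truncation selection + one-point crossover + point mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in PEAK] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=power, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist: keep top half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(PEAK))     # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(len(child))         # mutate one knob
            child[i] = min(1.0, max(0.0, child[i] + rng.uniform(-0.1, 0.1)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=power)

best = evolve()
print([round(w, 2) for w in best], round(power(best), 3))
```

Because the top half of each generation survives unchanged, the best fitness is monotonically non-decreasing, which keeps the search stable even with the crude mutation operator.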

Recently, experiment-driven machine-learning (ML) based configuration tuning for in-memory data analytics frameworks such as Apache Spark has become popular because it can achieve high speedups. However, ML-based approaches naturally need a large number of iterations, and each iteration generates a configuration with a probabilistic strategy and executes the program on a real cluster with that configuration. It therefore takes a long time...

10.1109/tc.2024.3365937 article EN IEEE Transactions on Computers 2024-02-14

Graphics processing units (GPUs), due to their massive computational power with up to thousands of concurrent threads and general-purpose GPU (GPGPU) programming models such as CUDA and OpenCL, have opened up new opportunities for speeding up parallel applications. Unfortunately, pre-silicon architectural simulation of modern-day GPGPU architectures and workloads is extremely time-consuming. This paper addresses the challenge by proposing a framework, called GPGPU-MiniBench, for generating miniature, yet...

10.1109/tc.2015.2395427 article EN IEEE Transactions on Computers 2015-01-22

The increasingly popular fused batch-streaming big data framework, Apache Flink, has many performance-critical as well as untamed configuration parameters. However, how to tune them for optimal performance has not yet been explored. Machine learning (ML) has been chosen to tune the configurations of other frameworks (e.g., Spark), showing significant improvements. However, it needs a long time to collect a large amount of training data by nature. In this article, we propose a guided machine learning (GML) approach to tune Flink with a significantly shorter...

10.1109/tpds.2021.3081600 article EN IEEE Transactions on Parallel and Distributed Systems 2021-05-18

The continuous improvements offered by silicon technology enable the integration of an ever-increasing number of cores on a single chip. Following this trend, it is expected that microprocessor architectures composed of thousands of cores (i.e., kilo-core architectures) will appear in the near future. To cope with the demand for high-performance systems, many-core designs rely on integrated networks-on-chip to deliver the bandwidth and latency required for inter-core communications. In this context, simulation tools represent a crucial...

10.5555/2331751.2331760 article EN Annual Simulation Symposium 2012-03-26

Data analytics is at the foundation of both high-quality products and services in modern economies and societies. Big data workloads run on complex large-scale computing clusters, which implies significant challenges for deeply understanding and characterizing overall system performance. In general, performance is affected by many factors at multiple layers of the stack, hence it is challenging to identify the key metrics when characterizing a big data workload. In this paper, we propose a novel characterization methodology using an ensemble...

10.1109/tpds.2017.2758781 article EN IEEE Transactions on Parallel and Distributed Systems 2017-10-04

Parallel GPGPU applications rely on barrier synchronization to align the activity of warps within a thread block. Little prior work has studied and characterized barrier synchronization within a thread block and its impact on performance. In this paper, we find that barriers cause substantial stall cycles in barrier-intensive workloads, even though GPGPUs employ lightweight hardware support for barriers. To help investigate the reasons, we define the execution between two adjacent barriers of a thread block as a warp-phase. We find that progress within a warp-phase varies dramatically across warps, which we call...

10.1145/2925426.2926267 article EN 2016-06-01
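The stall cost that the abstract above attributes to barriers follows directly from barrier semantics: every warp in a block waits until the slowest warp arrives. A minimal sketch of that accounting, with invented arrival cycles rather than measured warp-phase data:

```python
def barrier_stall_cycles(warp_arrivals):
    """Total cycles warps spend waiting at a thread-block barrier:
    the barrier releases only when the slowest warp arrives, so each
    warp stalls for (release cycle - its own arrival cycle)."""
    release = max(warp_arrivals)
    return sum(release - t for t in warp_arrivals)

# Four warps of one block reaching a barrier at different cycles; the
# skew across warps within a warp-phase is what makes barriers costly.
arrivals = [100, 120, 180, 105]
print(barrier_stall_cycles(arrivals))
```

With these made-up numbers the single slow warp (cycle 180) forces 215 wasted cycles onto the other three, illustrating why variance in warp-phase progress, not barrier hardware cost, dominates.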

Recently, the trending 5G technology encourages extensive applications of on-device machine learning, which collects user data for model training. This requires cost-effective techniques to preserve the privacy and security of training within a resource-constrained environment. Traditional learning methods rely on trust among the participants of the system for security. However, as the system scale increases, maintaining every edge device's trustworthiness could be expensive. To cost-effectively establish trust in a trustless environment, this...

10.1016/j.sysarc.2021.102205 article EN cc-by-nc-nd Journal of Systems Architecture 2021-06-07

The number of cores per chip keeps increasing in order to improve performance while controlling the power. According to semiconductor roadmaps, future computing systems will reach the scale of 1 Tera devices in a single package. Firstly, such Tera-device systems will expose a large amount of parallelism that cannot be easily and efficiently exploited by current applications and programming models. Secondly, reliability will become a critical issue. Finally, we need to simplify the design of such systems. TERAFLUX aims at providing a framework based...

10.1016/j.procs.2011.09.081 article EN Procedia Computer Science 2011-01-01

Shared caches in chip multi-processors (CMPs) have important benefits, such as accelerating inter-core communication, yet the inherent cache contention among multiple processes on such architectures can significantly degrade performance. To address this problem, cache partitioning has been studied based on predicting the miss rate curve (MRC) of each of the concurrently running programs. On-line MRC prediction, however, either requires special hardware support or incurs a high overhead when conducted purely in software. This...

10.1109/ipdps.2012.121 article EN 2012-05-01
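Once miss rate curves are available, the partitioning step described above is simple: pick the way split that minimizes combined misses. The sketch below enumerates splits for two co-running programs; the MRCs are invented monotone curves, since the paper's actual contribution is predicting such curves cheaply online.

```python
def best_partition(mrc_a, mrc_b, total_ways):
    """Choose the cache-way split minimizing combined misses, given
    each program's miss rate curve (misses indexed by allocated ways;
    index 0 = 1 way). Exhaustive search is fine here because the
    number of ways is small."""
    best = None
    for ways_a in range(1, total_ways):
        ways_b = total_ways - ways_a
        misses = mrc_a[ways_a - 1] + mrc_b[ways_b - 1]
        if best is None or misses < best[2]:
            best = (ways_a, ways_b, misses)
    return best

# Program A keeps benefiting from extra ways; B saturates quickly,
# so the optimizer should hand most of the cache to A.
mrc_a = [900, 600, 400, 250, 150, 100, 80, 70]
mrc_b = [500, 450, 430, 420, 415, 412, 410, 409]
print(best_partition(mrc_a, mrc_b, total_ways=8))
```

With these curves the search gives A six of the eight ways, which is exactly the kind of asymmetric split that uniform (unpartitioned) sharing cannot guarantee under contention.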


10.1145/3296957.3173187 article EN ACM SIGPLAN Notices 2018-03-19

Modern processors typically provide a small number of hardware performance counters to capture a large number of microarchitecture events. These counters can easily generate a huge amount (e.g., GB or TB per day) of data, which we call big data, in cloud computing platforms with more than thousands of servers and millions of complex workloads running in a "24/7/365" manner. This data provides a precious foundation for root cause analysis of performance bottlenecks, architecture and compiler optimization, and many more. However, it is challenging to extract value...

10.1109/micro.2018.00056 article EN 2018-10-01

At present, Spark is widely used in a number of enterprises. Although much faster than Hadoop for some applications, its configuration parameters can have a great impact on its performance due to the large number of parameters, the interactions between them, and the various characteristics of applications as well. Unfortunately, there is not yet any research conducted to predict Spark performance based on configuration sets. In this paper, we employ a machine learning method, Support Vector Machine (SVM), to build performance models for Spark. The input sets are collected by running...

10.1109/ccbd.2016.034 article EN 2016-11-01
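The modeling setup above boils down to learning a mapping from configuration values to measured runtime. As a much simpler stand-in for the paper's SVM regression, the sketch below fits ordinary least squares to one hypothetical parameter; the parameter name, values, and runtimes are all invented, and serve only to show the shape of the training data.

```python
def fit_linear(xs, ys):
    """One-feature ordinary least squares (slope, intercept) -- a
    deliberately simple stand-in for SVM regression, fitting runtime
    as a function of a single configuration value."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical training set: spark.executor.memory (GB) vs runtime (s),
# shaped like the diminishing returns one might see from extra memory.
mem = [1, 2, 4, 8]
runtime = [400, 310, 260, 235]
slope, intercept = fit_linear(mem, runtime)
pred_6gb = slope * 6 + intercept   # predict an unmeasured configuration
print(round(slope, 2), round(intercept, 2), round(pred_6gb, 1))
```

An SVM (or any nonlinear model) earns its keep precisely where this linear fit fails: real Spark parameters interact, so runtime is not additive in the individual settings.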

Emerging GPU applications exhibit increasingly high computation demands, which has led manufacturers to build GPUs with an increasingly large number of streaming multiprocessors (SMs). Providing data to the SMs at high bandwidth puts significant pressure on the memory hierarchy and the Network-on-Chip (NoC). Current GPUs typically partition the memory-side last-level cache (LLC) into equally-sized slices that are shared by all SMs. Although a shared LLC results in a lower miss rate, we find that for workloads with low degrees of data sharing across SMs, a private...

10.1145/3307650.3322235 article EN 2019-06-14

Due to its decentralization, irreversibility, and traceability, blockchain has attracted significant attention and has been deployed in many critical industries such as banking and logistics. However, the micro-architecture characteristics of blockchain programs still remain unclear. What's worse, the large number of micro-architecture events makes understanding them extremely difficult. We even lack a systematic approach to identify which important events to focus on. In this paper, we propose a novel benchmarking methodology dubbed BBS to characterize blockchain programs at the micro-architecture level. The key...

10.1109/hpca47549.2020.00041 article EN 2020-02-01

Hadoop is the most popular implementation framework of the MapReduce programming model, and it has a number of performance-critical configuration parameters. However, manually setting these parameters to their optimal values not only requires in-depth knowledge of Hadoop as well as of the job itself, but also a large amount of time and effort. Automatic tuning approaches have therefore been proposed, but their usage is still quite limited due to intolerably long searching times. In this paper, we introduce MapreducE...

10.1109/access.2017.2672675 article EN cc-by IEEE Access 2017-01-01

Recently, graphics processing units (GPUs) have opened up new opportunities for speeding up general-purpose parallel applications due to their massive computational power and the hundreds of thousands of threads enabled by programming models such as CUDA. However, due to the serial nature of existing micro-architecture simulators, these massively parallel architectures and workloads need to be simulated sequentially. As a result, simulating GPGPU architectures with typical benchmarks and input data sets is extremely time-consuming. This paper...

10.1145/2465529.2465540 article EN 2013-06-17