Zhibin Yu

ORCID: 0000-0001-8067-9612
Research Areas
  • Parallel Computing and Optimization Techniques
  • Cloud Computing and Resource Management
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Distributed and Parallel Computing Systems
  • Embedded Systems Design Techniques
  • IoT and Edge/Fog Computing
  • Software System Performance and Reliability
  • Advanced Memory and Neural Computing
  • Data Stream Mining Techniques
  • Graph Theory and Algorithms
  • Advanced Neural Network Applications
  • Low-power high-performance VLSI design
  • Data Management and Algorithms
  • Network Packet Processing and Optimization
  • Network Security and Intrusion Detection
  • Distributed systems and fault tolerance
  • Blockchain Technology Applications and Security
  • Ferroelectric and Negative Capacitance Devices
  • Numerical Methods and Algorithms
  • Service-Oriented Architecture and Web Services
  • Caching and Content Delivery
  • Advanced Computational Techniques and Applications
  • Simulation Techniques and Applications
  • Software Engineering Research

Shenzhen Institutes of Advanced Technology
2015-2024

Chinese Academy of Sciences
2013-2024

University of Chinese Academy of Sciences
2016-2024

Huawei Technologies (China)
2023-2024

Electronics and Telecommunications Research Institute
2023

Sejong University
2023

Inje University
2023

Shenzhen University
2019

Shenzhen Institute of Information Technology
2019

University Town of Shenzhen
2019

Cloud computing with large-scale datacenters provides great convenience and cost-efficiency for end users. However, the resource utilization of cloud datacenters is very low, wasting a huge amount of infrastructure investment as well as the energy needed to operate it. To improve utilization, cloud providers usually co-locate workloads of different types on shared resources, but such sharing leaves quality of service (QoS) unguaranteed. In fact, improving resource utilization (IRU) while guaranteeing QoS at the same time in clouds has long been a dilemma, which we name the IRU-QoS curse. To tackle this...

10.1145/3267809.3267830 article EN 2018-09-28

Hadoop is a widely-used implementation framework of the MapReduce programming model for large-scale data processing. Its performance, however, is significantly affected by the settings of its configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming, if practical at all. This paper proposes an approach, called RFHOC, to automatically tune the configuration parameters for optimized performance of a given application running on a given cluster. RFHOC constructs two ensembles of performance models using a random-forest approach for the map and reduce stage...

10.1109/tpds.2015.2449299 article EN IEEE Transactions on Parallel and Distributed Systems 2015-07-22

In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly important because they typically achieve more than 10× speedups over traditional On-Disk cluster Computing (ODC) frameworks for iterative and interactive applications. Like ODC, IMC frameworks run the same given programs repeatedly on a cluster with a similar input dataset size each time. It is challenging to build a performance model for an IMC program because: 1) the performance of such a program is sensitive to the size of the input dataset, which is known to be difficult to incorporate into a performance model due to its complex effects on performance; 2)...

10.1145/3173162.3173187 article EN 2018-03-19

In this paper we tackle the problem of virtual machine (VM) placement onto physical servers to jointly optimize two objective functions. The first is to minimize the total energy spent within a cloud due to the servers that are commissioned to satisfy the computational demands of the VMs. The second is to minimize the network overhead incurred due to: (a) communicational dependencies between VMs, and (b) VM migrations performed for the transition from an old assignment scheme to a new one. We study different methodologies for solving the aforementioned problem. One approach...

10.1109/icpp.2013.54 article EN 2013-10-01
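The energy side of the placement problem above reduces to packing VM demands onto as few active servers as possible. The sketch below uses first-fit-decreasing bin packing as a stand-in heuristic; it is not the paper's algorithm, ignores the network-overhead objective entirely, and the demands and capacity are invented.

```python
def first_fit_decreasing(vm_demands, server_capacity):
    """Pack VM CPU demands onto the fewest servers (energy proxy:
    fewer commissioned servers => less static power draw). The
    paper's second objective would additionally need the VM
    communication graph and the previous assignment scheme."""
    servers = []     # residual capacity per commissioned server
    placement = {}   # vm name -> server index
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(servers):
            if free >= demand:       # first server that still fits
                servers[i] -= demand
                placement[vm] = i
                break
        else:                        # no fit: commission a new server
            servers.append(server_capacity - demand)
            placement[vm] = len(servers) - 1
    return placement, len(servers)

demands = {"vm1": 60, "vm2": 50, "vm3": 40, "vm4": 30, "vm5": 20}
placement, n_servers = first_fit_decreasing(demands, server_capacity=100)
print(n_servers, placement)
```

Here total demand is 200 against a capacity of 100 per server, and the heuristic reaches the optimal two servers; in general first-fit-decreasing is only an approximation.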

Recently, page-table self-replication (PTSR) has been proposed to reduce the NUMA effect of page-table accesses for large-memory workloads on servers. However, PTSR may improve or hurt the performance of an application, depending on its own characteristics and those of its co-located applications. This is hard for users to know, yet current systems allow PTSR to be enabled or disabled only manually by users.

10.1145/3620665.3640369 article EN 2024-04-22

Recently, big data has evolved into a buzzword from academia to industry all over the world. Benchmarks are important tools for evaluating an IT system. However, benchmarking big data systems is much more challenging than ever before. First, big data systems are still in their infant stage and consequently not well understood. Second, they are complicated compared to previous systems such as single-node computing platforms. While some researchers have started to design benchmarks for big data systems, they do not consider the redundancy between benchmarks...

10.1109/bigdata.2013.6691707 article EN 2013-10-01

To effectively design a computer system for the worst-case power consumption scenario, architects often use hand-crafted maximum power consuming benchmarks at the assembly language level. These stressmarks, also called power viruses, are very tedious to generate and require significant domain knowledge. In this paper, we propose SYMPO, an automatic SYstem level Max POwer virus generation framework, which maximizes the power consumption of the CPU and memory using a genetic algorithm and an abstract workload generation framework. For a set of three ISAs, we show...

10.1145/1854273.1854282 article EN 2010-09-11
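The genetic-algorithm loop behind a power-virus generator like the one above can be sketched in miniature. The "power model" here is a made-up concave function over abstract workload knobs (stand-ins for properties such as ILP or memory-access intensity); real SYMPO evaluates candidates on a power estimation infrastructure, and none of the constants below come from the paper.

```python
import random

# Toy "power model": an abstract workload is a vector of knob settings,
# each in [0, 1]. This invented function simply peaks at a particular
# knob mix, so the GA has a well-defined maximum (4.0) to climb toward.
PEAK = [0.9, 0.7, 0.3, 0.8]

def power(workload):
    return sum(1.0 - abs(w - p) for w, p in zip(workload, PEAK))

def evolve(pop_size=30, generations=60, seed=1):
    """Truncation selection + one-point crossover + point mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in PEAK] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=power, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist: keep top half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(PEAK))     # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(len(child))         # mutate one knob
            child[i] = min(1.0, max(0.0, child[i] + rng.uniform(-0.1, 0.1)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=power)

best = evolve()
print([round(w, 2) for w in best], round(power(best), 3))
```

Because the top half of each generation survives unchanged, the best fitness is monotonically non-decreasing, which keeps the search stable even with the crude mutation operator.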

Recently, experiment-driven machine-learning (ML) based configuration tuning for in-memory data analytics frameworks such as Apache Spark has become popular because it can achieve high speedups. However, ML-based approaches naturally need a large number of iterations, and each iteration generates a configuration with a probabilistic strategy and executes the program on a real cluster with that configuration. It therefore takes a long time...

10.1109/tc.2024.3365937 article EN IEEE Transactions on Computers 2024-02-14

Graphics processing units (GPUs), due to their massive computational power with up to thousands of concurrent threads and general-purpose GPU (GPGPU) programming models such as CUDA and OpenCL, have opened up new opportunities for speeding up parallel applications. Unfortunately, pre-silicon architectural simulation of modern-day GPGPU architectures and workloads is extremely time-consuming. This paper addresses the challenge by proposing a framework, called GPGPU-MiniBench, for generating miniature, yet...

10.1109/tc.2015.2395427 article EN IEEE Transactions on Computers 2015-01-22

The increasingly popular fused batch-streaming big data framework, Apache Flink, has many performance-critical as well as untamed configuration parameters. However, how to tune them for optimal performance has not yet been explored. Machine learning (ML) has been chosen to tune the configurations of other frameworks (e.g., Spark), showing significant improvements. However, it needs a long time to collect a large amount of training data by nature. In this article, we propose a guided machine learning (GML) approach to tune Flink with a significantly shorter...

10.1109/tpds.2021.3081600 article EN IEEE Transactions on Parallel and Distributed Systems 2021-05-18

The continuous improvements offered by silicon technology enable the integration of an ever-increasing number of cores on a single chip. Following this trend, it is expected that microprocessor architectures composed of thousands of cores (i.e., kilo-core architectures) will appear in the near future. To cope with the demand for high-performance systems, many-core designs rely on integrated networks-on-chip to deliver the bandwidth and latency required for inter-core communications. In this context, simulation tools represent a crucial...

10.5555/2331751.2331760 article EN Annual Simulation Symposium 2012-03-26

Data analytics is at the foundation of both high-quality products and services in modern economies and societies. Big data workloads run on complex large-scale computing clusters, which implies significant challenges for deeply understanding and characterizing overall system performance. In general, performance is affected by many factors at multiple layers of the stack, hence it is challenging to identify the key metrics when characterizing a big data workload. In this paper, we propose a novel characterization methodology using an ensemble...

10.1109/tpds.2017.2758781 article EN IEEE Transactions on Parallel and Distributed Systems 2017-10-04

Parallel GPGPU applications rely on barrier synchronization to align the activity of warps within a thread block. Little prior work has studied and characterized barrier synchronization within a thread block and its impact on performance. In this paper, we find that barriers cause substantial stall cycles in barrier-intensive workloads, even though GPGPUs employ lightweight hardware support for barriers. To help investigate the reasons, we define the execution between two adjacent barriers of a thread block as a warp-phase. We find that progress within a warp-phase varies dramatically across warps, which we call...

10.1145/2925426.2926267 article EN 2016-06-01
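The stall cost that the abstract above attributes to barriers follows directly from barrier semantics: every warp in a block waits until the slowest warp arrives. A minimal sketch of that accounting, with invented arrival cycles rather than measured warp-phase data:

```python
def barrier_stall_cycles(warp_arrivals):
    """Total cycles warps spend waiting at a thread-block barrier:
    the barrier releases only when the slowest warp arrives, so each
    warp stalls for (release cycle - its own arrival cycle)."""
    release = max(warp_arrivals)
    return sum(release - t for t in warp_arrivals)

# Four warps of one block reaching a barrier at different cycles; the
# skew across warps within a warp-phase is what makes barriers costly.
arrivals = [100, 120, 180, 105]
print(barrier_stall_cycles(arrivals))
```

With these made-up numbers the single slow warp (cycle 180) forces 215 wasted cycles onto the other three, illustrating why variance in warp-phase progress, not barrier hardware cost, dominates.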

Recently, the trending 5G technology encourages extensive applications of on-device machine learning, which collects user data for model training. This requires cost-effective techniques to preserve the privacy and security of training within a resource-constrained environment. Traditional learning methods rely on trust among the participants of the system for security. However, as the system scale increases, maintaining every edge device's trustworthiness could be expensive. To cost-effectively establish trust in a trustless environment, this...

10.1016/j.sysarc.2021.102205 article EN cc-by-nc-nd Journal of Systems Architecture 2021-06-07

The number of cores per chip keeps increasing in order to improve performance while controlling the power. According to semiconductor roadmaps, future computing systems will reach the scale of 1 Tera devices in a single package. Firstly, such Tera-device systems will expose a large amount of parallelism that cannot be easily and efficiently exploited by current applications and programming models. Secondly, reliability will become a critical issue. Finally, we need to simplify the design of such systems. TERAFLUX aims at providing a framework based...

10.1016/j.procs.2011.09.081 article EN Procedia Computer Science 2011-01-01

Shared caches in chip multi-processors (CMPs) have important benefits, such as accelerating inter-core communication, yet the inherent cache contention among multiple processes on such architectures can significantly degrade performance. To address this problem, cache partitioning has been studied based on predicting the miss rate curve (MRC) of each of the concurrently running programs. On-line MRC prediction, however, either requires special hardware support or incurs a high overhead when conducted purely in software. This...

10.1109/ipdps.2012.121 article EN 2012-05-01
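Once miss rate curves are available, the partitioning step described above is simple: pick the way split that minimizes combined misses. The sketch below enumerates splits for two co-running programs; the MRCs are invented monotone curves, since the paper's actual contribution is predicting such curves cheaply online.

```python
def best_partition(mrc_a, mrc_b, total_ways):
    """Choose the cache-way split minimizing combined misses, given
    each program's miss rate curve (misses indexed by allocated ways;
    index 0 = 1 way). Exhaustive search is fine here because the
    number of ways is small."""
    best = None
    for ways_a in range(1, total_ways):
        ways_b = total_ways - ways_a
        misses = mrc_a[ways_a - 1] + mrc_b[ways_b - 1]
        if best is None or misses < best[2]:
            best = (ways_a, ways_b, misses)
    return best

# Program A keeps benefiting from extra ways; B saturates quickly,
# so the optimizer should hand most of the cache to A.
mrc_a = [900, 600, 400, 250, 150, 100, 80, 70]
mrc_b = [500, 450, 430, 420, 415, 412, 410, 409]
print(best_partition(mrc_a, mrc_b, total_ways=8))
```

With these curves the search gives A six of the eight ways, which is exactly the kind of asymmetric split that uniform (unpartitioned) sharing cannot guarantee under contention.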


10.1145/3296957.3173187 article EN ACM SIGPLAN Notices 2018-03-19

Modern processors typically provide a small number of hardware performance counters to capture a large number of microarchitecture events. These counters can easily generate a huge amount (e.g., GB or TB per day) of data, which we call big data, in cloud computing platforms with more than thousands of servers and millions of complex workloads running in a "24/7/365" manner. This data provides a precious foundation for root cause analysis of performance bottlenecks, architecture and compiler optimization, and many more. However, it is challenging to extract value...

10.1109/micro.2018.00056 article EN 2018-10-01

At present, Spark is widely used in a number of enterprises. Although much faster than Hadoop for some applications, its configuration parameters can have a great impact on its performance due to the large number of parameters, the interactions between them, and the various characteristics of applications as well. Unfortunately, there is not yet any research conducted to predict Spark performance based on configuration sets. In this paper, we employ a machine learning method, Support Vector Machine (SVM), to build performance models for Spark. The input sets are collected by running...

10.1109/ccbd.2016.034 article EN 2016-11-01
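The modeling setup above boils down to learning a mapping from configuration values to measured runtime. As a much simpler stand-in for the paper's SVM regression, the sketch below fits ordinary least squares to one hypothetical parameter; the parameter name, values, and runtimes are all invented, and serve only to show the shape of the training data.

```python
def fit_linear(xs, ys):
    """One-feature ordinary least squares (slope, intercept) -- a
    deliberately simple stand-in for SVM regression, fitting runtime
    as a function of a single configuration value."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical training set: spark.executor.memory (GB) vs runtime (s),
# shaped like the diminishing returns one might see from extra memory.
mem = [1, 2, 4, 8]
runtime = [400, 310, 260, 235]
slope, intercept = fit_linear(mem, runtime)
pred_6gb = slope * 6 + intercept   # predict an unmeasured configuration
print(round(slope, 2), round(intercept, 2), round(pred_6gb, 1))
```

An SVM (or any nonlinear model) earns its keep precisely where this linear fit fails: real Spark parameters interact, so runtime is not additive in the individual settings.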

Emerging GPU applications exhibit increasingly high computation demands, which has led manufacturers to build GPUs with an increasingly large number of streaming multiprocessors (SMs). Providing data to the SMs at high bandwidth puts significant pressure on the memory hierarchy and the Network-on-Chip (NoC). Current GPUs typically partition the memory-side last-level cache (LLC) into equally-sized slices that are shared by all SMs. Although a shared LLC results in a lower miss rate, we find that for workloads with low degrees of data sharing across SMs, a private...

10.1145/3307650.3322235 article EN 2019-06-14

Due to its decentralization, irreversibility, and traceability, blockchain has attracted significant attention and has been deployed in many critical industries such as banking and logistics. However, the micro-architecture characteristics of blockchain programs still remain unclear. What's worse, the large number of micro-architecture events makes understanding them extremely difficult. We even lack a systematic approach to identify which important events to focus on. In this paper, we propose a novel benchmarking methodology dubbed BBS to characterize blockchain programs at the micro-architecture level. The key...

10.1109/hpca47549.2020.00041 article EN 2020-02-01

Hadoop is the most popular implementation framework of the MapReduce programming model, and it has a number of performance-critical configuration parameters. However, manually setting these parameters to their optimal values not only requires in-depth knowledge of Hadoop as well as of the job itself, but also a large amount of time and effort. Automatic tuning approaches have therefore been proposed, but their usage is still quite limited due to intolerably long searching times. In this paper, we introduce MapreducE...

10.1109/access.2017.2672675 article EN cc-by IEEE Access 2017-01-01

Recently, graphics processing units (GPUs) have opened up new opportunities for speeding up general-purpose parallel applications due to their massive computational power and the hundreds of thousands of threads enabled by programming models such as CUDA. However, due to the serial nature of existing micro-architecture simulators, these massively parallel architectures and workloads need to be simulated sequentially. As a result, simulating GPGPU architectures with typical benchmarks and input data sets is extremely time-consuming. This paper...

10.1145/2465529.2465540 article EN 2013-06-17