Dazhao Cheng

ORCID: 0000-0003-2869-7623
Research Areas
  • Cloud Computing and Resource Management
  • IoT and Edge/Fog Computing
  • Caching and Content Delivery
  • Advanced Neural Network Applications
  • Parallel Computing and Optimization Techniques
  • Privacy-Preserving Technologies in Data
  • Advanced Data Storage Technologies
  • Software-Defined Networks and 5G
  • Topic Modeling
  • Stochastic Gradient Optimization Techniques
  • Blockchain Technology Applications and Security
  • Software System Performance and Reliability
  • Graph Theory and Algorithms
  • Brain Tumor Detection and Classification
  • Nuclear Materials and Properties
  • Cryptography and Data Security
  • Generative Adversarial Networks and Image Synthesis
  • Distributed and Parallel Computing Systems
  • Distributed systems and fault tolerance
  • Data Stream Mining Techniques
  • Natural Language Processing Techniques
  • Nuclear reactor physics and engineering
  • Privacy, Security, and Data Protection
  • Green IT and Sustainability
  • Advanced Graph Neural Networks

Wuhan University
2022-2025

Northwestern Polytechnical University
2024

University of North Carolina at Charlotte
2016-2021

Weatherford College
2021

Flint Institute Of Arts
2021

University of Colorado Colorado Springs
2013-2016

Datacenter-scale clusters are evolving toward heterogeneous hardware architectures due to continuous server replacement. Meanwhile, datacenters are commonly shared by many users for quite different uses. They often exhibit significant performance heterogeneity due to multi-tenant interference. The deployment of MapReduce on such clusters presents challenges in achieving good application performance compared to in-house dedicated clusters. As most MapReduce implementations were originally designed for homogeneous environments, heterogeneity can cause...

10.1109/tpds.2016.2594765 article EN publisher-specific-oa IEEE Transactions on Parallel and Distributed Systems 2016-07-27

Hadoop is a popular implementation of the MapReduce framework for running data-intensive jobs on clusters of commodity servers. Shuffle, the all-to-all input data fetching phase between the map and reduce phases, can significantly affect job performance. However, the shuffle and reduce phases are coupled together in Hadoop, and shuffle can only be performed by reduce tasks. This leaves the potential parallelism between multiple waves of map and reduce unexploited and resource wastage...

10.1109/tpds.2016.2587645 article EN publisher-specific-oa IEEE Transactions on Parallel and Distributed Systems 2016-07-07

The deployment of MapReduce in datacenters and clouds presents several challenges in achieving good job performance. Compared to in-house dedicated clusters, datacenters and clouds often exhibit significant hardware and performance heterogeneity due to continuous server replacement and multi-tenant interference. As most MapReduce implementations assume homogeneous clusters, heterogeneity can cause load imbalance in task execution, leading to poor performance and low cluster utilization. Despite existing optimizations on task scheduling and load balancing, MapReduce still performs poorly...

10.1145/2663165.2666089 article EN 2014-01-01

As Hadoop is becoming increasingly popular in large-scale data analysis, there is a growing need for providing predictable services to users who have strict requirements on job completion times. While earliest deadline first (EDF)-like scheduling algorithms can guarantee deadlines in real-time systems, they are not effective in a dynamic environment, i.e., a cluster with dynamically available resources. A number of clusters are deployed on hybrid systems, e.g., infrastructure powered by a mix of traditional and renewable...

10.1109/ipdps.2015.36 article EN 2015-05-01

While Hadoop ecosystems become increasingly important for practitioners of large-scale data analysis, they also incur tremendous energy cost. This trend is driving up the need for designing energy-efficient Hadoop clusters in order to reduce the operational costs and the carbon emission associated with their energy consumption. However, despite extensive studies of the problem, existing approaches to energy efficiency have not fully considered the heterogeneity of both the workload and the machine hardware found in production environments. In this paper, we...

10.1109/tpds.2017.2745571 article EN publisher-specific-oa IEEE Transactions on Parallel and Distributed Systems 2017-08-28

While MapReduce is inherently designed for batch and high-throughput processing workloads, there is an increasing demand for non-batch processes on big data, e.g., interactive jobs, real-time queries, and stream computations. The emerging Apache Spark fills this gap; it can run on an established Hadoop cluster and take advantage of the existing HDFS. As a result, the Spark-on-YARN deployment model is widely applied by many industry leaders. However, we identify three key challenges in deploying Spark on YARN: inflexible...

10.1109/tc.2017.2669964 article EN publisher-specific-oa IEEE Transactions on Computers 2017-02-15

Recent advances in deep neural networks (DNNs) have substantially improved the accuracy and speed of various intelligent applications. Nevertheless, one obstacle is that DNN inference imposes a heavy computation burden on end devices, but offloading inference tasks to the cloud causes a large volume of data transmission. Motivated by the fact that the data size of some intermediate DNN layers is significantly smaller than that of the raw input data, we designed DNN surgery, which allows a partitioned DNN to be processed at both the edge and the cloud while limiting the data transmission. The challenge...

10.1109/tcc.2023.3258982 article EN IEEE Transactions on Cloud Computing 2023-03-20

Streaming data analytics has become increasingly vital in many applications such as dynamic content delivery (e.g., advertisements), Twitter sentiment analysis, and security event processing (e.g., intrusion detection systems, spam filters). Emerging stream processing systems, such as Spark Streaming, treat the continuous stream as a series of micro-batches of data and continuously process these micro-batch jobs. Such micro-batch based stream processing provides several advantages over traditional stream processing systems, which process streaming data one record at a time, including fast recovery from failures,...

10.1109/infocom.2017.8057206 article EN IEEE INFOCOM 2017 - IEEE Conference on Computer Communications 2017-05-01

Today, enterprises have massive stream data that must be processed in real time due to the data explosion of recent years. Spark Streaming, as an emerging system, is developed to process stream data analytics by using a micro-batch approach. The unified programming model of Spark Streaming leads to some unique benefits over other traditional streaming systems, such as fast recovery from failures, better load balancing, and resource usage. It treats the continuous stream as a series of micro-batches of data and continuously processes these micro-batch jobs. However, efficient...

10.1109/tpds.2018.2846234 article EN IEEE Transactions on Parallel and Distributed Systems 2018-06-12

During the past few years, serverless computing has changed the paradigm of application development and deployment in the cloud and at the edge due to its unique advantages, including easy administration, automatic scaling, built-in fault tolerance, etc. Nevertheless, serverless computing is also facing challenges such as the long latency of cold start. In this paper, we present an in-depth performance analysis of cold start in the serverless framework and propose HotC, a container-based runtime management framework that leverages lightweight containers to mitigate cold start and improve...

10.1109/cluster48925.2021.00018 article EN 2021-09-01

While major cloud service operators have taken various initiatives to operate their sustainable datacenters with green energy, it is challenging to effectively utilize the green energy since its generation depends on dynamic natural conditions. Fortunately, the geographical distribution of datacenters provides an opportunity for optimizing system performance by distributing cloud workloads. In this paper, we propose a holistic heterogeneity-aware cloud workload placement and migration approach, sCloud, that aims to maximize good...

10.1109/ipdps.2014.41 article EN 2014-05-01

As MapReduce is becoming ubiquitous in large-scale data analysis, many recent studies have shown that the performance of MapReduce could be improved by different job scheduling approaches, e.g., Fair Scheduler and Capacity Scheduler. However, most existing job schedulers focus on the scenario in which the cluster is stable and pay little attention to clusters with dynamic resource availability. In fact, cluster resources may fluctuate, as there is a growing number of Hadoop clusters deployed on hybrid systems, e.g., infrastructure powered by a mix of traditional...

10.1109/tpds.2018.2873373 article EN IEEE Transactions on Parallel and Distributed Systems 2018-10-01

Temporal Graph Neural Networks (TGNNs) extend the success of Graph Neural Networks to dynamic graphs. Distributed TGNN training requires efficiently tackling temporal dependency, which often leads to excessive cross-device communication that generates significant redundant data. However, existing systems are unable to remove the redundancy in data reuse and transfer, and suffer from severe communication overhead in a distributed setting. This paper presents Sven, an algorithm and system co-designed library for end-to-end performance...

10.1145/3588195.3592990 article EN 2023-08-07

Function-as-a-Service (FaaS) is a promising cloud computing model known for its scalability and elasticity. In various application domains, FaaS workflows have been widely adopted to manage user requests and complete computational tasks efficiently. Motivated by the fact that function containers collaboratively use the image layer's memory, so that co-placing functions would leverage memory sharing to reduce the cluster memory footprint, this paper studies layer-wise memory sharing for serverless functions. We find overwhelming placing in...

10.1109/tpds.2024.3391858 article EN IEEE Transactions on Parallel and Distributed Systems 2024-04-22

The cost of powering servers, storage platforms, and related cooling systems has become a major component of the operational costs in big data deployments. Hence, the design of energy-efficient Hadoop clusters has attracted significant research attention in recent years. However, existing studies do not consider the impact of the complex interplay between workload and hardware heterogeneity on energy efficiency. In this paper, we find that heterogeneity-oblivious task assignment approaches are detrimental to both...

10.1109/icdcs.2015.44 article EN 2015-06-01