- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Caching and Content Delivery
- Advanced Data Storage Technologies
- Software-Defined Networks and 5G
- IoT and Edge/Fog Computing
- Graph Theory and Algorithms
- Peer-to-Peer Network Technologies
- Scientific Computing and Data Management
- Interconnection Networks and Systems
- Distributed Systems and Fault Tolerance
- Data Stream Mining Techniques
- Parallel Computing and Optimization Techniques
- Advanced Database Systems and Queries
- Software System Performance and Reliability
- Gamma-Ray Bursts and Supernovae
- Advanced Queuing Theory Analysis
- Advanced Malware Detection Techniques
- Multimedia Communication and Technology
- Online Learning and Analytics
- Wireless Communication Networks Research
- Scheduling and Optimization Algorithms
- Digital and Cyber Forensics
- Network Security and Intrusion Detection
- Stellar, Planetary, and Galactic Studies
LinkedIn (United States), 2021
Meta (Israel), 2018-2019
Meta (United States), 2019
Microsoft (United States), 2012-2018
Microsoft Research (United Kingdom), 2013-2018
University of Southern California, 2017
Southern California University for Professional Studies, 2017
National Institute of Technology Karnataka, 2015
Yahoo (United States), 2012
Microsoft (Finland), 2012
Tasks in modern data-parallel clusters have highly diverse resource requirements along CPU, memory, disk and network. Any of these resources may become a bottleneck and hence, the likelihood of wasting resources due to fragmentation is now larger. Today's schedulers do not explicitly reduce fragmentation. Worse, since they only allocate cores and memory, the resources that they ignore (disk and network) can be over-allocated, leading to interference and failures, or to hogging of cores or memory that could have been used by other tasks. We present Tetris, a cluster scheduler...
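A minimal sketch of dot-product-based multi-resource packing in the spirit of Tetris: score each feasible (task, machine) pair by how well the task's demand vector aligns with the machine's free capacity, and pick the best. The task/machine dictionaries and field names below are illustrative, not Tetris's actual data model.

```python
# Multi-resource packing sketch (assumed shapes, not Tetris's real API).
RESOURCES = ("cpu", "mem", "disk", "net")

def alignment(task_demand, machine_free):
    """Dot product of demand and free capacity: higher means the task
    fits along the machine's spare resources with less fragmentation."""
    return sum(task_demand[r] * machine_free[r] for r in RESOURCES)

def fits(task_demand, machine_free):
    return all(task_demand[r] <= machine_free[r] for r in RESOURCES)

def pick_task(pending_tasks, machine_free):
    """Choose the pending task best aligned with this machine's
    remaining capacity across all four resource dimensions."""
    feasible = [t for t in pending_tasks if fits(t, machine_free)]
    return max(feasible, key=lambda t: alignment(t, machine_free), default=None)

# Example: a machine with spare network capacity favors a network-heavy task.
machine = {"cpu": 4, "mem": 8, "disk": 2, "net": 10}
tasks = [{"cpu": 2, "mem": 4, "disk": 1, "net": 1},
         {"cpu": 1, "mem": 2, "disk": 1, "net": 8}]
print(pick_task(tasks, machine))  # the network-heavy task wins
```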
Modern resource management frameworks for large-scale analytics leave unresolved the problematic tension between high cluster utilization and jobs' performance predictability--respectively coveted by operators and users. We address this in Morpheus, a new system that: 1) codifies implicit user expectations as explicit Service Level Objectives (SLOs), inferred from historical data, 2) enforces SLOs using novel scheduling techniques that isolate jobs from sharing-induced variability, and 3) mitigates...
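A deliberately simplified illustration of the first idea, deriving an explicit SLO from history: take a high percentile of a recurring job's past completion times as its deadline. Morpheus itself infers SLOs through more careful analysis of historical runs; this conveys only the flavor.

```python
# Toy SLO inference from past runs (percentile rule is an assumption).
def inferred_slo(past_completion_minutes, percentile=0.95):
    ordered = sorted(past_completion_minutes)
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    return ordered[idx]

history = [42, 45, 44, 47, 43, 61, 44]   # minutes, one per past run
print(inferred_slo(history))  # a deadline nearly all past runs met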
To reduce the impact of network congestion on big data jobs, cluster management frameworks use various heuristics to schedule compute tasks and/or network flows. Most of these schedulers consider the job input as fixed and greedily schedule the tasks and flows that are ready to run. However, a large fraction of production jobs are recurring with predictable characteristics, which allows us to plan ahead for them. Coordinating the placement of data and tasks of these jobs allows for significantly improving their locality and freeing up bandwidth, which can be used by other jobs running on the cluster. With this...
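A toy sketch of the planning-ahead idea: because recurring jobs are predictable, each job's input data and compute can be assigned to the same racks offline rather than greedily at run time. The real planner is far more sophisticated; the job names, sizes, and rack count here are made up.

```python
# Offline rack planning sketch for recurring jobs (illustrative inputs).
def plan_racks(jobs, num_racks):
    """Greedily spread recurring jobs across racks so each job's input
    and tasks land together (good locality) while balancing rack load."""
    load = [0] * num_racks
    placement = {}
    for name, size in sorted(jobs.items(), key=lambda kv: -kv[1]):
        rack = load.index(min(load))    # least-loaded rack
        placement[name] = rack          # store data AND run tasks here
        load[rack] += size
    return placement

print(plan_racks({"hourly_etl": 8, "daily_report": 5, "scrub": 3}, num_racks=2))
```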
The Quantcast File System (QFS) is an efficient alternative to the Hadoop Distributed File System (HDFS). QFS is written in C++, is plugin-compatible with Hadoop MapReduce, and offers several efficiency improvements relative to HDFS: 50% disk space savings through erasure coding instead of replication, a resulting doubling of write throughput, a faster name node, support for faster sorting and logging through a concurrent append feature, a native command line client that is much faster than hadoop fs, and global feedback-directed I/O device management. As QFS works out...
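The arithmetic behind the headline numbers, assuming Reed-Solomon (6, 3) encoding (QFS's default): 6 data chunks plus 3 parity chunks means 1.5x the raw bytes on disk, versus 3x for HDFS-style triple replication, which yields the 50% space savings and roughly halves the bytes written per block.

```python
# Storage-overhead arithmetic for erasure coding vs. replication.
def storage_overhead(data_chunks, parity_chunks):
    return (data_chunks + parity_chunks) / data_chunks

rs_6_3 = storage_overhead(6, 3)      # 1.5x raw bytes
replication_3x = 3.0                 # three full copies
savings = 1 - rs_6_3 / replication_3x
print(f"{savings:.0%} disk space saved")  # 50% -- and about half the
# bytes written, hence the "resulting doubling of write throughput"
```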
In recent years, there has been an explosion of large-scale real-time analytics needs, and a plethora of streaming systems have been developed to support such applications. These systems are able to continue stream processing even when faced with hardware and software failures. However, they do not address some crucial challenges facing their operators: the manual, time-consuming and error-prone tasks of tuning various configuration knobs to achieve service level objectives (SLOs), as well as the maintenance of SLOs in the face of sudden,...
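A bare-bones feedback loop of the kind such self-tuning systems automate: observe an SLO metric, diagnose, and adjust a knob. The metric name, thresholds, and doubling/halving policy below are invented for illustration; real systems use richer diagnosers and resolvers.

```python
# Toy SLO control loop (policy and parameters are assumptions).
def control_step(observed_p99_ms, slo_p99_ms, parallelism, max_parallelism=64):
    if observed_p99_ms > slo_p99_ms:          # SLO violated: scale up
        return min(max_parallelism, parallelism * 2)
    if observed_p99_ms < 0.5 * slo_p99_ms:    # ample headroom: scale down
        return max(1, parallelism // 2)
    return parallelism                        # within band: hold steady

p = 4
for latency in (180, 250, 320, 90, 40):       # simulated p99 samples (ms)
    p = control_step(latency, slo_p99_ms=200, parallelism=p)
    print(latency, "->", p)
```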
The continuous shift towards data-driven approaches to business, and a growing attention to improving return on investment (ROI) for cluster infrastructures, is generating new challenges for big-data frameworks. Systems originally designed for big batch jobs now handle an increasingly complex mix of computations. Moreover, they are expected to guarantee stringent SLAs for production jobs and to minimize latency for best-effort jobs.
We present a new cluster scheduler, GRAPHENE, aimed at jobs that have a complex dependency structure and heterogeneous resource demands. Relaxing either of these challenges, i.e., scheduling a DAG of homogeneous tasks or an independent set of heterogeneous tasks, leads to NP-hard problems. Reasonable heuristics exist for these simpler problems, but they perform poorly when scheduling heterogeneous DAGs. Our key insights are: (1) focus on the long-running tasks and those with tough-to-pack resource demands, and (2) compute the schedule, offline, by first placing such troublesome...
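A toy rendition of that first insight: identify "troublesome" tasks (long-running or hard to pack) and lay them out before the rest. Real GRAPHENE builds a full offline schedule that respects DAG dependencies; this sketch ignores dependencies, and the thresholds and field names are illustrative.

```python
# Troublesome-first ordering sketch (thresholds are assumptions).
def troublesome(task, dur_thresh=100, demand_thresh=0.8):
    return task["duration"] > dur_thresh or task["peak_demand"] > demand_thresh

def schedule_order(tasks):
    hard = [t for t in tasks if troublesome(t)]
    easy = [t for t in tasks if not troublesome(t)]
    # Place the hard tasks first (longest first), then backfill the rest.
    hard.sort(key=lambda t: -t["duration"])
    easy.sort(key=lambda t: -t["duration"])
    return hard + easy

tasks = [{"name": "a", "duration": 300, "peak_demand": 0.2},
         {"name": "b", "duration": 20,  "peak_demand": 0.9},
         {"name": "c", "duration": 15,  "peak_demand": 0.1}]
print([t["name"] for t in schedule_order(tasks)])  # ['a', 'b', 'c']
```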
Job scheduling in Big Data clusters is crucial both for cluster operators' return on investment and for overall user experience. In this context, we observe several anomalies in how modern schedulers manage queues, and argue that maintaining queues of tasks at worker nodes has significant benefits. On one hand, centralized approaches do not use worker-side queues. Given the inherent feedback delays that these systems incur, they achieve suboptimal utilization, particularly for workloads dominated by short...
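A minimal sketch of why worker-side queues help: with a short bounded queue, the worker always has the next task ready when one finishes, instead of idling for a scheduler round trip. The queue bound and task representation are illustrative.

```python
# Bounded worker-side queue sketch (shapes are assumptions).
from collections import deque

class Worker:
    def __init__(self, queue_bound=3):
        self.queue = deque()
        self.bound = queue_bound

    def has_room(self):                 # scheduler can push ahead of time
        return len(self.queue) < self.bound

    def enqueue(self, task):
        assert self.has_room()
        self.queue.append(task)

    def next_task(self):
        # No idle gap: the next task was queued before the last finished.
        return self.queue.popleft() if self.queue else None

w = Worker()
for t in ("t1", "t2", "t3"):
    w.enqueue(t)
print(w.next_task(), w.has_room())      # t1 True -- room for a refill
```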
Query optimizers are notorious for inaccurate cost estimates, leading to poor performance. The root of the problem lies in cardinality estimation, i.e., the size of intermediate (and final) results in a query plan. These estimates also determine the resources consumed in modern shared cloud infrastructures. In this paper, we present CardLearner, a machine learning based approach to learn cardinality models from previous job executions and use them to predict the cardinalities of future jobs. The key intuition in our approach is that workloads are often recurring...
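A stripped-down version of the learned-cardinality idea: for recurring subexpressions, remember the observed selectivity (output rows / input rows) from past runs and reuse it for future estimates. CardLearner trains actual ML models per subexpression template; this shows only the intuition, and the signature strings are invented.

```python
# Learned-selectivity sketch (signature format is an assumption).
from collections import defaultdict

class SelectivityModel:
    def __init__(self):
        self.samples = defaultdict(list)   # subexpression signature -> ratios

    def observe(self, signature, in_rows, out_rows):
        self.samples[signature].append(out_rows / in_rows)

    def estimate(self, signature, in_rows, default_sel=0.1):
        ratios = self.samples.get(signature)
        sel = sum(ratios) / len(ratios) if ratios else default_sel
        return int(in_rows * sel)

m = SelectivityModel()
m.observe("scan(logs)|filter(region)", 1_000_000, 52_000)
m.observe("scan(logs)|filter(region)", 2_000_000, 101_000)
print(m.estimate("scan(logs)|filter(region)", 5_000_000))  # ~256k rows
```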
Distributed systems are easier to build than ever with the emergence of new, data-centric abstractions for storing and computing over massive datasets. However, similar abstractions do not exist for accessing meta-data. To fill this gap, Tango provides developers with the abstraction of a replicated, in-memory data structure (such as a map or a tree) backed by a shared log. Tango objects are easy to build and use, replicating state via simple append and read operations on the shared log instead of complex distributed protocols; in the process, they obtain properties...
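A miniature log-backed map in the style the abstract describes: mutations are appends to a shared log, and a replica serves reads by replaying the log to its tail. The in-process list stands in for the real shared log service; class names are mine, not Tango's API.

```python
# Log-backed replicated map sketch (all names are illustrative).
class SharedLog:
    def __init__(self):
        self.entries = []
    def append(self, entry):
        self.entries.append(entry)
    def read_from(self, pos):
        return self.entries[pos:]

class LogBackedMap:
    def __init__(self, log):
        self.log, self.state, self.pos = log, {}, 0
    def put(self, k, v):
        self.log.append(("put", k, v))    # replicate via append, no protocol
    def get(self, k):
        for op, key, val in self.log.read_from(self.pos):   # sync to tail
            self.state[key] = val
        self.pos = len(self.log.entries)
        return self.state.get(k)

log = SharedLog()
a, b = LogBackedMap(log), LogBackedMap(log)   # two replicas, one log
a.put("owner", "alice")
print(b.get("owner"))                          # "alice" -- b replayed the log
```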
In this paper, we present Sailfish, a new Map-Reduce framework for large scale data processing. The Sailfish design is centered around aggregating intermediate data, specifically the data produced by map tasks and consumed later by reduce tasks, to improve performance by batching disk I/O. We introduce an abstraction called I-files for supporting data aggregation, and describe how we implemented it as an extension of the distributed filesystem, to efficiently batch data written by multiple writers and read by multiple readers. Sailfish adapts the intermediate data handling layer in Hadoop...
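A sketch of the batching idea behind I-files: records from many writers are buffered and flushed to disk in large sequential chunks, instead of each writer issuing its own small I/O. The flush threshold and class shape are illustrative, not the actual I-file implementation.

```python
# Batched multi-writer append sketch (flush policy is an assumption).
class IFile:
    def __init__(self, flush_bytes=1 << 20):   # flush in ~1 MB batches
        self.buffer, self.buffered, self.chunks = [], 0, []
        self.flush_bytes = flush_bytes

    def append(self, record: bytes):            # called by many map tasks
        self.buffer.append(record)
        self.buffered += len(record)
        if self.buffered >= self.flush_bytes:
            self.flush()

    def flush(self):
        # One large sequential write replaces many small random ones.
        self.chunks.append(b"".join(self.buffer))
        self.buffer, self.buffered = [], 0

f = IFile(flush_bytes=10)
for rec in (b"abcd", b"efgh", b"ijkl"):
    f.append(rec)
print(len(f.chunks), f.buffered)   # one flushed chunk, 0 bytes pending
```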
We observe significant overlaps in the computations performed by user jobs in modern shared analytics clusters. Naïvely computing the same subexpressions multiple times results in wasted cluster resources and longer execution times. Given that these workloads consist of tens of thousands of jobs, identifying overlapping computations across jobs is of great interest to both operators and users. Nevertheless, existing approaches support workloads that are orders of magnitude smaller, or employ heuristics with limited effectiveness. In this paper, we focus...
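One common way to detect such overlaps, shown in miniature: give every subplan a canonical signature and count signatures across jobs; any signature seen in more than one job is a reuse candidate. The nested-tuple plan representation and hashing scheme are illustrative, not the paper's.

```python
# Shared-subexpression detection sketch (plan encoding is an assumption).
import hashlib
from collections import defaultdict

def signatures(plan):
    """Yield a stable signature for each subtree of a nested-tuple plan
    like ("join", ("scan", "A"), ("filter", ("scan", "B")))."""
    yield hashlib.sha1(repr(plan).encode()).hexdigest()[:10]
    for child in plan[1:]:
        if isinstance(child, tuple):
            yield from signatures(child)

jobs = {
    "job1": ("join", ("scan", "A"), ("filter", ("scan", "B"))),
    "job2": ("agg",  ("filter", ("scan", "B"))),
}
seen = defaultdict(set)
for job, plan in jobs.items():
    for sig in signatures(plan):
        seen[sig].add(job)
print([sig for sig, js in seen.items() if len(js) > 1])  # shared subplans
```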
Data-intensive computing (DISC) frameworks scale by partitioning a job across a set of fault-tolerant tasks, then diffusing those tasks across large clusters. Multi-tenanted clusters must accommodate service-level objectives (SLOs) in their resource model, often expressed as a maximum latency for allocating the desired resources to every job. When jobs are partitioned into tasks statically, the cluster cannot meet its SLOs while maintaining both high utilization and efficiency. Ideally, we want to give jobs more resources when they...
The rise in popularity of machine learning, streaming, and latency-sensitive online applications in shared production clusters has raised new challenges for cluster schedulers. To optimize their performance and resilience, these applications require precise control of their placements, by means of complex constraints, e.g., to collocate or separate their long-running containers across groups of nodes. In the presence of these applications, the scheduler must attain global optimization objectives, such as maximizing the number of deployed applications or minimizing...
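A minimal encoding of the kind of placement constraints described here: affinity (collocate two containers' node groups) and anti-affinity (keep them apart). The constraint tuples, container names, and rack groups are illustrative; real schedulers solve a global optimization over many such constraints.

```python
# Affinity / anti-affinity constraint check (shapes are assumptions).
def satisfied(constraint, placement, node_group):
    """placement: container -> node; node_group: node -> group (e.g. rack)."""
    kind, c1, c2 = constraint
    g1, g2 = node_group[placement[c1]], node_group[placement[c2]]
    return g1 == g2 if kind == "affinity" else g1 != g2

node_group = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}
placement = {"hbase-master": "n1", "hbase-region": "n2", "backup": "n3"}
constraints = [("affinity", "hbase-master", "hbase-region"),
               ("anti-affinity", "hbase-master", "backup")]
print(all(satisfied(c, placement, node_group) for c in constraints))  # True
```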
Stream-processing workloads and modern shared cluster environments exhibit high variability and unpredictability. Combined with the large parameter space and the diverse set of user SLOs, this makes streaming systems very challenging to statically configure and tune. To address these issues, in this paper we investigate a novel control-plane design, Chi, which supports continuous monitoring and feedback, and enables dynamic re-configuration. Chi leverages the key insight of embedding control-plane messages in the data-plane channels to achieve...
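The key insight in miniature: control messages ride the same ordered channels as data, so each operator applies a reconfiguration exactly at the point in the stream where it was injected. The operator and message shapes below are illustrative.

```python
# In-band control message sketch (message format is an assumption).
class FilterOperator:
    def __init__(self, threshold):
        self.threshold = threshold

    def process(self, msg):
        kind, payload = msg
        if kind == "control":             # in-band reconfiguration
            self.threshold = payload["threshold"]
            return None
        return payload if payload >= self.threshold else None

op = FilterOperator(threshold=10)
channel = [("data", 5), ("data", 12),
           ("control", {"threshold": 3}),  # flows with the data
           ("data", 5)]
print([out for m in channel if (out := op.process(m)) is not None])
# [12, 5] -- the second 5 passes because the new config took effect in order
```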
Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for data analytics, be it in a cloud environment or within enterprises. In this setting, users are not required to manage or tune their hardware and software infrastructure, and they pay only for the processing resources consumed per job. However, the shared nature of these services across several users and teams leads to significant overlaps in partial computations, i.e., parts of the processing are duplicated across multiple jobs, thus generating redundant costs. In this paper, we...
Walnut is an object-store being developed at Yahoo! with the goal of serving as a common low-level storage layer for a variety of cloud data management systems, including Hadoop (a MapReduce system), MObStor (a multimedia serving system), and PNUTS (an extended key-value system). Thus, a key performance challenge is to meet the latency and throughput requirements of the wide range of workloads commonly observed across these diverse systems. The motivation is to leverage a carefully optimized low-level storage system, with support for elasticity and high-availability, across all...
Large inter-datacenter transfers are crucial for cloud service efficiency and are increasingly used by organizations that have dedicated wide area networks between datacenters. A recent work uses multicast forwarding trees to reduce the bandwidth needs and improve the completion times of point-to-multipoint transfers. Using a single tree per transfer, however, leads to poor performance because the slowest receiver dictates the completion time for all receivers. Using multiple trees per transfer alleviates this concern--the average receiver could finish...
Using multiple datacenters allows for higher availability, load balancing and reduced latency to customers of cloud services. To distribute copies of data, providers depend on inter-datacenter WANs that ought to be used efficiently, considering their limited capacity and the ever-increasing data demands. In this paper, we focus on applications that transfer objects from one datacenter to several datacenters over dedicated networks. We present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses forwarding trees...
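A toy version of load-aware forwarding-tree selection for P2MP transfers: among candidate trees connecting the source to all receivers, pick the one whose edges are currently least loaded, then account for the new transfer. The topology, tree enumeration, and cost function are illustrative; DCCast's actual weight assignment differs in detail.

```python
# Load-aware P2MP tree selection sketch (inputs are assumptions).
def pick_tree(candidate_trees, edge_load, volume):
    best = min(candidate_trees, key=lambda t: sum(edge_load[e] for e in t))
    for e in best:                        # one stream serves all receivers
        edge_load[e] += volume
    return best

edge_load = {("s", "a"): 10, ("s", "b"): 2, ("a", "b"): 1, ("b", "a"): 0}
trees = [[("s", "a"), ("a", "b")],       # reach a and b via a
         [("s", "b"), ("b", "a")]]       # reach a and b via b
print(pick_tree(trees, edge_load, volume=5))  # the lighter tree via b
```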
Twitter's data centers process billions of events per day the instant the data is generated. To achieve real-time performance, Twitter has developed Heron, a streaming engine that provides unparalleled performance at large scale. Heron has been recently open-sourced and is thus now accessible to various other organizations. In this paper, we discuss the challenges we faced when transforming Heron from a system tailored for Twitter's applications and software stack to one that efficiently handles applications with diverse characteristics on top of various Big Data...