Gregory R. Ganger

ORCID: 0000-0002-3065-7316
Research Areas
  • Advanced Data Storage Technologies
  • Distributed Systems and Fault Tolerance
  • Parallel Computing and Optimization Techniques
  • Caching and Content Delivery
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Peer-to-Peer Network Technologies
  • Software System Performance and Reliability
  • Advanced Database Systems and Queries
  • Cloud Data Security Solutions
  • Advanced Malware Detection Techniques
  • Network Security and Intrusion Detection
  • Algorithms and Data Compression
  • Advanced Neural Network Applications
  • Data Management and Algorithms
  • Service-Oriented Architecture and Web Services
  • Scientific Computing and Data Management
  • Network Traffic and Congestion Control
  • Stochastic Gradient Optimization Techniques
  • Data Quality and Management
  • Interconnection Networks and Systems
  • User Authentication and Security Systems
  • IoT and Edge/Fog Computing
  • Security and Verification in Computing
  • Cloud Computing and Remote Desktop Technologies

Carnegie Mellon University
2016-2025

University of Toronto
2003

University of Michigan
1992-2002

Hewlett-Packard (United States)
2000

Hitachi (Japan)
2000

Intel (United States)
2000

Infineon Technologies (United States)
2000

Micro Focus (United States)
2000

United States International Trade Commission
1996

To better understand the challenges in developing effective cloud-based resource schedulers, we analyze the first publicly available trace data from a sizable multi-purpose cluster. The most notable workload characteristic is heterogeneity: in resource types (e.g., cores:RAM per machine) and their usage (e.g., duration and resources needed). Such heterogeneity reduces the effectiveness of traditional slot- and core-based scheduling. Furthermore, some tasks are constrained as to the kind of machine they can use, increasing the complexity...

10.1145/2391229.2391236 article EN 2012-10-14
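The scheduling pitfall described above can be illustrated with a toy admission check. This is a minimal sketch, not code from the paper; all function names and numbers are made up for illustration:

```python
def admit_slot_based(free_cores, free_ram_gb, task_cores, task_ram_gb):
    # Slot/core-based scheduling: a task "fits" if a core slot is free,
    # implicitly assuming all tasks and machines look alike.
    return task_cores <= free_cores

def admit_multi_resource(free_cores, free_ram_gb, task_cores, task_ram_gb):
    # Heterogeneity-aware check: every resource dimension must fit.
    return task_cores <= free_cores and task_ram_gb <= free_ram_gb

# A RAM-heavy task on a machine with cores to spare but little RAM left:
slot_ok = admit_slot_based(free_cores=4, free_ram_gb=2,
                           task_cores=1, task_ram_gb=8)
multi_ok = admit_multi_resource(free_cores=4, free_ram_gb=2,
                                task_cores=1, task_ram_gb=8)
```

The slot-based check would admit the task and silently overcommit RAM; a multi-dimensional check rejects it, which is one reason heterogeneous workloads defeat slot counting.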

DNN training is extremely time-consuming, necessitating efficient multi-accelerator parallelization. Current approaches to parallelizing training primarily use intra-batch parallelization, where a single iteration of training is split over the available workers, but suffer from diminishing returns at higher worker counts. We present PipeDream, a system that adds inter-batch pipelining to intra-batch parallelism to further improve parallel training throughput, helping to better overlap computation with communication and reduce the amount of communication when...

10.1145/3341301.3359646 article EN 2019-10-21
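As a rough intuition for why inter-batch pipelining helps, a back-of-envelope model (not PipeDream's actual schedule; stage counts and times here are invented) compares strictly sequential stage execution with a filled pipeline:

```python
def pipeline_time(num_stages, num_microbatches, stage_time):
    # Without pipelining: each microbatch passes through all stages
    # before the next one starts.
    sequential = num_stages * num_microbatches * stage_time
    # With inter-batch pipelining: after a fill phase of (num_stages - 1)
    # steps, every stage works on a different microbatch each step.
    pipelined = (num_stages + num_microbatches - 1) * stage_time
    return sequential, pipelined

seq, pipe = pipeline_time(num_stages=4, num_microbatches=8, stage_time=1.0)
# seq = 32.0 steps, pipe = 11.0 steps
```

The gap widens as the number of in-flight microbatches grows, which is the throughput headroom pipelining exploits.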

This paper presents a practical solution to the problem facing high-fan-in, high-bandwidth synchronized TCP workloads in datacenter Ethernets---the incast problem. In these networks, receivers can experience a drastic reduction in application throughput when simultaneously requesting data from many servers using TCP. Inbound data overfills small switch buffers, leading to TCP timeouts lasting hundreds of milliseconds. For applications that have a barrier synchronization requirement (e.g., filesystem reads and parallel...

10.1145/1592568.1592604 article EN 2009-08-16
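A toy model of the collapse: if the synchronized burst from too many servers overflows the switch buffer, the barrier stalls for a retransmission timeout. All parameters below are hypothetical, chosen only to show the shape of the effect:

```python
def goodput_mbps(num_servers, block_mb=1.0, link_mbps=1000.0,
                 buffer_fits=8, rto_ms=200.0):
    # Ideal time to move the whole block over the bottleneck link.
    transfer_ms = block_mb * 8 / link_mbps * 1000.0
    # Toy loss model: beyond `buffer_fits` simultaneous senders, the
    # burst overflows the buffer and the barrier waits out a TCP
    # retransmission timeout of hundreds of milliseconds.
    if num_servers > buffer_fits:
        transfer_ms += rto_ms
    return block_mb * 8 / (transfer_ms / 1000.0)

few = goodput_mbps(num_servers=4)    # burst fits: near link rate
many = goodput_mbps(num_servers=64)  # overflow: goodput collapses
```

Because the timeout dwarfs the ideal transfer time, goodput drops by more than an order of magnitude even though aggregate server bandwidth went up.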

A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool that enables the construction of fault-scalable Byzantine fault-tolerant services. The optimistic quorum-based nature of the Q/U protocol allows it to provide better throughput and fault-scalability than replicated state machines using agreement-based protocols. A prototype service built with the Q/U protocol outperforms the same service built with a popular replicated state machine implementation at all system sizes in experiments...

10.1145/1095809.1095817 article EN ACM SIGOPS Operating Systems Review 2005-10-20

Large-scale deep learning requires huge computational resources to train a multi-layer neural network. Recent systems propose using 100s to 1000s of machines to train networks with tens of layers and billions of connections. While the computation involved can be done more efficiently on GPUs than on traditional CPU cores, training such networks on a single GPU is too slow, and training on distributed GPUs can be inefficient, due to data movement overheads, stalls, and limited GPU memory. This paper describes a new parameter server, called GeePS, that supports...

10.1145/2901318.2901323 article EN 2016-04-12
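For readers unfamiliar with the parameter-server pattern that GeePS specializes for GPUs, a generic sketch of the pull/push cycle follows. This illustrates the abstraction only, not GeePS's GPU-aware caching or data-movement machinery; all names are invented:

```python
class ParameterServer:
    """Toy parameter server: workers pull current weights, compute
    gradients locally, and push them back to be applied."""

    def __init__(self, dim, lr=0.1):
        self.weights = [0.0] * dim
        self.lr = lr

    def pull(self):
        # Worker fetches a snapshot of the current parameters.
        return list(self.weights)

    def push(self, grad):
        # Apply a worker's gradient with a simple SGD step.
        self.weights = [w - self.lr * g
                        for w, g in zip(self.weights, grad)]

ps = ParameterServer(dim=3)
for worker_grad in ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0]):
    w = ps.pull()          # worker reads parameters
    ps.push(worker_grad)   # worker sends back its gradient
```

The systems challenge GeePS targets is keeping this cycle fast when parameters and gradients must move between CPU and GPU memory across many machines.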

Storage technology has enjoyed considerable growth since the first disk drive was introduced nearly 50 years ago, in part facilitated by the slow and steady evolution of storage interfaces (SCSI and ATA/IDE). The stability of these interfaces has allowed continual advances in both devices and applications, without frequent changes to the standards. However, the interface ultimately determines the functionality supported by devices, and current interfaces are holding system designers back. Storage technology has progressed to the point that a change in the device interface is needed. Object-based storage, an...

10.1109/mcom.2003.1222722 article EN IEEE Communications Magazine 2003-08-01

The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exokernel system that allows specialized applications to achieve high performance without sacrificing the performance of unmodified UNIX programs. It evaluates the exokernel architecture by measuring end-to-end application performance on Xok, an exokernel for Intel x86-based computers, and by comparing Xok's performance to that of two widely-used 4.4BSD UNIX systems (FreeBSD and OpenBSD). The results show that common...

10.1145/268998.266644 article EN 1997-10-01

Disk subsystem performance can be dramatically improved by dynamically ordering, or scheduling, pending requests. Via strongly validated simulation, we examine the impact of complex logical-to-physical mappings and large prefetching caches on scheduling effectiveness. Using both synthetic workloads and traces captured from six different user environments, we arrive at three main conclusions: (1) Incorporating mapping information into the scheduler provides only a marginal (less than 2%) decrease in...

10.1145/183018.183045 article EN 1994-05-01
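A classic example of the request ordering the abstract refers to is shortest-seek-time-first (SSTF) scheduling. The sketch below uses logical block numbers as a seek-distance proxy, which is exactly the simplification the paper shows costs surprisingly little; it is an illustration, not the paper's simulator:

```python
def sstf_order(head, pending):
    """Shortest-seek-time-first: repeatedly service the request
    closest to the current head position."""
    pending = list(pending)
    order = []
    while pending:
        nearest = min(pending, key=lambda lbn: abs(lbn - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest   # head moves to the serviced request
    return order

order = sstf_order(head=50, pending=[10, 95, 52, 60, 12])
# order == [52, 60, 95, 12, 10]
```

Compared with first-come-first-served, such greedy ordering sharply reduces total head movement, which is where the "dramatic" improvement comes from.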

As society increasingly relies on digitally stored and accessed information, supporting the availability, integrity, and confidentiality of this information is crucial. We need systems in which users can securely store critical information, ensuring that it persists, is continuously accessible, cannot be destroyed, and is kept confidential. A survivable storage system would provide these guarantees over time and despite malicious compromises of storage node subsets. The PASIS architecture flexibly and efficiently combines proven...

10.1109/2.863969 article EN Computer 2000-01-01

Self-securing storage prevents intruders from undetectably tampering with or permanently deleting stored data. To accomplish this, self-securing storage devices internally audit all requests and keep old versions of data for a window of time, regardless of the commands received from potentially compromised host operating systems. Within the window, system administrators have this valuable information for intrusion diagnosis and recovery. Our implementation, called S4, combines log-structuring with journal-based metadata to...

10.5555/1251229.1251241 article EN Operating Systems Design and Implementation 2000-10-22
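The core idea, keeping every overwritten version so a compromised host cannot truly destroy data, can be sketched in a few lines. This is a toy in-memory model, not S4's log-structured on-disk design, and it does not implement pruning at the window boundary:

```python
import time

class SelfSecuringStore:
    """Toy model: every write is retained as a new version, so earlier
    states remain readable for intrusion diagnosis and recovery."""

    def __init__(self, window_s=3600.0):
        self.window_s = window_s          # retention window (not enforced here)
        self.versions = {}                # name -> list of (timestamp, data)

    def write(self, name, data, now=None):
        now = time.time() if now is None else now
        self.versions.setdefault(name, []).append((now, data))

    def read(self, name, at=None):
        """Latest version, or the version visible at an earlier time."""
        hist = self.versions.get(name, [])
        if at is None:
            return hist[-1][1] if hist else None
        older = [d for t, d in hist if t <= at]
        return older[-1] if older else None

s = SelfSecuringStore()
s.write("f", b"original", now=100.0)
s.write("f", b"tampered", now=200.0)   # intruder overwrite becomes a new version
latest = s.read("f")                   # what the (compromised) host sees
before = s.read("f", at=150.0)         # what the administrator can recover
```

Because the device, not the host OS, decides retention, a tampering write is recorded rather than obeyed as a destructive overwrite.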

This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning enables the protocol to efficiently provide linearizability and wait-freedom of read and write operations on erasure-coded data in asynchronous environments with Byzantine failures of clients and servers. By exploiting versioning storage-nodes, the protocol shifts most work to clients and allows highly optimistic operation: reads occur in a single round-trip unless clients observe concurrency or failures. Measurements of a storage system...

10.1109/dsn.2004.1311884 article EN 2004-01-01

Power-proportional cluster-based storage is an important component of an overall cloud computing infrastructure. With it, substantial subsets of nodes in the cluster can be turned off to save power during periods of low utilization. Rabbit is a distributed file system that arranges its data-layout to provide ideal power-proportionality down to a very small minimum number of powered-up nodes (enough to store a primary replica of available datasets). Rabbit addresses the node failure rates of large-scale clusters with data layouts that minimize the number of nodes that must be powered up if...

10.1145/1807128.1807164 article EN 2010-06-10

No single encoding scheme or fault model is optimal for all data. A versatile storage system allows them to be matched to access patterns, reliability requirements, and cost goals on a per-data item basis. Ursa Minor is a cluster-based storage system that allows data-specific selection of, and on-line changes to, encoding schemes and fault models. Thus, different data types can share a scalable storage infrastructure and still enjoy specialized choices, rather than suffering from "one size fits all." Experiments with Ursa Minor show performance benefits of 2-3× when...

10.5555/1251028.1251033 article EN File and Storage Technologies 2005-12-13
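One concrete reason per-data-item choice matters is raw storage overhead: replication and m-of-n erasure coding trade capacity against access cost very differently. A minimal sketch (generic formulas, not Ursa Minor's actual configuration machinery):

```python
def storage_overhead(scheme, m=None, n=None, replicas=None):
    """Stored-bytes / user-bytes blow-up for two common schemes."""
    if scheme == "replication":
        return float(replicas)      # k full copies -> k x overhead
    if scheme == "erasure":
        return n / m                # m-of-n: any m fragments reconstruct
    raise ValueError(f"unknown scheme: {scheme}")

rep = storage_overhead("replication", replicas=3)   # 3.0x capacity
ec = storage_overhead("erasure", m=4, n=6)          # 1.5x capacity
```

Hot, small, randomly accessed data often favors replication despite the capacity cost, while cold bulk data favors erasure coding; a versatile system lets each item get the matching choice.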

Sophisticated disk scheduling algorithms require accurate, detailed disk drive specifications, including data about mechanical delays, on-board caching and prefetching algorithms, command processing and protocol overheads, and logical-to-physical block mappings. Comprehensive disk models used in storage subsystem design require similar levels of detail. We describe a suite of general-purpose techniques for acquiring the necessary information from a SCSI disk drive. Using only the ANSI-standard interface, we demonstrate how the important...

10.1145/223586.223604 article EN ACM SIGMETRICS Performance Evaluation Review 1995-05-01

The causes of performance changes in a distributed system often elude even its developers. This paper develops a new technique for gaining insight into such changes: comparing request flows from two executions (e.g., of two system versions or time periods). Building on end-to-end request-flow tracing within and across components, algorithms are described for identifying and ranking changes in the flow and/or timing of request processing. The implementation of these algorithms in a tool called Spectroscope is evaluated. Six case studies are presented of using Spectroscope to...

10.5555/1972457.1972463 article EN Networked Systems Design and Implementation 2011-03-30

TetriSched is a scheduler that works in tandem with a calendaring reservation system to continuously re-evaluate the immediate-term scheduling plan for all pending jobs (including those with reservations and best-effort jobs) on each scheduling cycle. TetriSched leverages information supplied by the reservation system about jobs' deadlines and estimated runtimes to plan ahead in deciding whether to wait for a busy preferred resource type (e.g., a machine with a GPU) or fall back to less preferred placement options. Plan-ahead affords significant flexibility in handling mis-estimates of job...

10.1145/2901318.2901355 article EN 2016-04-12
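The wait-versus-fallback decision described above reduces to comparing estimated completion times against the deadline. The sketch below is a simplified stand-in for TetriSched's plan-ahead optimization (which solves a global placement problem, not a per-job rule); all names and numbers are illustrative:

```python
def choose_placement(deadline_s, est_runtime_s, wait_s, speedup=2.0):
    """Wait for a busy preferred resource (e.g., a GPU machine that
    runs the job `speedup`x faster) or start now on a slower one?"""
    finish_wait = wait_s + est_runtime_s / speedup
    finish_now = est_runtime_s
    options = {"wait_for_preferred": finish_wait,
               "fallback_now": finish_now}
    # Prefer options that meet the deadline; among those, finish earliest.
    feasible = {k: v for k, v in options.items() if v <= deadline_s}
    pool = feasible or options
    return min(pool, key=pool.get)

pick1 = choose_placement(deadline_s=100, est_runtime_s=80, wait_s=10)
pick2 = choose_placement(deadline_s=90, est_runtime_s=80, wait_s=60)
```

With a short queue, waiting for the fast resource wins (pick1); with a long queue, waiting would blow the deadline, so the slower immediate placement is chosen (pick2).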

Open Cirrus is a cloud computing testbed that, unlike existing alternatives, federates distributed data centers. It aims to spur innovation in systems and applications research and to catalyze the development of an open source service stack for the cloud.

10.1109/mc.2010.111 article EN Computer 2010-04-01

As digital content becomes more prevalent in the home, non-technical users are increasingly interested in sharing that content with others and accessing it from multiple devices. Not much is known about how these users think about controlling access to this data. To better understand this, we conducted semi-structured, in-situ interviews with 33 users in 15 households. We found that users create ad-hoc access-control mechanisms that do not always work; that their ideal policies are complex and multi-dimensional; that a priori policy specification often...

10.1145/1753326.1753421 article EN 2010-04-10