Sanjay Ghemawat

ORCID: 0009-0005-6843-3093
Research Areas
  • Advanced Data Storage Technologies
  • Distributed systems and fault tolerance
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Natural Language Processing Techniques
  • Logic, programming, and type systems
  • Topic Modeling
  • Semantic Web and Ontologies
  • Advanced Database Systems and Queries
  • Formal Methods in Verification
  • Algorithms and Data Compression
  • Advanced Neural Network Applications
  • Security and Verification in Computing
  • Computational Physics and Python Applications
  • Caching and Content Delivery
  • Multimodal Machine Learning Applications
  • Peer-to-Peer Network Technologies
  • Software Testing and Debugging Techniques
  • Software System Performance and Reliability
  • Software Engineering Research
  • IoT and Edge/Fog Computing
  • Fuzzy and Soft Set Theory
  • Graph Theory and Algorithms
  • Infectious Encephalopathies and Encephalitis

Google (United States)
2000-2023

NTT (Japan)
2019

IBM (United States)
2018

University of Massachusetts Amherst
2018

New York University
2018

Microsoft (Canada)
2018

Microsoft Research Montréal (Canada)
2018

Microsoft (United States)
2018

Samsung (South Korea)
2018

Karlsruhe Institute of Technology
2018

MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been...

10.1145/1327452.1327492 article EN Communications of the ACM 2008-01-01

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server"...

10.48550/arxiv.1605.08695 preprint EN other-oa arXiv (Cornell University) 2016-01-01

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting...

10.48550/arxiv.1603.04467 preprint EN other-oa arXiv (Cornell University) 2016-01-01

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous parameter...

10.5555/3026877.3026899 article EN Operating Systems Design and Implementation 2016-11-02

The Google file system. Authors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. ACM SIGOPS Operating Systems Review, Volume 37, Issue 5, December 2003, pp. 29–43. https://doi.org/10.1145/1165389.945450. Online: 19 October 2003.

10.1145/1165389.945450 article EN ACM SIGOPS Operating Systems Review 2003-10-19

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance...

10.1145/1365815.1365816 article EN ACM Transactions on Computer Systems 2008-06-01

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling...

10.21276/ijre.2018.5.5.4 article EN cc-by INTERNATIONAL JOURNAL OF RESEARCH AND ENGINEERING 2018-04-01
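
The map and reduce functions described in the abstract above can be illustrated with a word count. This is a minimal single-process sketch, not Google's implementation: `run_mapreduce`, `map_words`, and `reduce_counts` are hypothetical names, and the sketch omits the partitioning, scheduling, and failure handling that the real run-time system provides.

```python
from collections import defaultdict

def map_words(doc_name, text):
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory driver: group intermediate pairs by key, then reduce."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    return {key: reduce_fn(key, values) for key, values in sorted(intermediate.items())}

docs = [("a.txt", "the quick fox"), ("b.txt", "the lazy dog the end")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)
```

Because the user code consists only of the two functions, a distributed driver can run many map calls and many reduce calls in parallel without changing them.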

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance...

10.5555/1298455.1298475 article EN Operating Systems Design and Implementation 2006-11-06

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM. We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate...

10.48550/arxiv.2204.02311 preprint EN cc-by arXiv (Cornell University) 2022-01-01

The Google file system. Authors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, October 2003, Pages 29–43. https://doi.org/10.1145/945445.945450. Published: 19 October 2003.

10.1145/945449.945450 article EN 2003-01-01

The Google file system. Authors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. ACM SIGOPS Operating Systems Review, Volume 37, Issue 5, December 2003, pp. 29–43. https://doi.org/10.1145/1165389.945450. Online: 19 October 2003.

10.1145/945445.945450 article EN 2003-10-19

MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs.

10.1145/1629175.1629198 article EN Communications of the ACM 2009-12-21

Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free...

10.5555/2387880.2387905 article EN Operating Systems Design and Implementation 2012-10-08
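
A time API that exposes clock uncertainty, as mentioned in the abstract above, can be pictured as a clock that returns an interval rather than a single instant. The sketch below is an illustration under stated assumptions, not Spanner's implementation: `UncertainClock`, `TimeInterval`, the fixed `epsilon_us` bound, and the injected `read_clock_us` function are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeInterval:
    """Bounds, in integer microseconds, assumed to contain the true time."""
    earliest: int
    latest: int

class UncertainClock:
    """Hypothetical clock that reports an uncertainty interval around a local reading."""

    def __init__(self, epsilon_us, read_clock_us):
        self.epsilon_us = epsilon_us        # assumed fixed error bound
        self.read_clock_us = read_clock_us  # injected so the example is deterministic

    def now(self):
        t = self.read_clock_us()
        return TimeInterval(t - self.epsilon_us, t + self.epsilon_us)

    def definitely_after(self, timestamp_us):
        """True only if the timestamp has passed even in the worst case."""
        return self.now().earliest > timestamp_us

    def definitely_before(self, timestamp_us):
        """True only if the timestamp has not arrived even in the worst case."""
        return self.now().latest < timestamp_us

# A fake local clock reading of 1,000,000 us with a 4,000 us error bound.
clock = UncertainClock(epsilon_us=4_000, read_clock_us=lambda: 1_000_000)
interval = clock.now()
print(interval.earliest, interval.latest)
```

A caller that needs two events to be ordered in real time can wait until `definitely_after` returns true for the first event's timestamp, which is the kind of reasoning an explicit uncertainty bound makes possible.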

Spanner is Google’s scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free...

10.1145/2491245 article EN ACM Transactions on Computer Systems 2013-08-01

This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data...

10.1145/265924.265925 article EN ACM Transactions on Computer Systems 1997-11-01
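
The sampling figures quoted in the abstract above imply a rough per-sample cost budget. The arithmetic below is a back-of-the-envelope sketch: it takes the stated 5200 samples/sec as the exact rate (the abstract says "over") and assumes the entire 1–3% slowdown is spent handling samples, which the abstract does not claim.

```python
clock_hz = 333_000_000   # 333 MHz processor, from the abstract
samples_per_sec = 5_200  # lower bound on the sampling rate, from the abstract

# Average number of processor cycles that elapse between two samples.
cycles_between_samples = clock_hz / samples_per_sec

# Assumption: all of the slowdown is attributable to handling samples.
budget_low = 0.01 * cycles_between_samples   # cycles per sample at 1% overhead
budget_high = 0.03 * cycles_between_samples  # cycles per sample at 3% overhead

print(f"{cycles_between_samples:.0f} cycles between samples")
print(f"{budget_low:.0f} to {budget_high:.0f} cycles available per sample")
```

Under these assumptions, a sample arrives roughly every 64,000 cycles, leaving on the order of several hundred to a couple of thousand cycles to record each one.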

To provide high availability for services such as mail or bulletin boards, data must be replicated. One way to guarantee consistency of replicated data is to force service operations to occur in the same order at all sites, but this approach is expensive. For some applications a weaker causal operation order can preserve consistency while providing better performance. This paper describes a new way of implementing causal operations. Our technique also supports two other kinds of operations: operations that are totally ordered with respect to one another...

10.1145/138873.138877 article EN ACM Transactions on Computer Systems 1992-11-01
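
The causal operation order described in the abstract above can be pictured as follows: a replica may apply updates in any order, as long as every update is applied after the updates it depends on. This is a minimal sketch, not the paper's technique: the `Replica` class is hypothetical, and it tracks dependencies as explicit sets of update ids rather than whatever compact encoding the paper uses.

```python
class Replica:
    """Hypothetical replica that applies updates in an order respecting causality."""

    def __init__(self):
        self.applied = set()  # ids of updates already applied
        self.log = []         # the order in which updates were applied
        self.pending = []     # updates still waiting for their dependencies

    def receive(self, update_id, depends_on=()):
        """Accept an update from the network, in whatever order it arrives."""
        self.pending.append((update_id, frozenset(depends_on)))
        self._apply_ready()

    def _apply_ready(self):
        # Keep applying updates until no pending update has all its dependencies.
        progress = True
        while progress:
            progress = False
            for item in list(self.pending):
                update_id, deps = item
                if deps <= self.applied:
                    self.pending.remove(item)
                    self.applied.add(update_id)
                    self.log.append(update_id)
                    progress = True

# A reply to a bulletin-board post arrives before the post itself.
replica = Replica()
replica.receive("reply", depends_on=["post"])
replica.receive("post")
print(replica.log)
```

Updates with no dependency relationship between them can be applied in different orders at different replicas, which is what makes this ordering cheaper than forcing one total order at all sites.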

Spanner is Google's scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free...

10.1145/2518037.2491245 article EN ACM Transactions on Computer Systems 2013-08-01

This paper describes the design and implementation of the Harp file system. Harp is a replicated Unix file system accessible via the VFS interface. It provides highly available and reliable storage for files and guarantees that file operations are executed atomically in spite of concurrency and failures. It uses a novel variation of the primary copy replication technique that provides good performance because it allows us to trade disk accesses for network communication. Harp is intended to be used within a file service in a distributed network; in our current implementation,...

10.1145/121132.121169 article EN 1991-09-01

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. For performance, scalability, and expressiveness, a machine learning system must support dynamic control flow in distributed and heterogeneous environments....

10.1145/3190508.3190551 preprint EN 2018-04-18

Thor is an object-oriented database system designed for use in a heterogeneous distributed environment. It provides highly-reliable and highly-available persistent storage for objects, and supports safe sharing of these objects by applications written in different programming languages. Safe heterogeneous sharing of long-lived objects requires encapsulation: the system must guarantee that applications interact with objects only by invoking methods. Although safety concerns are important, most object-oriented databases forgo safety to avoid paying the associated performance costs. This paper...

10.1145/233269.233346 article EN 1996-01-01

We present the design of a new large scale orchestration layer for accelerators. Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas, while retaining state of the art performance for current models. Pathways uses a sharded dataflow graph of asynchronous operators that consume and produce futures, and efficiently gang-schedules heterogeneous parallel computations on thousands of accelerators while coordinating data transfers over their dedicated interconnects. Pathways makes use of a novel...

10.48550/arxiv.2203.12533 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Continuous profiling: where have all the cycles gone? Authors: Jennifer M. Anderson, Lance Berc, Jeffrey Dean, Sanjay Ghemawat, Monika R. Henzinger, Shun-Tak A. Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, William E. Weihl. SOSP '97: Proceedings of the sixteenth ACM symposium on Operating systems principles, October 1997, Pages 1–14. https://doi.org/10.1145/268998.266637. Published: 01 October 1997.

10.1145/268998.266637 article EN 1997-10-01

When writing a distributed application, conventional wisdom says to split your application into separate services that can be rolled out independently. This approach is well-intentioned, but a microservices-based architecture like this often backfires, introducing challenges that counteract the benefits the architecture tries to achieve. Fundamentally, this is because microservices conflate logical boundaries (how code is written) with physical boundaries (how code is deployed). In this paper, we propose a different programming methodology that decouples the two in...

10.1145/3593856.3595909 article EN 2023-06-22

We present a new limited form of interprocedural analysis called field analysis that can be used by a compiler to reduce the costs of modern language features such as object-oriented programming, automatic memory management, and run-time checks required for type safety. Unlike many previous interprocedural analyses, our analysis is cheap, and does not require access to the entire program. Field analysis exploits the declared access restrictions placed on fields in a modular language (e.g. field modifiers in Java) in order to determine useful properties of fields of an object. We describe...

10.1145/349299.349343 article EN Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation 2000-05-01