George Amvrosiadis

ORCID: 0000-0002-7328-1857
Research Areas
  • Advanced Data Storage Technologies
  • Cloud Computing and Resource Management
  • Caching and Content Delivery
  • Distributed and Parallel Computing Systems
  • Parallel Computing and Optimization Techniques
  • Scientific Computing and Data Management
  • Distributed systems and fault tolerance
  • Peer-to-Peer Network Technologies
  • Algorithms and Data Compression
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Cloud Data Security Solutions
  • Advanced Neural Network Applications
  • Software System Performance and Reliability
  • Topic Modeling
  • Big Data and Digital Economy
  • Digital Rights Management and Security
  • Text Readability and Simplification
  • Natural Language Processing Techniques
  • Ferroelectric and Negative Capacitance Devices
  • Data Stream Mining Techniques
  • Big Data Technologies and Applications
  • Data Quality and Management
  • Data Mining Algorithms and Applications
  • IoT and Edge/Fog Computing

Carnegie Mellon University
2017-2025

University of Toronto
2012-2016

The energy consumed by data centers is starting to make up a significant fraction of the world's energy consumption and carbon emissions. A large fraction of the consumed energy is spent on data center cooling, which has motivated a large body of work on temperature management in data centers. Interestingly, a key aspect of temperature management has not been well understood: controlling the setpoint temperature at which to run a data center's cooling system. Most data centers set their thermostat based on (conservative) suggestions by manufacturers, as there is limited understanding of how higher temperatures will affect the system. At the same time, studies...

10.1145/2254756.2254778 article EN 2012-06-11

The energy consumed by data centers is starting to make up a significant fraction of the world's energy consumption and carbon emissions. A large fraction of the consumed energy is spent on data center cooling, which has motivated a large body of work on temperature management in data centers. Interestingly, a key aspect of temperature management has not been well understood: controlling the setpoint temperature at which to run a data center's cooling system. Most data centers set their thermostat based on (conservative) suggestions by manufacturers, as there is limited understanding of how higher temperatures will affect the system. At the same time, studies...

10.1145/2318857.2254778 article EN ACM SIGMETRICS Performance Evaluation Review 2012-06-07

For a decade, the Ceph distributed file system followed the conventional wisdom of building its storage backend on top of local file systems. This is a preferred choice for most distributed file systems today because it allows them to benefit from the convenience and maturity of battle-tested code. Ceph's experience, however, shows that this comes at a high price. First, developing a zero-overhead transaction mechanism is challenging. Second, metadata performance at the local level can significantly affect performance at the distributed level. Third, supporting emerging hardware...

10.1145/3341301.3359656 article EN 2019-10-21

Zoned Namespace (ZNS) SSDs are the latest evolution of host-managed flash storage, enabling improved performance at a lower cost-per-byte than traditional block interface (conventional) SSDs. To date, there is no support for arranging these new devices in arrays that offer increased throughput and reliability (RAID). We identify key challenges in designing redundant ZNS SSD arrays, such as managing metadata updates and persisting partial stripe writes in the absence of overwrite support from the device. We present RAIZN,...

10.1145/3575693.3575746 article EN 2023-01-27

Datacenters need to reduce embodied carbon emissions, particularly for flash, which accounts for 40% of embodied carbon in servers. However, decreasing flash’s embodied emissions is challenging due to its limited write endurance, which more than halves with each generation of denser flash. Reducing embodied emissions requires extending flash lifetime, stressing its limited write endurance even further. The legacy Logical Block-Addressable Device (LBAD) interface exacerbates the problem by forcing devices to perform garbage collection, leading to more writes. Flash-based caches...

10.1145/3718390 article EN cc-by ACM Transactions on Storage 2025-03-05

Storage systems rely on maintenance tasks, such as backup and layout optimization, to ensure data availability and good performance. These tasks access large amounts of data and can significantly impact foreground applications. We argue that storage maintenance can be performed more efficiently by prioritizing processing of data that is currently cached in memory. Data can be cached either due to other maintenance tasks requesting it previously, or due to overlapping foreground I/O activity.

10.1145/2815400.2815424 article EN 2015-10-01

Analysis of large-scale simulation output is a core element of scientific inquiry, but analysis queries may experience significant I/O overhead when the data is not structured for efficient retrieval. While in-situ processing allows for improved time-to-insight for many applications, scaling in-situ frameworks to hundreds of thousands of cores can be difficult in practice. The DeltaFS in-situ indexing is a new approach for in-situ processing of massive amounts of data to achieve efficient point and small-range queries. This paper describes the challenges and lessons learned when scaling this...

10.1109/sc.2018.00006 article EN 2018-11-01

Latent sector errors (LSEs) are a common hard disk failure mode, where disk sectors become inaccessible while the rest of the disk remains unaffected. To protect against LSEs, commercial storage systems use scrubbers: background processes verifying disk data. The efficiency of different scrubbing algorithms in detecting LSEs has been studied in depth; however, no attempts have been made to evaluate or mitigate the impact of scrubbing on application performance. We provide the first known evaluation of the performance impact of different scrubbing policies in implementation,...

10.1109/dsn.2012.6263919 article EN 2012-06-01

For a decade, the Ceph distributed file system followed the conventional wisdom of building its storage backend on top of local file systems. This is a preferred choice for most distributed file systems today, because it allows them to benefit from the convenience and maturity of battle-tested code. Ceph’s experience, however, shows that this comes at a high price. First, developing a zero-overhead transaction mechanism is challenging. Second, metadata performance at the local level can significantly affect performance at the distributed level. Third, supporting emerging...

10.1145/3386362 article EN ACM Transactions on Storage 2020-05-18

Although large language models (LLMs) have been touted for their ability to generate natural-sounding text, there are growing concerns around possible negative effects of LLMs such as data memorization, bias, and inappropriate language. Unfortunately, the complexity and generation capacities of LLMs make validating (and correcting) such concerns difficult. In this work, we introduce ReLM, a system for validating and querying LLMs using standard regular expressions. ReLM formalizes and enables a broad range of language model evaluations, reducing complex...

10.48550/arxiv.2211.15458 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Deep learning accelerators efficiently train over vast and growing amounts of data, placing a newfound burden on commodity networks and storage devices. A common approach to conserve bandwidth involves resizing or compressing data prior to training. We introduce Progressive Compressed Records (PCRs), a data format that uses compression to reduce the overhead of fetching and transporting data, effectively reducing the training time required to achieve a target accuracy. PCRs deviate from previous formats by combining progressive...

10.14778/3476249.3476308 article EN Proceedings of the VLDB Endowment 2021-07-01

Input pipelines, which ingest and transform input data, are an essential part of training Machine Learning (ML) models. However, it is challenging to implement efficient input pipelines, as it requires reasoning about parallelism, asynchrony, and variability in fine-grained profiling information. Our analysis of over two million ML jobs in Google datacenters reveals that a significant fraction of model training jobs could benefit from faster input data pipelines. At the same time, our analysis indicates that most jobs do not saturate host hardware, pointing...

10.48550/arxiv.2111.04131 preprint EN other-oa arXiv (Cornell University) 2021-01-01

In this paper we introduce the Indexed Massive Directory, a new technique for indexing data within DeltaFS. With its design as a scalable, server-less file system for HPC platforms, DeltaFS scales file system metadata performance with application scale. The Indexed Massive Directory is a novel extension to the DeltaFS data plane, enabling in-situ indexing of massive amounts of data written to a single directory simultaneously, and in an arbitrarily large number of files. We achieve this through a memory-efficient indexing mechanism for reordering and indexing data, and a log-structured storage layout to pack...

10.1145/3149393.3149398 article EN 2017-11-03

Complex storage stacks providing data compression, indexing, and analytics help leverage the massive amounts of data generated today to derive insights. It is challenging to perform this computation, however, while fully utilizing the underlying storage media. This is because, while servers with large core counts are widely available, single-core performance and memory bandwidth per core grow slower than the core count per die. Computational storage offers a promising solution to this problem by providing dedicated compute resources along the storage processing path. We present...

10.1145/3415581 article EN ACM Transactions on Storage 2020-09-24

An increasing demand for cross-cloud and cross-region data access is bringing forth challenges related to high data transfer costs and latency. In response, we introduce Macaron, an auto-configuring cache system designed to minimize cost for remote data access. A key insight behind Macaron is that cloud cache size is tied to cost, not hardware limits, shifting the way we think about cache design and eviction policies. Macaron dynamically configures cache size and utilizes a mix of cloud storage types to adapt to workload changes and reduce costs. We demonstrate that Macaron reduces cost by 65%...

10.1145/3694715.3695972 article EN cc-by 2024-11-04