Alex Conway

ORCID: 0000-0003-4890-7413
Publications
Citations
Views
---
Saved
---
About
Contact & Profiles
Research Areas
  • Advanced Data Storage Technologies
  • Caching and Content Delivery
  • Algorithms and Data Compression
  • Parallel Computing and Optimization Techniques
  • Cloud Computing and Resource Management
  • Optimization and Search Problems
  • Distributed systems and fault tolerance
  • Web Data Mining and Analysis
  • Inflammatory mediators and NSAID effects
  • Cellular Automata and Applications
  • Network Packet Processing and Optimization
  • DNA and Biological Computing
  • Multi-Criteria Decision Making
  • Graph Theory and Algorithms
  • Advanced Algebra and Logic
  • Software Testing and Debugging Techniques
  • Complexity and Algorithms in Graphs
  • Green IT and Sustainability
  • Game Theory and Voting Systems
  • Smart Cities and Technologies
  • Digital and Cyber Forensics
  • Big Data Technologies and Applications
  • Pharmacy and Medical Practices
  • Wikis in Education and Collaboration
  • IoT and Edge/Fog Computing

Cornell University
2023-2024

Kitware (United States)
2021-2023

Twin Cities Orthopedics
2022

University of Minnesota
2022

University of Cape Town
2021

Rutgers Sexual and Reproductive Health and Rights
2017-2019

Rutgers, The State University of New Jersey
2018-2019

Rütgers (Germany)
2018

Today's filters, such as quotient, cuckoo, and Morton, have a trade-off between space speed; even when moderately full (e.g., 50%-75% full), their performance degrades nontrivially. The result is that today's systems designers are forced to choose speed usage. In this paper, we present the vector quotient filter (VQF). Locally, VQF based on Robin Hood hashing, like filter, but uses power-of-two-choices hashing reduce variance of runs, thus offers consistent, high throughput across load...

10.1145/3448016.3452841 article EN Proceedings of the 2022 International Conference on Management of Data 2021-06-09

The TLB is increasingly a bottleneck for big data applications. In most designs, the number of entries are highly constrained by latency requirements, and growing much more slowly than working sets Many solutions to this problem, such as huge pages, perforated or coalescing, rely on physical contiguity performance gains, yet cost defragmenting memory can easily nullify these gains. This paper introduces mosaic which increase reach compressing multiple, discrete translations into one entry....

10.1145/3582016.3582021 article EN 2023-03-20

A critical aspect of modern key-value stores is the interaction between compaction policy and filters. Aggressive reduces on-disk footprint a store can improve query performance, but reduce insertion throughput because it I/O CPU expensive. Filters mitigate costs lazy compaction, only if they fit in RAM, limiting scalability queries with compaction. And, fast storage devices, querying filters compacting system be significant. In this work, we present Mapped SplinterDB, that achieves...

10.1145/3588726 article EN Proceedings of the ACM on Management of Data 2023-05-26

The classical paging problem, introduced by Sleator and Tarjan in 1985, formalizes the problem of caching pages RAM order to minimize IOs. Their online formulation ignores cost address translation: programs refer data via virtual addresses, these must be translated into physical locations RAM. Although an individual translation is much smaller than that IO, every memory access involves translation, whereas IOs can infrequent. In practice, one spend money avoid over-provisioning RAM;...

10.1145/3737700 article EN ACM Transactions on Algorithms 2025-06-04

Despite being one of the oldest data structures in computer science, hash tables continue to be focus a great deal both theoretical and empirical research. A central reason for this is that many fundamental properties desires from table are difficult achieve simultaneously; thus variants offering different trade-offs have been proposed. This article introduces Iceberg hashing, simultaneously offers strongest known guarantees on large number core properties. hashing supports constant-time...

10.1145/3625817 article EN Journal of the ACM 2023-10-02

Storage devices have complex performance profiles, including costs to initiate IOs (e.g., seek times in hard drives), parallelism and bank conflicts (in SSDs), transfer data, firmware-internal operations. The Disk-Access Machine (DAM) model simplifies reality by assuming that storage data blocks of size B all transfers unit cost. Despite its simplifications, the DAM is reasonably accurate. In fact, if set half-bandwidth point, where latency bandwidth hardware are equal, approximates IO cost...

10.1145/3323165.3323210 article EN 2019-06-17

Adaptive filters, such as telescoping and adaptive cuckoo update their representation upon detecting a false positive to avoid repeating the same error in future. filters require an auxiliary structure, typically much larger than main filter often residing on slow storage, facilitate adaptation. However, existing are not practical have seen no adoption real-world systems due two reasons. Firstly, they offer weak adaptivity guarantees, meaning that fixing new can cause previously fixed come...

10.48550/arxiv.2405.10253 preprint EN arXiv (Cornell University) 2024-05-16

The classical paging problem, introduced by Sleator and Tarjan in 1985, formalizes the problem of caching pages RAM order to minimize IOs. Their online formulation ignores cost address translation: programs refer data via virtual addresses, these must be translated into physical locations RAM. Although an individual translation is much smaller than that IO, every memory access involves translation, whereas IOs can infrequent. In practice, one spend money avoid over-provisioning RAM;...

10.1145/3409964.3461814 article EN 2021-06-30

The online list-labeling problem is an algorithmic primitive with a large literature of upper bounds, lower and applications. goal to store dynamically-changing set n items in array m slots, while maintaining the invariant that appear sorted order, minimizing relabeling cost, defined be number are moved per insertion/deletion. For linear regime, where $m = (1+\Theta(1))n$, bound $O(\log^{2}n)$ on cost has been known since 1981. A $\Omega(\log^{2}n)$ for deterministic algorithms so-called...

10.1109/focs54457.2022.00096 article EN 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS) 2022-10-01

Full-path indexing can improve I/O efficiency for workloads that operate on data organized using traditional, hierarchical directories, because is placed persistent storage in scan order. Prior results indicate, however, renames a local file system with full-path are prohibitively expensive. This article shows how to use realize fast directory scans, writes, and renames. The introduces range-rename mechanism efficient key-space changes write-optimized dictionary. encapsulated the key-value...

10.1145/3241061 article EN ACM Transactions on Storage 2018-08-31

Wikipedia is a major source of information utilized by internet users around the globe for fact-checking and access to general, encyclopedic information. For researchers, it offers an unprecedented opportunity measure how societies respond events our collective perception world evolves over time in response events. use reading patterns its reflect interests way they are expressed search – whether as part fleeting, zeitgeist-fed trends or long-term on most every topic, from personal business,...

10.1145/3442442.3452341 article EN Companion Proceedings of the The Web Conference 2018 2021-04-19

Storage devices have complex performance profiles, including costs to initiate IOs (e.g., seek times in hard drives), parallelism and bank conflicts (in SSDs), transfer data, firmware-internal operations. The Disk-access Machine (DAM) model simplifies reality by assuming that storage data blocks of size B all transfers unit cost. Despite its simplifications, the DAM is reasonably accurate. In fact, if set half-bandwidth point, where latency bandwidth hardware are equal, then approximates IO...

10.1145/3470635 article EN ACM Transactions on Parallel Computing 2021-09-20

Balls-and-bins games have been a successful tool for modeling load balancing problems. In this paper, we study new scenario, which call the ball-recycling game, defined as follows:Throw m balls into n bins i.i.d. according to given probability distribution p. Then, at each time step, pick non-empty bin and recycle its balls: take from selected re-throw them p.This balls-and-bins game closely models memory-access heuristics in databases. The goal is bin-picking method that maximizes recycling...

10.5555/3310435.3310590 article EN 2019-01-06

File systems must allocate space for files without knowing what will be added or removed in the future. Over life of a file system, this may cause suboptimal placement decisions that eventually lead to slower performance, aging. Conventional wisdom suggests system aging is solved problem common case; heuristics avoid aging, such as colocating related and data blocks, are effective until storage device fills up, at which point pressure exacerbates fragmentation-based However, article...

10.48550/arxiv.2401.08858 preprint EN cc-by-nc-nd arXiv (Cornell University) 2024-01-01

.The online list-labeling problem is an algorithmic primitive with a large literature of upper bounds, lower and applications. The goal to store dynamically changing set \(n\) items in array \(m\) slots, while maintaining the invariant that appear sorted order minimizing relabeling cost, defined be number are moved per insertion/deletion. For linear regime, where \(m = (1 + \Theta (1)) n\), bound \(O(\log^2 n)\) on cost has been known since 1981. A \(\Omega (\log^2 for deterministic...

10.1137/22m1534468 article EN SIAM Journal on Computing 2024-04-11

The list-labeling problem is one of the most basic and well-studied algorithmic primitives in data structures, with an extensive literature spanning upper bounds, lower management applications. classical algorithm for this problem, dating back to 1981, has amortized cost $O(\log^2 n)$. Subsequent work led improvements three directions: \emph{low-latency} (worst-case) bounds; \emph{high-throughput} (expected) (adaptive) bounds \emph{important workloads}. Perhaps surprisingly, these directions...

10.48550/arxiv.2404.16623 preprint EN arXiv (Cornell University) 2024-04-25
Coming Soon ...