- Advanced Data Storage Technologies
- Caching and Content Delivery
- Algorithms and Data Compression
- Parallel Computing and Optimization Techniques
- Cloud Computing and Resource Management
- Optimization and Search Problems
- Distributed Systems and Fault Tolerance
- Web Data Mining and Analysis
- Inflammatory mediators and NSAID effects
- Cellular Automata and Applications
- Network Packet Processing and Optimization
- DNA and Biological Computing
- Multi-Criteria Decision Making
- Graph Theory and Algorithms
- Advanced Algebra and Logic
- Software Testing and Debugging Techniques
- Complexity and Algorithms in Graphs
- Green IT and Sustainability
- Game Theory and Voting Systems
- Smart Cities and Technologies
- Digital and Cyber Forensics
- Big Data Technologies and Applications
- Pharmacy and Medical Practices
- Wikis in Education and Collaboration
- IoT and Edge/Fog Computing
Cornell University
2023-2024
Kitware (United States)
2021-2023
Twin Cities Orthopedics
2022
University of Minnesota
2022
University of Cape Town
2021
Rutgers Sexual and Reproductive Health and Rights
2017-2019
Rutgers, The State University of New Jersey
2018-2019
Rütgers (Germany)
2018
Today's filters, such as quotient, cuckoo, and Morton filters, have a trade-off between space and speed; even when moderately full (e.g., 50%-75% full), their performance degrades nontrivially. The result is that today's systems designers are forced to choose between speed and space usage. In this paper, we present the vector quotient filter (VQF). Locally, the VQF is based on Robin Hood hashing, like the quotient filter, but uses power-of-two-choices hashing to reduce the variance of runs, and thus offers consistent, high throughput across load...
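The benefit of power-of-two-choices hashing mentioned above can be seen in a small generic balls-into-bins simulation (an illustrative sketch, not the VQF's actual block layout): placing each item in the emptier of two random buckets sharply reduces the maximum bucket load, which is what keeps run lengths, and hence probe times, consistent.

```python
import random

def max_load(n_items, n_bins, choices, seed=0):
    """Throw items into bins; each item goes to the least-loaded of `choices` random bins."""
    rng = random.Random(seed)
    bins = [0] * n_bins
    for _ in range(n_items):
        candidates = [rng.randrange(n_bins) for _ in range(choices)]
        best = min(candidates, key=lambda b: bins[b])
        bins[best] += 1
    return max(bins)

# With one choice the max load grows like log n / log log n;
# with two choices it drops to roughly log log n.
one = max_load(100_000, 100_000, choices=1)
two = max_load(100_000, 100_000, choices=2)
print(one, two)
```

Lower variance in bucket loads translates directly into shorter worst-case runs in a Robin Hood layout.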
The TLB is increasingly a bottleneck for big data applications. In most designs, the number of TLB entries is highly constrained by latency requirements, and grows much more slowly than working sets. Many solutions to this problem, such as huge pages, perforated pages, or page coalescing, rely on physical contiguity for performance gains, yet the cost of defragmenting memory can easily nullify these gains. This paper introduces mosaic pages, which increase TLB reach by compressing multiple, discrete translations into one entry....
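A toy model of the idea (an assumed, simplified encoding, not the paper's hardware design): one TLB entry covers an aligned group of virtual pages but stores a separate physical frame for each, so reach grows by the group factor without requiring physical contiguity.

```python
GROUP = 4  # assumed group size: one entry covers 4 discrete translations

class MosaicTLB:
    def __init__(self, capacity):
        self.capacity = capacity  # number of entries
        self.entries = {}         # group tag -> tuple of GROUP physical frames

    def install(self, page_table, vpage):
        """Load the entry covering vpage's aligned group from the page table."""
        tag = vpage // GROUP
        frames = tuple(page_table[tag * GROUP + i] for i in range(GROUP))
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest entry (FIFO)
        self.entries[tag] = frames

    def translate(self, vpage):
        """Return the physical frame for vpage, or None on a TLB miss."""
        frames = self.entries.get(vpage // GROUP)
        return None if frames is None else frames[vpage % GROUP]

# Usage: frames need not be contiguous, yet 4 pages share one entry.
page_table = {i: 100 + i for i in range(8)}  # hypothetical vpage -> frame map
tlb = MosaicTLB(capacity=2)
tlb.install(page_table, 0)
print(tlb.translate(3))  # hit: covered by the group-0 entry
```

With `capacity` entries the sketch reaches `capacity * GROUP` pages instead of `capacity` pages.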
A critical aspect of modern key-value stores is the interaction between compaction policy and filters. Aggressive compaction reduces a store's on-disk footprint and can improve query performance, but it reduces insertion throughput because compaction is I/O- and CPU-expensive. Filters mitigate the query costs of lazy compaction, but only if they fit in RAM, limiting the scalability of queries under lazy compaction. And, on fast storage devices, the cost of querying filters and compacting the system can be significant. In this work, we present Mapped SplinterDB, which achieves...
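The role filters play above can be illustrated with a minimal Bloom filter guarding per-run lookups (a generic sketch; SplinterDB uses its own tuned filters): a negative filter answer lets a point query skip that run's IO entirely, which is what makes lazy compaction tolerable.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch; real stores use tuned, cache-friendly variants."""
    def __init__(self, bits=1024, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "little") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# One filter per on-disk run; a run is modeled as a plain dict here.
runs = [({"a": 1, "b": 2}, BloomFilter()), ({"c": 3}, BloomFilter())]
for data, f in runs:
    for k in data:
        f.add(k)

def get(key):
    """Point query: returns (value, number of simulated run IOs)."""
    ios = 0
    for data, f in runs:
        if f.might_contain(key):  # only touch "disk" when the filter says maybe
            ios += 1
            if key in data:
                return data[key], ios
    return None, ios
```

The more runs lazy compaction leaves behind, the more IOs a negative query would cost without the filters.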
The classical paging problem, introduced by Sleator and Tarjan in 1985, formalizes the problem of caching pages in RAM in order to minimize IOs. Their online formulation ignores the cost of address translation: programs refer to data via virtual addresses, and these must be translated into physical locations in RAM. Although the cost of an individual translation is much smaller than that of an IO, every memory access involves a translation, whereas IOs can be infrequent. In practice, one can spend money to avoid paging by over-provisioning RAM;...
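The two cost terms can be made concrete with a small LRU paging simulation (illustrative only): every access pays a translation, while only misses pay an IO, so a trace with good locality has few IOs but just as many translations.

```python
from collections import OrderedDict

def simulate(accesses, ram_pages):
    """Count IOs (page faults under LRU) and address translations for a trace."""
    cache = OrderedDict()          # page -> present; order tracks recency
    ios = translations = 0
    for page in accesses:
        translations += 1          # every memory access is translated
        if page in cache:
            cache.move_to_end(page)
        else:
            ios += 1               # miss: one IO to fetch the page
            if len(cache) >= ram_pages:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return ios, translations

trace = [1, 2, 3, 1, 2, 4, 1, 2]
print(simulate(trace, ram_pages=3))
```

Here the eight accesses incur only four IOs, but all eight pay the translation cost.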
Despite being one of the oldest data structures in computer science, hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason for this is that many of the fundamental properties one desires from a hash table are difficult to achieve simultaneously; thus many variants offering different trade-offs have been proposed. This article introduces Iceberg hashing, which simultaneously offers the strongest known guarantees on a large number of core properties. Iceberg hashing supports constant-time...
Storage devices have complex performance profiles, including costs to initiate IOs (e.g., seek times in hard drives), parallelism and bank conflicts (in SSDs), costs to transfer data, and firmware-internal operations. The Disk-Access Machine (DAM) model simplifies reality by assuming that storage transfers data in blocks of size B and that all transfers have unit cost. Despite its simplifications, the DAM is reasonably accurate. In fact, if B is set to the half-bandwidth point, where the latency and bandwidth of the hardware are equal, it approximates the IO cost...
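The half-bandwidth point is easy to compute for concrete hardware numbers (the latency and bandwidth below are assumed, illustrative values, not from the paper):

```python
# Assumed device: 5 ms per-IO latency, 200 MB/s sustained bandwidth.
latency_s = 5e-3
bandwidth_Bps = 200e6

# Half-bandwidth point: the block size at which transfer time equals latency,
# i.e. B / bandwidth == latency, so B = latency * bandwidth.
B_half = latency_s * bandwidth_Bps
print(B_half)  # 1000000.0 bytes, i.e. 1 MB

def io_time(size_bytes):
    """Time for one IO: fixed latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_Bps
```

At `B_half` the latency and transfer terms are equal, so charging unit cost per block of that size misstates the true time of any single IO by at most a constant factor.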
Adaptive filters, such as telescoping and adaptive cuckoo filters, update their representation upon detecting a false positive to avoid repeating the same error in the future. Adaptive filters require an auxiliary structure, typically much larger than the main filter and often residing on slow storage, to facilitate adaptation. However, existing adaptive filters are not practical and have seen no adoption in real-world systems, due to two main reasons. Firstly, they offer weak adaptivity guarantees, meaning that fixing a new false positive can cause previously fixed false positives to come...
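A minimal sketch of the adaptivity idea (an assumed fingerprint-lengthening scheme, not the actual design of any of the filters named above): on a reported false positive, consult the auxiliary structure (here, the full keys themselves) and lengthen the colliding fingerprint until the query no longer matches.

```python
import hashlib

def fingerprint(key, bits):
    """First `bits` bits of a 64-bit hash of the key."""
    h = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(h, "little") & ((1 << bits) - 1)

class AdaptiveFilter:
    def __init__(self, base_bits=8):
        self.base_bits = base_bits
        # key -> current fingerprint length; storing full keys stands in for
        # the (normally remote, much larger) auxiliary structure.
        self.lengths = {}

    def insert(self, key):
        self.lengths[key] = self.base_bits

    def might_contain(self, query):
        return any(fingerprint(query, bits) == fingerprint(key, bits)
                   for key, bits in self.lengths.items())

    def report_false_positive(self, query):
        """Lengthen every colliding fingerprint until the query stops matching."""
        if query in self.lengths:
            return  # a true positive; nothing to fix
        for key in list(self.lengths):
            while (self.lengths[key] < 64 and
                   fingerprint(query, self.lengths[key]) == fingerprint(key, self.lengths[key])):
                self.lengths[key] += 1
```

The weak-guarantee problem the abstract describes arises when a later fix shortens or rehashes state that an earlier fix relied on; this toy version only ever lengthens, sidestepping that issue at the cost of growth.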
The online list-labeling problem is an algorithmic primitive with a large literature of upper bounds, lower bounds, and applications. The goal is to store a dynamically-changing set of n items in an array of m slots, while maintaining the invariant that the items appear in sorted order, and minimizing the relabeling cost, defined to be the number of items that are moved per insertion/deletion. For the linear regime, where $m = (1+\Theta(1))n$, an upper bound of $O(\log^{2}n)$ on the relabeling cost has been known since 1981. A lower bound of $\Omega(\log^{2}n)$ for deterministic algorithms and so-called...
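The relabeling-cost metric can be seen in a naive baseline (illustrative only; the classical algorithm avoids this by leaving gaps and rebalancing geometrically sized ranges): keeping items packed at the front of the array forces up to Θ(n) moves for a single insertion.

```python
def insert_sorted(arr, x):
    """Insert x into a packed sorted array with trailing empty (None) slots.
    Returns the relabeling cost: the number of existing items that moved."""
    # Find the slot where x belongs: first slot holding an item >= x, or empty.
    i = 0
    while i < len(arr) and arr[i] is not None and arr[i] < x:
        i += 1
    # Shift the suffix right one slot until an empty slot absorbs the displacement.
    moved = 0
    j, carry = i, x
    while carry is not None:
        carry, arr[j] = arr[j], carry
        moved += 1
        j += 1
    return moved - 1  # exclude x itself; only pre-existing items count as moved
```

Inserting near the front of a packed array moves almost every item, whereas spreading items with gaps bounds the amortized cost polylogarithmically.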
Full-path indexing can improve I/O efficiency for workloads that operate on data organized using traditional, hierarchical directories, because data is placed on persistent storage in scan order. Prior results indicate, however, that renames in a local file system with full-path indexing are prohibitively expensive. This article shows how to use full-path indexing to realize fast directory scans, writes, and renames. The article introduces a range-rename mechanism for efficient key-space changes in a write-optimized dictionary. This mechanism is encapsulated in the key-value...
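The key-space observation behind range-rename can be sketched in a few lines (a toy model over a sorted list, not the write-optimized-dictionary implementation): under full-path indexing a directory's contents occupy one contiguous key range, so a rename becomes a bulk prefix rewrite of that range.

```python
from bisect import bisect_left

def range_rename(keys, old_prefix, new_prefix):
    """Move every key under old_prefix to new_prefix in a sorted key list.
    Simplification: prefix matching here ignores directory boundaries
    (e.g. "/a" would also match "/ab"); a real implementation checks them."""
    lo = bisect_left(keys, old_prefix)          # subtree starts here...
    hi = lo
    while hi < len(keys) and keys[hi].startswith(old_prefix):
        hi += 1                                 # ...and is contiguous
    renamed = sorted(new_prefix + k[len(old_prefix):] for k in keys[lo:hi])
    return sorted(keys[:lo] + keys[hi:] + renamed)

paths = sorted(["/a/x", "/a/y", "/b/1", "/c"])
print(range_rename(paths, "/a", "/z"))
```

Because the subtree is contiguous in key order, the rename touches one range rather than scattered entries, which is what makes it cheap in a write-optimized dictionary.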
Wikipedia is a major source of information utilized by internet users around the globe for fact-checking and access to general, encyclopedic information. For researchers, it offers an unprecedented opportunity to measure how societies respond to events and how our collective perception of the world evolves over time in response to those events. Its reading patterns reflect the interests of its users and the way those interests are expressed in search – whether as part of fleeting, zeitgeist-fed trends or long-term interest in most every topic, from the personal to business,...
Balls-and-bins games have been a successful tool for modeling load-balancing problems. In this paper, we study a new scenario, which we call the ball-recycling game, defined as follows: throw m balls into n bins i.i.d. according to a given probability distribution p. Then, at each time step, pick a non-empty bin and recycle its balls: take the balls from the selected bin and re-throw them according to p. This balls-and-bins game closely models memory-access heuristics in databases. The goal is to find a bin-picking method that maximizes the recycling...
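The game is easy to simulate; the sketch below (assumed, illustrative policies and parameters, with p taken to be uniform) compares the recycling rate, i.e., balls recycled per step, under two bin-picking methods:

```python
import random

def recycling_rate(policy, m=50, n=10, steps=2000, seed=1):
    """Average number of balls recycled per step under a bin-picking policy."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(m):
        bins[rng.randrange(n)] += 1           # initial i.i.d. throw (uniform p)
    recycled = 0
    for _ in range(steps):
        nonempty = [i for i, b in enumerate(bins) if b > 0]
        i = policy(bins, nonempty, rng)       # pick a non-empty bin
        k, bins[i] = bins[i], 0
        recycled += k
        for _ in range(k):                    # re-throw its balls per p
            bins[rng.randrange(n)] += 1
    return recycled / steps

fullest = recycling_rate(lambda bins, ne, rng: max(ne, key=lambda i: bins[i]))
uniform = recycling_rate(lambda bins, ne, rng: rng.choice(ne))
print(fullest, uniform)
```

Picking the fullest bin recycles noticeably more balls per step than picking a uniformly random non-empty bin, matching the intuition that the policy choice drives the recycling rate.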
File systems must allocate space for files without knowing what will be added or removed in the future. Over the life of a file system, this may cause suboptimal placement decisions that eventually lead to slower performance, a phenomenon known as aging. Conventional wisdom suggests that file system aging is a solved problem in the common case; heuristics to avoid aging, such as colocating related files and data blocks, are effective until the storage device fills up, at which point space pressure exacerbates fragmentation-based aging. However, this article...
The list-labeling problem is one of the most basic and well-studied algorithmic primitives in data structures, with an extensive literature spanning upper bounds, lower bounds, and data management applications. The classical algorithm for this problem, dating back to 1981, has amortized cost $O(\log^2 n)$. Subsequent work has led to improvements in three directions: \emph{low-latency} (worst-case) bounds, \emph{high-throughput} (expected) bounds, and \emph{adaptive} bounds for important workloads. Perhaps surprisingly, these directions...