Aaron J. Elmore

ORCID: 0000-0002-4062-8826
Research Areas
  • Advanced Database Systems and Queries
  • Cloud Computing and Resource Management
  • Advanced Data Storage Technologies
  • Distributed systems and fault tolerance
  • Data Quality and Management
  • Scientific Computing and Data Management
  • Data Management and Algorithms
  • Distributed and Parallel Computing Systems
  • Parallel Computing and Optimization Techniques
  • Time Series Analysis and Forecasting
  • Algorithms and Data Compression
  • Research Data Management Practices
  • Anomaly Detection Techniques and Applications
  • Privacy-Preserving Technologies in Data
  • Personal Information Management and User Behavior
  • Blockchain Technology Applications and Security
  • Advanced Image and Video Retrieval Techniques
  • Fault Detection and Control Systems
  • Petri Nets in System Modeling
  • Semantic Web and Ontologies
  • Visual Attention and Saliency Detection
  • Real-Time Systems Scheduling
  • Adversarial Robustness in Machine Learning
  • Data Stream Mining Techniques
  • Advanced Vision and Imaging

University of Chicago
2015-2024

University of Illinois Chicago
2017-2021

University of Washington
2018

Portland State University
2018

University of California, Santa Barbara
2010-2013

Frostburg State University
2013

This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models. This trend is fueled by the proliferation of storage engines and query languages based on the observation that 'no one size fits all'. To address this shift, we propose a polystore architecture; it is designed to unify querying over multiple data models. We consider the challenges and opportunities associated with polystores. Open questions in this space revolve around query optimization and the assignment of objects to storage engines. We introduce our...

10.1145/2814710.2814713 article EN ACM SIGMOD Record 2015-08-12

Multitenant data infrastructures for large cloud platforms hosting hundreds of thousands of applications face the challenge of serving applications characterized by a small data footprint and unpredictable load patterns. When such a platform is built on an elastic pay-per-use infrastructure, an added challenge is to minimize the system's operating cost while guaranteeing the tenants' service level agreements (SLA). Elastic load balancing is therefore an important feature to enable scale-up during high load while scaling down when the load is low. Live migration, a technique...

10.1145/1989323.1989356 article EN 2011-06-12

On-line transaction processing (OLTP) database management systems (DBMSs) often serve time-varying workloads due to daily, weekly or seasonal fluctuations in demand, or because of rapid growth in demand due to a company's business success. In addition, many OLTP workloads are heavily skewed to "hot" tuples or ranges of tuples. For example, the majority of NYSE volume involves only 40 stocks. To deal with such fluctuations, an OLTP DBMS needs to be elastic; that is, it must be able to expand and contract resources in response to load fluctuations and dynamically...

10.14778/2735508.2735514 article EN Proceedings of the VLDB Endowment 2014-11-01

This paper presents BigDAWG, a reference implementation of a new architecture for "Big Data" applications. Such applications not only call for large-scale analytics, but also for real-time streaming support, smaller analytics at interactive speeds, data visualization, and cross-storage-system queries. Guided by the principle that "one size does not fit all", we build on top of a variety of storage engines, each designed for a specialized use case. To illustrate the promise of this approach, we demonstrate its effectiveness...

10.14778/2824032.2824098 article EN Proceedings of the VLDB Endowment 2015-08-01

Anomaly detection (AD) is a fundamental task for time-series analytics with important implications for the downstream performance of many applications. In contrast to other domains where AD mainly focuses on point-based anomalies (i.e., outliers in standalone observations), AD in time series is also concerned with range-based anomalies (i.e., outliers spanning multiple observations). Nevertheless, it is common to use traditional information retrieval measures, such as Precision, Recall, and F-score, to assess the quality of methods by thresholding...

10.14778/3551793.3551830 article EN Proceedings of the VLDB Endowment 2022-07-01
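
The abstract above notes that AD methods are commonly scored by thresholding anomaly scores and computing point-wise Precision, Recall, and F-score, even for range-based anomalies. As a rough illustration of that conventional evaluation (a minimal sketch, not the measure the paper proposes), the Python below thresholds hypothetical scores on a toy series containing one range anomaly; the function names, scores, and thresholds are illustrative assumptions. It shows that a single range anomaly receives partial, threshold-dependent credit when every time step is counted independently.

```python
from typing import List, Tuple

def threshold_scores(scores: List[float], threshold: float) -> List[int]:
    """Label a time step anomalous (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def pointwise_prf(labels: List[int], preds: List[int]) -> Tuple[float, float, float]:
    """Point-wise Precision, Recall, and F1: every time step counts on its own."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy data (not from the paper): one range anomaly at steps 3..6.
labels = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
scores = [0.1, 0.2, 0.1, 0.9, 0.4, 0.3, 0.8, 0.2, 0.1, 0.1]

for t in (0.5, 0.35):
    p, r, f = pointwise_prf(labels, threshold_scores(scores, t))
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The same detector output yields different Recall and F1 depending only on the chosen threshold, which is one reason threshold-based, point-wise scoring can be a fragile way to compare methods on range anomalies.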

Large language models (LLMs), such as GPT-4, are revolutionizing software's ability to understand, process, and synthesize language. The authors of this paper believe that this advance in technology is significant enough to prompt introspection in the data management community, similar to previous technological disruptions such as the advents of the world wide web, cloud computing, and statistical machine learning. We argue that the disruptive influence LLMs will have on data management will come from two angles. (1) A number of hard database problems, namely,...

10.14778/3611479.3611527 article EN Proceedings of the VLDB Endowment 2023-07-01

Transaction processing database management systems (DBMSs) are critical for today's data-intensive applications because they enable an organization to quickly ingest and query new information. Many of these applications exceed the capabilities of a single server, and thus their database has to be deployed in a distributed DBMS. The key factor affecting such a system's performance is how the database is partitioned. If the database is partitioned incorrectly, the number of distributed transactions can be high. These transactions have to synchronize their operations over the network, which considerably...

10.14778/3025111.3025125 article EN Proceedings of the VLDB Endowment 2016-11-01

Organizations are often faced with the challenge of providing data management solutions for large, heterogeneous datasets that may have different underlying data and programming models. For example, a medical dataset may have unstructured text, relational data, time series waveforms and imagery. Trying to fit such a dataset in a single data management system can have adverse performance and efficiency effects. As part of the Intel Science and Technology Center on Big Data, we are developing a polystore system designed for such problems. BigDAWG (short for the Big Data Analytics Working...

10.1109/hpec.2016.7761636 preprint EN 2016-09-01

We present a framework for concurrency control and availability in multi-datacenter datastores. While we consider Google's Megastore as our motivating example, we define general abstractions for key components, making our solution extensible to any system that satisfies the abstraction properties. We first develop and analyze a transaction management and replication protocol based on a straightforward implementation of the Paxos algorithm. Our investigation reveals that this protocol acts as a concurrency prevention mechanism rather than a concurrency control mechanism....

10.14778/2350229.2350261 article EN Proceedings of the VLDB Endowment 2012-07-01

For data-intensive applications with many concurrent users, modern distributed main memory database management systems (DBMS) provide the necessary scale-out support beyond what is possible with single-node systems. These DBMSs are optimized for the short-lived transactions that are common in on-line transaction processing (OLTP) workloads. One way they achieve this is to partition the database into disjoint subsets and use a single-threaded transaction manager per partition that executes transactions one at a time in serial order. This minimizes the overhead of...

10.1145/2723372.2723726 article EN 2015-05-27

While there have been many solutions proposed for storing and analyzing large volumes of data, all of these solutions have limited support for collaborative data analytics, especially given the many individuals and teams that are simultaneously analyzing, modifying and exchanging datasets, employing a number of heterogeneous tools or languages for data analysis, and writing scripts to clean, preprocess, or query data. We demonstrate DataHub, a unified platform with the ability to load, store, query, collaboratively analyze, interactively visualize, interface...

10.14778/2824032.2824100 article EN Proceedings of the VLDB Endowment 2015-08-01

Distance measures are core building blocks in time-series analysis and the subject of active research for decades. Unfortunately, the most detailed experimental study in this area is outdated (over a decade old) and, naturally, does not reflect recent progress. Importantly, this study (i) omitted multiple distance measures, including a classic measure in the literature; (ii) considered only a single normalization method; and (iii) reported only raw classification error rates without statistically validating the findings, resulting...

10.1145/3318464.3389760 article EN 2020-05-29

Anomaly detection (AD) is a fundamental task for time-series analytics with important implications for the downstream performance of many applications. In contrast to other domains where AD mainly focuses on point-based anomalies (i.e., outliers in standalone observations), AD in time series is also concerned with range-based anomalies (i.e., outliers spanning multiple observations). Nevertheless, it is common to use traditional information retrieval measures, such as Precision, Recall, and F-score, to assess the quality of methods by thresholding...

10.48550/arxiv.2502.13318 preprint EN arXiv (Cornell University) 2025-02-18

A multitenant database management system (DBMS) in the cloud must continuously monitor the trade-off between efficient resource sharing among multiple application databases (tenants) and their performance. Considering the scale of hundreds to thousands of tenants in such DBMSs, manual approaches for continuous monitoring are not tenable. A self-managing controller for such a DBMS faces several challenges. For instance, how to characterize a tenant given its variety of workloads, how to reduce the impact of tenant colocation, and how to detect...

10.1145/2463676.2465308 article EN 2013-06-22

Modern data-intensive applications often generate large amounts of low-precision float data with a limited range of values. Despite the prevalence of such data, there is a lack of an effective solution to ingest, store, and analyze bounded, low-precision, numeric data. To address this gap, we propose Buff, a new compression technique that uses decomposed columnar storage and encoding methods to provide effective compression, fast ingestion, and high-speed in-situ adaptive query operators with SIMD support.

10.14778/3476249.3476305 article EN Proceedings of the VLDB Endowment 2021-07-01

With the explosive growth of high-dimensional data, approximate methods emerge as promising solutions for nearest neighbor search. Among the alternatives, quantization methods have gained attention due to their fast query responses and low encoding and storage costs. Quantization methods decompose data dimensions into non-overlapping subspaces and encode the data using a different dictionary per subspace. The state-of-the-art approach assigns dictionary sizes uniformly across subspaces while attempting to balance the relative importance of subspaces....

10.1109/icde53745.2022.00268 article EN 2022 IEEE 38th International Conference on Data Engineering (ICDE) 2022-05-01
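
The quantization setup summarized above (non-overlapping subspaces, one dictionary per subspace, uniform dictionary sizes) corresponds to standard product quantization. The NumPy sketch below illustrates only that uniform baseline, not the paper's own method; the function names, the plain k-means training loop, and all parameters are illustrative assumptions.

```python
import numpy as np

def train_pq(data, num_subspaces, codebook_size, iters=10, seed=0):
    """Train one k-means dictionary per non-overlapping subspace (uniform sizes)."""
    n, d = data.shape
    assert d % num_subspaces == 0, "dimensions must split evenly into subspaces"
    sub_dim = d // num_subspaces
    rng = np.random.default_rng(seed)
    codebooks = []
    for m in range(num_subspaces):
        sub = data[:, m * sub_dim:(m + 1) * sub_dim]
        # Initialize centroids from distinct random data points.
        centroids = sub[rng.choice(n, codebook_size, replace=False)].copy()
        for _ in range(iters):
            dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            assign = dists.argmin(axis=1)
            for k in range(codebook_size):
                members = sub[assign == k]
                if len(members):  # keep the old centroid if a cluster empties
                    centroids[k] = members.mean(axis=0)
        codebooks.append(centroids)
    return codebooks

def encode(data, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    sub_dim = codebooks[0].shape[1]
    codes = np.empty((data.shape[0], len(codebooks)), dtype=np.int64)
    for m, centroids in enumerate(codebooks):
        sub = data[:, m * sub_dim:(m + 1) * sub_dim]
        dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        codes[:, m] = dists.argmin(axis=1)
    return codes

def adc_search(query, codes, codebooks):
    """Asymmetric distance: per-subspace lookup tables summed over subspaces."""
    sub_dim = codebooks[0].shape[1]
    tables = [((codebooks[m] - query[m * sub_dim:(m + 1) * sub_dim]) ** 2).sum(axis=1)
              for m in range(len(codebooks))]
    approx = sum(tables[m][codes[:, m]] for m in range(len(codebooks)))
    return int(np.argmin(approx))

# Hypothetical random data: 500 vectors in 8 dimensions, 4 subspaces of 2 dims.
rng = np.random.default_rng(42)
data = rng.normal(size=(500, 8))
books = train_pq(data, num_subspaces=4, codebook_size=16)
codes = encode(data, books)
print(codes.shape, adc_search(data[0], codes, books))
```

Each vector is stored as four small integers instead of eight floats, and a query only needs four 16-entry lookup tables. Here every subspace gets the same dictionary size; the abstract's point is that this uniform allocation ignores differences in subspace importance.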

Data partitioning is crucial to improving query performance, and several workload-based partitioning techniques have been proposed in the database literature. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload a priori. Static workload-based data partitioning techniques are therefore not suitable for such settings. In this paper, we propose Amoeba, a distributed storage system that uses adaptive multi-attribute data partitioning to efficiently support ad-hoc as well as recurring queries. Amoeba requires zero set-up...

10.1145/3127479.3131613 article EN 2017-09-24

As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This results not only in...

10.14778/2947618.2947619 article EN Proceedings of the VLDB Endowment 2016-05-01

Big data analytic applications give rise to large-scale extract-transform-load (ETL) as a fundamental step to transform new data into a native representation. ETL workloads pose significant performance challenges on conventional architectures, so we propose the design of the unstructured data processor (UDP), a software programmable accelerator that includes multi-way dispatch, variable-size symbol support, flexible-source dispatch (stream buffer and scalar registers), and memory addressing to accelerate ETL kernels both...

10.1145/3123939.3123983 article EN 2017-10-04

Columnar databases rely on specialized encoding schemes to reduce storage requirements. These encodings also enable efficient in-situ data processing. Nevertheless, many existing columnar databases are encoding-oblivious. When storing the data, these systems use a global understanding of the dataset or the data types to derive simple rules for encoding selection. Such rule-based selection leads to unsatisfactory performance. Specifically, when performing queries, the systems always decode data into memory, ignoring the possibility of optimizing access...

10.1145/3448016.3457283 article EN Proceedings of the 2021 International Conference on Management of Data 2021-06-09

Similarity search is a core analytical task, and its performance critically depends on the choice of distance measure. For time-series querying, elastic measures achieve state-of-the-art accuracy but are computationally expensive. Thus, fast lower bounding (LB) measures prune unnecessary comparisons with elastic distances to accelerate similarity search. Despite decades of attention, there has never been a study to assess the progress in this area. In addition, research has disproportionately focused on one popular elastic measure,...

10.14778/3594512.3594530 article EN Proceedings of the VLDB Endowment 2023-04-01
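
The abstract above describes lower bounding (LB) measures that prune expensive elastic-distance computations during similarity search. Assuming dynamic time warping (DTW) with a Sakoe-Chiba window as the elastic measure (the abstract is truncated before naming one), the Python sketch below uses the classic LB_Keogh envelope bound to skip candidates in a 1-nearest-neighbor search. It illustrates the general pruning idea only, not the paper's study or any method it proposes; all function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def dtw(a: Sequence[float], b: Sequence[float], window: int) -> float:
    """DTW with squared pointwise cost, restricted to the band |i - j| <= window."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def envelope(series: Sequence[float], window: int) -> Tuple[List[float], List[float]]:
    """Upper/lower envelope: running max/min over [i - window, i + window]."""
    n = len(series)
    upper = [max(series[max(0, i - window):min(n, i + window + 1)]) for i in range(n)]
    lower = [min(series[max(0, i - window):min(n, i + window + 1)]) for i in range(n)]
    return upper, lower

def lb_keogh(candidate: Sequence[float], upper: List[float], lower: List[float]) -> float:
    """Squared distance from each candidate point to the query envelope."""
    total = 0.0
    for c, u, l in zip(candidate, upper, lower):
        if c > u:
            total += (c - u) ** 2
        elif c < l:
            total += (c - l) ** 2
    return total

def nn_search(query, candidates, window) -> Tuple[int, float, int]:
    """1-NN under banded DTW; skip DTW when the lower bound cannot beat the best."""
    upper, lower = envelope(query, window)
    best_idx, best, pruned = -1, math.inf, 0
    for idx, cand in enumerate(candidates):
        if lb_keogh(cand, upper, lower) >= best:
            pruned += 1  # LB <= DTW, so this candidate cannot be closer
            continue
        d = dtw(query, cand, window)
        if d < best:
            best_idx, best = idx, d
    return best_idx, best, pruned

# Hypothetical toy data: equal-length, phase-shifted sine waves.
query = [math.sin(i / 3) for i in range(30)]
candidates = [[math.sin(i / 3 + 0.3 * k) for i in range(30)] for k in range(1, 8)]
print(nn_search(query, candidates, window=3))
```

LB_Keogh costs O(n) per candidate while banded DTW costs O(n·w), so any candidate whose bound already meets or exceeds the best distance found so far can be discarded without running DTW, and the search still returns the exact nearest neighbor.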

Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce OrpheusDB, a dataset version control system that "bolts on" versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database "for free". We develop and evaluate multiple data models for representing versioned data,...

10.14778/3115404.3115417 article EN Proceedings of the VLDB Endowment 2017-06-01