- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Caching and Content Delivery
- Distributed Systems and Fault Tolerance
- Distributed and Parallel Computing Systems
- IoT and Edge/Fog Computing
- Parallel Computing and Optimization Techniques
- Software System Performance and Reliability
- Data Stream Mining Techniques
- Algorithms and Data Compression
- Scientific Computing and Data Management
IBM Research - Austin
2019-2020
Binghamton University
2013-2019
In the last decade, the increased use and growth of social media, unconventional web technologies, and mobile applications have all encouraged the development of a new breed of database models. NoSQL data stores target unstructured data, which is dynamic by nature and a key focus area for "Big Data" research. This new generation of data can prove costly and impractical to administer with SQL databases due to its lack of structure and its high scalability and elasticity needs. NoSQL stores such as MongoDB and Cassandra provide a desirable platform for fast and efficient...
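For illustration only (not from the publication), a minimal sketch using the pymongo driver shows the kind of loosely structured data involved; the database, collection, and field names are hypothetical, and a local MongoDB instance is assumed:

```python
# Sketch: schemaless inserts that would require schema migrations in SQL.
# Assumes a MongoDB server on localhost; all names are illustrative.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
events = client["demo"]["events"]

# Documents in the same collection need not share a schema.
events.insert_one({"user": "alice", "action": "login", "ts": 1700000000})
events.insert_one({"user": "bob", "geo": {"lat": 42.1, "lon": -75.9}, "tags": ["mobile"]})
```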
Lossless data compression is highly desirable in enterprise and cloud environments for storage and memory cost savings and for improved utilization of I/O and network resources. While the value it provides is recognized, its application in practice is often limited because it is a processor-intensive operation, resulting in low throughput and high elapsed time under intense workloads. The IBM POWER9 and z15 systems overcome the shortcomings of existing approaches by including a novel on-chip integrated accelerator. The accelerator reduces processor cycles, traffic,...
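A small stdlib-only sketch illustrates the CPU-bound nature of software compression that motivates such an accelerator; the payload and any numbers it prints are illustrative, not the paper's measurements:

```python
# Sketch: measuring single-core software compression throughput with zlib.
import time
import zlib

data = b"some moderately repetitive payload " * 200_000  # ~7 MB

start = time.perf_counter()
compressed = zlib.compress(data, level=6)
elapsed = time.perf_counter() - start

print(f"ratio: {len(data) / len(compressed):.1f}x, "
      f"throughput: {len(data) / elapsed / 1e6:.0f} MB/s on one core")
```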
The progressive transition in the nature of both scientific and industrial datasets has been the driving force behind the development of and research interest in the NoSQL model. Loosely structured data poses a challenge to traditional data store systems; when working with this model, these systems are often considered impractical and costly. As the quantity and quality of unstructured data grows, so does the demand for a processing pipeline that is capable of seamlessly combining the NoSQL storage model with a "Big Data" processing platform such as MapReduce. Although...
The progressive transition in the nature of both scientific and industrial datasets has been the driving force behind the development of and research interest in the NoSQL data model. Loosely structured data poses a challenge to traditional data store systems; when working with this model, these systems are often considered impractical and expensive. As the quantity of unstructured data grows, so does the demand for a processing pipeline that is capable of seamlessly combining the NoSQL storage model with a "Big Data" processing platform such as MapReduce. Although MapReduce...
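The MapReduce side of such a pipeline can be pictured with a toy in-process word count; this is a sketch of the programming model only, not the papers' NoSQL integration, and the input documents are made up:

```python
# Sketch of the MapReduce model: map emits (key, value) pairs,
# the shuffle groups by key, and reduce aggregates each group.
from collections import defaultdict
from itertools import chain

docs = ["nosql stores scale out", "mapreduce handles batch jobs", "nosql meets mapreduce"]

# Map phase: emit (word, 1) for every word in every document.
mapped = chain.from_iterable(((w, 1) for w in d.split()) for d in docs)

# Shuffle/reduce phase: group by word and sum the counts.
counts = defaultdict(int)
for word, n in mapped:
    counts[word] += n
print(dict(counts))
```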
High-velocity data imposes high durability overheads on Big Data technology components such as NoSQL stores. In Apache Cassandra and MongoDB, widely used solutions known for their scalability and availability, write-ahead logging is used to provide durability. However, current techniques are limited by the excessive overhead in the I/O subsystem. To address this performance gap, we have designed a novel CAPI-Flash based durable logging mechanism for MongoDB. We take advantage of the high throughput, low latency path to flash storage...
High-velocity data imposes high durability overheads on Big Data technology components such as NoSQL stores. In Apache Cassandra, a widely used solution known for its scalability and availability, write-ahead logging is used to support Commitlog operations, which in turn provide fault tolerance for applications. However, current techniques are limited by the excessive overhead in the I/O subsystem. To address this performance gap, we have designed a novel CAPI-Flash based durable logging mechanism for Cassandra. We take advantage of...
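Both designs target the commit-log path. A toy write-ahead log (stdlib Python, not the CAPI-Flash implementation from these papers) shows where the I/O overhead comes from: every acknowledged write pays for an append plus an fsync:

```python
# Sketch of write-ahead logging: a record is durable only after fsync,
# so the fsync on the write path is the durability hot spot.
import os

class WriteAheadLog:
    def __init__(self, path: str):
        self.f = open(path, "ab", buffering=0)  # unbuffered append-only log

    def append(self, record: bytes) -> None:
        self.f.write(len(record).to_bytes(4, "big") + record)
        os.fsync(self.f.fileno())  # durability point: dominates write latency

log = WriteAheadLog("commit.log")
log.append(b"SET k1=v1")  # safe to acknowledge only after append() returns
```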
In real-world NoSQL deployments, users have to trade off CPU, memory, I/O bandwidth, and storage space to achieve the required performance and efficiency goals. Data compression is a vital component for improving efficiency, but reading compressed data increases response time. Therefore, NoSQL stores rely heavily on using memory as a cache to speed up read operations. However, large DRAM capacity is expensive, so such databases become costly to deploy and hard to scale. In our work, we present a persistent caching mechanism for Apache...
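The CPU/memory tradeoff described above can be sketched with a toy read-through LRU cache over compressed storage; this is illustrative only, not the persistent cache design from the paper, and the capacity and key names are made up:

```python
# Sketch: values are stored compressed ("disk"), and an in-memory cache
# ("DRAM") avoids paying the decompression CPU cost on repeated reads.
import zlib
from collections import OrderedDict

store = {"k1": zlib.compress(b"value-1" * 1000)}   # compressed backing store
cache, CAPACITY = OrderedDict(), 128               # decompressed LRU cache

def read(key: str) -> bytes:
    if key in cache:
        cache.move_to_end(key)                     # hit: refresh LRU order
        return cache[key]
    value = zlib.decompress(store[key])            # miss: pay CPU cost
    cache[key] = value
    if len(cache) > CAPACITY:
        cache.popitem(last=False)                  # evict least recently used
    return value

print(len(read("k1")), len(read("k1")))            # second read hits the cache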
Consistency models for distributed data stores offer insights and paths to reasoning about what a user of such a system can expect. However, consistency models are often defined or implemented in coarse-grained manners, making it difficult to achieve precisely the consistency required. Further, applications in many domains are already written to handle the anomalies of such systems, yet they have little opportunity for expressing and taking advantage of their leniency. We propose reflective consistency, an active solution which adapts an underlying store...
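The adaptation loop implied here can be pictured with a toy policy that tightens or relaxes the consistency level based on observed anomalies; the level names, window size, and thresholds are assumptions for illustration, not the paper's mechanism:

```python
# Sketch: adapt a store's consistency level to keep the observed
# stale-read rate within a user-specified budget.
class ReflectiveConsistency:
    LEVELS = ["ONE", "QUORUM", "ALL"]

    def __init__(self, stale_read_budget: float = 0.01):
        self.budget = stale_read_budget
        self.level_idx = 0
        self.reads = self.stale = 0

    def observe(self, was_stale: bool) -> None:
        self.reads += 1
        self.stale += was_stale
        if self.reads >= 100:                      # re-evaluate each window
            rate = self.stale / self.reads
            if rate > self.budget and self.level_idx < len(self.LEVELS) - 1:
                self.level_idx += 1                # tighten consistency
            elif rate == 0 and self.level_idx > 0:
                self.level_idx -= 1                # relax when anomaly-free
            self.reads = self.stale = 0

    @property
    def level(self) -> str:
        return self.LEVELS[self.level_idx]
```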
The Raft consensus algorithm is used in many popular distributed key-value stores to offer strong consistency. Due to the cost of implementing strong consistency, its performance characteristics may not meet the requirements of some users. To satisfy these users, some stores allow users to bypass the leader when serving read requests. Unfortunately, yet predictably, this introduces anomalies. While this is a tradeoff users may be willing to make, its effects are not properly accounted for: i.e., it is impossible to know how much consistency is being traded away for...
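A toy two-replica model shows the anomaly being traded for performance: a read served locally by a lagging replica can miss a committed write. This is a sketch of the effect, not any store's implementation:

```python
# Sketch: bypassing the Raft leader on reads admits stale results
# because a follower may not yet have applied a committed entry.
class Node:
    def __init__(self):
        self.applied = {}   # key -> value applied at this replica

leader, follower = Node(), Node()

def replicate(key, value, follower_lags=False):
    leader.applied[key] = value
    if not follower_lags:
        follower.applied[key] = value   # normally applied after commit

replicate("x", 1)
replicate("x", 2, follower_lags=True)   # follower hasn't applied yet

print(leader.applied["x"])    # read through the leader: 2 (linearizable)
print(follower.applied["x"])  # bypassed local read: 1 (stale)
```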