- Advanced Database Systems and Queries
- Advanced Data Storage Technologies
- Data Management and Algorithms
- Algorithms and Data Compression
- Distributed systems and fault tolerance
- Caching and Content Delivery
- Data Stream Mining Techniques
- Cloud Computing and Resource Management
- Parallel Computing and Optimization Techniques
- Data Mining Algorithms and Applications
- Peer-to-Peer Network Technologies
- Semantic Web and Ontologies
- Data Quality and Management
- Groundwater flow and contamination studies
- Distributed and Parallel Computing Systems
- Advanced Computational Techniques and Applications
IBM (Canada)
2016-2024
IBM Research - Almaden
2000-2019
IBM (United States)
2000-2019
University of Toronto
2002
DB2 with BLU Acceleration deeply integrates innovative new techniques for defining and processing column-organized tables that speed read-mostly Business Intelligence queries by 10 to 50 times and improve compression by 3 times, compared to traditional row-organized tables, without the complexity of defining indexes or materialized views on those tables. But it is much more than just a column store. Exploiting frequency-based dictionary compression and main-memory query processing technology from the Blink project at IBM Research - Almaden,...
Query performance in current systems depends significantly on tuning: how well the query matches the available indexes, materialized views, etc. Even in a well-tuned system, there are always some queries that take much longer than others. This frustrates users, who increasingly want consistent response times to ad hoc queries. We argue that query processors should instead aim for constant response times for all queries, with no assumption about tuning. We present Blink, our first attempt at this goal, which runs every query as a table scan over a fully...
We present new hash tables for joins, and a hash join based on them, that consumes far less memory and is usually faster than recently published in-memory joins. Our hash join is not restricted to outer tables that fit wholly in memory. Key to this hash join is a new concise hash table (CHT), a linear probing hash table that has a 100% fill factor and uses a sparse bitmap with embedded population counts to almost entirely avoid collisions. This bitmap also serves as a Bloom filter for use in multi-table joins. We study the random access characteristics of hash joins and renew the case for non-partitioned hash joins. We introduce a variant...
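The idea of a sparse bitmap with embedded population counts can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `ConciseHashTable`, the 8-bits-per-entry bitmap sizing, the two-probe limit, the overflow dictionary, and the assumption of unique keys are all choices made for this illustration only.

```python
class ConciseHashTable:
    """Illustrative sketch: sparse bitmap plus a fully packed entry array."""

    WORD = 32    # bits per bitmap word (assumed for this sketch)
    PROBES = 2   # linear-probing limit before spilling to overflow (assumed)

    def __init__(self, items):
        items = list(items)
        # Sparse bitmap: about 8 bits per entry, rounded up to whole words.
        nbits = max(self.WORD, 8 * len(items))
        self.nbits = -(-nbits // self.WORD) * self.WORD
        self.words = [0] * (self.nbits // self.WORD)
        self.overflow = {}
        placed = {}  # bit position -> (key, value)
        for key, value in items:
            for pos in self._probe(key):
                if not self._test(pos):
                    self.words[pos // self.WORD] |= 1 << (pos % self.WORD)
                    placed[pos] = (key, value)
                    break
                if placed[pos][0] == key:      # same key seen again: overwrite
                    placed[pos] = (key, value)
                    break
            else:
                self.overflow[key] = value     # every probed bit was taken
        # Per-word prefix population counts: set bits before each word.
        self.prefix, running = [], 0
        for w in self.words:
            self.prefix.append(running)
            running += bin(w).count("1")
        # Dense array ordered by bit position: no empty slots at all.
        self.dense = [placed[pos] for pos in sorted(placed)]

    def _probe(self, key):
        h = hash(key) % self.nbits
        return [(h + i) % self.nbits for i in range(self.PROBES)]

    def _test(self, pos):
        return (self.words[pos // self.WORD] >> (pos % self.WORD)) & 1

    def _rank(self, pos):
        # Index into the dense array = number of set bits before `pos`.
        word, bit = divmod(pos, self.WORD)
        below = self.words[word] & ((1 << bit) - 1)
        return self.prefix[word] + bin(below).count("1")

    def get(self, key, default=None):
        for pos in self._probe(key):
            if not self._test(pos):
                return default                 # unset bit: key cannot be present
            k, v = self.dense[self._rank(pos)]
            if k == key:
                return v
        return self.overflow.get(key, default)


table = ConciseHashTable((i * 7919, i) for i in range(500))
print(table.get(7919 * 42))   # value stored for that key
print(table.get(-1))          # absent key
```

In this sketch an unset bit ends the lookup immediately, so the bitmap acts as a filter for absent keys, while the prefix counts turn a bit position into an index in the packed array.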
Table scans have become more interesting recently due to greater use of ad-hoc queries and greater availability of multi-core, vector-enabled hardware. Table scan performance is limited by value representation, table layout, and processing techniques. In this paper we propose a new layout and processing technique for efficient one-pass predicate evaluation. Starting with a set of rows with a fixed number of bits per column, we append columns to form a set of banks and then pad each bank to a supported machine word length, typically 16, 32, or 64 bits. We then evaluate...
Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the data statistics. Unfortunately, maintaining multiple secondary indexes in the same database can be extremely space consuming, causing significant performance degradation due to the potential exhaustion of memory space. In this paper, we demonstrate that there exist many opportunities to exploit column correlations for...
Although the DRAM for main memories of systems continues to grow exponentially according to Moore's Law and to become less expensive, we argue that memory hierarchies will always exist for many reasons, both economic and practical, and in particular due to concurrent users competing for working memory to perform joins and grouping. We present the in-memory BLU Acceleration used in IBM's DB2 for Linux, UNIX, and Windows, and now also in the dashDB cloud offering, which was designed and implemented from the ground up to exploit main memory but is not limited to what fits in memory and does...
We demonstrate Hybrid Transactional and Analytics Processing (HTAP) on the Spark platform with the Wildfire prototype, which can ingest up to ~6 million inserts per second per node and simultaneously perform complex SQL analytics queries. Here, a simplified mobile application uses Wildfire to recommend advertising to mobile customers based upon their distance from stores and their interest in products sold by these stores, while continuously graphing analytics results as those customers move and respond to the ads with purchases.
Compression has historically been used to reduce the cost of storage, I/Os from that storage, and buffer pool utilization, at the expense of the CPU required to decompress data every time it is queried. However, significant additional efficiencies can be achieved by deferring decompression as late in query processing as possible and performing query processing operations directly on the still-compressed data. In this paper, we investigate the benefits and challenges of performing joins on compressed (or encoded) data. We demonstrate the benefit of independently optimizing...
The requirements of Internet of Things (IoT) workloads are unique in the database space. While significant effort has been spent over the last decade rearchitecting OLTP and Analytics workloads for the public cloud, little has been done to rearchitect IoT workloads for the cloud. In this paper we present IBM Db2 Event Store™, a cloud-native database system designed specifically for IoT workloads, which require extremely high-speed ingest, efficient and open data storage, and near real-time analytics. Additionally, by leveraging the Db2 SQL compiler, optimizer and runtime,...
Materialized views (or Automatic Summary Tables, ASTs) are commonly used to improve the performance of aggregation queries by orders of magnitude. In contrast to regular tables, ASTs are synchronized by the database system. In this paper, we present techniques for maintaining cube ASTs. Our implementation is based on IBM DB2 UDB.
In a classic transactional distributed database management system (DBMS), write transactions invariably synchronize with a coordinator before final commitment. While enforcing serializability, this model has long been criticized for not satisfying the applications' availability requirements. With the arrival of the Internet of Things (IoT) era, this problem has become more severe, as an increasing number of applications call for the capability of hybrid transactional and analytical processing (HTAP), where aggregation constraints need to...
Database systems built on traditional storage subsystems typically store their data in small blocks referred to as pages (commonly sized as a multiple of 4KB for historical reasons). These storage subsystems, for example network attached block storage, were designed for efficient random-access I/O patterns at the block level, and the block size is usually configurable by the application based on its needs. For large scale analytic databases in cloud environments, these storage subsystems are not cost effective when compared to object storage, and database systems that exploit...
Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the data statistics. Unfortunately, maintaining multiple secondary indexes in the same database can be extremely space consuming, causing significant performance degradation due to the potential exhaustion of memory space. However, we find that there indeed exist many opportunities to save storage space by exploiting column correlations....