- Advanced Data Storage Technologies
- Parallel Computing and Optimization Techniques
- Distributed systems and fault tolerance
- Security and Verification in Computing
- Data Management and Algorithms
- Distributed and Parallel Computing Systems
- Advanced Database Systems and Queries
- Cloud Computing and Resource Management
- Geographic Information Systems Studies
- Caching and Content Delivery
- Software Testing and Debugging Techniques
- Software System Performance and Reliability
- Internet Traffic Analysis and Secure E-voting
- Data Mining Algorithms and Applications
- Constraint Satisfaction and Optimization
- Advanced Malware Detection Techniques
- Logic, programming, and type systems
- Data Quality and Management
- Interconnection Networks and Systems
- Advanced Software Engineering Methodologies
- Network Security and Intrusion Detection
- Software Engineering Research
- IoT and Edge/Fog Computing
- Asian Culture and Media Studies
- Web Data Mining and Analysis
University of Toronto
2012-2023
Microsoft Research (United Kingdom)
2017
Carnegie Mellon University
2000-2005
The volume of spatial data generated and consumed is rising exponentially, and new applications are emerging as the costs of storage, processing power, and network bandwidth continue to decline. Database support for spatial operations is fast becoming a necessity rather than a niche feature provided by a few products. However, the functionality offered by current commercial and open-source relational databases differs significantly in terms of available features, true geodetic support, spatial functions, and indexing. Benchmarks play a crucial...
Dynamic binary translation (DBT) is a powerful technique that enables fine-grained monitoring and manipulation of an existing program binary. At the user level, it has been employed extensively to develop various analysis, bug-finding, and security tools. Such tools are currently not available for operating system (OS) binaries, since no comprehensive DBT framework exists for the OS kernel. To address this problem, we have developed a DBT framework that runs as a Linux kernel module, based on the user-level DynamoRIO framework....
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve "out-of-core" problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully automatic technique which liberates the programmer from this task, provides high performance, and requires only minimal changes to current systems. In our...
File system bugs that corrupt file metadata on disk are insidious. Existing file-system reliability methods, such as checksums, redundancy, or transactional updates, merely ensure that the corruption is reliably preserved. The typical workarounds, based on using backups or repairing the file system, are painfully slow. Worse, the recovery is performed long after the original error occurred and thus may result in further data loss. We present a system called Recon that protects file-system metadata from buggy file-system operations. Our approach leverages modern file systems...
Spatial data analysis applications are emerging from a wide range of domains such as building information management, environmental assessments, and medical imaging. Time-consuming computational geometry algorithms make these applications slow, even for medium-sized datasets. At the same time, there is a rapid expansion in available processing cores, through multicore machines and Cloud computing. The confluence of these trends demands effective parallelization of spatial query processing. Unfortunately, traditional...
Deterministic databases offer several benefits: they ensure serializable execution while avoiding concurrency-control-related aborts, and they scale well in distributed environments. Today, most deterministic database designs use partitioning to scale up and avoid contention. However, partitioning requires significant programmer effort, leads to poor performance under skewed workloads, and incurs unnecessary overheads for certain uncontended workloads.
The design of new programming languages benefits from interpretation, which can provide a simple initial implementation, flexibility to explore language features, and portability to many platforms. The only downside is speed of execution, as there remains a large performance gap between even efficient interpreters and mixed-mode systems that include a just-in-time compiler (or JIT for short). Augmenting an interpreter with a JIT, however, is not a small task. Today, JITs used for Java™ are loosely coupled with the...
Spatial join is a crucial operation in many spatial analysis applications in scientific and geographical information systems. Due to the compute-intensive nature of predicate evaluation, spatial join queries can be slow even with a moderate-sized dataset. Efficient parallelization is therefore essential to achieve acceptable performance for these applications. Technological trends, including the rising core count and increasingly large main memory, hold great promise in this regard. Previous parallel approaches tried to partition...
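The abstract does not spell out the partitioning scheme it improves on. As a generic illustration of the filter step of a partitioned spatial join, the sketch below assigns axis-aligned bounding rectangles to a uniform grid, joins each cell independently (so cells can be processed in parallel), and uses the common reference-point rule to report each intersecting pair exactly once. The grid cell size, the function names, and the use of bounding rectangles in place of exact geometries are assumptions for illustration, not the method of the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def intersects(a, b):
    """Bounding rectangles are (xmin, ymin, xmax, ymax); touching counts as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_for(r, cell):
    """All grid cells (ix, iy) overlapped by rectangle r."""
    x0, y0 = int(r[0] // cell), int(r[1] // cell)
    x1, y1 = int(r[2] // cell), int(r[3] // cell)
    return product(range(x0, x1 + 1), range(y0, y1 + 1))


def partition(rects, cell):
    """Map each grid cell to the indices of rectangles overlapping it (with replication)."""
    grid = {}
    for i, r in enumerate(rects):
        for c in cells_for(r, cell):
            grid.setdefault(c, []).append(i)
    return grid


def join_cell(key, left, right, lrects, rrects, cell):
    """Nested-loop join inside one cell; report a pair only in its reference cell."""
    out = []
    for i in left:
        for j in right:
            a, b = lrects[i], rrects[j]
            if intersects(a, b):
                # Reference point: lower-left corner of the intersection. It lies in
                # exactly one cell, which both rectangles overlap, so no duplicates.
                rx, ry = max(a[0], b[0]), max(a[1], b[1])
                if (int(rx // cell), int(ry // cell)) == key:
                    out.append((i, j))
    return out


def grid_spatial_join(lrects, rrects, cell=10.0, workers=4):
    """Return sorted (left_index, right_index) pairs whose rectangles intersect."""
    lgrid, rgrid = partition(lrects, cell), partition(rrects, cell)
    keys = [k for k in lgrid if k in rgrid]
    # Cells are independent, so they can be farmed out to workers. Threads keep the
    # sketch simple; CPU-bound Python would need processes for a real speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda k: join_cell(k, lgrid[k], rgrid[k], lrects, rrects, cell), keys))
    return sorted(p for part in parts for p in part)
```

A rectangle that spans several cells is replicated into each of them, which is why the reference-point check is needed. A real system would follow this filter step with an exact geometry test on the candidate pairs.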
Storage systems rely on maintenance tasks, such as backup and layout optimization, to ensure data availability and good performance. These tasks access large amounts of data and can significantly impact foreground applications. We argue that storage maintenance can be performed more efficiently by prioritizing the processing of data that is currently cached in memory. Data can be cached either due to other tasks requesting it previously, or due to overlapping I/O activity.
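A toy simulation of the cache-first idea, assuming a single LRU page cache and a maintenance task that must visit every block exactly once in any order. `LRUCache` and `run_maintenance` are hypothetical names; this is not the paper's framework, only an illustration of why the processing order matters.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU page cache that tracks block numbers only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def __contains__(self, block):
        return block in self.blocks

    def access(self, block):
        """Touch a block; return True on a cache hit, False on a miss (disk read)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def run_maintenance(pending, cache, cache_first=True):
    """Process every block in `pending` once and return the number of disk reads."""
    pending = list(pending)
    misses = 0
    while pending:
        # Cache-first policy: pick any pending block already in memory; otherwise
        # (or when the policy is off) take the next block in its original order.
        if cache_first:
            idx = next((i for i, b in enumerate(pending) if b in cache), 0)
        else:
            idx = 0
        if not cache.access(pending.pop(idx)):
            misses += 1
    return misses


def warm_cache(capacity, blocks):
    """Simulate foreground I/O that has already brought `blocks` into memory."""
    cache = LRUCache(capacity)
    for b in blocks:
        cache.access(b)
    return cache


in_order = run_maintenance(range(100), warm_cache(4, range(96, 100)), cache_first=False)
opportunistic = run_maintenance(range(100), warm_cache(4, range(96, 100)), cache_first=True)
```

With a 4-block cache that foreground I/O has warmed with blocks 96 to 99, processing in block order evicts them before they are reached and costs 100 disk reads, while the cache-first order consumes them first and costs 96.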
CSV is a popular Open Data format widely used in a variety of domains for its simplicity and effectiveness in storing and disseminating data. Unfortunately, data published in this format often does not conform to strict specifications, making automated data extraction from CSV files a painful task. While table discovery from HTML pages or spreadsheets has been studied extensively, extracting tables from CSV files still poses a considerable challenge due to their loosely defined format and limited embedded metadata. In this work we lay out the challenges...
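As a deliberately naive illustration of what table extraction from loosely formatted CSV involves, the sketch below splits a file into candidate tables at blank (or all-empty) rows and treats leading single-cell rows as a caption rather than data. Both heuristics and the `split_tables` name are assumptions for illustration; they are not the approach of the paper, and real files (tables with no separating blank line, multi-row headers, footnotes) quickly defeat them.

```python
import csv
import io


def split_tables(text):
    """Split loosely formatted CSV text into candidate tables.

    Returns a list of dicts: {"context": [caption lines], "rows": [data rows]}.
    """
    blocks, current = [], []
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        while cells and not cells[-1]:
            cells.pop()  # drop trailing empty cells such as "a,b,,"
        if cells:
            current.append(cells)
        elif current:  # a blank or all-empty row closes the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    tables = []
    for block in blocks:
        # Leading single-cell rows are treated as a title or note, not as data,
        # but at least one row is always kept as the table body.
        k = 0
        while k < len(block) - 1 and len(block[k]) == 1:
            k += 1
        tables.append({"context": [r[0] for r in block[:k]], "rows": block[k:]})
    return tables
```

For example, a file containing a title line, a three-column table, a row of bare commas, and then a second two-column table yields two candidate tables, the first carrying the title as context.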
Out-of-core applications consume physical resources at a rapid rate, causing interactive applications sharing the same machine to exhibit poor response times. This behavior is the result of default resource management strategies in the OS that are inappropriate for memory-intensive applications. Using an approach that integrates compiler analysis with simple OS support and a run-time layer that adapts to dynamic conditions, we have shown that the impact of out-of-core applications on interactive ones can be greatly mitigated. A combination of prefetching pages that will...
Spatial databases are used in a wide variety of real-world applications, such as land surveying, urban planning, and environmental assessments, as well as geospatial Web services. As uses of spatial databases become more widespread, there is a growing need for good performance in spatial applications. In spatial workloads, queries tend to be computationally intensive due to the complex processing of geometric relationships. Furthermore, a significant fraction of query execution time is spent on CPU stalls due to memory accesses, caused by...
Primary-backup replication is commonly used for providing fault tolerance in databases. It is performed by replaying the database recovery log on a backup server. Such a scheme raises several challenges for modern, high-throughput multi-core databases: the log is hard to replay concurrently, and so the backup can become a bottleneck. Moreover, with high transaction rates on the primary, the log transfer can cause network bottlenecks. Both of these bottlenecks can significantly slow the primary database. In this paper, we propose using record-replay for replicating...
We introduce a strategy for inlining native functions into Java™ applications using a JIT compiler. We perform further optimizations to transform inlined callbacks into semantically equivalent lightweight operations. We show that this strategy can substantially reduce the overhead of performing JNI calls, while preserving the key safety and portability properties of JNI. Our work leverages the ability to store statically-generated IL alongside binaries, to facilitate inlining at Java callsites at compilation time. Preliminary results with...
As modern operating systems become more complex, understanding their inner workings is increasingly difficult. Dynamic kernel instrumentation is a well-established method of obtaining insight into the workings of an OS, with applications including debugging, profiling and monitoring, and security auditing. To date, all dynamic instrumentation systems for the kernel follow the probe-based paradigm. While efficient on fixed-length instruction set architectures, probes are extremely expensive on variable-length ISAs such as the popular Intel x86 and AMD x86-64....
Traditionally, operating systems use a coarse approximation of memory accesses to implement memory management algorithms, by monitoring page faults or scanning page table entries. With finer-grained memory access information, however, the system can manage memory much more effectively. Previous work has proposed a software mechanism based on virtual page protection and soft faults to track memory accesses at finer granularity. In this paper, we show that while this approach is effective for some applications, for many others it results in an unacceptably high...
File system bugs that corrupt metadata on disk are insidious. Existing reliability methods, such as checksums, redundancy, or transactional updates, merely ensure that the corruption is reliably preserved. Typical workarounds, based on using backups or repairing the file system, are painfully slow. Worse, the recovery may result in further corruption. We present Recon, a system that protects file-system metadata from buggy file-system operations. Our approach leverages file systems that provide crash consistency using transactional updates. We define declarative statements called invariants...
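To make the idea of declarative invariants over metadata updates more concrete, the sketch below checks a toy file system's metadata at transaction commit. It compares the old and new versions and, for every block whose allocation bit or reference count changed, requires that an allocated block be referenced by exactly one inode and a free block by none. The `Metadata` layout, the invariant, and `check_commit` are hypothetical; they illustrate commit-time checking in general, not Recon's actual invariants or implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical toy metadata: a block bitmap plus inode block pointers."""
    bitmap: set[int] = field(default_factory=set)               # allocated blocks
    inodes: dict[int, list[int]] = field(default_factory=dict)  # inode -> blocks


def ref_counts(md):
    """How many inode pointers reference each block."""
    return Counter(b for ptrs in md.inodes.values() for b in ptrs)


def check_commit(old, new):
    """Return invariant violations introduced by the update from `old` to `new`.

    Only blocks touched by this transaction are checked, so the cost is
    proportional to the size of the update rather than the whole file system.
    """
    old_refs, new_refs = ref_counts(old), ref_counts(new)
    touched = (old.bitmap ^ new.bitmap) | {
        b for b in set(old_refs) | set(new_refs) if old_refs[b] != new_refs[b]
    }
    violations = []
    for b in sorted(touched):
        refs = new_refs[b]
        if b in new.bitmap and refs != 1:
            violations.append(f"block {b}: allocated but referenced {refs} times")
        elif b not in new.bitmap and refs != 0:
            violations.append(f"block {b}: free but referenced {refs} times")
    return violations
```

A commit path built on this would refuse to write the new metadata to disk whenever `check_commit` returns a non-empty list, so a buggy operation is caught before its corruption becomes durable.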
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dynamic data structures that avoid locking require a memory reclamation scheme that reclaims nodes once they are no longer in use. The performance of existing schemes has not been thoroughly evaluated. We conduct the first...
Spatial databases are increasingly important for a wide variety of real-world applications, such as land surveying, urban planning, cartography, and location-based services. However, spatial database workload properties are not well understood. For example, it is unknown to what degree one application resembles another in terms of resource demand, or how the demand will change as more concurrent queries (i.e., users) are added. We show that spatial workloads have a different CPU execution profile than well-studied...
Today, file system tools and file-system-aware storage applications are tightly coupled with file system implementations. Developing these tools is challenging because it requires detailed knowledge of the file system format, and the code for interpreting file system metadata has to be written manually. This code is complex and file-system specific, so an application requires significant re-engineering to support different file systems.