- Parallel Computing and Optimization Techniques
- Security and Verification in Computing
- Distributed Systems and Fault Tolerance
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Explainable Artificial Intelligence (XAI)
- Machine Learning and Data Classification
- Advanced Malware Detection Techniques
- Advanced Memory and Neural Computing
- Software System Performance and Reliability
- Bayesian Modeling and Causal Inference
- Real-Time Systems Scheduling
- Interconnection Networks and Systems
- Logic, Reasoning, and Knowledge
- Software Engineering Research
- Caching and Content Delivery
- Distributed and Parallel Computing Systems
- Advanced Neural Network Applications
- Software Testing and Debugging Techniques
- Cloud Data Security Solutions
- Neural Networks and Applications
- Ferroelectric and Negative Capacitance Devices
- Embedded Systems Design Techniques
- Radiation Effects in Electronics
- Formal Methods in Verification
University of British Columbia
2021-2025
ETH Zurich
2015-2020
Hybrid computing platforms, comprising CPU cores and FPGA logic, are increasingly used to accelerate data-intensive workloads in cloud deployments and are a growing topic of interest in systems research. However, from a research perspective, existing hardware platforms are limited: they are often optimized for concrete, narrow use-cases and therefore lack the flexibility needed to explore other applications and configurations.
Multi-socket machines with 1-100 TBs of physical memory are becoming prevalent. Applications running on such multi-socket machines suffer non-uniform bandwidth and latency when accessing physical memory. Decades of research have focused on data allocation and placement policies in NUMA settings, but there have been no studies on the question of how to place page-tables amongst sockets. We make the case for explicit page-table allocation policies and show that page-table placement is crucial to overall performance. We propose Mitosis to mitigate NUMA effects on page-table walks by transparently replicating...
Sparse decision tree optimization has been one of the most fundamental problems in AI since its inception and is a challenge at the core of interpretable machine learning. The problem is computationally hard, and despite steady effort since the 1960s, breakthroughs have been made only within the past few years, primarily in finding optimal sparse trees. However, current state-of-the-art algorithms often require impractical amounts of computation time and memory to find optimal or near-optimal trees for some real-world datasets, particularly...
Memory-centric computing demands careful organization of the virtual address space, but traditional methods for doing so are inflexible and inefficient. If an application wishes to address more physical memory than its virtual address bits allow, to maintain pointer-based data structures beyond process lifetimes, or to share large amounts of memory across simultaneously executing processes, legacy interfaces for managing the address space are cumbersome and often incur excessive overheads. We propose a new operating system design that promotes virtual address spaces...
Increasing heterogeneity in the memory system mandates careful data placement to hide non-uniform memory access (NUMA) effects on applications. However, NUMA optimizations over the past decades have predominantly focused on application data, largely ignoring the placement of kernel data structures due to their small footprint; this is evident in typical OS designs that pin kernel objects in memory. In this paper, we show that kernel object placement is gaining importance in the context of page-tables: sub-optimal placement of page-tables causes severe slowdown (up to 3.1x) on virtualized servers.
Formal verification is a promising approach to eliminate bugs at compile time, before they ship. Indeed, our community has verified a wide variety of system software. However, much of this success has required heroic developer effort, relied on bespoke logics for individual domains, or sacrificed expressiveness for powerful proof automation.
The performance of parallel programs on multicore machines often critically depends on group communication operations like barriers and reductions being highly tuned to the hardware, a task requiring considerable developer skill. Smelt is a library that automatically builds efficient inter-core broadcast trees for individual machines, using a machine model derived from hardware registers plus micro-benchmarks capturing the low-level characteristics missing from vendor specifications. Experiments on a wide variety of machines show...
Building persistent memory (PM) data structures is difficult because crashes interrupt operations, leaving the data structures in an inconsistent state. Solving this requires augmenting code that modifies PM state to ensure that interrupted operations can be completed or undone. Today, this is done using careful, hand-crafted code, a compiler pass, or page faults. We propose a new, easy way to transform a volatile data structure to work with PM that uses a cache-coherent accelerator to do the augmentation, and we show that it may outperform existing approaches...
It is time to reconsider memory protection. The emergence of large non-volatile main memories, scalable interconnects, and rack-scale computers running large numbers of small "micro services" creates significant challenges for memory protection based solely on MMU mechanisms. Central to this is a tension between protection and translation: optimizing translation performance often comes at a cost in flexibility.
The hardware/software boundary in modern heterogeneous multicore computers is increasingly complex and diverse across different platforms. A single memory access by a core or DMA engine traverses multiple hardware translation and caching steps, and the destination memory cell or register often appears at different physical addresses for different cores. Interrupts pass through a complex topology of interrupt controllers and remappers before delivery to one or more cores, each with specific constraints on its configuration. System...
Verified systems software has generally had to assume the correctness of the operating system and its provided services (like networking and the file system). Even though verified systems exist, the specifications for these components do not compose with applications to produce a fully verified, high-performance stack.
Modern Systems-on-Chip (SoCs) are networks of heterogeneous cores, intelligent devices, and memory, connected through multiple configurable address translation and protection units like IOMMUs and System MMUs.
Address translation hardware is a cornerstone of modern computer systems. It provides a wide range of security-relevant features and abstractions such as memory partitioning, address space isolation, and virtual memory. Hardware designers have developed different protection schemes with varying means of configuration.
Device drivers are components that enable operating systems to interact with devices. Unfortunately, they are the main source of bugs in operating systems, because writing a driver is an intricate and error-prone process that requires extensive knowledge of both devices and operating systems. Furthermore, supporting new devices and accommodating kernel revisions require significant development effort. To facilitate writing device drivers, we present Ghost Writer, an end-to-end toolchain that allows developers to synthesize correct-by-construction drivers from high-level...
Modern hardware platforms are increasingly complex and heterogeneous. System software uses a hodgepodge of different mechanisms and representations to express the memory topology of the target platform. Considerable maintenance effort is required to keep them in sync, while sharing is often impossible due to hard-coded values. Incorrect platform-specific values or initialization sequences can lead to security-critical, hard-to-find bugs caused by misconfigured translation hardware, inaccessible devices, or the use of bad pointers.
Byte-addressable nonvolatile memory (NVM) blends the concepts of storage and memory and can radically improve data-centric applications, from in-memory databases to graph processing. By enabling large-capacity devices to be shared across multiple computing elements, fabric-attached NVM changes the nature of rack-scale systems and enables short-latency direct memory access while retaining data persistence properties and simplifying the software stack. An adequate protection scheme is paramount when addressing persistent memory, but...