- Parallel Computing and Optimization Techniques
- Distributed Systems and Fault Tolerance
- Advanced Data Storage Technologies
- Software Engineering Research
- Logic, Programming, and Type Systems
- Web Data Mining and Analysis
- Data Management and Algorithms
- Ferroelectric and Negative Capacitance Devices
- Physical Unclonable Functions (PUFs) and Hardware Security
- Low-Power High-Performance VLSI Design
- Interconnection Networks and Systems
- Algorithms and Data Compression
- Stochastic Gradient Optimization Techniques
- Scientific Computing and Data Management
- Advanced Memory and Neural Computing
- Advanced Database Systems and Queries
- Security and Verification in Computing
- Advanced Neural Network Applications
- Graphene Research and Applications
- Distributed and Parallel Computing Systems
Meta (United States), 2019–2023
Princeton University, 2017–2019
This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural network dataflow graph into a two-phase, strongly-typed intermediate representation. The high-level representation allows the optimizer to perform domain-specific optimizations. The lower-level, instruction-based, address-only representation allows the compiler to perform memory-related optimizations, such as instruction...
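The two-phase design described in the abstract can be illustrated with a minimal sketch. This is not Glow's actual API; the class and function names below are hypothetical, and the example only shows the shape of the idea: a graph-level, domain-specific rewrite (here, fusing a Relu into a preceding Conv) followed by lowering to a linear, instruction-based form operating on named buffers, where memory-related optimizations would then run.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One operation in a high-level dataflow graph (hypothetical IR)."""
    op: str
    inputs: List["Node"] = field(default_factory=list)

def fuse_conv_relu(graph: List[Node]) -> List[Node]:
    """High-level phase: a domain-specific rewrite that folds a Relu
    into its producing Conv, shrinking the graph before lowering."""
    fused = []
    for n in graph:
        if n.op == "Relu" and len(n.inputs) == 1 and n.inputs[0].op == "Conv":
            n.inputs[0].op = "ConvRelu"  # fold the activation into its producer
        else:
            fused.append(n)
    return fused

def lower(graph: List[Node]) -> List[str]:
    """Low-level phase: emit instruction-based IR over named buffers,
    the level at which memory optimizations would be applied."""
    instrs, buf = [], {}
    for i, n in enumerate(graph):
        buf[id(n)] = f"%buf{i}"
        args = ", ".join(buf[id(p)] for p in n.inputs)
        instrs.append(f"{buf[id(n)]} = {n.op.lower()}({args})")
    return instrs

x = Node("Input")
c = Node("Conv", [x])
r = Node("Relu", [c])
print(lower(fuse_conv_relu([x, c, r])))
```

Running the two phases in sequence yields two instructions instead of three, since the fused `ConvRelu` node replaces the separate Conv and Relu.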
Meta has traditionally relied on CPU-based servers for running inference workloads, specifically Deep Learning Recommendation Models (DLRM), but the increasing compute and memory requirements of these models have pushed the company towards specialized solutions such as GPUs or other hardware accelerators. This paper describes the company's effort in constructing its first silicon designed for recommendation systems; it covers the accelerator architecture and platform design, and the software stack for enabling and optimizing...
In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses and large model sizes, as well as high compute and network bandwidth requirements. We co-designed a high-performance, energy-efficient accelerator platform based on these requirements and describe the ecosystem developed and deployed at Facebook: both the hardware, through the Open Compute Platform (OCP), and the software framework and tooling, through PyTorch/Caffe2/Glow. A...
Software security techniques rely on correct execution by the hardware. Securing hardware components has been challenging due to their complexity and the proportionate attack surface they present during design, manufacture, deployment, and operation. Recognizing that external communication represents one of the greatest threats to a system's security, this paper introduces TrustGuard, a containment architecture. TrustGuard contains malicious and erroneous behavior using a relatively simple, pluggable gatekeeping component...
Compiler optimizations discover facts about program behavior by querying static analysis. However, developing or extending a precise analysis is difficult. Some prior works implement the analysis with a single algorithm, but the algorithm becomes more complex as it is extended for greater precision. Others achieve modularity by implementing several simple algorithms and trivially composing them to report the best result from among them. Such a modular approach has limited precision because it employs only one in response...
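The "trivial composition" the abstract critiques can be sketched in a few lines. The analyses and the query below are hypothetical, chosen only to illustrate the pattern: each simple analysis answers a query (here, "can this variable be zero?") independently, and the composer reports the first definite answer. Precision is limited because the analyses never exchange intermediate facts with one another.

```python
def sign_analysis(facts):
    """Precise only when a sign fact is present."""
    if facts.get("sign") == "positive":
        return False          # definitely cannot be zero
    return None               # unknown

def range_analysis(facts):
    """Precise only when an interval fact excludes zero."""
    lo, hi = facts.get("range", (None, None))
    if lo is not None and lo > 0:
        return False          # interval is strictly positive
    if hi is not None and hi < 0:
        return False          # interval is strictly negative
    return None

def compose(analyses, facts):
    """Trivial composition: return the first definite answer from any
    analysis. Each analysis runs in isolation, so a fact one of them
    derives can never sharpen another's result."""
    for analysis in analyses:
        answer = analysis(facts)
        if answer is not None:
            return answer
    return None

# The range analysis alone resolves this query; the sign analysis cannot.
print(compose([sign_analysis, range_analysis], {"range": (1, 9)}))
```

When neither analysis is individually precise enough, the composer returns unknown even if combining their partial facts would have settled the query, which is the precision limitation the abstract points to.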
Speculation with transactional memory systems helps programmers and compilers produce profitable thread-level parallel programs. Prior work shows that supporting transactions that can span multiple threads, rather than requiring them to be contained within a single thread, enables new types of speculative parallelization techniques for both programmers and parallelizing compilers. Unfortunately, software support for multi-threaded transactions (MTXs) comes with significant additional inter-thread communication overhead...
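A toy sketch can make the multi-threaded transaction idea concrete. This is not the paper's system; the `MTX` class below is hypothetical and only illustrates the concept: several worker threads buffer speculative writes into one shared transaction, which later publishes all of them atomically or aborts and discards them. The lock on every speculative write also hints at where the inter-thread communication overhead the abstract mentions comes from in a software implementation.

```python
import threading

class MTX:
    """Toy multi-threaded transaction: multiple threads contribute
    speculative writes to one shared write log (hypothetical sketch)."""

    def __init__(self):
        self._lock = threading.Lock()   # every write synchronizes: overhead
        self._log = {}                  # speculative write buffer
        self.aborted = False

    def write(self, key, value):
        """Buffer a speculative write; nothing reaches memory yet."""
        with self._lock:
            self._log[key] = value

    def commit(self, memory):
        """Publish all buffered writes atomically, unless aborted."""
        with self._lock:
            if self.aborted:
                return False            # misspeculation: discard the log
            memory.update(self._log)
            return True

memory = {"x": 0, "y": 0}
tx = MTX()
workers = [threading.Thread(target=tx.write, args=(k, v))
           for k, v in [("x", 1), ("y", 2)]]
for t in workers:
    t.start()
for t in workers:
    t.join()
tx.commit(memory)
print(memory)  # speculative writes become visible only after commit
```

The key property is that writes from both worker threads belong to the same transaction, so they commit or abort together rather than per thread.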