Jordan Fix

ORCID: 0009-0005-0523-4767
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed Systems and Fault Tolerance
  • Advanced Data Storage Technologies
  • Software Engineering Research
  • Logic, Programming, and Type Systems
  • Web Data Mining and Analysis
  • Data Management and Algorithms
  • Ferroelectric and Negative Capacitance Devices
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Low-Power High-Performance VLSI Design
  • Interconnection Networks and Systems
  • Algorithms and Data Compression
  • Stochastic Gradient Optimization Techniques
  • Scientific Computing and Data Management
  • Advanced Memory and Neural Computing
  • Advanced Database Systems and Queries
  • Security and Verification in Computing
  • Advanced Neural Network Applications
  • Graphene Research and Applications
  • Distributed and Parallel Computing Systems

Meta (United States)
2019-2023

Princeton University
2017-2019

This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural network dataflow graph into a two-phase strongly-typed intermediate representation. The high-level representation allows the optimizer to perform domain-specific optimizations. The lower-level instruction-based address-only representation allows the compiler to perform memory-related optimizations, such as instruction...
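The two-phase lowering the abstract describes can be sketched in miniature. This is an illustrative toy, not Glow's actual API: the names (`Node`, `HighLevelOp`, `LowLevelInstr`) and the `linear` decomposition are hypothetical, chosen only to show a graph node passing through a domain-specific phase and then a buffer-assigning, memory-oriented phase.

```python
from dataclasses import dataclass

@dataclass
class Node:                      # graph-level operator, e.g. a fused layer
    op: str
    inputs: list

@dataclass
class HighLevelOp:               # strongly-typed, domain-specific phase
    op: str
    inputs: list

@dataclass
class LowLevelInstr:             # address-only, memory-oriented phase
    opcode: str
    dest: str
    srcs: list

def lower_to_high_level(node: Node) -> list:
    """Domain-specific lowering: split a fused 'linear' node into
    matmul + add so each step can be optimized separately."""
    if node.op == "linear":
        return [HighLevelOp("matmul", node.inputs[:2]),
                HighLevelOp("add", ["tmp", node.inputs[2]])]
    return [HighLevelOp(node.op, node.inputs)]

def lower_to_instrs(ops: list) -> list:
    """Second phase: assign destination buffers so memory-related
    optimizations (buffer reuse, copy elimination) become visible."""
    return [LowLevelInstr(op.op, f"buf{i}", op.inputs)
            for i, op in enumerate(ops)]

graph = [Node("linear", ["x", "w", "b"])]
high = [h for n in graph for h in lower_to_high_level(n)]
low = lower_to_instrs(high)
print([ins.opcode for ins in low])   # ['matmul', 'add']
```

The point of the two phases is separation of concerns: the first representation still knows it is operating on tensors and operators, while the second sees only instructions and addresses.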

10.48550/arxiv.1805.00907 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Meta has traditionally relied on CPU-based servers for running inference workloads, specifically Deep Learning Recommendation Models (DLRM), but the increasing compute and memory requirements of these models have pushed the company towards specialized solutions such as GPUs or other hardware accelerators. This paper describes the company's effort in constructing its first silicon designed for recommendation systems; it presents the accelerator architecture and platform design, and the software stack for enabling and optimizing...

10.1145/3579371.3589348 article EN 2023-06-16

In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses and large model sizes, as well as high compute and network bandwidth requirements. We co-designed a high-performance, energy-efficient accelerator platform based on these requirements, and we describe the ecosystem developed and deployed at Facebook: both hardware, through the Open Compute Platform (OCP), and software framework and tooling, through Pytorch/Caffe2/Glow. A...

10.48550/arxiv.2107.04140 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Software security techniques rely on correct execution by the hardware. Securing hardware components has been challenging due to their complexity and the proportionate attack surface they present during design, manufacture, deployment, and operation. Recognizing that external communication represents one of the greatest threats to a system's security, this paper introduces the TrustGuard containment architecture. TrustGuard contains malicious and erroneous behavior using a relatively simple, pluggable gatekeeping component...
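The gatekeeping idea can be illustrated with a deliberately simplified sketch: a complex, untrusted component may only reach the outside world through a small trusted checker that re-validates each outbound value. This is entirely hypothetical code to convey the containment principle, not the paper's hardware design.

```python
def untrusted_core(a, b):
    # Complex, possibly compromised component: returns a result plus a
    # trace that the gatekeeper can cheaply re-check.
    result = a + b
    trace = ("add", a, b, result)
    return result, trace

def gatekeeper(result, trace):
    """Simple, verifiable component on the communication path:
    re-executes the cheap check and blocks any output whose
    trace does not validate."""
    op, a, b, claimed = trace
    if op == "add" and a + b == claimed == result:
        return result            # allow external communication
    raise ValueError("blocked: untrusted output failed verification")

r, t = untrusted_core(2, 3)
print(gatekeeper(r, t))          # 5
```

The asymmetry is the design point: verifying a claimed result is much cheaper than producing it, so the trusted component can stay small enough to get right.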

10.1145/3297858.3304020 article EN 2019-04-04

Compiler optimizations discover facts about program behavior by querying static analysis. However, developing or extending a precise analysis is difficult. Some prior works implement the analysis with a single algorithm, but the algorithm becomes more complex as it is extended for greater precision. Others achieve modularity by implementing several simple algorithms and trivially composing them to report the best result from among them. Such a modular approach has limited precision because it employs only one algorithm in response...
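The "trivial composition" the abstract critiques can be sketched as follows. This is a hypothetical illustration, not code from the paper: two toy alias analyses each answer a query independently, and the composer simply reports the most precise single answer.

```python
NO_ALIAS, MAY_ALIAS = "no-alias", "may-alias"

def type_based(p, q):
    # Precise only when the declared pointer types differ.
    return NO_ALIAS if p["type"] != q["type"] else MAY_ALIAS

def allocation_site(p, q):
    # Precise only when both pointers have known, distinct allocation sites.
    if p.get("site") and q.get("site") and p["site"] != q["site"]:
        return NO_ALIAS
    return MAY_ALIAS

def compose(p, q, analyses=(type_based, allocation_site)):
    """Trivial composition: take the best single answer. Each analysis
    runs in isolation -- none can build on a partial fact another
    discovered, which is the precision limit being addressed."""
    results = [a(p, q) for a in analyses]
    return NO_ALIAS if NO_ALIAS in results else MAY_ALIAS

p = {"type": "int*", "site": "A"}
q = {"type": "int*", "site": "B"}
print(compose(p, q))   # allocation-site analysis proves no-alias
```

A query that no single analysis can resolve alone stays imprecise under this scheme, even when the analyses' partial facts would jointly prove the stronger result.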

10.1109/cgo.2017.7863736 article EN 2017-02-01

10.5555/3049832.3049849 article EN Symposium on Code Generation and Optimization 2017-02-04

Speculation with transactional memory systems helps programmers and compilers produce profitable thread-level parallel programs. Prior work shows that supporting transactions that can span multiple threads, rather than requiring transactions to be contained within a single thread, enables new types of speculative parallelization techniques for both programmers and parallelizing compilers. Unfortunately, software support for multi-threaded transactions (MTXs) comes with significant additional inter-thread communication overhead...
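A multi-threaded transaction can be sketched at a very high level: speculative writes from several worker threads are buffered in one shared transaction and applied atomically only if nothing misspeculated. This is a hypothetical software illustration of the MTX concept, not the paper's mechanism.

```python
import threading

class MTX:
    """Toy multi-threaded transaction: many threads log speculative
    writes into one shared buffer; commit applies all or none."""
    def __init__(self):
        self.log = []                    # shared speculative write buffer
        self.lock = threading.Lock()
        self.aborted = False

    def spec_write(self, addr, value):
        with self.lock:                  # log instead of touching memory
            self.log.append((addr, value))

    def abort(self):
        self.aborted = True              # any thread can flag misspeculation

    def commit(self, memory):
        if self.aborted:
            return False                 # discard all speculative state
        for addr, value in self.log:
            memory[addr] = value         # apply writes from every thread
        return True

memory = {}
tx = MTX()
workers = [threading.Thread(target=tx.spec_write, args=(i, i * i))
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
tx.commit(memory)
print(sorted(memory.items()))   # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

Even this toy shows where the cost lives: every speculative write funnels through shared state and a lock, which is the inter-thread communication overhead the abstract identifies.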

10.1145/3173162.3173172 article EN 2018-03-19

10.1145/3296957.3173172 article EN ACM SIGPLAN Notices 2018-03-19