- Parallel Computing and Optimization Techniques
- Interconnection Networks and Systems
- Supercapacitor Materials and Fabrication
- Security and Verification in Computing
- Advanced Data Storage Technologies
- Advanced Memory and Neural Computing
- Low-power high-performance VLSI design
- Software-Defined Networks and 5G
- Physical Unclonable Functions (PUFs) and Hardware Security
- Graphene research and applications
- Logic, programming, and type systems
- Embedded Systems Design Techniques
- Network Packet Processing and Optimization
- Energy Efficiency in Computing
- VLSI and FPGA Design Techniques
- Semiconductor materials and devices
- Distributed systems and fault tolerance
- Software Testing and Debugging Techniques
- Distributed and Parallel Computing Systems
- Radiation Effects in Electronics
- Advanced Malware Detection Techniques
- Teaching and Learning Programming
- Logic, Reasoning, and Knowledge
- Ferroelectric and Negative Capacitance Devices
Foundation for the Advancement of Social Theory
2023
Carnegie Mellon University
2010-2016
Memory isolation is a key property of a reliable and secure computing system--an access to one memory address should not have unintended side effects on data stored in other addresses. However, as DRAM process technology scales down to smaller dimensions, it becomes more difficult to prevent DRAM cells from electrically interacting with each other. In this paper, we expose the vulnerability of commodity DRAM chips to disturbance errors. By reading from the same address in DRAM, we show that it is possible to corrupt data in nearby addresses. More specifically,...
Energy efficiency and energy-proportional computing have become a central focus in enterprise server architecture. As thermal and electrical constraints limit system power, and datacenter operators become more conscious of energy costs, energy efficiency becomes important across the whole system. There are many proposals to scale energy at the datacenter and server level. However, one significant component of server power, the memory system, remains largely unaddressed. We propose memory dynamic voltage/frequency scaling (DVFS) to address this problem, and evaluate a simple algorithm in a real...
Several system-level operations trigger bulk data copy or initialization. Even though these bulk data operations do not require any computation, current systems transfer a large quantity of data back and forth on the memory channel to perform such operations. As a result, bulk data operations consume high latency, bandwidth, and energy--degrading both system performance and energy efficiency.
As Chip Multiprocessors (CMPs) scale to tens or hundreds of nodes, the interconnect becomes a significant factor in cost, energy consumption and performance. Recent work has explored many design tradeoffs for networks-on-chip (NoCs) with novel router architectures to reduce hardware cost. In particular, recent work proposes bufferless deflection routing to eliminate router buffers. The high cost of buffers makes this choice potentially appealing, especially for low-to-medium network loads. However, current designs...
A primary use of chip-multiprocessor (CMP) systems is to speed up a single application by exploiting thread-level parallelism. In such systems, threads may slow each other down by issuing memory requests that interfere in the shared memory subsystem. This inter-thread memory system interference can significantly degrade parallel application performance. Better memory request scheduling may mitigate such performance degradation. However, previously proposed memory scheduling algorithms for CMPs are designed for multi-programmed workloads where each core runs an...
A conventional Network-on-Chip (NoC) router uses input buffers to store in-flight packets. These buffers improve performance, but consume significant power. It is possible to bypass these buffers when they are empty, reducing dynamic power, but static buffer power, and dynamic power when buffers are utilized, remain. To improve energy efficiency, bufferless deflection routing removes input buffers, and instead uses deflection (misrouting) to resolve contention. However, at high network load, deflections cause unnecessary network hops, wasting power and reducing performance. In this work, we propose a new NoC...
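The deflection (misrouting) idea described in the abstract above can be illustrated with a minimal sketch. It is not the router proposed in the paper: the four-port layout, the `Flit` fields, and the oldest-first arbitration rule are assumptions chosen for illustration. The point it shows is that, with no buffers, every incoming flit must leave on some output port in the same cycle, so a flit that loses arbitration is sent out a non-preferred port.

```python
from dataclasses import dataclass

# Hypothetical 2D-mesh router with four output ports (an assumption of this sketch).
PORTS = ("N", "E", "S", "W")


@dataclass
class Flit:
    flit_id: int
    age: int      # cycles spent in the network; older flits win arbitration here
    wanted: str   # the productive output port this flit would like to take


def route_bufferless(flits):
    """Assign each incoming flit a distinct output port in one cycle.

    Returns {flit_id: (port, deflected)}. Nothing is stored: a flit whose
    preferred port is taken is deflected to the first free port instead.
    """
    if len(flits) > len(PORTS):
        raise ValueError("a bufferless router cannot accept more flits than ports")
    free = list(PORTS)
    assignment = {}
    # Oldest-first ordering is one possible priority rule, assumed for this sketch.
    for flit in sorted(flits, key=lambda f: (-f.age, f.flit_id)):
        port = flit.wanted if flit.wanted in free else free[0]
        free.remove(port)
        assignment[flit.flit_id] = (port, port != flit.wanted)
    return assignment


if __name__ == "__main__":
    incoming = [Flit(1, age=5, wanted="E"), Flit(2, age=9, wanted="E")]
    for flit_id, (port, deflected) in route_bufferless(incoming).items():
        print(flit_id, port, "deflected" if deflected else "productive")
```

Each deflection adds hops for the losing flit, which is the cost at high network load that the abstract points to.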
In this paper, we present network-on-chip (NoC) design and contrast it to traditional network design, highlighting similarities and differences between the two. As an initial case study, we examine network congestion in bufferless NoCs. We show that congestion manifests itself differently in a NoC than in traditional networks. Network congestion reduces system throughput in congested workloads for smaller NoCs (16 and 64 nodes), and limits the scalability of larger bufferless NoCs (256 to 4096 nodes) even when traffic has locality (e.g., when an application's required data is mapped...
In this paper, we present network-on-chip (NoC) design and contrast it to traditional network design, highlighting core differences between NoCs and traditional networks. As an initial case study, we examine network congestion in bufferless NoCs. We show that congestion manifests itself differently in a NoC than in a traditional network, and with application-level awareness in the network to make proper throttling decisions we improve system performance by up to 28%. It is our hope that the unique and interesting challenges of on-chip network design can be met by novel and effective solutions from...
The network-on-chip (NoC) is a primary shared resource in a chip multiprocessor (CMP) system. As core counts continue to increase and as applications become increasingly data-intensive, the network load will also increase, leading to more congestion in the network. This network congestion can degrade system performance if the network load is not appropriately controlled. Prior works have proposed source-throttling congestion control, which limits the rate at which new network traffic (packets) enters the NoC in order to reduce congestion and improve performance. These prior congestion control mechanisms...
Hierarchical ring networks, which hierarchically connect multiple levels of rings, have been proposed in the past to improve the scalability of ring interconnects, but past hierarchical ring designs sacrifice some of the key benefits of rings by reintroducing more complex in-ring buffering and buffered flow control. Our goal in this paper is to design a new hierarchical ring interconnect that can maintain most of the simplicity of traditional ring designs (i.e., no in-ring buffering or buffered flow control) while achieving high scalability as more complex hierarchical ring designs. To this end, we revisit the concept of a hierarchical-ring network-on-chip....
We introduce Hardware-assisted Fault Isolation (HFI), a simple extension to existing processors to support secure, flexible, and efficient in-process isolation. HFI addresses the limitations of existing software-based isolation (SFI) systems including: runtime overheads, limited scalability, vulnerability to Spectre attacks, and limited compatibility with existing code. HFI can seamlessly integrate with current SFI systems (e.g., WebAssembly), or directly sandbox unmodified native binaries. To ease adoption, HFI relies only on incremental changes...
Energy consumption of routers in commonly used mesh-based on-chip networks for chip multiprocessors is an increasingly important concern: these routers consist of a crossbar and complex control logic and can require significant buffers, hence high energy and area consumption. In contrast, an alternative design uses ring-based networks to connect network nodes with small and simple routers. Rings have been used in recent commercial designs, and are well-suited for smaller core counts. However, rings do not scale as efficiently as meshes. In this...
This paper makes two observations that lead to a new heterogeneous core design. First, we observe that most serial code exhibits fine-grained heterogeneity: at the scale of tens or hundreds of instructions, regions of code fit different microarchitectures better (at the same point or at different points in time). Second, by grouping contiguous regions of instructions into blocks that are executed atomically, a core can exploit this heterogeneity: atomicity allows each block to be executed independently on its own execution backend that fits its characteristics best. Based on these...
As process technology scales down to smaller dimensions, DRAM chips become more vulnerable to disturbance, a phenomenon in which different DRAM cells interfere with each other's operation. For the first time in academic literature, our ISCA paper exposes the existence of disturbance errors in commodity DRAM chips that are sold and used today. We show that repeatedly reading from the same address could corrupt data in nearby addresses. More specifically: When a DRAM row is opened (i.e., activated) and closed (i.e., precharged) repeatedly (i.e., hammered), it can...
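The access pattern in the abstract above, where one row is repeatedly opened and closed and data in nearby rows is corrupted, can be pictured with a toy software model. This is purely illustrative and says nothing about real device behavior: the `ToyDram` class, the one-byte rows, the choice of which bit flips, and above all the `flip_threshold` value are hypothetical and are not taken from the paper.

```python
class ToyDram:
    """Toy model: each row holds one byte; neighbors of a hammered row get disturbed."""

    def __init__(self, num_rows, flip_threshold):
        self.rows = [0xFF] * num_rows          # every cell starts charged
        self.disturb = [0] * num_rows          # disturbance events seen per row
        self.flip_threshold = flip_threshold   # hypothetical, for illustration only

    def activate_precharge(self, row):
        """Open and close `row` once, disturbing its physically adjacent rows."""
        for neighbor in (row - 1, row + 1):
            if 0 <= neighbor < len(self.rows):
                self.disturb[neighbor] += 1
                if self.disturb[neighbor] == self.flip_threshold:
                    # Model a single bit flip in the victim row.
                    self.rows[neighbor] &= 0xFE


def hammer(dram, aggressor_row, count):
    """Repeatedly open and close one row; its own data is never written."""
    for _ in range(count):
        dram.activate_precharge(aggressor_row)


if __name__ == "__main__":
    dram = ToyDram(num_rows=4, flip_threshold=1000)
    hammer(dram, aggressor_row=1, count=1000)
    print([hex(value) for value in dram.rows])
```

In the model, the aggressor row is only ever read, yet its neighbors end up holding different data, which is the violation of memory isolation that the abstract describes.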
Language-level guarantees, like module runtime isolation for WebAssembly (Wasm), are only as strong as the compiler that produces a final, native-machine-specific executable. The process of lowering language-level constructions to ISA-specific instructions can introduce subtle bugs that violate security guarantees. In this paper, we present Crocus, a system for lightweight, modular verification of instruction-lowering rules within Cranelift, a production retargetable Wasm native code generator. We use Crocus...
In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these bulk data movement operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies the data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination...
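The back-to-back activation idea behind Fast Parallel Mode can be sketched with a small software model. It is a simplification under stated assumptions, not the hardware mechanism itself: `ToySubarray`, its single `row_buffer`, and the rule that a second activation without an intervening precharge overwrites the newly opened row are modeling choices made for this sketch.

```python
class ToySubarray:
    """Toy DRAM subarray: rows of cells that share one row buffer."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]
        self.row_buffer = None  # None models the precharged (closed) state

    def activate(self, row):
        if self.row_buffer is None:
            # First activation: the row's contents are latched into the row buffer.
            self.row_buffer = list(self.rows[row])
        else:
            # Back-to-back activation (assumed model): the buffer already holds
            # data, so the newly opened row takes on that value.
            self.rows[row] = list(self.row_buffer)

    def precharge(self):
        self.row_buffer = None


def fpm_copy(subarray, src, dst):
    """Copy row `src` to row `dst` inside one subarray, with no channel transfer."""
    subarray.activate(src)
    subarray.activate(dst)
    subarray.precharge()


if __name__ == "__main__":
    sub = ToySubarray([[1, 2, 3, 4], [0, 0, 0, 0]])
    fpm_copy(sub, src=0, dst=1)
    print(sub.rows)
```

The copy never touches a processor-side data structure in the model, mirroring the abstract's point that the data does not need to travel to the on-chip caches and back.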