- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Security and Verification in Computing
- Caching and Content Delivery
- 3D IC and TSV technologies
- Embedded Systems Design Techniques
- VLSI and FPGA Design Techniques
- Advanced Memory and Neural Computing
- Stochastic Gradient Optimization Techniques
- Advanced Malware Detection Techniques
- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Low-power high-performance VLSI design
- Privacy-Preserving Technologies in Data
- Recommender Systems and Techniques
- Distributed systems and fault tolerance
- Cryptography and Data Security
- IoT and Edge/Fog Computing
- Semiconductor materials and devices
- Cloud Data Security Solutions
- VLSI and Analog Circuit Testing
- Green IT and Sustainability
- Ferroelectric and Negative Capacitance Devices
- Physical Unclonable Functions (PUFs) and Hardware Security
Intel (United States)
2002-2025
Chungbuk National University
2025
Intel (United Kingdom)
2023
North Carolina State University
2023
Meta (United States)
2021
Meta (Israel)
2020-2021
University of Chicago
2019
Taiwan Semiconductor Manufacturing Company (Taiwan)
2017
Taiwan Semiconductor Manufacturing Company (United States)
2015
Georgia Institute of Technology
2005-2014
One of the challenges for 3D technology adoption is an insufficient understanding of 3D testing issues and a lack of DFT solutions. This article describes testing for 3D ICs, including problems that are unique to 3D integration, and summarizes early research results in this area. Researchers are investigating various 3D IC manufacturing processes, and the choice of process is particularly relevant to DFT. In terms of the process-level assembly that 3D ICs require, we can broadly classify the techniques as monolithic integration or die stacking.
The widespread application of deep learning has changed the landscape of computation in data centers. In particular, personalized recommendation for content ranking is now largely accomplished using deep neural networks. However, despite their importance and the amount of compute cycles they consume, relatively little research attention has been devoted to recommendation systems. To facilitate and advance understanding of these workloads, this paper presents a set of real-world, production-scale DNNs coupled with relevant performance...
This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, we capture the operational and manufacturing carbon footprint of AI computing and present an end-to-end analysis of what and how hardware-software design and at-scale optimization can help...
Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity-DRAM-compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade models shows high model-, operator-...
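A minimal sketch of the memory-bound gather-and-pool embedding pattern described above, in NumPy; the table size, embedding width, pooling factor, and sum-pooling operator are illustrative assumptions rather than parameters from the paper:

```python
import numpy as np

# Illustrative sizes only; production tables are orders of magnitude larger.
NUM_ROWS, EMB_DIM = 200_000, 64
table = np.random.rand(NUM_ROWS, EMB_DIM).astype(np.float32)

def embedding_pool(table, indices):
    """Gather rows at irregular, data-dependent indices and sum-pool them.

    Each lookup touches a few cache lines scattered across a huge table,
    which is why the operation is bound by memory latency and bandwidth
    rather than by arithmetic.
    """
    return table[indices].sum(axis=0)

# One inference query: a small bag of sparse feature IDs per table.
indices = np.random.randint(0, NUM_ROWS, size=80)
pooled = embedding_pool(table, indices)   # shape: (EMB_DIM,)
print(pooled.shape)
```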
Memory bandwidth has become a major performance bottleneck as more and more cores are integrated onto a single die, demanding data from the system memory. Several prior studies have demonstrated that this memory bandwidth problem can be addressed by employing a 3D-stacked memory architecture, which provides a wide, high-frequency memory-bus interface. Although previous 3D proposals already provide more bandwidth than a traditional L2 cache can consume, the dense through-silicon vias (TSVs) of 3D chip stacks can still offer substantially more bandwidth. In this paper, we contest...
Phase change memory (PCM) is an emerging memory technology for future computing systems. Compared to other non-volatile alternatives, PCM is more mature for production, and has a faster read latency and potentially higher storage density. The main roadblock precluding PCM from being used, in particular, in the memory hierarchy is its limited write endurance. To address this issue, recent studies proposed to either reduce PCM's write frequency or use wear-leveling to evenly distribute writes. Although these techniques can extend...
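As a concrete illustration of the wear-leveling idea mentioned above, here is a toy rotation-based wear-leveler (in the spirit of Start-Gap-style schemes, but not the mechanism proposed in this paper): one spare line slowly rotates through the array so that a write-hot logical line does not keep hitting the same physical cells.

```python
class RotationWearLeveler:
    """Toy rotation-based wear-leveling for a PCM-like array.

    One spare physical line (the 'gap') rotates through the array every PSI
    demand writes, so a write-hot logical line gradually visits every
    physical line instead of wearing out a single one. Real designs derive
    the mapping algebraically from two registers; this sketch keeps an
    explicit table purely for clarity.
    """

    def __init__(self, num_lines, psi=8):
        self.psi = psi
        self.l2p = list(range(num_lines))      # logical -> physical mapping
        self.gap = num_lines                   # physical index of spare line
        self.writes = 0
        self.wear = [0] * (num_lines + 1)      # write count per physical line

    def write(self, logical):
        self.wear[self.l2p[logical]] += 1
        self.writes += 1
        if self.writes % self.psi == 0:
            self._rotate()

    def _rotate(self):
        total = len(self.wear)                 # num_lines + 1 physical lines
        src = (self.gap - 1) % total           # neighbor that slides into gap
        self.wear[self.gap] += 1               # the copy itself is a write
        for logical, phys in enumerate(self.l2p):
            if phys == src:
                self.l2p[logical] = self.gap
                break
        self.gap = src                         # gap moves one slot backward

# Even if software hammers a single logical line, wear spreads out over time.
wl = RotationWearLeveler(num_lines=16, psi=8)
for _ in range(20_000):
    wl.write(0)
print("max/min physical wear:", max(wl.wear), min(wl.wear))
```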
An updated take on Amdahl's analytical model uses modern design constraints to analyze many-core alternatives. The revised models provide computer architects with a better understanding of many-core design types, enabling them to make more informed tradeoffs. Unsustainable power consumption and ever-increasing design and verification complexity have driven the microprocessor industry to integrate multiple cores on a single die, or multicore, as an architectural solution to sustaining Moore's law. With dual-core and quad-core...
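For reference, classic Amdahl speedup for a parallelizable fraction f on n cores, plus one simple energy-aware extension in which idle cores draw a fraction k of active power during the sequential phase; the extension is an illustrative model in the same spirit, not necessarily the exact revision the article develops.

```latex
% Classic Amdahl speedup for parallel fraction f on n symmetric cores.
S(f, n) = \frac{1}{(1 - f) + \frac{f}{n}}

% Illustrative energy-aware extension: idle cores draw a fraction k of active
% power during the sequential phase. Normalized energy per task is
% E = (1 - f)(1 + (n - 1)k) + f, so performance per watt relative to one core:
\frac{\mathrm{Perf}}{\mathrm{W}}(f, n, k) = \frac{1}{(1 - f)\,(1 + (n - 1)k) + f}
```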
Transactional memory systems are expected to enable parallel programming at lower programming complexity, while delivering improved performance over traditional lock-based systems. Nonetheless, there are certain situations where transactional memory could actually perform worse. Transactional memory can outperform locks only when the executing workloads contain sufficient parallelism. When the workload lacks inherent parallelism, launching excessive transactions can adversely degrade performance. These situations are likely to become dominant in future...
As technology scaling poses a threat to DRAM due to physical limitations such as limited charge, alternative memory technologies, including several emerging non-volatile memories, are being explored as possible replacements. One main roadblock for wider adoption of these new technologies is their limited write endurance, which leads to wear-out-related permanent failures. Furthermore, scaling increases the variation in cell lifetime, resulting in early failures of many cells. Existing error-correcting techniques are primarily devised for recovering...
Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting a significant compute demand on cloud infrastructure. Thus, improving the execution efficiency of recommendation inference directly translates into infrastructure capacity saving. In this paper, we propose DeepRecSched, a recommendation inference scheduler that maximizes latency-bounded throughput by taking into account the characteristics of inference query size and arrival patterns, model architectures, and underlying hardware systems. By carefully optimizing...
Several recent works have demonstrated the benefits of through-silicon-via (TSV) based 3D integration [1–4], but none of them involves a fully functioning multicore processor with memory stacking. 3D-MAPS (3D Massively Parallel Processor with Stacked Memory) is a two-tier 3D IC, where the logic die consists of 64 general-purpose cores running at 277 MHz and the memory die contains 256 KB of SRAM (see Fig. 10.6.1). Fabrication is done using 130 nm GlobalFoundries device technology and Tezzaron TSV bonding technology. Packaging is done by Amkor...
Given recent algorithm, software, and hardware innovation, computing has enabled a plethora of new applications. As computing becomes increasingly ubiquitous, however, so does its environmental impact. This paper brings the issue to the attention of computer-systems researchers. Our analysis, built on industry-reported characterization, quantifies the effects in terms of carbon emissions. Broadly, carbon emissions have two sources: operational energy consumption, and hardware manufacturing and infrastructure. Although emissions from the former are...
As the application of deep learning continues to grow, so does the amount of data used to make predictions. While traditionally big-data deep learning was constrained by computing performance and off-chip memory bandwidth, a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on clients' encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow...
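HE schemes used for neural inference are typically lattice-based and far more involved; purely to make the "compute on encrypted data" property concrete, here is a textbook Paillier toy (additively homomorphic, deliberately tiny and insecure parameters, not the scheme or system from this paper) in which the cloud evaluates a small linear layer without ever seeing the client's inputs.

```python
from math import gcd, lcm
import random

def paillier_keygen(p=1009, q=1013):
    """Textbook Paillier key generation with tiny, INSECURE demo primes."""
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1                       # standard simple choice of generator
    mu = pow(lam, -1, n)            # since g = n + 1, L(g^lam mod n^2) = lam
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n  # L(x) = (x - 1) / n
    return (L * mu) % n

pub, priv = paillier_keygen()

# Client side: encrypt private inputs; the cloud never sees the plaintext.
x = [3, 7, 2]
cx = [encrypt(pub, v) for v in x]

# Cloud side: evaluate w.x + b on ciphertexts only. Multiplying ciphertexts
# adds the underlying plaintexts; raising to a public scalar multiplies them.
w, b = [5, 1, 4], 10
n2 = pub[0] * pub[0]
acc = encrypt(pub, b)
for ci, wi in zip(cx, w):
    acc = (acc * pow(ci, wi, n2)) % n2

# Client side: decrypt the single result ciphertext.
print(decrypt(pub, priv, acc))      # 5*3 + 1*7 + 4*2 + 10 = 40
```

Even this toy hints at the cost: every homomorphic multiply-accumulate is a modular exponentiation, which is one reason unaccelerated HE inference is so much slower than plaintext inference.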
Near-memory processing (NMP) is a prospective paradigm enabling memory-centric computing. By moving compute capability next to the main memory (DRAM modules), it can fundamentally address the CPU-memory bandwidth bottleneck and thus effectively improve the performance of memory-constrained workloads. Using personalized recommendation systems as a driving example, we developed a scalable, practical DIMM-based NMP solution tailor-designed for accelerating recommendation inference serving. Our solution is demonstrated on a versatile...
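A back-of-the-envelope comparison of memory-bus traffic with and without DIMM-side pooling, which is where the bandwidth win of such an NMP design comes from; all sizes below are illustrative assumptions, not figures from the paper.

```python
# Data-movement comparison for one embedding-pooling query.
EMB_DIM      = 64          # elements per embedding vector
BYTES_PER_EL = 4           # fp32
LOOKUPS      = 80          # sparse IDs pooled per table
TABLES       = 8           # tables accessed by the query

vector_bytes = EMB_DIM * BYTES_PER_EL

# Baseline: every gathered vector crosses the DIMM-to-CPU bus before pooling.
baseline_bytes = TABLES * LOOKUPS * vector_bytes

# DIMM-side NMP: pooling happens next to DRAM; only one pooled vector per
# table crosses the bus, plus the index lists shipped to the DIMMs.
index_bytes = TABLES * LOOKUPS * 4          # 4-byte indices sent to NMP units
nmp_bytes   = TABLES * vector_bytes + index_bytes

print(f"baseline: {baseline_bytes/1024:.1f} KiB on the memory bus")
print(f"near-memory pooling: {nmp_bytes/1024:.1f} KiB "
      f"({baseline_bytes/nmp_bytes:.1f}x less traffic)")
```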
Given the performance and efficiency optimizations realized by the computer systems and architecture community over the last decades, the dominating source of computing's carbon footprint is shifting from operational emissions to embodied emissions. These embodied emissions owe to hardware manufacturing and infrastructure-related activities. Despite the rising embodied emissions, there is a distinct lack of architectural modeling tools to quantify and optimize the end-to-end carbon footprint of computing. This work proposes ACT, an architectural carbon modeling framework, to enable carbon characterization...
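A simplified carbon-accounting identity in the spirit of such a framework; the symbols (usage energy E_use, grid carbon intensity CI_grid, hardware lifetime LT, manufacturing footprint CF_mfg) are generic placeholders rather than ACT's actual interface.

```latex
% Total footprint = operational emissions + amortized embodied emissions.
CF_{\mathrm{total}} =
    \underbrace{E_{\mathrm{use}} \cdot CI_{\mathrm{grid}}}_{\text{operational}}
  + \underbrace{\frac{T_{\mathrm{use}}}{LT} \cdot CF_{\mathrm{mfg}}}_{\text{embodied, amortized over lifetime } LT}
```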
DRAMs require periodic refresh for preserving the data stored in them. The refresh interval depends on the vendor and the design technology they use. For each refresh of a DRAM row, the information in each cell is read out and then written back to itself, as reading a bit is self-destructive. The refresh process is inevitable for maintaining correctness, unfortunately, at the expense of power and bandwidth overhead. The future trend of integrating DRAM layers 3D die-stacked on top of a processor further exacerbates the situation, as accesses to these DRAMs will be more frequent and hiding refresh cycles in the available slack...
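A quick estimate of what that bandwidth overhead looks like for a conventional DDR-class device, using typical published timing values that are illustrative rather than taken from the paper.

```python
# Rough refresh-overhead estimate for a DDRx-class device.
tREFW = 64e-3      # refresh window: every row refreshed once per 64 ms
N_REF = 8192       # refresh commands issued per window
tRFC  = 350e-9     # time the rank is blocked per refresh command (e.g. 8 Gb die)

tREFI = tREFW / N_REF                 # average interval between refresh commands
busy_fraction = tRFC / tREFI          # fraction of rank time lost to refresh

print(f"tREFI = {tREFI*1e6:.2f} us, refresh busy fraction = {busy_fraction:.1%}")
# With these numbers roughly 4-5% of rank time is unavailable, and the penalty
# grows with device density since tRFC scales up with capacity.
```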
Transactional Memory (TM) promises to simplify concurrent programming, which has been notoriously difficult but crucial in realizing the performance benefit of multi-core processors. Software Transactional Memory (STM), in particular, represents a body of important TM technologies since it provides a mechanism to run transactional programs when hardware support is not available, or when hardware resources are exhausted. Nonetheless, most previous research on STMs was constrained to executing trivial, small-scale workloads...
This paper presents the first multiobjective microarchitectural floorplanning algorithm for high-performance processors implemented in two-dimensional (2-D) and three-dimensional (3-D) ICs. The floorplanner takes a microarchitectural netlist and determines the dimensions as well as the placement of the functional modules into single- or multiple-device layers, while simultaneously achieving high performance and thermal reliability. Traditional design objectives such as area and wirelength are also considered. The 3-D floorplanner considers the following...
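Multiobjective floorplanners of this kind typically minimize a weighted combination of the competing metrics; a generic weighted-sum formulation (weights and terms illustrative, not the paper's exact objective) looks like:

```latex
\min_{F} \;\; \alpha\,\mathrm{Area}(F)
        + \beta\,\mathrm{WL}(F)
        + \gamma\,\widehat{\mathrm{CPI}}(F)
        + \delta\,T_{\max}(F)
```

Here F is a candidate floorplan, WL the total wirelength, \widehat{CPI} a profile-weighted estimate of the performance impact of inter-module wire latencies, and T_max the peak on-chip temperature.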
There are several emerging memory technologies looming on the horizon to compensate for the physical scaling challenges of DRAM. Phase change memory (PCM) is one such candidate proposed for being part of the main memory in computing systems. One salient feature of PCM is its multi-level-cell (MLC) property, which can be used to multiply the capacity at the cell level. However, due to the nature that the value written to a cell can drift over time, MLC PCM is prone to a unique type of soft errors, posing a great challenge for its practical deployment. This paper first quantitatively...
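The drift behind those soft errors is commonly modeled with a power law; in the usual notation (not necessarily the paper's), a cell programmed to resistance R_0 at time t_0 evolves as:

```latex
R(t) = R_0 \left( \frac{t}{t_0} \right)^{\nu}, \qquad \nu > 0
```

Because higher-resistance MLC levels tend to drift faster (larger \nu), a stored level can cross the read threshold separating it from the neighboring level and be read out incorrectly, which manifests as a soft error.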
Eager writeback: a technique for improving bandwidth utilization. Hsien-Hsin S. Lee (ACAL, EECS Department, University of Michigan, Ann Arbor, MI), Gary Tyson, and Matthew K. Farrens (Department of Computer Science, University of California, Davis, CA). In MICRO 33: Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture, December 2000, pages 11–21. https://doi.org/10.1145/360128.360132
Encrypting data in unprotected memory has gained much interest lately for digital rights protection and security reasons. Counter Mode is a well-known encryption scheme. It is a symmetric-key scheme based on any block cipher, e.g., AES. The scheme's encryption algorithm uses a secret key and a counter (or sequence number) to generate an encryption pad, which is XORed with the data stored in memory. Like other schemes, this method suffers from the inherent latency of decrypting the encrypted data when loading them into the on-chip cache. One solution that...
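A minimal sketch of the pad-generate-and-XOR structure described above, with a per-line counter; SHAKE-256 stands in for the block cipher only so the example needs nothing beyond the Python standard library, and all constants are illustrative.

```python
import hashlib

KEY = b"demo-secret-key"           # toy key for illustration only
BLOCK = 64                         # encrypt memory at cache-line granularity

def pad(key, addr, counter):
    """Generate the encryption pad from (key, line address, counter).

    A real design would run a block cipher such as AES over
    (address || counter); SHAKE-256 is used here as a stand-in PRF.
    """
    msg = addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    return hashlib.shake_256(key + msg).digest(BLOCK)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Writing a line: bump the per-line counter, XOR the data with a fresh pad.
addr, counter = 0x4000, 7
plaintext  = b"cache line payload".ljust(BLOCK, b"\x00")
ciphertext = xor(plaintext, pad(KEY, addr, counter))

# Reading the line back: the pad depends only on (key, addr, counter), so it
# can be precomputed while the memory access is still in flight, overlapping
# decryption latency with DRAM latency, the property the abstract alludes to.
assert xor(ciphertext, pad(KEY, addr, counter)) == plaintext
print("decrypted OK")
```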
With more applications being deployed on embedded platforms, software protection becomes increasingly important. This problem is crucial for systems like financial transaction terminals and pay-TV access-control decoders, where adversaries may easily gain full physical access to the system and critical algorithms must be protected from being cracked. However, as this paper points out, protecting software with either encryption or obfuscation cannot completely preclude control flow information from being leaked. Encryption...
The first memristor, originally theorized by Dr. Leon Chua in 1971, was identified by a team at HP Labs in 2008. This new fundamental circuit element is unique in that its resistance changes as current passes through it, giving the device a memory of the past system state. The immediately obvious application of such a device is non-volatile memory, wherein the high- and low-resistance states are used to store binary values. An array of memristors forms what is called resistive RAM, or RRAM. In this paper, we survey the devices that have been produced...
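Chua's defining relation makes that "memory of the past state" precise: memristance links charge and flux linkage, so the instantaneous resistance depends on all the current that has ever flowed through the device.

```latex
\mathrm{d}\varphi = M(q)\,\mathrm{d}q
\quad\Longrightarrow\quad
v(t) = M\!\bigl(q(t)\bigr)\, i(t),
\qquad q(t) = \int_{-\infty}^{t} i(\tau)\,\mathrm{d}\tau
```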