Hsien-Hsin S. Lee

ORCID: 0000-0002-8926-8243
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Security and Verification in Computing
  • Caching and Content Delivery
  • 3D IC and TSV technologies
  • Embedded Systems Design Techniques
  • VLSI and FPGA Design Techniques
  • Advanced Memory and Neural Computing
  • Stochastic Gradient Optimization Techniques
  • Advanced Malware Detection Techniques
  • Cloud Computing and Resource Management
  • Distributed and Parallel Computing Systems
  • Low-power high-performance VLSI design
  • Privacy-Preserving Technologies in Data
  • Recommender Systems and Techniques
  • Distributed systems and fault tolerance
  • Cryptography and Data Security
  • IoT and Edge/Fog Computing
  • Semiconductor materials and devices
  • Cloud Data Security Solutions
  • VLSI and Analog Circuit Testing
  • Green IT and Sustainability
  • Ferroelectric and Negative Capacitance Devices
  • Physical Unclonable Functions (PUFs) and Hardware Security

Intel (United States)
2002-2025

Chungbuk National University
2025

Intel (United Kingdom)
2023

North Carolina State University
2023

Meta (United States)
2021

Meta (Israel)
2020-2021

University of Chicago
2019

Taiwan Semiconductor Manufacturing Company (Taiwan)
2017

Taiwan Semiconductor Manufacturing Company (United States)
2015

Georgia Institute of Technology
2005-2014

One of the challenges for 3D technology adoption is an insufficient understanding of testing issues and a lack of DFT solutions. This article describes testing of 3D ICs, including problems that are unique to 3D integration, and summarizes early research results in this area. Researchers are investigating various 3D IC manufacturing processes, which are particularly relevant to DFT. In terms of the process-level assembly that 3D ICs require, we can broadly classify techniques as monolithic integration or die stacking.

10.1109/mdt.2009.125 article EN IEEE Design & Test of Computers 2009-09-01

The widespread application of deep learning has changed the landscape of computation in data centers. In particular, personalized recommendation for content ranking is now largely accomplished using deep neural networks. However, despite their importance and the amount of compute cycles they consume, relatively little research attention has been devoted to recommendation systems. To facilitate and advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs coupled with relevant performance...

10.1109/hpca47549.2020.00047 article EN 2020-02-01

This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, we capture the operational and manufacturing carbon footprint of AI computing and present an end-to-end analysis of what and how hardware-software design and at-scale optimization can help...

10.48550/arxiv.2111.00364 preprint EN cc-by-nc-nd arXiv (Cornell University) 2021-01-01

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity-DRAM-compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade models shows high model-, operator...

10.1109/isca45697.2020.00070 article EN 2020-05-01


Memory bandwidth has become a major performance bottleneck as more and more cores are integrated onto a single die, demanding more data from the system memory. Several prior studies have demonstrated that this memory bandwidth problem can be addressed by employing a 3D-stacked memory architecture, which provides a wide, high-frequency memory-bus interface. Although previous 3D proposals already provide much more bandwidth than a traditional L2 cache can consume, the dense through-silicon vias (TSVs) of 3D chip stacks can still offer more bandwidth. In this paper, we contest...

10.1109/hpca.2010.5416628 article EN 2010-01-01

Phase change memory (PCM) is an emerging memory technology for future computing systems. Compared to other non-volatile alternatives, PCM is more mature for production, and has a faster read latency and potentially higher storage density. The main roadblock precluding PCM from being used, in particular in the memory hierarchy, is its limited write endurance. To address this issue, recent studies have proposed to either reduce PCM's write frequency or use wear-leveling to evenly distribute writes. Although these techniques can extend...

10.1145/1815961.1816014 article EN 2010-06-19
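
The wear-leveling idea above can be sketched with a toy randomized address remapping, in which a logical line is XORed with a periodically changed key so that even a pathological workload spreads its writes across physical lines. This is an illustrative sketch only — the line count, remap period, and policy below are hypothetical, and a real design migrates data incrementally as the key changes.

```python
import random

# Toy wear-leveling via randomized address remapping (illustrative).
N = 256                      # number of PCM lines (toy size)
writes = [0] * N             # per-physical-line write counts
key = 0                      # current remap key

def remap(logical, key):
    return logical ^ key     # XOR remapping: a permutation for any key

random.seed(1)
for step in range(100_000):
    logical = 7              # adversarial workload hammering one line
    writes[remap(logical, key)] += 1
    # Periodically pick a new key, spreading future writes elsewhere
    # (a real design would also migrate the stored data).
    if step % 500 == 499:
        key = random.randrange(N)

print(max(writes), sum(writes) // N)   # worst-case vs. average wear
```

Without remapping, one physical line would absorb all 100,000 writes; with periodic rekeying the worst-case line sees only a small multiple of the 500-write remap period.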

An updated take on Amdahl's analytical model uses modern design constraints to analyze many-core alternatives. The revised models provide computer architects with a better understanding of many-core types, enabling them to make more informed tradeoffs. Unsustainable power consumption and ever-increasing verification complexity have driven the microprocessor industry to integrate multiple cores on a single die, or multicore, as an architectural solution to sustaining Moore's law [1]. With dual-core and quad-core...

10.1109/mc.2008.494 article EN Computer 2008-12-01
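
Amdahl's-law-style reasoning of the kind described above can be made concrete in a few lines. The sketch below shows the classic speedup bound plus a simple performance-per-watt variant in which idle cores draw a fraction of active power during the sequential phase; the parameter values (f = 0.95, k = 0.3) are hypothetical, not taken from the article.

```python
# Amdahl's-law-style many-core analysis (illustrative parameters only).

def amdahl_speedup(f, n):
    """Speedup on n cores when a fraction f of the work is parallelizable."""
    return 1.0 / ((1.0 - f) + f / n)

def perf_per_watt(f, n, k=0.3):
    """Performance per watt, assuming the n-1 idle cores each draw a
    fraction k of active power during the sequential phase."""
    t_seq = 1.0 - f
    t_par = f / n
    # Energy: sequential phase burns 1 + (n-1)*k units, parallel burns n.
    energy = t_seq * (1 + (n - 1) * k) + t_par * n
    return 1.0 / energy      # speedup divided by average power

for n in (1, 4, 16, 64):
    print(n, round(amdahl_speedup(0.95, n), 2),
             round(perf_per_watt(0.95, n), 3))
```

Even with 95% parallel work, the 64-core speedup saturates well below 64, and perf/W degrades as idle cores leak power during the serial phase — the kind of tradeoff such revised models expose.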

Transactional memory systems are expected to enable parallel programming at lower programming complexity, while delivering improved performance over traditional lock-based systems. Nonetheless, there are certain situations where transactional memory could actually perform worse. Transactional memory can outperform locks only when the executing workloads contain sufficient parallelism. When a workload lacks inherent parallelism, launching excessive transactions can adversely degrade performance. These situations are likely to become dominant in future...

10.1145/1378533.1378564 article EN 2008-06-14

As technology scaling poses a threat to DRAM due to physical limitations such as limited charge, alternative memory technologies, including several emerging non-volatile memories, are being explored as possible replacements. One main roadblock to wider adoption of these new memories is their limited write endurance, which leads to wear-out-related permanent failures. Furthermore, technology scaling increases the variation in cell lifetime, resulting in early failures of many cells. Existing error-correcting techniques are primarily devised for recovering...

10.1109/micro.2010.46 article EN 2010-12-01
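
One classic observation behind recovering from wear-out faults is that a stuck-at cell, unlike a transient error, still reads back its stuck value — so a single inversion flag per word suffices to tolerate one stuck cell. The toy model below demonstrates only that idea; it is not the paper's full scheme, which handles multiple stuck bits.

```python
# Toy model: an 8-bit word with one stuck-at cell. Store the word
# inverted whenever the desired bit disagrees with the stuck value;
# a per-word flip flag tells the reader to invert back.

WIDTH = 8
MASK = (1 << WIDTH) - 1

def write(data, stuck_pos, stuck_val):
    """Return (stored_bits, flip_flag) for a word with one stuck cell."""
    want = (data >> stuck_pos) & 1
    flip = want != stuck_val
    stored = (data ^ MASK) if flip else data
    # The stuck cell ignores whatever we write:
    stored = (stored & ~(1 << stuck_pos)) | (stuck_val << stuck_pos)
    return stored, flip

def read(stored, flip):
    return (stored ^ MASK) if flip else stored

# Every 8-bit value survives a cell stuck at position 3 with value 1:
ok = all(read(*write(d, 3, 1)) == d for d in range(256))
print(ok)
```

Inverting works because the stuck cell's bit in the inverted word always matches the stuck value, so the one cell we cannot write never disagrees with what we need it to hold.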

Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting a significant compute demand on cloud infrastructure. Thus, improving the execution efficiency of recommendation directly translates into infrastructure capacity savings. In this paper, we propose DeepRecSched, an inference scheduler that maximizes latency-bounded throughput by taking into account characteristics of query size and arrival patterns, model architectures, and underlying hardware systems. By carefully optimizing...

10.1109/isca45697.2020.00084 preprint EN 2020-05-01

Several recent works have demonstrated the benefits of through-silicon-via (TSV) based 3D integration [1–4], but none of them involves a fully functioning multicore processor with memory stacking. 3D-MAPS (3D Massively Parallel Processor with Stacked Memory) is a two-tier 3D IC, where the logic die consists of 64 general-purpose cores running at 277MHz and the memory die contains 256KB of SRAM (see Fig. 10.6.1). Fabrication is done using 130nm GlobalFoundries device technology and Tezzaron TSV bonding technology. Packaging is done by Amkor....

10.1109/isscc.2012.6176969 article EN 2012-02-01

Given recent algorithm, software, and hardware innovation, computing has enabled a plethora of new applications. As computing becomes increasingly ubiquitous, however, so does its environmental impact. This paper brings the issue to the attention of computer-systems researchers. Our analysis, built on industry-reported characterization, quantifies the effects in terms of carbon emissions. Broadly, carbon emissions have two sources: operational energy consumption, and hardware manufacturing and infrastructure. Although emissions from the former are...

10.1109/hpca51647.2021.00076 article EN 2021-02-01

As the application of deep learning continues to grow, so does the amount of data used to make predictions. While traditionally big-data analysis was constrained by computing performance and off-chip memory bandwidth, a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on clients' encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow...

10.1109/hpca51647.2021.00013 article EN 2021-02-01
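
The property that makes inference on encrypted data possible is that certain ciphertexts can be computed on directly. The HE schemes used for neural networks are lattice-based and support both additions and multiplications with noise management; as a minimal, insecure demonstration of the homomorphic principle only, textbook RSA already shows that multiplying ciphertexts multiplies plaintexts:

```python
# Multiplicative homomorphism of textbook RSA (toy parameters,
# insecure, for illustration only — not the scheme the paper uses).

n = 61 * 53          # toy modulus, 3233
e = 17               # public exponent
d = 2753             # private exponent for phi = 60 * 52 = 3120

def enc(m): return pow(m, e, n)
def dec(c): return pow(c, d, n)

a, b = 7, 12
c = (enc(a) * enc(b)) % n      # multiply ciphertexts...
print(dec(c))                  # ...decrypts to a*b = 84
```

The cloud never sees a or b, yet produces a ciphertext of their product — the same principle, extended to the additions and multiplications of a neural network, is what makes HE inference possible and so computationally demanding.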

Near-memory processing (NMP) is a prospective paradigm enabling memory-centric computing. By moving compute capability next to main memory (DRAM modules), it can fundamentally address the CPU-memory bandwidth bottleneck and thus effectively improve the performance of memory-constrained workloads. Using the personalized recommendation system as a driving example, we developed a scalable, practical DIMM-based NMP solution tailor-designed for accelerating recommendation inference serving. Our solution is demonstrated on versatile...

10.1109/mm.2021.3097700 article EN IEEE Micro 2021-08-24

Given the performance and efficiency optimizations realized by the computer systems and architecture community over the last decades, the dominating source of computing's carbon footprint is shifting from operational emissions to embodied emissions. These embodied emissions owe to hardware manufacturing and infrastructure-related activities. Despite rising embodied emissions, there is a distinct lack of architectural modeling tools to quantify and optimize the end-to-end carbon footprint of computing. This work proposes ACT, an architectural carbon modeling framework, to enable carbon characterization...

10.1145/3470496.3527408 article EN 2022-05-31
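
The operational-versus-embodied split described above can be sketched as two additive terms. All parameter values below are illustrative placeholders chosen for this sketch, not figures from ACT: grid intensity, fab carbon per mm², and yield vary widely by region and process.

```python
# Sketch of end-to-end carbon accounting: total = operational + embodied.
# (Every number here is a hypothetical placeholder, not ACT's data.)

def operational_carbon(energy_kwh, grid_intensity_g_per_kwh):
    """Carbon from the electricity a device consumes over its lifetime."""
    return energy_kwh * grid_intensity_g_per_kwh        # grams CO2e

def embodied_carbon(die_area_mm2, fab_intensity_g_per_mm2, yield_frac):
    """Carbon from manufacturing the die, amortized over fab yield."""
    return die_area_mm2 * fab_intensity_g_per_mm2 / yield_frac

# Hypothetical SoC: 100 mm^2 die at 80% yield, 15 g CO2e per mm^2 of
# fab intensity; 20 kWh/year for 3 years on a 300 g CO2e/kWh grid.
op = operational_carbon(20 * 3, 300)
em = embodied_carbon(100, 15, 0.8)
print(op, em)
```

The point of an architectural model is that both terms become design variables: lowering energy per inference shrinks the first, while die area, yield, and fab energy source drive the second.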

Given recent algorithm, software, and hardware innovation, computing has enabled a plethora of new applications. As computing becomes increasingly ubiquitous, however, so does its environmental impact. This article brings the issue to the attention of computer-systems researchers. Our analysis, built on industry-reported characterization, quantifies the effects in terms of carbon emissions. Broadly, carbon emissions have two sources: operational energy consumption, and hardware manufacturing and infrastructure. Although emissions from the former are...

10.1109/mm.2022.3163226 article EN IEEE Micro 2022-03-29

DRAMs require periodic refresh to preserve the data stored in them. The refresh interval depends on the vendor and the design technology they use. For each refresh of a DRAM row, the information in each cell is read out and then written back to itself, as the read is self-destructive. The refresh process is inevitable for maintaining correctness, unfortunately at the expense of power and bandwidth overhead. The future trend of integrating DRAM layers 3D die-stacked on top of a processor further exacerbates the situation, as accesses to these DRAMs will be more frequent and hiding refresh cycles in the available slack...

10.1109/micro.2007.13 article EN 2007-01-01
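
The bandwidth cost of refresh is easy to estimate from two timing parameters: the average interval between refresh commands and the time each command occupies the bank. The numbers below are typical textbook values for a 64 ms retention window, used only as an illustration — actual values depend on vendor and density.

```python
# Back-of-envelope refresh overhead for a DRAM device (illustrative).

rows      = 8192         # rows per bank to refresh within retention
t_refi_ns = 7812.5       # avg interval between refresh commands
                         # (64 ms retention / 8192 commands)
t_rfc_ns  = 350.0        # time a refresh command occupies the bank

# Fraction of bank time lost to refresh:
overhead = t_rfc_ns / t_refi_ns
print(f"{overhead:.1%}")
```

A few percent of lost bank time at today's densities grows quickly as capacity (and thus row count or refresh time) scales, which is why eliminating unnecessary refreshes pays off.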

Transactional Memory (TM) promises to simplify concurrent programming, which has been notoriously difficult but is crucial in realizing the performance benefit of multi-core processors. Software Transactional Memory (STM), in particular, represents a body of important TM technologies, since it provides a mechanism to run transactional programs when hardware support is not available or hardware resources are exhausted. Nonetheless, most previous research on STMs was constrained to executing trivial, small-scale workloads....

10.1145/1378533.1378582 article EN 2008-06-14

This paper presents the first multiobjective microarchitectural floorplanning algorithm for high-performance processors implemented in two-dimensional (2-D) and three-dimensional (3-D) ICs. The floorplanner takes a netlist and determines the dimensions as well as the placement of functional modules into single- or multiple-device layers while simultaneously achieving high performance and thermal reliability. Traditional design objectives such as area and wirelength are also considered. The 3-D floorplanner considers the following...

10.1109/tcad.2006.883925 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2007-01-01

There are several emerging memory technologies looming on the horizon to compensate for the physical scaling challenges of DRAM. Phase change memory (PCM) is one such candidate, proposed for being part of the main memory in computing systems. One salient feature of PCM is its multi-level-cell (MLC) property, which can be used to multiply capacity at the cell level. However, because the resistance value written to a cell drifts over time, MLC PCM is prone to a unique type of soft error, posing a great challenge to practical deployment. This paper first quantitatively...

10.1145/2485922.2485960 article EN 2013-06-23
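
Resistance drift in MLC PCM is commonly modeled in the literature as a power law, R(t) = R0 * (t/t0)^nu, so a cell's read-back resistance rises over time and can cross into a neighboring level's band — the soft-error mechanism the paper studies. The sketch below uses this standard model with illustrative parameters; actual drift exponents vary per cell and per level.

```python
# Power-law resistance drift model for MLC PCM (illustrative values).

def resistance(r0, t, t0=1.0, nu=0.05):
    """Read-back resistance after t seconds, R(t) = R0 * (t/t0)**nu."""
    return r0 * (t / t0) ** nu

r0 = 10_000.0                       # ohms right after programming
for t in (1.0, 1e3, 1e6):           # 1 second, ~17 minutes, ~12 days
    print(t, round(resistance(r0, t)))
```

Even a small exponent nearly doubles the resistance over days, which is why MLC levels must either be spaced to tolerate drift or be scrubbed/re-read with drift-aware thresholds.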

Eager writeback - a technique for improving bandwidth utilization. Hsien-Hsin S. Lee (ACAL, EECS Department, University of Michigan, Ann Arbor, MI), Gary Tyson, and Matthew K. Farrens (Department of Computer Science, University of California, Davis, CA). MICRO 33: Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, December 2000, pages 11–21.

10.1145/360128.360132 article EN 2000-12-01

Encrypting data in unprotected memory has gained much interest lately for digital rights protection and security reasons. Counter Mode is a well-known encryption scheme. It is a symmetric-key scheme based on any block cipher, e.g. AES. The scheme's encryption algorithm uses a secret key and a counter (or sequence number) to generate an encryption pad, which is XORed with the data stored in memory. Like other schemes, this method suffers from the inherent latency of decrypting encrypted data when loading it into the on-chip cache. One solution that...

10.1145/1080695.1069972 article EN ACM SIGARCH Computer Architecture News 2005-05-01
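
The counter-mode structure described above can be sketched in a few lines: the pad depends only on the key and the counter, so it can be generated while the memory fetch is still in flight, and decryption reduces to one XOR on arrival. In this sketch SHA-256 stands in for the block cipher (a real design would use AES), and the key, block size, and counter value are arbitrary placeholders.

```python
import hashlib

# Sketch of counter-mode memory encryption with a hash standing in
# for the block cipher (illustrative only; real designs use AES).

KEY = b"secret-key"
BLOCK = 16                           # bytes per memory block

def pad(counter: int) -> bytes:
    """Encryption pad derived from (key, counter) alone."""
    return hashlib.sha256(KEY + counter.to_bytes(8, "big")).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"cache line data!"      # one 16-byte block
ctr = 42                             # per-block counter / sequence number
ciphertext = xor(plaintext, pad(ctr))
assert xor(ciphertext, pad(ctr)) == plaintext   # decryption is the same XOR
print(ciphertext.hex())
```

Because pad(ctr) needs no ciphertext as input, the memory controller can overlap pad generation with the DRAM access — the latency-hiding property the paper builds on.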

With more applications being deployed on embedded platforms, software protection becomes increasingly important. This problem is crucial for systems like financial transaction terminals and pay-TV access-control decoders, where adversaries may easily gain full physical access to the system and critical algorithms must be protected from being cracked. However, as this paper points out, protecting software with either encryption or obfuscation alone cannot completely prevent control flow information from being leaked. Encryption...

10.1145/1023833.1023873 article EN 2004-09-22

The first memristor, originally theorized by Dr. Leon Chua in 1971, was identified by a team at HP Labs in 2008. This new fundamental circuit element is unique in that its resistance changes as current passes through it, giving the device a memory of the past system state. An immediately obvious application of such an element is non-volatile memory, wherein high- and low-resistance states are used to store binary values. An array of memristors forms what is called resistive RAM, or RRAM. In this paper, we survey the devices that have been produced...

10.1109/3dic.2009.5306582 article EN 2009-09-01