- Advanced Data Storage Technologies
- Parallel Computing and Optimization Techniques
- Caching and Content Delivery
- Semiconductor materials and devices
- Cellular Automata and Applications
- Radiation Effects in Electronics
- Cloud Computing and Resource Management
- Security and Verification in Computing
- Distributed systems and fault tolerance
- Embedded Systems Design Techniques
- Green IT and Sustainability
- Electric and Hybrid Vehicle Technologies
- Mathematical Analysis and Transform Methods
- Mobile Crowdsensing and Crowdsourcing
- Advanced Malware Detection Techniques
- Advanced Memory and Neural Computing
- Industrial Technology and Control Systems
- Interconnection Networks and Systems
- Blockchain Technology Applications and Security
- Ferroelectric and Negative Capacitance Devices
- Cloud Data Security Solutions
- Digital Filter Design and Implementation
- Error Correcting Code Techniques
- Innovative Human-Technology Interaction
- IoT and Edge/Fog Computing
Carnegie Mellon University
2013-2022
Northeastern University
2021
ETH Zurich
2017
University of Michigan–Ann Arbor
2012-2013
University of Hong Kong
2002
Several system-level operations trigger bulk data copy or initialization. Even though these bulk data operations do not require any computation, current systems transfer a large quantity of data back and forth on the memory channel to perform such operations. As a result, bulk data operations consume high latency, bandwidth, and energy, degrading both system performance and energy efficiency.
NAND flash memory is ubiquitous in everyday life today because its capacity has continuously increased and its cost has continuously decreased over decades. This positive growth is a result of two key trends: 1) effective process technology scaling; and 2) multi-level (e.g., MLC, TLC) cell data coding. Unfortunately, the reliability of raw data stored in flash memory has also continued to become more difficult to ensure, because these two trends lead to 1) fewer electrons in the flash memory cell floating gate to represent the data; and 2) larger cell-to-cell interference and disturbance effects. Without...
Retention errors, caused by charge leakage over time, are the dominant source of flash memory errors. Understanding, characterizing, and reducing retention errors can significantly improve NAND flash memory reliability and endurance. In this paper, we first characterize, with real 2y-nm MLC NAND flash chips, how the threshold voltage distribution of flash memory changes with different retention age, i.e., the length of time since a flash cell was programmed. We observe from our characterization results that 1) the optimal read reference voltage of a flash cell, using which the data can be read with the lowest raw...
Although transistor density continues to increase, voltage scaling has stalled and thus power density is increasing each technology generation. Particularly in mobile devices, which have limited cooling options, these trends lead to a utilization wall in which sustained chip performance is limited primarily by power rather than area. However, many mobile applications do not demand sustained performance; rather, they comprise short bursts of computation in response to sporadic user activity. To improve responsiveness for such applications, this paper explores...
NAND flash memory reliability continues to degrade as the memory is scaled down and more bits are programmed per cell. A key contributor to this reduced reliability is read disturb, where a read to one row of cells impacts the threshold voltages of unread flash cells in different rows of the same block. Such disturbances may shift the threshold voltages of these unread cells to different logical states than originally programmed, leading to read errors that hurt endurance. For the first time in open literature, this paper experimentally characterizes read disturb errors on state-of-the-art 2Y-nm (i.e., 20-24 nm) MLC NAND flash memory chips. Our...
Memory devices represent a key component of datacenter total cost of ownership (TCO), and techniques used to reduce errors that occur on these devices increase this cost. Existing approaches to providing reliability for memory devices pessimistically treat all data as equally vulnerable to memory errors. Our key insight is that there exists a diverse spectrum of tolerance to memory errors in new data-intensive applications, and that traditional one-size-fits-all memory reliability techniques are inefficient in terms of cost. For example, we found that while traditional error protection increases memory system cost by 12.5%, some...
NAND flash memory density continues to scale to keep up with the increasing storage demands of data-intensive applications. Unfortunately, as a result of this scaling, the lifetime of NAND flash memory has been decreasing. Each cell in NAND flash memory can endure only a limited number of writes, due to the damage caused by each program and erase operation on the cell. This damage can be partially repaired on its own during the idle time between program or erase operations (known as the dwell time), via a phenomenon known as the self-recovery effect. Prior works study the self-recovery effect for planar (i.e., 2D)...
Increased NAND flash memory density has come at the cost of lifetime reductions. Flash lifetime can be extended by relaxing internal data retention time, the duration for which a flash cell correctly holds data. Such relaxation cannot be exposed externally, to avoid altering the expected data integrity property of a flash device. Reliability mechanisms, most prominently refresh, restore the duration of data integrity, but greatly reduce the lifetime improvements from retention time relaxation by performing a large number of write operations. We find that retention time relaxation can be achieved more efficiently by exploiting...
Modern NAND flash memory chips provide high density by storing two bits of data in each flash cell, called a multi-level cell (MLC). An MLC partitions the threshold voltage range of a flash cell into four voltage states. When a flash cell is programmed, a high voltage is applied to the cell. Due to parasitic capacitance coupling between flash cells that are physically close to each other, flash cell programming can lead to cell-to-cell program interference, which introduces errors into neighboring flash cells. In order to reduce the impact of cell-to-cell interference on the reliability of MLC NAND flash memory, flash manufacturers adopt a two-step...
Compared to planar NAND flash memory, 3D NAND flash memory uses a new flash cell design, and vertically stacks dozens of silicon layers in a single chip. This allows 3D NAND flash memory to increase storage density using a much less aggressive manufacturing process technology than planar NAND flash memory. The circuit-level and structural changes in 3D NAND flash memory significantly alter how different error sources affect the reliability of the memory. Our goal is to (1) identify and understand these new error characteristics of 3D NAND flash memory, and (2) develop new techniques to mitigate prevailing 3D NAND flash errors. In this paper, we...
NAND flash memory is a widely used storage medium that can be treated as a noisy channel. Each flash memory cell stores data as the threshold voltage of a floating gate transistor. The threshold voltage can shift as a result of various types of circuit-level noise, introducing errors when data are read from the channel and ultimately reducing flash lifetime. An accurate model of the threshold voltage distribution across flash cells can enable mechanisms within the flash controller that improve the reliability of the device. Unfortunately, existing threshold voltage distribution models are either not accurate enough or have high computational complexity, which...
Modern solid-state drives (SSDs) use new host-interface protocols, such as NVMe, to provide applications with fast access to storage. These new protocols make use of a concept known as the multi-queue SSD (MQ-SSD), where the SSD has direct access to the application-level I/O request queues. This removes most of the OS software stack that was used in older systems to control how and when I/O requests were dispatched to storage devices. Unfortunately, while the elimination of the OS software stack leads to significant performance improvement, we show in this paper that it introduces a new problem:...
Most applications manipulate persistent data, yet traditional systems decouple data manipulation from persistence in a two-level storage model. Programming languages and system software manipulate data in one set of formats in volatile main memory (DRAM) using a load/store interface, while storage systems maintain persistence in another set of formats in non-volatile memories, such as Flash and hard disk drives in traditional systems, using a file system interface. Unfortunately, such an approach suffers from the system performance and energy overheads of locating data, moving data, and translating data between the different formats of these two levels...
Digital forensic investigators often need to extract data from a seized device that contains NAND flash memory. Many such devices are physically damaged, preventing investigators from using automated techniques to extract the data stored within the device. Instead, investigators turn to chip-off analysis, where they use a thermal-based procedure to physically remove the NAND flash memory chip from the device, and access the chip directly to extract the raw data stored on the chip. We perform an analysis of the errors introduced into multi-level cell (MLC) NAND flash memory chips after the device has been seized. We make two major observations. First, between...
NAND flash memory is ubiquitous in everyday life today because its capacity has continuously increased and its cost has continuously decreased over decades. This positive growth is a result of two key trends: (1) effective process technology scaling; and (2) multi-level (e.g., MLC, TLC) cell data coding. Unfortunately, the reliability of raw data stored in flash memory has also continued to become more difficult to ensure, because these two trends lead to (1) fewer electrons in the flash memory cell floating gate to represent the data; and (2) larger cell-to-cell interference and disturbance effects. Without...
Compared to planar (i.e., two-dimensional) NAND flash memory, 3D NAND flash memory uses a new flash cell design, and vertically stacks dozens of silicon layers in a single chip. This allows 3D NAND flash memory to increase storage density using a much less aggressive manufacturing process technology than planar NAND flash memory. The circuit-level and structural changes in 3D NAND flash memory significantly alter how different error sources affect the reliability of the memory. In this paper, through experimental characterization of real, state-of-the-art 3D NAND flash memory chips, we find that 3D NAND flash memory exhibits three new error sources that were not...
Numerous tools have been proposed to help developers fix software errors and inefficiencies. Widely-used techniques such as memory checking suffer from overheads that limit their use to pre-deployment testing, while more advanced systems have such severe performance impacts that they may require special-purpose hardware. Previous works have described hardware that can accelerate individual analyses, but such specialization stymies adoption; generalized mechanisms are more likely to be added to commercial processors. This paper...
Raw bit errors are common in NAND flash memory and will increase in the future. These errors reduce flash reliability and limit the lifetime of a flash memory device. We aim to improve flash reliability with a multitude of low-cost architectural techniques. We show that NAND flash memory reliability can be improved at low cost and with low performance overhead by deploying various architectural techniques that are aware of higher-level application behavior and underlying flash device characteristics. We analyze flash error characteristics and workload behavior through experimental characterization, and design new flash controller algorithms that use the insights gained...
The tight thermal constraints of mobile devices, which limit sustainable performance, and the bursty nature of interactive mobile applications call for a new design focus: enhancing user responsiveness rather than sustained throughput. To that end, this article explores computational sprinting, wherein a device temporarily exceeds sustainable thermal limits to provide a brief, intense burst of computation in response to user input. By enabling tenfold more computation within the timescale of human patience, sprinting has the potential to fundamentally change...
NAND flash memory is ubiquitous in everyday life today because its capacity has continuously increased and its cost has continuously decreased over decades. This positive growth is a result of two key trends: (1) effective process technology scaling, and (2) multi-level (e.g., MLC, TLC) cell data coding. Unfortunately, the reliability of raw data stored in flash memory has also continued to become more difficult to ensure, because these two trends lead to (1) fewer electrons in the flash memory cell (floating gate) to represent the data and (2) larger cell-to-cell interference and disturbance effects. Without...
Modern DRAM modules are often equipped with hardware error correction capabilities, especially for DRAM deployed in large-scale data centers, as process technology scaling has increased the susceptibility of these devices to errors. To provide fast error detection and correction, error-correcting codes (ECC) are placed on an additional DRAM chip in a DRAM module. This additional chip expands the raw capacity of a DRAM module by 12.5%, but the applications are unable to use any of this extra capacity, as it is used exclusively to provide reliability for all data. In reality, there...
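The 12.5% figure quoted above follows from simple arithmetic if one assumes a common ECC module layout of eight data chips plus one ECC chip. That layout is an assumption for illustration, not something the abstract states, and the function name below is hypothetical:

```python
def ecc_capacity_overhead(data_chips: int, ecc_chips: int) -> float:
    """Extra raw capacity added by ECC chips, as a fraction of data capacity.

    Assumes every chip on the module has the same capacity.
    """
    if data_chips <= 0 or ecc_chips < 0:
        raise ValueError("data_chips must be positive and ecc_chips non-negative")
    return ecc_chips / data_chips


# Assumed layout: 8 data chips + 1 ECC chip per rank.
overhead = ecc_capacity_overhead(data_chips=8, ecc_chips=1)
print(f"{overhead:.1%}")  # 12.5%
```

The same ratio appears at the bus level under that assumption: 8 check bits accompany every 64 data bits.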
In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination...
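The back-to-back activation idea can be illustrated with a toy software model. This is a minimal sketch, not the paper's implementation: the class and function names are hypothetical, and it assumes that activating a second row without an intervening precharge lets the latched row buffer overwrite that row.

```python
class Subarray:
    """Toy model of one DRAM subarray: rows of cells sharing one row buffer."""

    def __init__(self, num_rows: int, row_size: int):
        self.rows = [bytearray(row_size) for _ in range(num_rows)]
        self.row_buffer = None  # None models the precharged (closed) state

    def activate(self, row: int) -> None:
        if self.row_buffer is None:
            # Normal activation: the row buffer latches the row's contents.
            self.row_buffer = bytearray(self.rows[row])
        else:
            # Assumed behavior: with no precharge in between, the latched
            # row buffer drives its contents into the newly opened row.
            self.rows[row][:] = self.row_buffer

    def precharge(self) -> None:
        self.row_buffer = None


def copy_row_in_subarray(sub: Subarray, src: int, dst: int) -> None:
    """Copy src to dst with two back-to-back activations and one precharge."""
    sub.precharge()    # start from a closed subarray
    sub.activate(src)  # latch the source row
    sub.activate(dst)  # overwrite the destination from the row buffer
    sub.precharge()    # close the subarray again


sub = Subarray(num_rows=4, row_size=8)
sub.rows[1][:] = b"ABCDEFGH"
copy_row_in_subarray(sub, src=1, dst=3)
print(bytes(sub.rows[3]))  # b'ABCDEFGH'
```

In this model no data leaves the subarray, which is the property the abstract highlights: the copy never crosses the memory channel.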
This paper summarizes our work on experimentally characterizing, mitigating, and recovering read disturb errors in multi-level cell (MLC) NAND flash memory, which was published in DSN 2015, and examines the work's significance and future potential. NAND flash memory reliability continues to degrade as the memory is scaled down and more bits are programmed per cell. A key contributor to this reduced reliability is read disturb, where a read to one row of cells impacts the threshold voltages of unread flash cells in different rows of the same block. For the first time in open literature,...