- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Algorithms and Data Compression
- Distributed and Parallel Computing Systems
- Advanced Memory and Neural Computing
- Interconnection Networks and Systems
- Computer Graphics and Visualization Techniques
- Numerical Methods and Algorithms
- Advanced Neural Network Applications
- Model Reduction and Neural Networks
- Semiconductor Materials and Devices
- Cloud Computing and Resource Management
- Ferroelectric and Negative Capacitance Devices
- Target Tracking and Data Fusion in Sensor Networks
- Advanced Data Compression Techniques
- Magnetic Properties of Thin Films
- Caching and Content Delivery
- Distributed Systems and Fault Tolerance
- Scientific Computing and Data Management
- Fire Detection and Safety Systems
- Topic Modeling
- Video Surveillance and Tracking Methods
Hunan University
2021-2025
National Supercomputing Center in Wuxi
2023
New Jersey Institute of Technology
2018-2022
Chongqing University
2013-2018
Scientific simulations generate large amounts of floating-point data, which are often not very compressible using traditional reduction schemes such as deduplication or lossless compression. The emergence of lossy compression holds promise to satisfy the data reduction demand from HPC applications; however, it has not been widely adopted in science production. We believe a fundamental reason is that there lacks an understanding of the benefits, pitfalls, and performance of lossy compression on scientific data. In this paper, we conduct...
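As a minimal illustration of that gap (not the paper's benchmark), the sketch below contrasts lossless compression of raw floats with a simple error-bounded quantize-then-compress pipeline; the synthetic smooth field and the error bound `eb` are assumptions for demonstration only.

```python
# Minimal sketch (not the paper's benchmark): contrast lossless compression of
# raw floats with error-bounded quantization, the core idea behind lossy
# scientific compressors. `eb` is an illustrative absolute error bound.
import zlib
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=1_000_000)).astype(np.float32)  # smooth field

raw = x.tobytes()
lossless_ratio = len(raw) / len(zlib.compress(raw, 9))

eb = 1e-2
q = np.round(x / (2 * eb)).astype(np.int64)   # uniform quantization
deltas = np.diff(q, prepend=q[:1])            # decorrelate neighboring values
lossy_ratio = len(raw) / len(zlib.compress(deltas.astype(np.int32).tobytes(), 9))

print(f"max error {np.max(np.abs(q * (2 * eb) - x)):.2e} (bound {eb:.0e}); "
      f"lossless {lossless_ratio:.1f}x vs error-bounded {lossy_ratio:.1f}x")
```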
In the realm of multimodal multi-object tracking (MOT) applications based on point clouds and images, current research predominantly focuses on enhancing accuracy, often neglecting the issue of computational efficiency. Consequently, these models struggle to exhibit optimal capabilities in scenarios demanding high real-time performance. To address these challenges, this paper introduces a fast multimodal fusion model (MF-Net). The model is divided into three primary modules: object detection, fusion, and trajectory matching...
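The excerpt does not spell out MF-Net's matching algorithm; as a generic illustration of a trajectory-matching module, the sketch below assigns detections to existing tracks via Hungarian assignment over center distances, with a made-up association threshold `gate`.

```python
# Illustrative trajectory-matching sketch (not MF-Net's actual algorithm):
# Hungarian assignment over a track-to-detection distance matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(tracks: np.ndarray, dets: np.ndarray, gate: float = 2.0):
    """tracks (N,2) and dets (M,2) are x,y centers; returns matched index pairs."""
    cost = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]

tracks = np.array([[0.0, 0.0], [5.0, 5.0]])
dets = np.array([[0.3, -0.2], [9.0, 9.0], [5.1, 4.8]])
print(match(tracks, dets))  # [(0, 0), (1, 2)]; detection 1 starts a new track
```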
Spin-transfer torque random access memory (STT-RAM) is considered a promising candidate to replace SRAM in next-generation caches, since it has better scalability and lower leakage power. Recently, 2-bit multi-level cell (MLC) STT-RAM has been proposed to further increase data density. However, a key drawback of MLC STT-RAM is that the magnetization directions of its hard and soft domains cannot be flipped in two opposite directions simultaneously, which leads to a two-step problem in state transitions. Two-step transitions would...
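A toy model of that constraint, under the simplifying assumption that a large-current pulse writing the hard domain drags the soft domain to the same value:

```python
# Simplified model of the two-step write problem in 2-bit MLC STT-RAM: a
# hard-domain write also overwrites the soft domain, so a corrective
# soft-domain pulse may be needed afterwards.
def write_steps(old: int, new: int) -> int:
    """old/new are 2-bit states: bit 1 = hard domain, bit 0 = soft domain."""
    if old == new:
        return 0
    new_hard, new_soft = (new >> 1) & 1, new & 1
    if ((old >> 1) & 1) != new_hard:
        # step 1 writes the hard domain (soft follows); step 2 fixes the soft
        return 1 + (1 if new_soft != new_hard else 0)
    return 1  # a small-current pulse updates the soft domain alone

transitions = [(o, n) for o in range(4) for n in range(4) if o != n]
two_step = sum(write_steps(o, n) == 2 for o, n in transitions)
print(f"{two_step}/{len(transitions)} nontrivial transitions need two steps")
```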
With the high volume and velocity of scientific data produced on high-performance computing systems, it has become increasingly critical to improve compression performance. Leveraging the general tolerance of reduced accuracy in applications, lossy compressors can achieve much higher compression ratios with a user-prescribed error bound. However, they are still far from satisfying the reduction requirements of scientific applications. In this paper, we propose and evaluate the idea that data need to be preconditioned prior to compression, such...
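The excerpt does not name the paper's preconditioners; one generic example of the idea is a logarithmic transform applied before error-bounded quantization, which turns a pointwise relative error bound on the raw data into an absolute bound on the transformed data.

```python
# Generic preconditioning sketch (not necessarily the paper's method): a log
# transform lets an absolute-error-bounded quantizer honor a relative bound
# on data spanning many orders of magnitude.
import numpy as np

rng = np.random.default_rng(1)
x = np.exp(rng.uniform(0, 20, size=100_000))   # values across ~9 decades

rel_eb = 1e-3
t = np.log(x)                                   # precondition
abs_eb = np.log1p(rel_eb)                       # |t_rec - t| <= abs_eb
q = np.round(t / (2 * abs_eb))
x_rec = np.exp(q * (2 * abs_eb))                # invert the preconditioner

print(f"max relative error: {np.max(np.abs(x_rec - x) / x):.2e} "
      f"(bound {rel_eb:.0e})")
```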
Scientific simulations on high-performance computing (HPC) systems generate vast amounts of floating-point data that need to be reduced in order to lower the storage and I/O cost. Lossy compressors that trade accuracy for reduction performance have been demonstrated to be effective in reducing data volume. However, a key hurdle to the wide adoption of lossy compressors is that the trade-off between accuracy and compression performance, particularly the compression ratio, is not well understood. Consequently, domain scientists often exhaust many possible error bounds before...
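The exhaustive search this motivates looks roughly like the sketch below, with a quantize-plus-zlib stand-in substituted for a real compressor such as SZ or ZFP:

```python
# Sketch of the error-bound sweep the paper argues against doing blindly:
# record the compression ratio at each bound to expose the trade-off curve.
import zlib
import numpy as np

def ratio(x: np.ndarray, eb: float) -> float:
    q = np.round(x / (2 * eb)).astype(np.int32)  # stand-in for SZ/ZFP
    return x.nbytes / len(zlib.compress(q.tobytes(), 6))

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=500_000)).astype(np.float32)
for eb in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"error bound {eb:.0e} -> ratio {ratio(x, eb):.1f}x")
```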
Scientific simulations on high-performance computing systems produce vast amounts of data that need to be stored and analyzed efficiently. Lossy compression significantly reduces the data volume by trading accuracy for performance. Despite the recent success of lossy compressors, such as ZFP and SZ, their performance is still far from being able to keep up with the exponential growth of data. This paper aims to further take advantage of application characteristics, an area often under-explored, to improve compression ratios for adaptive mesh...
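One way to picture exploiting such characteristics (a generic sketch, not the paper's pipeline) is to compress each AMR refinement level separately, with its own error bound, rather than flattening everything to the finest resolution:

```python
# Level-aware compression sketch for AMR data: each refinement level gets its
# own error bound. The two-level hierarchy and bounds here are hypothetical.
import zlib
import numpy as np

def compress_level(data: np.ndarray, eb: float) -> bytes:
    q = np.round(data / (2 * eb)).astype(np.int32)
    return zlib.compress(q.tobytes(), 6)

rng = np.random.default_rng(3)
levels = {
    0: rng.normal(size=(64, 64)),   # coarse level covering the whole domain
    1: rng.normal(size=(32, 32)),   # fine patch where features live
}
eb_per_level = {0: 1e-2, 1: 1e-3}   # tighter bound on the refined region
blobs = {lv: compress_level(d, eb_per_level[lv]) for lv, d in levels.items()}
print({lv: len(b) for lv, b in blobs.items()})  # compressed bytes per level
```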
Non-volatile memories (NVMs), such as phase change memory (PCM) and resistive random access memory (ReRAM), have emerged as promising technologies for the replacement of DRAM due to their advantages: better scalability, zero cell leakage, and DRAM-comparable read latency. Furthermore, multiple-level cell (MLC) NVMs offer higher data density and capacity over single-level cell (SLC) NVMs. However, the adoption of MLC is limited by its programming energy and latency as well as its low endurance. In this paper, we propose an enhanced (2³)²/4 WOM code...
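For background on the code family involved, the sketch below implements the classic Rivest-Shamir ⟨2²⟩²/3 WOM code, which writes 2 bits twice into 3 write-once cells; the paper's enhanced (2³)²/4 code extends this style of construction.

```python
# Classic Rivest-Shamir <2^2>^2/3 WOM code: two generations of 2-bit values in
# 3 cells whose bits may only flip 0 -> 1 between erases. Second-generation
# codewords are the bitwise complements of first-generation ones.
FIRST  = {0b00: (0, 0, 0), 0b01: (1, 0, 0), 0b10: (0, 1, 0), 0b11: (0, 0, 1)}
SECOND = {0b00: (1, 1, 1), 0b01: (0, 1, 1), 0b10: (1, 0, 1), 0b11: (1, 1, 0)}

def decode(cells):
    table = FIRST if sum(cells) <= 1 else SECOND
    return next(v for v, cw in table.items() if cw == cells)

def write(cells, value):
    if decode(cells) == value:
        return cells                       # already stores the value
    table = FIRST if cells == (0, 0, 0) else SECOND
    cw = table[value]
    assert all(c <= n for c, n in zip(cells, cw)), "would need a 1 -> 0 reset"
    return cw

cells = write((0, 0, 0), 0b10)   # first write
cells = write(cells, 0b01)       # second write, still only sets bits 0 -> 1
print(cells, bin(decode(cells)))  # (0, 1, 1) 0b1
```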
The emerging Phase Change Memory (PCM) is considered a promising candidate to replace DRAM as the next-generation main memory, since it has better scalability and lower leakage power. However, high write power consumption has become a challenge in adopting PCM as main memory. In addition to the fact that writing cells requires high current and voltage, the loss on charge pumps (CPs) also contributes a large percentage of the power consumption. The pumping efficiency of a chip is a concave function of the write current. Based on the characteristics of this concave function, the overall power consumption can...
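A toy illustration of that concavity argument, with a made-up efficiency curve peaking at an assumed optimal current: scheduling that keeps the instantaneous current near the peak wastes less charge-pump energy than bursty writes.

```python
# Toy numbers only: a concave pumping-efficiency curve peaks at some optimal
# current, so spreading the same total write current evenly costs less input
# energy than issuing it in bursts far from the peak.
def eta(i):
    return max(0.1, 0.8 - 0.02 * (i - 5.0) ** 2)  # peak efficiency at i = 5

def pump_input_energy(currents, v_out=3.0):
    # input energy per slot = delivered power / pump efficiency at that current
    return sum(i * v_out / eta(i) for i in currents if i > 0)

bursty   = [8.0, 0.0, 8.0, 0.0]   # same total current, issued in bursts
balanced = [4.0, 4.0, 4.0, 4.0]   # spread evenly across write slots
print(f"bursty {pump_input_energy(bursty):.1f} vs "
      f"balanced {pump_input_energy(balanced):.1f} (arbitrary units)")
```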
Today's scientific simulations are confronting seriously limited I/O bandwidth, network bandwidth, and storage capacity because of the immense volumes of data generated on high-performance computing systems. Data compression has emerged as one of the most effective approaches to resolve the issue of the exponential increase of data. However, existing state-of-the-art compressors also suffer from low throughput, especially under the trend of growing disparities between compute and I/O rates. Among them, embedded coding is widely applied, which...
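A minimal sketch of embedded (bitplane) coding: significance bits are emitted from the most significant plane down, so the stream can be truncated at any plane for a coarser but valid reconstruction. Real embedded coders add entropy coding on top, which is where much of the throughput pressure comes from.

```python
# Bitplane (embedded) coding sketch: encode quantized coefficients plane by
# plane; decoding a prefix of the planes yields a bounded-error result.
import numpy as np

def encode_planes(q: np.ndarray, nplanes: int):
    return [((np.abs(q) >> p) & 1).astype(np.uint8)
            for p in range(nplanes - 1, -1, -1)]

def decode_planes(planes, signs, nplanes: int):
    mag = np.zeros_like(signs, dtype=np.int64)
    for k, plane in enumerate(planes):          # may be a truncated prefix
        mag |= plane.astype(np.int64) << (nplanes - 1 - k)
    return signs * mag

rng = np.random.default_rng(4)
q = rng.integers(-1000, 1000, size=8)
planes, signs = encode_planes(q, 10), np.sign(q)
for keep in (4, 7, 10):                         # progressively more planes
    err = np.max(np.abs(decode_planes(planes[:keep], signs, 10) - q))
    print(f"{keep} planes kept -> max error {err}")
```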
The emerging Phase Change Memory (PCM) is considered one of the most promising candidates to replace DRAM as main memory due to its better scalability and nonvolatility. With multi-bit storage capability, Multiple-Level-Cell (MLC) PCM outperforms Single-Level-Cell (SLC) PCM in density. However, high write latency is a performance bottleneck for MLC PCM for two reasons. First, MLC PCM has a much longer programming time. Second, the latencies of different transitions between cell states range widely. When cells are concurrently written...
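A toy latency model of that effect (the per-transition latencies below are illustrative, not measured): a concurrent word write completes only when its slowest cell-state transition does.

```python
# Toy model: an MLC PCM word write is bounded by its slowest cell transition,
# so a few slow transitions stall the whole concurrent write.
LAT = {(o, n): 0 if o == n else 50 + 60 * abs(o - n)   # ns, made-up numbers
       for o in range(4) for n in range(4)}

def word_write_latency(old, new):
    return max(LAT[(o, n)] for o, n in zip(old, new))

old = [0, 1, 2, 3, 0, 1, 2, 3]
new = [3, 1, 2, 0, 1, 1, 2, 3]   # only two cells need the slow 0 <-> 3 flips
print(f"word write latency: {word_write_latency(old, new)} ns")
```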
High-performance computing (HPC) applications generate large amounts of floating-point data that need to be stored and analyzed efficiently to extract the insights that advance knowledge discovery. With the growing disparity between compute and I/O, optimizing the storage stack alone may not suffice to cure the I/O problem. There has been a strong push in HPC communities to perform data reduction before data is transmitted in order to lower the cost. However, as of now, neither lossless nor lossy compressors can achieve an adequate ratio...
Scientific simulations on high-performance computing (HPC) platforms generate large quantities of data. To bridge the widening gap between compute and I/O, and to enable data to be more efficiently stored and analyzed, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. However, a systematic solution to support these steps has been lacking in the current HPC software ecosystem. To that end, this paper develops SIRIUS, a progressive, JPEG-like data management scheme for storing and analyzing big...
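In the spirit of that progressive scheme (a simplified sketch; the actual SIRIUS design is richer), the example below refactors a field into a small coarse base plus residual levels, so the base can sit on a fast tier and reads can stop early at reduced fidelity:

```python
# Progressive refactoring sketch: coarse base + residual levels. Reading the
# base alone gives a low-fidelity view; each residual restores detail.
import numpy as np

def refactor(x: np.ndarray, levels: int):
    frags = []
    for _ in range(levels):
        base = x[::2]                    # coarse subsample
        up = np.repeat(base, 2)[: x.size]
        frags.append(x - up)             # residual needed for full fidelity
        x = base
    frags.append(x)                      # smallest base -> fastest tier
    return frags[::-1]                   # [base, residual_k, ..., residual_1]

def reconstruct(frags):
    x = frags[0]
    for r in frags[1:]:
        x = np.repeat(x, 2)[: r.size] + r
    return x

data = np.sin(np.linspace(0, 10, 1024))
frags = refactor(data, levels=3)
assert np.allclose(reconstruct(frags), data)
print([f.size for f in frags])           # one 128-sample base + 3 residuals
```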
Non-volatile memories (NVMs), such as phase change memory (PCM) and resistive random access memory (ReRAM), have emerged as promising technologies for the replacement of DRAM due to their advantages: better scalability, zero cell leakage, and DRAM-comparable read latency. Furthermore, multiple-level cell (MLC) NVMs offer higher data density and capacity over single-level cell (SLC) NVMs. However, the adoption of MLC is limited by its programming energy and latency as well as its low endurance. In this paper, we propose an enhanced (2³)²/4 WOM code...
The emerging Phase Change Memory (PCM) is considered to be a promising candidate to replace DRAM as the next-generation main memory due to its higher scalability and lower leakage power. However, high write power consumption has become a major challenge in adopting PCM as main memory. In addition to the fact that writing cells requires high current and voltage, the loss on charge pumps also contributes a large percentage of the power consumption. The pumping efficiency of a chip is a concave function of the write current. Leveraging the characteristics of this concave function, the overall...
Data compression can efficiently reduce memory and persistent storage costs, which is highly desirable in modern computing systems, such as enterprise, cloud, and High-Performance Computing (HPC) environments. However, the main challenges of existing data compressors are insufficient compression ratios and low throughput. This paper focuses on improving state-of-the-art lossy compression algorithms from the view of applications. Besides, we also use the characteristics of applications to reduce runtime overhead. To this end, we explore the idea with...
Sparse tensors are prevalent in real-world applications, often characterized by their large-scale, high-order, and high-dimensional nature. Directly handling raw sparse tensors is impractical due to the significant memory and computational overhead involved. The current mainstream approach involves compressing or decomposing the original tensor. One popular tensor decomposition algorithm is Tucker decomposition. However, existing state-of-the-art algorithms for large-scale Tucker decomposition typically relax the optimization problem into...
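For intuition on Tucker decomposition, a small dense HOSVD sketch is shown below; large sparse tensors require the specialized algorithms the abstract discusses rather than this direct approach.

```python
# Dense Tucker via HOSVD on a toy tensor: factor matrices come from the SVD
# of each mode unfolding, and the core is the tensor contracted with U^T
# along every mode. Ranks (5, 6, 7) are arbitrary illustration values.
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t, ranks):
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])               # leading left singular vectors
    core = t
    for mode, u in enumerate(factors):         # core = t x_0 U0^T x_1 U1^T ...
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

rng = np.random.default_rng(5)
t = rng.normal(size=(10, 12, 14))
core, factors = hosvd(t, ranks=(5, 6, 7))
approx = core
for mode, u in enumerate(factors):             # expand core back to full size
    approx = np.moveaxis(np.tensordot(u, approx, axes=(1, mode)), 0, mode)
print(f"relative error: {np.linalg.norm(approx - t) / np.linalg.norm(t):.3f}")
```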
Scientific simulations on high-performance computing systems produce vast amounts of data that need to be stored and analyzed efficiently. Lossy compression significantly reduces the data volume by trading accuracy for performance. Despite the recent success of lossy compressors, such as ZFP and SZ, their performance is still far from being able to keep up with the exponential growth of data. This article aims to further take advantage of application characteristics, an area often under-explored, to improve compression ratios for adaptive mesh...