- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Algorithms and Data Compression
- Distributed and Parallel Computing Systems
- Advanced Memory and Neural Computing
- Interconnection Networks and Systems
- Computer Graphics and Visualization Techniques
- Numerical Methods and Algorithms
- Advanced Neural Network Applications
- Model Reduction and Neural Networks
- Semiconductor Materials and Devices
- Cloud Computing and Resource Management
- Ferroelectric and Negative Capacitance Devices
- Target Tracking and Data Fusion in Sensor Networks
- Advanced Data Compression Techniques
- Magnetic Properties of Thin Films
- Caching and Content Delivery
- Distributed Systems and Fault Tolerance
- Scientific Computing and Data Management
- Fire Detection and Safety Systems
- Topic Modeling
- Video Surveillance and Tracking Methods
Hunan University
2021-2025
National Supercomputing Center in Wuxi
2023
New Jersey Institute of Technology
2018-2022
Chongqing University
2013-2018
Scientific simulations generate large amounts of floating-point data, which are often not very compressible using traditional reduction schemes such as deduplication or lossless compression. The emergence of lossy compression holds promise to satisfy the data reduction demand from HPC applications; however, it has not been widely adopted in science production. We believe a fundamental reason is that there lacks an understanding of the benefits, pitfalls, and performance of lossy compression on scientific data. In this paper, we conduct...
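As a minimal illustration of that gap (not the paper's benchmark), the sketch below contrasts lossless compression of raw floats with a simple error-bounded quantize-then-compress pipeline; the synthetic smooth field and the error bound `eb` are assumptions for demonstration only.

```python
# Minimal sketch (not the paper's benchmark): contrast lossless compression of
# raw floats with error-bounded quantization, the core idea behind lossy
# scientific compressors. `eb` is an illustrative absolute error bound.
import zlib
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=1_000_000)).astype(np.float32)  # smooth field

raw = x.tobytes()
lossless_ratio = len(raw) / len(zlib.compress(raw, 9))

eb = 1e-2
q = np.round(x / (2 * eb)).astype(np.int64)   # uniform quantization
deltas = np.diff(q, prepend=q[:1])            # decorrelate neighboring values
lossy_ratio = len(raw) / len(zlib.compress(deltas.astype(np.int32).tobytes(), 9))

print(f"max error {np.max(np.abs(q * (2 * eb) - x)):.2e} (bound {eb:.0e}); "
      f"lossless {lossless_ratio:.1f}x vs error-bounded {lossy_ratio:.1f}x")
```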
In the realm of multimodal multi-object tracking (MOT) applications based on point clouds and images, current research predominantly focuses on enhancing accuracy, often neglecting the issue of computational efficiency. Consequently, these models struggle to exhibit optimal capabilities in scenarios demanding high real-time performance. To address these challenges, this paper introduces a fast multimodal fusion model (MF-Net). The model is divided into three primary modules: object detection, fusion, and trajectory matching...
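The excerpt does not spell out MF-Net's matching algorithm; as a generic illustration of a trajectory-matching module, the sketch below assigns detections to existing tracks via Hungarian assignment over center distances, with a made-up association threshold `gate`.

```python
# Illustrative trajectory-matching sketch (not MF-Net's actual algorithm):
# Hungarian assignment over a track-to-detection distance matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(tracks: np.ndarray, dets: np.ndarray, gate: float = 2.0):
    """tracks (N,2) and dets (M,2) are x,y centers; returns matched index pairs."""
    cost = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]

tracks = np.array([[0.0, 0.0], [5.0, 5.0]])
dets = np.array([[0.3, -0.2], [9.0, 9.0], [5.1, 4.8]])
print(match(tracks, dets))  # [(0, 0), (1, 2)]; detection 1 starts a new track
```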
Spin-transfer torque random access memory (STT-RAM) is considered a promising candidate to replace SRAM in next-generation caches, since it has better scalability and lower leakage power. Recently, 2-bit multi-level cell (MLC) STT-RAM has been proposed to further increase data density. However, a key drawback of MLC STT-RAM is that the magnetization directions of its hard and soft domains cannot be flipped in two opposite directions simultaneously, which leads to a two-step problem in state transitions. Two-step transitions would...
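A toy model of that constraint, under the simplifying assumption that a large-current pulse writing the hard domain drags the soft domain to the same value:

```python
# Simplified model of the two-step write problem in 2-bit MLC STT-RAM: a
# hard-domain write also overwrites the soft domain, so a corrective
# soft-domain pulse may be needed afterwards.
def write_steps(old: int, new: int) -> int:
    """old/new are 2-bit states: bit 1 = hard domain, bit 0 = soft domain."""
    if old == new:
        return 0
    new_hard, new_soft = (new >> 1) & 1, new & 1
    if ((old >> 1) & 1) != new_hard:
        # step 1 writes the hard domain (soft follows); step 2 fixes the soft
        return 1 + (1 if new_soft != new_hard else 0)
    return 1  # a small-current pulse updates the soft domain alone

transitions = [(o, n) for o in range(4) for n in range(4) if o != n]
two_step = sum(write_steps(o, n) == 2 for o, n in transitions)
print(f"{two_step}/{len(transitions)} nontrivial transitions need two steps")
```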
With the high volume and velocity of scientific data produced on high-performance computing systems, it has become increasingly critical to improve compression performance. Leveraging the general tolerance of reduced accuracy in applications, lossy compressors can achieve much higher compression ratios with a user-prescribed error bound. However, they are still far from satisfying the reduction requirements of scientific applications. In this paper, we propose and evaluate the idea that data need to be preconditioned prior to compression, such...
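The excerpt does not name the paper's preconditioners; one generic example of the idea is a logarithmic transform applied before error-bounded quantization, which turns a pointwise relative error bound on the raw data into an absolute bound on the transformed data.

```python
# Generic preconditioning sketch (not necessarily the paper's method): a log
# transform lets an absolute-error-bounded quantizer honor a relative bound
# on data spanning many orders of magnitude.
import numpy as np

rng = np.random.default_rng(1)
x = np.exp(rng.uniform(0, 20, size=100_000))   # values across ~9 decades

rel_eb = 1e-3
t = np.log(x)                                   # precondition
abs_eb = np.log1p(rel_eb)                       # |t_rec - t| <= abs_eb
q = np.round(t / (2 * abs_eb))
x_rec = np.exp(q * (2 * abs_eb))                # invert the preconditioner

print(f"max relative error: {np.max(np.abs(x_rec - x) / x):.2e} "
      f"(bound {rel_eb:.0e})")
```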
Scientific simulations on high-performance computing (HPC) systems generate vast amounts of floating-point data that need to be reduced in order to lower the storage and I/O cost. Lossy compressors that trade accuracy for reduction performance have been demonstrated to be effective in reducing data volume. However, a key hurdle to the wide adoption of lossy compressors is that the trade-off between accuracy and compression performance, particularly the compression ratio, is not well understood. Consequently, domain scientists often exhaust many possible error bounds before...
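The exhaustive search this motivates looks roughly like the sketch below, with a quantize-plus-zlib stand-in substituted for a real compressor such as SZ or ZFP:

```python
# Sketch of the error-bound sweep the paper argues against doing blindly:
# record the compression ratio at each bound to expose the trade-off curve.
import zlib
import numpy as np

def ratio(x: np.ndarray, eb: float) -> float:
    q = np.round(x / (2 * eb)).astype(np.int32)  # stand-in for SZ/ZFP
    return x.nbytes / len(zlib.compress(q.tobytes(), 6))

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=500_000)).astype(np.float32)
for eb in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"error bound {eb:.0e} -> ratio {ratio(x, eb):.1f}x")
```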
Scientific simulations on high-performance computing systems produce vast amounts of data that need to be stored and analyzed efficiently. Lossy compression significantly reduces the data volume by trading accuracy for performance. Despite the recent success of lossy compressors, such as ZFP and SZ, their performance is still far from being able to keep up with the exponential growth of data. This paper aims to further take advantage of application characteristics, an area often under-explored, to improve compression ratios for adaptive mesh...
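One way to picture exploiting such characteristics (a generic sketch, not the paper's pipeline) is to compress each AMR refinement level separately, with its own error bound, rather than flattening everything to the finest resolution:

```python
# Level-aware compression sketch for AMR data: each refinement level gets its
# own error bound. The two-level hierarchy and bounds here are hypothetical.
import zlib
import numpy as np

def compress_level(data: np.ndarray, eb: float) -> bytes:
    q = np.round(data / (2 * eb)).astype(np.int32)
    return zlib.compress(q.tobytes(), 6)

rng = np.random.default_rng(3)
levels = {
    0: rng.normal(size=(64, 64)),   # coarse level covering the whole domain
    1: rng.normal(size=(32, 32)),   # fine patch where features live
}
eb_per_level = {0: 1e-2, 1: 1e-3}   # tighter bound on the refined region
blobs = {lv: compress_level(d, eb_per_level[lv]) for lv, d in levels.items()}
print({lv: len(b) for lv, b in blobs.items()})  # compressed bytes per level
```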
Non-volatile memories (NVMs), such as phase change memory (PCM) and resistive random access memory (ReRAM), have emerged as promising technologies for the replacement of DRAM due to their advantages: better scalability, zero cell leakage, and DRAM-comparable read latency. Furthermore, multiple-level cell (MLC) NVMs offer higher data density and capacity over single-level cell (SLC) NVMs. However, the adoption of MLC is limited by its programming energy and latency as well as its low endurance. In this paper, we propose an enhanced (2³)²/4 WOM code...
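For background on the code family involved, the sketch below implements the classic Rivest-Shamir ⟨2²⟩²/3 WOM code, which writes 2 bits twice into 3 write-once cells; the paper's enhanced (2³)²/4 code extends this style of construction.

```python
# Classic Rivest-Shamir <2^2>^2/3 WOM code: two generations of 2-bit values in
# 3 cells whose bits may only flip 0 -> 1 between erases. Second-generation
# codewords are the bitwise complements of first-generation ones.
FIRST  = {0b00: (0, 0, 0), 0b01: (1, 0, 0), 0b10: (0, 1, 0), 0b11: (0, 0, 1)}
SECOND = {0b00: (1, 1, 1), 0b01: (0, 1, 1), 0b10: (1, 0, 1), 0b11: (1, 1, 0)}

def decode(cells):
    table = FIRST if sum(cells) <= 1 else SECOND
    return next(v for v, cw in table.items() if cw == cells)

def write(cells, value):
    if decode(cells) == value:
        return cells                       # already stores the value
    table = FIRST if cells == (0, 0, 0) else SECOND
    cw = table[value]
    assert all(c <= n for c, n in zip(cells, cw)), "would need a 1 -> 0 reset"
    return cw

cells = write((0, 0, 0), 0b10)   # first write
cells = write(cells, 0b01)       # second write, still only sets bits 0 -> 1
print(cells, bin(decode(cells)))  # (0, 1, 1) 0b1
```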
The emerging Phase Change Memory (PCM) is considered a promising candidate to replace DRAM as the next-generation main memory, since it has better scalability and lower leakage power. However, high write power consumption has become a challenge in adopting PCM as main memory. In addition to the fact that writing cells requires high current and voltage, the loss on charge pumps (CPs) also contributes a large percentage of the power consumption. The pumping efficiency of a chip is a concave function of the write current. Based on the characteristics of this concave function, the overall power consumption can...
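A toy illustration of that concavity argument, with a made-up efficiency curve peaking at an assumed optimal current: scheduling that keeps the instantaneous current near the peak wastes less charge-pump energy than bursty writes.

```python
# Toy numbers only: a concave pumping-efficiency curve peaks at some optimal
# current, so spreading the same total write current evenly costs less input
# energy than issuing it in bursts far from the peak.
def eta(i):
    return max(0.1, 0.8 - 0.02 * (i - 5.0) ** 2)  # peak efficiency at i = 5

def pump_input_energy(currents, v_out=3.0):
    # input energy per slot = delivered power / pump efficiency at that current
    return sum(i * v_out / eta(i) for i in currents if i > 0)

bursty   = [8.0, 0.0, 8.0, 0.0]   # same total current, issued in bursts
balanced = [4.0, 4.0, 4.0, 4.0]   # spread evenly across write slots
print(f"bursty {pump_input_energy(bursty):.1f} vs "
      f"balanced {pump_input_energy(balanced):.1f} (arbitrary units)")
```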
Today's scientific simulations are confronting seriously limited I/O bandwidth, network bandwidth, and storage capacity because of the immense volumes of data generated on high-performance computing systems. Data compression has emerged as one of the most effective approaches to resolve the issue of the exponential increase of data. However, existing state-of-the-art compressors also suffer from low throughput, especially under the trend of growing disparities between compute and I/O rates. Among them, embedded coding is widely applied, which...
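A minimal sketch of embedded (bitplane) coding: significance bits are emitted from the most significant plane down, so the stream can be truncated at any plane for a coarser but valid reconstruction. Real embedded coders add entropy coding on top, which is where much of the throughput pressure comes from.

```python
# Bitplane (embedded) coding sketch: encode quantized coefficients plane by
# plane; decoding a prefix of the planes yields a bounded-error result.
import numpy as np

def encode_planes(q: np.ndarray, nplanes: int):
    return [((np.abs(q) >> p) & 1).astype(np.uint8)
            for p in range(nplanes - 1, -1, -1)]

def decode_planes(planes, signs, nplanes: int):
    mag = np.zeros_like(signs, dtype=np.int64)
    for k, plane in enumerate(planes):          # may be a truncated prefix
        mag |= plane.astype(np.int64) << (nplanes - 1 - k)
    return signs * mag

rng = np.random.default_rng(4)
q = rng.integers(-1000, 1000, size=8)
planes, signs = encode_planes(q, 10), np.sign(q)
for keep in (4, 7, 10):                         # progressively more planes
    err = np.max(np.abs(decode_planes(planes[:keep], signs, 10) - q))
    print(f"{keep} planes kept -> max error {err}")
```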
The emerging Phase Change Memory (PCM) is considered one of the most promising candidates to replace DRAM as main memory due to its better scalability and nonvolatility. With multi-bit storage capability, Multiple-Level-Cell (MLC) PCM outperforms Single-Level-Cell (SLC) PCM in density. However, high write latency is a performance bottleneck for MLC PCM for two reasons. First, MLC PCM has a much longer programming time. Second, the latencies of different transitions between cell states range widely. When cells are concurrently written...
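A toy latency model of that effect (the per-transition latencies below are illustrative, not measured): a concurrent word write completes only when its slowest cell-state transition does.

```python
# Toy model: an MLC PCM word write is bounded by its slowest cell transition,
# so a few slow transitions stall the whole concurrent write.
LAT = {(o, n): 0 if o == n else 50 + 60 * abs(o - n)   # ns, made-up numbers
       for o in range(4) for n in range(4)}

def word_write_latency(old, new):
    return max(LAT[(o, n)] for o, n in zip(old, new))

old = [0, 1, 2, 3, 0, 1, 2, 3]
new = [3, 1, 2, 0, 1, 1, 2, 3]   # only two cells need the slow 0 <-> 3 flips
print(f"word write latency: {word_write_latency(old, new)} ns")
```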
High-performance computing (HPC) applications generate large amounts of floating-point data that need to be stored and analyzed efficiently to extract the insights that advance knowledge discovery. With the growing disparity between compute and I/O, optimizing the storage stack alone may not suffice to cure the I/O problem. There has been a strong push in HPC communities to perform data reduction before data is transmitted in order to lower the cost. However, as of now, neither lossless nor lossy compressors can achieve an adequate ratio...
Scientific simulations on high-performance computing (HPC) platforms generate large quantities of data. To bridge the widening gap between compute and I/O, and to enable data to be more efficiently stored and analyzed, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. However, a systematic solution to support these steps has been lacking in the current HPC software ecosystem. To that end, this paper develops SIRIUS, a progressive, JPEG-like data management scheme for storing and analyzing big...
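In the spirit of that progressive scheme (a simplified sketch; the actual SIRIUS design is richer), the example below refactors a field into a small coarse base plus residual levels, so the base can sit on a fast tier and reads can stop early at reduced fidelity:

```python
# Progressive refactoring sketch: coarse base + residual levels. Reading the
# base alone gives a low-fidelity view; each residual restores detail.
import numpy as np

def refactor(x: np.ndarray, levels: int):
    frags = []
    for _ in range(levels):
        base = x[::2]                    # coarse subsample
        up = np.repeat(base, 2)[: x.size]
        frags.append(x - up)             # residual needed for full fidelity
        x = base
    frags.append(x)                      # smallest base -> fastest tier
    return frags[::-1]                   # [base, residual_k, ..., residual_1]

def reconstruct(frags):
    x = frags[0]
    for r in frags[1:]:
        x = np.repeat(x, 2)[: r.size] + r
    return x

data = np.sin(np.linspace(0, 10, 1024))
frags = refactor(data, levels=3)
assert np.allclose(reconstruct(frags), data)
print([f.size for f in frags])           # one 128-sample base + 3 residuals
```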
Non-volatile memories (NVMs), such as phase change memory (PCM) and resistive random access memory (ReRAM), have emerged as promising technologies for the replacement of DRAM due to their advantages: better scalability, zero cell leakage, and DRAM-comparable read latency. Furthermore, multiple-level cell (MLC) NVMs offer higher data density and capacity over single-level cell (SLC) NVMs. However, the adoption of MLC is limited by its programming energy and latency as well as its low endurance. In this paper, we propose an enhanced (2³)²/4 WOM code...
The emerging Phase Change Memory (PCM) is considered to be a promising candidate to replace DRAM as the next-generation main memory due to its higher scalability and lower leakage power. However, high write power consumption has become a major challenge in adopting PCM as main memory. In addition to the fact that writing cells requires high current and voltage, the loss on charge pumps also contributes a large percentage of the power consumption. The pumping efficiency of a chip is a concave function of the write current. Leveraging the characteristics of this concave function, the overall...
Data compression can efficiently reduce memory and persistent storage costs, which is highly desirable in modern computing systems, such as enterprise, cloud, and High-Performance Computing (HPC) environments. However, the main challenges of existing data compressors are insufficient compression ratios and low throughput. This paper focuses on improving state-of-the-art lossy compression algorithms from the view of applications. Besides, we also use the characteristics of applications to reduce runtime overhead. To this end, we explore the idea with...
Sparse tensors are prevalent in real-world applications, often characterized by their large-scale, high-order, and high-dimensional nature. Directly handling raw sparse tensors is impractical due to the significant memory and computational overhead involved. The current mainstream approach involves compressing or decomposing the original tensor. One popular tensor decomposition algorithm is Tucker decomposition. However, existing state-of-the-art algorithms for large-scale Tucker decomposition typically relax the optimization problem into...
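For intuition on Tucker decomposition, a small dense HOSVD sketch is shown below; large sparse tensors require the specialized algorithms the abstract discusses rather than this direct approach.

```python
# Dense Tucker via HOSVD on a toy tensor: factor matrices come from the SVD
# of each mode unfolding, and the core is the tensor contracted with U^T
# along every mode. Ranks (5, 6, 7) are arbitrary illustration values.
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t, ranks):
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])               # leading left singular vectors
    core = t
    for mode, u in enumerate(factors):         # core = t x_0 U0^T x_1 U1^T ...
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

rng = np.random.default_rng(5)
t = rng.normal(size=(10, 12, 14))
core, factors = hosvd(t, ranks=(5, 6, 7))
approx = core
for mode, u in enumerate(factors):             # expand core back to full size
    approx = np.moveaxis(np.tensordot(u, approx, axes=(1, mode)), 0, mode)
print(f"relative error: {np.linalg.norm(approx - t) / np.linalg.norm(t):.3f}")
```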
Scientific simulations on high-performance computing systems produce vast amounts of data that need to be stored and analyzed efficiently. Lossy compression significantly reduces the data volume by trading accuracy for performance. Despite the recent success of lossy compressors, such as ZFP and SZ, their performance is still far from being able to keep up with the exponential growth of data. This article aims to further take advantage of application characteristics, an area often under-explored, to improve compression ratios for adaptive mesh...