- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed Systems and Fault Tolerance
- Distributed and Parallel Computing Systems
- Radiation Effects in Electronics
- Algorithms and Data Compression
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Advanced Data Compression Techniques
- Supercapacitor Materials and Fabrication
- Rough Sets and Fuzzy Logic
- Data Mining Algorithms and Applications
- Advancements in Battery Materials
- Machine Learning and Data Classification
- Imbalanced Data Classification Techniques
- Face and Expression Recognition
- Data Management and Algorithms
- Advanced Image and Video Retrieval Techniques
- Scientific Computing and Data Management
- Advanced Clustering Algorithms Research
- Advanced Battery Materials and Technologies
- Topic Modeling
- Advanced Neural Network Applications
- Caching and Content Delivery
- Anomaly Detection Techniques and Applications
University of California, Riverside
2016-2025
Xidian University
2023-2024
Shandong University
2019-2024
State Key Laboratory of Crystal Materials
2023
Chongqing University of Posts and Telecommunications
2018-2022
University of California System
2016-2021
Argonne National Laboratory
2019-2021
University of Florida
2021
Illinois Institute of Technology
2021
Washington State University
2021
Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One...
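The abstract above is truncated; as a rough illustration of prediction-based, error-bounded lossy compression (not the paper's exact multilayer prediction formulas), the Python sketch below uses a simple 1D previous-value predictor with linear-scaling quantization of the prediction residual. The function names and the error bound `eb` are illustrative choices.

```python
import numpy as np

def compress_1d(data, eb):
    """Toy prediction-based, error-bounded lossy compression (illustrative only).

    Predicts each value from the previously *reconstructed* value (Lorenzo-style
    in 1D) and quantizes the residual so that |decoded - original| <= eb.
    Returns integer quantization codes; a real compressor would entropy-code them.
    """
    codes = np.empty(len(data), dtype=np.int64)
    prev = 0.0                                   # predictor state (last reconstructed value)
    for i, x in enumerate(data):
        pred = prev                              # 1D previous-value prediction
        q = int(np.round((x - pred) / (2 * eb)))  # linear-scaling quantization
        codes[i] = q
        prev = pred + q * 2 * eb                 # reconstructed value feeds the next prediction
    return codes

def decompress_1d(codes, eb):
    out = np.empty(len(codes), dtype=float)
    prev = 0.0
    for i, q in enumerate(codes):
        prev = prev + q * 2 * eb
        out[i] = prev
    return out

data = np.sin(np.linspace(0, 10, 1000))
codes = compress_1d(data, eb=1e-3)
recon = decompress_1d(codes, eb=1e-3)
assert np.max(np.abs(recon - data)) <= 1e-3 + 1e-12   # error bound respected
```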
Today's scientific simulations require a significant reduction of the data size because of the extremely large volumes of data they produce and the limitations of storage bandwidth and space. If a compressor is set to reach a high compression ratio, however, the reconstructed data are often distorted too much to tolerate. In this paper, we explore a new strategy that can effectively control the data distortion while significantly reducing the data size. The contribution is threefold. (1) We propose an adaptive compression framework to select either our improved Lorenzo prediction...
Feature reduction is an important aspect of Big Data analytics on today's ever-larger datasets. Rough sets are a classical method widely applied in attribute reduction. Most rough set algorithms use the a priori domain knowledge of a dataset to process its continuous attributes through a membership function. Neighborhood rough sets (NRS) replace the membership function with the concept of neighborhoods, allowing NRS to handle scenarios where no a priori knowledge is available. However, the neighborhood radius of each object is fixed,...
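As a minimal sketch of the neighborhood-rough-set machinery the abstract refers to (a fixed-radius neighborhood and the lower approximation of a decision class), assuming Euclidean distance and synthetic data; this is generic NRS background, not the paper's adaptive-radius proposal.

```python
import numpy as np

def neighborhood(X, i, delta):
    """Indices of samples within Euclidean distance delta of sample i (its delta-neighborhood)."""
    d = np.linalg.norm(X - X[i], axis=1)
    return np.where(d <= delta)[0]

def lower_approximation(X, y, label, delta):
    """Objects whose entire delta-neighborhood carries the given decision label.

    Standard NRS lower approximation with a *fixed* radius delta -- exactly the
    restriction the paper argues against.
    """
    members = []
    for i in range(len(X)):
        nb = neighborhood(X, i, delta)
        if np.all(y[nb] == label):
            members.append(i)
    return np.array(members)

# tiny synthetic example
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(lower_approximation(X, y, label=0, delta=0.5))
```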
Let $G_{m \times n}$ be an $m \times n$ real random matrix whose elements are independent and identically distributed standard normal random variables, and let $\kappa_2(G_{m \times n})$ be the 2-norm condition number of $G_{m \times n}$. We prove that, for any $m \geq 2$, $n \geq 2$, and $x \geq |n-m|+1$, $\kappa_2(G_{m \times n})$ satisfies $\frac{1}{\sqrt{2\pi}} \left( \frac{c}{x} \right)^{|n-m|+1} < P\left( \frac{\kappa_2(G_{m \times n})}{n/(|n-m|+1)} > x \right) < \frac{1}{\sqrt{2\pi}} \left( \frac{C}{x} \right)^{|n-m|+1}$, where $0.245 \leq c \leq 2.000$ and $5.013 \leq C \leq 6.414$ are universal positive constants independent of $m$, $n$, and $x$....
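A small Monte Carlo check of the stated tail bound can be run directly; the matrix size, threshold x, and trial count below are illustrative choices, and the printed interval uses the extreme admissible values of c and C, so it is looser than the bound with the true constants.

```python
import numpy as np

# Monte Carlo check of the tail bound on kappa_2(G) / (n / (|n-m|+1)).
# m, n, x and the trial count are illustrative, not from the paper.
rng = np.random.default_rng(42)
m, n, x, trials = 50, 50, 20.0, 2000
k = abs(n - m) + 1

hits = 0
for _ in range(trials):
    G = rng.standard_normal((m, n))
    kappa = np.linalg.cond(G, 2)          # 2-norm condition number
    if kappa / (n / k) > x:
        hits += 1

p_hat = hits / trials
lower = (1 / np.sqrt(2 * np.pi)) * (0.245 / x) ** k   # smallest admissible c
upper = (1 / np.sqrt(2 * np.pi)) * (6.414 / x) ** k   # largest admissible C
print(f"empirical P = {p_hat:.3g}, loose bound interval [{lower:.3g}, {upper:.3g}]")
```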
Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Large supercomputers are especially susceptible to soft errors because of their large number of components. Soft errors can generally be detected offline through the comparison of the final computation results of two duplicated computations, but this approach often introduces significant overhead. This paper presents Online-ABFT, a simple but efficient online error detection technique to detect soft errors in the widely used Krylov subspace iterative...
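The sketch below only illustrates the general flavor of online invariant checking inside a Krylov solver, here by periodically comparing the recursively updated residual against the true residual b - Ax in conjugate gradient; it is not the paper's exact Online-ABFT conditions, and the check period and tolerance are arbitrary.

```python
import numpy as np

def cg_with_online_check(A, b, tol=1e-10, check_every=10, check_tol=1e-6):
    """Conjugate gradient with a periodic soft-error check.

    Every `check_every` iterations the recursively updated residual r is compared
    against the true residual b - A @ x; a large mismatch signals possible silent
    corruption. Illustrative only, not the paper's Online-ABFT checks.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for it in range(1, len(b) * 10):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if it % check_every == 0:
            true_r = b - A @ x
            if np.linalg.norm(true_r - r) > check_tol * np.linalg.norm(b):
                raise RuntimeError(f"possible soft error detected at iteration {it}")
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 100
M = np.random.default_rng(1).standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite test matrix
b = np.ones(n)
x = cg_with_online_check(A, b)
print(np.linalg.norm(A @ x - b))
```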
This paper presents a novel accelerated exact k-means algorithm called "Ball k-means," which uses a ball to describe each cluster and focuses on reducing point-centroid distance computations. The algorithm can exactly find the neighbor clusters of each cluster, so the resulting distance computations are only between a point and its neighbor clusters' centroids instead of all centroids. What's more, each cluster can be divided into a "stable area" and an "active area", and the latter is further divided into some "annular areas". The assignment of the points in the stable area is not changed, while the points in the annular areas will be adjusted within a few...
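A hedged sketch of the neighbor-cluster pruning idea only (the stable/annular-area machinery is omitted): treating each cluster as a ball of radius r_i, a point of cluster i can only migrate to a cluster whose centroid lies within 2·r_i of c_i, so only those centroids need to be examined. Function and variable names are illustrative.

```python
import numpy as np

def reassign_with_neighbor_pruning(X, labels, centroids):
    """One exact k-means reassignment step using a ball-style neighbor-cluster filter.

    r_i is the max distance from centroid i to its current points; by the triangle
    inequality a point of cluster i can only be closer to centroid j if
    ||c_i - c_j|| < 2 * r_i, so only those "neighbor" centroids are checked.
    """
    k = len(centroids)
    new_labels = labels.copy()
    for i in range(k):
        idx = np.where(labels == i)[0]
        if len(idx) == 0:
            continue
        d_own = np.linalg.norm(X[idx] - centroids[i], axis=1)
        r_i = d_own.max()                                     # ball radius of cluster i
        cdist = np.linalg.norm(centroids - centroids[i], axis=1)
        neighbors = np.where((cdist < 2 * r_i) & (np.arange(k) != i))[0]
        if len(neighbors) == 0:
            continue                                          # no cluster can steal these points
        d_nb = np.linalg.norm(X[idx][:, None, :] - centroids[neighbors], axis=2)
        best = d_nb.min(axis=1)
        move = best < d_own
        new_labels[idx[move]] = neighbors[d_nb[move].argmin(axis=1)]
    return new_labels

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
centroids = X[rng.choice(500, 5, replace=False)]
labels = np.argmin(np.linalg.norm(X[:, None, :] - centroids, axis=2), axis=1)
centroids = np.array([X[labels == i].mean(axis=0) for i in range(5)])  # one Lloyd update
pruned = reassign_with_neighbor_pruning(X, labels, centroids)
exact = np.argmin(np.linalg.norm(X[:, None, :] - centroids, axis=2), axis=1)
assert np.array_equal(pruned, exact)   # pruning does not change the exact assignment
```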
Today's scientific simulations are producing vast volumes of data that cannot be stored and transferred efficiently because of limited storage capacity, parallel I/O bandwidth, and network bandwidth. The situation is getting worse over time because of the ever-increasing gap between the relatively slow data transfer speed and the fast-growing computation power of modern supercomputers. Error-bounded lossy compression is becoming one of the most critical techniques for resolving the big scientific data issue, in that it can significantly reduce the data volume while...
Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to the above problem. In practice, however, the best-fit compression method often needs to be customized or optimized for the diverse characteristics of different datasets and various user requirements on compression quality and performance. In this paper, we address this issue with a novel...
Transformers have become keystone models in natural language processing over the past decade. They have achieved great popularity in deep learning applications, but the increasing sizes of the parameter spaces required by transformer models generate a commensurate need to accelerate performance. Natural language processing problems are also routinely faced with variable-length sequences, as word counts commonly vary among sentences. Existing frameworks pad variable-length sequences to the maximal length, which adds significant memory and computational...
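As a small illustration of the padding overhead the abstract mentions (not the paper's solution), the sketch below pads a batch of hypothetical variable-length token sequences to the maximum length, builds the corresponding attention mask, and reports the fraction of positions that are pure padding.

```python
import numpy as np

def pad_batch(lengths, max_len=None, pad_id=0):
    """Pad a batch of variable-length token sequences to a common length.

    Returns the padded id matrix, the boolean attention mask, and the fraction
    of positions that are pure padding (wasted memory/compute for a naive kernel).
    """
    rng = np.random.default_rng(0)
    max_len = max_len or max(lengths)
    batch = np.full((len(lengths), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(lengths), max_len), dtype=bool)
    for row, L in enumerate(lengths):
        batch[row, :L] = rng.integers(1, 30000, size=L)   # fake token ids
        mask[row, :L] = True
    waste = 1.0 - mask.mean()
    return batch, mask, waste

lengths = [12, 87, 33, 5, 128, 64]          # hypothetical sentence lengths
batch, mask, waste = pad_batch(lengths)
print(f"padded shape {batch.shape}, wasted positions: {waste:.1%}")
```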
Fail-stop failures in distributed environments are often tolerated by checkpointing or message logging. In this paper, we show that fail-stop process failures in the ScaLAPACK matrix-matrix multiplication kernel can be tolerated without checkpointing or message logging. It has been proved in previous algorithm-based fault tolerance research that, for matrix-matrix multiplication, the checksum relationship of the input matrices is preserved at the end of the computation no matter which algorithm is chosen. From the final results, processor miscalculations can be detected, located, and corrected...
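A minimal NumPy sketch of the classic checksum relationship assumed here (Huang-Abraham-style algorithm-based fault tolerance): augmenting A with a column-checksum row and B with a row-checksum column makes their product a full-checksum matrix, from which a single corrupted entry can be detected, located, and corrected. The matrix size and fault-injection values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Column-checksum A (extra bottom row) and row-checksum B (extra right column).
Ac = np.vstack([A, A.sum(axis=0)])
Br = np.hstack([B, B.sum(axis=1, keepdims=True)])

# Their product is a full-checksum matrix: the last row holds the column sums of
# C = A @ B and the last column holds its row sums.
Cf = Ac @ Br

# Inject a single fault into the computed result.
Cf[2, 3] += 10.0

# Locate the fault from the checksum residuals, then correct it.
col_resid = Cf[:n, :n].sum(axis=0) - Cf[n, :n]   # one entry per column
row_resid = Cf[:n, :n].sum(axis=1) - Cf[:n, n]   # one entry per row
j = int(np.argmax(np.abs(col_resid)))
i = int(np.argmax(np.abs(row_resid)))
Cf[i, j] -= row_resid[i]                          # single-error correction
assert np.allclose(Cf[:n, :n], A @ B)
print(f"corrupted entry located at ({i}, {j}) and corrected")
```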
In today's high performance computing practice, fail-stop failures are often tolerated by checkpointing. While checkpointing is a very general technique and can be applied to a wide range of applications, it introduces considerable overhead, especially when computations reach petascale and beyond. In this paper, we show that, for many iterative methods, if the parallel data partitioning scheme satisfies certain conditions, the methods themselves will maintain enough inherent redundant information for accurate...
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For long-running applications using a large number of processors, it is essential that fault tolerance be used to prevent a total loss of all finished computations after a failure. While checkpointing has been very useful to tolerate failures for a long time, it often introduces considerable overhead, especially when applications modify a large amount of memory between checkpoints and the number of processors is large. In this...
The existing noise detection methods require classifiers, distance measurements, or the data's overall distribution, and the "curse of dimensionality" and other restrictions make them insufficiently effective on complex data, e.g., data with different attribute weights, high dimensionality, feature noise, nonlinearity, etc. This is also the main reason that noise filtering methods have not been widely applied or formed into a unified learning framework. To address this problem, we propose here a complete and efficient random forest method...
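For contrast, the sketch below implements a generic distance-based noise filter (k-nearest-neighbor label disagreement), the kind of approach whose curse-of-dimensionality limitations the abstract criticizes; it is not the paper's complete random forest method, and the parameters k and threshold are arbitrary.

```python
import numpy as np

def knn_label_noise_filter(X, y, k=5, threshold=0.8):
    """Flag points whose k nearest neighbors overwhelmingly disagree with their label.

    A generic distance-based noise filter for illustration only; the paper's
    random-forest approach avoids this kind of distance computation and its
    curse-of-dimensionality issues.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]
    disagree = (y[nn] != y[:, None]).mean(axis=1)
    return np.where(disagree >= threshold)[0]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
y[rng.choice(200, 10, replace=False)] ^= 1      # inject 5% label noise
print(knn_label_noise_filter(X, y))             # indices of suspected noisy labels
```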
With the ever-increasing volumes of scientific data produced by high-performance computing applications, significantly reducing the data size is critical because of the limited capacity of storage space and the potential bottlenecks on I/O or networks when writing/reading or transferring the data. SZ and ZFP are the two leading BSD-licensed open-source C/C++ libraries for compressed floating-point arrays that support high-throughput read and write random access. However, their performance is not consistent across different data sets and fields; some...
Because of the ever-increasing execution scale of scientific applications, how to store the extremely large volume of data they produce efficiently is becoming a serious issue. A significant reduction of the data size can effectively mitigate the I/O burden and save considerable storage space. Since lossless compressors suffer from limited compression ratios, error-controlled lossy compressors have been studied for years. Existing lossy compressors, however, focus mainly on absolute error bounds, which cannot meet users' diverse demands such as...
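A tiny helper clarifies the distinction the abstract draws between absolute and pointwise relative error bounds when validating reconstructed data; it is an illustrative check, not part of any compressor's API.

```python
import numpy as np

def check_error_bounds(original, reconstructed, abs_eb=None, rel_eb=None):
    """Verify an absolute and/or a pointwise relative error bound on decompressed data.

    abs_eb: |d - d'| <= abs_eb for every point.
    rel_eb: |d - d'| <= rel_eb * |d| for every point (value-range-based variants also exist).
    """
    err = np.abs(original - reconstructed)
    ok = True
    if abs_eb is not None:
        ok &= bool(np.all(err <= abs_eb))
    if rel_eb is not None:
        ok &= bool(np.all(err <= rel_eb * np.abs(original)))
    return ok

data = np.linspace(1.0, 1000.0, 10000)
noisy = data * (1 + np.random.default_rng(0).uniform(-1e-4, 1e-4, data.shape))
print(check_error_bounds(data, noisy, rel_eb=1e-4))   # True: relative bound holds everywhere
print(check_error_bounds(data, noisy, abs_eb=1e-4))   # False: large values violate it
```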
Efficient error-controlled lossy compressors are becoming critical to the success of today's large-scale scientific applications because of the ever-increasing volume of data produced by these applications. In the past decade, many lossless and lossy compressors have been developed with distinct design principles for different datasets in largely diverse domains. In order to support researchers and users in assessing and comparing compressors in a fair and convenient way, we establish a standard compression assessment benchmark -- the Scientific Data Reduction Benchmark...
Today's extreme-scale high-performance computing (HPC) applications are producing volumes of data too large to save or transfer because of limited storage space and I/O bandwidth. Error-bounded lossy compression has been commonly regarded as one of the best solutions to the big science data issue, in that it can significantly reduce the data volume with strictly controlled data distortion based on user requirements. In this work, we develop an adaptive parameter optimization algorithm integrated with a series of optimization strategies for SZ,...
As the number of processors in today's high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the execution time of many current high performance computing applications. Although the architectures are usually robust enough to survive node failures without suffering complete system failure, most applications cannot survive node failures and, therefore, whenever a node fails, they have to abort themselves and restart from the beginning or from a stable-storage-based checkpoint. This paper explores the use of floating-point...
Mitigating label noise is a crucial problem in classification. Noise filtering is an effective method of dealing with label noise that does not need to estimate the noise rate or rely on any loss function. However, most filtering methods focus mainly on binary classification, leaving the more difficult counterpart of multiclass classification relatively unexplored. To remedy this deficit, we present a definition of label noise for the multiclass setting and propose a general framework for novel noise-filtering learning. Two examples, the multiclass complete random forest (mCRF) and relative density, are...
This article presents a simple sampling method for classification that is very easy to implement, based on the idea of random space division and called "random space division sampling" (RSDS). It can extract the boundary points as the sampled result by efficiently distinguishing label noise points, inner points, and boundary points. This makes it the first general sampling method that can not only reduce the data size but also enhance the accuracy of the classifier, especially in label-noisy classification. Here "general" means that it is not restricted to any specific...
Pawlak rough set (PRS) and neighborhood rough set (NRS) are the two most common rough set theoretical models. Although the PRS can use equivalence classes to represent knowledge, it is unable to process continuous data. On the other hand, NRSs, which can process continuous data, rather lose the ability of using equivalence classes to represent knowledge. To remedy this deficit, this article presents a granular-ball rough set (GBRS) based on granular-ball computing, combining the robustness and the adaptability of granular-ball computing. The GBRS can simultaneously represent both the PRS and the NRS, enabling it not only to deal with continuous data but also to use equivalence classes for knowledge...