- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed Systems and Fault Tolerance
- Distributed and Parallel Computing Systems
- Radiation Effects in Electronics
- Algorithms and Data Compression
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Advanced Data Compression Techniques
- Supercapacitor Materials and Fabrication
- Rough Sets and Fuzzy Logic
- Data Mining Algorithms and Applications
- Advancements in Battery Materials
- Machine Learning and Data Classification
- Imbalanced Data Classification Techniques
- Face and Expression Recognition
- Data Management and Algorithms
- Advanced Image and Video Retrieval Techniques
- Scientific Computing and Data Management
- Advanced Clustering Algorithms Research
- Advanced Battery Materials and Technologies
- Topic Modeling
- Advanced Neural Network Applications
- Caching and Content Delivery
- Anomaly Detection Techniques and Applications
University of California, Riverside
2016-2025
Xidian University
2023-2024
Shandong University
2019-2024
State Key Laboratory of Crystal Materials
2023
Chongqing University of Posts and Telecommunications
2018-2022
University of California System
2016-2021
Argonne National Laboratory
2019-2021
University of Florida
2021
Illinois Institute of Technology
2021
Washington State University
2021
Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One...
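The abstract above is truncated; as a rough illustration of prediction-based, error-bounded lossy compression (not the paper's exact multilayer prediction formulas), the Python sketch below uses a simple 1D previous-value predictor with linear-scaling quantization of the prediction residual. The function names and the error bound `eb` are illustrative choices.

```python
import numpy as np

def compress_1d(data, eb):
    """Toy prediction-based, error-bounded lossy compression (illustrative only).

    Predicts each value from the previously *reconstructed* value (Lorenzo-style
    in 1D) and quantizes the residual so that |decoded - original| <= eb.
    Returns integer quantization codes; a real compressor would entropy-code them.
    """
    codes = np.empty(len(data), dtype=np.int64)
    prev = 0.0                                   # predictor state (last reconstructed value)
    for i, x in enumerate(data):
        pred = prev                              # 1D previous-value prediction
        q = int(np.round((x - pred) / (2 * eb)))  # linear-scaling quantization
        codes[i] = q
        prev = pred + q * 2 * eb                 # reconstructed value feeds the next prediction
    return codes

def decompress_1d(codes, eb):
    out = np.empty(len(codes), dtype=float)
    prev = 0.0
    for i, q in enumerate(codes):
        prev = prev + q * 2 * eb
        out[i] = prev
    return out

data = np.sin(np.linspace(0, 10, 1000))
codes = compress_1d(data, eb=1e-3)
recon = decompress_1d(codes, eb=1e-3)
assert np.max(np.abs(recon - data)) <= 1e-3 + 1e-12   # error bound respected
```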
Today's scientific simulations require a significant reduction of the data size because of the extremely large volumes of data they produce and the limitations of storage bandwidth and space. If a compressor is set to reach a high compression ratio, however, the reconstructed data are often distorted too much to tolerate. In this paper, we explore a new strategy that can effectively control the data distortion while significantly reducing the data size. The contribution is threefold. (1) We propose an adaptive compression framework to select either our improved Lorenzo prediction...
Feature reduction is an important aspect of Big Data analytics on today's ever-larger datasets. Rough sets are a classical method widely applied in attribute reduction. Most rough set algorithms use the a priori domain knowledge of a dataset to process its continuous attributes through a membership function. Neighborhood rough sets (NRS) replace the membership function with the concept of neighborhoods, allowing NRS to handle scenarios where no a priori knowledge is available. However, the neighborhood radius of each object is fixed,...
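As a minimal sketch of the neighborhood-rough-set machinery the abstract refers to (a fixed-radius neighborhood and the lower approximation of a decision class), assuming Euclidean distance and synthetic data; this is generic NRS background, not the paper's adaptive-radius proposal.

```python
import numpy as np

def neighborhood(X, i, delta):
    """Indices of samples within Euclidean distance delta of sample i (its delta-neighborhood)."""
    d = np.linalg.norm(X - X[i], axis=1)
    return np.where(d <= delta)[0]

def lower_approximation(X, y, label, delta):
    """Objects whose entire delta-neighborhood carries the given decision label.

    Standard NRS lower approximation with a *fixed* radius delta -- exactly the
    restriction the paper argues against.
    """
    members = []
    for i in range(len(X)):
        nb = neighborhood(X, i, delta)
        if np.all(y[nb] == label):
            members.append(i)
    return np.array(members)

# tiny synthetic example
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(lower_approximation(X, y, label=0, delta=0.5))
```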
Let $G_{m \times n}$ be an $m \times n$ real random matrix whose elements are independent and identically distributed standard normal random variables, and let $\kappa_2(G_{m \times n})$ be the 2-norm condition number of $G_{m \times n}$. We prove that, for any $m \geq 2$, $n \geq 2$, and $x \geq |n-m|+1$, $\kappa_2(G_{m \times n})$ satisfies $\frac{1}{\sqrt{2\pi}} \left( \frac{c}{x} \right)^{|n-m|+1} < P\left( \frac{\kappa_2(G_{m \times n})}{n/(|n-m|+1)} > x \right) < \frac{1}{\sqrt{2\pi}} \left( \frac{C}{x} \right)^{|n-m|+1}$, where $0.245 \leq c \leq 2.000$ and $5.013 \leq C \leq 6.414$ are universal positive constants independent of $m$, $n$, and $x$....
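A small Monte Carlo check of the stated tail bound can be run directly; the matrix size, threshold x, and trial count below are illustrative choices, and the printed interval uses the extreme admissible values of c and C, so it is looser than the bound with the true constants.

```python
import numpy as np

# Monte Carlo check of the tail bound on kappa_2(G) / (n / (|n-m|+1)).
# m, n, x and the trial count are illustrative, not from the paper.
rng = np.random.default_rng(42)
m, n, x, trials = 50, 50, 20.0, 2000
k = abs(n - m) + 1

hits = 0
for _ in range(trials):
    G = rng.standard_normal((m, n))
    kappa = np.linalg.cond(G, 2)          # 2-norm condition number
    if kappa / (n / k) > x:
        hits += 1

p_hat = hits / trials
lower = (1 / np.sqrt(2 * np.pi)) * (0.245 / x) ** k   # smallest admissible c
upper = (1 / np.sqrt(2 * np.pi)) * (6.414 / x) ** k   # largest admissible C
print(f"empirical P = {p_hat:.3g}, loose bound interval [{lower:.3g}, {upper:.3g}]")
```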
Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Large supercomputers are especially susceptible to soft errors because of their large number of components. Soft errors can generally be detected offline through the comparison of the final computation results of two duplicated computations, but this approach often introduces significant overhead. This paper presents Online-ABFT, a simple but efficient online error detection technique to detect soft errors in the widely used Krylov subspace iterative...
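The sketch below only illustrates the general flavor of online invariant checking inside a Krylov solver, here by periodically comparing the recursively updated residual against the true residual b - Ax in conjugate gradient; it is not the paper's exact Online-ABFT conditions, and the check period and tolerance are arbitrary.

```python
import numpy as np

def cg_with_online_check(A, b, tol=1e-10, check_every=10, check_tol=1e-6):
    """Conjugate gradient with a periodic soft-error check.

    Every `check_every` iterations the recursively updated residual r is compared
    against the true residual b - A @ x; a large mismatch signals possible silent
    corruption. Illustrative only, not the paper's Online-ABFT checks.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for it in range(1, len(b) * 10):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if it % check_every == 0:
            true_r = b - A @ x
            if np.linalg.norm(true_r - r) > check_tol * np.linalg.norm(b):
                raise RuntimeError(f"possible soft error detected at iteration {it}")
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 100
M = np.random.default_rng(1).standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite test matrix
b = np.ones(n)
x = cg_with_online_check(A, b)
print(np.linalg.norm(A @ x - b))
```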
This paper presents a novel accelerated exact k-means algorithm called "Ball k-means," which uses a ball to describe each cluster and focuses on reducing point-centroid distance computations. The algorithm can exactly find the neighbor clusters of each cluster, so the resulting distance computations are only between a point and its neighbor clusters' centroids instead of all centroids. What's more, each cluster can be divided into a "stable area" and an "active area", and the latter is further divided into some "annular areas". The assignment of the points in the stable area is not changed, while the points in the annular areas will be adjusted within a few...
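A hedged sketch of the neighbor-cluster pruning idea only (the stable/annular-area machinery is omitted): treating each cluster as a ball of radius r_i, a point of cluster i can only migrate to a cluster whose centroid lies within 2·r_i of c_i, so only those centroids need to be examined. Function and variable names are illustrative.

```python
import numpy as np

def reassign_with_neighbor_pruning(X, labels, centroids):
    """One exact k-means reassignment step using a ball-style neighbor-cluster filter.

    r_i is the max distance from centroid i to its current points; by the triangle
    inequality a point of cluster i can only be closer to centroid j if
    ||c_i - c_j|| < 2 * r_i, so only those "neighbor" centroids are checked.
    """
    k = len(centroids)
    new_labels = labels.copy()
    for i in range(k):
        idx = np.where(labels == i)[0]
        if len(idx) == 0:
            continue
        d_own = np.linalg.norm(X[idx] - centroids[i], axis=1)
        r_i = d_own.max()                                     # ball radius of cluster i
        cdist = np.linalg.norm(centroids - centroids[i], axis=1)
        neighbors = np.where((cdist < 2 * r_i) & (np.arange(k) != i))[0]
        if len(neighbors) == 0:
            continue                                          # no cluster can steal these points
        d_nb = np.linalg.norm(X[idx][:, None, :] - centroids[neighbors], axis=2)
        best = d_nb.min(axis=1)
        move = best < d_own
        new_labels[idx[move]] = neighbors[d_nb[move].argmin(axis=1)]
    return new_labels

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
centroids = X[rng.choice(500, 5, replace=False)]
labels = np.argmin(np.linalg.norm(X[:, None, :] - centroids, axis=2), axis=1)
centroids = np.array([X[labels == i].mean(axis=0) for i in range(5)])  # one Lloyd update
pruned = reassign_with_neighbor_pruning(X, labels, centroids)
exact = np.argmin(np.linalg.norm(X[:, None, :] - centroids, axis=2), axis=1)
assert np.array_equal(pruned, exact)   # pruning does not change the exact assignment
```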
Today's scientific simulations are producing vast volumes of data that cannot be stored and transferred efficiently because of limited storage capacity, parallel I/O bandwidth, and network bandwidth. The situation is getting worse over time because of the ever-increasing gap between the relatively slow data transfer speed and the fast-growing computation power of modern supercomputers. Error-bounded lossy compression is becoming one of the most critical techniques for resolving the big scientific data issue, in that it can significantly reduce the data volume while...
Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to the above problem. In practice, however, the best-fit compression method often needs to be customized or optimized for the diverse characteristics of different datasets and various user requirements on compression quality and performance. In this paper, we address this issue with a novel...
Transformers have become keystone models in natural language processing over the past decade. They have achieved great popularity in deep learning applications, but the increasing sizes of the parameter spaces required by transformer models generate a commensurate need to accelerate performance. Natural language processing problems are also routinely faced with variable-length sequences, as word counts commonly vary among sentences. Existing frameworks pad variable-length sequences to the maximal length, which adds significant memory and computational...
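As a small illustration of the padding overhead the abstract mentions (not the paper's solution), the sketch below pads a batch of hypothetical variable-length token sequences to the maximum length, builds the corresponding attention mask, and reports the fraction of positions that are pure padding.

```python
import numpy as np

def pad_batch(lengths, max_len=None, pad_id=0):
    """Pad a batch of variable-length token sequences to a common length.

    Returns the padded id matrix, the boolean attention mask, and the fraction
    of positions that are pure padding (wasted memory/compute for a naive kernel).
    """
    rng = np.random.default_rng(0)
    max_len = max_len or max(lengths)
    batch = np.full((len(lengths), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(lengths), max_len), dtype=bool)
    for row, L in enumerate(lengths):
        batch[row, :L] = rng.integers(1, 30000, size=L)   # fake token ids
        mask[row, :L] = True
    waste = 1.0 - mask.mean()
    return batch, mask, waste

lengths = [12, 87, 33, 5, 128, 64]          # hypothetical sentence lengths
batch, mask, waste = pad_batch(lengths)
print(f"padded shape {batch.shape}, wasted positions: {waste:.1%}")
```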
Fail-stop failures in distributed environments are often tolerated by checkpointing or message logging. In this paper, we show that fail-stop process failures in the ScaLAPACK matrix-matrix multiplication kernel can be tolerated without checkpointing or message logging. It has been proved in previous algorithm-based fault tolerance research that, for matrix-matrix multiplication, the checksum relationship of the input matrices is preserved at the end of the computation no matter which algorithm is chosen. From the final results, processor miscalculations can be detected, located, and corrected...
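A minimal NumPy sketch of the classic checksum relationship assumed here (Huang-Abraham-style algorithm-based fault tolerance): augmenting A with a column-checksum row and B with a row-checksum column makes their product a full-checksum matrix, from which a single corrupted entry can be detected, located, and corrected. The matrix size and fault-injection values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Column-checksum A (extra bottom row) and row-checksum B (extra right column).
Ac = np.vstack([A, A.sum(axis=0)])
Br = np.hstack([B, B.sum(axis=1, keepdims=True)])

# Their product is a full-checksum matrix: the last row holds the column sums of
# C = A @ B and the last column holds its row sums.
Cf = Ac @ Br

# Inject a single fault into the computed result.
Cf[2, 3] += 10.0

# Locate the fault from the checksum residuals, then correct it.
col_resid = Cf[:n, :n].sum(axis=0) - Cf[n, :n]   # one entry per column
row_resid = Cf[:n, :n].sum(axis=1) - Cf[:n, n]   # one entry per row
j = int(np.argmax(np.abs(col_resid)))
i = int(np.argmax(np.abs(row_resid)))
Cf[i, j] -= row_resid[i]                          # single-error correction
assert np.allclose(Cf[:n, :n], A @ B)
print(f"corrupted entry located at ({i}, {j}) and corrected")
```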
In today's high performance computing practice, fail-stop failures are often tolerated by checkpointing. While checkpointing is a very general technique and can be applied to a wide range of applications, it introduces considerable overhead, especially when computations reach petascale and beyond. In this paper, we show that, for many iterative methods, if the parallel data partitioning scheme satisfies certain conditions, the methods themselves will maintain enough inherent redundant information for accurate...
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For long-running applications using a large number of processors, it is essential that fault tolerance be used to prevent a total loss of all finished computations after a failure. While checkpointing has been very useful to tolerate failures for a long time, it often introduces considerable overhead, especially when applications modify a large amount of memory between checkpoints and the number of processors is large. In this...
The existing noise detection methods require classifiers, distance measurements, or the data's overall distribution, and the "curse of dimensionality" and other restrictions make them insufficiently effective on complex data, e.g., data with different attribute weights, high dimensionality, feature noise, nonlinearity, etc. This is also the main reason that noise filtering methods have not been widely applied or formed into a unified learning framework. To address this problem, we propose here a complete and efficient random forest method...
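For contrast, the sketch below implements a generic distance-based noise filter (k-nearest-neighbor label disagreement), the kind of approach whose curse-of-dimensionality limitations the abstract criticizes; it is not the paper's complete random forest method, and the parameters k and threshold are arbitrary.

```python
import numpy as np

def knn_label_noise_filter(X, y, k=5, threshold=0.8):
    """Flag points whose k nearest neighbors overwhelmingly disagree with their label.

    A generic distance-based noise filter for illustration only; the paper's
    random-forest approach avoids this kind of distance computation and its
    curse-of-dimensionality issues.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]
    disagree = (y[nn] != y[:, None]).mean(axis=1)
    return np.where(disagree >= threshold)[0]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
y[rng.choice(200, 10, replace=False)] ^= 1      # inject 5% label noise
print(knn_label_noise_filter(X, y))             # indices of suspected noisy labels
```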
With the ever-increasing volumes of scientific data produced by high-performance computing applications, significantly reducing the data size is critical because of the limited capacity of storage space and the potential bottlenecks on I/O or networks when writing/reading or transferring the data. SZ and ZFP are the two leading BSD-licensed open-source C/C++ libraries for compressed floating-point arrays that support high-throughput read and write random access. However, their performance is not consistent across different data sets and fields; some...
Because of the ever-increasing execution scale of scientific applications, how to store the extremely large volume of data they produce efficiently is becoming a serious issue. A significant reduction of the data size can effectively mitigate the I/O burden and save considerable storage space. Since lossless compressors suffer from limited compression ratios, error-controlled lossy compressors have been studied for years. Existing lossy compressors, however, focus mainly on absolute error bounds, which cannot meet users' diverse demands such as...
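A tiny helper clarifies the distinction the abstract draws between absolute and pointwise relative error bounds when validating reconstructed data; it is an illustrative check, not part of any compressor's API.

```python
import numpy as np

def check_error_bounds(original, reconstructed, abs_eb=None, rel_eb=None):
    """Verify an absolute and/or a pointwise relative error bound on decompressed data.

    abs_eb: |d - d'| <= abs_eb for every point.
    rel_eb: |d - d'| <= rel_eb * |d| for every point (value-range-based variants also exist).
    """
    err = np.abs(original - reconstructed)
    ok = True
    if abs_eb is not None:
        ok &= bool(np.all(err <= abs_eb))
    if rel_eb is not None:
        ok &= bool(np.all(err <= rel_eb * np.abs(original)))
    return ok

data = np.linspace(1.0, 1000.0, 10000)
noisy = data * (1 + np.random.default_rng(0).uniform(-1e-4, 1e-4, data.shape))
print(check_error_bounds(data, noisy, rel_eb=1e-4))   # True: relative bound holds everywhere
print(check_error_bounds(data, noisy, abs_eb=1e-4))   # False: large values violate it
```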
Efficient error-controlled lossy compressors are becoming critical to the success of today's large-scale scientific applications because of the ever-increasing volume of data produced by these applications. In the past decade, many lossless and lossy compressors have been developed with distinct design principles for different datasets in largely diverse domains. In order to support researchers and users in assessing and comparing compressors in a fair and convenient way, we establish a standard compression assessment benchmark -- the Scientific Data Reduction Benchmark...
Today's extreme-scale high-performance computing (HPC) applications are producing volumes of data too large to save or transfer because of limited storage space and I/O bandwidth. Error-bounded lossy compression has been commonly regarded as one of the best solutions to the big science data issue, in that it can significantly reduce the data volume with strictly controlled data distortion based on user requirements. In this work, we develop an adaptive parameter optimization algorithm integrated with a series of optimization strategies for SZ,...
As the number of processors in today's high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the execution time of many current high performance computing applications. Although the architectures are usually robust enough to survive node failures without suffering complete system failure, most applications cannot survive node failures and, therefore, whenever a node fails, they have to abort themselves and restart from the beginning or from a stable-storage-based checkpoint. This paper explores the use of floating-point...
Mitigating label noise is a crucial problem in classification. Noise filtering is an effective method of dealing with label noise that does not need to estimate the noise rate or rely on any loss function. However, most filtering methods focus mainly on binary classification, leaving the more difficult counterpart of multiclass classification relatively unexplored. To remedy this deficit, we present a definition of label noise for the multiclass setting and propose a general framework for novel noise-filtering learning. Two examples, the multiclass complete random forest (mCRF) and relative density, are...
This article presents a simple sampling method for classification that is very easy to implement, based on the idea of random space division and called "random space division sampling" (RSDS). It can extract the boundary points as the sampled result by efficiently distinguishing label noise points, inner points, and boundary points. This makes it the first general sampling method that can not only reduce the data size but also enhance the accuracy of the classifier, especially in label-noisy classification. Here "general" means that it is not restricted to any specific...
Pawlak rough set (PRS) and neighborhood rough set (NRS) are the two most common rough set theoretical models. Although the PRS can use equivalence classes to represent knowledge, it is unable to process continuous data. On the other hand, NRSs, which can process continuous data, rather lose the ability of using equivalence classes to represent knowledge. To remedy this deficit, this article presents a granular-ball rough set (GBRS) based on granular-ball computing, combining the robustness and the adaptability of granular-ball computing. The GBRS can simultaneously represent both the PRS and the NRS, enabling it not only to deal with continuous data but also to use equivalence classes for knowledge...