- Parallel Computing and Optimization Techniques
- Interconnection Networks and Systems
- Distributed and Parallel Computing Systems
- Embedded Systems Design Techniques
- Advanced Data Storage Technologies
- Distributed Systems and Fault Tolerance
- Supercapacitor Materials and Fabrication
- Software Engineering Research
- Cloud Computing and Resource Management
- Software Testing and Debugging Techniques
- Real-Time Systems Scheduling
- Graph Theory and Algorithms
- Software System Performance and Reliability
- Stochastic Gradient Optimization Techniques
- Advanced Software Engineering Methodologies
- Logic, Programming, and Type Systems
- Advanced Neural Network Applications
- Matrix Theory and Algorithms
- Time Series Analysis and Forecasting
- Neural Networks and Applications
- VLSI and Analog Circuit Testing
- Advanced Memory and Neural Computing
- Machine Learning and Data Classification
- Network Packet Processing and Optimization
- Anomaly Detection Techniques and Applications
Beijing Normal University
2024-2025
Beijing Institute of Technology
2015-2024
Henan University
2007
Hebei University of Architecture
2006
The extract local variable refactoring is frequently employed to replace one or more occurrences of a complex expression with simple accesses to a newly introduced variable. To facilitate the refactoring, most IDEs can automate extract refactorings once the to-be-extracted expressions are selected by developers. However, such tools usually extract all expressions that are lexically identical, without a comprehensive analysis of the safety of the refactoring. The automatically conducted refactorings may thus lead to serious software defects. Besides that, existing...
The Sparse Matrix-Vector Multiplication (SpMV) kernel dominates the computing cost in numerous scientific applications. Many implementations based on different sparse formats have recently been proposed to optimize this kernel on the GPU side. Since the performance of these implementations varies significantly according to the sparsity characteristics of the input matrix and the hardware specifications, none of them can be considered the best to use for every matrix. In this paper, we address the problem of selecting the best representation for a given matrix by using a machine learning approach....
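The abstract stops short of the feature set and classifier used; as a minimal sketch of the idea, the following extracts simple sparsity features from a matrix in COO form and applies a hand-written stand-in for a trained classifier. The feature set, function names, and thresholds are illustrative assumptions, not the paper's.

```python
import numpy as np

def sparsity_features(rows, cols, row_idx, col_idx):
    """Extract simple sparsity features of a matrix given in COO form.

    Features like nnz and the mean/max/std of nonzeros per row are
    typical inputs to a format-selection classifier (illustrative set,
    not necessarily the one used in the paper).
    """
    nnz = len(row_idx)
    per_row = np.bincount(np.asarray(row_idx), minlength=rows)
    return {
        "nnz": nnz,
        "density": nnz / (rows * cols),
        "mean_nnz_row": per_row.mean(),
        "max_nnz_row": int(per_row.max()),
        "std_nnz_row": per_row.std(),
    }

def select_format(feat):
    """Hypothetical hand-written stand-in for a trained classifier:
    rows of near-uniform length favour ELL, very sparse irregular
    matrices favour COO, and CSR is the general-purpose fallback."""
    if feat["std_nnz_row"] < 0.5 * feat["mean_nnz_row"]:
        return "ELL"
    if feat["density"] < 1e-4:
        return "COO"
    return "CSR"
```

In practice the hand-written rules would be replaced by a model (e.g. a decision tree) trained on measured per-format SpMV performance.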
Researchers have recently achieved significant advances in deep learning techniques, which in turn have substantially advanced other research disciplines, such as natural language processing, image and speech recognition, and software engineering. Various techniques have been successfully employed to facilitate software engineering tasks, including code generation, code refactoring, and fault localization. Many papers have also been presented at top conferences and in journals, demonstrating the applications of deep learning in resolving various software engineering tasks....
Natural languages are “natural” in that their texts are naturally repetitive and predictable. Recent research indicates that programming languages share similar characteristics (naturalness), with source code displaying patterns of repetition and predictability. Notably, studies have shown that buggy code deviates from these patterns and is significantly less natural than bug-free code. In this paper, we conduct a large-scale, extensive empirical study to investigate whether defects lead to the unnaturalness of code. Different from existing studies, we leverage multiple...
Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various contributions. However, a comprehensive and systematic literature survey that introduces, analyzes, discusses, and summarizes the advancements of recent years is currently lacking. Aiming to fill this gap, this paper compares existing techniques and analyzes their strengths and weaknesses. We...
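For reference, the kernel these works optimize can be stated in a few lines. Below is a minimal textbook CSR-based SpMV in Python, a naive sketch for clarity rather than an optimized implementation from any of the surveyed papers.

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Reference CSR SpMV computing y = A @ x.

    values and col_idx hold the nonzeros and their column indices;
    row_ptr[i]:row_ptr[i+1] delimits the nonzeros of row i.
    """
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y
```

For the 2x3 matrix [[1, 0, 2], [0, 3, 0]], the CSR arrays are values=[1, 2, 3], col_idx=[0, 2, 1], row_ptr=[0, 2, 3]; multiplying by the all-ones vector yields [3, 3].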
The Sparse Matrix-Vector Multiplication (SpMV) kernel dominates the computing cost in numerous scientific applications. Many implementations based on different sparse formats have been proposed to improve this kernel on recent GPU architectures. However, it has been widely observed that there is no “best-for-all” format for SpMV on the GPU. Indeed, a serious performance degradation of up to an order of magnitude can be observed without a careful selection of the format to use. To address this problem, we propose in this article BestSF (Best Sparse Format), a new...
A scalable triplet-based computer architecture, TriBA, is proposed in this paper. It is an object-oriented chip multi-processor that supports truly parallel execution of objects in hardware. Cores on the same chip are connected via a triplet-based hierarchical interconnection network (THIN), which has a simple topology and good computing locality characteristics. A distributed deterministic routing algorithm (DDRA), already designed for THIN, is elaborated. Runtime objects are mapped to processor cores according to their coupling degree and can also...
Determinism is an appealing property for parallel programs, as it simplifies understanding, reasoning about, and debugging them. It is particularly appealing in dynamic (scripting) languages, where ease of programming is a dominant design goal. Some existing languages use the type system to enforce determinism statically, but this is not generally practical for dynamic languages. In this paper, we describe how determinism can be obtained---and dynamically enforced/verified---for appropriate extensions of a scripting language. Specifically, we introduce...
With the exponential growth of continuous data streams, real-time stream processing has been gaining a lot of popularity. Spark Streaming is one of the open-source frameworks for reliable, high-throughput, and low-latency stream processing. Though it is a near-real-time framework running on commodity hardware, event latency is not guaranteed by its scheduling system. Profiling results indicate that the total delay of events under unstable inputs is more volatile and presents big fluctuations. In this paper, we propose a simple, yet effective...
Streaming computing attracts intense attention because of the demand for massive data analysis in real time. Due to the unbounded and continuous input, the volume of streaming data is so high that it cannot all be permanently stored. Piecewise polynomial fitting is a popular compression method that approximately represents a raw stream with multiple polynomials. The coefficients corresponding to the best-fitting curve can be calculated by least squares, which minimizes the sum of squared residuals between observed and fitted values....
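As a small illustration of the least-squares step described above, the snippet below fits one polynomial segment to a window of a stream using NumPy's `polyfit`, which solves the underlying normal equations. The window size and polynomial degree are arbitrary choices for the example, not the paper's.

```python
import numpy as np

# Fit one polynomial segment to a window of a stream by least squares,
# minimizing the sum of squared residuals between observed and fitted values.
t = np.arange(8, dtype=float)        # timestamps of the window
y = 2.0 * t + 1.0                    # observed values (exactly linear here)
coeffs = np.polyfit(t, y, deg=1)     # best-fitting degree-1 polynomial
residual = np.sum((np.polyval(coeffs, t) - y) ** 2)
```

In a piecewise scheme, a new segment would be started whenever `residual` exceeds a chosen error bound, and only the few coefficients per segment are stored instead of the raw samples.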
The sparse matrix–vector multiplication (SpMV) kernel dominates the computing cost in numerous applications. Most of the existing studies dedicated to improving this kernel have targeted just one type of processing unit, mainly multicore CPUs or graphics processing units (GPUs), and have not explored the potential of the recent, rapidly emerging CPU-GPU heterogeneous platforms. To take full advantage of these systems, the input sparse matrix has to be partitioned between the different available processing units. The partitioning problem is more challenging...
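As a toy illustration of the partitioning problem, the sketch below splits CSR rows between two devices so that one side receives roughly a target share of the nonzeros. This is a hypothetical load-balancing heuristic; the paper's partitioning model is more elaborate.

```python
def partition_rows(row_ptr, cpu_share):
    """Split CSR rows between a CPU and a GPU so the CPU gets roughly
    cpu_share of the nonzeros. Returns the first row index assigned to
    the GPU: rows [0, split) go to the CPU, rows [split, n) to the GPU.

    A hypothetical heuristic, not the paper's partitioning model.
    """
    nnz = row_ptr[-1]
    target = cpu_share * nnz
    for i in range(len(row_ptr) - 1):
        if row_ptr[i + 1] >= target:   # cumulative nnz reached the target
            return i + 1
    return len(row_ptr) - 1
```

A real partitioner would also weight the two devices by their measured SpMV throughput rather than splitting nonzeros evenly.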
The Sparse Matrix-Vector Multiplication (SpMV) kernel dominates the computing cost in numerous scientific applications. Many implementations based on different sparse formats have recently been proposed for optimizing this kernel on the GPU side. Since the performance of SpMV varies significantly according to the sparsity characteristics of the input matrix and the hardware features, developing an accurate performance model is a challenging task. The traditional approach of building such models by analytical modeling is difficult in practice and requires...
Algebraic multigrid (AMG) is one of the most efficient and widely used methods for solving sparse linear systems. Its computational process mainly consists of a series of iterative calculations of generalized sparse matrix-matrix multiplication (SpGEMM) and sparse matrix-vector multiplication (SpMV). Optimizing these sparse matrix kernels is crucial for accelerating AMG. In this paper, we first focus on optimizing the SpGEMM algorithm in AmgX, a popular AMG library for GPUs. We propose a new algorithm called SpGEMM-upper, which achieves an average speedup of 2.02× on a Tesla V100 and 1.96× on an RTX 3090 against the original...
Sparse matrix-vector multiplication (SpMV) is one of the important kernels of many iterative algorithms for solving sparse linear systems. The limited storage and computational resources of individual GPUs restrict both the scale and the speed of SpMV computing in problem-solving. As real-world engineering problems continue to increase in complexity, the need for collaborative execution across multiple GPUs becomes increasingly apparent. Although multi-GPU-based SpMV takes less kernel time, it also introduces additional data...
On-chip communication architectures can have a great influence on the speed and area of multi-core processor (MCP) designs. A new chip design paradigm called network-on-chip (NOC) offers a promising interconnection architectural choice for future MCPs. An on-chip network named Triple-based Hierarchical Interconnection Network (THIN) is proposed that aims to decrease the node degree, reduce the number of links, and shorten the diameter. The topology of THIN is very simple, and it is obviously hierarchical, symmetric, and scalable...
The triple-based hierarchical interconnection network (THIN) is not only a new kind of direct network but also a new kind of hierarchical interconnection network (HIN). An efficient routing algorithm is essential to the performance of a parallel computing system. This paper presents DDRA (Distributed Deterministic Routing Algorithm) for the triple-based hierarchical interconnection network. Fully exploiting the hierarchical characteristic of the network, DDRA uses only the node addresses to determine an approximately minimal path between the source and destination nodes, without constructing a routing table on each node. The analysis...
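The key idea, routing decisions derived from node addresses alone rather than from routing tables, can be illustrated with a toy function on base-3 addresses. The actual DDRA path construction is more involved; the function below is only an illustrative assumption about how a hierarchy level can be read off the addresses.

```python
def divergence_level(src, dst, digits):
    """Level at which two node addresses first differ, read directly
    from their base-3 addresses (most significant digit first).

    In a triplet-based hierarchy this identifies the smallest common
    subnetwork of the two nodes, which is enough to make a table-free
    routing decision. Illustrative sketch only, not the DDRA algorithm.
    """
    for level in range(digits):
        shift = 3 ** (digits - 1 - level)
        if (src // shift) % 3 != (dst // shift) % 3:
            return level
    return digits  # identical addresses: already at the destination
```

For two-digit addresses, nodes 0 (base-3 "00") and 5 ("12") diverge at level 0, so the route must first leave the level-0 triplet; nodes 3 ("10") and 5 ("12") share that triplet and diverge only at level 1.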
Scratchpad Memory (SPM) is a fast and small software-managed SRAM. Its current extensive use in embedded processors is motivated by its advantages in power saving, area, and low access time compared with cache. However, existing SPM management methods depend heavily on profiling and compilers. The dependence on the compiler also makes applications hard to transplant. This paper presents a novel strategy to manage scratchpad memory without compiler support. Based on the reference locality theory, a hardware random sampling...