- Parallel Computing and Optimization Techniques
- Lattice Boltzmann Simulation Studies
- Advanced Data Storage Technologies
- Software-Defined Networks and 5G
- Advanced Numerical Methods in Computational Mathematics
- Network Security and Intrusion Detection
- Distributed and Parallel Computing Systems
- Advanced Malware Detection Techniques
- Fluid Dynamics and Turbulent Flows
- Software System Performance and Reliability
- Model Reduction and Neural Networks
- Polymer Nanocomposites and Properties
- Aerosol Filtration and Electrostatic Precipitation
- Cavitation Phenomena in Pumps
- Algorithms and Data Compression
- Computer Graphics and Visualization Techniques
- Matrix Theory and Algorithms
- Fluid Dynamics and Vibration Analysis
- DNA and Biological Computing
- Cloud Computing and Resource Management
- Computational Fluid Dynamics and Aerodynamics
- Fluid Dynamics Simulations and Interactions
- Recommender Systems and Techniques
- Stochastic Gradient Optimization Techniques
- Diamond and Carbon-based Materials Research
National University of Defense Technology
2013-2025
National Supercomputing Center of Tianjin
2024
Physics-informed neural networks (PINNs) have emerged as a popular approach in scientific machine learning for solving both forward and inverse problems of partial differential equations (PDEs). However, complex physical systems are often characterized by parameters, such as viscosity and Reynolds number in fluid dynamics, which pose significant challenges for parameterized PDE solutions. The inherent limitations of PINNs include the need for repeated and time-consuming training under varying parameter conditions,...
The availability of microservice systems is critical to business operations and corporate reputation. However, the dynamics and complexity of microservice systems introduce significant challenges to the performance issue diagnosis of large-scale systems. After investigating hundreds of real-world cases in Tencent, we find that previous troubleshooting approaches fail to accurately localize root causes because they overlook the inconsistency between causality and calling relationships. Therefore, we propose a novel approach, MicroDig, to diagnose...
The greedy algorithm is one of the important point selection methods in radial basis function based mesh deformation. However, for a large-scale mesh, the conventional greedy algorithm is expensive in time and results in performance penalties. To accelerate the computational procedure of point selection, a block iteration with parallelization method is proposed in this paper. By the block iteration method, the computational complexities of three steps are all reduced from O(n^3) to O(n^2). In addition, two separates boundary points into sub-cores, efficiently...
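The greedy point selection that this abstract starts from can be sketched as follows. This is a minimal illustration of the conventional scheme only, not the block iteration or parallelization method of the paper. The kernel (Wendland C2), the support radius, the tolerance, and the names `wendland_c2` and `greedy_select` are assumptions chosen for the example.

```python
import numpy as np

def wendland_c2(r, radius):
    # Compactly supported Wendland C2 kernel (assumed choice for this sketch).
    xi = np.minimum(r / radius, 1.0)
    return (1.0 - xi) ** 4 * (4.0 * xi + 1.0)

def greedy_select(points, displacements, tol=1e-3, radius=2.0):
    """Pick boundary points one at a time until the RBF interpolant
    reproduces every boundary displacement to within tol."""
    # Start from the point with the largest displacement.
    selected = [int(np.argmax(np.linalg.norm(displacements, axis=1)))]
    while True:
        P = points[selected]
        # Dense RBF system over the selected points, re-solved from scratch.
        K = wendland_c2(np.linalg.norm(P[:, None] - P[None, :], axis=2), radius)
        weights = np.linalg.solve(K, displacements[selected])
        # Interpolation error at every boundary point.
        K_all = wendland_c2(
            np.linalg.norm(points[:, None] - P[None, :], axis=2), radius)
        err = np.linalg.norm(K_all @ weights - displacements, axis=1)
        worst = int(np.argmax(err))
        done = err[worst] <= tol or len(selected) >= len(points)
        if done or worst in selected:
            return selected, float(err[worst])
        # Add the worst-approximated point and repeat.
        selected.append(worst)
```

Because this naive version rebuilds and re-solves the dense system at every step, the per-step cost grows cubically with the number of selected points, which is the kind of cost that an incremental update aims to avoid.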
The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. But guaranteeing portability relies heavily on platform-specific implementations. In this paper, we provide an OpenCL implementation for an ARMv8 multi-core CPU which efficiently maps the generic OpenCL platform model to the ARMv8 architecture. With this implementation, we first characterize the maximum achieved arithmetic throughput and memory accessing bandwidth on the architecture, and measure...
The overset grid method is widely employed to solve moving boundary problems in numerical simulations. However, the heavy and inevitable communication resulting from the movements severely impedes the improvement of parallel efficiency. This paper proposes a Motion Trace Decomposition (MTD) method to alleviate this issue. MTD minimizes the communication overhead between processors by decomposing sub-grids and distributing them according to the object motion trajectory, negating the need to reproduce areas when boundaries move. Various tests...
Background: The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make existing alignment tools inapplicable. Additionally, a single CPU's performance bottleneck restricts the effectiveness of alignment algorithms for SMRT sequencing. Methods: To address these challenges, we introduce...
With the increasing number of CPU cores, thousands of cores are used in current supercomputers. The MPI/OpenMP hybrid programming model is popular in multicore systems. Some serial codes in pure MPI programs turn into bottlenecks and are easily neglected when these programs are ported to the hybrid model. In the Linpack benchmark, we focus on the local swap algorithm and present an OpenMP optimization method to speed up the performance using multi-threading. On a cluster system with 36 multi-core CPUs, experiment results show that this method can decrease the time...
C++ allows reinterpretation of memory objects via type casting, which facilitates easier manipulation of class fields and virtual methods inside the class hierarchy. However, misinterpretation of objects, which is called type confusion, can result in illegal access of fields or methods. Type confusion accounts for many security vulnerabilities in programs written in C++. Previous detection techniques report a bug when an object of a parent class is cast to a child class. However, a downcast is safe as long as no fields or methods of the child class are accessed. This paper presents Harmless...
The Overset Grid method is a promising computational approach for tackling the challenging moving boundary problems in Computational Fluid Dynamics (CFD) simulations. The efficiency and accuracy of the Overset Grid method are critically dependent on the effectiveness of the Overset Grid Assembly (OGA) process. However, the OGA process is plagued by unavoidable issues of load imbalance and communication overheads, which adversely impact the parallel efficiency of the method, particularly when dealing with sub-grids in motion. This paper proposes an improved assembly method as an effective...
Designing a fast singular value decomposition (SVD) is of significant interest in applications. The random direct SVD (RSVD) has provided a scheme to compute a well-approximated SVD by unilateral randomized sampling. In this paper, we present an efficient algorithm that computes the SVD by bilateral randomized sampling. We also prove that the proposed algorithms can be bounded well and have less computational complexity compared with RSVD when the objective matrix is approximately square. Numerical experiments on graph Laplacian and Hilbert...
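The one-sided (unilateral) randomized sampling that this abstract compares against can be sketched as below. This is the widely used range-finder form of randomized SVD, shown as a baseline under assumptions. It is not the bilateral algorithm proposed in the paper, and the function names, the oversampling amount, and the Gaussian test matrix are illustrative choices.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    """One-sided randomized SVD: sample the range of A, then take an
    exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)
    # Sample the column space of A with a Gaussian test matrix.
    omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ omega)        # orthonormal basis, shape (m, k)
    # Project A onto the basis; B is only k x n.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

def demo_relative_error(m=200, n=150, rank=5, seed=1):
    # Build an exactly rank-`rank` matrix and measure the reconstruction error.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
    U, s, Vt = randomized_svd(A, rank)
    return np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

Only the left side of the matrix is sampled here: the projected matrix `B` still has all `n` columns, so the small SVD costs more as `n` grows.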
LBM can conveniently deal with the interaction between fluid and solid. Thus, it is widely used in the numerical simulation of multi-physics applications. Based on the domestic processor FT-2000, the numerical simulation of Rayleigh-Bénard convection is carried out. Through performance tests, it is found that function call overhead in the simulation process occupies a large share of the execution time. By expanding the function calls and reorganizing the data structure, the execution time for 104 time steps declines from 9236 s to 2754 s, which by these figures is a speedup of roughly 3.4x.
In this paper, we investigate the scalability on HPC platforms of the OpenFOAM viscoelastic solver that was implemented in our previous study. Results show that the solver scales reasonably well up to 256 cores. Further profiling shows that the scalability is greatly restricted by global reduction, introduced by the numerous scalar product operations in the parallel PCG algorithm of OpenFOAM.
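To see why scalar products matter for scaling, the sketch below counts the dot products in a textbook Jacobi-preconditioned conjugate gradient loop. In a distributed run, each of these dot products would require a global reduction across all processes. This is a generic serial illustration under assumptions, not OpenFOAM's PCG implementation, and the names `pcg` and `dot` are hypothetical.

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG that also counts scalar products."""
    counts = {"dot": 0}

    def dot(u, v):
        # In a parallel solver, each call here implies one global reduction.
        counts["dot"] += 1
        return float(u @ v)

    m_inv = 1.0 / np.diag(A)          # Jacobi (diagonal) preconditioner
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = m_inv * r
    p = z.copy()
    rz = dot(r, z)
    b_norm = np.sqrt(dot(b, b))
    iters = 0
    for iters in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / dot(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.sqrt(dot(r, r)) <= tol * b_norm:
            break
        z = m_inv * r
        rz_new = dot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, iters, counts["dot"]
```

In this formulation each iteration performs up to three scalar products, so the number of global synchronizations grows linearly with the iteration count regardless of how little local work each process has.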