- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Interconnection Networks and Systems
- Probabilistic and Robust Engineering Design
- Software System Performance and Reliability
- Software Engineering Research
- Lattice Boltzmann Simulation Studies
- Computational Physics and Python Applications
- Machine Learning and Data Classification
- Automotive and Human Injury Biomechanics
- Metaheuristic Optimization Algorithms Research
- Algorithms and Data Compression
- Earthquake and Tectonic Studies
- Low-power high-performance VLSI design
- Structural Response to Dynamic Loads
- Artificial Intelligence in Games
- Evolutionary Algorithms and Applications
- Logic, programming, and type systems
- Embedded Systems Design Techniques
- Distributed systems and fault tolerance
- Caching and Content Delivery
- Big Data Technologies and Applications
- Anomaly Detection Techniques and Applications
Hunan University
2019-2025
Argonne National Laboratory
2003-2024
University of Chicago
1991-2023
Yunnan Academy of Agricultural Sciences
2018
Texas A&M University
2006-2016
Mitchell Institute
2004-2016
Northwestern University
2000-2003
Beihang University
1998-2002
Louisiana State University
1999-2000
Institute of Computing Technology
2000
Abstract Solving large-scale inverse problems using deep-learning algorithms has become an essential part of modern research and industrial applications. The complexity of the underlying inverse problem may require the use of high-performance computing systems, which poses a challenge for the algorithmic design of the solver. Most deep-learning approaches require, due to their design, custom parallelization techniques in order to be resource efficient while showing reasonable convergence. In this paper we introduce...
Performance is an important issue with any application, especially grid applications. Efficient execution of applications requires insight into how the system features impact the performance of the application. This insight generally results from significant experimental analysis and possibly the development of performance models. This paper presents the Prophesy system, which supports novel component model development. In particular, it discusses the use of our coupling parameter (i.e., a metric that attempts to quantify the interaction between the kernels that compose...
Energy consumption is a major concern with high-performance multicore systems. In this paper, we explore the energy and performance (execution time) characteristics of different parallel implementations of scientific applications. In particular, our experiments focus on message-passing interface (MPI)-only versus hybrid MPI/OpenMP implementations of the NAS (NASA Advanced Supercomputing) BT (Block Tridiagonal) benchmark (strong scaling), a Lattice Boltzmann application, and the Gyrokinetic Toroidal Code — GTC (weak scaling), as well as central...
Energy-efficient scientific applications require insight into how high-performance computing system features impact the applications' power and performance. This insight results from the development of performance models. When used with an earthquake simulation and an aerospace application, the proposed modeling framework reduces energy consumption by up to 48.65 percent and 30.67 percent, respectively.
ABSTRACT As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints has become critical and challenging. We propose a low-overhead autotuning framework to autotune performance and energy for various hybrid MPI/OpenMP applications at large scales and to explore the tradeoffs between application runtime and power/energy-efficient execution; we then use this framework to autotune four ECP proxy applications: XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with...
Performance projections of high-performance computing (HPC) applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid in the design of future systems, enable vendors to compare application performance across different existing systems, and help users with system procurement and refinements. In this paper, we present a method for projecting node-level performance using published data from the industry-standard SPEC CFP2006 benchmarks and hardware counter data from one base machine. In particular, we project eight applications onto four systems utilizing...
Ytopt is a Python machine-learning-based autotuning software package developed within the ECP PROTEAS-TUNE project. ytopt adopts an asynchronous search framework that consists of sampling a small number of input parameter configurations and progressively fitting a surrogate model over the input-output space until exhausting the user-defined maximum number of evaluations or wall-clock time. libEnsemble is a toolkit for coordinating workflows of dynamic ensembles of calculations across massively parallel resources within the PETSc/TAO...
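The sample-then-refine loop described in the abstract above can be illustrated with a toy sketch. This is not the real ytopt API (which uses Bayesian optimization over a proper surrogate such as a random forest); the `autotune` function, the 1-nearest-neighbour surrogate, and all parameter names here are simplifications invented for illustration.

```python
import random

def autotune(objective, space, max_evals=20, n_init=5, seed=0):
    """Toy surrogate-guided search in the spirit of ytopt's loop:
    sample a few configurations, fit a crude surrogate over the
    input-output history, and let the surrogate pick what to
    evaluate next until the evaluation budget is exhausted."""
    rng = random.Random(seed)
    history = []  # list of (config, measured_runtime) pairs

    def sample():
        return tuple(rng.choice(vals) for vals in space)

    def surrogate(cfg):
        # 1-nearest-neighbour stand-in for a real surrogate model
        dist = lambda a, b: sum(x != y for x, y in zip(a, b))
        return min(history, key=lambda h: dist(h[0], cfg))[1]

    for _ in range(n_init):                 # initial random sampling
        c = sample()
        history.append((c, objective(c)))
    while len(history) < max_evals:         # surrogate-guided refinement
        cands = [sample() for _ in range(32)]
        best = min(cands, key=surrogate)    # most promising candidate
        history.append((best, objective(best)))
    return min(history, key=lambda h: h[1]), history
```

A user would pass an `objective` that compiles and times the code under a given configuration; here any function over the tuple of parameter values works.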
The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used for data sharing within the multicores that comprise a node and MPI for communication between nodes. In this paper, we use the SP and BT benchmarks of NPB 3.3 as the basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we compare the performance of the hybrid versions with their MPI counterparts on large-scale...
Performance models provide significant insight into the performance relationships between an application and the system used for execution. A major obstacle to developing such models is the lack of knowledge about the interactions between the different functions that compose an application. This paper addresses this issue by using a coupling parameter, which quantifies the interaction between kernels, to develop performance predictions. The results, for three NAS parallel benchmarks, indicate that predictions using the coupling parameter were greatly improved over the traditional technique of summing execution...
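One plausible reading of the coupling-parameter idea, sketched below under stated assumptions: rather than predicting a composed application's runtime as a plain sum of its kernels' standalone times, each kernel's contribution is weighted by a coefficient capturing how it interacts with the kernel before it (below 1.0 for constructive sharing, e.g. warmed caches; above 1.0 for destructive interference). The function name, the pairwise-coefficient representation, and the numbers are illustrative, not the paper's actual formulation.

```python
def predict_runtime(kernel_times, coupling):
    """Coupling-aware runtime prediction (illustrative sketch).

    kernel_times : standalone execution time of each kernel, in order
    coupling     : {(i, j): c_ij} weight applied to kernel j when it
                   runs after kernel i; c_ij == 1.0 reduces this to
                   the traditional sum of standalone times
    """
    total = kernel_times[0]
    for i in range(1, len(kernel_times)):
        total += coupling[(i - 1, i)] * kernel_times[i]
    return total
```

With all coefficients at 1.0 the prediction degenerates to the naive sum, which is exactly the baseline the abstract says the coupling parameter improves upon.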
Understanding workload behavior plays an important role in performance studies. The growing complexity of applications and architectures has increased the gap among application developers, performance engineers, and hardware designers. To reduce this gap, we propose SKOPE, a SKeleton framework for Performance Exploration, which produces a descriptive model of a workload's semantics; the model can infer potential transformations and help users understand how workloads may interact with and adapt to emerging hardware. SKOPE...
Chip multiprocessors (CMPs) are widely used for high-performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system. A major challenge to be addressed is the efficient use of such systems for large-scale scientific applications. In this paper, we quantify the performance gap resulting from using different numbers of processors per node; this information provides a baseline for the amount of optimization needed when using all processors per node on CMP clusters. We conduct a detailed analysis to identify how...
Training scientific deep learning models requires the significant compute power of high-performance computing systems. In this paper, we analyze the performance characteristics of benchmarks from the exploratory research project CANDLE (Cancer Distributed Learning Environment) with a focus on the hyperparameters epochs, batch sizes, and learning rates. We present a parallel methodology that uses the distributed deep learning framework Horovod to parallelize the benchmarks. We then use scaling strategies for both epochs and batch size with a linear learning rate...
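The linear learning-rate scaling the abstract above alludes to is a widely used rule in distributed training: when the global batch size grows by a factor of k (e.g., because Horovod adds k workers), the learning rate is multiplied by k, usually with a warmup to avoid early divergence. The sketch below is a generic illustration of that rule, not code from the paper; the function and parameter names are made up.

```python
def scaled_lr(base_lr, base_batch, batch, step, warmup_steps=5):
    """Linear scaling rule with linear warmup (illustrative).

    The target rate grows proportionally with the global batch size;
    during the first `warmup_steps` steps the rate ramps linearly
    from base_lr up to the target to keep early training stable.
    """
    target = base_lr * (batch / base_batch)
    if step < warmup_steps:
        return base_lr + (target - base_lr) * step / warmup_steps
    return target
```

For example, going from a batch of 32 on one worker to a global batch of 256 across eight workers scales a base rate of 0.1 up to 0.8 once warmup completes.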
Abstract We develop the ytopt autotuning framework, which leverages Bayesian optimization to explore the parameter search space, and compare four different supervised learning methods within it to evaluate their effectiveness. We select six of the most complex PolyBench benchmarks and apply the newly developed LLVM Clang/Polly loop optimization pragmas to optimize them. We then use the framework to tune the pragma parameters to improve performance. The experimental results show that our approach outperforms the other compiling methods, providing the smallest execution time for syr2k,...
Efficient execution of applications requires insight into how the system features impact the performance of the application. For distributed systems, the task of gaining this insight is complicated by the complexity of the system features. This insight generally results from significant experimental analysis and possibly the development of performance models. This paper presents the Prophesy project, an infrastructure that aids in gaining the needed insight based upon experience. The core component is a relational database that allows for the recording of performance data, system features, and application details.
Journal Article: Performance Characteristics of Hybrid MPI/OpenMP Implementations of NAS Parallel Benchmarks SP and BT on Large-Scale Multicore Clusters. Xingfu Wu (Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA; corresponding author: wuxf@cse.tamu.edu) and Valerie Taylor. The Computer Journal, Volume 55, Issue 2, February 2012, Pages 154–167.
Hardware performance counters are used as effective proxies to estimate power consumption and runtime. In this paper we present a counter-based modeling and optimization method and use it to model four metrics: runtime, system power, CPU power, and memory power. The counters that compose the models guide counter-based optimizations, which we explore with two large-scale scientific applications: an earthquake simulation and an aerospace application. We demonstrate the method on power-aware supercomputers, including Mira at Argonne National...
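Counter-based power modeling of the kind described above typically boils down to regressing measured power against counter rates. The minimal sketch below fits a one-counter ordinary-least-squares model; the real models in the paper combine several counters, and the counter choice, numbers, and function names here are all hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for power ~ a * counter_rate + b.
    A single-feature stand-in for multi-counter regression models."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx  # slope, intercept

# Synthetic training samples: a counter rate (e.g., last-level-cache
# misses per second) paired with measured system power in watts.
rates = [1e6, 2e6, 3e6, 4e6]
watts = [50.0, 60.0, 70.0, 80.0]
a, b = fit_linear(rates, watts)
predict = lambda r: a * r + b   # estimate power for a new counter rate
```

Once fitted, such a model lets the counters that carry the most weight point at optimization targets (e.g., reducing cache misses to cut memory power), which is the spirit of the counter-guided optimizations the abstract mentions.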
Efficiently utilizing procured power and optimizing the performance of scientific applications under power and energy constraints are challenging. The HPC PowerStack defines a software stack for managing power on high-performance computing systems and standardizes the interfaces between the different components of the stack. This survey paper presents the findings of a working group focused on the end-to-end tuning of the PowerStack. First, we provide background on layer-specific efforts in terms of their high-level objectives, optimization goals,...