- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Advanced Malware Detection Techniques
- Distributed and Parallel Computing Systems
- Security and Verification in Computing
- Seismic Imaging and Inversion Techniques
- Adversarial Robustness in Machine Learning
- Software Engineering Research
- Authorship Attribution and Profiling
- Anomaly Detection Techniques and Applications
- Cloud Data Security Solutions
- Scientific Computing and Data Management
- Distributed Systems and Fault Tolerance
- Radiation Effects in Electronics
- Hydraulic Fracturing and Reservoir Analysis
- Hydrocarbon Exploration and Reservoir Analysis
- Software Testing and Debugging Techniques
- Manufacturing Process and Optimization
- Ferroelectric and Negative Capacitance Devices
- Open Source Software Innovations
- Software System Performance and Reliability
- Simulation and Modeling Applications
- Psychopathy, Forensic Psychiatry, Sexual Offending
- Diamond and Carbon-based Materials Research
- Algorithms and Data Compression
- China Aerospace Science and Industry Corporation (China), 2024
- Rice University, 2020-2022
- Huazhong University of Science and Technology, 2018
- University of Wisconsin–Madison, 2013-2017
- Hess (United States), 2014
Binary code analysis is an enabling technique for many applications. Modern compilers and run-time libraries have introduced significant complexities to binary code, which negatively affect the capabilities of binary analysis tool kits: the tools may report inaccurate information about the code. Analysts may hence be confused, and applications built on these tool kits may suffer degraded quality. We examine the problem of constructing control flow graphs from binary code and labeling them with accurate function boundary annotations. We identified several...
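As an illustration of one step in this pipeline, the sketch below splits a linear instruction listing into basic blocks at "leaders" (branch targets and fall-through points). The instruction encoding is hypothetical, not that of any particular tool kit.

```python
# Minimal sketch of basic-block splitting, one step of CFG construction.
# Each instruction is a hypothetical (addr, is_branch, target) triple.
def split_into_blocks(instructions):
    leaders = {instructions[0][0]}
    for i, (addr, is_branch, target) in enumerate(instructions):
        if is_branch:
            if target is not None:
                leaders.add(target)                   # branch target starts a block
            if i + 1 < len(instructions):
                leaders.add(instructions[i + 1][0])   # fall-through starts a block
    blocks, current = [], []
    for addr, _is_branch, _target in instructions:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(addr)
    if current:
        blocks.append(current)
    return blocks

# Example: a conditional branch at 0x2 targeting 0x6 yields three blocks.
insns = [(0x0, False, None), (0x2, True, 0x6), (0x4, False, None), (0x6, False, None)]
print(split_into_blocks(insns))  # [[0, 2], [4], [6]]
```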
As a crucial task in heterogeneous distributed systems, DAG scheduling models a scheduling application with a set of tasks represented by a Directed Acyclic Graph (DAG). The goal is to assign the tasks to different processors so that the whole application can finish as soon as possible. The Task Duplication-Based (TDB) scheme is an important technique for addressing this problem. Its main idea is to duplicate tasks on multiple machines so that the results of the duplicated tasks are available on more machines, trading computation time for communication time. Existing TDB algorithms enumerate and test all...
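The sketch below illustrates the duplication trade-off under invented costs: a predecessor is recomputed locally when that is cheaper than waiting for its result to arrive over the network. It is a toy model, not any published TDB algorithm.

```python
# Hedged sketch of task duplication: if transferring a predecessor's
# result costs more than recomputing it locally, duplicate the task.
# All costs and data structures here are invented for illustration.
def earliest_start(task, proc, finish, compute, comm, preds, duplicated):
    est = 0
    for p in preds[task]:
        # Option 1: wait for p's result from the processor it ran on.
        arrival = finish[p][0] + (comm[(p, task)] if finish[p][1] != proc else 0)
        # Option 2: duplicate p on `proc` (assumes p's own inputs are local).
        if compute[p] < arrival:
            duplicated.add((p, proc))
            arrival = compute[p]
        est = max(est, arrival)
    return est

preds = {"a": [], "b": ["a"]}
compute = {"a": 2, "b": 3}
comm = {("a", "b"): 10}        # expensive transfer
finish = {"a": (2, 0)}         # task a finished at t=2 on processor 0
dups = set()
print(earliest_start("b", 1, finish, compute, comm, preds, dups))  # 2
print(dups)                    # {('a', 1)}: a is duplicated on processor 1
```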
Code authorship information is important for analyzing software quality, performing software forensics, and improving software maintenance. However, current tools assume that the last developer to change a line of code is its author, regardless of all earlier changes. This approximation loses information. We present two new line-level authorship models to overcome this limitation. We first define the repository graph as an abstraction of a code repository, in which nodes are commits and edges represent development dependencies. Then, for each line of code,...
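A minimal sketch of the repository-graph abstraction follows, with a deliberately simple attribution rule (credit split equally across every commit that touched a line) standing in for the paper's models.

```python
# Sketch of a repository graph: nodes are commits, edges point to the
# commits a commit depends on. The attribution rule below is a toy
# stand-in, not the paper's line-level models.
import collections

class RepositoryGraph:
    def __init__(self):
        self.parents = {}                              # commit -> parent commits
        self.touched = collections.defaultdict(list)   # line id -> commits changing it

    def add_commit(self, commit, parents, changed_lines):
        self.parents[commit] = parents
        for line in changed_lines:
            self.touched[line].append(commit)

    def line_authorship(self, line, author_of):
        """Distribute authorship weight equally over every commit to the line."""
        commits = self.touched[line]
        weights = collections.Counter()
        for c in commits:
            weights[author_of[c]] += 1 / len(commits)
        return dict(weights)

g = RepositoryGraph()
g.add_commit("c1", [], ["f.c:10"])
g.add_commit("c2", ["c1"], ["f.c:10"])
print(g.line_authorship("f.c:10", {"c1": "alice", "c2": "bob"}))
# {'alice': 0.5, 'bob': 0.5} -- unlike last-change-wins, alice keeps credit
```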
Binary code authorship identification is the task of determining the authors of a piece of binary code from a set of known authors. Modern software often contains contributions from multiple authors. However, existing techniques assume that each program is written by a single author. We present a new finer-grained technique for the tougher problem of determining the author of each basic block. Our evaluation shows that our technique can discriminate the authorship of basic blocks with 52% accuracy among 282 authors, as opposed to 0.4% by random guess, and that it provides a practical solution for identifying multiple authors in software.
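For intuition, here is a hedged sketch of block-level attribution using instruction-mnemonic frequencies and a nearest-centroid rule; the actual technique uses richer features and a trained classifier.

```python
# Toy block-level authorship identification: mnemonic-frequency features
# plus nearest-centroid classification. Illustrative stand-in only.
import collections, math

def featurize(block):
    counts = collections.Counter(block)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}

def distance(f, g):
    keys = set(f) | set(g)
    return math.sqrt(sum((f.get(k, 0) - g.get(k, 0)) ** 2 for k in keys))

def predict(block, centroids):
    f = featurize(block)
    return min(centroids, key=lambda a: distance(f, centroids[a]))

centroids = {
    "author1": featurize(["mov", "mov", "add", "jmp"]),
    "author2": featurize(["push", "call", "pop", "ret"]),
}
print(predict(["mov", "add", "add", "jmp"], centroids))  # author1
```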
GPGPUs are widely used in high-performance computing systems to accelerate scientific and machine learning workloads. Developing efficient GPU kernels is critically important for obtaining "bare-metal" performance on GPU-based clusters. In this paper, we describe the design and implementation of GVPROF, the first value profiler that pinpoints value-related inefficiencies in applications running on NVIDIA GPUs. The novelty of GVPROF resides in its ability to detect temporal and spatial value redundancies, which provides useful...
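The two redundancy patterns can be illustrated on a made-up trace of (address, value) accesses; the detectors below are simplifications of what a value profiler like GVPROF does at full machine-code granularity.

```python
# Temporal redundancy: a location is accessed with the value it already
# holds. Spatial redundancy: adjacent locations hold identical values.
def find_temporal_redundancy(trace):
    last, redundant = {}, []
    for i, (addr, value) in enumerate(trace):
        if last.get(addr) == value:
            redundant.append(i)        # access i re-reads/re-writes the same value
        last[addr] = value
    return redundant

def find_spatial_redundancy(memory):
    # Report addresses whose next neighbor holds the same value.
    return [a for a in sorted(memory) if memory.get(a + 1) == memory[a]]

trace = [(0x10, 7), (0x10, 7), (0x14, 3)]
print(find_temporal_redundancy(trace))               # [1]
print(find_spatial_redundancy({0: 5, 1: 5, 2: 9}))   # [0]
```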
Coverage-guided fuzzing is one of the most effective solutions for vulnerability discovery. Among coverage-guided fuzzers, full-speed fuzzers such as UnTracer trace test cases only when they discover new coverage. Because tracing test cases is expensive, these fuzzers improve efficiency by tracing only coverage-increasing cases. However, the existing full-speed fuzzer (i.e., UnTracer) is based on basic block coverage and suffers from a severe problem called edge collision. Moreover, it neglects path frequency, which affects fuzzing effectiveness. In this...
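The collision problem can be demonstrated with AFL-style edge hashing into a deliberately tiny coverage map: two distinct control flow edges land in the same slot, so the second one never looks new and its test case is never traced.

```python
# AFL-style edge coverage hashes (prev_block, cur_block) into a small
# bitmap; distinct edges can collide. Sizes here are toy values.
MAP_SIZE = 8  # deliberately tiny to force a collision

def edge_slot(prev_id, cur_id):
    # AFL's classic scheme: cur ^ (prev >> 1), truncated to the map size.
    return (cur_id ^ (prev_id >> 1)) % MAP_SIZE

seen = set()
edges = [(2, 5), (6, 4), (10, 1)]   # three distinct control flow edges
for prev, cur in edges:
    slot = edge_slot(prev, cur)
    status = "new" if slot not in seen else "COLLIDES with an earlier edge"
    seen.add(slot)
    print(f"edge {prev}->{cur} -> slot {slot}: {status}")
```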
Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained tuning advice at the kernel level, if any. In this paper, we describe GPA, a performance advisor for NVIDIA GPUs that suggests potential code optimizations at a hierarchy of levels, including individual lines, loops, and functions. To relieve users of the burden of interpreting performance counters and analyzing bottlenecks, GPA uses data flow analysis to approximately attribute...
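As a toy illustration of blame attribution, the sketch below charges a memory-dependency stall to the instruction that defined the awaited register, found through a hand-written def-use map; GPA's real data flow analysis is far more involved.

```python
# Attribute stall samples to causing instructions via a (toy) def-use map.
def attribute_stalls(samples, def_site):
    """samples: list of (pc, stall_kind, waited_register)."""
    blame = {}
    for pc, kind, reg in samples:
        cause = def_site.get((pc, reg), pc)   # fall back to the sampled pc
        blame[cause] = blame.get(cause, 0) + 1
    return blame

# The load at line 10 defines r3; the use at line 12 stalls on it.
def_site = {(12, "r3"): 10}
samples = [(12, "memory_dependency", "r3"), (12, "memory_dependency", "r3")]
print(attribute_stalls(samples, def_site))  # {10: 2} -- blame the load, not the use
```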
General-purpose GPUs have become common in modern computing systems to accelerate applications in many domains, including machine learning, high-performance computing, and autonomous driving. However, inefficiencies abound in GPU-accelerated applications, preventing them from obtaining bare-metal performance. Performance tools play an important role in understanding the performance of complex code bases. Many GPU performance tools pinpoint time-consuming code and provide high-level insights, but overlook one issue---value-related...
Binary code authorship identification determines the authors of a binary program. Existing techniques have used supervised machine learning for this task. In this paper, we look at the problem from an attacker's perspective. We aim to modify a test binary such that it not only causes misprediction but also maintains the functionality of the original input binary. Attacks against binaries are intrinsically more difficult than attacks in domains such as computer vision, where attackers can change each pixel of an image independently and...
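One family of functionality-preserving perturbations is inserting no-effect instruction sequences, which shifts the n-gram features a classifier sees without changing program behavior. The assembly and the feature extractor below are illustrative stand-ins, not the paper's attack.

```python
# Toy functionality-preserving perturbation: insert semantic no-ops to
# change the n-gram features extracted from an instruction listing.
SEMANTIC_NOPS = [["xchg eax, eax"], ["push eax", "pop eax"]]

def ngrams(insns, n=2):
    return [tuple(insns[i:i + n]) for i in range(len(insns) - n + 1)]

def perturb(insns, position, nop_idx):
    # Functionality is preserved: the inserted sequence has no net effect.
    return insns[:position] + SEMANTIC_NOPS[nop_idx] + insns[position:]

original = ["mov eax, 1", "add eax, 2", "ret"]
modified = perturb(original, 1, 1)
print(ngrams(original))  # features the classifier was trained on
print(ngrams(modified))  # shifted features, same program behavior
```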
While implementation strategies for low-order stencils on GPUs have been well-studied in the literature, not all of these techniques work well for high-order stencils, such as those used in seismic imaging. In this paper, we study practical seismic imaging computations on GPUs using large domains with meaningful boundary conditions. We manually crafted a collection of implementations of a 25-point seismic modeling stencil in CUDA, along with code to apply the boundary conditions, and evaluated our implementations with respect to their shapes, memory hierarchy usage, data-fetching patterns, and other...
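For reference, a 25-point star stencil is 8th-order in space in 3D: eight neighbors per axis plus the center. The NumPy sketch below only shows the access pattern that the CUDA implementations optimize; the coefficients are placeholders, not a validated operator.

```python
# Access pattern of a 25-point (8th-order, 3D star) stencil in NumPy.
import numpy as np

R = 4                                                # stencil radius per axis
coef = np.array([0.5, 0.1, 0.05, 0.025, 0.0125])     # placeholder c0..c4

def stencil_25pt(u):
    n0, n1, n2 = u.shape
    out = coef[0] * u[R:-R, R:-R, R:-R]              # center point
    for r in range(1, R + 1):                        # +/- r along each axis
        out = out + coef[r] * (
            u[R - r:n0 - R - r, R:-R, R:-R] + u[R + r:n0 - R + r, R:-R, R:-R]
            + u[R:-R, R - r:n1 - R - r, R:-R] + u[R:-R, R + r:n1 - R + r, R:-R]
            + u[R:-R, R:-R, R - r:n2 - R - r] + u[R:-R, R:-R, R + r:n2 - R + r]
        )
    return out

u = np.random.rand(16, 16, 16)
print(stencil_25pt(u).shape)  # (8, 8, 8): interior points only
```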
The US Department of Energy's fastest supercomputers and forthcoming exascale systems employ Graphics Processing Units (GPUs) to increase the computational performance of compute nodes. However, the complexity of GPU architectures makes tailoring sophisticated applications to achieve high performance on GPU-accelerated systems a major challenge. At best, prior performance tools for GPU code only provide coarse-grained tuning advice at the kernel level. In this article, we describe GPA, a performance advisor that suggests potential code optimizations at a hierarchy...
Summary: Finite-difference methods based on high-order stencils are commonly used for modeling seismic wave propagation, weather forecasting, computational fluid dynamics, convolutional neural networks, and others. Nowadays, the community employs graphics processing units (GPUs) to accelerate such stencil computations. As a result, knowing how to write efficient stencil computations for GPUs is of significant interest. While high-performance implementations of low-order stencils have been studied extensively in the literature, not all...
Binary rewriting has been widely used in software security, correctness assessment, performance analysis, and debugging. One approach to binary rewriting lifts the binary to an IR and then regenerates a new binary, which achieves near-to-zero runtime overhead but relies on several limiting assumptions about binaries to achieve complete analysis and perform the lifting. Another approach patches individual instructions without utilizing any analysis; it has great reliability, as it makes no assumptions about the binary, but it incurs prohibitive runtime overhead.
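The patching approach can be illustrated with a toy interpreter over symbolic instructions: the displaced instruction is replaced by a jump into a patch area that runs the instrumentation, re-executes the displaced instruction, and jumps back.

```python
# Toy trampoline-style patching over symbolic "instructions", not real bytes.
def rewrite(code, target_idx, instrumentation):
    patch_area = len(code)                       # patches appended at the end
    patched = list(code)
    displaced = patched[target_idx]
    patched[target_idx] = ("jmp", patch_area)    # trampoline into the patch
    patched += instrumentation + [displaced, ("jmp", target_idx + 1)]
    return patched

def run(code):
    pc, out = 0, []
    while code[pc] != "ret":                     # toy programs end in "ret"
        op = code[pc]
        if isinstance(op, tuple) and op[0] == "jmp":
            pc = op[1]
            continue
        out.append(op)
        pc += 1
    out.append("ret")
    return out

code = ["insn_a", "insn_b", "ret"]
patched = rewrite(code, 1, ["count_hit"])
print(run(patched))  # ['insn_a', 'count_hit', 'insn_b', 'ret']
```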
Binary code analysis is widely used to help assess a program's correctness, performance, and provenance. Binary analysis applications often construct control flow graphs, analyze data flow, and use debugging information to understand how machine code relates to source lines, inlined functions, and data types. To date, binary analysis has been single-threaded, which is too slow for convenient use in performance tuning workflows that attribute complex measurements to large binaries.
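Since functions are largely independent analysis units, one natural parallelization is a worker pool over functions, sketched below with a stand-in for the expensive per-function CFG construction.

```python
# Parallel binary analysis at function granularity (toy stand-in kernel).
from concurrent.futures import ThreadPoolExecutor

def build_cfg(function):
    # Placeholder for parsing instructions and building basic blocks;
    # in practice this is the expensive, parallelizable step.
    name, num_insns = function
    return name, f"cfg({num_insns} instructions)"

functions = [("main", 120), ("parse", 3400), ("emit", 800)]

with ThreadPoolExecutor(max_workers=4) as pool:
    cfgs = dict(pool.map(build_cfg, functions))
print(cfgs)
```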
As we near the end of Moore's law scaling, next-generation computing platforms are increasingly exploring heterogeneous processors for acceleration. Graphics Processing Units (GPUs) are the most widely used accelerators. Meanwhile, applications are evolving, adopting new programming models and algorithms for emerging platforms. To harness the full power of GPUs, performance tools serve a critical role in understanding and tuning application performance, especially for applications that involve complex executions spanning both...
A trusted execution environment (TEE) such as Intel Software Guard Extensions (SGX) runs a remote attestation to prove to a data owner the integrity of the initial state of an enclave, including the program that will operate on her data. For this purpose, the data-processing program is supposed to be open to the owner, so that its functionality can be evaluated before trust is established. However, there are increasingly many application scenarios in which the program itself needs to be protected, so its compliance with the privacy policies expected by the data owner should be verified without...
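At the heart of remote attestation is a measurement check: the owner compares a hash of the enclave's initial state, reported in a signed quote, with the value expected for the program she audited. The sketch below stubs out the signature verification, which in real SGX chains back to the hardware vendor.

```python
# Toy measurement check at the core of remote attestation.
import hashlib

def measure(enclave_pages: bytes) -> str:
    return hashlib.sha256(enclave_pages).hexdigest()

def verify_quote(quote, expected_measurement, signature_ok) -> bool:
    # `signature_ok` stands in for verifying the quote's signature chain.
    return signature_ok and quote["measurement"] == expected_measurement

program = b"enclave code + initial data"
expected = measure(program)                 # owner computes this from audited code
quote = {"measurement": measure(program)}   # reported by the remote enclave
print(verify_quote(quote, expected, signature_ok=True))  # True -> safe to send data
```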
The static instrumentation of machine code, also known as binary rewriting, is a powerful technique, but it suffers from high runtime overhead compared to compiler-level instrumentation. Recent research has shown that rewriting tools can achieve near-to-zero overhead when rewriting binaries (excluding the application-specific instrumentation). However, users often have difficulties in understanding why their rewritten binaries are slow and how to optimize them. We are inspired by the traditional program optimization workflow, where one profiles...
MapReduce is a famous programming model for processing large data sets on clusters of computers. However, for pre-stack depth migration (PreSDM), a classic imaging method in the geophysical domain, it shows great inadaptation due to the computational characteristics of the problem. In this paper, an improved MapReduce model especially designed for Kirchhoff PreSDM is introduced, and a data-aware scheduling policy is considered for maximized utilization of the I/O capacities of a GPU grid. The improved framework was implemented and tested on actual examples, and the results...
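The MapReduce decomposition for this workload is naturally shot-oriented: map each shot gather to a partial image, then reduce by summing partial images into the final one. The sketch below uses a toy migration kernel and a flat list as the image grid.

```python
# Shot-oriented MapReduce sketch for Kirchhoff-style migration.
from functools import reduce

def migrate_shot(shot):
    # Toy stand-in: a real kernel smears traveltime-weighted amplitudes
    # from the shot gather into the image grid.
    return [shot * 0.1, shot * 0.2]

def sum_images(a, b):
    return [x + y for x, y in zip(a, b)]

shots = [1.0, 2.0, 3.0]
partial_images = map(migrate_shot, shots)          # map phase, trivially parallel
final_image = reduce(sum_images, partial_images)   # reduce phase
print(final_image)                                 # approximately [0.6, 1.2]
```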
Graphics Processing Units (GPUs) have become a key technology for accelerating node performance in supercomputers, including the US Department of Energy's forthcoming exascale systems. Since the execution model of GPUs differs from that of conventional processors, applications need to be rewritten to exploit GPU parallelism. Performance tools for such GPU-accelerated systems are needed to help developers assess how well they offload computation onto GPUs. In this paper, we describe extensions to Rice University's...
P304 Converted-Wave Velocity Analysis in the Presence of Anisotropy – A Case Study from Shengli Oilfield, China. Z. Qian* (British Geological Survey), X. Li, Meng (SinoPec Oilfield) & L. Bi. Summary: The Ken-71 multi-component seismic data were acquired with digital MEMS (micro-electromechanical system) sensors over a mixed sand and shale overburden sequence. This gives rise to serious non-hyperbolic moveout effects in the converted-wave data, due both to the asymmetry of the raypath and to anisotropic effects. Conventional...