- Parallel Computing and Optimization Techniques
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Graph Theory and Algorithms
- Wireless Communication Security Techniques
- Cooperative Communication and Network Coding
- Chemical Synthesis and Analysis
- Low-power high-performance VLSI design
- Advanced Malware Detection Techniques
- Network Packet Processing and Optimization
- Algorithms and Data Compression
- Ferroelectric and Negative Capacitance Devices
- Interconnection Networks and Systems
- Numerical Methods and Algorithms
- Innovation and Knowledge Management
- Urban Transport and Accessibility
- Advanced Wireless Communication Technologies
- Access Control and Trust
- Software System Performance and Reliability
- Advanced Energy Technologies and Civil Engineering Innovations
- Advanced Image Processing Techniques
- Energy Efficient Wireless Sensor Networks
- Computer Graphics and Visualization Techniques
- Supramolecular Chemistry and Complexes
National University of Defense Technology
2009-2023
North China Electric Power University
2017
China Electric Power Research Institute
2017
Beijing University of Posts and Telecommunications
2016-2017
Delft University of Technology
2012-2015
Yangzhou University
2006-2007
OpenCL and OpenMP are the most commonly used programming models for multi-core processors. They are also fundamentally different in their approach to parallelization. In this paper, we focus on comparing the performance of OpenCL and OpenMP. We select three applications from the Rodinia benchmark suite (which provides equivalent OpenMP and OpenCL implementations), and carry out experiments with different datasets on multi-core platforms. We see that incorrect usage of multi-core CPUs, inherent fine-grained parallelism, and immature OpenCL compilers are the main reasons that lead to OpenCL's poorer performance....
Heterogeneous platforms composed of multi-core CPUs and different types of accelerators, like GPUs and Xeon Phi, are becoming popular for data parallel applications. The heterogeneity of the hardware mix and the diversity of the applications pose significant challenges to exploiting such platforms. In this situation, an effective workload partitioning between processing units is critically important for improving application performance. This partitioning is a function of the hardware capabilities as well as the application and the dataset to be used. In this work, we present a systematic...
With its design concept of cross-platform portability, OpenCL can be used not only on GPUs (for which it is quite popular), but also on CPUs. Whether porting GPU programs to CPUs, or simply writing new code for CPUs, using OpenCL brings up the performance issue, usually raised in one of two forms: "OpenCL is not performance portable!" or "Why use OpenCL for CPUs after all?!". We argue that both issues can be addressed by a thorough study of the factors that impact the performance of OpenCL on CPUs. This analysis is the focus of this paper. Specifically, starting from the main architectural mismatches between...
Heterogeneous platforms integrating different processors like GPUs and multi-core CPUs have become popular in high performance computing. While most applications are currently using the homogeneous parts of these platforms, we argue that there is a large class of applications that can benefit from their heterogeneity: massively parallel imbalanced applications. Such applications emerge, for example, from variable time step based numerical methods and simulations. In this paper, we present Glinda, a framework for accelerating imbalanced applications on heterogeneous...
Although GPUs are considered ideal to accelerate massively data-parallel applications, there are still exceptions to this rule. For example, imbalanced applications cannot be efficiently processed by GPUs: despite the massive data parallelism, a varied computational workload per data point remains GPU-unfriendly. To efficiently process imbalanced applications, we exploit the use of heterogeneous platforms (GPUs and CPUs) by partitioning the workload to fit the usage patterns of the processors. In this work, we present our flexible and adaptive method that predicts the optimal...
GPUs are widely used to accelerate data-parallel applications. However, while the GPU processing capability is enhanced in each generation, the CPU computing power is also increased by adding more cores and widening vector units. Compared to the rapid development of GPUs and CPUs, the bandwidth of data transfer between the host and the GPU grows much slower, resulting in a data-transfer wall for using GPUs. In this situation, choosing the right mix of hardware resources - i.e., the right hardware configuration - is critically important for improving application...
This article presents the Graph Algorithm Repository for Designing Next-generation Accelerators (GARDENIA), a benchmark suite for studying irregular graph algorithms on massively parallel accelerators. Applications with limited control and data irregularity are the main focus of existing generic benchmarks for accelerators, while available graph processing benchmarks do not apply state-of-the-art algorithms and/or optimization techniques. GARDENIA includes emerging workloads from graph analytics, sparse linear algebra, and machine-learning...
Recent parallel architectures are equipped with local memory, which simplifies hardware design at the cost of increased program complexity due to explicit memory management. To reduce this extra burden on programmers, we introduce an easy-to-use API, ELMO, which improves productivity while preserving the high performance of local memory operations. Specifically, ELMO is a generic API that covers different local memory use-cases. We also present prototype implementations for these APIs and perform multiple GPU-inspired...
Real-time stereo matching, which is important in many applications like self-driving cars and 3-D scene reconstruction, requires large computation capability and high memory bandwidth. The most time-consuming part of stereo-matching algorithms is the aggregation of information (i.e. costs) over local image regions. In this paper, we present a generic representation and suitable implementations for three commonly used cost aggregators on many-core processors. We perform typical optimizations on the kernels, which leads...
Detecting strongly connected components (SCC) has been broadly used in many real-world applications. To speed up SCC detection for large-scale graphs, parallel algorithms have been proposed to leverage modern GPUs. Existing GPU implementations are able to get speedup on synthetic graph instances, but show limited performance when applied to real-world datasets. In this paper, we present a parallel SCC detection implementation on GPUs that achieves high performance on both synthetic and real-world graphs. We use a hybrid method that divides the algorithm into two phases. Our method is dynamically...
Heterogeneous platforms are mixes of different processing units. The key factor to their efficient usage is workload partitioning. Both static and dynamic partitioning strategies have been defined in previous work, but their applicability and performance differ significantly depending on the application to execute. In this paper, we propose an application-driven method to select the best partitioning strategy for a given workload. To this end, we define an application classification based on the application kernel structure -- i.e., the number of kernels and their execution flow....
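The difference between static and dynamic workload partitioning mentioned above can be illustrated with a minimal Python sketch. It is not the method of the paper: the function names, the `throughputs` dictionary (items processed per time unit, per device), and the `chunk` size are all hypothetical, and the dynamic variant is only a simulation in which whichever device becomes idle first takes the next chunk.

```python
import heapq


def static_partition(n_items, throughputs):
    """Split n_items once, up front, in proportion to each device's throughput."""
    total = sum(throughputs.values())
    shares = {dev: int(n_items * t / total) for dev, t in throughputs.items()}
    # Integer truncation can leave a few items unassigned; give them to the fastest device.
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += n_items - sum(shares.values())
    return shares


def dynamic_partition(n_items, throughputs, chunk):
    """Simulate dynamic scheduling: the device that is free earliest takes the next chunk."""
    # Heap entries are (time at which the device becomes free, device name).
    ready = [(0.0, dev) for dev in sorted(throughputs)]
    heapq.heapify(ready)
    shares = {dev: 0 for dev in throughputs}
    remaining = n_items
    while remaining > 0:
        free_at, dev = heapq.heappop(ready)
        size = min(chunk, remaining)
        shares[dev] += size
        remaining -= size
        # The device is busy for size / throughput time units.
        heapq.heappush(ready, (free_at + size / throughputs[dev], dev))
    return shares


if __name__ == "__main__":
    devices = {"cpu": 1.0, "gpu": 3.0}  # hypothetical relative throughputs
    print(static_partition(1000, devices))
    print(dynamic_partition(1000, devices, chunk=100))
```

The static split needs a throughput estimate before execution, while the dynamic one adapts at run time at the cost of per-chunk scheduling; which one suits a workload is the kind of question the abstract says depends on the application.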
Accelerator-based platforms are heterogeneous in nature, yet most applications avoid heterogeneity, and focus on acceleration alone. Platform-level heterogeneity can bring significant performance improvement, as it essentially means using additional resources for the same computation. But is the performance gained by using these resources worth the effort to program and deploy heterogeneous applications? In this work, we present a taxonomy of existing programming models and tools available for heterogeneous computing with accelerators, and give examples of systems fitting...
Media access control (MAC) protocols of wireless sensor networks (WSNs) must minimize the radio energy costs in sensor nodes. Latency and throughput are also important design features for MAC protocols in current WSN applications. But most of them cannot guarantee the quality of real-time traffic. This paper studies the state of the art of MAC protocols, and then introduces a medium access control protocol that provides multiple priority levels. The channel is accessed by sensors according to their priorities. Sensors send frames in a round manner with the same...
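As a rough illustration of priority-based channel access, the toy scheduler below serves higher-priority sensors first and alternates between sensors that share a priority level. It is a sketch under stated assumptions, not the protocol from the paper: reading the truncated "round manner" as round-robin among equal priorities is an assumption, and the function name `schedule`, the data layout, and the convention that a larger number means higher priority are hypothetical.

```python
from collections import deque


def schedule(frames_by_sensor, priority):
    """Return the transmission order as (sensor, frame) pairs.

    Assumptions (hypothetical): a larger priority value is served first, and
    sensors with equal priority take turns, one frame per turn.
    """
    order = []
    for level in sorted(set(priority.values()), reverse=True):
        # Queue of (sensor, pending frames) for every sensor at this priority level.
        queue = deque(
            (sensor, deque(frames_by_sensor[sensor]))
            for sensor in sorted(frames_by_sensor)
            if priority[sensor] == level and frames_by_sensor[sensor]
        )
        while queue:
            sensor, frames = queue.popleft()
            order.append((sensor, frames.popleft()))
            if frames:
                # The sensor still has frames: it goes to the back of the queue.
                queue.append((sensor, frames))
    return order


if __name__ == "__main__":
    frames = {"a": ["a1", "a2"], "b": ["b1"], "c": ["c1", "c2"]}
    prio = {"a": 1, "b": 2, "c": 1}
    print(schedule(frames, prio))
```

A real MAC protocol would also have to handle contention, sleep schedules, and energy cost, none of which this sketch models.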
This paper introduces an extensible distributed file system framework, YaFS, using heterogeneous online storage services as its back-ends. It provides a configurable solution for simplifying the usage of multiple online storage resources and accessing data ubiquitously and safely. YaFS is POSIX compliant, so that it can support most existing applications seamlessly. An offline mode is used to cope with challenged and unreliable network environments. We implement an abstraction layer and a plug-in mechanism to uniformly...
This paper presents the Graph Analytics Repository for Designing Next-generation Accelerators (GARDENIA), a benchmark suite for studying irregular algorithms on massively parallel accelerators. Existing generic benchmarks for accelerators have mainly focused on high performance computing (HPC) applications with limited control and data irregularity, while available graph analytics benchmarks do not apply state-of-the-art algorithms and/or optimization techniques. GARDENIA includes emerging applications in big-data and machine learning...
Heterogeneous platforms are mixes of different processing units in a compute node (e.g., CPUs+GPUs, CPU+MICs) or a chip package (e.g., APUs). This type of platform keeps gaining popularity in various computer systems ranging from supercomputers to mobile devices. In this context, improving their efficiency and usability has become increasingly important. In this thesis, we develop systematic methods for a large variety of data parallel applications to efficiently utilize heterogeneous platforms. Specifically, (1) we evaluate the...
Traditional math libraries in high performance computing (HPC) are designed with accuracy as the first priority. With the development of modern hardware processors and the expansion of HPC application domains, it is highly desirable to develop fast, approximate function implementations for performance-hungry and error-tolerable applications. In this paper, we propose an acceleration method for trigonometric functions (sine and cosine) based on specialized instructions. We implement vector versions which utilize...
The ever-changing market information makes the traditional information collection and the way of using it unfit for enterprises' business requirements. A Knowledge mining Web intelligence (KB4WBI) platform is put forward in this paper, in which online knowledge acquisition and semantics management are realized. Since the information has evident time effectiveness and context-related characteristics, great emphasis is placed on the research of the sequence representation model and ontology evolution. Compared to current methods, it comprehensively considers...
Heterogeneous platforms integrating different types of processing units (such as multi-core CPUs and GPUs) are in high demand in high performance computing. Existing studies have shown that using heterogeneous platforms can improve application performance and hardware utilization. However, systematic methods to design, implement, and map applications to efficiently use heterogeneous computing resources are only very few. The goal of my PhD research is therefore to study such heterogeneous systems and propose systematic methods to allow many (classes of) applications to efficiently use them. After 3.5 years of study,...
The problem of the relay channel (RC) with noncausal channel state information (CSI) available at the relay is considered. With the CSI, the relay can help the communication in two ways: 1) by relaying the message information; 2) by conveying the CSI to help the destination decode. In previous work, Zaidi et al. established a lower bound by letting the relay send the message information and performing Gelfand-Pinsker (GP) coding. In our schemes, we combine the two ways and also convey compressed CSI to the receivers. We investigate three decode-forward (DF) lower bounds. The first two bounds are obtained by transmitting and...
Existing generic benchmarks for accelerators (e.g. Parboil and Rodinia) have focused on high performance computing (HPC) applications which have limited control flow and data irregularity. Previously available graph analytics benchmark suites include straightforwardly implemented workloads that do not employ up-to-date optimization techniques and thus have quite different behaviors from real-world applications. This paper first briefly presents and characterizes the Graph Analytics Repository for Designing Next-generation...