NFDI4DS | UHH-SEMS - Publication Details

Jie Shen

ORCID: 0000-0003-4247-7029

About

Contact & Profiles

A5101642102

Research Areas

Parallel Computing and Optimization Techniques
Cloud Computing and Resource Management
Advanced Data Storage Technologies
Distributed and Parallel Computing Systems
Graph Theory and Algorithms
Wireless Communication Security Techniques
Cooperative Communication and Network Coding
Chemical Synthesis and Analysis
Low-power high-performance VLSI design
Advanced Malware Detection Techniques
Network Packet Processing and Optimization
Algorithms and Data Compression
Ferroelectric and Negative Capacitance Devices
Interconnection Networks and Systems
Numerical Methods and Algorithms
Innovation and Knowledge Management
Urban Transport and Accessibility
Advanced Wireless Communication Technologies
Access Control and Trust
Software System Performance and Reliability
Advanced Energy Technologies and Civil Engineering Innovations
Advanced Image Processing Techniques
Energy Efficient Wireless Sensor Networks
Computer Graphics and Visualization Techniques
Supramolecular Chemistry and Complexes

National University of Defense Technology
2009-2023

North China Electric Power University
2017

China Electric Power Research Institute
2017

Beijing University of Posts and Telecommunications
2016-2017

Delft University of Technology
2012-2015

Yangzhou University
2006-2007

Moving from exascale to zettascale computing: challenges and techniques

OPENALEX - Publications

Xiangke Liao Kai Lu Canqun Yang Jinwen Li Yuan Yuan and 6 more

10.1631/fitee.1800494 article EN Frontiers of Information Technology & Electronic Engineering 2018-10-01

Performance Gaps between OpenMP and OpenCL for Multi-core CPUs

Jie Shen Jianbin Fang Henk Sips Ana Lucia Vărbănescu

OpenCL and OpenMP are the most commonly used programming models for multi-core processors. They are also fundamentally different in their approach to parallelization. In this paper, we focus on comparing the performance of OpenCL and OpenMP. We select three applications from the Rodinia benchmark suite (which provides equivalent OpenMP and OpenCL implementations) and carry out experiments with different datasets and platforms. We see that incorrect usage of the multi-core CPUs, the inherent fine-grained parallelism, and immature compilers are the main reasons that lead to poorer performance....

10.1109/icppw.2012.18 article EN 2012-09-01

Workload Partitioning for Accelerating Applications on Heterogeneous Platforms

Jie Shen Ana Lucia Vărbănescu Yutong Lu Peng Zou Henk Sips

Heterogeneous platforms composed of multi-core CPUs and different types of accelerators, like GPUs and Xeon Phi, are becoming popular for data parallel applications. The heterogeneity of the hardware mix and the diversity of the applications pose significant challenges to exploiting such platforms. In this situation, an effective workload partitioning between the processing units is critically important for improving application performance. This partitioning is a function of the hardware capabilities as well as of the dataset to be used. In this work, we present a systematic...

10.1109/tpds.2015.2509972 article EN IEEE Transactions on Parallel and Distributed Systems 2015-12-17

Performance Traps in OpenCL for CPUs

Jie Shen Jianbin Fang Henk Sips Ana Lucia Vărbănescu

With its design concept of cross-platform portability, OpenCL can be used not only on GPUs (for which it is quite popular), but also on CPUs. Whether porting GPU programs to CPUs, or simply writing new code for CPUs, using OpenCL brings up the performance issue, usually raised in one of two forms: "OpenCL is not performance portable!" or "Why using OpenCL for CPUs after all?!". We argue that both issues can be addressed by a thorough study of the factors that impact the performance of OpenCL on CPUs. This analysis is the focus of this paper. Specifically, starting from the main architectural mismatches between...

10.1109/pdp.2013.16 article EN 2013-02-01

An application-centric evaluation of OpenCL on multi-core CPUs

Jie Shen Jianbin Fang Henk Sips Ana Lucia Vărbănescu

10.1016/j.parco.2013.08.009 article EN Parallel Computing 2013-09-04

Glinda

Jie Shen Ana Lucia Vărbănescu Henk Sips Michael Arntzen Dick G. Simons

Heterogeneous platforms integrating different processors like GPUs and multi-core CPUs have become popular in high performance computing. While most applications are currently using the homogeneous parts of these platforms, we argue that there is a large class of applications that can benefit from their heterogeneity: massively parallel imbalanced applications. Such applications emerge, for example, from variable time step based numerical methods and simulations. In this paper, we present Glinda, a framework for accelerating imbalanced applications on heterogeneous...

10.1145/2482767.2482785 article EN 2013-05-03

Improving performance by matching imbalanced workloads with heterogeneous platforms

Jie Shen Ana Lucia Vărbănescu Peng Zou Yutong Lu Henk Sips

Although GPUs are considered ideal to accelerate massively data-parallel applications, there are still exceptions to this rule. For example, imbalanced applications cannot be efficiently processed by GPUs: despite the massive data parallelism, a varied computational workload per data point remains GPU-unfriendly. To efficiently process imbalanced applications, we exploit the use of heterogeneous platforms (GPUs and CPUs) by partitioning the workload to fit the usage patterns of the processors. In this work, we present our flexible and adaptive method that predicts the optimal...

10.1145/2597652.2597675 article EN 2014-06-10

Look before You Leap: Using the Right Hardware Resources to Accelerate Applications

Jie Shen Ana Lucia Vărbănescu Henk Sips

GPUs are widely used to accelerate data-parallel applications. However, while the GPU processing capability is enhanced in each generation, the CPU computing power is also increased by adding more cores and widening vector units. Compared to the rapid development of GPUs and CPUs, the bandwidth of data transfer between the host and the GPU grows much slower, resulting in a data-transfer wall for using GPUs. In this situation, choosing the right mix of hardware resources - i.e., the right hardware configuration - is critically important for improving application...

10.1109/hpcc.2014.65 article EN 2014-08-01

GARDENIA

Zhen Xu Xuhao Chen Jie Shen Yang Zhang Cheng Chen and 1 more

This article presents the Graph Algorithm Repository for Designing Next-generation Accelerators (GARDENIA), a benchmark suite for studying irregular graph algorithms on massively parallel accelerators. Applications with limited control and data irregularity are the main focus of existing generic benchmarks for accelerators, while available graph processing benchmarks do not apply state-of-the-art algorithms and/or optimization techniques. GARDENIA includes emerging workloads from graph analytics, sparse linear algebra, and machine-learning...

10.1145/3283450 article EN ACM Journal on Emerging Technologies in Computing Systems 2019-01-09

ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels

Jianbin Fang Ana Lucia Vărbănescu Jie Shen Henk Sips

Recent parallel architectures are equipped with local memory, which simplifies hardware design at the cost of increased program complexity due to explicit management. To ease this extra burden on programmers, we introduce an easy-to-use API, ELMO, that improves productivity while preserving the high performance of local memory operations. Specifically, ELMO is a generic API that covers different local memory use-cases. We also present prototype implementations for these APIs and perform multiple GPU-inspired...

10.1109/pdp.2013.61 article EN 2013-02-01

Accelerating Cost Aggregation for Real-Time Stereo Matching

Jianbin Fang Ana Lucia Vărbănescu Jie Shen Henk Sips Görkem Saygılı and 1 more

Real-time stereo matching, which is important in many applications like self-driving cars and 3-D scene reconstruction, requires large computation capability and high memory bandwidth. The most time-consuming part of stereo-matching algorithms is the aggregation of information (i.e. costs) over local image regions. In this paper, we present a generic representation and suitable implementations for three commonly used cost aggregators on many-core processors. We perform typical optimizations on the kernels, which leads...

10.1109/icpads.2012.71 article EN 2012-12-01

High Performance Detection of Strongly Connected Components in Sparse Graphs on GPUs

Pingfan Li Xuhao Chen Jie Shen Jianbin Fang Tao Tang and 1 more

Detecting strongly connected components (SCC) has been broadly used in many real-world applications. To speed up SCC detection for large-scale graphs, parallel algorithms have been proposed to leverage modern GPUs. Existing GPU implementations are able to get speedup on synthetic graph instances, but show limited performance when applied to real-world datasets. In this paper, we present a parallel SCC detection implementation on GPUs that achieves high performance on both synthetic and real-world graphs. We use a hybrid method that divides the algorithm into two phases. Our method is dynamically...

10.1145/3026937.3026941 article EN 2017-01-27

Orchestrating parallel detection of strongly connected components on GPUs

Xuhao Chen Cheng Chen Jie Shen Jianbin Fang Tao Tang and 2 more

10.1016/j.parco.2017.11.001 article EN Parallel Computing 2017-11-10

Matchmaking Applications and Partitioning Strategies for Efficient Execution on Heterogeneous Platforms

Jie Shen Ana Lucia Vărbănescu Xavier Martorell Henk Sips

Heterogeneous platforms are mixes of different processing units. The key factor to their efficient usage is workload partitioning. Both static and dynamic partitioning strategies have been defined in previous work, but their applicability and performance differ significantly depending on the application to execute. In this paper, we propose an application-driven method to select the best partitioning strategy for a given workload. To this end, we define an application classification based on the application kernel structure -- i.e., the number of kernels and their execution flow....

10.1109/icpp.2015.65 article EN 2015-09-01

Heterogeneous computing with accelerators: an overview with examples

Ana Lucia Vărbănescu Jie Shen

Accelerator-based platforms are heterogeneous in nature, yet most applications avoid heterogeneity, and focus on acceleration alone. Platform-level heterogeneity can bring significant performance improvement, as it essentially means using additional resources for the same computation. But is the performance gained with these resources worth the effort to program and deploy the applications? In this work, we present a taxonomy of existing programming models and tools available for heterogeneous computing with accelerators, and give examples of systems fitting...

10.1109/fdl.2016.7880387 article EN 2016-09-01

A Fair Multi-priority MAC Protocol Design of Wireless Sensor Networks

Hongjun Li Xun Li Jie Shen Hongxu Ma

Media access control (MAC) protocols of wireless sensor networks (WSNs) must minimize the radio energy costs in sensor nodes. Latency and throughput are also important design features for MAC protocols in current WSN applications. But most of them cannot guarantee quality for real-time traffic. This paper studies the state of the art of MAC protocols, then introduces a medium access control protocol that provides multiple priority levels. The channel is accessed by sensors according to their priorities. Sensors send frames in a round manner with the same...

10.1109/nswctc.2009.307 article EN 2009-04-01

A Distributed filesystem framework for transparent accessing heterogeneous storage services

Yutong Lu Huajian Mao Jie Shen

This paper introduces an extensible distributed file system framework, YaFS, using heterogeneous online storage services as its back-ends. It provides a configurable solution for simplifying the usage of multiple storage resources and accessing data ubiquitously and safely. YaFS is POSIX compliant, so that it could support most existing applications seamlessly. An offline mode is used to cope with challenged and unreliable network environments. We implement an abstraction layer and a plug-in mechanism to uniformly...

10.1109/ipdps.2009.5161180 article EN 2009-05-01

GARDENIA: A Domain-specific Benchmark Suite for Next-generation Accelerators

Xu Zhen Xuhao Chen Jie Shen Yang Zhang Cheng Chen and 1 more

This paper presents the Graph Analytics Repository for Designing Next-generation Accelerators (GARDENIA), a benchmark suite for studying irregular algorithms on massively parallel accelerators. Existing generic benchmarks for accelerators have mainly focused on high performance computing (HPC) applications with limited control and data irregularity, while available graph analytics benchmarks do not apply state-of-the-art algorithms and/or optimization techniques. GARDENIA includes emerging irregular applications in big-data and machine learning...

10.48550/arxiv.1708.04567 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Efficient High Performance Computing on Heterogeneous Platforms

Jie Shen

Heterogeneous platforms are mixes of different processing units in a compute node (e.g., CPUs+GPUs, CPU+MICs) or in a chip package (e.g., APUs). This type of platform keeps gaining popularity in various computer systems ranging from supercomputers to mobile devices. In this context, improving their efficiency and usability has become increasingly important. In this thesis, we develop systematic methods for a large variety of data parallel applications to efficiently utilize heterogeneous platforms. Specifically, we (1) evaluate the...

10.4233/uuid:3efc8aae-e31f-47a4-a92c-b9da05917ada article EN 2015-11-24

A Quantitative Evaluation of Vector Transcendental Functions on ARMv8-Based Processors

Jie Shen Biao Long Chun Huang

10.1007/s11390-021-1203-5 article EN Journal of Computer Science and Technology 2023-05-30

Optimizing Fast Trigonometric Functions on Modern CPUs

Jie Shen Biao Long Chun Huang

Traditional math libraries in high performance computing (HPC) are designed with accuracy as the first priority. With the development of modern hardware processors and the expansion of HPC application domains, it is highly desirable to develop fast, approximate math function implementations for performance-hungry and error-tolerable applications. In this paper, we propose an acceleration method for trigonometric functions (sine and cosine) based on specialized hardware instructions. We implement vector versions which utilize...

10.1109/hpcc-dss-smartcity-dependsys57074.2022.00162 article EN 2022-12-01

Knowledge Mining for Web Business Intelligence Platform and Its Sequence Knowledge Model

Jie Shen Wei Liuhua Kun He Xu Fa-yan Lei Bi and 1 more

The ever-changing market information makes the traditional way of collecting and using it unfit for enterprises' business requirements. A knowledge mining for Web business intelligence (KB4WBI) platform is put forward in this paper, and online knowledge acquisition and semantics management are realized. Since knowledge has evident time effectiveness and context-related characteristics, great emphasis is placed on research of the sequence knowledge representation model and ontology evolution. Compared to current methods, it comprehensively considers...

10.1109/cis.workshops.2007.137 article EN Computational Intelligence and Security 2007-12-15

Improving Application Performance by Efficiently Utilizing Heterogeneous Many-core Platforms

Jie Shen Ana Lucia Vărbănescu Henk Sips

Heterogeneous platforms integrating different types of processing units (such as multi-core CPUs and GPUs) are in high demand in high performance computing. Existing studies have shown that using heterogeneous platforms can improve application performance and hardware utilization. However, systematic methods to design, implement, and map applications to efficiently use heterogeneous computing resources are only very few. The goal of my PhD research is therefore to study such systems and propose methods that allow many (classes of) applications to efficiently use them. After 3.5 years of study,...

10.1109/ccgrid.2015.44 article EN 2015-05-01

Decode-Forward Relaying with State Available Noncausally at the Relay

Jie Shen Dajin Wang Ou Wang Geng Zhang

The problem of relay channel (RC) with noncausal state information (CSI) available at the is considered. With CSI, can help communication in two ways: 1) by relaying message information; 2) conveying CSI to destination decode. In previous work, Zaidi et al. established a lower bound letting send performing Gelfand-Pinsker (GP) coding. While our schemes, we combine ways as well compressed receivers. We investigate three decode-forward (DF) bounds. first bounds are obtained transmitting and...

10.12783/dtcse/itme2017/7999 article EN DEStech Transactions on Computer Science and Engineering 2017-04-27

Architectural Implications in Graph Processing of Accelerator with Gardenia Benchmark Suite

Yang Zhang Jie Shen Zhen Xu Shikai Qiu Xuhao Chen

Existing generic benchmarks for accelerators (e.g. Parboil and Rodinia) have focused on high performance computing (HPC) applications which have limited control flow and data irregularity. Previously available graph analytics benchmark suites include straightforwardly implemented workloads which do not employ up-to-date optimization techniques and thus have quite different behaviors from real-world applications. This paper first briefly presents and characterizes the Graph Analytics Repository for Designing Next-generation...

10.1109/ispa-bdcloud-sustaincom-socialcom48970.2019.00191 article EN 2019-12-01
