Mehmet E. Belviranlı

ORCID: 0000-0001-9434-9833
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Advanced Neural Network Applications
  • Real-Time Systems Scheduling
  • Advanced Memory and Neural Computing
  • Brain Tumor Detection and Classification
  • Microbial Metabolic Engineering and Bioproduction
  • Ferroelectric and Negative Capacitance Devices
  • Bioinformatics and Genomic Networks
  • Advanced Graph Neural Networks
  • Network Packet Processing and Optimization
  • Radiation Effects in Electronics
  • Petri Nets in System Modeling
  • Software Testing and Debugging Techniques
  • Stochastic Gradient Optimization Techniques
  • Drilling and Well Engineering
  • Business Process Modeling and Analysis
  • Biomedical Text Mining and Ontologies
  • Vehicular Ad Hoc Networks (VANETs)
  • Data Visualization and Analytics
  • Caching and Content Delivery

Affiliations

Colorado School of Mines
2020-2024

University of California, Riverside
2012-2022

Oak Ridge National Laboratory
2018-2022

Google (United States)
2022

Lawrence Livermore National Laboratory
2022

Arizona State University
2022

Meta (United States)
2022

University of California System
2017

Bilkent University
2010

Publications

Today's heterogeneous architectures bring together multiple general-purpose CPUs and domain-specific GPUs and FPGAs to provide dramatic speedups for many applications. However, the challenge lies in utilizing these processors to optimize overall application performance by minimizing workload completion time. Operating system support for such systems is in its infancy. In this article, we propose a new scheduling and load-balancing scheme, HDSS, for the execution of loops having dependent or independent iterations on...

10.1145/2400682.2400716 article EN ACM Transactions on Architecture and Code Optimization 2013-01-01
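
To illustrate the general idea of throughput-proportional loop scheduling described above (this is a simplified sketch, not the actual HDSS algorithm; the chunk-sizing policy and processor names are invented for illustration):

```python
# Illustrative sketch of heterogeneity-aware loop scheduling: iterations are
# handed out in chunks sized by each processor's measured relative throughput,
# so faster devices receive proportionally more work. Not the HDSS algorithm.

def schedule_chunks(total_iters, throughputs, chunk_base=64):
    """Assign iteration ranges to processors proportionally to throughput.

    throughputs: dict mapping processor name -> relative speed.
    Returns a list of (processor, start, end) half-open assignments.
    """
    assignments = []
    next_iter = 0
    fastest = max(throughputs.values())
    procs = list(throughputs)
    i = 0
    while next_iter < total_iters:
        proc = procs[i % len(procs)]
        # Scale the chunk by this processor's speed relative to the fastest.
        size = max(1, int(chunk_base * throughputs[proc] / fastest))
        end = min(next_iter + size, total_iters)
        assignments.append((proc, next_iter, end))
        next_iter = end
        i += 1
    return assignments

work = schedule_chunks(1000, {"gpu": 8.0, "cpu": 2.0})
covered = sum(end - start for _, start, end in work)
```

Here the GPU, being 4x faster in this made-up example, receives 64-iteration chunks while the CPU receives 16-iteration chunks, and every iteration is assigned exactly once.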

Heterogeneous computing with accelerators is growing in importance in high performance computing (HPC). Recently, application datasets have expanded beyond the memory capacity of these accelerators, and often beyond that of their hosts. Meanwhile, nonvolatile memory (NVM) storage has emerged as a pervasive component of HPC systems because NVM provides massive amounts of storage at an affordable cost. Currently, for accelerator applications to use NVM, they must manually orchestrate data movement across multiple memories, and this approach only...

10.1109/sc.2018.00035 article EN 2018-11-01

The energy and latency demands of critical workload execution, such as object detection, in embedded systems vary based on the physical system state and other external factors. Many recent mobile and autonomous System-on-Chips (SoC) embed a diverse range of accelerators with unique power and performance characteristics. The execution flow of such workloads can be adjusted to span multiple accelerators so that the trade-off between energy and latency fits the dynamically changing...

10.1145/3489517.3530572 article EN Proceedings of the 59th ACM/IEEE Design Automation Conference 2022-07-10

Hashing is one of the most fundamental operations, providing a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general-purpose processors, high-performance parallel data hashing solutions have yet to receive adequate attention. Existing solutions not only impose restrictions (e.g., the inability to concurrently execute insertion and retrieval operations, or limitations on the size of key-value pairs) that limit their applicability, but also do not scale when hash tables must be kept...

10.1109/pact.2015.13 article EN 2015-10-01
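
The kind of structure such GPU hashing schemes parallelize can be sketched with a minimal open-addressing table (illustrative only; a real GPU implementation would claim slots with atomic compare-and-swap from thousands of threads rather than plain sequential stores):

```python
# Minimal open-addressing hash table with linear probing. On a GPU, the
# slot-claiming store below would be an atomic CAS so that many threads
# can insert concurrently; this sequential sketch shows only the layout.

EMPTY = None

class ProbingTable:
    def __init__(self, capacity):
        self.keys = [EMPTY] * capacity
        self.vals = [EMPTY] * capacity
        self.capacity = capacity

    def insert(self, key, val):
        slot = hash(key) % self.capacity
        for step in range(self.capacity):          # linear probing
            idx = (slot + step) % self.capacity
            if self.keys[idx] is EMPTY or self.keys[idx] == key:
                self.keys[idx] = key               # atomic CAS on a GPU
                self.vals[idx] = val
                return True
        return False                               # table full

    def lookup(self, key):
        slot = hash(key) % self.capacity
        for step in range(self.capacity):
            idx = (slot + step) % self.capacity
            if self.keys[idx] is EMPTY:
                return None                        # hit an empty slot: absent
            if self.keys[idx] == key:
                return self.vals[idx]
        return None

t = ProbingTable(8)
t.insert("a", 1)
t.insert("b", 2)
```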

GPUs lack fundamental support for data-dependent parallelism and synchronization. While CUDA Dynamic Parallelism signals progress in this direction, many limitations and challenges still remain. This paper introduces Wireframe, a hardware-software solution that enables generalized support for data-dependent parallelism and synchronization. Wireframe enables applications to naturally express execution dependencies across different thread blocks through a dependency graph abstraction at run-time, which is sent to the GPU hardware at kernel launch. At run-time, the hardware enforces the specified...

10.1145/3123939.3123976 article EN 2017-10-14
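
The dependency-graph abstraction can be sketched in software as follows (illustrative only; Wireframe enforces the graph in GPU hardware, whereas this sketch runs a plain topological scheduler over invented block IDs):

```python
# Sketch of the dependency-graph idea: thread blocks declare
# "u must finish before v" edges, and a scheduler launches a block
# only once all of its predecessors have completed.

from collections import deque

def schedule(num_blocks, edges):
    """Return an execution order of block IDs respecting the given edges."""
    indegree = [0] * num_blocks
    succ = [[] for _ in range(num_blocks)]
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1
    # Blocks with no unmet dependencies are ready to launch.
    ready = deque(b for b in range(num_blocks) if indegree[b] == 0)
    order = []
    while ready:
        b = ready.popleft()
        order.append(b)                 # "launch" block b
        for v in succ[b]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order

# A diamond dependency: block 3 waits for 1 and 2, which both wait for 0.
order = schedule(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```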

Two distinguishing features of state-of-the-art mobile and autonomous systems are: 1) there are often multiple workloads, mainly deep neural network (DNN) inference, running concurrently and continuously; 2) they operate on shared-memory System-on-Chips (SoC) that embed heterogeneous accelerators tailored for specific operations. State-of-the-art systems lack the efficient performance and resource management techniques necessary to either maximize total system throughput or minimize end-to-end workload latency. In...

10.1145/3627535.3638502 article EN 2024-02-20

Nested loops with regular iteration dependencies span a large class of applications ranging from string matching to linear system solvers. Wavefront parallelism is a well-known technique to enable concurrent processing of such applications and is widely used on GPUs to benefit from their massively parallel computing capabilities. It uses global barriers between tiles to enforce data dependencies. However, such diagonal-wide synchronization causes load imbalance by forcing SMs to wait for the completion of the SM with the longest computation...

10.1145/2751205.2751243 article EN 2015-06-02
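
Wavefront parallelism can be illustrated on the classic edit-distance recurrence, one of the regular-dependency nested loops mentioned above (a minimal sequential sketch; a GPU version would assign each anti-diagonal's cells to parallel threads with a barrier between diagonals):

```python
# Wavefront sketch: each cell (i, j) of the DP grid depends on its left,
# top, and top-left neighbors, so all cells on one anti-diagonal d = i + j
# are mutually independent and could be computed in parallel, with a
# global barrier between consecutive diagonals.

def edit_distance_wavefront(a, b):
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    # Sweep anti-diagonals; the inner loop is the parallelizable part.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + cost) # substitution
    return D[n][m]
```

The load imbalance the abstract refers to arises because anti-diagonals near the corners contain few cells while those in the middle contain many, so per-diagonal barriers leave some SMs idle.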

Recent generations of GPUs and their corresponding APIs provide means for sharing compute resources among multiple applications with greater efficiency than ever. This advance has enabled the GPU to act as a shared computation resource in multi-user environments, like supercomputers and cloud computing. Recent research has focused on maximizing the utilization of GPU computing resources by simultaneously executing multiple applications (i.e., concurrent kernels) via temporal or spatial partitioning. However, these efforts have not considered the PCI-e bus, which is equally...

10.1145/2925426.2926271 article EN 2016-06-01

We present a new algorithm for the automatic layout of clustered graphs in a circular style. The algorithm tries to determine the optimal location and orientation of individual clusters intrinsically within a modified spring embedder. Heuristics such as reversing the order of nodes in a cluster or swapping neighboring node pairs in the same cluster are employed intermittently to further relax the embedder system, resulting in reduced inter-cluster edge crossings. Unlike other algorithms generating such drawings, ours does not require the quotient graph to be...

10.1109/tvcg.2012.178 article EN IEEE Transactions on Visualization and Computer Graphics 2012-09-05

Scientific applications with single instruction, multiple data (SIMD) computations show considerable performance improvements when run on today's graphics processing units (GPUs). However, the existence of dependences across thread blocks may significantly impact the speedup by requiring global synchronization across multiprocessors (SMs) inside the GPU. To efficiently handle interblock dependences, we need fine-granular task-based execution models that will treat the SMs of a GPU as stand-alone parallel processing units. Such...

10.1145/3178487.3178492 article EN 2018-02-06

With recent advancements in techniques for cellular data acquisition, information on cellular processes has been increasing at a dramatic rate. Visualization is critical to analyzing and interpreting such complex information; representing cellular processes or pathways is no exception. VISIBIOweb is a free, open-source, web-based pathway visualization and layout service for models in BioPAX format. With VISIBIOweb, one can obtain well-laid-out views of pathways using the standard notation of the Systems Biology Graphical Notation (SBGN), and embed such views within one's...

10.1093/nar/gkq352 article EN cc-by-nc Nucleic Acids Research 2010-05-11

Cyber-physical systems (CPS) such as robots and self-driving cars pose strict physical requirements to avoid failure. The system's scheduling choices impact these requirements. This presents a challenge: how do we find efficient schedules for CPS with heterogeneous processing units, such that the schedules are resource-bounded and meet the physical requirements? For example, tasks that require significant computation time in a car can delay its reaction, decreasing the available braking time. Heterogeneous computing platforms, containing CPUs, GPUs, and other...

10.1145/3650200.3656625 article EN other-oa 2024-05-30

Integrated shared-memory heterogeneous architectures are pervasive because they satisfy the diverse needs of mobile, autonomous, and edge computing platforms. Although specialized processing units (PUs) that share a unified memory system improve performance and energy efficiency by reducing data movement, they also increase contention for this memory as the PUs interact with each other. Prior work has investigated performance degradation due to contention, but few studies have examined the relationship between power and contention. Moreover,...

10.1145/3410463.3414671 article EN 2020-09-30

The decades-old memory bottleneck problem for data-intensive applications is getting worse as processor core counts continue to increase. Workloads with sparse memory access characteristics only achieve a fraction of a system's total memory bandwidth. The EMU architecture provides a radical approach to this issue by migrating computational threads to the location where the data resides. The system enables large PGAS-type memories spanning hundreds of nodes via a Cilk-based multi-threaded execution scheme. It also brings brand-new challenges in application design...

10.1109/hpec.2018.8547571 article EN 2018-09-01

Regular expressions are pervasive in modern systems. Many real-world regular expressions are inefficient, sometimes to the extent that they are vulnerable to complexity-based attacks, and while much research has focused on detecting inefficient expressions or accelerating regular expression matching at the hardware level, we investigate automatically transforming regular expressions to remove inefficiencies. We reduce this problem to general expression optimization, an important task necessary in a variety of domains even beyond compilers, e.g., digital logic design, etc...

10.1145/3559009.3569664 article EN 2022-10-08
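
A tiny example of the kind of inefficiency such a transformation removes (illustrative only, not the paper's technique; the sample patterns and inputs are invented): the redundant alternation in `(?:a|a)*b` can trigger exponential backtracking in backtracking engines, while the rewritten `a*b` matches exactly the same language.

```python
# Check on sample inputs that a redundant pattern and its rewritten,
# cheaper form accept the same strings. A real regex optimizer would
# prove language equality rather than sample it.

import re

slow = re.compile(r"(?:a|a)*b")   # redundant alternation: backtracking-prone
fast = re.compile(r"a*b")         # equivalent, linear-friendly rewrite

inputs = ["b", "aab", "aaab", "aac", "", "ba"]
agree = all((slow.fullmatch(s) is None) == (fast.fullmatch(s) is None)
            for s in inputs)
```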

The slowdown of Moore's law has caused an escalation in architectural diversity over the last decade, and agile development of domain-specific heterogeneous chips is becoming a high priority. However, this development must also consider portable programming environments and other constraints of system design. More importantly, understanding the role of each component in an end-to-end design is important to both architects and application developers, and must include metrics like power, performance, space, cost, and reliability. Being able to quickly...

10.23919/date.2019.8747521 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2019-03-01

Diversely Heterogeneous System-on-Chips (DH-SoC) are increasingly popular computing platforms in many fields, such as autonomous driving and AR/VR applications, due to their ability to effectively balance performance and energy efficiency. Having multiple target accelerators for concurrent workloads requires a careful runtime analysis of scheduling. In this study, we examine a scenario that mandates several concerns be carefully addressed: 1) exploring the mapping of various workloads onto heterogeneous accelerators to optimize...

10.1145/3589010.3594889 article EN cc-by 2023-08-14

Many slowdown models have been proposed to characterize the memory interference of workloads co-running on heterogeneous System-on-Chips (SoCs). But they are mostly designed for post-silicon usage; how to effectively consider memory interference at the SoC design stage remains an open problem. This paper presents a new approach to this problem, consisting of a novel processor-centric modeling methodology and a three-region interference-conscious slowdown model. The modeling process needs no measurement of various combinations of applications, but the produced models can...

10.1145/3466752.3480101 article EN 2021-10-17

As processor power density increases, chip/core temperature control becomes critical for building multicore systems. This paper addresses the problem of inter-core thermal coupling and periodic variation while executing multi-threaded network applications in a multicore architecture.

10.5555/2537857.2537876 article EN Architectures for Networking and Communications Systems 2013-10-21

General-purpose computing on GPUs has become increasingly popular over the last decade. Scientific applications with SIMD computation characteristics show considerable performance improvements when run on these massively parallel architectures. However, data dependencies across thread blocks significantly impact the degree of achievable parallelism by requiring global synchronization across multi-processors (SMs) inside the GPU.

10.1145/2608020.2608024 article EN 2014-06-20

Neural network inference (NNI) is commonly used in mobile and autonomous systems for latency-sensitive critical operations such as obstacle detection and avoidance. In addition to latency, energy consumption is also an important factor for such workloads, since the battery is a limited resource in these systems. Energy and latency demands of workload execution can vary based on the physical system state. For example, when the remaining battery is low, running the motors of a quadcopter should be prioritized. On the other hand, if the quadcopter is flying through...

10.1109/rsdha54838.2021.00006 article EN 2021-11-01
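
The battery-dependent trade-off described above can be sketched as a simple variant-selection policy (a hypothetical sketch; the variant names, latency/energy numbers, and threshold are all invented for illustration and are not from the paper):

```python
# Hypothetical energy/latency-aware selection: prefer the lowest-latency
# inference variant when energy is plentiful, and the lowest-energy
# variant when the battery runs low. All numbers are illustrative.

# (variant, latency_ms, energy_mj)
VARIANTS = [
    ("gpu_full",  20.0, 900.0),
    ("dla_small", 45.0, 300.0),
    ("cpu_tiny",  80.0, 120.0),
]

def pick_variant(battery_frac, low_battery=0.2):
    """Minimize latency normally; minimize energy when battery is low."""
    if battery_frac < low_battery:
        return min(VARIANTS, key=lambda v: v[2])[0]   # save energy
    return min(VARIANTS, key=lambda v: v[1])[0]       # save time
```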

Computing systems have been evolving to be more pervasive, heterogeneous, and dynamic. An increasing number of emerging domains now rely on a diverse edge-cloud continuum, where the execution of applications often spans various tiers with significantly heterogeneous computational capabilities. Resources in each tier are handled in isolation due to scalability and privacy concerns. However, better overall resource utilization could be achieved if the different tiers had the means to communicate their resource availability. In this paper, we propose...

10.48550/arxiv.2402.04522 preprint EN arXiv (Cornell University) 2024-02-06

In recent years, deep neural networks (DNNs) have gained widespread adoption for continuous mobile object detection (OD) tasks, particularly in autonomous systems. However, a prevalent issue in their deployment is the one-size-fits-all approach, where a single DNN is used, resulting in inefficient utilization of computational resources. This inefficiency is particularly detrimental in energy-constrained systems, as it degrades overall system efficiency. We identify that the contextual information embedded in the input data stream...

10.48550/arxiv.2402.07415 preprint EN arXiv (Cornell University) 2024-02-12