Mehmet E. Belviranlı

ORCID: 0000-0001-9434-9833
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Advanced Neural Network Applications
  • Real-Time Systems Scheduling
  • Advanced Memory and Neural Computing
  • Brain Tumor Detection and Classification
  • Microbial Metabolic Engineering and Bioproduction
  • Ferroelectric and Negative Capacitance Devices
  • Bioinformatics and Genomic Networks
  • Advanced Graph Neural Networks
  • Network Packet Processing and Optimization
  • Radiation Effects in Electronics
  • Petri Nets in System Modeling
  • Software Testing and Debugging Techniques
  • Stochastic Gradient Optimization Techniques
  • Drilling and Well Engineering
  • Business Process Modeling and Analysis
  • Biomedical Text Mining and Ontologies
  • Vehicular Ad Hoc Networks (VANETs)
  • Data Visualization and Analytics
  • Caching and Content Delivery

Affiliations

Colorado School of Mines
2020-2024

University of California, Riverside
2012-2022

Oak Ridge National Laboratory
2018-2022

Google (United States)
2022

Lawrence Livermore National Laboratory
2022

Arizona State University
2022

Meta (United States)
2022

University of California System
2017

Bilkent University
2010

Publications

Today's heterogeneous architectures bring together multiple general-purpose CPUs and domain-specific GPUs and FPGAs to provide dramatic speedups for many applications. However, the challenge lies in utilizing these processors to optimize overall application performance by minimizing workload completion time. Operating system support for such systems is in its infancy. In this article, we propose a new scheduling and load-balancing scheme, HDSS, for the execution of loops having dependent or independent iterations on...

10.1145/2400682.2400716 article EN ACM Transactions on Architecture and Code Optimization 2013-01-01
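
To illustrate the general idea of throughput-proportional loop scheduling described above (this is a simplified sketch, not the actual HDSS algorithm; the chunk-sizing policy and processor names are invented for illustration):

```python
# Illustrative sketch of heterogeneity-aware loop scheduling: iterations are
# handed out in chunks sized by each processor's measured relative throughput,
# so faster devices receive proportionally more work. Not the HDSS algorithm.

def schedule_chunks(total_iters, throughputs, chunk_base=64):
    """Assign iteration ranges to processors proportionally to throughput.

    throughputs: dict mapping processor name -> relative speed.
    Returns a list of (processor, start, end) half-open assignments.
    """
    assignments = []
    next_iter = 0
    fastest = max(throughputs.values())
    procs = list(throughputs)
    i = 0
    while next_iter < total_iters:
        proc = procs[i % len(procs)]
        # Scale the chunk by this processor's speed relative to the fastest.
        size = max(1, int(chunk_base * throughputs[proc] / fastest))
        end = min(next_iter + size, total_iters)
        assignments.append((proc, next_iter, end))
        next_iter = end
        i += 1
    return assignments

work = schedule_chunks(1000, {"gpu": 8.0, "cpu": 2.0})
covered = sum(end - start for _, start, end in work)
```

Here the GPU, being 4x faster in this made-up example, receives 64-iteration chunks while the CPU receives 16-iteration chunks, and every iteration is assigned exactly once.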

Heterogeneous computing with accelerators is growing in importance in high performance computing (HPC). Recently, application datasets have expanded beyond the memory capacity of these accelerators, and often beyond that of their hosts. Meanwhile, nonvolatile memory (NVM) storage has emerged as a pervasive component of HPC systems because NVM provides massive amounts of storage at an affordable cost. Currently, for accelerator applications to use NVM, they must manually orchestrate data movement across multiple memories, and this approach only...

10.1109/sc.2018.00035 article EN 2018-11-01

The energy and latency demands of critical workload execution, such as object detection, in embedded systems vary based on the physical system state and other external factors. Many recent mobile and autonomous System-on-Chips (SoC) embed a diverse range of accelerators with unique power and performance characteristics. The execution flow of such workloads can be adjusted to span multiple accelerators so that the trade-off between energy and latency fits the dynamically changing...

10.1145/3489517.3530572 article EN Proceedings of the 59th ACM/IEEE Design Automation Conference 2022-07-10

Hashing is one of the most fundamental operations, providing a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general-purpose processors, high-performance parallel data hashing solutions have yet to receive adequate attention. Existing solutions not only impose restrictions (e.g., the inability to concurrently execute insertion and retrieval operations, or limitations on the size of key-value pairs) that limit their applicability, but also do not scale when hash tables must be kept...

10.1109/pact.2015.13 article EN 2015-10-01
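
The kind of structure such GPU hashing schemes parallelize can be sketched with a minimal open-addressing table (illustrative only; a real GPU implementation would claim slots with atomic compare-and-swap from thousands of threads rather than plain sequential stores):

```python
# Minimal open-addressing hash table with linear probing. On a GPU, the
# slot-claiming store below would be an atomic CAS so that many threads
# can insert concurrently; this sequential sketch shows only the layout.

EMPTY = None

class ProbingTable:
    def __init__(self, capacity):
        self.keys = [EMPTY] * capacity
        self.vals = [EMPTY] * capacity
        self.capacity = capacity

    def insert(self, key, val):
        slot = hash(key) % self.capacity
        for step in range(self.capacity):          # linear probing
            idx = (slot + step) % self.capacity
            if self.keys[idx] is EMPTY or self.keys[idx] == key:
                self.keys[idx] = key               # atomic CAS on a GPU
                self.vals[idx] = val
                return True
        return False                               # table full

    def lookup(self, key):
        slot = hash(key) % self.capacity
        for step in range(self.capacity):
            idx = (slot + step) % self.capacity
            if self.keys[idx] is EMPTY:
                return None                        # hit an empty slot: absent
            if self.keys[idx] == key:
                return self.vals[idx]
        return None

t = ProbingTable(8)
t.insert("a", 1)
t.insert("b", 2)
```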

GPUs lack fundamental support for data-dependent parallelism and synchronization. While CUDA Dynamic Parallelism signals progress in this direction, many limitations and challenges still remain. This paper introduces Wireframe, a hardware-software solution that enables generalized support for data-dependent parallelism and synchronization. Wireframe enables applications to naturally express execution dependencies across different thread blocks through a dependency graph abstraction at run-time, which is sent to the GPU hardware at kernel launch. At run-time, the hardware enforces the specified...

10.1145/3123939.3123976 article EN 2017-10-14
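
The dependency-graph abstraction can be sketched in software as follows (illustrative only; Wireframe enforces the graph in GPU hardware, whereas this sketch runs a plain topological scheduler over invented block IDs):

```python
# Sketch of the dependency-graph idea: thread blocks declare
# "u must finish before v" edges, and a scheduler launches a block
# only once all of its predecessors have completed.

from collections import deque

def schedule(num_blocks, edges):
    """Return an execution order of block IDs respecting the given edges."""
    indegree = [0] * num_blocks
    succ = [[] for _ in range(num_blocks)]
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1
    # Blocks with no unmet dependencies are ready to launch.
    ready = deque(b for b in range(num_blocks) if indegree[b] == 0)
    order = []
    while ready:
        b = ready.popleft()
        order.append(b)                 # "launch" block b
        for v in succ[b]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order

# A diamond dependency: block 3 waits for 1 and 2, which both wait for 0.
order = schedule(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```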

Two distinguishing features of state-of-the-art mobile and autonomous systems are: 1) there are often multiple workloads, mainly deep neural network (DNN) inference, running concurrently and continuously; 2) they operate on shared-memory System-on-Chips (SoC) that embed heterogeneous accelerators tailored for specific operations. State-of-the-art systems lack the efficient performance and resource management techniques necessary to either maximize total system throughput or minimize end-to-end workload latency. In...

10.1145/3627535.3638502 article EN 2024-02-20

Nested loops with regular iteration dependencies span a large class of applications ranging from string matching to linear system solvers. Wavefront parallelism is a well-known technique to enable concurrent processing of such applications and is widely used on GPUs to benefit from their massively parallel computing capabilities. It uses global barriers between tiles to enforce data dependencies. However, such diagonal-wide synchronization causes load imbalance by forcing SMs to wait for the completion of the SM with the longest computation...

10.1145/2751205.2751243 article EN 2015-06-02
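
Wavefront parallelism can be illustrated on the classic edit-distance recurrence, one of the regular-dependency nested loops mentioned above (a minimal sequential sketch; a GPU version would assign each anti-diagonal's cells to parallel threads with a barrier between diagonals):

```python
# Wavefront sketch: each cell (i, j) of the DP grid depends on its left,
# top, and top-left neighbors, so all cells on one anti-diagonal d = i + j
# are mutually independent and could be computed in parallel, with a
# global barrier between consecutive diagonals.

def edit_distance_wavefront(a, b):
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    # Sweep anti-diagonals; the inner loop is the parallelizable part.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + cost) # substitution
    return D[n][m]
```

The load imbalance the abstract refers to arises because anti-diagonals near the corners contain few cells while those in the middle contain many, so per-diagonal barriers leave some SMs idle.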

Recent generations of GPUs and their corresponding APIs provide means for sharing compute resources among multiple applications with greater efficiency than ever. This advance has enabled the GPU to act as a shared computation resource in multi-user environments, like supercomputers and cloud computing. Recent research has focused on maximizing the utilization of GPU computing resources by simultaneously executing multiple applications (i.e., concurrent kernels) via temporal or spatial partitioning. However, these efforts have not considered the PCI-e bus, which is equally...

10.1145/2925426.2926271 article EN 2016-06-01

We present a new algorithm for the automatic layout of clustered graphs in a circular style. The algorithm tries to determine the optimal location and orientation of individual clusters intrinsically within a modified spring embedder. Heuristics such as reversing the order of nodes in a cluster or swapping neighboring node pairs in the same cluster are employed intermittently to further relax the embedder system, resulting in reduced inter-cluster edge crossings. Unlike other algorithms generating such drawings, ours does not require the quotient graph to be...

10.1109/tvcg.2012.178 article EN IEEE Transactions on Visualization and Computer Graphics 2012-09-05

Scientific applications with single instruction, multiple data (SIMD) computations show considerable performance improvements when run on today's graphics processing units (GPUs). However, the existence of dependences across thread blocks may significantly impact the speedup by requiring global synchronization across multiprocessors (SMs) inside the GPU. To efficiently handle interblock dependences, we need fine-granular task-based execution models that will treat the SMs of a GPU as stand-alone parallel processing units. Such...

10.1145/3178487.3178492 article EN 2018-02-06

With recent advancements in techniques for cellular data acquisition, information on cellular processes has been increasing at a dramatic rate. Visualization is critical to analyzing and interpreting such complex information; representing cellular processes or pathways is no exception. VISIBIOweb is a free, open-source, web-based pathway visualization and layout service for models in BioPAX format. With VISIBIOweb, one can obtain well-laid-out views of pathways using the standard notation of the Systems Biology Graphical Notation (SBGN), and embed such views within one's...

10.1093/nar/gkq352 article EN cc-by-nc Nucleic Acids Research 2010-05-11

Cyber-physical systems (CPS) such as robots and self-driving cars pose strict physical requirements to avoid failure. The system's scheduling choices impact these requirements. This presents a challenge: how do we find efficient schedules for CPS with heterogeneous processing units, such that the schedules are resource-bounded and meet the physical requirements? For example, tasks that require significant computation time in a car can delay its reaction, decreasing the available braking time. Heterogeneous computing platforms, containing CPUs, GPUs, and other...

10.1145/3650200.3656625 article EN other-oa 2024-05-30

Integrated shared-memory heterogeneous architectures are pervasive because they satisfy the diverse needs of mobile, autonomous, and edge computing platforms. Although specialized processing units (PUs) that share a unified memory system improve performance and energy efficiency by reducing data movement, they also increase contention for this memory as the PUs interact with each other. Prior work has investigated performance degradation due to contention, but few studies have examined the relationship between power and contention. Moreover,...

10.1145/3410463.3414671 article EN 2020-09-30

The decades-old memory bottleneck problem for data-intensive applications is getting worse as processor core counts continue to increase. Workloads with sparse memory access characteristics only achieve a fraction of a system's total memory bandwidth. The EMU architecture provides a radical approach to this issue by migrating computational threads to the location where the data resides. The system enables large PGAS-type memories spanning hundreds of nodes via a Cilk-based multi-threaded execution scheme. It also brings brand-new challenges in application design...

10.1109/hpec.2018.8547571 article EN 2018-09-01

Regular expressions are pervasive in modern systems. Many real-world regular expressions are inefficient, sometimes to the extent that they are vulnerable to complexity-based attacks, and while much research has focused on detecting inefficient expressions or accelerating regular expression matching at the hardware level, we investigate automatically transforming regular expressions to remove inefficiencies. We reduce this problem to general expression optimization, an important task necessary in a variety of domains even beyond compilers, e.g., digital logic design, etc...

10.1145/3559009.3569664 article EN 2022-10-08
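
A tiny example of the kind of inefficiency such a transformation removes (illustrative only, not the paper's technique; the sample patterns and inputs are invented): the redundant alternation in `(?:a|a)*b` can trigger exponential backtracking in backtracking engines, while the rewritten `a*b` matches exactly the same language.

```python
# Check on sample inputs that a redundant pattern and its rewritten,
# cheaper form accept the same strings. A real regex optimizer would
# prove language equality rather than sample it.

import re

slow = re.compile(r"(?:a|a)*b")   # redundant alternation: backtracking-prone
fast = re.compile(r"a*b")         # equivalent, linear-friendly rewrite

inputs = ["b", "aab", "aaab", "aac", "", "ba"]
agree = all((slow.fullmatch(s) is None) == (fast.fullmatch(s) is None)
            for s in inputs)
```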

The slowdown of Moore's law has caused an escalation in architectural diversity over the last decade, and agile development of domain-specific heterogeneous chips is becoming a high priority. However, this development must also consider portable programming environments and other constraints of system design. More importantly, understanding the role of each component in an end-to-end design is important to both architects and application developers, and must include metrics like power, performance, space, cost, and reliability. Being able to quickly...

10.23919/date.2019.8747521 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2019-03-01

Diversely Heterogeneous System-on-Chips (DH-SoC) are increasingly popular computing platforms in many fields, such as autonomous driving and AR/VR applications, due to their ability to effectively balance performance and energy efficiency. Having multiple target accelerators for concurrent workloads requires a careful runtime analysis of scheduling. In this study, we examine a scenario that mandates several concerns be carefully addressed: 1) exploring the mapping of various workloads onto heterogeneous accelerators to optimize...

10.1145/3589010.3594889 article EN cc-by 2023-08-14

Many slowdown models have been proposed to characterize the memory interference of workloads co-running on heterogeneous System-on-Chips (SoCs). But they are mostly designed for post-silicon usage; how to effectively consider memory interference at the SoC design stage remains an open problem. This paper presents a new approach to this problem, consisting of a novel processor-centric modeling methodology and a three-region interference-conscious slowdown model. The modeling process needs no measurement of various combinations of applications, but the produced models can...

10.1145/3466752.3480101 article EN 2021-10-17

As processor power density increases, chip/core temperature control becomes critical for building multicore systems. This paper addresses the problem of inter-core thermal coupling and periodic variation while executing multi-threaded network applications in a multicore architecture.

10.5555/2537857.2537876 article EN Architectures for Networking and Communications Systems 2013-10-21

General-purpose computing on GPUs has become increasingly popular over the last decade. Scientific applications with SIMD computation characteristics show considerable performance improvements when run on these massively parallel architectures. However, data dependencies across thread blocks significantly impact the degree of achievable parallelism by requiring global synchronization across multi-processors (SMs) inside the GPU.

10.1145/2608020.2608024 article EN 2014-06-20

Neural network inference (NNI) is commonly used in mobile and autonomous systems for latency-sensitive critical operations such as obstacle detection and avoidance. In addition to latency, energy consumption is also an important factor for such workloads, since the battery is a limited resource in these systems. Energy and latency demands of workload execution can vary based on the physical system state. For example, when the remaining battery is low, running the motors of a quadcopter should be prioritized. On the other hand, if the quadcopter is flying through...

10.1109/rsdha54838.2021.00006 article EN 2021-11-01
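
The battery-dependent trade-off described above can be sketched as a simple variant-selection policy (a hypothetical sketch; the variant names, latency/energy numbers, and threshold are all invented for illustration and are not from the paper):

```python
# Hypothetical energy/latency-aware selection: prefer the lowest-latency
# inference variant when energy is plentiful, and the lowest-energy
# variant when the battery runs low. All numbers are illustrative.

# (variant, latency_ms, energy_mj)
VARIANTS = [
    ("gpu_full",  20.0, 900.0),
    ("dla_small", 45.0, 300.0),
    ("cpu_tiny",  80.0, 120.0),
]

def pick_variant(battery_frac, low_battery=0.2):
    """Minimize latency normally; minimize energy when battery is low."""
    if battery_frac < low_battery:
        return min(VARIANTS, key=lambda v: v[2])[0]   # save energy
    return min(VARIANTS, key=lambda v: v[1])[0]       # save time
```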

Computing systems have been evolving to be more pervasive, heterogeneous, and dynamic. An increasing number of emerging domains now rely on a diverse edge-cloud continuum, where the execution of applications often spans various tiers with significantly heterogeneous computational capabilities. Resources in each tier are handled in isolation due to scalability and privacy concerns. However, better overall resource utilization could be achieved if the different tiers had the means to communicate their resource availability. In this paper, we propose...

10.48550/arxiv.2402.04522 preprint EN arXiv (Cornell University) 2024-02-06

In recent years, deep neural networks (DNNs) have gained widespread adoption for continuous mobile object detection (OD) tasks, particularly in autonomous systems. However, a prevalent issue in their deployment is the one-size-fits-all approach, where a single DNN is used, resulting in inefficient utilization of computational resources. This inefficiency is particularly detrimental in energy-constrained systems, as it degrades overall system efficiency. We identify that the contextual information embedded in the input data stream...

10.48550/arxiv.2402.07415 preprint EN arXiv (Cornell University) 2024-02-12