Mohammad Alian

ORCID: 0000-0002-4622-2181
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Cloud Computing and Resource Management
  • Interconnection Networks and Systems
  • Superconducting Materials and Applications
  • Caching and Content Delivery
  • Distributed and Parallel Computing Systems
  • Advanced Memory and Neural Computing
  • Network Traffic and Congestion Control
  • Real-Time Systems Scheduling
  • Nuclear Physics and Applications
  • Green IT and Sustainability
  • Domain Adaptation and Few-Shot Learning
  • Data Management and Algorithms
  • Cloud Computing and Remote Desktop Technologies
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Image and Video Retrieval Techniques
  • Advanced Drug Delivery Systems
  • Privacy-Preserving Technologies in Data
  • Radiation Effects in Electronics
  • Stochastic Gradient Optimization Techniques
  • Computer Graphics and Visualization Techniques
  • Simulation Techniques and Applications
  • Advanced Neural Network Applications
  • CCD and CMOS Imaging Sensors

Cornell University
2024-2025

University of Kansas
2020-2024

Birzeit University
2024

University of Illinois Urbana-Champaign
2015-2021

International University of the Caribbean
2020

Samsung (South Korea)
2018

Seoul National University
2018

University of Illinois System
2017

Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on learning patterns in data and are permeating into different industry markets. Cloud infrastructure and accelerators that offer INFerence-as-a-Service (INFaaS) have become the enablers of this rather quick and invasive shift in industry. To this end, mostly accelerator-based INFaaS (Google's TPU [1], NVIDIA T4 [2], Microsoft Brainwave [3], etc.) has become the backbone of many real-life applications. However, as the demand for such services grows,...

10.1109/micro50266.2020.00062 article EN 2020-10-01

Training real-world Deep Neural Networks (DNNs) can take an eon (i.e., weeks or months) without leveraging distributed systems. Even distributed training takes inordinate time, a large fraction of which is spent communicating weights and gradients over the network. State-of-the-art distributed training algorithms use a hierarchy of worker-aggregator nodes: the aggregators repeatedly receive gradient updates from their allocated group of workers and send back the updated weights. This paper sets out to reduce this significant...

10.1109/micro.2018.00023 article EN 2018-10-01
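The worker-aggregator hierarchy the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, flat-list gradients, and plain SGD update are all hypothetical simplifications chosen to show only the two-level aggregation pattern.

```python
# Hypothetical sketch of hierarchical worker-aggregator gradient
# aggregation: group aggregators sum their workers' gradients, a root
# aggregator combines the group sums, and updated weights flow back down.

def aggregate_hierarchical(worker_grads, group_size):
    """worker_grads: list of per-worker gradient vectors (lists of floats)."""
    # Level 1: each aggregator sums the gradients of its worker group.
    groups = [worker_grads[i:i + group_size]
              for i in range(0, len(worker_grads), group_size)]
    group_sums = [[sum(vals) for vals in zip(*g)] for g in groups]
    # Level 2: the root aggregator combines the partial sums.
    return [sum(vals) for vals in zip(*group_sums)]

def sgd_step(weights, grad, lr=0.1):
    # The root applies the update; weights are then broadcast back down.
    return [w - lr * g for w, g in zip(weights, grad)]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
total = aggregate_hierarchical(grads, group_size=2)  # -> [16.0, 20.0]
weights = sgd_step([0.0, 0.0], total)
```

The point of the hierarchy is that each worker only ever talks to its local aggregator, so no single node has to receive gradients from every worker at once.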

The physical memory capacity of servers is expected to increase drastically with the deployment of forthcoming non-volatile memory technologies. This is a welcome improvement for emerging data-intensive applications. For such servers to be cost-effective, nonetheless, we must cost-effectively scale compute throughput and bandwidth commensurate with the increase in memory capacity, without compromising application readiness. Tackling this challenge, we present the Memory Channel Network (MCN) architecture in this paper. Specifically, we first propose an MCN DIMM,...

10.1109/micro.2018.00070 article EN 2018-10-01
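The core idea behind a DIMM that acts as a network interface can be sketched in a few lines: messages are exchanged through ordinary loads and stores to a shared memory region rather than through a PCIe NIC. This is a hypothetical illustration only; the class name, the length-prefixed framing, and the `bytearray` standing in for DIMM-resident queues are assumptions, not MCN's actual design.

```python
# Hypothetical sketch: a memory-resident mailbox as a stand-in for
# DIMM-based TX/RX queues. Communication is plain memory writes/reads,
# with no DMA engine or interrupt in the path.

class MailboxDimm:
    def __init__(self, size=4096):
        self.mem = bytearray(size)  # the "DIMM" address space
        self.head = 0               # next free offset in the TX region

    def send(self, payload: bytes):
        # A 2-byte little-endian length header, then the payload,
        # written with ordinary stores.
        n = len(payload)
        self.mem[self.head:self.head + 2] = n.to_bytes(2, "little")
        self.mem[self.head + 2:self.head + 2 + n] = payload
        self.head += 2 + n

    def recv(self, offset=0):
        # The receiver reads the length header, then the payload.
        n = int.from_bytes(self.mem[offset:offset + 2], "little")
        return bytes(self.mem[offset + 2:offset + 2 + n])

link = MailboxDimm()
link.send(b"hello")
```

The appeal of such a path is that it rides the memory channel's bandwidth and latency instead of crossing a comparatively slow peripheral interconnect.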

When analyzing a distributed computer system, we often observe that the complex interplay among processor, node, and network sub-systems can profoundly affect the performance and power efficiency of the system. Therefore, to effectively cross-optimize hardware and software components, we need a full-system simulation infrastructure that can precisely capture this interplay. Responding to the aforementioned need, we present dist-gem5, a flexible, detailed, and open-source full-system simulation infrastructure that can model and simulate a distributed computer system using multiple simulation hosts. We then validate dist-gem5...

10.1109/ispass.2017.7975287 article EN 2017-04-01

A modern datacenter server aims to achieve high energy efficiency by co-running multiple applications. Some of these applications (e.g., web search) are latency-sensitive and therefore require low-latency I/O services to respond quickly to requests from clients. However, we observe that simply replacing the storage devices of servers with Ultra-Low-Latency (ULL) SSDs does not notably reduce the latency of I/O services, especially when applications are co-running. In this paper, we propose FLASHSHARE to assist ULL SSDs in satisfying different levels of service...

10.5555/3291168.3291203 article EN Operating Systems Design and Implementation 2018-10-08

In modern server CPUs, the last-level cache (LLC) is a critical hardware resource that exerts significant influence on the performance of workloads, and how to manage the LLC is key to performance isolation and QoS in clouds with multi-tenancy. In this paper, we argue that in addition to CPU cores, high-speed I/O is also important for LLC management. This is because an Intel architectural innovation, Data Direct I/O (DDIO), directly injects inbound I/O traffic into (part of) the LLC instead of main memory. We summarize two problems caused by DDIO and show that (1) the default...

10.1109/isca52012.2021.00018 article EN 2021-06-01

Improving the performance and power efficiency of a single processor has been fraught with various challenges stemming from the end of classical technology scaling. Thus, the importance of efficiently running applications on a parallel/distributed computer system has continued to increase. In developing and optimizing such a system, it is critical to study the impact of the complex interplay amongst processor, node, and network architectures in detail. This necessitates a flexible, detailed, and open-source full-system simulation...

10.1109/lca.2015.2438295 article EN IEEE Computer Architecture Letters 2015-06-02

Optimizing bandwidth was the main focus of designing scale-out networks for several decades, and this optimization trend has served traditional Internet applications well. However, the emergence of datacenters as single computer entities has made latency important in datacenter networks. The PCIe interconnect is known to be a bottleneck in communication, and its overhead can contribute up to ~90% of the overall latency. Despite these overheads, PCIe is the de facto standard for connecting devices in servers, and it has been established and maintained for more than two decades. In...

10.1145/3352460.3358278 article EN 2019-10-11

The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications on multiple architectures including x86, Arm, and RISC-V. It has been under active development over the last nine years since the original release. In this time, there have been 7500 commits to the codebase from 250 unique...

10.48550/arxiv.2007.03152 preprint EN cc-by arXiv (Cornell University) 2020-01-01

The rate of network packets encapsulating requests from clients can significantly affect the utilization, and thus the performance and sleep states, of processors in servers deploying a power management policy. To improve energy efficiency, a server may adopt an aggressive policy that frequently transitions a processor to a low-performance or sleep state at low utilization. However, such a policy may not respond to a sudden increase in utilization early enough, due to the considerable penalty of transitioning to a high-performance state. This in turn entails violations...

10.1109/hpca.2017.57 article EN 2017-02-01

I/O performance plays a critical role in the overall performance of modern servers. The emergence of ultra high-speed I/O devices makes data movement between processors, main memory, and I/O devices a major bottleneck. Conventionally, main memory is used as an intermediate buffer because the processor cannot directly access I/O-side caches. The Data Direct I/O (DDIO) technology aims to reduce memory bandwidth utilization by enabling I/O devices to leverage the Last Level Cache (LLC) as the intermediate buffer. Our experimental results show that DDIO can completely eliminate this memory traffic while running...

10.1109/ispass48437.2020.00031 article EN 2020-08-01

There has been significant focus on offloading upper-layer network protocols (ULPs) to accelerators located on CPUs and SmartNICs. However, restricting accelerator placement to these locations limits both the variety of ULPs that can be accelerated and the overall performance. In particular, it overlooks the opportunity to accelerate ULPs running atop a stateful transport protocol in the face of high cache contention. That is, at high rates, frequent DRAM accesses and SmartNIC-CPU synchronizations outweigh the benefits of hardware...

10.1109/hpca57654.2024.00032 article EN 2024-03-02

While (I) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (II) storage disaggregation at the system infrastructure level and (III) integration of domain-specific accelerators at the hardware level. Each of these three trends individually provides significant benefits; however, when combined, the benefits diminish. On the convergence of these trends, this paper makes the observation that for serverless functions, the overhead of accessing disaggregated storage overshadows the gains from...

10.1145/3620665.3640413 article EN 2024-04-22

High-bandwidth network interface cards (NICs), each capable of transferring 100s of Gigabits per second, are making inroads into the servers of next-generation datacenters. Such unprecedented data delivery rates impose immense pressure, especially on the server's memory subsystem, as NICs first transfer data to DRAM before processing. To alleviate this pressure, the cache hierarchy has evolved, supporting a Data Direct I/O (DDIO) technology to directly place inbound data in the last-level cache (LLC). Subsequently, various placement policies have been explored...

10.1109/micro56248.2022.00042 article EN 2022-10-01

Processor power management exploiting Dynamic Voltage and Frequency Scaling (DVFS) plays a crucial role in improving a datacenter's energy efficiency. However, we observe that the current DVFS policies in Linux (i.e., governors) often considerably increase the tail response time (i.e., violate a given Service Level Objective (SLO)) or the energy consumption of latency-critical applications. Furthermore, previously proposed SLO-aware governors oversimplify network request processing and ignore the fact that requests arrive at the application layer...

10.1145/3466752.3480098 article EN 2021-10-17
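The general shape of an SLO-aware DVFS policy can be sketched as a simple feedback loop: raise frequency when the measured tail latency violates the objective, and lower it only when there is comfortable slack. Everything here is hypothetical, not the governor proposed in the paper: the frequency table, the p99 estimator, and the 0.8 slack factor are illustrative assumptions.

```python
# Hypothetical sketch of an SLO-aware DVFS control loop.

FREQS_GHZ = [1.2, 1.8, 2.4, 3.0]  # assumed available P-states

def tail_latency_us(samples, pct=0.99):
    """Tail (e.g., p99) latency of a window of per-request latencies."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(pct * len(s)))
    return s[idx]

def next_freq(cur_idx, samples, slo_us, slack=0.8):
    """Raise frequency on an SLO violation; lower it only when the tail
    latency sits comfortably (slack * SLO) below the objective."""
    tail = tail_latency_us(samples)
    if tail > slo_us and cur_idx < len(FREQS_GHZ) - 1:
        return cur_idx + 1   # violating: speed up
    if tail < slack * slo_us and cur_idx > 0:
        return cur_idx - 1   # ample slack: step down to save power
    return cur_idx
```

Driving the policy off a tail percentile rather than mean utilization is what distinguishes SLO-aware governors from the stock Linux ones, which react to CPU load and can leave latency-critical requests stranded at a low frequency.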

The PCI-Express interconnect is the dominant interconnection technology within a single computer node that is used for connecting off-chip devices such as network interface cards (NICs) and GPUs to the processor chip. PCIe bandwidth and latency are often the bottleneck in processor, memory, and device interactions, which impacts the overall performance of connected devices. Architecture simulators focus on modeling the processor and memory, but lack a detailed model of I/O interconnections. In this work, we implement a flexible and detailed PCIe model in the widely known architecture...

10.1109/iiswc.2018.8573496 article EN 2018-09-01

While (1) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (2) storage disaggregation at the system infrastructure level and (3) integration of domain-specific accelerators at the hardware level. Each of these three trends individually provides significant benefits; however, when combined, the benefits diminish. Specifically, this paper makes the key observation that for serverless functions, the overhead of accessing disaggregated persistent storage overshadows the gains from...

10.48550/arxiv.2303.03483 preprint EN cc-by arXiv (Cornell University) 2023-01-01

In this work, we set out to find the answers to the following questions: (1) Where are the bottlenecks in a state-of-the-art architectural simulator? (2) How much faster can simulations run by tuning system configurations? (3) What are the opportunities for accelerating software simulation using hardware accelerators? We choose gem5 as the representative simulator, run several workloads with various configurations, perform a detailed analysis of the source code on different server platforms, tune both system and simulator settings for running simulations,...

10.1109/ispass57527.2023.00019 article EN 2023-04-01

The advance of DRAM manufacturing technology is slowing down, whereas the density and performance needs continue to increase. This desire has motivated the industry to explore emerging Non-Volatile Memory (e.g., 3D XPoint) and high-density DRAM (e.g., Managed DRAM Solution). Since such memory technologies increase density at the cost of longer latency, lower bandwidth, or both, it is essential to use them alongside fast memory (e.g., conventional DRAM) to which hot pages are transferred at runtime. Nonetheless, we observe that page transfers often block memory channels...

10.1145/3243176.3243191 article EN 2018-10-10
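The hot-page migration that the abstract assumes as a baseline can be sketched as an epoch-based promotion policy: pages whose access counts cross a threshold move to the fast tier, evicting the coldest fast-tier page when it is full. This is a hypothetical illustration of the general technique, not the paper's mechanism; the function name, threshold scheme, and victim selection are all assumptions.

```python
# Hypothetical sketch: epoch-based hot-page promotion for a two-tier
# memory system (fast DRAM tier + slow high-density tier).

def promote_hot_pages(counts, fast_set, fast_capacity, threshold):
    """counts: {page: accesses this epoch}; fast_set: pages now in DRAM.
    Returns the new fast-tier set and the list of (evicted, promoted) swaps."""
    fast = set(fast_set)
    swaps = []
    # Candidate pages: hot enough this epoch, not already in the fast tier.
    hot = sorted((p for p in counts if counts[p] >= threshold and p not in fast),
                 key=lambda p: -counts[p])
    for page in hot:
        if len(fast) < fast_capacity:
            fast.add(page)
            swaps.append((None, page))    # free slot: no eviction needed
        else:
            victim = min(fast, key=lambda p: counts.get(p, 0))
            if counts.get(victim, 0) >= counts[page]:
                break                     # remaining candidates are colder
            fast.discard(victim)
            fast.add(page)
            swaps.append((victim, page))  # page transfer over the channel
    return fast, swaps
```

Each tuple in `swaps` corresponds to a page transfer that would occupy a memory channel, which is exactly the traffic the paper observes blocking demand requests.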

Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory hierarchies that enhance memory-level parallelism (MLP), which is crucial for high performance. To support MLP, shared last-level caches (LLCs) are divided into multiple banks, allowing parallel access. However, an uneven distribution of cache requests from the cores, especially when requests are concentrated on a single bank, can result in significant contention affecting all cores that access the cache. Such bank contention can even be maliciously...

10.48550/arxiv.2410.14003 preprint EN arXiv (Cornell University) 2024-10-17
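The bank-concentration effect is easy to demonstrate with a toy model. The parameters below (64 B lines, 8 banks, bank selected by the low-order line-address bits) are illustrative assumptions, not the indexing of any particular processor.

```python
# Toy model: how an access stream's stride determines LLC bank spread.
LINE = 64      # assumed cache-line size in bytes
N_BANKS = 8    # assumed number of LLC banks

def bank_of(addr):
    # Line-address bits just above the block offset select the bank.
    return (addr // LINE) % N_BANKS

def bank_histogram(addrs):
    hist = [0] * N_BANKS
    for a in addrs:
        hist[bank_of(a)] += 1
    return hist

# A unit-line stride touches every bank equally ...
uniform = [i * LINE for i in range(64)]
# ... while a stride of N_BANKS lines hammers a single bank.
hammer = [i * LINE * N_BANKS for i in range(64)]
```

With this indexing, `uniform` spreads 8 accesses to each bank while every access in `hammer` lands on bank 0, the kind of skew an adversary could craft deliberately to create cross-core contention.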