- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Interconnection Networks and Systems
- Superconducting Materials and Applications
- Caching and Content Delivery
- Distributed and Parallel Computing Systems
- Advanced Memory and Neural Computing
- Network Traffic and Congestion Control
- Real-Time Systems Scheduling
- Nuclear Physics and Applications
- Green IT and Sustainability
- Domain Adaptation and Few-Shot Learning
- Data Management and Algorithms
- Cloud Computing and Remote Desktop Technologies
- Ferroelectric and Negative Capacitance Devices
- Advanced Image and Video Retrieval Techniques
- Advanced Drug Delivery Systems
- Privacy-Preserving Technologies in Data
- Radiation Effects in Electronics
- Stochastic Gradient Optimization Techniques
- Computer Graphics and Visualization Techniques
- Simulation Techniques and Applications
- Advanced Neural Network Applications
- CCD and CMOS Imaging Sensors
- Cornell University, 2024-2025
- University of Kansas, 2020-2024
- Birzeit University, 2024
- University of Illinois Urbana-Champaign, 2015-2021
- International University of the Caribbean, 2020
- Samsung (South Korea), 2018
- Seoul National University, 2018
- University of Illinois System, 2017
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on learning patterns of data and are permeating into different industry markets. Cloud infrastructure and accelerators that offer INFerence-as-a-Service (INFaaS) have become the enablers of this rather quick and invasive shift in industry. To this end, mostly accelerator-based INFaaS (Google's TPU [1], NVIDIA T4 [2], Microsoft Brainwave [3], etc.) has become the backbone of many real-life applications. However, as the demand for such services grows,...
Training real-world Deep Neural Networks (DNNs) can take an eon (i.e., weeks or months) without leveraging distributed systems. Even distributed training takes an inordinate amount of time, of which a large fraction is spent in communicating weights and gradients over the network. State-of-the-art distributed training algorithms use a hierarchy of worker-aggregator nodes. The aggregators repeatedly receive gradient updates from their allocated group of workers and send back the updated weights. This paper sets out to reduce this significant...
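One aggregation round in the worker-aggregator hierarchy described above can be sketched as follows. This is a minimal illustration of gradient averaging plus an SGD step, not the paper's actual code; the function name, learning rate, and list-based tensors are assumptions for clarity.

```python
# Sketch of one round at an aggregator node (hypothetical names):
# receive gradients from a group of workers, average them, apply one
# SGD step, and return the updated weights to broadcast back.

def sgd_round(weights, worker_grads, lr=0.1):
    """Average per-parameter gradients from all workers, then update."""
    n = len(worker_grads)
    avg = [sum(g[i] for g in worker_grads) / n
           for i in range(len(weights))]
    # w <- w - lr * mean(gradients)
    return [w - lr * g for w, g in zip(weights, avg)]

# Two workers report gradients for a 3-parameter model.
w = sgd_round([1.0, 2.0, 3.0], [[0.2, 0.4, 0.6], [0.0, 0.0, 0.2]])
```

In the hierarchical scheme, each aggregator runs this round for its worker group, and the averaged result flows further up the hierarchy; the communication of `worker_grads` over the network is the cost the paper targets.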
The physical memory capacity of servers is expected to increase drastically with the deployment of forthcoming non-volatile memory technologies. This is a welcome improvement for emerging data-intensive applications. For such servers to be cost-effective, nonetheless, we must cost-effectively increase compute throughput and memory bandwidth commensurate with the increase in memory capacity without compromising application readiness. Tackling this challenge, we present the Memory Channel Network (MCN) architecture in this paper. Specifically, first, we propose an MCN DIMM,...
When analyzing a distributed computer system, we often observe that the complex interplay among processor, node, and network sub-systems can profoundly affect the performance and power efficiency of the system. Therefore, to effectively cross-optimize hardware and software components, we need a full-system simulation infrastructure that can precisely capture this interplay. Responding to the aforementioned need, we present dist-gem5, a flexible, detailed, open-source full-system simulation infrastructure that can model and simulate a distributed computer system using multiple simulation hosts. We then validate dist-gem5...
A modern datacenter server aims to achieve high energy efficiency by co-running multiple applications. Some of such applications (e.g., web search) are latency-sensitive. Therefore, they require low-latency I/O services to respond fast to requests from clients. However, we observe that simply replacing the storage devices of servers with Ultra-Low-Latency (ULL) SSDs does not notably reduce the latency of I/O services, especially when multiple applications are co-running. In this paper, we propose FLASHSHARE to assist ULL SSDs to satisfy different levels of I/O service...
In modern server CPUs, the last-level cache (LLC) is a critical hardware resource that exerts significant influence on the performance of workloads, and how to manage the LLC is key to performance isolation and QoS in the cloud with multi-tenancy. In this paper, we argue that, in addition to CPU cores, high-speed I/O is also important for LLC management. This is because of an Intel architectural innovation, Data Direct I/O (DDIO), which directly injects inbound I/O traffic into (part of) the LLC instead of main memory. We summarize two problems caused by DDIO and show that (1) the default...
Improving the performance and power efficiency of a single processor has been fraught with various challenges stemming from the end of classical technology scaling. Thus, the importance of efficiently running applications on a parallel/distributed computer system has continued to increase. In developing and optimizing such a system, it is critical to study the impact of the complex interplay amongst processor, node, and network architectures in detail. This necessitates a flexible, detailed, and open-source full-system simulation...
Optimizing bandwidth was the main focus of designing scale-out networks for several decades, and this optimization trend has served traditional Internet applications well. However, the emergence of datacenters as single computer entities has made latency as important as bandwidth in datacenter networks. The PCIe interconnect is known to be a bottleneck in communication, and its overhead can contribute up to ~90% of the overall latency. Despite its overheads, PCIe has remained the de facto standard in servers, and it has been established and maintained for more than two decades. In...
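The "~90% of overall latency" figure above implies an Amdahl's-law-style limit worth making explicit: as long as the PCIe share dominates, speeding up everything else barely moves end-to-end latency. The function below is my own back-of-the-envelope illustration, not a model from the paper.

```python
# Amdahl-style latency estimate: the PCIe fraction of end-to-end
# latency is untouched, while the remaining fraction is sped up.
# `pcie_frac` and `other_speedup` are illustrative parameters.

def total_latency(pcie_frac, other_speedup, base=1.0):
    """Normalized end-to-end latency after speeding up the non-PCIe part."""
    return base * (pcie_frac + (1 - pcie_frac) / other_speedup)

# With PCIe at 90% of latency, a 2x speedup of everything else only
# brings normalized latency from 1.0 down to about 0.95.
remaining = total_latency(0.9, 2.0)
```

This is why reducing the PCIe overhead itself, rather than the surrounding fabric, is the lever that matters at these ratios.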
The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original release. In this time, there have been 7500 commits to the codebase from 250 unique...
The rate of network packets encapsulating requests from clients can significantly affect the utilization, and thus the performance and sleep states, of processors in servers deploying a power management policy. To improve energy efficiency, servers may adopt an aggressive policy that frequently transitions a processor to a low-performance or sleep state at low utilization. However, such a policy may not respond to a sudden increase in requests early enough, due to the considerable penalty of transitioning the processor to a high-performance state. This in turn entails violations...
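The responsiveness problem described above can be made concrete with a toy model: a core that aggressively sleeps below a utilization threshold pays a fixed wake-up penalty on the first request of a burst. This is my own illustration with invented parameters, not the paper's policy or numbers.

```python
# Toy model of an aggressive sleep policy: below `thresh` utilization
# the core enters a sleep state; the first request after a sleep
# period pays `wake_penalty` on top of the base service latency.

def serve(utilization_trace, base_lat=1.0, wake_penalty=5.0, thresh=0.3):
    asleep, lats = False, []
    for u in utilization_trace:
        if u < thresh:
            asleep = True            # aggressively enter a sleep state
        else:
            # a sudden burst pays the transition penalty before serving
            lats.append(base_lat + (wake_penalty if asleep else 0.0))
            asleep = False
    return lats

# A burst arriving right after an idle period suffers the penalty.
lats = serve([0.1, 0.1, 0.9, 0.9])
```

Even in this crude sketch, the first post-idle request is several times slower than steady-state requests, which is exactly the tail-latency (and hence SLO) risk the abstract points to.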
I/O performance plays a critical role in the overall performance of modern servers. The emergence of ultra high-speed I/O devices makes data movement between processors, main memory, and I/O devices a major bottleneck. Conventionally, main memory is used as an intermediate buffer because the processor cannot directly access device-side caches. The Data Direct I/O (DDIO) technology aims to reduce memory bandwidth utilization by enabling I/O devices to leverage the Last Level Cache (LLC) as a buffer. Our experimental results show that DDIO can completely eliminate this data movement to main memory while running...
There has been significant focus on offloading upper-layer network protocols (ULPs) to accelerators located on CPUs and SmartNICs. However, restricting accelerator placement to these locations limits both the variety of ULPs that can be accelerated and the overall performance. In particular, it overlooks the opportunity to accelerate ULPs running atop a stateful transport protocol in the face of high cache contention. That is, at high rates, frequent DRAM accesses and SmartNIC-CPU synchronizations outweigh the benefits of hardware...
While (I) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (II) storage disaggregation at the system infrastructure level and (III) integration of domain-specific accelerators at the hardware level. Each of these three trends individually provides significant benefits; however, when combined, the benefits diminish. On the convergence of these trends, this paper makes the observation that for serverless functions, the overhead of accessing disaggregated storage overshadows the gains from...
High-bandwidth network interface cards (NICs), each capable of transferring 100s of Gigabits per second, are making inroads into the servers of next-generation datacenters. Such unprecedented data delivery rates impose immense pressure, especially on the server's memory subsystem, as NICs first transfer data to DRAM before processing. To alleviate this pressure, the cache hierarchy has evolved, supporting a Data Direct I/O (DDIO) technology to directly place the data in the last-level cache (LLC). Subsequently, various policies have been explored...
Processor power management exploiting Dynamic Voltage and Frequency Scaling (DVFS) plays a crucial role in improving a datacenter's energy efficiency. However, we observe that the current power management policies of Linux (i.e., governors) often considerably increase the tail response time (i.e., violate a given Service Level Objective (SLO)) or the power consumption of latency-critical applications. Furthermore, previously proposed SLO-aware policies oversimplify network request processing and ignore the fact that requests arrive at the application layer...
The PCI-Express interconnect is the dominant interconnection technology within a single computer node that is used for connecting off-chip devices such as network interface cards (NICs) and GPUs to the processor chip. Its bandwidth and latency are often a bottleneck in processor, memory, and device interactions and impact the overall performance of the connected devices. Architecture simulators that focus on modeling processors and memory lack a detailed model of I/O interconnections. In this work, we implement a flexible and detailed PCIe model in the widely known architecture...
While (1) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (2) storage disaggregation at the system infrastructure level and (3) integration of domain-specific accelerators at the hardware level. Each of these three trends individually provides significant benefits; however, when combined, the benefits diminish. Specifically, this paper makes the key observation that for serverless functions, the overhead of accessing disaggregated persistent storage overshadows the gains from...
In this work, we set out to find the answers to the following questions: (1) Where are the bottlenecks in a state-of-the-art architectural simulator? (2) How much faster can simulations run by tuning system configurations? (3) What are the opportunities for accelerating software simulation using hardware accelerators? We choose gem5 as the representative simulator, run several simulations with various configurations, perform a detailed analysis of the source code on different server platforms, tune both system and simulator settings for running simulations,...
The advance of DRAM manufacturing technology is slowing down, whereas the density and performance needs continue to increase. This desire has motivated the industry to explore emerging Non-Volatile Memory (e.g., 3D XPoint) and high-density DRAM (e.g., Managed DRAM Solution). Since such memory technologies increase density at the cost of longer latency, lower bandwidth, or both, it is essential to use them with fast memory (e.g., conventional DRAM) to which hot pages are transferred at runtime. Nonetheless, we observe that such page transfers often block memory channels...
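The hot-page transfer mechanism above can be sketched as a simple threshold policy: count accesses to pages residing in slow memory and promote the ones accessed often enough to fast memory. The function name, counter scheme, and threshold are my own illustrative assumptions, not the paper's design.

```python
# Sketch of threshold-based hot-page promotion from slow memory
# (e.g., NVM or high-density DRAM) to fast conventional DRAM.
from collections import Counter

def promote_hot_pages(accesses, slow_pages, threshold=3):
    """Return the set of slow-memory pages whose access count in the
    observed window reaches `threshold`; these migrate to fast memory."""
    counts = Counter(a for a in accesses if a in slow_pages)
    return {p for p, c in counts.items() if c >= threshold}

# Page 7 is touched four times and gets promoted; page 9 stays cold.
hot = promote_hot_pages([7, 7, 7, 9, 7, 9], slow_pages={7, 9})
```

The cost hiding behind this sketch is the migration traffic itself: each promoted page occupies a memory channel while it is copied, which is the blocking effect the abstract goes on to address.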
Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory hierarchies that enhance memory-level parallelism (MLP), which is crucial for high performance. To support high MLP, shared last-level caches (LLCs) are divided into multiple banks, allowing parallel access. However, uneven distribution of cache requests from the cores, especially when requests from multiple cores are concentrated on a single bank, can result in significant contention affecting all cores that access the cache. Such bank contention can even be maliciously...
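The single-bank concentration described above follows directly from how bank indices are typically derived from addresses. The sketch below assumes a simple modulo mapping over cache-line addresses (64-byte lines, 8 banks); real LLCs use more complex hash functions, so these parameters are purely illustrative.

```python
# Illustrative LLC bank mapping: with 64-byte lines and 8 banks,
# the low-order bits of the line address select the bank.

LINE_BYTES, NUM_BANKS = 64, 8

def llc_bank(addr):
    """Bank index of a physical address under a simple modulo mapping."""
    return (addr // LINE_BYTES) % NUM_BANKS

# A core streaming with a 512-byte stride (= LINE_BYTES * NUM_BANKS)
# lands every access on the same bank, creating worst-case contention.
same_bank = {llc_bank(a) for a in range(0, 4096, LINE_BYTES * NUM_BANKS)}
```

Under this mapping, an access pattern whose stride is a multiple of `LINE_BYTES * NUM_BANKS` serializes on one bank, which is also why such contention can be triggered deliberately, as the abstract notes.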