- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Security and Verification in Computing
- Distributed and Parallel Computing Systems
- Advanced Graph Neural Networks
- Physical Unclonable Functions (PUFs) and Hardware Security
- Machine Learning and ELM
- Coding Theory and Cryptography
- Domain Adaptation and Few-Shot Learning
- Cryptographic Implementations and Security
- Cryptography and Residue Arithmetic
- Advanced Malware Detection Techniques
- Embedded Systems Design Techniques
- Advanced Data Compression Techniques
- Numerical Methods and Algorithms
- Adversarial Robustness in Machine Learning
- Machine Learning in Materials Science
- Data Quality and Management
- Distributed Systems and Fault Tolerance
- Multimodal Machine Learning Applications
- Ferroelectric and Negative Capacitance Devices
- Cloud Computing and Resource Management
- Cloud Data Security Solutions
University of Illinois Urbana-Champaign
2019-2025
Shanghai Jiao Tong University
2017-2019
The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations in memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). In particular, CXL-based technology has recently gained notable attention for its ability not only to expand memory economically but also to decouple memory from a specific CPU interface. However, since CXL devices have not been widely available, they have been emulated using DDR memory in a remote NUMA node. In this paper, for the first time, we comprehensively...
Deep Neural Networks (DNNs) play a key role in prevailing machine learning applications. Resistive random-access memory (ReRAM) is capable of both computation and storage, contributing to the acceleration of DNNs by processing in memory. Besides, a significant amount of zero weights is observed in DNNs, providing room to further reduce cost by skipping the ineffectual calculations associated with them. However, the irregular distribution of zero weights makes it difficult for resistive accelerators to take advantage of the sparsity as expected...
Memory deduplication plays a critical role in reducing memory consumption and the total cost of ownership (TCO) for hyperscalers, particularly as the advent of large language models imposes unprecedented demands on memory resources. However, conventional CPU-based deduplication can interfere with co-running applications, significantly impacting the performance of time-sensitive workloads. Intel introduced the on-chip Data Streaming...
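The logical flow of page-level deduplication referred to above can be sketched in plain software. The `deduplicate` helper below is purely illustrative (it is not Intel's DSA interface, which offloads the byte-compare/copy primitives to an on-chip engine): pages with identical contents are detected by hashing and collapsed onto one shared copy.

```python
# Minimal software sketch of page-level memory deduplication: identical
# pages are detected by content hash and mapped to a single shared copy.
# All names and the 4 KiB page size are illustrative.

import hashlib

def deduplicate(pages):
    """Return the unique page contents and, for each input page,
    the index of its canonical shared copy."""
    store = {}    # content hash -> index into `unique`
    mapping = []  # canonical index for each input page
    unique = []
    for page in pages:
        h = hashlib.sha256(page).hexdigest()
        if h not in store:
            store[h] = len(unique)
            unique.append(page)
        mapping.append(store[h])
    return unique, mapping

pages = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
unique, mapping = deduplicate(pages)
# Three pages collapse to two unique copies; mapping == [0, 1, 0]
```

The per-page hash-and-compare loop is exactly the kind of memory-bound work that, per the abstract, interferes with co-running applications when done on the CPU.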
Graph Neural Networks (GNNs) are becoming popular because they are effective at extracting information from graphs. To execute GNNs, CPUs are good platforms because of their high availability and terabyte-level memory capacity, which enables full-batch computation on large graphs. However, GNNs are heavily memory-bound, which limits their performance.
General Matrix Multiplication (GEMM) is the key operation in Deep Neural Networks (DNNs). While dense GEMM uses SIMD CPUs efficiently, sparse GEMM is much less efficient, especially at the modest levels of unstructured sparsity common in DNN inference/training. Thus, most DNNs use dense GEMM. In this paper, we propose SAVE, a novel vector engine for CPUs that efficiently skips the ineffectual computation due to zeros in dense GEMM implementations. SAVE's hardware extensions to the vector pipeline are transparent to software. SAVE accelerates FP32 and...
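As a rough software analogy of what a sparsity-aware vector engine does (SAVE itself skips zero operands inside the hardware vector pipeline, invisibly to software), the sketch below counts how many multiply-accumulates a zero-skipping GEMM loop avoids. All names and sizes are illustrative.

```python
# Software analogy of skipping ineffectual (zero-operand) work in GEMM.
# A zero entry of A contributes nothing to C, so its whole inner loop
# of multiply-accumulates can be elided.

def gemm_skip_zeros(A, B):
    """C = A @ B, skipping multiply-accumulates where A[i][k] == 0."""
    n, k_dim, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    macs = 0
    for i in range(n):
        for k in range(k_dim):
            a = A[i][k]
            if a == 0.0:          # ineffectual: skip the whole row of B
                continue
            for j in range(m):
                C[i][j] += a * B[k][j]
                macs += 1
    return C, macs

A = [[1.0, 0.0], [0.0, 2.0]]
B = [[3.0, 4.0], [5.0, 6.0]]
C, macs = gemm_skip_zeros(A, B)
# Dense GEMM would perform n*k*m = 8 MACs; zero-skipping performs 4 here.
```

At the modest, unstructured sparsity levels the abstract mentions, doing this test in software costs more than it saves, which is why SAVE moves the skipping into the vector pipeline.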
Many hardware-based defense schemes against speculative execution attacks use special mechanisms to protect instructions while they are speculative, and lift the protection when the instructions turn non-speculative. In this paper, we observe that instructions can sometimes become Speculation Invariant before turning non-speculative. Speculation invariance means that (i) whether the instruction will execute and (ii) the instruction's operands are not a function of speculative state. Hence, we propose lifting the protection on these instructions early: as soon as they become speculation invariant, we issue them without protection. As a result,...
We present a fast cyclic redundancy check (CRC) algorithm that performs CRC computation for any length of message in parallel. For a given message of any length, we first divide the message into blocks, each of which has a fixed size equal to the degree of the generator polynomial. Then we perform CRC computation among the blocks in parallel using Galois field multiplication and accumulation (GFMAC). Theoretically, our algorithm can achieve unlimited speedup over bit-serial or byte-wise table-lookup approaches at the expense of adding enough GFMAC units...
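The block decomposition described above can be sketched as follows. Since CRC(M) = M(x)·x^r mod G(x) is linear over XOR, each block's contribution (block · x^shift mod G) can be computed by an independent GFMAC and the partial results folded together. The generator polynomial (CRC-8, 0x107) and block size below are illustrative, not the paper's parameters.

```python
# Block-parallel CRC sketch: split the message into fixed-size blocks,
# multiply each block by a precomputed constant x^shift mod G(x) in GF(2),
# and XOR-fold the reduced partial results.

def clmul(a, b):
    """Carry-less (GF(2)) polynomial multiplication."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(p, g):
    """Reduce polynomial p modulo generator g (both as bit masks)."""
    dg = g.bit_length() - 1
    while p.bit_length() - 1 >= dg:
        p ^= g << (p.bit_length() - 1 - dg)
    return p

def crc_serial(msg, g):
    """Reference CRC: (msg(x) * x^deg(g)) mod g(x)."""
    return poly_mod(msg << (g.bit_length() - 1), g)

def crc_blockwise(blocks, block_bits, g):
    """CRC of the concatenated blocks; each term is one independent GFMAC."""
    r = g.bit_length() - 1
    acc = 0
    n = len(blocks)
    for i, blk in enumerate(blocks):
        shift = (n - 1 - i) * block_bits + r   # bit position of this block
        const = poly_mod(1 << shift, g)        # precomputable per lane
        acc ^= poly_mod(clmul(blk, const), g)  # GF multiply-accumulate
    return acc
```

Because every block's term depends only on its own contents and a precomputed constant, the loop body maps directly onto parallel GFMAC units, which is the source of the claimed speedup.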
Our community has greatly improved the efficiency of deep learning applications, including by exploiting sparsity in inputs. Most of that work, though, is for inference, where the weight sparsity is known statically, and/or it requires specialized hardware. We propose a scheme to leverage dynamic sparsity during training. In particular, we exploit the zeros introduced by the ReLU activation function in both feature maps and their gradients. This is challenging because the degree of sparsity is moderate and the locations of zeros change over time. We also rely purely on software...
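A minimal illustration (not the paper's vectorized implementation) of the dynamic sparsity in question: ReLU zeroes entries of the feature map in the forward pass and of its gradient in the backward pass, so the multiply-accumulates touching those entries can be skipped. All names are illustrative.

```python
# ReLU-induced dynamic sparsity: zeros appear in both the activations and
# their gradients, at positions that change every iteration.

def relu(x):
    return [v if v > 0 else 0.0 for v in x]

def relu_grad(x, upstream):
    # The gradient is zero wherever the pre-activation was non-positive.
    return [g if v > 0 else 0.0 for v, g in zip(x, upstream)]

def dot_skip_zeros(a, w):
    """Dot product that skips positions where the activation is zero."""
    total, macs = 0.0, 0
    for ai, wi in zip(a, w):
        if ai == 0.0:
            continue
        total += ai * wi
        macs += 1
    return total, macs

pre = [-1.0, 2.0, -3.0, 4.0]
act = relu(pre)                    # [0.0, 2.0, 0.0, 4.0]
w = [0.5, 0.5, 0.5, 0.5]
y, macs = dot_skip_zeros(act, w)   # only 2 of the 4 MACs are performed
```

Because the zero locations shift as training proceeds, any skipping scheme must detect them at run time rather than precompute a sparse format, which is exactly the difficulty the abstract highlights.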
In this paper we present an optimized processor for fast Reed-Solomon (RS) encoding and decoding, built on a configurable core with parallel Galois field multiplication and accumulation (GFMAC) units. With this processor, maximum performance gains can be achieved for RS encoding/decoding over traditional implementations. Our implementation only requires adding a small number of logic gates for the customized GFMAC instructions and maintains fewer registers. The processor is quite flexible and compact, supporting different coding standards. Compared...
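The GFMAC primitive at the heart of such a design reduces to a Galois-field multiply followed by an XOR-accumulate. A plain-software version over GF(2^8) is sketched below; the reduction polynomial 0x11B (x^8+x^4+x^3+x+1) is used purely for illustration, since a real codec uses whichever polynomial the coding standard fixes.

```python
# One GF(2^8) multiply-accumulate step of the kind an RS encoder/decoder
# inner loop performs on a GFMAC unit.

def gf256_mul(a, b, poly=0x11B):
    """Multiply a and b in GF(2^8) with the given reduction polynomial."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:       # degree-8 overflow: reduce modulo poly
            a ^= poly
    return r

def gfmac(acc, a, b):
    """acc += a*b in GF(2^8): addition in GF(2^n) is XOR."""
    return acc ^ gf256_mul(a, b)
```

Fusing the multiply and the XOR into one custom instruction is what lets the processor keep the RS inner loop to a handful of cycles per symbol while staying configurable across standards.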
Training Convolutional Neural Networks (CNNs) is both memory- and computation-intensive. Resistive random access memory (ReRAM) has shown its advantage in accelerating such tasks with high energy efficiency. However, the ReRAM-based pipeline architecture suffers from low utilization of computing resources, caused by imbalanced data throughput across stages due to the inherent down-sampling effect of CNNs and the inflexible usage of ReRAM cells. In this paper, we propose a novel bidirectional...
With the development of cloud computing, disk arrays tolerating triple failures (3DFTs) are receiving more attention nowadays because they can provide high data reliability at low monetary cost. However, a challenging issue in these arrays is how to efficiently reconstruct lost data, especially for partial stripe errors (e.g., sector and chunk errors), which is one of the most significant scenarios in practice. Existing cache strategies are not efficient for the reconstruction of 3DFTs, which involves complex relationships among...
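To make the reconstruction problem concrete, the sketch below shows the deliberately simplified single-parity case: a lost chunk is the XOR of the surviving chunks and the parity. A 3DFT keeps three parity chunks with more complex inter-chunk relationships so it can repair up to three failures; the chunk sizes and layout here are illustrative only.

```python
# Single-parity stripe reconstruction (a simplification of the 3DFT case).

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(chunks):
    """Parity chunk = XOR of all data chunks in the stripe."""
    p = bytes(len(chunks[0]))
    for c in chunks:
        p = xor_bytes(p, c)
    return p

def reconstruct(surviving, parity):
    """Recover the single missing chunk from the survivors and the parity."""
    lost = parity
    for c in surviving:
        lost = xor_bytes(lost, c)
    return lost

chunks = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
parity = make_parity(chunks)
# Suppose chunks[1] is lost to a chunk error:
recovered = reconstruct([chunks[0], chunks[2]], parity)
```

Even in this simple case, reconstruction must read every surviving chunk of the stripe, which is why caching the right chunks matters so much for partial stripe errors.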
In security frameworks for speculative execution, an instruction is said to reach its Visibility Point (VP) when it is no longer vulnerable to pipeline squashes. Before a potentially leaky instruction reaches its VP, it has to stall unless a defense scheme such as invisible speculation provides protection. Unfortunately, either stalling or protecting the execution of pre-VP instructions typically incurs a performance cost.
Knowledge graphs have been widely used in fact checking, owing to their capability to provide crucial background knowledge that helps verify claims. Traditional fact-checking works mainly focus on analyzing a single claim but have largely ignored analysis of the semantic consistency of pair-wise claims, despite its key importance in real-world applications, e.g., multimodal fake news detection. This paper proposes a neural network based model, INSPECTOR, for pair-wise fact checking. Given a pair of claims, INSPECTOR aims to detect the potential inconsistency of the input...