Xiaochen Guo

ORCID: 0000-0001-7704-0412
Research Areas
  • Advanced Memory and Neural Computing
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Ferroelectric and Negative Capacitance Devices
  • Neuroscience and Neural Engineering
  • Neural dynamics and brain function
  • Network Packet Processing and Optimization
  • Distributed systems and fault tolerance
  • Graph Theory and Algorithms
  • Interconnection Networks and Systems
  • Blind Source Separation Techniques
  • Metallurgical Processes and Thermodynamics
  • Speech and Audio Processing
  • Magnetic properties of thin films
  • Neural Networks and Reservoir Computing
  • 3D IC and TSV technologies
  • Advanced Neural Network Applications
  • Low-power high-performance VLSI design
  • CCD and CMOS Imaging Sensors
  • Chaos control and synchronization
  • Photoreceptor and optogenetics research
  • Advanced Graph Neural Networks
  • Advanced Algorithms and Applications
  • Solar-Powered Water Purification Methods
  • Advanced Photonic Communication Systems

Lehigh University
2016-2023

Qualcomm (United States)
2023

ETH Zurich
2020

Universidad del Noreste
2020

Webb Institute
2020

Faculty of Media
2020

Harbin Engineering University
2017-2019

Shandong Iron and Steel Group (China)
2019

University of Science and Technology Beijing
2015-2016

University of Rochester
2010-2015

As CMOS scales beyond the 45nm technology node, leakage concerns are starting to limit microprocessor performance growth. To keep dynamic power constant across process generations, traditional MOSFET scaling theory prescribes reducing supply and threshold voltages in proportion to device dimensions, a practice that induces an exponential increase in subthreshold leakage. As a result, leakage power has become comparable to dynamic power in current-generation processes, and will soon exceed it in magnitude if voltages are scaled down any further. Beyond...

10.1145/1815961.1816012 article EN 2010-06-19

With technology scaling, on-chip power dissipation and off-chip memory bandwidth have become significant performance bottlenecks in virtually all computer systems, from mobile devices to supercomputers. An effective way of improving performance in the face of these limitations is to rely on associative memory systems. Recent work on a PCM-based, resistive TCAM accelerator shows that associative search capability can reduce both off-chip bandwidth demand and overall system energy. Unfortunately, previously proposed resistive TCAM accelerators have limited flexibility: only...

10.1145/2485922.2485939 article EN 2013-06-23

Power dissipation and off-chip bandwidth restrictions are critical challenges that limit microprocessor performance. Ternary content addressable memories (TCAM) hold the potential to address both problems in the context of a wide range of data-intensive workloads that benefit from associative search capability. Power dissipation is reduced by eliminating the instruction processing and data movement overheads present in a purely RAM based system. Bandwidth demand is lowered by processing data directly on the TCAM chip, thereby decreasing off-chip traffic....

10.1145/2155620.2155660 article EN 2011-12-03
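The associative-search capability described above can be illustrated with a small software model of TCAM match semantics. This is a sketch only: the class and bit widths are invented for illustration, and a real TCAM compares all entries in parallel in a single cycle rather than scanning them.

```python
# Sketch of TCAM search semantics: each stored entry is a (value, mask)
# pair; a mask bit of 0 means "don't care". A search key matches an
# entry if they agree on every cared-about bit. Hardware evaluates all
# entries in parallel; this software model scans them sequentially.
class TCAM:
    def __init__(self, width):
        self.width = width
        self.entries = []  # list of (value, mask) pairs

    def write(self, value, mask):
        self.entries.append((value, mask))

    def search(self, key):
        """Return indices of all matching entries."""
        return [i for i, (v, m) in enumerate(self.entries)
                if (key ^ v) & m == 0]

t = TCAM(width=8)
t.write(0b1010_0000, 0b1111_0000)  # matches any key whose high nibble is 1010
t.write(0b1010_1111, 0b1111_1111)  # exact match only
hits = t.search(0b1010_0110)       # → [0]
```

Because matching happens where the data is stored, a search replaces many RAM reads plus comparisons with a single associative operation, which is the source of the bandwidth and energy savings the abstract describes.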

Hardware prefetching is an efficient mechanism to hide cache miss penalties. Accuracy, coverage, and timeliness are three primary metrics in evaluating prefetcher performance. Highly accurate hardware prefetchers are desired to predict the complex memory access patterns of multicore systems. In this paper, we propose a long short-term memory (LSTM) prefetcher---a neural network based prefetcher. Offline experiments show that the proposed LSTM prefetcher achieves higher accuracy and better coverage on a set of evaluated traces.

10.1145/3132402.3132405 article EN Proceedings of the International Symposium on Memory Systems 2017-10-02
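The paper's LSTM model itself is not reproduced here; as a stand-in, the underlying framing of prefetching as next-address-delta sequence prediction can be sketched with a simple first-order Markov table over deltas. All names below are illustrative.

```python
# Minimal sketch: prefetching as next-delta prediction. The paper
# trains an LSTM offline on access traces; here a first-order Markov
# table over address deltas stands in for the learned sequence model.
from collections import defaultdict, Counter

class DeltaPrefetcher:
    def __init__(self):
        self.table = defaultdict(Counter)  # last_delta -> Counter of next deltas
        self.last_addr = None
        self.last_delta = None

    def observe(self, addr):
        """Feed one demand access; return a predicted prefetch address or None."""
        pred = None
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if self.last_delta is not None:
                self.table[self.last_delta][delta] += 1  # learn the transition
            if self.table[delta]:
                best = self.table[delta].most_common(1)[0][0]
                pred = addr + best                       # predict next address
            self.last_delta = delta
        self.last_addr = addr
        return pred

trace = [0, 64, 128, 192, 256, 320]           # a simple stride-64 stream
pf = DeltaPrefetcher()
preds = [pf.observe(a) for a in trace]        # → [None, None, 192, 256, 320, 384]
```

Accuracy (fraction of issued prefetches later used) and coverage (fraction of misses removed) can then be measured against the trace; an LSTM replaces the table with a model that can capture much longer, interleaved delta histories.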

As CMOS scales beyond the 45nm technology node, leakage concerns are starting to limit microprocessor performance growth. To keep dynamic power constant across process generations, traditional MOSFET scaling theory prescribes reducing supply and threshold voltages in proportion to device dimensions, a practice that induces an exponential increase in subthreshold leakage. As a result, leakage power has become comparable to dynamic power in current-generation processes, and will soon exceed it in magnitude if voltages are scaled down any further. Beyond...

10.1145/1816038.1816012 article EN ACM SIGARCH Computer Architecture News 2010-06-19

With technology scaling, on-chip power dissipation and off-chip memory bandwidth have become significant performance bottlenecks in virtually all computer systems, from mobile devices to supercomputers. An effective way of improving performance in the face of these limitations is to rely on associative memory systems. Recent work on a PCM-based, resistive TCAM accelerator shows that associative search capability can reduce both off-chip bandwidth demand and overall system energy. Unfortunately, previously proposed resistive TCAM accelerators have limited flexibility: only...

10.1145/2508148.2485939 article EN ACM SIGARCH Computer Architecture News 2013-06-23

Flow structures were investigated in a dissipative ladle shroud (DLS) and tundish using Large Eddy Simulation. The numerical results were validated inside the DLS against PIV experiments. Velocity distribution, vorticity islands and strain rate were analyzed respectively, and compared with those of a bell-shaped ladle shroud (BLS). The results showed that the three chambers gave rise to velocity differences, fluctuating strain rates and vortices, and promoted an increase in the turbulence dissipation rate; the average outflow velocity ranged from 0.25 to 0.5 m/s when the inlet velocity was 0.708 m/s....

10.2355/isijinternational.isijint-2015-085 article EN cc-by-nc-nd ISIJ International 2015-01-01

A field-assisted spin-torque transfer magnetoresistive RAM (STT-MRAM) cache is presented for use in high-performance, energy-efficient microprocessors. Adding field assistance reduces switching latency by a factor of 4. An array model is developed to evaluate energy at different switching currents and array sizes. Several STT-MRAM-based cell designs demonstrate a 55% energy reduction as compared with an SRAM-based cache subsystem. An STT-MRAM cache with subbank buffering and differential writes improves system performance by 28%, with a 6.7% increase in energy.

10.1109/tvlsi.2015.2401577 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2015-03-03

DRAM density scaling has become increasingly difficult due to challenges in maintaining a sufficiently high storage capacitance and low leakage current at nanoscale feature sizes. Non-volatile memories (NVMs) have drawn significant attention as potential replacements because they represent information using resistance rather than electrical charge. Spin-torque transfer magnetoresistive RAM (STT-MRAM) is one of the most promising NVM technologies due to its relatively low write energy, high speed, and high endurance....

10.1109/tc.2017.2779151 article EN publisher-specific-oa IEEE Transactions on Computers 2017-12-04

Power dissipation and memory bandwidth are significant performance bottlenecks in virtually all computer systems. Associative computing with ternary content addressable memory (TCAM) holds the potential to address both problems for a wide range of data intensive workloads. Power is reduced by eliminating the instruction processing and data movement overheads present in a purely RAM-based system. Bandwidth demand is lowered by processing data directly on the TCAM chip, thereby decreasing off-chip traffic. Unfortunately, existing SRAM-based TCAM cells are more...

10.1109/mm.2015.89 article EN IEEE Micro 2015-09-01

A novel gas blowing mode with different flowrates for the two plugs of a metallurgical ladle is explored and studied through a sophisticated water model. The results show that this mode can efficiently decrease the mixing time and the total area of the slag eye in most cases, as compared with the conventional mode using the same plugs. Generally, a relatively close angle between the porous plugs and a small radial position are beneficial to mixing in the bath, while a far plug leads to a smaller slag eye. In addition, feeding tracers from the middle of the dual plugs is proven to be very suitable for this ladle. The slag layer will...

10.1080/03019233.2019.1576270 article EN Ironmaking & Steelmaking Processes Products and Applications 2019-02-10

The state-of-the-art deep neural network (DNN) models use pruning to avoid over-fitting and reduce the number of parameters. In order to improve storage and computational efficiency, only the nonzero elements are stored, and their locations are encoded into a sparse format. Sparse General Matrix Multiplication (SpGEMM) is a kernel computation in DNN-based applications. One challenge in computing SpGEMM is avoiding multiplying by zero while keeping hardware utilization high in accelerators that consist of processing element (PE) arrays....

10.1109/hpca56546.2023.10070977 article EN 2023-02-01
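The zero-skipping property that motivates sparse formats can be seen in a plain-Python CSR SpGEMM sketch: only stored nonzeros are ever multiplied. This is illustrative code, not the paper's accelerator dataflow.

```python
# Sketch of why sparse formats let hardware skip zero multiplies:
# matrices are stored in CSR (per-row values + column indices), and
# SpGEMM visits only stored nonzeros, never multiplying by zero.
def dense_to_csr(m):
    vals, cols, ptr = [], [], [0]
    for row in m:
        for j, x in enumerate(row):
            if x != 0:
                vals.append(x)
                cols.append(j)
        ptr.append(len(vals))
    return vals, cols, ptr

def spgemm(a, b, n):
    """C = A*B with A, B in CSR; returns a dense n x n result for clarity."""
    av, ac, ap = a
    bv, bc, bp = b
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for k_idx in range(ap[i], ap[i + 1]):      # nonzeros of row i of A
            k, a_ik = ac[k_idx], av[k_idx]
            for j_idx in range(bp[k], bp[k + 1]):  # nonzeros of row k of B
                c[i][bc[j_idx]] += a_ik * bv[j_idx]
    return c

A = [[1, 0], [0, 2]]
B = [[0, 3], [4, 0]]
C = spgemm(dense_to_csr(A), dense_to_csr(B), 2)  # → [[0, 3], [8, 0]]
```

In a PE-array accelerator the hard part is not skipping the zeros (the format does that for free) but scheduling the surviving irregular multiplies so the PEs stay busy, which is the utilization challenge the abstract refers to.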

Convolutional neural networks have been proposed as an approach for classifying data corresponding to labeled and unlabeled datasets. The fast-growing volume of data empowers deep learning algorithms to achieve higher accuracy. Numerous trained models have been proposed, which involve complex computation and increasing network depth. The main challenges of implementing convolutional neural networks are high energy consumption, on-chip and off-chip bandwidth requirements, and a large memory footprint. Different types of communication traffic distribution methods...

10.1145/3313231.3352378 article EN 2019-09-26

In this paper, we develop an in-memory analog computing (IMAC) architecture realizing both synaptic behavior and activation functions within non-volatile memory arrays. Spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices are leveraged to realize sigmoidal neurons as well as binarized synapses. First, it is shown that the proposed IMAC architecture can be utilized as a multilayer perceptron (MLP) classifier, achieving orders of magnitude performance improvement compared to previous mixed-signal and digital...

10.1109/isvlsi51109.2021.00043 preprint EN 2021-07-01

DRAM refresh is responsible for significant performance and energy overheads in a wide range of computer systems, from mobile platforms to datacenters [1]. With the growing demand for DRAM capacity and the worsening retention time characteristics of deeply scaled DRAM, refresh is expected to become an even more pronounced problem in future technology generations [2]. This paper examines content aware refresh, a new technique that reduces refresh frequency by exploiting the unidirectional nature of retention errors: assuming that a logical 1 and a logical 0 respectively are...

10.1109/tc.2018.2868338 article EN IEEE Transactions on Computers 2018-09-06
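A toy model of the content-aware idea follows; the intervals and the one-directional failure model here are illustrative assumptions, not the paper's parameters.

```python
# Sketch of content-aware refresh: retention errors are assumed to be
# unidirectional (a stored 1 can decay toward 0, but a 0 cannot flip
# to 1), so only a row's 1 bits are vulnerable. A row with no
# vulnerable bits cannot lose data and may use a relaxed refresh
# interval. The millisecond values below are invented for illustration.
BASE_INTERVAL_MS = 64       # conventional worst-case refresh interval
RELAXED_INTERVAL_MS = 256   # hypothetical relaxed interval

def refresh_interval(row_bits):
    """Pick a refresh interval based on the row's content."""
    return BASE_INTERVAL_MS if 1 in row_bits else RELAXED_INTERVAL_MS

rows = [[0, 0, 0, 0],   # nothing to decay -> relaxed refresh
        [1, 0, 1, 0],   # vulnerable bits  -> base refresh
        [0, 0, 0, 0]]
intervals = [refresh_interval(r) for r in rows]  # → [256, 64, 256]
```

The real mechanism must also handle cells with the opposite failure direction (via per-row polarity or encoding), but the content-dependent interval selection above is the core of the frequency reduction.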

Applications with irregular memory access patterns do not benefit as well from the memory hierarchy as applications that have good locality do. A relatively high miss ratio and long memory latency can cause the processor to stall and degrade system performance. Prefetching can help hide the miss penalty by predicting which addresses will be accessed in the near future and issuing memory requests ahead of time. However, software prefetchers add instruction overhead, whereas hardware prefetchers cannot efficiently predict irregular access sequences with high accuracy. Fortunately, many...

10.1109/micro50266.2020.00057 article EN 2020-10-01
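One common irregular pattern, indirect accesses of the form A[B[i]], shows why prefetching such sequences is feasible: once the index array B is being streamed, the addresses of upcoming A elements can be computed before the demand access reaches them. The sketch below uses invented names and a fixed prefetch distance.

```python
# Sketch of indirect-access prefetching for code shaped like A[B[i]]:
# while iterating over the index array B, compute the address of
# A[B[i + distance]] ahead of time. base_a, elem_size, and distance
# are illustrative parameters, not values from the paper.
def indirect_prefetch_addrs(base_a, elem_size, index_stream, distance=4):
    """For each position i, the address to prefetch: &A[B[i + distance]]."""
    out = []
    for i in range(len(index_stream) - distance):
        out.append(base_a + index_stream[i + distance] * elem_size)
    return out

B = [7, 2, 9, 4, 1, 8, 3, 6]                  # irregular index stream
addrs = indirect_prefetch_addrs(base_a=0x1000, elem_size=8,
                                index_stream=B, distance=4)
# addrs → [0x1008, 0x1040, 0x1018, 0x1030]
```

A stride prefetcher sees the resulting A accesses as random, but a mechanism aware of the base-plus-scaled-index relationship can achieve both high accuracy and timeliness on this pattern.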

Neural networks have shown great potential in many applications like speech recognition, drug discovery, image classification, and object detection. Neural network models are inspired by biological neural networks, but they are optimized to perform machine learning tasks on digital computers. The proposed work explores the possibility of using living neural networks in vitro as basic computational elements for machine learning applications. A new supervised STDP-based learning algorithm is proposed in this work, which considers neuron engineering...

10.1109/icassp.2018.8462502 article EN 2018-04-01

In the GPS system, the signal received by the receiver is easily affected by spoofing jamming, which introduces errors into the pseudo-range measurement and degrades navigation and positioning. Therefore, this paper proposes an anti-spoofing algorithm based on an improved particle filter. By introducing positioning correction and M-estimation theory, simulation results and analyses verify that the proposed algorithm achieves the goal of eliminating spoofing jamming and correcting the error.

10.1109/usnc-ursi.2018.8602856 article EN 2018-07-01

DRAM contributes a significant part of the total system energy consumption, and row activation is one of the most inefficient components. Prior works on fine-grained activation rely on increasing the number of local wires to avoid degrading performance, which adds area overheads. This work proposes interleaved I/O to allow data transfers from different partially activated banks to share the global I/O. The proposed architecture allows half-, quarter-, or one-eighth-page activations without changing the local wires. The performance...

10.1109/islped.2017.8009201 article EN 2017-07-01

The increasing number of cores challenges the scalability of chip multiprocessors. Recent studies have proposed the idea of disintegration, partitioning a large chip into multiple smaller chips and using silicon interposer-based (2.5D) integration to connect them. This method can improve yield, but as the number of small chips increases, chip-to-chip communication becomes a performance bottleneck.

10.1145/3313231.3352363 article EN 2019-09-26

General-purpose computing systems employ memory hierarchies to provide the appearance of a single large, fast, coherent memory. In special-purpose CPUs, programmers manually manage distinct, non-coherent scratchpad memories. In this article, we combine these mechanisms by adding a virtually addressed, set-associative scratchpad to a general purpose CPU. Our scratchpad exists alongside a traditional cache and is able to avoid many of the programming challenges associated with scratchpads without sacrificing generality (e.g., ...

10.1145/3436730 article EN ACM Transactions on Architecture and Code Optimization 2020-12-30