Rui Wang

ORCID: 0000-0003-2741-6033
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Advanced Memory and Neural Computing
  • Interconnection Networks and Systems
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Integrated Circuits and Semiconductor Failure Analysis
  • Cloud Computing and Resource Management
  • Low-power high-performance VLSI design
  • Embedded Systems Design Techniques
  • Real-Time Systems Scheduling
  • Distributed systems and fault tolerance
  • VLSI and Analog Circuit Testing
  • Advanced Neural Network Applications
  • Distributed and Parallel Computing Systems
  • Machine Learning and Data Classification
  • Power Systems and Renewable Energy
  • Caching and Content Delivery
  • Machine Learning and ELM
  • Cloud Data Security Solutions
  • Machine Learning and Algorithms
  • Cloud Computing and Remote Desktop Technologies
  • Domain Adaptation and Few-Shot Learning
  • Computational Geometry and Mesh Generation
  • Cell Image Analysis Techniques
  • Advanced Malware Detection Techniques

Beihang University
2014-2024

Intrinsic LifeSciences (United States)
2020-2023

Institute of Software
2010-2021

NARI Group (China)
2019

University of California, Davis
2011-2014

The qualities of Physical Unclonable Functions (PUFs) suffer from several noticeable degradations due to silicon aging. In this paper, we investigate the long-term effects of aging on PUFs derived from the start-up behavior of Static Random Access Memories (SRAM). Previous research on SRAM aging is based on transistor-level simulation or accelerated aging tests at high temperature and voltage to observe aging effects within a short period of time. In contrast, we have run a continuous power-up test on 16 Arduino Leonardo boards under nominal conditions for two...

10.23919/date48585.2020.9116353 preprint EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2020-03-01

Modern GPUs have shown promising results in accelerating compute-intensive and numerical workloads with limited data sharing. However, emerging GPU applications manifest an ample amount of data sharing among concurrently executing threads. Often, data sharing requires a mutual exclusion mechanism to ensure data integrity in a multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, the existing GPU lock implementations either incur frequent concurrency bugs, or lead...

10.1145/2903150.2903155 article EN 2016-05-16

Due to the cost-effective, massive computational power of graphics processing units (GPUs), there is a growing interest in utilizing GPUs in real-time systems. For example, GPUs have been applied to automotive systems to enable new advanced and intelligent driver assistance technologies, accelerating the path to self-driving cars. In such systems, GPUs are shared among tasks with mixed timing constraints: real-time (RT) tasks that have to be accomplished before specified deadlines, and non-real-time, best-effort (BE) tasks. In this paper, (1) we...

10.1145/2925426.2926265 article EN 2016-06-01

Physically Unclonable Function (PUF) modules are a useful hardware security primitive due to their uniqueness and their non-reproducible and unclonable features. Generally, a PUF is used for two applications: secret key generation for cryptographic use and/or device authentication/provenance of Integrated Circuits (ICs) [1]. PUFs exploit process variation (e.g., gate oxide thickness, size, threshold voltage) that occurs naturally during the fabrication of ICs. Although ICs are fabricated from identical layouts,...

10.1109/mdat.2023.3322621 article EN IEEE Design and Test 2023-10-06

Deep learning compilers with auto-tuners have the ability to generate high-performance programs, particularly tensor programs on accelerators. However, the performance of these programs is shape-sensitive and hardware resource-sensitive. When the shape is only known at runtime instead of compile time, auto-tuners must tune the programs for every possible shape, leading to significant time and cost overhead. Additionally, if a program tuned for one device is deployed on a different device, the performance may not be as optimal as before. To address these challenges, we propose...

10.1109/tc.2023.3288758 article EN IEEE Transactions on Computers 2023-06-23

Many constrained devices of the Internet of Things (IoT) operate under low power and with limited computational and network resources. The devices cannot use standard security protocols to protect end-to-end security because of limited resources, and they become the weakness of IoT. Narrow Band IoT (NB-IoT) has broad application prospects in production management, life-cycle asset management and smart power utilization in the power grid. Its characteristics and the security demands of the application domain present a challenge for the electric power business. In order to improve high data transmission,...

10.1109/ei247390.2019.9062264 article EN 2019-11-01

As energy consumption has been surging in an unsustainable way, it is important to understand the impact of existing architecture designs from an energy efficiency perspective, which is especially valuable for High Performance Computing (HPC) and datacenter environments hosting tens of thousands of servers. One obstacle hindering the advance of comprehensive evaluation of energy efficiency is the deficient power measuring approach. Most energy studies rely on either external power meters or power models, and both of these two methods contain intrinsic drawbacks in their...

10.1371/journal.pone.0188428 article EN cc-by PLoS ONE 2017-11-21

High bandwidth 3-D-stacked dynamic random access memory (DRAM) has been proposed to address the memory wall in modern systems, especially when it is used as a large last-level cache (LLC). However, stacking DRAM directly on top of a processor significantly impedes the efficiency of cooling, potentially causing thermal issues in both the processor and the DRAM. Dynamic thermal management (DTM) based on temperature can be heavily intrusive because the normal working temperature limit for DRAM is lower than the processor limit. This paper shows that in many cases it is better to disable hot...

10.1109/tcad.2019.2927528 article EN publisher-specific-oa IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2019-07-09

As multi-core/many-core becomes the trend of processor architecture, conflict in the shared cache has become a more and more serious problem that restricts the performance improvement of parallel programs. Recent research employed a page coloring mechanism to realize cache partitioning on real systems for the purpose of reducing cache conflict. However, page coloring-based partitioning has some side effects: one concerns the memory space an application can allocate from, which may lead to memory pressure; another is that changing the partition dynamically needs massive page copying, which will incur large...

10.1109/pdcat.2012.88 article EN 2012-12-01

Modern processors mostly use cache to hide the memory access latency, so cache performance is very important to the performance of an application program. A detailed cache performance analysis will provide programmers a clear view of their program behaviors, which can help them identify the bottleneck and optimize the source code. As the chip industry turns to integrating multiple cores into one chip, the multi-core/many-core processor becomes a new approach to maintain Moore's Law. Therefore, parallel programs will be more and more important even in personal computers. In parallel...

10.1109/csc.2013.11 article EN 2013-11-01

10.1007/s11432-015-5352-4 article EN Science China Information Sciences 2015-06-15

As DRAM is facing scaling difficulty in terms of energy cost and reliability, some nonvolatile storage materials were proposed to be the substitute or supplement of main memory. Phase Change Memory (PCM) is one of the most promising nonvolatile memories that could be put into use in the near future. However, before becoming a qualified main memory technology, PCM should be designed reliably so that it can ensure the computer system's stable running even when errors occur. The typical wear-out errors in PCM have been well studied, but the transient errors, caused by...

10.1371/journal.pone.0131964 article EN cc-by PLoS ONE 2015-07-09

We present RESCURE, a security solution built on software, which retrofits Internet of Things (IoT) devices to secure ones. RESCURE exploits the entropy originating from random variations of silicon (transistors) during manufacturing and generates a unique and unforgeable root key and an identity per device. In this way, the root key and identity are inseparable from the IoT hardware. To achieve lifetime reliability (reproducibility) and security (randomness) for the root key and identity, we apply error correcting and randomness amplification algorithms on the signals derived...

10.1145/3407023.3407075 article EN Proceedings of the 17th International Conference on Availability, Reliability and Security 2020-07-30

We propose aviation real-time adaptive ring (AVATAR) as a potential solution for the integrated communication infrastructure of future aero-engine control systems. AVATAR features an Ethernet-over-WDM (wavelength-division multiplexing) architecture. It employs reconfigurable optical add/drop multiplexer (ROADM) node technology. Compared with the existing serialized data bus, e.g., time-triggered protocol (TTP), AVATAR exploits the multi-wavelength and spatial reuse properties of WDM through sophisticated...

10.1109/taes.2013.110758 article EN IEEE Transactions on Aerospace and Electronic Systems 2014-01-01

SRAM Physical Unclonable Functions (PUFs) are, among other things, today commercially used for secure primitives such as key generation and authentication. The quality of the PUFs, and hence of the security primitives, depends on intrinsic variations which are technology dependent. Therefore, to sustain the commercial usage of PUFs for cutting-edge technologies, it is important to properly model and evaluate their reliability. In this work, we evaluate the SRAM PUF reliability using within class Hamming distance (WCHD) for 16nm, 14nm, and 7nm using simulations...

10.23919/date54114.2022.9774735 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2022-03-14
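
As a minimal sketch of the within class Hamming distance (WCHD) metric named in the abstract above, the following Python snippet computes the mean fraction of bits that differ between a reference response and repeated readouts of the same device. The function names and the sample bit vectors are hypothetical illustrations and are not taken from the paper; the normalization used here (mean fraction of flipped bits) is one common convention and is an assumption about how the metric is defined in that work.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("bit sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def within_class_hd(reference, readouts):
    """Mean fractional Hamming distance between a reference response and
    repeated readouts of the SAME device (lower means more reliable)."""
    if not readouts:
        raise ValueError("at least one readout is required")
    total_flips = sum(hamming_distance(reference, r) for r in readouts)
    return total_flips / (len(readouts) * len(reference))


# Hypothetical 8-bit start-up pattern and three later power-ups of one device.
reference = [1, 0, 1, 1, 0, 0, 1, 0]
readouts = [
    [1, 0, 1, 1, 0, 0, 1, 0],  # no bit flips
    [1, 0, 1, 0, 0, 0, 1, 0],  # one bit flip
    [0, 0, 1, 1, 0, 1, 1, 0],  # two bit flips
]

wchd = within_class_hd(reference, readouts)
print(f"WCHD = {wchd:.3f}")  # 3 flipped bits out of 24 compared -> 0.125
```

A WCHD close to 0 means the device reproduces its response consistently across power-ups, which is what key generation needs before error correction is applied.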

A novel communication platform, AViAtion real-Time Adaptive Ring (AVATAR), using Ethernet-over-WDM is proposed to support a real-time aero-engine control network. Optimal frame layouts for different network configurations are used to guide the design.

10.1364/ofc.2011.othz4 article EN 2011-01-01

Marine science and numerical modeling (MASNUM) is widely used in forecasting ocean wave movement, through simulating the variation tendency of the ocean wave. Although efforts have been devoted to improving the performance of MASNUM from various aspects by existing work, there is still large space unexplored for further performance improvement. In this paper, we aim at improving the performance of the propagation solver and data access during the simulation, in addition to the efficiency of output I/O and load balance. Our optimizations include several effective...

10.1371/journal.pone.0169130 article EN cc-by PLoS ONE 2017-01-03

The efficiency of caches plays a vital role in microprocessors. In this paper, we introduce a novel and flexible cache substrate, which integrates nonvolatile memory devices into the standard SRAM cells. By allowing the nonvolatile SRAM (NV-SRAM) cell to store inconsistent data between the SRAM portion and the NV portion, we show that the proposed NV2-SRAM not only provides enriched functionalities, but also allows simultaneous multiple...

10.1109/tcad.2016.2582872 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2016-06-22

As more emerging applications are moving to GPUs, thread-level synchronization has become a requirement. However, GPUs only provide warp-level and thread-block-level rather than thread-level synchronization. Moreover, it is highly possible to cause live-locks by using CPU synchronization mechanisms to implement thread-level synchronization for GPUs. In this article, we first propose a software-based thread-level synchronization mechanism called lock stealing for GPUs to avoid live-locks. We then describe how to implement our lock stealing algorithm in mutual exclusive locks and readers-writer locks with high performance. Finally,...

10.1109/tpds.2019.2955705 article EN IEEE Transactions on Parallel and Distributed Systems 2019-11-26

The efficiency of caches plays a vital role in microprocessors. In this paper, we introduce a novel and flexible cache substrate that employs a non-volatile yet versatile SRAM (NV2-SRAM) cell design, which synergistically integrates new memory devices into the standard SRAM cells. Our experiments show that it can achieve 67 percent energy saving and 3:1x reliability improvement over the SRAM based cache, outperforming the drowsy cache design in terms of both power and reliability. Moreover, the proposed cache architecture can be used to improve...

10.1109/lca.2014.2298412 article EN IEEE Computer Architecture Letters 2014-01-31

Cache size is a scarce resource in multiprocessor systems, and scheduling has a dramatic impact on the delay introduced by cache contention. This paper investigates the cache contention effects between programs running on the system, considering the proposed low load case, in which the number of programs on each core is at most one. The goal is to reduce cache contention between threads, improve the performance of the processor and shorten the execution time, and thereby reduce energy consumption.

10.1109/cdciem.2012.50 article EN International Conference on Computer Distributed Control and Intelligent Environmental Monitoring 2012-03-01

Improving the latency hiding ability is important for GPU performance. Although existing works, which mainly target either improving thread level parallelism or optimizing the memory hierarchy, are effective at improving GPUs' latency hiding ability, warps are still blocked after executing long latency operations, reducing the number of schedulable warps. This article revisits the recently proposed non-blocking execution for GPUs to improve the latency hiding ability of GPUs. With non-blocking execution, instructions from warps blocked by long latency operations can be pre-executed to make full use of resources....

10.1109/tc.2020.3026043 article EN publisher-specific-oa IEEE Transactions on Computers 2020-09-23