Tao Liu

ORCID: 0000-0002-9653-4108
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Matrix Theory and Algorithms
  • Stochastic Gradient Optimization Techniques
  • Embedded Systems Design Techniques
  • Embedded Systems and FPGA Design
  • Advanced Memory and Neural Computing
  • Network Packet Processing and Optimization
  • Software-Defined Networks and 5G
  • Nuclear reactor physics and engineering
  • Advanced Neural Network Applications
  • Complexity and Algorithms in Graphs
  • Advanced Sensor and Energy Harvesting Materials
  • Wireless Sensor Networks and IoT
  • Generative Adversarial Networks and Image Synthesis
  • Real-Time Systems Scheduling
  • Dielectric materials and actuators
  • Aluminum Alloy Microstructure Properties
  • Advanced machining processes and optimization
  • Powder Metallurgy Techniques and Materials
  • Radiation Therapy and Dosimetry
  • Advanced Welding Techniques Analysis

Qilu University of Technology
2019-2024

Shandong Academy of Sciences
2019-2024

Jilin University
2024

Florida International University
2019

Institute of Software
2013-2017

Beihang University
2013-2017

University of Science and Technology of China
2016

Southwest University of Science and Technology
2016

Harbin Institute of Technology
2009-2012

Shenyang University of Technology
2012

Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered. This paper describes the Shangri-La compiler, which accepts a program written in a C-like...

10.1145/1064978.1065038 article EN ACM SIGPLAN Notices 2005-06-12

Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered. This paper describes the Shangri-La compiler, which accepts a program written in a C-like...

10.1145/1065010.1065038 article EN Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation 2005-06-12

Ocean numerical models are gradually developing towards multi-physical processes and high resolution, with the growth of measured data and more in-depth research in the field. General computing capability is therefore no longer able to meet these models' needs. It is necessary to utilize powerful hardware and parallel software for model programs. China has made great progress in developing homegrown high-performance processors, and the Sunway SW26010 many-core processor is the most outstanding representative. This paper focuses on lag...

10.1109/access.2019.2944922 article EN cc-by IEEE Access 2019-01-01

Currently, there are billions of files on the Internet, such as pictures, web pages, and audio and video files, and the number is still growing rapidly. This huge amount of files needs to be processed by applications as quickly as possible with parallel processing. With the increasing number of cores in processors, parallel programming becomes more complex. When multiple processes/threads access files simultaneously, they may interfere with each other and cause extra performance loss. Consequently, this paper proposes a pipeline-based...

10.1109/csc.2013.15 article EN 2013-11-01

uC/OS-II is an open-source real-time kernel based on a preemptive priority scheduling strategy. It assigns a unique priority to each task and does not support scheduling tasks of the same priority. In practical applications, assigning different priorities to tasks which realize the same function is not a very good logical design. Moreover, it can only create a maximum of 64 tasks, which cannot meet the needs of increasingly complex applications. Aiming at these problems, in this paper the real-time kernel uC/OS-II is modified. The new kernel creatively gives an approach of layered hybrid...

10.1109/icinis.2012.69 article EN 2012-11-01

The structural evolution of a dielectric elastomer induced by pre-strain is a complex, multi-scale process that poses a significant challenge to a deep understanding of the effect of pre-strain. Through simulation results, we identify the variation in the dielectric constant and structure (electronic structure, molecular chain conformation, and aggregation structure) in response to pre-strain in poly(methyl acrylate). As pre-strain increases, the dielectric constant initially rises (below 200% pre-strain) and then declines (above 200% pre-strain). Analysis of charge distribution, surface electrostatic...

10.1063/5.0238343 article EN The Journal of Chemical Physics 2024-12-09

Extracting performance from modern multicore architectures requires that parallel sections be divided into many threads of execution. In order to utilize these effectively, load balancing has become one of the most important factors that affect the performance of applications on multicores. In this paper, we show that threads that belong to a single multithreaded application can exhibit poor performance. We propose a dynamic cache reservation scheme which redistributes the reserved cache space to the critical thread for speeding up the critical thread during...

10.1109/imccc.2011.61 article EN 2011-10-01

This paper proposes a new intelligent window based on multi-sensor fusion. The window is controlled by an Arduino UNO development board. It has the functions of "Automatic Control", "Manual Control" and "Close". In automatic control mode, the window will be controlled by parameters such as humidity, temperature, light intensity, wind speed and air quality. The project uses the Arduino MCU, PM2.5 detection, and temperature and humidity detection technology in its design, guided mainly by the unified objective concept of "safety, intelligent, practical, market-oriented",...

10.1109/ddcls.2019.8908967 article EN 2019 IEEE 8th Data Driven Control and Learning Systems Conference (DDCLS) 2019-05-01

Massive multi-threading in GPUs imposes tremendous pressure on memory subsystems. Due to the rapid growth of thread-level parallelism in GPUs and the slowly improving peak memory bandwidth, memory becomes a bottleneck of the GPU’s performance and energy efficiency. In this article, we propose an integrated architectural scheme to optimize the memory accesses and therefore boost the performance and energy efficiency of GPUs. First, we propose thread batch enabled memory partitioning (TEMP) to improve memory access parallelism. In particular, TEMP groups multiple thread blocks that share the same set of pages into a thread batch and applies...

10.1145/3330152 article EN ACM Journal on Emerging Technologies in Computing Systems 2019-10-31

Achieving microsecond-scale tail latency poses an extreme challenge to the conventional architecture of “NIC-OS-Application” in the face of highly concurrent requests. Existing kernel-bypass network systems improve this situation significantly. Still, they cannot achieve load-aware in-server request distribution, which in turn not only harms resource efficiency but, more importantly, defeats the goal of squeezing tail latency. This paper proposes iBalancer, a proactive load balancer for the kernel-bypass network system, which aggressively handles...

10.1109/tpds.2021.3120021 article EN IEEE Transactions on Parallel and Distributed Systems 2021-10-15

With the development of electromagnetic simulation technology and the increasing demand for electromagnetic simulation, verification based on numerical simulation has received extensive attention from various research fields at home and abroad. Solving the linear sparse matrix equation generated in the simulation process is the biggest bottleneck restricting the running time of the program. Parallel computing, as an effective means to improve the calculation speed and processing capacity of computer systems, can further expand the scale of problem solving and shorten the time. Next,...

10.1145/3491396.3506501 article EN 2021-12-28

Applications typically exhibit extremely different performance characteristics depending on the accelerator. The back propagation neural network (BPNN) has been parallelized on different platforms. However, it has not yet been explored thoroughly on a speculative multicore architecture. This paper presents a study of parallelizing BPNN on a speculative multicore architecture, including its execution model, hardware design and programming model. The implementation was analyzed with seven well-known benchmark data sets. Furthermore, it trades off...

10.1109/icpads.2016.0121 article EN 2016-12-01