- Robotics and Sensor-Based Localization
- Ferroelectric and Negative Capacitance Devices
- Advanced Memory and Neural Computing
- Parallel Computing and Optimization Techniques
- Machine Learning in Materials Science
- VLSI and FPGA Design Techniques
- VLSI and Analog Circuit Testing
- Service-Oriented Architecture and Web Services
- Robotics and Automated Systems
- Autonomous Vehicle Technology and Safety
- Advancements in Photolithography Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Robotic Path Planning Algorithms
- 3D IC and TSV Technologies
- Cloud Computing and Resource Management
- Interconnection Networks and Systems
- Machine Learning and ELM
- Tensor Decomposition and Applications
- Algorithms and Data Compression
- Optimization and Search Problems
- 3D Shape Modeling and Analysis
- Advanced Neural Network Applications
- Advanced Vision and Imaging
- Electrowetting and Microfluidic Technologies
Institute of Computing Technology, 2018-2024
Chinese Academy of Sciences, 2018-2024
University of Chinese Academy of Sciences, 2024
Shanghai University, 2024
Nanjing University, 2024
For autonomous driving, vehicle detection is the prerequisite for many tasks like collision avoidance and path planning. In this letter, we present a real-time three-dimensional (RT3D) method that utilizes a pure LiDAR point cloud to predict the location, orientation, and size of vehicles. In contrast to previous 3-D object detection methods, we use a pre-RoI-pooling convolution technique that moves the majority of convolution operations ahead of RoI pooling, leaving only a small part behind, which significantly boosts computation efficiency. We also...
Recent advances in large language models (LLMs) have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these LLMs in the field of chip design. However, the lack of Verilog data hinders further improvement of the quality of code generated by LLMs. Additionally, the absence of a Verilog and electronic design automation (EDA) script augmentation framework significantly increases the time required to prepare training datasets for LLM trainers. This paper...
Despite extensive efforts, existing approaches to designing accelerators for optimization-based robotic applications have limitations. Some focus on accelerating general matrix operations, but they fail to fully exploit the specific sparse structure commonly found in many algorithms. On the other hand, certain methods require manual design of dedicated accelerators, resulting in inefficiencies and significant non-recurring engineering (NRE) costs.
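To illustrate the sparse structure this abstract refers to, here is a minimal sketch (not any of the surveyed accelerators) of a compressed sparse row (CSR) matrix-vector product: it touches only the stored nonzeros, whereas a general dense kernel would spend work on the zeros that dominate the matrices arising in optimization-based robotics. The data below is hypothetical.

```python
import numpy as np

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for a CSR-encoded sparse matrix A; only nonzeros are visited."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        # row r's nonzeros live in values[row_ptr[r]:row_ptr[r+1]]
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

# 3x3 matrix with only 3 nonzeros: [[2,0,0],[0,0,1],[0,4,0]]
values, col_idx, row_ptr = [2.0, 1.0, 4.0], [0, 2, 1], [0, 1, 2, 3]
x = np.array([1.0, 2.0, 3.0])
print(csr_matvec(values, col_idx, row_ptr, x))  # -> [2. 3. 8.]
```

A general matrix engine would execute all nine multiply-accumulates here; a sparsity-aware design executes three, which is the gap the abstract points at.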
With the rapid up-scaling of transformer-based large language models (LLMs), training these models is becoming increasingly demanding on novel parallel techniques. Tensor partitioning, an extensively researched technique encompassing data and model parallelism, has a significant influence on LLM training performance. However, existing state-of-the-art systems are based on an incomplete tensor-partitioning space, where the distribution of partitioned sub-operators is limited to the spatial dimension. We discover that introducing the temporal...
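As a minimal sketch of the spatial tensor partitioning the abstract contrasts against (assumed for illustration, not the paper's system): a linear layer Y = X @ W is split along the weight's output dimension, each "device" computes its shard, and the shards are concatenated — the column-parallel scheme common in model-parallel LLM training.

```python
import numpy as np

def column_parallel_matmul(X, W, num_devices=2):
    """Spatially partition W by columns, compute per-device shards, gather."""
    shards = np.array_split(W, num_devices, axis=1)  # partition the weight
    partials = [X @ shard for shard in shards]       # per-device compute
    return np.concatenate(partials, axis=1)          # all-gather the outputs

X = np.ones((2, 4))
W = np.arange(16.0).reshape(4, 4)
assert np.allclose(column_parallel_matmul(X, W), X @ W)  # same result as dense
```

Every sub-operator here runs at the same logical time step on its own device; the abstract's point is that this spatial-only view misses schedules that also distribute sub-operators across time.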
Performing complex tasks in open environments remains challenging for robots, even when using large language models (LLMs) as the core planner. Many LLM-based planners are inefficient due to their large number of parameters and are prone to inaccuracies because they operate as open-loop systems. We argue that the reason is that applying LLMs alone is insufficient. In this work, we propose DaDu-E, a robust closed-loop planning framework for embodied AI robots. Specifically, DaDu-E is equipped with a relatively lightweight LLM,...
Processing-in-memory (PIM) architectures have been regarded as a promising solution for CNN acceleration. Existing PIM accelerator designs rely heavily on the experience of experts and require significant manual design overhead. Manual design cannot effectively optimize and explore architecture implementations. In this work, we develop PIMSYN, an automatic framework for synthesizing PIM-based accelerators, which greatly facilitates the design process and helps generate energy-efficient accelerators. PIMSYN can automatically transform...
Resistive random-access memory (ReRAM) has been widely used to accelerate convolutional neural networks (CNNs) thanks to its analog in-memory computing capability. ReRAM crossbars not only store layers' weights but also perform in-situ matrix-vector multiplications, which are the core operations of CNNs. To boost the performance of ReRAM-based CNN accelerators, crossbars can be duplicated to exploit more intra-layer parallelism. The crossbar allocation scheme significantly influences both throughput and bandwidth...
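The in-situ matrix-vector multiplication and the duplication idea can be sketched functionally (a hypothetical simulation, not any particular accelerator): weights are stored as cell conductances, inputs are applied as word-line voltages, and by Ohm's and Kirchhoff's laws the bit-line currents sum to the product.

```python
import numpy as np

def crossbar_mvm(conductances, voltages):
    """One analog MVM: bit-line current j = sum_i voltages[i] * conductances[i, j]."""
    return voltages @ conductances

def duplicated_mvm(conductances, voltage_batch):
    """Duplicated crossbars process several input vectors in the same cycle,
    illustrating the intra-layer parallelism mentioned above."""
    return np.stack([crossbar_mvm(conductances, v) for v in voltage_batch])

W = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # layer weights mapped to conductances
v = np.array([1.0, 1.0])       # input activations applied as voltages
print(crossbar_mvm(W, v))      # -> [4. 6.]
```

In hardware the MVM costs one read cycle regardless of matrix size, which is why allocating duplicated crossbars across layers becomes the throughput/bandwidth trade-off the abstract describes.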
Due to the dynamic nature and uncertainty of grid computing, system reliability can become very unpredictable. Thus, a well-defined scheduling mechanism that provides high availability for applications is required. In this letter, we propose an SLA-constrained, policy-based scheduling mechanism to enhance performance in the grid. We also implement the proposed model and show that it guarantees availability as well as supporting load balancing on an experimental basis.
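A minimal sketch of one plausible SLA-constrained, load-balancing policy of the kind described (the node fields and threshold here are assumptions for illustration): among grid nodes whose estimated availability meets the application's SLA, dispatch the job to the least-loaded one.

```python
def schedule(nodes, sla_availability):
    """Pick the least-loaded node among those meeting the SLA, or None."""
    eligible = [n for n in nodes if n["availability"] >= sla_availability]
    if not eligible:
        return None  # no node can guarantee the requested SLA
    return min(eligible, key=lambda n: n["load"])  # balance the load

nodes = [
    {"name": "g1", "availability": 0.99, "load": 0.7},
    {"name": "g2", "availability": 0.95, "load": 0.2},
    {"name": "g3", "availability": 0.90, "load": 0.1},
]
print(schedule(nodes, 0.95)["name"])  # -> g2
```

The SLA filter supplies the availability guarantee, and the min-load tie-break supplies the load balancing, mirroring the two properties the abstract evaluates.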