- Parallel Computing and Optimization Techniques
- Cloud Computing and Resource Management
- Nanoplatforms for cancer theranostics
- Advanced Neural Network Applications
- Advanced Data Storage Technologies
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Phagocytosis and Immune Regulation
- Robotics and Sensor-Based Localization
- Advanced Image and Video Retrieval Techniques
- Photodynamic Therapy Research Studies
- Text and Document Classification Technologies
- Graph Theory and Algorithms
- Software System Performance and Reliability
- VLSI and Analog Circuit Testing
- Advanced Optical Imaging Technologies
- Machine Learning and ELM
- 3D Shape Modeling and Analysis
- Metallurgy and Material Forming
- Cancer, Hypoxia, and Metabolism
- Video Surveillance and Tracking Methods
- Cell Image Analysis Techniques
- Caching and Content Delivery
- Photoacoustic and Ultrasonic Imaging
- Data Visualization and Analytics
North China Institute of Aerospace Engineering
2025
Shanghai East Hospital
2021-2024
The University of Texas at Austin
2020-2024
Shandong University
2020-2021
Northeastern University
2020
Immune checkpoint inhibitor (ICI) therapy is considered a revolutionary anti‐tumor strategy that may surpass other traditional therapies. Breast cancer is, in theory, particularly suitable for it due to upregulation of the programmed cell death 1 (PD‐1) / programmed cell death ligand 1 (PD‐L1) immune pathway, which exhausts the adaptive immune response mediated by T lymphocytes. However, its blockades exhibit very little effect in breast cancer, owing to the lack of T lymphocyte pre‐infiltration and co‐existing intricate...
Integrating chemodynamic therapy (CDT) and photodynamic therapy (PDT) into one nanoplatform can produce much more reactive oxygen species (ROS) for tumor therapy. Nevertheless, it is still a great challenge to selectively generate sufficient ROS in tumor regions. Meanwhile, CDT and PDT are restricted by the insufficient H2O2 content in the tumor as well as the limited tissue penetration of the light source. In this study, a smart pH/ROS-responsive nanoplatform, Fe2+@UCM-BBD, is rationally designed for combination therapy. The acidic microenvironment...
In recent years, the anticancer effects of disulfiram, a clinical drug for anti‐alcoholism, have been confirmed. However, several defects clearly limit the translation of disulfiram, such as Cu(II)‐dependent activity, instability, and non‐selectivity for cancer cells. Herein, a phosphate and hydrogen peroxide dual‐responsive nanoplatform (PCu‐HA‐DQ) is reported, which is constructed by encapsulating a prodrug (DQ) and modifying hyaluronic acid (HA) on copper-doped metal–organic frameworks (PCu MOFs). PCu‐HA‐DQ...
In this paper, an enhanced deep reinforcement learning approach is presented for unmanned aerial vehicles (UAVs) operating in dynamic and potentially hazardous environments. Initially, the capability to discern obstacles from visual data is achieved through the application of the Yolov8-StrongSort technique. Concurrently, a novel storage system for deep Q-networks (DQN), a memory scheme abbreviated DDM, is introduced to hasten the process of convergence for UAVs. Furthermore, addressing the issue of UAVs' paths veering too close to obstacles,...
Memory allocation and management have a significant impact on the performance and energy of modern applications. We observe that performance can vary by as much as 72% in some applications based on which memory allocator is used. Many current allocators are multi-threaded to support concurrent requests from different threads. However, such multi-threading comes at the cost of maintaining complex metadata that is tightly coupled and intertwined with user data. When memory management functions and other programs run on the same core, the metadata used may pollute the processor...
Machine Learning (ML) has been widely adopted in design exploration using high level synthesis (HLS) for faster resource, timing, and power estimation at very early stages of FPGA-based design. To perform prediction accurately, high-quality and large-volume datasets are required for training ML models. However, the current datasets used in this domain are proprietary or limited in use, and practitioners have to generate their own dataset to train HLS-related ML models. This paper presents a dataset for ML-assisted FPGA design using HLS, called HLSDataset. The dataset is...
Wave simulations are used in many applications: medical imaging, oil and gas exploration, earthquake hazard mitigation, and defense systems, among others. Most of these applications require repeated solutions of the wave equation on supercomputers. Minimizing time to solution and energy consumption is very beneficial in this domain. Data movement overhead is one of the key bottlenecks that affect time to solution and energy consumption.
Managed language frameworks are pervasive today, especially in modern datacenters. .NET is one such framework that is used widely in Microsoft Azure but has not been well-studied. Applications built on these frameworks have different characteristics compared to traditional SPEC-like programs due to the presence of a managed runtime. This affects the tradeoffs associated with designing hardware for such applications. Our goal is to study the performance bottlenecks of these applications. To find suitable benchmarks, we use Principal Component Analysis...
The Field Programmable Gate Array (FPGA) platform has been a popular choice for deploying Convolutional Neural Networks (CNNs) as a result of its high parallelism and low energy consumption. Due to the limited on-chip computation and storage resources, FPGA clusters are becoming promising candidates to improve CNN throughput. In this paper, we first put forward strategies to optimize the inter-board resource allocation in FPGA clusters. Then we model the multi-board cluster problem based on dynamic programming to get the optimal...
Message queues are used widely in parallel processing systems for worker thread synchronization. When there is a throughput mismatch between the upstream and downstream tasks, the message queue buffer will often be either empty or full. Polling on an empty or full queue will affect the performance of the upstream or downstream threads, since such polling cycles could have been spent on other computation. Non-blocking queues are an alternative that allow polling cycles to be spared for other tasks per the applications' choice. However, application programmers are not supposed to bear the burden,...
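The non-blocking idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's mechanism: `consume_or_do_other_work` and `other_work` are hypothetical names introduced here, and the standard-library `queue.Queue` stands in for the message queue. A consumer that finds the queue empty spends that cycle on other work instead of blocking.

```python
import queue


def consume_or_do_other_work(q, other_work, max_steps):
    """Try to dequeue up to max_steps times without ever blocking.

    When the queue is empty, the cycle is spent on other_work() instead
    of waiting. Returns the consumed items and the number of spared cycles.
    """
    consumed = []
    spared_cycles = 0
    for _ in range(max_steps):
        try:
            item = q.get_nowait()  # raises queue.Empty instead of blocking
        except queue.Empty:
            other_work()           # use the cycle for something else
            spared_cycles += 1
        else:
            consumed.append(item)
    return consumed, spared_cycles


if __name__ == "__main__":
    q = queue.Queue()
    q.put("a")
    q.put("b")
    items, spared = consume_or_do_other_work(q, lambda: None, max_steps=5)
    print(items, spared)
```

The same pattern applies on the producer side with `put_nowait` and `queue.Full` when the buffer is bounded.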
In the big data domain, the visualization of graph systems provides users with more intuitive experiences, especially in the field of social networks, transportation systems, and even medical and biological domains. Processing-in-Memory (PIM) has been a popular choice for deploying emerging applications as a result of its high parallelism and low energy consumption. Furthermore, the memory cells of PIM platforms can serve as both compute units and storage units, making PIM solutions able to efficiently support visualizing graphs at...
Heterogeneous systems with CPU-GPUs have become dominant parallel architectures in recent years. To optimize memory management and data transfer between CPUs and GPUs, unified virtual memory and asynchronous memory copy were introduced in Nvidia GPUs. With such architectural support, the entire data processing flow can now be pipelined into multiple stages, thereby efficiently overlapping data transfer with computation. In this paper, we provide a thorough performance analysis of GPU asynchronous memory copy (Async Memcpy) and unified virtual memory (UVM) on workloads covering multiple domains. We...
The Field Programmable Gate Array (FPGA) platform has been a popular choice for deploying Convolutional Neural Networks (CNNs) as a result of its high parallelism and low energy consumption. Due to the limitation of on-chip resources on a single board, FPGA clusters become promising solutions to improve the throughput of CNNs. In this paper, we first put forward strategies to optimize the resource allocation within and across FPGA boards. Then we model the multi-board cluster problem and design algorithms based on knapsack and dynamic...
3D models are widely used in computer graphics, vision, and robotics applications. Multiple hardware accelerators target 3D model related applications, since the computations required in 3D space are an order of magnitude higher than in 2D space. Due to the high computation intensity of 3D workloads, using large datasets for performance characterization is not a feasible choice during accelerator design. Representative subsets save execution or simulation time, e.g., ModelNet10, a subset of ModelNet40, for machine learning...
With increasing core counts and multiple levels of cache memories, scaling multi-threaded and task-level parallel workloads is continuously becoming a challenge. A key challenge to scaling the number of communicating tasks (or threads) is the rate at which existing communication mechanisms scale (in terms of latency and bandwidth). Architectures with hardware accelerated queuing operations have the potential to reduce latency and improve the scalability of moving data between processing elements, reducing synchronization penalties, thereby...
Language agents have shown impressive problem-solving skills within defined settings and brief timelines. Yet, with the ever-evolving complexities of open-world simulations, there is a pressing need for agents that can flexibly adapt to complex environments and consistently maintain a long-term memory to ensure coherent actions. To bridge the gap between language agents and open-world games, we introduce Language Agent for Role-Playing (LARP), which includes a cognitive architecture that encompasses memory processing and a decision-making assistant, an environment...
Storage is one of the important components in datacenters. As data volume rises and service scale grows, some workloads like databases demand an increasing amount of storage. While a single server can only host a limited number of disks, distributed file systems (e.g., Hadoop Distributed File System, referred to as HDFS) enable accessing disks mounted on other servers in the cluster, satisfying the storage requirements. On the other side, NVMe-over-Fabric protocols (e.g., NVMe-over-TCP) have been released as a solution at the device level...