- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Interconnection Networks and Systems
- Distributed and Parallel Computing Systems
- Low-power high-performance VLSI design
- Embedded Systems Design Techniques
- South Asian Studies and Diaspora
- Asian Geopolitics and Ethnography
- Privacy-Preserving Technologies in Data
- Infrastructure Maintenance and Monitoring
- Machine Learning in Healthcare
- Video Surveillance and Tracking Methods
- Advanced Neural Network Applications
- Recommender Systems and Techniques
- Ferroelectric and Negative Capacitance Devices
- IoT and Edge/Fog Computing
Hangzhou Dianzi University
2013-2025
Ministry of Education of the People's Republic of China
2015
University of Science and Technology of China
2011-2014
Suzhou Research Institute
2011-2012
The increasing demand for main memory capacity is one of the big data challenges. Dynamic random access memory (DRAM) is not the best choice for main memory because of its high power consumption and low density. Nonvolatile memory, such as phase-change memory (PCM), represents an additional choice because of its high-density characteristic. Nevertheless, its high latency and limited write endurance currently prevent PCM from replacing DRAM. Therefore, hybrid memory, which combines both DRAM and PCM, has become a good alternative to traditional DRAM main memory. Both...
As smart cities emerge to provide more comfortable urban spaces, services such as health and transportation need to be promoted. In addition, cloud computing provides flexible allocation, migration of services, and better security isolation; it is therefore the infrastructure for smart cities. Single instruction-set architecture (ISA) heterogeneous multi-core processors have higher performance per watt than their symmetric counterparts and are popular in current processors. computing, which integrates a few fast...
Main memory is expected to grow significantly in both speed and capacity, for it is a major shared resource among the cores in a multi-core system, which will lead to increasing power consumption. Therefore, it is critical to address the power issue without seriously decreasing the performance of the memory subsystem. In this paper, we first propose memory affinity, which keeps ranks in the active or low-power status for as long as possible to avoid frequent switching between the two, and then present memory affinity aware scheduling (MAS) to balance performance, power, thermal, and fairness in multi-core systems....
Main memory is responsible for a large and increasing fraction of the energy consumed by multi-core systems. Therefore, it is critical to address the power issue in the memory subsystem. In this paper, we present a solution to improve memory power efficiency through coordinating page allocation and thread group scheduling (CAS). All threads are partitioned into different thread groups; after using the proposed page allocation, threads in the same group occupy the same memory rank. By adjusting the default Linux CFS, we implement thread group scheduling. The CAS alternates active partial periodically...
In a modern multicore system, memory is shared among more and more concurrently running multimedia applications. Therefore, memory contention and interference are more and more serious, inducing significant system performance degradation, a different performance degradation for each thread, unfairness in resource sharing, priority inversion, and even starvation. In this paper, we propose an approach of coordinating a channel-aware page mapping policy and memory scheduling (CCPS) to reduce inter-multimedia-application interference in the memory system. The idea is to map the data of different threads...
Phase detection and behavior analysis have been major concerns in improving performance as well as system throughput. However, for distributed acceleration engines, the execution behavior among different phases is much more difficult to analyze, especially for loop-based programs. With respect to the tasks in different iterations, how to efficiently detect phases belonging to the same iteration, or even across different iterations, poses a significant challenge. In this paper we propose a phase detection method for loop-based programs on multiprocessor...
The growing gap between microprocessor speed and DRAM speed is a major problem that computer designers are facing. In order to narrow the gap, it is necessary to improve DRAM's throughput. Moreover, on multi-core platforms, the DRAM memory shared by all cores usually suffers from the memory contention and interference problem, which can cause serious performance degradation and unfairness among parallel running threads. To address these problems, this paper proposes techniques that exploit both advantages of partitioning cores, threads...
Optimizing system performance through scheduling has received a lot of attention. However, none of the existing approaches can balance performance improvement and a fair share of CPU time among threads. We present in this paper a memory aware scheduler (SMAS). The key idea is to adopt thread group scheduling, which partitions threads based on memory address space, to reduce address space switching overhead and to give each thread group a chance to occupy CPU time. There are three main contributions: 1) SMAS does well in balancing performance and fairness among all threads; 2) to our knowledge, it is the first...
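The grouping idea in the abstract above can be illustrated with a minimal sketch. It is not the SMAS implementation: the `(thread_id, address_space_id)` tuples and both function names are hypothetical, and the sketch only shows why running threads of the same address space back to back reduces the number of address space switches in one scheduling round while every thread still gets a turn.

```python
from collections import defaultdict


def count_address_space_switches(run_order):
    """Count how often consecutive threads belong to different address spaces.

    run_order is a list of (thread_id, address_space_id) tuples (hypothetical format).
    """
    return sum(1 for prev, cur in zip(run_order, run_order[1:]) if prev[1] != cur[1])


def group_by_address_space(threads):
    """Reorder one scheduling round so threads sharing an address space run back to back.

    Groups keep the order in which their address space was first seen, and
    every thread appears exactly once, so no thread loses its turn.
    """
    groups = defaultdict(list)
    for thread_id, address_space_id in threads:
        groups[address_space_id].append((thread_id, address_space_id))
    return [thread for group in groups.values() for thread in group]


if __name__ == "__main__":
    round_robin = [(1, "a"), (2, "b"), (3, "a"), (4, "b")]
    grouped = group_by_address_space(round_robin)
    print(count_address_space_switches(round_robin), "switches before grouping")
    print(count_address_space_switches(grouped), "switches after grouping")
```

A real scheduler would also have to bound how long one group may hold the CPU; that fairness mechanism is not modeled here.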
The last-level cache (LLC) mitigates the long latencies of memory access in today's chip multi-core processor (CMP). The promotion policy of the LLC largely affects cache efficiency, while an inappropriate promotion policy may lead useless blocks to remain in the cache longer than necessary, and in turn result in inefficiency. Current state-of-the-art promotion policies are unaware of the re-reference interval of accesses. Applications that exhibit a long re-reference interval perform poorly with these policies. In this paper, we propose a promotion policy that uses re-reference interval prediction (RRIP) information. Such...
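For readers unfamiliar with RRIP, the sketch below models one cache set under the commonly described static RRIP scheme with hit promotion. It is background only, not the promotion policy proposed in the abstract above; the class name `SrripSet` and the 2-bit counter width are assumptions chosen for illustration.

```python
M_BITS = 2                        # assumed width of the re-reference prediction value (RRPV)
RRPV_MAX = (1 << M_BITS) - 1      # 3: block predicted to be re-referenced in the distant future
RRPV_INSERT = RRPV_MAX - 1        # 2: new blocks are predicted to have a long re-reference interval


class SrripSet:
    """One cache set managed with static RRIP and promotion to RRPV 0 on a hit."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.rrpv = [RRPV_MAX] * ways   # empty ways are immediately evictable

    def access(self, tag):
        """Return True on a hit and False on a miss (the block is then filled)."""
        if tag in self.tags:
            # Hit: promote the block to "near-immediate re-reference".
            self.rrpv[self.tags.index(tag)] = 0
            return True
        # Miss: age every block until some way reaches RRPV_MAX, then evict it.
        while RRPV_MAX not in self.rrpv:
            self.rrpv = [value + 1 for value in self.rrpv]
        victim = self.rrpv.index(RRPV_MAX)
        self.tags[victim] = tag
        self.rrpv[victim] = RRPV_INSERT
        return False


if __name__ == "__main__":
    cache_set = SrripSet(ways=2)
    for tag in "ABACDA":
        print(tag, "hit" if cache_set.access(tag) else "miss")
```

In this trace the reused block `A` survives the one-time accesses to `C` and `D`, whereas a 2-way LRU set would have evicted it when `D` was filled.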
With the growth of the Internet of Things (IoT), more and more computing tasks are implemented on power-sensitive mobile devices, making energy consumption a bottleneck. Most devices consume considerable power in standby mode, during which the self-refresh power of capacitive DRAM cells, which is used to preserve data integrity, accounts for a large part. To address this issue, strategies from both hardware and software perspectives have been proposed, yet existing hardware methods usually have a high cost. Software...
Dynamic voltage and frequency scaling (DVFS) has been the most useful technology to reduce power consumption, but it causes unpredictable program performance degradation and unfair sharing among threads, which may render performance analysis, optimization, and isolation extremely difficult and lead to thread starvation and priority inversion. This paper first proposes an OS scheduler based on dynamic time-slice (DTS) to address the problem incurred by DVFS. The DTS dynamically allocates each thread a time-slice according to the threads' behavior...
In the field of traffic sign detection, effective data augmentation can improve the model's detection capacity, enabling the model to distinguish and locate traffic signs more precisely and enhancing driving safety. However, due to the small size and low representation of traffic signs in the dataset, standard and common data augmentation techniques are not suitable for traffic sign detection. To address this issue, a novel data augmentation strategy called flexible cut and paste (FlexibleCP) is proposed. The overall enhancement approach is shifted from multi‐image fusion to target cropping...
The growing gap between microprocessor speed and DRAM speed is a major problem that computer designers are facing. In order to narrow the gap, it is necessary to improve DRAM's throughput. Moreover, on multi-core platforms, the DRAM memory shared by all cores usually suffers from the memory contention and interference problem, which can cause serious performance degradation and unfairness of the overall system. To address these problems, this paper proposes techniques that take advantage of partitioning cores, threads, and memory banks into groups to form...
In this paper we extend Amdahl's law to the general heterogeneous MPSoC era and analyze how the speedup is affected by parameters, including the number of microprocessors and accelerators, as well as task partition characteristics. We also analyze theoretical results about how the extended Amdahl's Law is applied to leverage the load balancing of a heterogeneous MPSoC without the abstract limitation of base core equivalents (BCEs). A prototype on FPGA is constructed with MicroBlaze processors and JPEG hardware accelerators. The experimental results demonstrate that our...
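As background for the abstract above, the sketch below evaluates the textbook form of Amdahl's law and a simple heterogeneous variant. The variant is an illustrative assumption, not the paper's extended formula: it assumes the parallel part is perfectly load-balanced across units whose relative speeds are given, and that the serial part runs on a unit of speed `serial_speed`. Both function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Textbook Amdahl's law: speedup with n identical units."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_units)


def heterogeneous_speedup(parallel_fraction, unit_speeds, serial_speed=1.0):
    """Illustrative heterogeneous variant (an assumption, not the paper's model).

    unit_speeds lists the relative speed of each processor or accelerator
    (1.0 = baseline core). The parallel work is assumed to be split in
    proportion to speed, so all units finish at the same time.
    """
    serial_time = (1.0 - parallel_fraction) / serial_speed
    parallel_time = parallel_fraction / sum(unit_speeds)
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Four identical baseline cores: both forms agree.
    print(amdahl_speedup(0.9, 4))
    print(heterogeneous_speedup(0.9, [1.0, 1.0, 1.0, 1.0]))
    # Two baseline cores plus two accelerators that are 4x faster on this task.
    print(heterogeneous_speedup(0.9, [1.0, 1.0, 4.0, 4.0]))
```

With identical units of speed 1.0 the variant reduces to the textbook formula, which is a quick sanity check on the assumed load-balancing model.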
Optimizing cache performance through improving data locality has been receiving a lot of attention. However, none of the existing approaches can combine each task's behavior to optimize data locality for caches. We present behavior aware data locality (BADL) in this paper. The key idea is to add task behavior when allocating memory, which can take advantage of different task behavior to improve cache performance. There are five main contributions: 1. to the best of our knowledge, this is the first attempt to improve data locality by combining task behavior, 2. BADL detailed analyzes low derived from internal line, more...
On a Chip Multi-Processor (CMP) architecture, cache sharing impacts threads non-uniformly: some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as decreasing throughput and cache thrashing. This paper proposes a new model for predicting inter-thread cache contention, FOM (Frequency of Miss), and schedules threads based on the model's results on a CMP architecture. The input to our model is the number of L2 cache misses of each thread. The output is the number of extra L2 cache misses for each thread due to cache sharing. We use the model to guide thread scheduling....
In a modern multi-core system, memory is shared among more and more concurrently running threads. Therefore, memory contention and interference are more and more serious, which induces uneven performance degradation, unfairness in resource sharing, priority inversion, and even starvation. In this paper, we first analyze the problems induced by memory contention and interference in detail, and then propose a pseudo share framework that brings the shared memory to an exclusive memory system. The framework contains three steps: 1) Partition threads and memory into thread groups and memory groups, respectively; each thread group runs on one core, occupying...