- Parallel Computing and Optimization Techniques
- Low-power high-performance VLSI design
- Embedded Systems Design Techniques
- Advanced Data Storage Technologies
- Graph Theory and Algorithms
- VLSI and Analog Circuit Testing
- Network Packet Processing and Optimization
- Algorithms and Data Compression
- Complex Network Analysis Techniques
- Advancements in Photolithography Techniques
- Advanced Database Systems and Queries
- Advanced Graph Neural Networks
- Interconnection Networks and Systems
- Photonic and Optical Devices
- Green IT and Sustainability
- Robotic Mechanisms and Dynamics
- Radiation Effects in Electronics
- Insurance, Mortality, Demography, Risk Management
- Genomics and Chromatin Dynamics
- Cloud Computing and Remote Desktop Technologies
- Semiconductor materials and devices
- Genomics and Phylogenetic Studies
- Neural Networks and Applications
- Grey System Theory Applications
- Caching and Content Delivery
City University of Hong Kong
2022-2024
Michigan Technological University
2020-2022
University of California, Riverside
2016-2020
Ministry of Education of the People's Republic of China
2015-2016
Sun Yat-sen University
2014-2016
SYSU-CMU International Joint Research Institute
2015-2016
Graph analytics delivers deep knowledge by processing large volumes of highly connected data. In real-world graphs, the degree distribution tends to follow a power law -- a small portion of nodes own a large number of neighbors. The high irregularity of the degree distribution acts as a major barrier to their efficient processing on GPU architectures, which are primarily designed for accelerating computations on regular data with SIMD executions. Existing solutions to the inefficiency of GPU-based graph analytics either modify the graph programming abstraction or rely on changes...
Faster app launching is crucial for the user experience on mobile devices. Apps launched from a background cached state, called hot-launching, have much better performance than apps launched from scratch. To increase the number of hot-launches, leading vendors now cache more apps in the background by enabling swap. Recent work also proposed reducing the Java heap to cache more apps. However, this paper found that existing methods deteriorate hot-launch performance while increasing the number of cached apps. To simultaneously improve the number of cached apps and hot-launch performance, this paper proposes Fleet,...
Finite state machines (FSMs) are basic computation models that play essential roles in many applications. Enabling efficient parallel FSM execution is critical to the performance of these applications. However, they are very challenging to parallelize due to their inherent data dependencies, which occur at each step of the computations.
Finite state machines (FSMs) are the backbone of many applications, but are difficult to parallelize due to their inherent dependencies. Speculative FSM parallelization has shown promise on multicore machines with up to eight cores. However, as hardware parallelism grows (e.g., Xeon Phi with 288 logical cores), a fundamental question arises: How does speculative FSM parallelization scale as the number of cores increases? Without answering this question, existing methods for speculative FSM parallelization simply choose to use all available cores, which might not only waste...
The Finite State Machine (FSM) plays a critical role in many real-world applications, ranging from pattern matching to network security. In recent years, significant research efforts have been made to accelerate FSM computations on different parallel platforms, including multicores, GPUs, and DRAM-based accelerators. A popular direction is speculation-centric parallelization. Despite their abundance and promising results, the benefits of speculation-centric parallelization on GPUs heavily depend on high speculation accuracy and are...
JavaScript Object Notation (JSON) and its variants have gained great popularity in recent years. Unfortunately, the performance of their analytics is often dragged down by expensive JSON parsing. To address this, recent work has shown that building bitwise indices on JSON data, called structural indices, can greatly accelerate querying. Despite its promise, existing structural index construction does not scale well as records become larger and more complex, due to its (inherently) sequential construction process and the involvement of costly memory copies...
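To make the idea of bitwise indices on JSON data concrete, the following is a minimal Python sketch, not the construction from the paper. It assumes a hypothetical layout in which each structural character gets one integer bitmap, with bit i set when that character appears at offset i outside a string literal. The function names `build_structural_index` and `positions` are illustrative, and the single sequential pass is exactly the kind of process the abstract describes as hard to scale.

```python
def build_structural_index(text):
    """Return {char: bitmap}; bit i is set when text[i] is that
    structural character and lies outside a JSON string literal."""
    bitmaps = {":": 0, ",": 0, "{": 0, "}": 0}  # assumed set of structural characters
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False          # the escaped character is consumed
            elif ch == "\\":
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # closing quote
        elif ch == '"':
            in_string = True             # opening quote
        elif ch in bitmaps:
            bitmaps[ch] |= 1 << i        # record a structural position
    return bitmaps


def positions(bitmap):
    """List the set bit offsets of a bitmap in increasing order."""
    out = []
    while bitmap:
        low = bitmap & -bitmap           # isolate the lowest set bit
        out.append(low.bit_length() - 1)
        bitmap ^= low                    # clear it
    return out


record = '{"a:b": 1, "c": "x,y"}'
index = build_structural_index(record)
colon_offsets = positions(index[":"])    # colons inside "a:b" are ignored
comma_offsets = positions(index[","])    # the comma inside "x,y" is ignored
```

With such bitmaps, a query can jump between field separators with bit operations instead of re-scanning the record character by character.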
The finite-state machine (FSM) is a fundamental computation model used by many applications. However, FSM execution is known to be "embarrassingly sequential" due to the state dependences among transitions. Existing solutions leverage enumerative or speculative parallelization to break the dependences. However, the efficiency of both schemes highly depends on the properties of the FSM and its inputs. For those exhibiting unfavorable properties, the former suffers from the overhead of maintaining multiple execution paths, while the latter is bottlenecked by the serial...
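The enumerative scheme mentioned above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the implementation from any of the listed papers: the two-state parity machine, `STATES`, `step`, and the chunking policy are all hypothetical. Each chunk is executed from every possible start state, which removes the dependence on the previous chunk at the cost of maintaining one execution path per state.

```python
from functools import reduce

# Hypothetical FSM: tracks the parity of '1' symbols seen so far.
STATES = (0, 1)


def step(state, symbol):
    """Transition function of the illustrative parity FSM."""
    return state ^ 1 if symbol == "1" else state


def chunk_mapping(chunk):
    """Enumerate every start state and record where the chunk leads."""
    mapping = {}
    for start in STATES:
        s = start
        for symbol in chunk:
            s = step(s, symbol)
        mapping[start] = s
    return mapping


def compose(first, second):
    """Mapping of running `first` and then `second`."""
    return {s: second[first[s]] for s in STATES}


def run_enumerative(data, start_state, num_chunks):
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # The per-chunk mappings are independent, so they could be computed
    # in parallel; they are computed in a plain loop here for clarity.
    mappings = [chunk_mapping(c) for c in chunks]
    identity = {s: s for s in STATES}
    return reduce(compose, mappings, identity)[start_state]


def run_sequential(data, start_state):
    s = start_state
    for symbol in data:
        s = step(s, symbol)
    return s
```

The work per chunk grows with the number of states, which is the path-maintenance overhead the abstract attributes to enumeration.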
This paper presents and investigates two new numerical algorithms (i.e., the E47 algorithm and the 94LVI algorithm) for solving the quadratic programming (QP) problem subject to inequality and bound constraints. Such a constrained QP problem is first converted equivalently into a linear variational inequality (LVI), and then into a piecewise-linear projection equation (PLPE). The E47 and 94LVI algorithms are employed to solve the resultant PLPE, and thus the optimal solution is obtained readily. In this paper, we analyze the computational complexities and present global convergence of...
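The QP-to-LVI-to-PLPE chain can be illustrated for the simplest case of a convex QP with bound constraints only. The notation below ($W$, $q$, $\xi^{\pm}$, $\Omega$, $P_\Omega$) is assumed for illustration and is not taken from the paper, which also handles further constraint types through an augmented variable.

```latex
\begin{align}
\text{QP:}   \quad & \min_{x}\ \tfrac{1}{2}\, x^{\mathsf T} W x + q^{\mathsf T} x
               \quad \text{s.t. } \xi^{-} \le x \le \xi^{+}, \\
\text{LVI:}  \quad & \text{find } x^{*} \in \Omega \text{ such that }
               (x - x^{*})^{\mathsf T} (W x^{*} + q) \ge 0
               \quad \forall\, x \in \Omega, \\
\text{PLPE:} \quad & P_{\Omega}\bigl(x - (W x + q)\bigr) - x = 0,
\end{align}
```

Here $W$ is assumed symmetric positive semidefinite, $\Omega = \{x \mid \xi^{-} \le x \le \xi^{+}\}$, and $P_\Omega$ clips each component of its argument to $[\xi^{-}_i, \xi^{+}_i]$. Under these assumptions a point solves the QP exactly when it solves the LVI, and exactly when it is a zero of the PLPE.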
Many performance-critical applications traverse bitstreams with bitwise computations for better performance or higher space efficiency, such as multimedia processing and bitmap indexing. However, when these computations carry dependences, the entire bitstream traversal becomes serial, fundamentally limiting scalability. In this work, we show that bitstream-carried dependences are actually "breakable" in many cases, with the adoption of a systematic treatment - principled bitstream speculation (PBS). The core idea of PBS stems...
The population projection of the Indian subcontinent, which is closely related to the future development of this region and even the whole world, has attracted great attention among sociologists as well as scientists. However, most former research is based only on fertility, mortality, or other individual quantifiable factors, using traditional statistical models, and thus may lack comprehensiveness in its results. The historical data are a comprehensive reflection of population development under the influence of all factors. Based on over 2000...
Finite State Machines (FSMs) are fundamental in both hardware design and software development. However, the reliability of FSM computations remains poorly understood. Existing reliability analyses are mainly designed for generic computations and are unaware of the special error tolerance characteristics of FSM computations. This work introduces RelyFSM -- a state-level reliability analysis framework for FSM computations. By modeling the behaviors of unreliable FSM executions and qualitatively reasoning about the transition structures, RelyFSM can precisely capture the inherent error tolerance of FSM computations. Our evaluation with...
The recovery and prediction of Northern American population data, which are closely related to the future development of Northern America and even the whole world, have become significant subjects and have captured great attention among sociologists as well as scientists. However, most relevant research is based only on fertility, mortality, or other individual quantifiable factors, using traditional statistical models, and thus lacks comprehensiveness in its results. As we know, historical data are a comprehensive reflection of population development under the influence...
The throughput of B+ tree query processing is critical to many databases, file systems, and cloud applications. Based on bulk synchronous parallel (BSP), latch-free B+ tree query processing has shown promise by processing queries in small batches and avoiding the use of locks. As the number of cores on CPUs increases, it becomes possible to process larger batches without adding any extra delays. In this work, we argue that as the batch size increases, there will be more optimization opportunities exposed beyond parallelism, especially when the query distributions are highly...
Motif searching is an important problem that can reveal crucial information from biological data. Since the general motif searching problem is NP-hard and the volume of biological data has been growing exponentially in recent years, there is a pressing need for developing time- and space-efficient algorithms to find motifs. In this paper, we explore scalable parallelization of Edit Distance-Based Motif Search (EMS). We introduce two parallel designs, recursEMS, which integrates an existing EMS solver into a recursion tree running multiple...