- Graph Theory and Algorithms
- Cloud Computing and Resource Management
- Advanced Graph Neural Networks
- Data Management and Algorithms
- Advanced Clustering Algorithms Research
- Caching and Content Delivery
- Parallel Computing and Optimization Techniques
- Topic Modeling
- Advanced Data Storage Technologies
- Data Stream Mining Techniques
- Advanced Computational Techniques and Applications
- Anomaly Detection Techniques and Applications
- Distributed Systems and Fault Tolerance
- Advanced Image and Video Retrieval Techniques
- Advanced Database Systems and Queries
- Software-Defined Networks and 5G
- Distributed and Parallel Computing Systems
- Fault Detection and Control Systems
- Stochastic Gradient Optimization Techniques
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Network Traffic and Congestion Control
- Chaos-based Image/Signal Encryption
- Blockchain Technology Applications and Security
- IoT and Edge/Fog Computing
Northeastern University
2016-2025
Tianjin University of Commerce
2025
Beijing Jiaotong University
2007-2025
Jiaozuo University
2013-2024
Chongqing Jiaotong University
2024
Northeastern University
2008-2024
Guangdong Police College
2024
Universidad del Noreste
2012-2023
Lanzhou University
2020-2022
Sias University
2022
Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, recommendation systems, and so on. These cloud applications typically involve data sets of massive scale. Fast convergence of the iterative computation on the massive data set is essential for these applications. In this paper, we explore the opportunity for accelerating iterative computations and propose a distributed computing framework, PrIter, which enables fast iterative computation by providing support for prioritized iteration. Instead of performing all...
A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment to scale to massive data sets. To accelerate these iterative computations, we propose delta-based accumulative iterative computation (DAIC). Different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the “changes” between iterations. By DAIC, we can process only the “changes” to avoid the negligible updates....
The production and real-time usage of streaming data bring new challenges for data systems due to the huge volume of streaming data and the quick response requests of applications. Message queuing systems that offer high throughput and low latency play an important role in today's big data processing. There are several popular message queuing systems and also many in-lab systems in academia. These systems, with different design philosophies, have different characteristics. It is non-trivial for a non-expert to choose a suitable system to meet their specific requirements. With this premise, our primary...
Density Peaks (DP) is a recently proposed clustering algorithm that has distinctive advantages over existing clustering algorithms. It has already been used in a wide range of applications. However, DP requires computing the distance between every pair of input points, therefore incurring quadratic computation overhead, which is prohibitive for large data sets. In this paper, we study efficient distributed algorithms for DP. We first show that a naive MapReduce solution (Basic-DDP) has high communication and computation overhead. Then,...
GNN training needs to resolve issues of vertex dependencies, i.e., each vertex representation's update depends on its neighbors. Existing distributed GNN systems adopt either a dependencies-cached approach or a dependencies-communicated approach. Having made intensive experiments and analysis, we find that the decision to choose one or the other approach for the best performance is determined by a set of factors, including graph inputs, model configurations, and an underlying computing cluster environment. If various GNN trainings...
Blockchain serves as a replicated transactional processing system in a trustless distributed environment. Existing blockchain systems all rely on an explicit ordering step to determine the global order of transactions that are collected from multiple peers. The ordering consensus can be a bottleneck since it must be Byzantine-fault tolerant and can scarcely benefit from parallel execution. In this paper, we propose an ordering-free architecture that makes the ordering implicit through deterministic execution. Based on this novel architecture, we develop...
Relational data are pervasive in many applications such as data mining or social network analysis. These relational data are typically massive, containing at least millions or hundreds of millions of relations. This poses demand for the design of distributed computing frameworks for processing these data on a large cluster. MapReduce is an example of such a framework. However, many relational data based applications typically require parsing the relational data iteratively and need to operate on these data through many iterations. MapReduce lacks built-in support for the iterative process. This paper presents iMapReduce, a framework that supports...
As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i2MapReduce, a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. Compared with the state-of-the-art work on Incoop, i2MapReduce (i) performs key-value pair level incremental processing rather than task...
In general, the performance of parallel graph processing is determined by three pairs of critical parameters, namely synchronous or asynchronous execution mode (Sync or Async), Push or Pull communication mechanism (Push or Pull), and Data-driven or Topology-driven traversing scheme (DD or TD), which increases the complexity and sophistication of programming and system implementation of GPU. Existing graph-processing frameworks mainly use a single combination in the entire execution for a given application, but we have observed their variable...
LSM-tree has been widely used in data management production systems for write-intensive workloads. However, as read and write workloads co-exist under LSM-tree, data accesses can experience long latency and low throughput due to the interferences to buffer caching from the compaction, a major and frequent operation in LSM-tree. After a compaction, the existing data blocks are reorganized and written to other locations on disks. As a result, the related data blocks that have been loaded in the buffer cache are invalidated since their referencing addresses are changed, causing serious...
Credit card fraud is a major problem in today’s financial world. It induces severe damage to financial institutions and individuals. There has been an exponential increase in the losses due to credit card fraud in recent years. Hence, effectively detecting fraudulent behavior is of vital importance for either financial institutions or individuals. Since credit card fraud events account for a small proportion of all transaction events in real life, datasets about credit card fraud are usually imbalanced. Some common classifiers, such as decision tree and naïve Bayes, are unable to detect fraud. Furthermore, in some cases,...
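A myriad of data mining algorithms in scientific computing require parsing data sets iteratively. These iterative algorithms have to be implemented in a distributed environment to scale to massive data sets. To accelerate iterative computations in a large-scale distributed environment, we identify a broad class of iterative computations that can accumulate iterative update results. Specifically, different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, accumulative iterative update accumulates the intermediate iterative update results. We prove that an accumulative iterative update will yield the same result as its corresponding traditional iterative update....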
Stream clustering is a fundamental problem in many streaming data analysis applications. Compared to classical batch-mode clustering, there are two key challenges in stream clustering: (i) Given that input data are changing continuously, how to incrementally update their clustering results efficiently? (ii) Given that clusters continuously evolve with the evolution of data, how to capture the cluster evolution activities? Unfortunately, most existing stream clustering algorithms can neither update the cluster result in real-time nor track the evolution of clusters. In this paper, we propose a stream clustering algorithm...
Many Graph Neural Network (GNN) training systems have emerged recently to support efficient GNN training. Since GNNs embody complex data dependencies between training samples, the training of GNNs should address distinct challenges different from DNN training in data management, such as data partitioning, batch preparation for mini-batch training, and data transferring between CPUs and GPUs. These factors, which take up a large proportion of training time, make data management in GNN training more significant. This paper reviews GNN training from a data management perspective and provides a comprehensive analysis...
Iterative computations are pervasive among data analysis applications, including web search, online social network analysis, recommendation systems, and so on. These applications typically involve data sets of massive scale. Fast convergence of the iterative computations on the massive data set is essential for these applications. In this paper, we explore the opportunity for accelerating iterative computations by prioritization. Instead of performing computations on all data points without discrimination, we prioritize the computations that help convergence the most, so that the convergence speed of the iterative process is significantly improved. We develop a...
As an important component of the multiejector air curtain for automatic cold store, understanding the impact of the geometric parameters of the ejector on its outlet jet is crucial for boosting thermodynamic performance. In this paper, three structural parameters of the ejector, namely, nozzle diameter (d_1), mixing chamber diameter (d_2), and distance from nozzle to mixing chamber (NXP), were optimized by a multi‐index orthogonal test. The numerical simulation was carried out and validated, and the influence of the factors was discussed. The results show that (1) the primary factor for all...
Concurrency control is crucial for ensuring consistency and isolation in distributed transaction processing. Traditional concurrency control algorithms, such as locking-based protocols, usually suffer from performance degradation due to heavy coordination overheads. To overcome this problem, deterministic concurrency control approaches are widely adopted in many systems since they can avoid coordination overhead by eliminating uncertainty. In these systems, every node receives identical transaction batches, orders them according to specific rules,...
Expectation Maximization (EM) is a popular approach for parameter estimation in many applications such as image understanding, document classification, or genome data analysis. Despite the popularity of EM algorithms, it is challenging to efficiently implement these algorithms in a distributed environment. In particular, EM algorithms that frequently update the parameters have been shown to be much more efficient than their concurrent counterparts. Accordingly, we propose two approaches to parallelize such EM algorithms in a distributed environment so as to scale...
In database and large-scale data analytics, recursive aggregate processing plays an important role, which is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking on conditions for monotonicity and correctness for async processing is sophisticated and manually done. Third, execution engines may be suboptimal due to the separation of sync and async execution....