Yanfeng Zhang

ORCID: 0000-0002-9871-0304
Research Areas
  • Graph Theory and Algorithms
  • Cloud Computing and Resource Management
  • Advanced Graph Neural Networks
  • Data Management and Algorithms
  • Advanced Clustering Algorithms Research
  • Caching and Content Delivery
  • Parallel Computing and Optimization Techniques
  • Topic Modeling
  • Advanced Data Storage Technologies
  • Data Stream Mining Techniques
  • Advanced Computational Techniques and Applications
  • Anomaly Detection Techniques and Applications
  • Distributed systems and fault tolerance
  • Advanced Image and Video Retrieval Techniques
  • Advanced Database Systems and Queries
  • Software-Defined Networks and 5G
  • Distributed and Parallel Computing Systems
  • Fault Detection and Control Systems
  • Stochastic Gradient Optimization Techniques
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Network Traffic and Congestion Control
  • Chaos-based Image/Signal Encryption
  • Blockchain Technology Applications and Security
  • IoT and Edge/Fog Computing

Northeastern University
2016-2025

Tianjin University of Commerce
2025

Beijing Jiaotong University
2007-2025

Jiaozuo University
2013-2024

Chongqing Jiaotong University
2024

Northeastern University
2008-2024

Guangdong Police College
2024

Universidad del Noreste
2012-2023

Lanzhou University
2020-2022

Sias University
2022

Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, recommendation systems, and so on. These cloud applications typically involve data sets of massive scale. Fast convergence of the iterative computation on the massive data set is essential for these applications. In this paper, we explore the opportunity for accelerating iterative computations and propose a distributed computing framework, PrIter, which enables fast iterative computation by providing support for prioritized iteration. Instead of performing computations on all...

10.1145/2038916.2038929 article EN 2011-10-26

Myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment to scale to massive data sets. To accelerate these iterative computations, we propose delta-based accumulative iterative computation (DAIC). Different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the “changes” between iterations. By DAIC, we can process only the “changes” to avoid the negligible updates....

10.1109/tpds.2013.235 article EN IEEE Transactions on Parallel and Distributed Systems 2013-09-16
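The contrast drawn in the DAIC abstract above, between recomputing a result from the previous iteration and accumulating only the "changes", can be illustrated with a small single-machine sketch. PageRank is used here as an assumed example workload; the function names, the damping value, and the toy graph shape (every vertex has at least one out-edge) are illustrative assumptions and are not taken from the paper.

```python
def power_pagerank(graph, damping=0.8, rounds=200):
    """Traditional form: each round rebuilds every rank from the previous round."""
    ranks = {v: 1.0 - damping for v in graph}
    for _ in range(rounds):
        new_ranks = {v: 1.0 - damping for v in graph}
        for v, out_edges in graph.items():
            for u in out_edges:
                new_ranks[u] += damping * ranks[v] / len(out_edges)
        ranks = new_ranks
    return ranks


def daic_pagerank(graph, damping=0.8, tol=1e-10, max_rounds=1000):
    """Delta-based accumulative form: only pending changes are propagated."""
    ranks = {v: 0.0 for v in graph}             # accumulated result so far
    deltas = {v: 1.0 - damping for v in graph}  # changes not yet applied
    for _ in range(max_rounds):
        if sum(deltas.values()) < tol:
            break
        next_deltas = {v: 0.0 for v in graph}
        for v, delta in deltas.items():
            if delta == 0.0:
                continue                        # nothing changed: skip this vertex
            ranks[v] += delta                   # accumulate the change locally
            out_edges = graph[v]
            for u in out_edges:                 # forward the change to neighbors
                next_deltas[u] += damping * delta / len(out_edges)
        deltas = next_deltas
    return ranks
```

On a graph such as `{"a": ["b", "c"], "b": ["c"], "c": ["a"]}` both functions approach the same ranks; the accumulative version touches a vertex only when it has a nonzero pending change.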

The production and real-time usage of streaming data bring new challenges for data systems due to the huge volume of streaming data and the quick response request of applications. Message queuing systems that offer high throughput and low latency play an important role in today's big data processing. There are several popular message queuing systems and also many in-lab systems in academia. These systems with different design philosophies have different characteristics. It is non-trivial for a non-expert to choose a suitable system to meet his specific requirement. With this premise, our primary...

10.1109/access.2020.3046503 article EN cc-by IEEE Access 2020-12-22

Density Peaks (DP) is a recently proposed clustering algorithm that has distinctive advantages over existing clustering algorithms. It has already been used in a wide range of applications. However, DP requires computing the distance between every pair of input points, therefore incurring quadratic computation overhead, which is prohibitive for large data sets. In this paper, we study efficient distributed algorithms for DP. We first show that a naive MapReduce solution (Basic-DDP) has high communication and computation overhead. Then,...

10.1109/tkde.2016.2609423 article EN IEEE Transactions on Knowledge and Data Engineering 2016-09-14
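The quadratic cost mentioned in the Density Peaks abstract above comes from the all-pairs distance computation. A minimal single-machine sketch of the two per-point scores used by the commonly described form of the algorithm (a local density and the distance to the nearest denser point) follows. The function name, the cutoff parameter, and the tie handling are assumptions for illustration; this is not the distributed method studied in the paper.

```python
import math


def density_peaks_scores(points, cutoff):
    """Return (rho, delta) for each point using an O(n^2) distance matrix.

    rho[i]   = number of other points closer than `cutoff` to point i
    delta[i] = distance from point i to the nearest point with higher rho
               (for a point with no denser point, the largest distance from it)
    """
    n = len(points)
    # All-pairs distances: this is the quadratic step.
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        to_denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(to_denser) if to_denser else max(dist[i]))
    return rho, delta
```

Points with both a high `rho` and a high `delta` are the candidates for cluster centers; every other point is usually assigned to the cluster of its nearest denser neighbor.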

GNN's training needs to resolve issues of vertex dependencies, i.e., each vertex representation's update depends on its neighbors. Existing distributed GNN systems adopt either a dependencies-cached approach or a dependencies-communicated approach. Having made intensive experiments and analysis, we find that a decision to choose one or the other approach for the best performance is determined by a set of factors, including graph inputs, model configurations, and an underlying computing cluster environment. If various GNN trainings...

10.1145/3514221.3526134 article EN Proceedings of the 2022 International Conference on Management of Data 2022-06-10

Blockchain serves as a replicated transactional processing system in a trustless distributed environment. Existing blockchain systems all rely on an explicit ordering step to determine the global order of transactions that are collected from multiple peers. The consensus can be a bottleneck since it must be Byzantine-fault tolerant and can scarcely benefit from parallel execution. In this paper, we propose an ordering-free architecture that makes the ordering implicit through deterministic execution. Based on this novel architecture, we develop...

10.14778/3551793.3551816 article EN Proceedings of the VLDB Endowment 2022-07-01

Relational data are pervasive in many applications such as data mining or social network analysis. These relational data are typically massive, containing at least millions or hundreds of millions of relations. This poses demand for the design of distributed computing frameworks for processing these data on a large cluster. MapReduce is an example of such a framework. However, many relational data based applications typically require parsing the relational data iteratively and need to operate on these data through many iterations. MapReduce lacks built-in support for the iterative process. This paper presents iMapReduce, a framework that supports...

10.1109/ipdps.2011.260 article EN 2011-05-01

As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i2MapReduce, a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. Compared with the state-of-the-art work on Incoop, i2MapReduce (i) performs key-value pair level incremental processing rather than task...

10.1109/tkde.2015.2397438 article EN IEEE Transactions on Knowledge and Data Engineering 2015-02-02
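The idea in the i2MapReduce abstract above, reusing previously saved state so that only changed inputs are reprocessed, can be sketched on a single machine with a word count. Everything here is an illustrative assumption: the function names, the document-id keyed input, and the convention that `None` marks a deleted document. It does not reflect the i2MapReduce API.

```python
from collections import Counter


def word_count(documents):
    """Full computation from scratch over {doc_id: text}."""
    counts = Counter()
    for text in documents.values():
        counts.update(text.split())
    return counts


def refresh_counts(saved_counts, old_documents, changed_documents):
    """Refresh saved counts using only the changed documents.

    `changed_documents` maps doc_id -> new text, or None for a deletion
    (a hypothetical convention for this sketch).
    """
    counts = Counter(saved_counts)
    for doc_id, new_text in changed_documents.items():
        # Retract the contribution of the old version, if there was one.
        counts.subtract(old_documents.get(doc_id, "").split())
        # Apply the contribution of the new version.
        if new_text is not None:
            counts.update(new_text.split())
    return +counts  # unary plus drops words whose count fell to zero
```

The refreshed result matches a full recomputation over the updated documents, while the work done is proportional to the size of the changed documents rather than the whole collection.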

In general, the performance of parallel graph processing is determined by three pairs of critical parameters, namely synchronous or asynchronous execution mode (Sync or Async), Push or Pull communication mechanism (Push or Pull), and Data-driven or Topology-driven traversing scheme (DD or TD), which increases the complexity and sophistication of programming and system implementation of GPU. Existing graph-processing frameworks mainly use a single combination in the entire execution for a given application, but we have observed their variable...

10.1145/3293883.3295733 article EN 2019-02-05

LSM-tree has been widely used in data management production systems for write-intensive workloads. However, as read and write workloads co-exist under LSM-tree, data accesses can experience long latency and low throughput due to the interferences to buffer caching from the compaction, a major and frequent operation in LSM-tree. After a compaction, the existing data blocks are reorganized and written to other locations on disks. As a result, the related data blocks that have been loaded in the buffer cache are invalidated since their referencing addresses are changed, causing serious...

10.1109/icdcs.2017.70 article EN 2017-06-01

Credit card fraud is a major problem in today’s financial world. It induces severe damage to financial institutions and individuals. There has been an exponential increase in the losses due to credit card fraud in recent years. Hence, effectively detecting fraudulent behavior is of vital importance for either financial institutions or individuals. Since credit card fraud events account for a small proportion of all transaction events in real life, datasets about credit card fraud are usually imbalanced. Some common classifiers, such as decision tree and naïve Bayes, are unable to detect fraud. Furthermore, in some cases,...

10.1155/2022/8027903 article EN Mobile Information Systems 2022-04-25

Myriad of data mining algorithms in scientific computing require parsing data sets iteratively. These iterative algorithms have to be implemented in a distributed environment to scale to massive data sets. To accelerate iterative computations in a large-scale distributed environment, we identify a broad class of iterative computations that can accumulate iterative update results. Specifically, different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, accumulative iterative update accumulates the intermediate iterative update results. We prove that an accumulative update will yield the same result as its corresponding traditional iterative update....

10.1145/2287036.2287041 article EN 2012-06-18

Stream clustering is a fundamental problem in many streaming data analysis applications. Compared to classical batch-mode clustering, there are two key challenges in stream clustering: (i) Given that input data are changing continuously, how to incrementally update their clustering results efficiently? (ii) Given that clusters continuously evolve with the evolution of data, how to capture the cluster evolution activities? Unfortunately, most of the existing stream clustering algorithms can neither update the cluster result in real-time nor track the evolution of clusters. In this paper, we propose a stream clustering algorithm...

10.1145/3186728.3164136 article EN Proceedings of the VLDB Endowment 2017-12-01

Many Graph Neural Network (GNN) training systems have emerged recently to support efficient GNN training. Since GNNs embody complex data dependencies between training samples, the training of GNNs should address distinct challenges different from DNN training in data management, such as data partitioning, batch preparation for mini-batch training, and data transferring between CPUs and GPUs. These factors, which take up a large proportion of training time, make data management in GNN training more significant. This paper reviews GNN training from a data management perspective and provides a comprehensive analysis...

10.14778/3648160.3648167 article EN Proceedings of the VLDB Endowment 2024-02-01

Iterative computations are pervasive among data analysis applications, including web search, online social network analysis, recommendation systems, and so on. These applications typically involve data sets of massive scale. Fast convergence of the iterative computations on the massive data set is essential for these applications. In this paper, we explore the opportunity for accelerating iterative computations by prioritization. Instead of performing computations on all data points without discrimination, we prioritize the computations that help convergence the most, so that the convergence speed of the iterative process is significantly improved. We develop a...

10.1109/tpds.2012.272 article EN IEEE Transactions on Parallel and Distributed Systems 2012-09-14
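The prioritization idea in the abstract above, updating the data points that help convergence most instead of all points equally, can be sketched on a single machine. The sketch below assumes a PageRank-style workload, uses the size of a vertex's pending change as its priority, and processes only the `top_k` highest-priority vertices per step. These choices, and all names, are illustrative assumptions and not the framework's actual interface.

```python
import heapq


def prioritized_pagerank(graph, damping=0.8, top_k=1, tol=1e-10, max_steps=100000):
    """PageRank-style iteration that updates only the top_k vertices per step.

    Assumes every vertex in `graph` has at least one out-edge.
    """
    ranks = {v: 0.0 for v in graph}
    pending = {v: 1.0 - damping for v in graph}  # priority = pending change
    for _ in range(max_steps):
        if sum(pending.values()) < tol:
            break
        # Select the vertices whose pending change is largest.
        chosen = heapq.nlargest(top_k, pending, key=pending.get)
        batch = [(v, pending[v]) for v in chosen]
        for v, _ in batch:
            pending[v] = 0.0                     # claim the change before propagating
        for v, change in batch:
            ranks[v] += change
            out_edges = graph[v]
            for u in out_edges:
                pending[u] += damping * change / len(out_edges)
    return ranks
```

Setting `top_k` to the number of vertices processes every vertex in each step, while smaller values concentrate work on the vertices with the largest outstanding changes.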

As an important component of the multiejector air curtain for automatic cold store, understanding the impact of the geometric parameters of the ejector on its outlet jet is crucial for boosting thermodynamic performance. In this paper, three structural parameters of the ejector, namely, nozzle diameter (d1), mixing chamber diameter (d2), and distance from nozzle to mixing chamber (NXP), were optimized by multi‐index orthogonal test. The numerical simulation was carried out and validated, and the influence of factors was discussed. The results show that (1) the primary factor of all...

10.1002/apj.70011 article EN Asia-Pacific Journal of Chemical Engineering 2025-03-24

Concurrency control is crucial for ensuring consistency and isolation in distributed transaction processing. Traditional concurrency control algorithms, such as locking-based protocols, usually suffer from performance degradation due to heavy coordination overheads. To overcome this problem, deterministic approaches are widely adopted in many distributed systems since they can avoid coordination overhead by eliminating uncertainty. In these systems, every node receives identical transaction batches, orders them according to specific rules,...

10.1038/s41598-025-00478-5 article EN cc-by-nc-nd Scientific Reports 2025-06-02

Expectation Maximization (EM) is a popular approach for parameter estimation in many applications such as image understanding, document classification, or genome data analysis. Despite the popularity of EM algorithms, it is challenging to efficiently implement these algorithms in a distributed environment. In particular, many EM algorithms that frequently update the parameters have been shown to be much more efficient than their concurrent counterparts. Accordingly, we propose two approaches to parallelize such EM algorithms in a distributed environment so as to scale...

10.1109/cluster.2012.81 article EN 2012-09-01

In database and large-scale data analytics, recursive aggregate processing plays an important role, which is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking on conditions for monotonicity and correctness for async processing is sophisticated and manually done. Third, execution engines may be suboptimal due to the separation of sync and async execution....

10.1145/3318464.3389712 article EN 2020-05-29