NFDI4DS | UHH-SEMS - Publication Details

Yanfeng Zhang

ORCID: 0000-0002-9871-0304

About

Contact & Profiles

A5100367496

Research Areas

Graph Theory and Algorithms
Cloud Computing and Resource Management
Advanced Graph Neural Networks
Data Management and Algorithms
Advanced Clustering Algorithms Research
Caching and Content Delivery
Parallel Computing and Optimization Techniques
Topic Modeling
Advanced Data Storage Technologies
Data Stream Mining Techniques
Advanced Computational Techniques and Applications
Anomaly Detection Techniques and Applications
Distributed systems and fault tolerance
Advanced Image and Video Retrieval Techniques
Advanced Database Systems and Queries
Software-Defined Networks and 5G
Distributed and Parallel Computing Systems
Fault Detection and Control Systems
Stochastic Gradient Optimization Techniques
Advanced Neural Network Applications
Multimodal Machine Learning Applications
Network Traffic and Congestion Control
Chaos-based Image/Signal Encryption
Blockchain Technology Applications and Security
IoT and Edge/Fog Computing

Northeastern University
2016-2025

Tianjin University of Commerce
2025

Beijing Jiaotong University
2007-2025

Jiaozuo University
2013-2024

Chongqing Jiaotong University
2024

Northeastern University
2008-2024

Guangdong Police College
2024

Universidad del Noreste
2012-2023

Lanzhou University
2020-2022

Sias University
2022

iMapReduce: A Distributed Computing Framework for Iterative Computation

OPENALEX - Publications

Yanfeng Zhang Qixin Gao Lixin Gao Cuirong Wang

10.1007/s10723-012-9204-9 article EN Journal of Grid Computing 2012-03-01

PrIter

OPENALEX - Publications

Yanfeng Zhang Qixin Gao Lixin Gao Cuirong Wang

Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, recommendation systems, and so on. These cloud applications typically involve data sets of massive scale. Fast convergence of the iterative computation on the massive data set is essential for these applications. In this paper, we explore the opportunity for accelerating iterative computations and propose a distributed computing framework, PrIter, which enables fast iterative computation by providing support for prioritized iteration. Instead of performing computations on all...

10.1145/2038916.2038929 article EN 2011-10-26

Maiter: An Asynchronous Graph Processing Framework for Delta-Based Accumulative Iterative Computation

OPENALEX - Publications

Yanfeng Zhang Qixin Gao Lixin Gao Cuirong Wang

A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment to scale to massive data sets. To accelerate these iterative computations, we propose delta-based accumulative iterative computation (DAIC). Different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the “changes” between iterations. By DAIC, we can process only the “changes” to avoid the negligible updates....

10.1109/tpds.2013.235 article EN IEEE Transactions on Parallel and Distributed Systems 2013-09-16
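
The delta-based accumulation idea in the Maiter abstract above can be illustrated with PageRank. The following is a minimal Python sketch under stated assumptions, not Maiter's actual API or implementation: the function name `daic_pagerank`, the `out_links` adjacency dictionary, the damping factor, and the tolerance are hypothetical choices made for illustration, and the scores are left unnormalised.

```python
def daic_pagerank(out_links, damping=0.8, tol=1e-10):
    """Delta-accumulative PageRank sketch (unnormalised scores).

    out_links maps every vertex to the list of vertices it links to.
    Assumption: every link target also appears as a key of out_links.
    """
    value = {v: 0.0 for v in out_links}            # accumulated result so far
    delta = {v: 1.0 - damping for v in out_links}  # change not yet applied
    pending = set(out_links)                       # vertices with a delta >= tol

    while pending:
        v = pending.pop()            # any order works; no global barrier needed
        change = delta[v]
        delta[v] = 0.0
        value[v] += change           # accumulate the change into the result
        targets = out_links[v]
        if not targets:
            continue                 # a vertex with no out-links keeps its share
        share = damping * change / len(targets)
        for u in targets:
            delta[u] += share        # forward only the change, not the full value
            if delta[u] >= tol:
                pending.add(u)       # changes below tol are never scheduled
    return value


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(daic_pagerank(graph))
```

Because each vertex forwards only its pending change, vertices can be processed in any order, and changes smaller than `tol` are simply left unprocessed, which mirrors the abstract's point about avoiding negligible updates.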

Multi-task spatio-temporal augmented net for industry equipment remaining useful life prediction

OPENALEX - Publications

Haodong Li Peng Cao Xingwei Wang Bo Yi Min Huang and 2 more

10.1016/j.aei.2023.101898 article EN Advanced Engineering Informatics 2023-01-01

A Fair Comparison of Message Queuing Systems

OPENALEX - Publications

Guo Fu Yanfeng Zhang Ge Yu

The production and real-time usage of streaming data bring new challenges for data systems due to the huge volume of streaming data and the quick response requests of applications. Message queuing systems that offer high throughput and low latency play an important role in today's big data processing. There are several popular message queuing systems in industry and also many in-lab message queuing systems in academia. These message queuing systems with different design philosophies have different characteristics. It is non-trivial for a non-expert to choose a suitable system to meet his specific requirement. With this premise, our primary...

10.1109/access.2020.3046503 article EN cc-by IEEE Access 2020-12-22

Batch image encryption using cross image permutation and diffusion

OPENALEX - Publications

Wei Song Chong Fu Yu Zheng Yanfeng Zhang Junxin Chen and 1 more

10.1016/j.jisa.2023.103686 article EN Journal of Information Security and Applications 2023-12-26

Efficient Distributed Density Peaks for Clustering Large Data Sets in MapReduce

OPENALEX - Publications

Yanfeng Zhang Shimin Chen Ge Yu

Density Peaks (DP) is a recently proposed clustering algorithm that has distinctive advantages over existing clustering algorithms. It has already been used in a wide range of applications. However, DP requires computing the distance between every pair of input points, therefore incurring quadratic computation overhead, which is prohibitive for large data sets. In this paper, we study efficient distributed algorithms for DP. We first show that a naive MapReduce solution (Basic-DDP) has high communication and computation overhead. Then,...

10.1109/tkde.2016.2609423 article EN IEEE Transactions on Knowledge and Data Engineering 2016-09-14
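
The quadratic cost mentioned in the Density Peaks abstract above comes from the all-pairs distance computation. The following is a minimal single-machine Python sketch of the two per-point quantities used by Density Peaks clustering (local density and distance to the nearest denser point). It is not the paper's Basic-DDP or any of its distributed algorithms; the function name `density_peaks` and the cutoff parameter `dc` are assumptions made for illustration.

```python
import math


def density_peaks(points, dc):
    """Return (rho, delta) for each point using all-pairs distances.

    rho[i]   = number of other points closer than the cutoff dc
    delta[i] = distance to the nearest point with strictly higher rho,
               or the largest distance from i if no denser point exists
    """
    n = len(points)
    # All-pairs distances: this is the O(n^2) step the abstract refers to.
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < dc)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    print(density_peaks(pts, dc=1.5))
```

Points with both a high `rho` and a high `delta` are the candidate cluster centres. In this sketch, points tied for the highest density all receive the maximum-distance fallback.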

NeutronStar: Distributed GNN Training with Hybrid Dependency Management

OPENALEX - Publications

Qiange Wang Yanfeng Zhang Hao Wang Chaoyi Chen Xiaodong Zhang and 1 more

GNN's training needs to resolve issues of vertex dependencies, i.e., each vertex representation's update depends on its neighbors. Existing distributed GNN systems adopt either a dependencies-cached approach or a dependencies-communicated approach. Having made intensive experiments and analysis, we find that the decision to choose one or the other approach for the best performance is determined by a set of factors, including graph inputs, model configurations, and an underlying computing cluster environment. If various GNN trainings...

10.1145/3514221.3526134 article EN Proceedings of the 2022 International Conference on Management of Data 2022-06-10

NeuChain

OPENALEX - Publications

Zeshun Peng Yanfeng Zhang Qian Xu Haixu Liu Yuxiao Gao and 2 more

Blockchain serves as a replicated transactional processing system in a trustless distributed environment. Existing blockchain systems all rely on an explicit ordering step to determine the global order of transactions that are collected from multiple peers. The ordering consensus can be the bottleneck since it must be Byzantine-fault tolerant and can scarcely benefit from parallel execution. In this paper, we propose an ordering-free architecture that makes ordering implicit through deterministic execution. Based on this novel architecture, we develop...

10.14778/3551793.3551816 article EN Proceedings of the VLDB Endowment 2022-07-01

iMapReduce: A Distributed Computing Framework for Iterative Computation

OPENALEX - Publications

Yanfeng Zhang Qixin Gao Lixin Gao Cuirong Wang

Relational data are pervasive in many applications such as data mining or social network analysis. These relational data are typically massive, containing at least millions or hundreds of millions of relations. This poses demand for the design of distributed computing frameworks for processing these data on a large cluster. MapReduce is an example of such a framework. However, many relational data based applications typically require parsing the relational data iteratively and need to operate on these data through many iterations. MapReduce lacks built-in support for the iterative process. This paper presents iMapReduce, a framework that supports...

10.1109/ipdps.2011.260 article EN 2011-05-01

i²MapReduce: Incremental MapReduce for Mining Evolving Big Data

OPENALEX - Publications

Yanfeng Zhang Shimin Chen Qiang Wang Ge Yu

As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i2MapReduce, a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. Compared with the state-of-the-art work on Incoop, i2MapReduce (i) performs key-value pair level incremental processing rather than task...

10.1109/tkde.2015.2397438 article EN IEEE Transactions on Knowledge and Data Engineering 2015-02-02

SEP-graph

OPENALEX - Publications

Hao Wang Liang Geng Rubao Lee Kaixi Hou Yanfeng Zhang and 1 more

In general, the performance of parallel graph processing is determined by three pairs of critical parameters, namely synchronous or asynchronous execution mode (Sync or Async), Push or Pull communication mechanism (Push or Pull), and Data-driven or Topology-driven traversing scheme (DD or TD), which increases the complexity and sophistication of programming and system implementation on the GPU. Existing graph-processing frameworks mainly use a single combination in the entire execution for a given application, but we have observed their variable...

10.1145/3293883.3295733 article EN 2019-02-05

LSbM-tree: Re-Enabling Buffer Caching in Data Management for Mixed Reads and Writes

OPENALEX - Publications

Dejun Teng Lei Guo Rubao Lee Chen Feng Siyuan Ma and 2 more

LSM-tree has been widely used in data management production systems for write-intensive workloads. However, as read and write workloads co-exist under LSM-tree, data accesses can experience long latency and low throughput due to the interference with buffer caching from compaction, a major and frequent operation in LSM-tree. After a compaction, the existing data blocks are reorganized and written to other locations on disks. As a result, the related data blocks that have been loaded in the buffer cache are invalidated since their referencing addresses are changed, causing serious...

10.1109/icdcs.2017.70 article EN 2017-06-01

The Optimized Anomaly Detection Models Based on an Approach of Dealing with Imbalanced Dataset for Credit Card Fraud Detection

OPENALEX - Publications

Yanfeng Zhang Hongliang Lü Hong-Fan Lin Xue-Chen Qiao Hao Zheng

Credit card fraud is a major problem in today’s financial world. It induces severe damage to financial institutions and individuals. There has been an exponential increase in the losses due to credit card fraud in recent years. Hence, effectively detecting fraudulent behavior is of vital importance for both financial institutions and individuals. Since credit card fraud events account for a small proportion of all transaction events in real life, the datasets about credit card fraud are usually imbalanced. Some common classifiers, such as decision tree and naïve Bayes, are unable to detect fraud. Furthermore, in some cases,...

10.1155/2022/8027903 article EN Mobile Information Systems 2022-04-25

Accelerate large-scale iterative computation through asynchronous accumulative updates

OPENALEX - Publications

Yanfeng Zhang Qixin Gao Lixin Gao Cuirong Wang

A myriad of data mining algorithms in scientific computing require parsing data sets iteratively. These iterative algorithms have to be implemented in a distributed environment to scale to massive data sets. To accelerate iterative computations in a large-scale distributed environment, we identify a broad class of iterative computations that can accumulate iterative update results. Specifically, different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, accumulative iterative update accumulates the intermediate iterative update results. We prove that an accumulative update will yield the same result as its corresponding traditional iterative update....

10.1145/2287036.2287041 article EN 2012-06-18

Clustering stream data by exploring the evolution of density mountain

OPENALEX - Publications

Shufeng Gong Yanfeng Zhang Ge Yu

Stream clustering is a fundamental problem in many streaming data analysis applications. Compared with classical batch-mode clustering, there are two key challenges in stream clustering: (i) Given that input data are changing continuously, how to incrementally update the clustering results efficiently? (ii) Given that clusters continuously evolve with the evolution of data, how to capture the cluster evolution activities? Unfortunately, most existing stream clustering algorithms can neither update the cluster result in real-time nor track the evolution of clusters. In this paper, we propose a stream clustering algorithm...

10.1145/3186728.3164136 article EN Proceedings of the VLDB Endowment 2017-12-01

Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective

OPENALEX - Publications

Hao Yuan Yajiong Liu Yanfeng Zhang Ai Xin Qiange Wang and 3 more

Many Graph Neural Network (GNN) training systems have emerged recently to support efficient GNN training. Since GNNs embody complex data dependencies between training samples, the training of GNNs should address distinct challenges different from DNN training in data management, such as data partitioning, batch preparation for mini-batch training, and data transferring between CPUs and GPUs. These factors, which take up a large proportion of training time, make data management in GNN training more significant. This paper reviews GNN training from a data management perspective and provides a comprehensive analysis...

10.14778/3648160.3648167 article EN Proceedings of the VLDB Endowment 2024-02-01

PrIter: A Distributed Framework for Prioritizing Iterative Computations

OPENALEX - Publications

Yanfeng Zhang Qixin Gao Lixin Gao Cuirong Wang

Iterative computations are pervasive among data analysis applications, including web search, online social network analysis, recommendation systems, and so on. These applications typically involve data sets of massive scale. Fast convergence of the iterative computation on the massive data set is essential for these applications. In this paper, we explore the opportunity for accelerating iterative computations by prioritization. Instead of performing computations on all data points without discrimination, we prioritize the computations that help convergence the most, so that the convergence speed of the iterative process is significantly improved. We develop a...

10.1109/tpds.2012.272 article EN IEEE Transactions on Parallel and Distributed Systems 2012-09-14

Towards Communication-Efficient Out-of-Core Graph Processing on the GPU

OPENALEX - Publications

Qiange Wang Xin Ai Yongze Yan Shufeng Gong Yanfeng Zhang and 2 more

10.1109/tpds.2025.3547356 article EN IEEE Transactions on Parallel and Distributed Systems 2025-01-01

Orthogonal Analysis for Structural Optimization of Ejector for Multiejector Air Curtain

OPENALEX - Publications

Yabo Wang Xinxin Guo Cong Shi Mingzhu Zhang Zhe Tao and 1 more

As an important component of the multiejector air curtain for automatic cold stores, understanding the impact of the geometric parameters of the ejector on its outlet jet is crucial for boosting its thermodynamic performance. In this paper, three structural parameters of the ejector, namely, the nozzle diameter (d₁), the mixing chamber diameter (d₂), and the distance from the nozzle to the mixing chamber (NXP), were optimized by a multi‐index orthogonal test. The numerical simulation was carried out and validated, and the influence of the factors was discussed. The results show that (1) the primary factor for all...

10.1002/apj.70011 article EN Asia-Pacific Journal of Chemical Engineering 2025-03-24

Neural network-driven parallel accelerated selective image encryption with semantic understanding

OPENALEX - Publications

Wei Song Chong Fu Yu Zheng Junxin Chen Yanfeng Zhang

10.1140/epjs/s11734-025-01611-1 article EN The European Physical Journal Special Topics 2025-04-23

An improved naive Bayes algorithm based on k ′ k -means reclassification algorithm for imbalanced classification

OPENALEX - Publications

Yanfeng Zhang Li‐Chun Wang Xin Wang

10.1080/03610918.2025.2496773 article EN Communications in Statistics - Simulation and Computation 2025-04-26

An optimized deterministic concurrency control approach for geo-distributed transaction processing on permissioned blockchains

OPENALEX - Publications

Zhibo Han Zeshun Peng Gang Wang Minghe Yu Xiaohua Li and 2 more

Concurrency control is crucial for ensuring consistency and isolation in distributed transaction processing. Traditional concurrency control algorithms, such as locking-based protocols, usually suffer from performance degradation due to heavy coordination overheads. To overcome this problem, deterministic concurrency control approaches are widely adopted in many distributed transaction systems since they can avoid coordination overhead by eliminating uncertainty. In these systems, every node receives identical transaction batches, orders them according to specific rules,...

10.1038/s41598-025-00478-5 article EN cc-by-nc-nd Scientific Reports 2025-06-02

Accelerating Expectation-Maximization Algorithms with Frequent Updates

OPENALEX - Publications

Jiangtao Yin Yanfeng Zhang Lixin Gao

Expectation Maximization (EM) is a popular approach for parameter estimation in many applications such as image understanding, document classification, or genome data analysis. Despite the popularity of EM algorithms, it is challenging to efficiently implement these algorithms in a distributed environment. In particular, many EM algorithms that frequently update the parameters have been shown to be much more efficient than their concurrent counterparts. Accordingly, we propose two approaches to parallelize such EM algorithms in a distributed environment so as to scale...

10.1109/cluster.2012.81 article EN 2012-09-01

Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing

OPENALEX - Publications

Qiange Wang Yanfeng Zhang Hao Wang Liang Geng Rubao Lee and 2 more

In database and large-scale data analytics, recursive aggregate processing plays an important role, which is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking the conditions for monotonicity and the correctness of async processing is sophisticated and manually done. Third, execution engines may be suboptimal due to the separation of sync and async execution....

10.1145/3318464.3389712 article EN 2020-05-29
