NFDI4DS | UHH-SEMS - Publication Details

LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs

OPENALEX - Publications

Weibin Meng Ying Liu Yichen Zhu Shenglin Zhang Dan Pei and 6 more

Recording runtime status via logs is common for almost every computer system, and detecting anomalies in logs is crucial for timely identifying malfunctions of systems. However, manually detecting anomalies in logs is time-consuming, error-prone, and infeasible. Existing automatic log anomaly detection approaches, using indexes rather than semantics of log templates, tend to cause false alarms. In this work, we propose LogAnomaly, a framework to model an unstructured log stream as a natural language sequence. Empowered by template2vec, a novel, simple yet...

10.24963/ijcai.2019/658 article EN 2019-07-28

Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks

OPENALEX - Publications

Ping Liu Haowen Xu Qianyu Ouyang Rui Jiao Zhekang Chen and 6 more

The anomalies of microservice invocation traces (traces) often indicate that the quality of the microservice-based large software service is being impaired. However, timely and accurately detecting trace anomalies is very challenging due to: 1) the large number of underlying microservices, 2) the complex call relationships between them, and 3) the interdependency between response times and invocation paths. Our core idea is to use machine learning to automatically learn the overall normal patterns of traces during periodic offline training. In online anomaly detection, a new...

10.1109/issre5003.2020.00014 article EN 2020-10-01

Localizing Failure Root Causes in a Microservice through Causality Inference

OPENALEX - Publications

Yuan Meng Shenglin Zhang Yongqian Sun Ruru Zhang Zhilong Hu and 4 more

An increasing number of Internet applications are applying microservice architecture due to its flexibility and clear logic. The stability of microservice is thus vitally important for these applications' quality of service. Accurate failure root cause localization can help operators quickly recover from failures and mitigate loss. Although cross-microservice failure root cause localization has been well studied, how to localize failure root causes in a microservice so as to quickly mitigate this microservice has not yet been studied. In this work, we propose a framework, MicroCause, to accurately localize the root cause monitoring indicators...

10.1109/iwqos49365.2020.9213058 article EN 2020-06-01

Diagnosing root causes of intermittent slow queries in cloud databases

OPENALEX - Publications

Minghua Ma Zheng Yin Shenglin Zhang Sheng Wang Christopher Zheng and 8 more

With the growing market of cloud databases, careful detection and elimination of slow queries are of great importance to service stability. Previous studies focus on optimizing the slow queries that result from internal reasons (e.g., poorly-written SQLs). In this work, we discover a different set of slow queries which might be more hazardous to database users than other slow queries. We name such queries Intermittent Slow Queries (iSQs), because they usually result from intermittent performance issues that are external (e.g., at database or machine levels). Diagnosing root causes...

10.14778/3389133.3389136 article EN Proceedings of the VLDB Endowment 2020-04-01

Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis

OPENALEX - Publications

Shenglin Zhang Sibo Xia Wenzhao Fan Binpeng Shi Xiao Xiong and 4 more

Widely adopted for their scalability and flexibility, modern microservice systems present unique failure diagnosis challenges due to their independent deployment and dynamic interactions. This complexity can lead to cascading failures that negatively impact operational efficiency and user experience. Recognizing the critical role of fault diagnosis in improving the stability and reliability of microservice systems, researchers have conducted extensive studies and achieved a number of significant results. This survey provides an exhaustive review of 98...

10.1145/3715005 article EN ACM Transactions on Software Engineering and Methodology 2025-01-23

Syslog processing for switch failure diagnosis and prediction in datacenter networks

OPENALEX - Publications

Shenglin Zhang Weibin Meng Jiahao Bu Sen Yang Ying Liu and 6 more

Syslogs on switches are a rich source of information for both post-mortem diagnosis and proactive prediction of switch failures in a datacenter network. However, such information can be effectively extracted only through proper processing of syslogs, e.g., using suitable machine learning techniques. A common approach to syslog processing is to extract (i.e., build) templates from historical syslog messages and then match syslog messages to these templates. However, existing template extraction techniques either have low accuracies in learning the "correct" set of templates, or...

10.1109/iwqos.2017.7969130 article EN 2017-06-01

Robust and Rapid Adaption for Concept Drift in Software System Anomaly Detection

OPENALEX - Publications

Minghua Ma Shenglin Zhang Dan Pei Xin Huang Hongwei Dai

Anomaly detection is critical for web-based software systems. Anecdotal evidence suggests that in these systems, the accuracy of a static anomaly detection method that was previously ensured is bound to degrade over time. This is due to the significant change of data distribution, namely concept drift, which is caused by software change or personal preferences evolving. Even though dozens of anomaly detectors have been proposed over the years in the context of software systems, they have not tackled the problem of concept drift. In this paper, we present a framework, StepWise, which can detect concept drift without...

10.1109/issre.2018.00013 article EN 2018-10-01

LogTransfer: Cross-System Log Anomaly Detection for Software Systems with Transfer Learning

OPENALEX - Publications

Rui Chen Shenglin Zhang Dongwen Li Yuzhe Zhang Fangrui Guo and 5 more

System logs, which describe a variety of events of software systems, are becoming increasingly popular for anomaly detection. However, for a large software system, current unsupervised learning-based methods suffer from low accuracy due to the high diversity of logs, while supervised learning methods are nearly infeasible to use in practice because it is time-consuming and labor-intensive to obtain sufficient labels for different types of software systems. In this paper, we propose a novel framework, LogTransfer, which applies transfer learning to transfer the anomalous...

10.1109/issre5003.2020.00013 article EN 2020-10-01

Robust Failure Diagnosis of Microservice System Through Multimodal Data

OPENALEX - Publications

Shenglin Zhang Pengxiang Jin Zihan Lin Yongqian Sun Bicheng Zhang and 8 more

Automatic failure diagnosis is crucial for large microservice systems. Currently, most failure diagnosis methods rely solely on single-modal data (i.e., using either metrics, logs, or traces). In this study, we conduct an empirical study using real-world failure cases to show that combining these sources of data (multimodal data) leads to a more accurate diagnosis. However, effectively representing these data and addressing imbalanced failures remain challenging. To tackle these issues, we propose...

10.1109/tsc.2023.3290018 article EN IEEE Transactions on Services Computing 2023-06-27

Robust Multimodal Failure Detection for Microservice Systems

OPENALEX - Publications

Chenyu Zhao Minghua Ma Zhenyu Zhong Shenglin Zhang Zhiyuan Tan and 8 more

Proactive failure detection of instances is vitally essential to microservice systems because an instance failure can propagate to the whole system and degrade the system's performance. Over the years, many single-modal (i.e., metrics, logs, or traces) data-based anomaly detection methods have been proposed. However, they tend to miss a large number of failures and generate numerous false alarms because they ignore the correlation of multimodal data. In this work, we propose AnoFusion, an unsupervised failure detection approach, to proactively detect instance failures through multimodal data for...

10.1145/3580305.3599902 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04

HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes

OPENALEX - Publications

Yongqian Sun Youjian Zhao Ya Su Dapeng Liu Xiaohui Nie and 6 more

Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are common and important monitoring metrics in Internet companies. When an anomaly happens to the overall KPI, it is critical but challenging to localize the root cause, which is one (or more) combination of attribute values in multiple dimensions. For example, is the total PV decrease caused by the PV decrease from “Beijing” or “China Mobile in Beijing”, or “Beijing and Shanghai”?...

10.1109/access.2018.2804764 article EN cc-by-nc-nd IEEE Access 2018-01-01

PreFix

OPENALEX - Publications

Shenglin Zhang Ying Liu Weibin Meng Zhiling Luo Jiahao Bu and 9 more

In modern datacenter networks (DCNs), failures of network devices are the norm rather than the exception, and many research efforts have focused on dealing with failures after they happen. In this paper, we take a different approach by predicting failures, so that operators can intervene and "fix" the potential failures before they happen. Specifically, in our proposed system, named PreFix, we aim to determine during runtime whether a switch failure will happen in the near future. The prediction is based on the measurements of the current switch system status and historical...

10.1145/3219617.3219643 article EN 2018-06-12

Rapid and robust impact assessment of software changes in large internet-based services

OPENALEX - Publications

Shenglin Zhang Ying Liu Dan Pei Yu Chen Xianping Qu and 2 more

The detection of performance changes in software change roll-outs in Internet-based services is crucial for an operations team, because it allows timely roll-back of a software change when performance degrades unexpectedly. However, it is infeasible to manually investigate millions of performance measurements of many roll-outs.

10.1145/2716281.2836087 article EN 2015-12-01

PreFix

OPENALEX - Publications

Shenglin Zhang Ying Liu Weibin Meng Zhiling Luo Jiahao Bu and 9 more

In modern datacenter networks (DCNs), failures of network devices are the norm rather than the exception, and many research efforts have focused on dealing with failures after they happen. In this paper, we take a different approach by predicting failures, so that operators can intervene and "fix" the potential failures before they happen. Specifically, in our proposed system, named PreFix, we aim to determine during runtime whether a switch failure will happen in the near future. The prediction is based on the measurements of the current switch system status and historical...

10.1145/3179405 article EN Proceedings of the ACM on Measurement and Analysis of Computing Systems 2018-04-03

Device-Agnostic Log Anomaly Classification with Partial Labels

OPENALEX - Publications

Weibin Meng Ying Liu Shenglin Zhang Dan Pei Hui Dong and 2 more

Anomaly classification, i.e., detecting whether a network device is anomalous and determining its anomaly category if yes, plays a crucial role in troubleshooting. Compared to KPI curves, device logs contain much more valuable information for anomaly classification. However, the regular expression based log anomaly classification techniques cannot tackle the challenges lying in log anomaly classification. We propose LogClass, a data-driven framework to detect and classify anomalies based on logs. LogClass combines a word representation method and the PU learning model...

10.1109/iwqos.2018.8624141 article EN 2018-06-01

A Semantic-aware Representation Framework for Online Log Analysis

OPENALEX - Publications

Weibin Meng Ying Liu Yuheng Huang Shenglin Zhang Federico Zaiter and 2 more

Logs are one of the most valuable data sources for large-scale service management. Log representation, which converts unstructured texts to structured vectors or matrices, serves as the first step towards automated log analysis. However, the current log representation methods neither represent domain-specific semantic information of logs, nor handle the out-of-vocabulary (OOV) words of new types of logs at runtime. We propose Log2Vec, a semantic-aware representation framework for log analysis. Log2Vec combines a log-specific word embedding method...

10.1109/icccn49398.2020.9209707 article EN 2020-08-01

FUNNEL: Assessing Software Changes in Web-Based Services

OPENALEX - Publications

Shenglin Zhang Ying Liu Dan Pei Yu Chen Xianping Qu and 4 more

The detection of performance changes in software change roll-outs in Internet-based services is crucial for an operations team, because it allows timely roll-back of a software change when performance degrades unexpectedly. However, it is infeasible to manually investigate millions of performance measurements of many roll-outs. In this paper, we present an automated tool, FUNNEL, for rapid and robust impact assessment of software changes in large Internet-based services. FUNNEL automatically collects the related performance measurements for each software change. To detect significant performance behavior changes, FUNNEL adopts singular spectrum...

10.1109/tsc.2016.2539945 article EN IEEE Transactions on Services Computing 2016-03-09

LogClass: Anomalous Log Identification and Classification With Partial Labels

OPENALEX - Publications

Weibin Meng Ying Liu Shenglin Zhang Federico Zaiter Yuzhe Zhang and 6 more

Logs are imperative in the management process of networks and services. However, manually identifying and classifying anomalous logs is time-consuming, error-prone, and labor-intensive. Additionally, rule-based approaches cannot tackle the challenges underlying anomalous log identification and classification resulting from new types of logs and partial labels. We propose LogClass, a framework to automatically and robustly identify and classify anomalous logs for network and service management based on...

10.1109/tnsm.2021.3055425 article EN IEEE Transactions on Network and Service Management 2021-01-28

Efficient KPI Anomaly Detection Through Transfer Learning for Large-Scale Web Services

OPENALEX - Publications

Shenglin Zhang Zhenyu Zhong Dongwen Li Qiliang Fan Yongqian Sun and 7 more

Timely anomaly detection of key performance indicators (KPIs), e.g., service response time and error rate, is of utmost importance to Web services. Over the years, many unsupervised deep learning-based anomaly detection approaches have been proposed. To achieve good performance, they require a long period of KPI data for model training, which is not easy to guarantee with frequent service changes. Additionally, the training overhead...

10.1109/jsac.2022.3180785 article EN IEEE Journal on Selected Areas in Communications 2022-06-08

LogParse: Making Log Parsing Adaptive through Word Classification

OPENALEX - Publications

Weibin Meng Ying Liu Federico Zaiter Shenglin Zhang Yihao Chen and 8 more

Logs are one of the most valuable data sources for large-scale service (e.g., social network, search engine) maintenance. Log parsing serves as the first step towards automated log analysis. However, the current log parsing methods are not adaptive. Without intra-service adaptiveness, log parsing cannot handle software/firmware upgrades because learned templates cannot match the new type of logs. In addition, without cross-service adaptiveness, the logs of a new type of service cannot be accurately parsed when this service is newly deployed. We propose LogParse, an adaptive log parsing framework, to support...

10.1109/icccn49398.2020.9209681 article EN 2020-08-01

Detecting Outlier Machine Instances Through Gaussian Mixture Variational Autoencoder With One Dimensional CNN

OPENALEX - Publications

Ya Su Youjian Zhao Ming Sun Shenglin Zhang Xidao Wen and 6 more

Today's large datacenters house a massive number of machines, each of which is being closely monitored with multivariate time series (e.g., CPU idle, memory utilization) to ensure service quality. Detecting outlier machine instances with multivariate time series is crucial for service management. However, it is a challenging task due to the multiple classes and various shapes, high dimensionality, and lack of labels of multivariate time series. In this article, we propose DOMI, a novel unsupervised outlier detection model that combines Gaussian mixture VAE with 1D-CNN, to detect...

10.1109/tc.2021.3065073 article EN IEEE Transactions on Computers 2021-03-09

Real-Time Anomaly Detection for Large-Scale Network Devices

OPENALEX - Publications

Tao Lei Minghua Ma Shenglin Zhang Junhua Kuang Xiaowei Guo and 2 more

10.1109/ton.2025.3529861 article EN 2025-01-01

Effect of Quenching Temperature on Microstructure and Hydrogen-Induced Cracking Susceptibility in S355 Steel

OPENALEX - Publications

Chunyan Yan Shenglin Zhang Lingchuan Zhou Zhanpeng Tian Mengdie Shen and 1 more

S355 steels are widely used in various applications. However, they may be affected by hydrogen, which can induce hydrogen-induced cracking (HIC). The effects of the quenching temperature (Twq) on the microstructure variation and HIC susceptibility of S355 steel were investigated by microstructural characterization, hydrogen permeation (HP) test, slow strain rate tensile (SSRT) test, and hydrogen microprint technique (HMT) hydrogen-charged test. The results indicate that the treated specimens consisted predominantly of lath martensite...

10.3390/ma18051161 article EN Materials 2025-03-05

A novel explainable propagation-based fault diagnosis approach for Clean-In-Place by establishing Boolean network model

OPENALEX - Publications

Jiayi Zhang Xiang Liu Yan Wang Shenglin Zhang Tuanjie Wang and 1 more

10.1016/j.jprocont.2025.103405 article EN Journal of Process Control 2025-03-11

Rapid microwave-assisted refluxing synthesis of hierarchical mulberry-shaped Na3V2(PO4)2O2F@C as high performance cathode for sodium & lithium-ion batteries

OPENALEX - Publications

Yan Hou Kun Chang Zhenyu Wang Shuai Gu Qiong Liu and 5 more

10.1007/s40843-018-9342-0 article EN Science China Materials 2018-10-08