- Software System Performance and Reliability
- Network Security and Intrusion Detection
- Anomaly Detection Techniques and Applications
- Time Series Analysis and Forecasting
- Corrosion Behavior and Inhibition
- Cloud Computing and Resource Management
- Software Engineering Research
- Software-Defined Networks and 5G
- Data Stream Mining Techniques
- Molecular Biology Techniques and Applications
- Software Reliability and Analysis Research
- Data Quality and Management
- Software Testing and Debugging Techniques
- Caching and Content Delivery
- Biological Stains and Phytochemicals
- Concrete Corrosion and Durability
- Network Traffic and Congestion Control
- Data Mining Algorithms and Applications
- Context-Aware Activity Recognition Systems
- Materials Engineering and Processing
- Traffic Prediction and Management Techniques
- Metallurgy and Material Science
- Opportunistic and Delay-Tolerant Networks
- Health and Well-being Studies
- Internet Traffic Analysis and Secure E-voting
Nankai University
2018-2025
Hohai University
2024-2025
Qilu Hospital of Shandong University
2024
Tianjin Haihe Hospital
2022-2024
Affiliated Hospital of Chengde Medical College
2024
Buchang Pharma (China)
2024
Chengdu University of Traditional Chinese Medicine
2023-2024
Guangxi Normal University
2024
Information Technology Laboratory
2022
CCCC Highway Consultants (China)
2022
Recording runtime status via logs is common for almost every computer system, and detecting anomalies in logs is crucial for timely identifying malfunctions of systems. However, manually detecting anomalies in logs is time-consuming, error-prone, and infeasible. Existing automatic log anomaly detection approaches, using indexes rather than semantics of log templates, tend to cause false alarms. In this work, we propose LogAnomaly, a framework to model an unstructured log stream as a natural language sequence. Empowered by template2vec, a novel, simple yet...
The anomalies of microservice invocation traces (traces) often indicate that the quality of the microservice-based large software service is being impaired. However, timely and accurately detecting trace anomalies is very challenging due to: 1) the large number of underlying microservices, 2) the complex call relationships between them, and 3) the interdependency between response times and invocation paths. Our core idea is to use machine learning to automatically learn the overall normal patterns of traces during periodic offline training. In online anomaly detection, a new...
An increasing number of Internet applications are applying microservice architecture due to its flexibility and clear logic. The stability of microservices is thus vitally important for these applications' quality of service. Accurate failure root cause localization can help operators quickly recover from microservice failures and mitigate loss. Although cross-microservice failure root cause localization has been well studied, how to localize failure root causes in a microservice so as to quickly mitigate this microservice has not yet been studied. In this work, we propose a framework, MicroCause, to accurately localize the root cause monitoring indicators...
With the growing market of cloud databases, careful detection and elimination of slow queries are of great importance to service stability. Previous studies focus on optimizing the slow queries that result from internal reasons (e.g., poorly-written SQLs). In this work, we discover a different set of slow queries which might be more hazardous to database users than other slow queries. We name such queries Intermittent Slow Queries (iSQs), because they usually result from intermittent performance issues that are external (e.g., at database or machine levels). Diagnosing root causes...
Widely adopted for their scalability and flexibility, modern microservice systems present unique failure diagnosis challenges due to their independent deployment and dynamic interactions. This complexity can lead to cascading failures that negatively impact operational efficiency and user experience. Recognizing the critical role of fault diagnosis in improving the stability and reliability of microservice systems, researchers have conducted extensive studies and achieved a number of significant results. This survey provides an exhaustive review of 98...
Syslogs on switches are a rich source of information for both post-mortem diagnosis and proactive prediction of switch failures in a datacenter network. However, such information can be effectively extracted only through proper processing of syslogs, e.g., using suitable machine learning techniques. A common approach to syslog processing is to extract (i.e., build) templates from historical syslog messages and then match syslog messages to these templates. However, existing template extraction techniques either have low accuracies in learning the "correct" set of templates, or...
Anomaly detection is critical for web-based software systems. Anecdotal evidence suggests that in these systems, the accuracy of a static anomaly detection method that was previously ensured is bound to degrade over time. This is due to the significant change in data distribution, namely concept drift, which is caused by software changes or evolving personal preferences. Even though dozens of anomaly detectors have been proposed over the years in the context of software systems, they have not tackled the problem of concept drift. In this paper, we present a framework, StepWise, which can detect concept drift without...
System logs, which describe a variety of events in software systems, are becoming increasingly popular for anomaly detection. However, for a large software system, current unsupervised learning-based methods suffer from low accuracy due to the high diversity of logs, while supervised learning methods are nearly infeasible to use in practice because it is time-consuming and labor-intensive to obtain sufficient labels for different types of software systems. In this paper, we propose a novel framework, LogTransfer, which applies transfer learning to transfer the anomalous...
Automatic failure diagnosis is crucial for large microservice systems. Currently, most failure diagnosis methods rely solely on single-modal data (i.e., using either metrics, logs, or traces). In this study, we conduct an empirical study using real-world failure cases to show that combining these sources of data (multimodal data) leads to a more accurate diagnosis. However, effectively representing these data and addressing imbalanced failures remain challenging. To tackle these issues, we propose ...
Proactive failure detection of instances is vitally essential to microservice systems because an instance failure can propagate to the whole system and degrade the system's performance. Over the years, many single-modal (i.e., metrics, logs, or traces) data-based anomaly detection methods have been proposed. However, they tend to miss a large number of failures and generate numerous false alarms because they ignore the correlation of multimodal data. In this work, we propose AnoFusion, an unsupervised failure detection approach, to proactively detect instance failures through multimodal data for...
Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are common and important monitoring metrics in Internet companies. When an anomaly happens to the overall KPI, it is critical but challenging to localize the root cause, which is one (or more) combination of attribute values in multiple dimensions. For example, is the total PV decrease caused by the PV from “Beijing” or “China Mobile in Beijing”, or “Beijing and Shanghai”?...
In modern datacenter networks (DCNs), failures of network devices are the norm rather than the exception, and many research efforts have focused on dealing with failures after they happen. In this paper, we take a different approach by predicting failures, so that operators can intervene and "fix" the potential failures before they happen. Specifically, in our proposed system, named PreFix, we aim to determine during runtime whether a switch failure will happen in the near future. The prediction is based on the measurements of the current switch system status and historical...
Anomaly classification, i.e., detecting whether a network device is anomalous and determining its anomaly category if yes, plays a crucial role in troubleshooting. Compared to KPI curves, logs contain much more valuable information for anomaly classification. However, the regular expression based anomaly classification techniques cannot tackle the challenges lying in log anomaly classification. We propose LogClass, a data-driven framework to detect and classify anomalies based on logs. LogClass combines a word representation method and the PU learning model...
Logs are one of the most valuable data sources for large-scale service management. Log representation, which converts unstructured texts to structured vectors or matrices, serves as the first step towards automated log analysis. However, current log representation methods neither represent the domain-specific semantic information of logs, nor handle the out-of-vocabulary (OOV) words of new types of logs at runtime. We propose Log2Vec, a semantic-aware representation framework for log analysis. Log2Vec combines a log-specific word embedding method...
The detection of performance changes in software change roll-outs in Internet-based services is crucial for an operations team, because it allows timely roll-back of a software change when performance degrades unexpectedly. However, it is infeasible to manually investigate millions of performance measurements of many roll-outs. In this paper, we present an automated tool, FUNNEL, for rapid and robust impact assessment of software changes in large Internet-based services. FUNNEL automatically collects the related performance measurements for each software change. To detect significant performance behavior changes, FUNNEL adopts singular spectrum...
Logs are imperative in the management process of networks and services. However, manually identifying and classifying anomalous logs is time-consuming, error-prone, and labor-intensive. Additionally, rule-based approaches cannot tackle the challenges underlying anomalous log identification and classification resulting from new types of logs and partial labels. We propose LogClass, a framework to automatically and robustly identify and classify anomalous logs for network and service management based on ...
Timely anomaly detection of key performance indicators (KPIs), e.g., service response time and error rate, is of utmost importance to Web services. Over the years, many unsupervised deep learning-based anomaly detection approaches have been proposed. To achieve good performance, they require a long period of KPI data for model training, which is not easy to guarantee with frequent service changes. Additionally, the training overhead...
Logs are one of the most valuable data sources for large-scale service (e.g., social network, search engine) maintenance. Log parsing serves as the first step towards automated log analysis. However, current log parsing methods are not adaptive. Without intra-service adaptiveness, log parsing cannot handle software/firmware upgrades because the learned templates cannot match new types of logs. In addition, without cross-service adaptiveness, the logs of a service cannot be accurately parsed when this service is newly deployed. We propose LogParse, an adaptive log parsing framework, to support...
Today's large datacenters house a massive number of machines, each of which is being closely monitored with multivariate time series (e.g., CPU idle, memory utilization) to ensure service quality. Detecting outlier machine instances with multivariate time series is crucial for service management. However, it is a challenging task due to the multiple classes and various shapes, high dimensionality, and lack of labels of multivariate time series. In this article, we propose DOMI, a novel unsupervised model that combines a Gaussian mixture VAE with a 1D-CNN, to detect...
S355 steels are widely used in various applications. However, they may be affected by hydrogen, which can induce hydrogen-induced cracking (HIC). The effects of the quenching temperature (Twq) on the microstructure variation and HIC susceptibility of S355 steel were investigated by microstructural characterization, hydrogen permeation (HP) test, slow strain rate tensile (SSRT) test, hydrogen microprint technique (HMT) test, and hydrogen-charged test. The results indicate that the treated specimens consisted predominantly of lath martensite...