Bo Qiao

ORCID: 0000-0002-8997-8317
Research Areas
  • Cloud Computing and Resource Management
  • Software System Performance and Reliability
  • Network Security and Intrusion Detection
  • Anomaly Detection Techniques and Applications
  • Data Stream Mining Techniques
  • IoT and Edge/Fog Computing
  • Machine Learning and Data Classification
  • Software Engineering Research
  • Advanced Neural Network Applications
  • Blockchain Technology Applications and Security
  • Domain Adaptation and Few-Shot Learning
  • Machine Learning and Algorithms
  • Topic Modeling
  • Software Reliability and Analysis Research
  • Advanced Graph Neural Networks
  • Neural Networks and Applications
  • Time Series Analysis and Forecasting
  • Traffic Prediction and Management Techniques
  • Data Visualization and Analytics
  • Advanced Image and Video Retrieval Techniques
  • Data Management and Algorithms
  • Data Mining Algorithms and Applications
  • Adversarial Robustness in Machine Learning
  • Distributed and Parallel Computing Systems
  • Imbalanced Data Classification Techniques

Microsoft Research Asia (China)
2018-2024

Southern University of Science and Technology
2023

Microsoft Research (United Kingdom)
2019-2022

Hunan Agricultural University
2009-2021

Friedrich-Alexander-Universität Erlangen-Nürnberg
2018-2020

Northwestern Polytechnical University
2015

University of Reading
2008

Lanzhou Institute of Chemical Physics
2008

Chinese Academy of Sciences
2008

Northeastern University
2007

Logs are widely used by large and complex software-intensive systems for troubleshooting. There have been a lot of studies on log-based anomaly detection. To detect the anomalies, existing methods mainly construct a detection model using log event data extracted from historical logs. However, we find that the existing methods do not work well in practice. These methods have the close-world assumption, which assumes that the log data is stable over time and the set of distinct log events is known. However, our empirical study shows that in practice, log data often contains previously unseen...

10.1145/3338906.3338931 article EN 2019-08-09

The management of cloud service incidents (unplanned interruptions or outages of a service/product) greatly affects customer satisfaction and business revenue. After years of efforts, enterprises are able to solve most incidents automatically and timely. However, in practice, we still observe critical incidents that occurred in an unexpected manner and the orchestrated diagnosis workflow failed to mitigate them. In order to accelerate the understanding of unprecedented incidents and provide actionable recommendations, the modern incident management system employs...

10.1145/3368089.3417055 article EN 2020-11-08

Feature engineering is a crucial step for developing effective machine learning models. Traditionally, feature engineering is performed manually, which requires much domain knowledge and is time-consuming. In recent years, many automated feature engineering methods have been proposed. These methods improve the accuracy of a machine learning model by automatically transforming the original features into a set of new features. However, existing methods either lack the ability to perform high-order transformations or suffer from the feature space explosion problem. In this paper, we present...

10.1109/icdm.2019.00017 article EN 2019 IEEE International Conference on Data Mining (ICDM) 2019-11-01

In cloud systems, incidents affect the availability of services and require quick mitigation actions. Once an incident occurs, operators and developers often examine logs to perform fault diagnosis. However, the large volume of diverse logs and the overwhelming details in log data make the manual diagnosis process time-consuming and error-prone. In this paper, we propose Onion, an automatic solution for precisely and efficiently locating incident-indicating logs, which can provide useful clues for diagnosing the incidents. We first point...

10.1145/3468264.3473919 article EN 2021-08-18

Performance anomaly alerting based on trace data plays an important role in assuring the quality of online service systems. However, engineers find that many anomalies reported by existing techniques are not of interest for them to take further actions. For a large scale service with hundreds of different microservices, current methods either fire lots of false alarms by applying simple thresholds to temporal metrics (i.e., latency), or run a complex end-to-end deep learning model with limited interpretability. Engineers...

10.1109/icse-seip58684.2023.00029 article EN 2023-05-01

Detecting and analyzing potential anomalous performances in cloud computing systems is essential for avoiding losses to customers and ensuring the efficient operation of the systems. To this end, a variety of automated techniques have been developed to identify anomalies in cloud computing. These techniques are usually adopted to track the performance metrics of the system (e.g., CPU, memory, and disk I/O), represented by a multivariate time series. However, given the complex characteristics of cloud computing data, the effectiveness of these methods is affected. Thus,...

10.1109/tvcg.2019.2934613 article EN IEEE Transactions on Visualization and Computer Graphics 2019-01-01

Time series classification is a popular and important topic in machine learning, and it suffers from the class imbalance problem in many real-world applications. In this paper, to address the class imbalance problem, we propose a novel and practical oversampling method named T-SMOTE, which can make full use of the temporal information of time-series data. In particular, for each sample of the minority class, T-SMOTE generates multiple samples that are close to the class border. Then, based on those samples near the class border, T-SMOTE synthesizes more samples. Finally,...

10.24963/ijcai.2022/334 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01

Scoring systems are commonly seen for platforms in the era of Big Data. From credit scoring systems in financial services to membership scores in E-commerce shopping platforms, platform managers use such systems to guide users towards the encouraged activity pattern, and manage resources more effectively and efficiently. To establish such scoring systems, several "empirical criteria" are first determined, followed by a dedicated top-down design for each score factor, which usually requires enormous effort to adjust and tune the scoring function in the new application...

10.1109/tkde.2023.3341430 article EN IEEE Transactions on Knowledge and Data Engineering 2023-12-12

Virtual machine (VM) provisioning is a common and critical problem in cloud computing. In industrial cloud platforms, there are a huge number of VMs provisioned per day. Due to the complexity and resource constraints, it needs to be carefully optimized to make cloud platforms effectively utilize the resources. Moreover, in practice, provisioning a VM from scratch requires a fairly long time, which would degrade the customer experience. Hence, it is advisable to provision VMs ahead for upcoming demands. In this work, we formulate the practical scenario as...

10.24963/ijcai.2020/208 article EN 2020-07-01

In large-scale cloud systems, unplanned service interruptions and outages may cause severe degradation of service availability. Such incidents can occur in a bursty manner, which will deteriorate user satisfaction. Identifying incidents rapidly and accurately is critical to the operation and maintenance of a cloud system. In industrial practice, incidents are typically detected through analyzing the issue reports, which are generated over time by monitoring cloud services. Identifying incidents in a large number of issue reports is quite challenging. An issue report is multi-dimensional: it has many...

10.1145/3368089.3409741 article EN 2020-11-08

With the rapid deployment of cloud platforms, high service reliability is of critical importance. An industrial cloud platform contains a huge number of disks, and disk failure is a common cause of service unreliability. In recent years, many machine learning based disk failure prediction approaches have been proposed, and they can predict disk failures based on disk status data before the failures actually happen. In this way, proactive actions can be taken in advance to improve service reliability. However, existing approaches treat each disk individually and do not explore the influence of the neighboring...

10.1145/3442381.3449867 article EN 2021-04-19

Positive-unlabeled learning (PU learning) is an important case of binary classification where the training data only contains positive and unlabeled samples. The current state-of-the-art approach for PU learning is the cost-sensitive approach, which casts PU learning as a cost-sensitive classification problem and relies on an unbiased risk estimator for correcting the bias introduced by the unlabeled samples. However, this approach requires the knowledge of the class prior and is subject to the potential label noise. In this paper, we propose a novel PU learning approach dubbed PULNS, equipped with an effective negative sample selector, which is optimized...

10.1609/aaai.v35i10.17064 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a dual-agent framework to meticulously observe and analyze the graphical user interface (GUI) and control information of Windows applications. This enables the agent to seamlessly navigate and operate within individual applications and across them to fulfill user requests, even when spanning multiple applications. The framework incorporates a control interaction module, facilitating action grounding without human intervention and enabling...

10.48550/arxiv.2402.07939 preprint EN arXiv (Cornell University) 2024-02-08

Cloud systems have become increasingly popular in recent years due to their flexibility and scalability. Each time cloud computing applications and services hosted on the cloud are affected by a cloud outage, users can experience slow response times, connection issues or total service disruption, resulting in a significant negative business impact. Outages are usually comprised of several concurring events/source causes, and therefore understanding the context of outages is a very challenging yet crucial first step toward...

10.1145/3611643.3613891 article EN 2023-11-30

Programming image processing algorithms on hardware accelerators such as graphics processing units (GPUs) often exhibits a trade-off between software portability and performance portability. Domain-specific languages (DSLs) have proven to be a promising remedy, which enable optimizations and generation of efficient code from a concise, high-level algorithm representation.

10.1145/3207719.3207723 article EN 2018-05-28

Cloud-based services are surging into popularity in recent years. However, outages, i.e., severe incidents that always impact multiple services, can dramatically affect user experience and incur economic losses. Locating the root-cause service, i.e., the service that contains the root cause of the outage, is a crucial step to mitigate the outage. In current industrial practice, this is generally performed in a bootstrap manner and largely depends on human efforts: the service that directly causes the outage is identified first, and the suspected root cause is traced back...

10.1109/icse43902.2021.00085 article EN 2021-05-01

The optimization of resource is crucial for the operation of public cloud systems such as Microsoft Azure, as well as servers dedicated to the workloads of large customers such as Microsoft 365. Those optimization tasks often need to take unknown parameters into consideration and can be formulated as Prediction+Optimization problems. This paper proposes a new Prediction+Optimization method named Correlation-Aware Heuristic Search (CAHS) that is capable of accounting for the uncertainty in unknown parameters and delivering effective solutions to difficult optimization problems. We apply this method to solving the predictive virtual machine...

10.1609/aaai.v35i14.17467 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Cloud providers often have resources that are not being fully utilized, and they may offer them at a lower cost to make up for the reduced availability of these resources. However, customers may be hesitant to use such offerings (such as spot VMs) as making trade-offs between cost and resource availability is not always straightforward. In this work, we propose Snape (Spot On-demand Perfect Mixture), an intelligent framework to optimize the cost and resource availability by dynamically mixing on-demand VMs with spot VMs. Through a detailed characterization based on real...

10.1145/3582016.3582028 article EN 2023-03-20

Optimizing data-intensive applications such as image processing for GPU targets with complex memory hierarchies requires to explore the tradeoffs among locality, parallelism, and computation. Loop fusion as one of the classical optimization techniques has been proven effective to improve locality at the function level. Algorithms in image processing are increasing their complexities and generally consist of many kernels in a pipeline. The inter-kernel communications are intensive and exhibit another opportunity for locality improvement at the system level. The scope...

10.5555/3314872.3314901 article EN Symposium on Code Generation and Optimization 2019-02-16

Combinatorial interaction testing (CIT) is an important technique for testing highly configurable software systems with demonstrated effectiveness in practice. The goal of CIT is to generate test cases covering the interactions of configuration options, under certain hard constraints. In this context, constrained covering arrays (CCAs) are frequently used as test cases in CIT. Constrained Covering Array Generation (CCAG) is an NP-hard combinatorial optimization problem, solving which requires an effective method for generating small CCAs....

10.1109/icse43902.2021.00030 article EN 2021-05-01

There has been a rapidly increasing demand for developing highly configurable software systems, which urgently calls for effective testing methods. In practice, t-wise coverage has been widely recognized as a useful metric to evaluate the quality of a test suite, and achieving high t-wise coverage is important for ensuring test adequacy. However, state-of-the-art methods usually cost a fairly long time to generate large test suites for high pairwise coverage (i.e., 2-wise coverage), which would lead to ineffective and inefficient testing of highly configurable systems. In this paper, we propose a novel local...

10.1145/3468264.3468622 article EN 2021-08-18