Advancing Data Completeness and Strategically Directing Record Reviews With a Machine Learning Approach
DOI: 10.1115/ipc2024-134143
Publication Date: 2024-12-13T19:11:17Z
AUTHORS (5)
ABSTRACT
This study presents an innovative machine learning approach for predicting the operating stress level of steel pipelines, expressed as a percentage of the Specified Minimum Yield Strength (% SMYS), when primary pipe properties such as grade and wall thickness are incomplete.
Missing property data is a common challenge in pipeline integrity management, which makes such a prediction particularly valuable. The parameter % SMYS is essential for classifying pipeline segments into integrity management and regulatory categories such as Distribution and Transmission. When records are missing, however, operators often must carry out time-consuming manual reviews of retained documentation to determine the appropriate values to enter in their digital pipeline records databases. These reviews, while necessary, do not all have the same impact on a company’s integrity management approach, decision-making, or risk mitigation. A system may range from small-diameter (< NPS 2), low-pressure pipelines operating at low stress levels (< 5% SMYS) to large-diameter (> NPS 24) pipelines operating at much higher stress levels (> 30% SMYS). The goal of the method is to steer manual record reviews and enhancement efforts toward the segments with the greatest impact, especially those potentially operating above 30% SMYS, and those in the distribution category potentially operating above 20% SMYS.
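As a rough illustration of the quantity being predicted, the sketch below computes % SMYS from pressure, diameter, wall thickness, and grade using Barlow's formula for hoop stress. The abstract does not state how stress was computed, so the choice of formula, the function names `percent_smys` and `stress_category`, and the sample values are all assumptions made for illustration. The bin edges follow the operating categories named in this abstract.

```python
def percent_smys(pressure_psig, outside_diameter_in, wall_thickness_in, smys_psi):
    """Operating hoop stress as a percentage of SMYS.

    Assumes Barlow's formula (hoop stress = P * D / (2 * t)); the source
    abstract does not specify the stress calculation.
    """
    hoop_stress_psi = pressure_psig * outside_diameter_in / (2.0 * wall_thickness_in)
    return 100.0 * hoop_stress_psi / smys_psi


def stress_category(pct_smys):
    """Map a % SMYS value onto the three operating categories."""
    if pct_smys < 20.0:
        return "< 20% SMYS"
    if pct_smys <= 30.0:
        return "20%-30% SMYS"
    return "> 30% SMYS"


if __name__ == "__main__":
    # Hypothetical segment: 420 psig, 10 in diameter, 0.25 in wall, 42,000 psi SMYS.
    pct = percent_smys(420.0, 10.0, 0.25, 42000.0)
    print(f"{pct:.1f}% SMYS -> {stress_category(pct)}")
```

When grade or wall thickness is missing, two of the four inputs are unknown and the function cannot be evaluated directly, which is the gap the predictive model is meant to fill.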
This study employed machine learning tools to estimate either the stress level or the operating category (e.g., < 20% SMYS, 20%–30% SMYS, and > 30% SMYS) of pipeline segments. The models were trained and tested on a dataset of steel pipe segments with complete records, for which the stress level is known, and will be used to estimate the stress levels of pipeline segments with incomplete records. Records estimated to fall in the higher stress levels or operating categories (20%–30% SMYS and > 30% SMYS) are then prioritized for manual review. Predicting the stress level itself frames the task as a regression problem, whereas estimating the operating stress category frames it as a classification problem. The implications and results of these two framings are compared in this study. Data splitting and cross-validation were used to train, validate, and test the models, which included random forest and extreme gradient boosting (XGBoost) regressors.
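The two framings can be sketched side by side with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the target is generated with Barlow's formula, the two input features (diameter and operating pressure) are a guess at what remains available when grade and wall thickness are missing, and only random forest is shown (XGBoost is left out to avoid an extra dependency).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 400

# Synthetic "complete" records (illustrative values only, not from the paper).
diameter_in = rng.choice([2, 4, 6, 8, 12, 16, 20, 24, 30], size=n).astype(float)
pressure_psig = rng.uniform(50, 1000, size=n)
wall_in = rng.uniform(0.15, 0.50, size=n)
smys_psi = rng.choice([35000, 42000, 52000], size=n).astype(float)

# Regression target: % SMYS via Barlow's formula (an assumption).
pct_smys = 100.0 * pressure_psig * diameter_in / (2.0 * wall_in * smys_psi)
# Classification target: 0 = below 20%, 1 = 20% to 30%, 2 = 30% SMYS and above.
category = np.digitize(pct_smys, bins=[20.0, 30.0])

# Features assumed to survive when grade and wall thickness are missing.
X = np.column_stack([diameter_in, pressure_psig])

# Hold out a test set; stratify so every category appears in both splits.
X_train, X_test, y_train, y_test, c_train, c_test = train_test_split(
    X, pct_smys, category, test_size=0.25, random_state=0, stratify=category
)

# Framing 1: regression on % SMYS, then bin the prediction into a category.
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg_cv = cross_val_score(reg, X_train, y_train, cv=5,
                         scoring="neg_mean_absolute_error")
reg.fit(X_train, y_train)
reg_pred_cat = np.digitize(reg.predict(X_test), bins=[20.0, 30.0])

# Framing 2: classification directly on the operating category.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf_cv = cross_val_score(clf, X_train, c_train, cv=5, scoring="accuracy")
clf.fit(X_train, c_train)
clf_pred_cat = clf.predict(X_test)

# Segments predicted at 20% SMYS or above are flagged for manual review.
review_reg = reg_pred_cat >= 1
review_clf = clf_pred_cat >= 1

print("Regression CV mean absolute error:", -reg_cv.mean())
print("Classification CV accuracy:", clf_cv.mean())
print("Flagged by regression / classification:",
      int(review_reg.sum()), "/", int(review_clf.sum()))
print("Share of test segments where the two flags agree:",
      float((review_reg == review_clf).mean()))
```

Binning the regression output makes the two framings directly comparable on the same review decision, although the regressor also retains an estimate of how far a segment sits from a category boundary.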
Once records have been prioritized and selected for manual review, the results of that review can be used to objectively evaluate the performance of the predictive model. These results can also be fed back into the training dataset to improve the model’s future performance. This study provides broad guidance on using machine learning to deal with incomplete data and to facilitate the integrity assessment of large pipeline systems.
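The feedback loop described above can be sketched in a few lines. The record layout, category labels, and function names here are hypothetical, chosen only to show the bookkeeping: compare predicted categories against reviewed categories, then append the reviewed records to the training data.

```python
HIGH_STRESS = ("20-30", ">30")  # categories that trigger a manual review


def score_against_review(predicted, reviewed):
    """Compare model predictions with manual review results.

    Returns (exact_match_rate, confirmed_high_rate): the share of segments
    whose predicted category matched the review, and the share of flagged
    segments the review confirmed as operating at or above 20% SMYS.
    """
    if len(predicted) != len(reviewed) or not predicted:
        raise ValueError("predicted and reviewed must be non-empty and equal length")
    exact = sum(p == r for p, r in zip(predicted, reviewed)) / len(predicted)
    flagged = [r for p, r in zip(predicted, reviewed) if p in HIGH_STRESS]
    confirmed = (
        sum(r in HIGH_STRESS for r in flagged) / len(flagged) if flagged else None
    )
    return exact, confirmed


def add_reviewed_records(training_rows, features, reviewed):
    """Return a new training set extended with the reviewed segments."""
    return training_rows + [(f, r) for f, r in zip(features, reviewed)]


if __name__ == "__main__":
    predicted = [">30", "20-30", ">30", "20-30"]
    reviewed = [">30", "<20", "20-30", "20-30"]
    print(score_against_review(predicted, reviewed))
```

Because only prioritized segments are reviewed, these scores describe the flagged population; they say nothing about high-stress segments the model failed to flag.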