The Future of Machine Learning on Corrosion

Machine learning is something you encounter daily whether you realize it or not. Through smartphones, apps, Siri and Alexa, online advertisements, and self-driving cars, you potentially encounter artificial intelligence (AI) thousands of times per day. But did you ever consider how machine learning could influence the corrosion industry? Joseph Mazzella and Tom Hayden of Engineering Director, Inc. (EDI) (Evanston, Illinois, USA) are doing just that.

“EDI is a consulting firm specializing in developing, implementing, measuring, and administrating lean business processes and strategies, through the effective use of information technology, AI, and geographical information systems (GIS),” says Mazzella, who is CEO of the company. “We have a keen focus on the corrosion industry.”

While Mazzella’s background is in corrosion, operations, and sales engineering, Hayden’s is in software development with a background in consumer technology, having been an early employee at Facebook and GrubHub. They may seem like an unlikely partnership, but when Mazzella went searching for weather data for a corrosion research project with Enbridge Pipeline, Inc. Canada (Edmonton, Alberta, Canada), he crossed paths with Hayden, who was operating an open source data library for processing National Oceanic and Atmospheric Administration weather feeds, and the rest is history.

“I needed some data and I needed a data scientist, so I happened to stumble across Tom, and it so happened that he really took a liking to the corrosion industry,” says Mazzella. “So, he’s gone from having little exposure to corrosion, to generating spatial algorithms that predict corrosion growth rates for underground pipeline and atmospheric steel assets for Enbridge Pipeline. He really took a liking to our industry.”

“I love the field,” replies Hayden, who is CTO of EDI in addition to lecturing at Northwestern University (Chicago, Illinois, USA). “I think it’s a really interesting set of problems. From machine learning, computation, algorithms—this is a really fascinating set of problems because it’s this really difficult physical problem of trying to understand corrosion and model corrosion and building ways to estimate risk. It combines all these different aspects of physics with computer science, and you can do some cool stuff now.”

That cool stuff includes research into estimating corrosion growth rates in underground pipelines using machine learning, as well as the development of a computer vision app, currently in a proof-of-concept phase, that uses machine learning for visual inspection of corrosion.

Predicting Pipeline Corrosion

Mazzella and Hayden, along with colleagues Len Krissa and Haralampos Tsaprailis, both with Enbridge, set out to estimate corrosion risk in pipelines.1 The team used corrosion growth rate estimates from a North American pipeline operator’s operational dataset and developed machine learning algorithms to estimate risk. Their goal was to build response functions for corrosion growth rates for underground pipelines. Impressively, they were able to establish an accuracy rate of over 95% in their predictions on over 25,000 kilometers of active crude oil pipeline in North America at 10-meter increments.

“We’re trying to use machine learning to help manage risk and it’s about generating probabilities of failure, probability of corrosion growth rates, or what is the actual growth rate itself,” says Hayden. “People have used statistics to estimate the corrosion growth rates for a long time; we’re just trying to take it to the next level, using computational technology, deep learning, and neural networks.”

Estimating corrosion rates for underground pipelines is far different from estimating them for pipelines above ground. A multitude of factors contribute to corrosion in underground pipelines, including alternating current (AC) interference, atmospheric conditions, soil parameters, cathodic protection, road salts, and geographic features. Geostatistical tools attempt to model these conditions to target the areas most likely to have advanced corrosion, reducing the risk of failure. Historically, it has been difficult to classify the corrosivity of environmental conditions around underground pipelines, with most independent studies being isolated to a single geographical location. Pinpointing corrosion can limit unnecessary excavation, making integrity work more cost effective. And the more data available, the more successful these tools can be.

FIGURE 1 The corrosion growth rate of pipelines available for model training and evaluation is shown in red.

The researchers collected data from both public and proprietary sources and applied three main transformations to it: conversion to categorical data, one-hot encoding, and binarization. Additionally, the primary dependent variable in the training data, the corrosion growth rate, was drawn from a study that used in-line inspection (ILI) back-to-back measurements. These values were collected by a North American operator using magnetic flux leakage measurement and include measurements in the United States and Canada at the resolution of each girth weld address along the pipeline (Figure 1).
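As a rough illustration of two of the transformations named above, the sketch below one-hot encodes a categorical field and binarizes a numeric one. The field names, values, and the 30-day threshold are hypothetical; the article does not describe the study's actual schema or cutoffs.

```python
# Hypothetical records: field names and values are illustrative only,
# not taken from the study's dataset.
records = [
    {"soil_type": "clay", "days_below_freezing": 120, "powerline_distance_m": 40.0},
    {"soil_type": "sand", "days_below_freezing": 15, "powerline_distance_m": 900.0},
    {"soil_type": "loam", "days_below_freezing": 60, "powerline_distance_m": 250.0},
]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == category else 0 for category in categories]


def binarize(value, threshold):
    """Return 1 when `value` exceeds `threshold`, otherwise 0."""
    return 1 if value > threshold else 0


def transform(records, freeze_threshold=30):
    """Turn raw records into numeric rows a model could consume."""
    # Sort the observed categories so the column order is reproducible.
    categories = sorted({r["soil_type"] for r in records})
    rows = []
    for r in records:
        row = one_hot(r["soil_type"], categories)
        row.append(binarize(r["days_below_freezing"], freeze_threshold))
        row.append(r["powerline_distance_m"])  # numeric field passed through
        rows.append(row)
    return categories, rows


categories, rows = transform(records)
print(categories)
for row in rows:
    print(row)
```

Each output row holds one column per soil category, then the binarized freezing flag, then the untouched distance value.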

The researchers had to “train” the machine learning models on independent variables (Figure 2), such as:

• Soil properties.

• Atmospheric conditions, including time of wetness, mean average temperature, total number of days below freezing, sulfides, and chlorides.

• AC interference, including proximity to high voltage powerlines and proximity to power substations.

• Proximity to roadways, railways, water, and other pipelines.

• Magnetic anomalies from satellite data.

• Pipeline features, including years in service and manufacturer.

• Rectifier amperage.

FIGURE 2 Example of independent variables used in machine learning training models consisting of environmental and pipeline data, and predicted corrosion growth rates.
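Several of the variables listed above are proximity measures. One plausible way to derive such a feature, shown here only as a sketch, is to compute the great-circle (haversine) distance from a point on the pipeline to the nearest feature of interest. The coordinates are made up, and the article does not say how the researchers computed proximity.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distance_m(point, features):
    """Distance in meters from `point` to the closest of `features`."""
    return min(haversine_m(point[0], point[1], f[0], f[1]) for f in features)


# Hypothetical coordinates (latitude, longitude); not real asset locations.
girth_weld = (53.5461, -113.4938)
substations = [(53.6000, -113.5000), (53.5470, -113.4900)]

print(round(nearest_distance_m(girth_weld, substations)), "m to nearest substation")
```

The same helper could be reused for roadways, railways, water, or other pipelines by swapping in a different list of feature coordinates.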

Machine Learning

The researchers evaluated three main approaches to modeling the ILI back-to-back corrosion growth rate. The first was a log-linear regression with transparent feature mappings. The second was a modern machine learning toolset, eXtreme gradient boosting (xgboost). The third was an artificial neural network that was trained on the same data. Typically, the more data provided to the model, in the form of pipeline specifications, the more dependable it was.
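To make the first approach concrete, here is a minimal single-feature sketch of a log-linear regression: the logarithm of the growth rate is fit as a straight line in one input. The data are synthetic, and the variable (years in service), units, and coefficients are assumptions for illustration. The study's actual feature mappings are not described in this article.

```python
import math


def fit_log_linear(xs, ys):
    """Least-squares fit of log(y) = a + b * x; returns (a, b)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxy / sxx
    a = mean_log - b * mean_x
    return a, b


def predict(a, b, x):
    """Predicted growth rate at x, mapped back from log space."""
    return math.exp(a + b * x)


# Synthetic data: years in service vs. an assumed growth rate (mm/yr).
years = [5, 10, 20, 30, 40]
rates = [0.02 * math.exp(0.03 * t) for t in years]

a, b = fit_log_linear(years, rates)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
print(f"predicted rate at 25 years = {predict(a, b, 25):.4f}")
```

The appeal of this form is that the fitted slope reads directly: each additional unit of the input multiplies the predicted rate by exp(b). Gradient-boosted trees and neural networks offer no comparably simple reading.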

“Unlike classical statistics, the downside to using these AI algorithms is that the model itself is challenging to interpret,” notes Hayden. “There is no simple way to gaze into the AI and understand what it is doing. In this sense, the algorithms are a ‘black box.’” It’s difficult to understand the logic the algorithm is using because you can see the input, and you can see the output, but there is little visibility into how the model actually works (Figure 3). Without knowing how, it is difficult to detect bias, find mistakes, and build a causal understanding of corrosion mechanisms.

FIGURE 3 It is difficult for researchers to see how the algorithms reach their results; this lack of visibility is referred to as the “black box.”

After comparing and contrasting the three different modeling technologies, they found, at least in the cases presented, that an algorithm called xgboost showed the best fit. “This result is not surprising,” states Hayden. “xgboost performs well on many problems across many industries.”
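A comparison of this kind is commonly made by scoring each candidate on data held out from training. The sketch below assumes root-mean-square error (RMSE) as the measure of fit, which the article does not specify, and uses two trivial stand-in predictors rather than the study's models.

```python
import math


def rmse(predict, xs, ys):
    """Root-mean-square error of `predict` over paired inputs and targets."""
    squared = [(predict(x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def best_fit(candidates, xs, ys):
    """Score every candidate on held-out data; return (best name, all scores)."""
    scores = {name: rmse(fn, xs, ys) for name, fn in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical held-out data and stand-in models, for illustration only.
held_out_x = [1.0, 2.0, 3.0, 4.0]
held_out_y = [0.11, 0.19, 0.32, 0.41]
candidates = {
    "constant_baseline": lambda x: 0.25,
    "linear_stand_in": lambda x: 0.1 * x,
}

winner, scores = best_fit(candidates, held_out_x, held_out_y)
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.4f}")
print("best fit:", winner)
```

In practice each entry in `candidates` would wrap a trained model's prediction function, so the same scoring loop could rank a regression, a boosted-tree model, and a neural network side by side.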

There’s an App for That!

Wouldn’t it be convenient to take a photo of a corroded pipeline and have an app correctly classify the level of corrosion? There’s an app for that…almost! Mazzella and Hayden explain that it is currently in a “proof of concept” phase to show what is possible. “The app in its current form is not really the future of the app. The app in the present form is to show that we can do AI or machine learning with an app and train it to recognize corrosion,” explains Mazzella. “The buildout for this is for the next release to be able to recognize and interpret corrosion to a visual standard.” The prototype can be downloaded from the Apple Store at

For example, if an inspector is performing a site survey or being trained to recognize a standard, they can take a photo with their phone and the app can provide specifics related to a certain standard and rate the type and level of corrosion. Instead of the inspector taking a picture and concluding what level of corrosion it is, the app can do it.

These computer vision tools have seen significant improvements over the past decade. “Computer vision algorithms specifically take photos as input and can output anything—information on the images, people identified in them, or the degree of corrosion visible, along with locations, and severity,” says Hayden. “This technology will revolutionize the corrosion inspection industry by augmenting manual inspections or replacing them altogether, allowing more frequent automated scans.” Hayden emphasizes that this can only work if vast datasets are compiled of corrosion in a variety of settings, and not just pipelines, but also marine environments, architecture, infrastructure, etc.


No project is without its challenges, and this undertaking has been no different. Before they were able to even begin work on the machine learning algorithms and spatial analytics, the team spent three years researching available public data sources. And not just U.S. sources, but global sources, since they aspire to have the technology available worldwide. Once they obtained the data, it then had to be compared with field samples to ensure the readings were in line. “The hardest part about this project was basically finding public data sources and understanding what data was consistent and reliable globally,” says Mazzella.

And once the data was obtained, there were additional challenges. “By far the hardest part is making your data useful,” notes Hayden. “It is one hurdle to generate a corrosion growth rate estimate, but a much harder problem making the predictions usable to professionals out in the field.” The goal is to provide the user with useful reporting and interpretation. If, for example, a user needed to perform an excavation, they would have information regarding the risk of failure in specific locations and be able to view the features and growth rates. The researchers aim to ensure that their technology lines up with the practical application of everyday work through geospatial tools.

Moving Forward

Because Hayden is so passionate about this venture, he is leading a NACE task group, TG 589. This group is devoted to amassing an industry-wide, open-source collection of labeled corrosion images for computer vision use. Building an image repository of diverse datasets will allow computer scientists to build algorithms and technology to implement optical service recognition. But, as previously noted, collecting this much data is not an easy task and they cannot do it alone. They are asking operators and corrosion industry professionals to contribute photos of corrosion to build out the repository. This approach has been successful in other industries; for instance, radiologists have been releasing datasets for more than five years, and this has resulted in breakthroughs in radiology processing.

“AI has the upside potential to dramatically improve many of the hurdles faced by those in pipeline integrity,” says Hayden. “It can deliver the promise of improved safety, compliance, and efficiency from slow cycles on observing and estimating corrosion to real-time assessments of assets.”


1 J. Mazzella, et al., “Estimating Corrosion Growth Rate for Underground Pipeline: A Machine Learning Based Approach,” CORROSION 2019, paper no. 13456 (Houston, TX: NACE International, 2019).
