Building Predictive Maintenance Solutions with Azure ML – Webinar Summary

This blog post is authored by Yan Zhang, Data Scientist at Microsoft.

Last month, we hosted an advanced analytics webinar on building predictive maintenance solutions. This post summarizes that webinar.

Predictive Maintenance: IoT vs. Traditional

Predictive maintenance has been gaining attention lately because of the increased interest around Internet of Things (IoT) applications. The following table is a conceptual comparison of predictive maintenance in IoT to traditional predictive maintenance. Actual use cases aren’t always as clear cut.

 

| | Predictive Maintenance in IoT | Traditional Predictive Maintenance |
|---|---|---|
| Goal | Improve production and/or maintenance efficiency. | Ensure the reliability of machine operation. |
| Data | Data stream (time-varying features), multiple data sources. | Very limited time-varying features. |
| Scope | Component level, system level. | Parts level. |
| Approach | Data driven. | Model driven. |
| Tasks | Any task that improves production or maintenance efficiency. Examples: failure prediction, fault/failure detection and diagnosis, maintenance action recommendation. | Failure prediction (prognosis), fault/failure detection and diagnosis (diagnosis). |

Webinar Description

In the webinar, we presented a failure prediction scenario in predictive maintenance applications. Failure prediction in predictive maintenance is defined as a technique to predict when an in-service machine will fail so that maintenance can be planned in advance.

Through a real-world example, three different ML models – regression, binary classification and multi-class classification – were formulated in Azure ML. By showing step-by-step procedures for data input, data preprocessing, data labeling and feature engineering to prepare the training and testing data on a publicly available dataset, we showed how to build an end-to-end failure prediction solution in Azure ML.
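The three formulations differ mainly in how each training row is labeled. The following is a minimal sketch, assuming run-to-failure data in which the last recorded cycle of a machine is its failure. The function name, the field names, and the window sizes `w1` and `w0` are illustrative assumptions, not values taken from the webinar.

```python
from typing import Dict, List


def label_run_to_failure(cycles: List[int], w1: int = 30, w0: int = 15) -> List[Dict[str, int]]:
    """Label every cycle of one machine's run-to-failure history.

    Assumption: the largest value in `cycles` is the cycle at which the
    machine failed, and w0 < w1.
    """
    failure_cycle = max(cycles)
    rows = []
    for cycle in cycles:
        # Regression target: remaining useful life in cycles.
        rul = failure_cycle - cycle
        # Binary target: will the machine fail within the next w1 cycles?
        label_binary = 1 if rul <= w1 else 0
        # Multi-class target: 2 = fails within w0 cycles,
        # 1 = fails within (w0, w1] cycles, 0 = no failure within w1 cycles.
        if rul <= w0:
            label_multiclass = 2
        elif rul <= w1:
            label_multiclass = 1
        else:
            label_multiclass = 0
        rows.append({
            "cycle": cycle,
            "rul": rul,
            "label_binary": label_binary,
            "label_multiclass": label_multiclass,
        })
    return rows


# Hypothetical machine that ran for 50 cycles before failing.
history = label_run_to_failure(list(range(1, 51)), w1=10, w0=5)
```

The same feature rows can then be paired with `rul` for regression, `label_binary` for binary classification, or `label_multiclass` for multi-class classification.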

To help you build solutions on Azure ML, we have developed predictive maintenance template experiments for your reference. Another Azure service, Azure Stream Analytics, is also very relevant in this scenario, as it's great at handling streaming data.

Data Relevancy and Qualification in the Failure Prediction Scenario

We had several audience questions regarding data relevancy and data qualification. It is possible to build an accurate ML model only when relevant features are used for training. For instance, if the target is to predict car wheel failure, the training data should contain wheel-related features (e.g. telemetry data reflecting the health status of the wheels, mileage per hour or day, and the load of the car). However, if the target is to predict car engine failures, we probably need a different set of training data that includes engine-related features.

In the next sections, we address data qualification criteria for this failure prediction scenario.

Use Data from the Component or Parts Level

First off, it’s best to use data from the component or parts level. When formulating a predictive problem, we need to carefully select the target failure event to predict. This decision is made based on both business needs and data availability. In general, it is more valuable to predict specific failure events than blurred ones.

Taking failure prediction for car wheels as an example, we may predict “is this wheel going to have a failure”, or “is the front of the car going to have a wheel failure”, or “is the whole car going to have a wheel failure”. Of these three prediction problems, the first predicts the most specific event and the third the most blurred one. In order to predict failure events more specifically, the data should have clear IDs at the component/parts level. For example, both the car operation data and the maintenance records should identify specific wheels by ID in order to predict whether a given wheel is going to fail.
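To illustrate why component-level IDs matter, here is a small sketch that joins operation data to maintenance records on a wheel ID. All record layouts, field names, and values are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Hypothetical operation data: one row per wheel per day.
telemetry: List[Dict] = [
    {"car_id": "car-1", "wheel_id": "FL", "day": 1, "vibration": 0.21},
    {"car_id": "car-1", "wheel_id": "FR", "day": 1, "vibration": 0.20},
    {"car_id": "car-1", "wheel_id": "FL", "day": 2, "vibration": 0.48},
    {"car_id": "car-1", "wheel_id": "FR", "day": 2, "vibration": 0.22},
]

# Hypothetical maintenance records that name the specific wheel.
replacements: List[Dict] = [
    {"car_id": "car-1", "wheel_id": "FL", "day": 2},
]


def label_by_component(telemetry: List[Dict], replacements: List[Dict]) -> List[Dict]:
    """Mark a telemetry row as failed when that exact wheel was replaced that day."""
    failed_keys = {(r["car_id"], r["wheel_id"], r["day"]) for r in replacements}
    labeled = []
    for row in telemetry:
        key = (row["car_id"], row["wheel_id"], row["day"])
        labeled.append({**row, "failed": 1 if key in failed_keys else 0})
    return labeled


labeled_rows = label_by_component(telemetry, replacements)
failed_rows = [row for row in labeled_rows if row["failed"] == 1]
```

If the maintenance records carried only `car_id`, the join could label the failure at the car level at best, which corresponds to the most blurred of the three prediction problems.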

Data Should Include Failure Events

Secondly, data should include information that represents failure events. When building a binary classification model that classifies examples as failure or non-failure, the algorithm learns normal operation patterns and failure patterns through the training process. The training data must contain a sufficient number of examples in both categories so that the algorithm can learn these two different patterns.

In predictive maintenance applications, failure events are generally very rare. In this case, we may use some approximation methods to generate more failure events. To generate failure event data:

  • Use maintenance records and parts replacement history to approximate failures.

  • Approximate failures by identifying anomalies (outliers) in the training data.

  • Make use of domain knowledge as much as possible. For example, if the domain knowledge reveals that certain features should correlate with each other under normal conditions, we can infer a faulty condition when these feature values are not highly correlated.
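The correlation idea in the last bullet can be sketched as a sliding-window check. This is only an illustration under stated assumptions: the two feature names, the window size, and the 0.8 threshold are hypothetical, and a real solution would tune them with domain experts.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant window; returning 0.0
        # is a simplification made for this sketch.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def flag_correlation_breaks(x: Sequence[float], y: Sequence[float],
                            window: int = 5, threshold: float = 0.8) -> List[bool]:
    """Return one flag per full window; True means the correlation fell below the threshold."""
    flags = []
    for end in range(window, len(x) + 1):
        r = pearson(x[end - window:end], y[end - window:end])
        flags.append(r < threshold)
    return flags


# Hypothetical features that track each other at first and then diverge.
speed = [10, 12, 11, 14, 13, 15, 16, 14, 15, 13, 16, 12]
temperature = [40, 44, 42, 48, 46, 50, 52, 52, 50, 54, 48, 56]

flags = flag_correlation_breaks(speed, temperature, window=5, threshold=0.8)
```

Windows flagged `True` are candidate faulty periods that can be reviewed and, where appropriate, used as approximate failure labels.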

Extract an Aging Pattern from Data to Predict Failure

Finally, data should indicate an aging pattern in order to predict a machine’s remaining useful life (RUL) or to predict failures within a time frame. In order to predict RUL – i.e. how many more days (or hours, miles, transactions, etc.) a machine is likely to last before it fails – we assume the machine’s health status will degrade at some point during its operation.

But how do we know if the data indicates such a degrading pattern? Start with domain knowledge. For example, if the domain knowledge reveals that certain sensors reflect the health status of a machine component, or that heavy usage results in a greater likelihood of failure than light usage, we should create features around this information. Data visualization and correlation tests can also be helpful.
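As a rough sketch of creating features around such knowledge, the snippet below builds a smoothed sensor feature and a cumulative usage feature, then makes a crude early-versus-late comparison. The sensor name, the usage values, and the window size are hypothetical, and the comparison is a quick sanity check rather than a formal statistical test.

```python
from typing import List


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Trailing mean; early entries average over the readings available so far."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def cumulative(values: List[float]) -> List[float]:
    """Running total, e.g. total usage accumulated up to each time step."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


# Hypothetical daily readings for one machine, ending at its failure.
vibration = [0.20, 0.22, 0.19, 0.21, 0.25, 0.24, 0.30, 0.33, 0.41, 0.52]
hours_used = [8, 6, 8, 7, 8, 8, 9, 8, 9, 9]

smoothed_vibration = rolling_mean(vibration, window=3)
total_hours = cumulative(hours_used)

# Crude degradation check: compare the first and last thirds of the smoothed signal.
third = len(smoothed_vibration) // 3
early_level = sum(smoothed_vibration[:third]) / third
late_level = sum(smoothed_vibration[-third:]) / third
shows_upward_drift = late_level > early_level
```

A feature whose level shifts steadily as the machine approaches failure is a candidate input for an RUL model; a feature that stays flat until the failure carries little aging information.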

Thanks for your interest in this topic, and we encourage you to get started with predictive maintenance using Azure ML today. As always, we welcome your feedback and comments.

Yan