A Scalable End-to-End Anomaly Detection System using Azure Batch AI

This post is authored by Said Bleik, Senior Data Scientist at Microsoft.

In a previous post I showed how Batch AI can be used to train many anomaly detection models in parallel for IoT scenarios. Although model training tasks are usually the most demanding ones in AI applications, making predictions at scale on a continuous basis can be challenging as well. This is especially common in IoT applications where data streams of many devices need to be processed and scored constantly, often in real-time.

To complete the pipeline of an end-to-end solution I've created a walkthrough on GitHub that includes submitting and scheduling prediction jobs in addition to training of models. The solution comprises several Azure cloud services and Python code that interacts with those services. The scheduling component allows continuous training and scoring in a production environment. The diagram below shows the proposed solution architecture where the main components are Azure services that can easily connect to each other through configuration or SDKs. This is a general solution and only one way of designing predictive maintenance solutions. In practice, you can replace any of these components with your favorite tools/services or add more components to handle the complexities of your specific use case.

Solution Architecture

The walkthrough and scripts can be found in this GitHub repository.

Said