Retail Customer Churn Prediction: How-To Guide Now Available

This post is authored by Lixun Zhang, Data Scientist, Daisy Deng, Software Engineer, and Tao Wu, Principal Data Scientist Manager, at Microsoft.

Predicting customer churn is among the most sought-after machine learning and analytics applications for retail stores. It is especially valuable to companies eager to take advantage of the ever-increasing amounts of customer data they collect. Retaining an existing customer is estimated to cost about one-fifth as much as acquiring a new one, so businesses want to act proactively and predict who is likely to churn before it happens. Businesses also want to identify the factors associated with high churn rates, which helps them direct resources toward acquiring the right type of customers in the first place.

Microsoft has been active in the domain of churn prediction, having published several resources to help businesses understand the data science process behind customer churn prediction.

We are now pleased to announce the Retail Customer Churn Prediction Solution How-to Guide, available in the Cortana Intelligence Gallery and in a GitHub repository.

What's the Guide About?

The Guide includes a Solution Overview for Business Audiences and a Technical Deployment Guide that provides the steps needed to implement an end-to-end solution to predict customer churn rates, including data ingestion, data storage, data movement, machine learning / advanced analytics, model operationalization, model retraining, and visualization.

The business case in the Guide is framed around a single question: "What is the probability that a customer will churn soon?"

We say a customer has churned when that customer has spent no money at the store in the last 21 days. This definition can be customized along two dimensions: the number of days counted back from today and the amount of money spent. For example, some businesses might define a churned customer as someone who has made less than $10 in purchases over the last 30 days. The problem is framed as two-class classification, and a machine learning algorithm builds the predictive model by learning from simulated data based on the Tafeng dataset, which can be found in the resource folder of the GitHub repository. The data includes transaction-level information such as user-id, item-id, quantity, and value, as well as user-level information such as age and region.
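To make the churn definition concrete, here is a minimal sketch of how the two-class labels could be derived from transaction records. It is not the Guide's implementation: the function name `label_churn`, the `(user_id, purchase_date, value)` tuple schema, and the `window_days` / `min_spend` parameters are hypothetical, chosen only to mirror the two customization factors described above. With the defaults, a customer is labeled churned when they spent nothing in the 21 days ending on the reference date.

```python
from collections import defaultdict
from datetime import date, timedelta


def label_churn(transactions, as_of, window_days=21, min_spend=0.0):
    """Return {user_id: bool} churn labels as of the date `as_of`.

    Hypothetical schema: `transactions` is an iterable of
    (user_id, purchase_date, value) tuples.
    The observation window is the `window_days` days ending on `as_of`.
    - min_spend == 0: churned if the customer spent nothing in the window.
    - min_spend > 0: churned if window spend is strictly below min_spend.
    """
    window_start = as_of - timedelta(days=window_days)
    spend = defaultdict(float)
    customers = set()
    for user_id, day, value in transactions:
        if day > as_of:
            continue  # ignore transactions after the reference date
        customers.add(user_id)  # anyone seen before as_of is a known customer
        if day > window_start:
            spend[user_id] += value  # purchase falls inside the window
    labels = {}
    for user_id in customers:
        total = spend[user_id]
        labels[user_id] = total < min_spend if min_spend > 0 else total == 0
    return labels


tx = [
    ("u1", date(2024, 1, 1), 5.0),
    ("u1", date(2024, 1, 25), 3.0),
    ("u2", date(2024, 1, 2), 20.0),
    ("u3", date(2024, 1, 20), 12.0),
]
# Default rule: no spend in the last 21 days means churned.
print(label_churn(tx, date(2024, 1, 31)))
# Alternative rule: less than $10 spent in the last 30 days means churned.
print(label_churn(tx, date(2024, 1, 31), window_days=30, min_spend=10.0))
```

Under the default rule only `u2` is churned, because its single purchase falls outside the 21-day window. Under the $10 / 30-day rule, `u1` is churned instead, since it spent only $3 in that window. A classifier trained on such labels can then output the churn probability asked about in the business question.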

Who will Benefit from the Guide?

The Guide was developed with three distinct audiences in mind: business decision makers, data scientists, and engineers.

The Solution Overview for Business Audiences helps you understand the business implications of customer churn, providing a high-level view of how churn rate analytics can be streamlined with Cortana Intelligence.

Data scientists and engineers will benefit from the Technical Deployment Guide, which provides detailed instructions on how to stitch together on-premises and Azure services. The Technical Deployment Guide includes an Azure ML experiment that provides a starting point for data scientists to develop churn prediction models. Interested data scientists can also learn to generate powerful visualizations using Power BI.

To get started and learn more, check out the Guide in the Cortana Intelligence Gallery.

Data scientists looking for guidance on building models for customer churn can visit the Retail Customer Churn Prediction Template, which covers the steps needed to implement a customer churn model, including feature engineering, label creation, training and evaluation.

To create an on-premises version of this solution using SQL Server R Services, take a look at the Customer Churn Prediction Template with SQL Server R Services, which walks you through that process.

Put the Guide to use in the real world, and share your feedback and thoughts with us in the comments below.

Lixun, Daisy & Tao