RFM: A Simple and Powerful Approach to Event Modeling

This post is by Gal Oshri, a Program Manager in the Data Group at Microsoft.

Introduction 

RFM is a simple and intuitive technique for segmenting customers and has been used by marketers for decades. Despite its simplicity, RFM also has surprising value in machine learning applications. This blog post describes how a generic technique has allowed us to come within 1% of the winning solutions' scores in various ML competitions, such as placing in the top 30 entries of the KDD Cup 2015 and getting a boost of 502 positions on the leaderboard of an Airbnb Kaggle competition.

RFM With KDD Cup 2015 Data 

Intro to RFM and the Data-Set 

RFM has been widely used in direct marketing and database marketing for identifying the customers who are most likely to respond or make a purchase [1]. It stands for:

  • Recency: When was the last time the customer made a purchase?
  • Frequency: How many purchases has the customer made?
  • Monetary value: How much revenue was made from the customer?

The exact definition of RFM depends on the scenario, but the intuition is consistent: customers who bought something last week are more likely to buy again than customers who haven’t bought anything in a year. Similarly, customers who buy many items and who spend a lot of money are also more valuable. Note that, in many scenarios, the M might represent some form of value other than revenue. For example, it could be the total time spent using a product.

To keep things concrete, we will use the KDD Cup 2015 data-set as our primary example. This data-set consists of student logs in online university courses. The learning task is to predict which students will drop out of courses. The data-set is of the following form:

Gal-1

We have a set of logs based on things the user did, like working on a particular problem or watching a video. Each event has a timestamp, a course ID (cid), a student ID (uid), and an enrollment ID (eid) which is unique for every course-student pair. Our goal is to predict which enrollment IDs will not be seen in a future time period.

ML With Raw RFM Values 

The simplest application of RFM to this problem is on the enrollment IDs. Intuitively, a student is more likely to drop out if:

  1. The student hasn’t attended class recently. This represents recency and can be expressed by the enrollment ID’s last timestamp.
  2. The student hasn’t solved many problems or watched many videos. This represents frequency and can be viewed as the number of events for the enrollment ID.
  3. The student hasn’t spent many hours in the course. This is one interpretation of (non-monetary) value in this scenario and can be calculated as the number of unique hours on which the enrollment ID has had an event.

These features can be easily calculated with database operations like MAX, COUNT, and SUM. They also have a variety of other nice properties such as online computation that we will discuss later in this post.
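As a rough illustration of the three aggregations above, here is a minimal sketch in plain Python. The event layout (pairs of enrollment ID and timestamp) and the name `raw_rfm` are our own assumptions for this example, not the actual KDD Cup schema or the code behind the experiments in this post.

```python
from collections import defaultdict
from datetime import datetime

def raw_rfm(events):
    """events: iterable of (eid, timestamp) pairs (hypothetical layout).

    Returns {eid: (last_timestamp, event_count, active_hours)}.
    """
    last_seen = {}
    counts = defaultdict(int)
    active_hours = defaultdict(set)
    for eid, ts in events:
        # Recency: equivalent to MAX(timestamp) per enrollment ID.
        if eid not in last_seen or ts > last_seen[eid]:
            last_seen[eid] = ts
        # Frequency: equivalent to COUNT(*) per enrollment ID.
        counts[eid] += 1
        # Value: number of distinct clock hours that contain an event.
        active_hours[eid].add(ts.replace(minute=0, second=0, microsecond=0))
    return {eid: (last_seen[eid], counts[eid], len(active_hours[eid]))
            for eid in counts}

# Made-up events for two enrollment IDs.
events = [
    (1, datetime(2014, 6, 1, 9, 5)),
    (1, datetime(2014, 6, 1, 9, 40)),
    (1, datetime(2014, 6, 3, 14, 0)),
    (2, datetime(2014, 6, 2, 20, 15)),
]
rfm = raw_rfm(events)
```

Enrollment 1 has three events that fall into two distinct hours, so its frequency is 3 and its value is 2.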

After aggregating RFM values for each enrollment ID, we can add the known churn labels (training data).  The data-set now looks like this:

Gal-2

This data-set is now in a format that is suitable for training a model that predicts the churn label based on the RFM features. Using boosted decision trees as our learner, we build the following experiment in Azure ML:

Gal-3

The experiment does 10-fold cross validation on our data using a two-class boosted decision tree with “minimum number of samples per leaf node” set to 50. This experiment results in an AUC of 0.866. For comparison, the winning entry to the KDD Cup had an AUC of 0.909. We’re not doing badly for just 3 extremely simple features!
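For readers less familiar with the evaluation, the sketch below shows how folds can be formed and how AUC can be computed from a model's scores. It is not the Azure ML experiment itself: the helper names are hypothetical, the fold assignment is a simple round-robin (a real run would usually shuffle first), and the learner is left out entirely.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, test_idx); row i is held out in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, test

def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    labels are 1 (churned) or 0 (stayed); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and scores: 4 of the 6 positive/negative pairs are ordered correctly.
example_auc = auc([1, 1, 0, 0, 1], [0.9, 0.6, 0.7, 0.2, 0.4])
folds = list(k_fold_indices(25, k=10))
```

The pairwise loop is quadratic and only meant to make the definition visible; a rank-based formula gives the same number on large data.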

Adding RFM Bucket Values 

A simple extension of what we have done so far is to partition the users into five buckets for recency, frequency, and monetary value. We will show that it adds value to the ML solution but it also provides human readable segmentation information. Bucket 1 has the lowest values (e.g. haven’t visited recently) and bucket 5 has the highest values (e.g. visited very recently). A user might be in bucket 5 for recency, bucket 3 for frequency, and bucket 4 for monetary value. This user can be described as having RFM values 534. This makes RFM 555 users the most valuable and RFM 111 users the least valuable.
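To make the three-digit code concrete, here is a small sketch that assigns equal-population buckets per dimension and concatenates them. The function names and the toy values are hypothetical; recency is assumed to be encoded so that larger means more recent.

```python
from bisect import bisect_right

def quintile_bounds(values, n_buckets=5):
    """Split points such that each bucket holds roughly the same number of users."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_buckets] for i in range(1, n_buckets)]

def to_bucket(value, bounds):
    """Bucket 1 is the lowest range, bucket len(bounds) + 1 the highest."""
    return bisect_right(bounds, value) + 1

def rfm_codes(raw):
    """raw: {user: (recency, frequency, value)}. Returns {user: code such as '534'}."""
    bounds = [quintile_bounds([v[d] for v in raw.values()]) for d in range(3)]
    return {user: "".join(str(to_bucket(v[d], bounds[d])) for d in range(3))
            for user, v in raw.items()}

# Ten made-up users; each dimension takes the values 0..9 exactly once.
recency = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
frequency = [0, 1, 2, 3, 5, 6, 7, 8, 9, 4]
value = [0, 1, 2, 3, 4, 5, 6, 8, 9, 7]
raw = {f"u{i}": (recency[i], frequency[i], value[i]) for i in range(10)}
codes = rfm_codes(raw)
```

In this toy data, user `u9` is the most recent visitor but has middling frequency, which yields the code 534 described above.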

There are various approaches to partitioning the users into buckets. The simplest approach involves sorting the users independently for recency, frequency, and monetary value, and then choosing bucket split points so that each bucket has 20% of the population. This is an unweighted approach and is identical to partitioning the users based on quintiles. A weighted approach chooses the split points so that each bucket has 20% of the total value. For example, if there is $10M in revenue and the leading customer generated $2M, bucket 5 for monetary value consists only of that customer.

An alternative to the independent approach is the tree-based approach, where recency has 5 buckets, frequency has 5 buckets per recency bucket, and monetary value has 5 buckets per frequency bucket. This means that, as an example, a frequency value of 4 can have a different meaning when recency is 4 instead of 5. This distinction is useful in various marketing scenarios as it helps, for example, to differentiate between users in the same recency bucket. A comparison between the independent and tree-based approaches is shown in the figure below.

Gal-4
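The weighted and tree-based variants can be sketched as follows. This is only one possible reading of the description above, with hypothetical function names and made-up numbers; in particular, the tree-based version here uses equal-population splits at every level.

```python
from bisect import bisect_right
from collections import defaultdict

def bucket_of(value, bounds):
    return bisect_right(bounds, value) + 1

def count_bounds(values, n_buckets=5):
    """Unweighted: each bucket gets roughly the same number of users."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_buckets] for i in range(1, n_buckets)]

def weighted_bounds(values, n_buckets=5):
    """Weighted: each bucket gets roughly the same share of the total value."""
    ordered = sorted(values)
    total = sum(ordered)
    bounds, running, target = [], 0.0, 1
    for v in ordered:
        # Open the next bucket at v once the value accumulated below v
        # reaches the next 1/n_buckets share of the total.
        while target < n_buckets and running >= total * target / n_buckets:
            bounds.append(v)
            target += 1
        running += v
    return bounds

def tree_buckets(raw, n_buckets=5):
    """raw: {user: (r, f, m)}. Frequency is bucketed within each recency
    bucket, and value within each (recency, frequency) bucket."""
    def split(users, dim):
        bounds = count_bounds([raw[u][dim] for u in users], n_buckets)
        groups = defaultdict(list)
        for u in users:
            groups[bucket_of(raw[u][dim], bounds)].append(u)
        return groups

    result = {}
    for rb, r_users in split(list(raw), 0).items():
        for fb, f_users in split(r_users, 1).items():
            for mb, m_users in split(f_users, 2).items():
                for u in m_users:
                    result[u] = (rb, fb, mb)
    return result

# Revenue in $K: $10M in total, with one customer contributing $2M.
revenue = [200, 300, 500, 600, 700, 800, 900, 1000, 1400, 1600, 2000]
w_bounds = weighted_bounds(revenue)
top_bucket = [v for v in revenue if bucket_of(v, w_bounds) == 5]

# 125 made-up users whose frequency is perfectly correlated with recency.
raw = {f"u{i}": (i, i, i % 5) for i in range(125)}
tree = tree_buckets(raw)
independent_f = bucket_of(raw["u24"][1], count_bounds([v[1] for v in raw.values()]))
```

With these numbers, the weighted split leaves only the $2M customer in bucket 5. User `u24` lands in frequency bucket 1 under the independent approach but in frequency bucket 5 under the tree-based one, because it is compared only against users in its own recency bucket.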

For simplicity, we will focus on the independent unweighted approach to bucketize the users but include the results for the other techniques. After bucketizing the users, we can calculate the churn rate per bucket and visualize it as follows:

Gal-5

Figure created using Matplotlib

The color scale goes from red (most likely to churn) to blue (least likely to churn). The size of the points indicates how many users had the specified RFM value. The shade of the points indicates depth in the visualization. This figure highlights the transition from likely to churn to unlikely to churn as we move from 111 to 555. We briefly note that when we use the weighted tree-based approach to bucketization, the 111 and 555 groups are stronger indicators of churn in this scenario. However, we will see below that these alternative buckets do not add value for the ML model. A detailed exploration of RFM visualization can be found in [2].
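The numbers behind such a plot are simple to compute: for every RFM code we need the number of users (point size) and the fraction that churned (point color). Here is a minimal sketch with made-up data; the function name and input layout are assumptions for illustration.

```python
from collections import defaultdict

def churn_rate_by_cell(codes, churned):
    """codes: {eid: RFM code such as '534'}, churned: {eid: bool}.

    Returns {code: (number_of_users, churn_rate)}.
    """
    totals = defaultdict(int)
    drops = defaultdict(int)
    for eid, code in codes.items():
        totals[code] += 1
        drops[code] += int(churned[eid])
    return {code: (totals[code], drops[code] / totals[code]) for code in totals}

# Made-up enrollments, not real KDD Cup data.
codes = {1: "111", 2: "111", 3: "555", 4: "555", 5: "555", 6: "534"}
churned = {1: True, 2: True, 3: False, 4: False, 5: True, 6: False}
cells = churn_rate_by_cell(codes, churned)
```

Each entry of `cells` maps directly onto one point of the 3D scatter plot: the three digits give the coordinates, the count gives the size, and the churn rate gives the color.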

Going back to our churn model, we can use these bucket values as features to see if they provide additional value:

Gal-9

Running the experiment again gives us an AUC of 0.877, another decent improvement that brings us closer to the competition’s winner. It’s worth noting that we get an AUC of 0.868 if we remove the actual RFM values and only use the bucketized values. This shows that the RFM buckets are very meaningful. We can vary the number of buckets to 255 or use percentiles (100 buckets) to slightly improve the score, but our results highlight that even using the simple and intuitive 5 buckets provides good AUC. If we add the 5-point buckets using the weighted and tree-based approaches, there is negligible additional value.

Extending RFM Beyond Enrollment IDs 

So far, we have calculated RFM values for every enrollment ID. However, we can also calculate RFM values for other entities, such as the course and the student, and add them to the appropriate rows (based on the enrollment ID). One way to imagine this is that students might be less likely to drop a very popular course in which many students have participated (i.e. a course with a high Frequency value). We can further featurize our data by calculating RFM for every type of event. For example, we can calculate the RFM for each enrollment ID’s video watching events. We can also subtract one feature from another to create comparison features. For example, we can compare a student’s last visit to a course with the time the course ended. We can add many features and allow the learning algorithm to decide which ones are important.
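A sketch of this kind of featurization is shown below. It is not the featurizer from the gallery experiment: the event fields, the `course_end` lookup, the feature names, and the use of integer hour indices as timestamps are all assumptions made to keep the example short.

```python
from collections import defaultdict

def rfm_by(events, key):
    """Raw RFM per group. events: dicts with 'eid', 'uid', 'cid', 'event', 'ts'
    (ts is an integer hour index); key: function mapping an event to its group."""
    last, count, hours = {}, defaultdict(int), defaultdict(set)
    for e in events:
        k = key(e)
        last[k] = max(last.get(k, e["ts"]), e["ts"])
        count[k] += 1
        hours[k].add(e["ts"])
    return {k: (last[k], count[k], len(hours[k])) for k in count}

def featurize(events, course_end):
    """Build one feature row per enrollment ID from several RFM groupings."""
    per_eid = rfm_by(events, lambda e: e["eid"])
    per_cid = rfm_by(events, lambda e: e["cid"])
    per_uid = rfm_by(events, lambda e: e["uid"])
    # RFM restricted to one event type.
    videos = [e for e in events if e["event"] == "video"]
    per_eid_video = rfm_by(videos, lambda e: e["eid"])

    ids = {e["eid"]: (e["uid"], e["cid"]) for e in events}
    rows = {}
    for eid, (uid, cid) in ids.items():
        r, f, m = per_eid[eid]
        rows[eid] = {
            "eid_r": r, "eid_f": f, "eid_m": m,
            "course_f": per_cid[cid][1],    # how busy the course is
            "student_f": per_uid[uid][1],   # how active the student is overall
            "video_f": per_eid_video.get(eid, (None, 0, 0))[1],
            # Comparison feature: gap between the last visit and the course end.
            "hours_before_course_end": course_end[cid] - r,
        }
    return rows

# Made-up events and course end times.
events = [
    {"eid": 1, "uid": "s1", "cid": "c1", "event": "video", "ts": 10},
    {"eid": 1, "uid": "s1", "cid": "c1", "event": "problem", "ts": 12},
    {"eid": 2, "uid": "s2", "cid": "c1", "event": "video", "ts": 11},
    {"eid": 3, "uid": "s1", "cid": "c2", "event": "problem", "ts": 30},
]
course_end = {"c1": 100, "c2": 120}
rows = featurize(events, course_end)
```

The same `rfm_by` helper serves every grouping, which is what makes it cheap to generate many candidate features and leave the selection to the learner.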

Following the steps outlined above, we can increase our AUC to 0.901. We are less than 1% away from the winning solution. An experiment showing our solution can be found in the Cortana Intelligence Gallery here. Let’s quickly summarize the various results we’ve seen:

Features | AUC
Enrollment ID RFM raw values | 0.866
Enrollment ID RFM 5-point buckets | 0.868
Enrollment ID RFM 5-point buckets + RFM raw values | 0.877
Enrollment ID RFM 5-point buckets + RFM raw values + 5-point tree-based buckets + 5-point weighted tree-based buckets | 0.877
Full RFM featurizer in the Cortana Intelligence Gallery | 0.901
Competition winner | 0.909

 

Additional Properties of RFM Features 

We mentioned that RFM features can be calculated using basic database operations and showed how these features are valuable for ML models. However, RFM features have a variety of other properties that make them easy and effective to use:

  • The raw values can be updated in an online manner when a new event arrives (e.g. update latest timestamp or increment a count).
  • This means we can also update the RFM buckets online by looking at the current bucket boundaries using a streaming solution such as Azure Stream Analytics.
  • These bucket boundaries can be computed using a sample of the data since they are not sensitive to the tails of the distribution. This means our periodic updates are relatively cheap. Alternatively, we can calculate the bucket values when a user’s RFM bucket values are needed.
  • RFM values are robust to a shift in data distribution if the bucket boundaries are recomputed periodically. If there is a sudden increase in the time students spend doing coursework in the KDD data, the bucket boundaries will reflect that when we recalculate them.
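The properties above can be illustrated with a small in-memory sketch. It stands in for a streaming system and is not how Azure Stream Analytics works internally; the class name, the sampling scheme, and the use of timestamps in seconds are assumptions.

```python
import random
from bisect import bisect_right

class OnlineRFM:
    """Keeps raw RFM per key up to date, with periodically refreshed buckets."""

    def __init__(self, n_buckets=5):
        self.n_buckets = n_buckets
        self.state = {}              # key -> [last_ts, event_count, active hours]
        self.bounds = [[], [], []]   # split points for R, F and M

    def update(self, key, ts):
        """Apply one new event (ts in seconds) without rescanning old events."""
        entry = self.state.setdefault(key, [ts, 0, set()])
        entry[0] = max(entry[0], ts)   # update latest timestamp
        entry[1] += 1                  # increment a count
        entry[2].add(ts // 3600)       # remember the active hour

    def raw(self, key):
        last_ts, count, hours = self.state[key]
        return (last_ts, count, len(hours))

    def recompute_bounds(self, sample_size=1000, seed=0):
        """Periodic job: estimate equal-population split points from a sample."""
        keys = list(self.state)
        rng = random.Random(seed)
        sample = keys if len(keys) <= sample_size else rng.sample(keys, sample_size)
        for d in range(3):
            ordered = sorted(self.raw(k)[d] for k in sample)
            n = len(ordered)
            self.bounds[d] = [ordered[(i * n) // self.n_buckets]
                              for i in range(1, self.n_buckets)]

    def buckets(self, key):
        """Look up buckets on demand against the current boundaries."""
        return tuple(bisect_right(self.bounds[d], v) + 1
                     for d, v in enumerate(self.raw(key)))

# Made-up stream: key i has i + 1 events, one per hour starting at hour 0.
tracker = OnlineRFM()
for i in range(10):
    for hour in range(i + 1):
        tracker.update(i, 3600 * hour)
tracker.recompute_bounds()
before = tracker.buckets(0)

# A late event for key 0 moves its recency bucket without recomputing boundaries.
tracker.update(0, 3600 * 20)
after = tracker.buckets(0)
```

Here `before` is the least active cell and `after` keeps the same frequency and value buckets while recency jumps to the top bucket. The set of active hours grows with usage; a production version might keep only the last seen hour and a counter instead.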

RFM in Other Scenarios 

RFM features are not only helpful in churn prediction problems. In a recent Kaggle competition to predict in which country a new Airbnb user will make their first booking, the RFM featurizer was used with minimal configuration changes to get an NDCG@5 score of 0.883. For comparison, the winning entry had a score of 0.886. In the competition, there was a user data-set with information about the users (available from 2010 to 2014) and a sessions data-set of what users did during their sessions (2014 only). A single Fast Tree trained on the user data-set got a score of 0.870. Using the RFM featurizer on the sessions data (and corresponding user data) achieved a score of 0.882. By taking a simple weighted average of these two models, the score increased to 0.883. Note that this work was done after the competition ended.
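To show what blending and scoring look like in this setting, here is a hedged sketch. It assumes each model outputs a probability per country, that exactly one country is correct per user (so the ideal DCG is 1), and that the blend weight is a free parameter; the weight and the probabilities below are made up and are not the ones used in our experiments.

```python
import math

def blend(p_a, p_b, w=0.5):
    """Weighted average of two models' per-country probabilities."""
    return {c: w * p_a.get(c, 0.0) + (1 - w) * p_b.get(c, 0.0)
            for c in set(p_a) | set(p_b)}

def ndcg_at_5(probs, true_country):
    """NDCG@5 when a single country is relevant: 1 / log2(rank + 1)."""
    ranked = sorted(probs, key=lambda c: (-probs[c], c))[:5]
    if true_country not in ranked:
        return 0.0
    return 1.0 / math.log2(ranked.index(true_country) + 2)

# Made-up predictions for one user from two hypothetical models.
p_users = {"US": 0.3, "FR": 0.1, "IT": 0.6}
p_sessions = {"US": 0.5, "FR": 0.2, "IT": 0.3}
combined = blend(p_users, p_sessions, w=0.5)
score = ndcg_at_5(combined, "US")
```

In this example the blend ranks the true country second, which scores 1 / log2(3), or about 0.63. The competition score is the mean of this value over all users.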

RFM features have shown value in a variety of other ML competitions and customer scenarios. In a recent customer engagement, we found that with a small number of RFM features we can do a good job of predicting which users will take a particular action on a website. One lesson from this project is that being able to calculate the RFM features in near real-time is important. A user might not come back to the website, so we want to be able to make decisions quickly. Our solution enabled calculating and using the RFM features in an online manner using Azure Event Hub, Azure Stream Analytics, and Azure ML.

Conclusion 

There is a vast range of scenarios where RFM features can have a positive impact. We are exploring a general solution that will enable you to send user logs and automatically featurize them in near real-time so you can use RFM features for a variety of problems.

Gal

 

References

  1. Fader, Peter S., Bruce G. S. Hardie, and Ka Lok Lee. “RFM and CLV: Using Iso-Value Curves for Customer Base Analysis.” Journal of Marketing Research 42.4 (2005): 415-430.
  2. Kohavi, Ron, and Rajesh Parekh. “Visualizing RFM Segmentation.” SDM. 2004.