The Advanced Analytics Solution Building Process – Business Problem Formulation

This is the third in a series of posts on business transformation through analytics – how organizations can run successful analytics pilots and build a thriving data science practice. The post is by Ilan Reiter, Principal Data Science Manager at Microsoft. 

We introduced the Advanced Analytics Solution Building Process (AA SBP) in a previous post, including its four phases, namely Business Problem Formulation, Data Acquisition, Data Science, and Business Integration.

The first phase, Business Problem Formulation, is the topic of today’s post. Before we dive deeper into it, let’s look at the overall workflow and where the Business Problem Formulation phase fits in. Here’s a picture of the entire AA SBP:

The goal of the Business Problem Formulation phase is to establish the business foundations of the advanced analytics process and to maximize the likelihood of completing it successfully. As Stephen R. Covey explains in his book, The 7 Habits of Highly Effective People, we need to begin with the end in mind. This phase is designed to discover opportunities to drive business value from an organization’s underlying data assets. It is typically owned by business or product drivers or the CIO, and is performed in close collaboration with the data science team. This phase is strongly coupled with the Business Integration and Data Science phases. Such coupling is key to ensuring that all teams are fully aligned and executing under the same set of business goals.

The phase consists of four steps:

  1. Opportunity Discovery
  2. Qualification
  3. Problem Formulation
  4. Consumption Scenarios

Let us look at these steps in more detail.

Opportunity Discovery

The purpose of this step is to discover opportunities to derive business value from data that is collected and owned by the organization. Most modern businesses are constantly collecting data from various sources as part of their everyday operations – data originating from customer transactions, internal business processes, sensors, and so forth. But businesses struggle to convert this data into knowledge or useful actions.

A critical starting point is to identify business problems that can potentially be solved by using data and advanced analytics. To facilitate the opportunity discovery process, we believe that organizations should adopt and practice a data-driven culture. That means they should:

  • Define clear metrics to measure business process outcomes.
  • Consistently collect data from all business operations.
  • Persistently drive the use of data to answer business questions.

Furthermore, business managers should ask questions that facilitate the use of data. For example:

  • How can we use data to increase revenues by 20%?
  • Can we predict which users are most likely to use our product?
  • Can we predict the probability that a given user will churn?
  • Can we use behavioral data to segment our users?
  • Can we use historical data to forecast future sales?
  • Can we use sensor data to predict the lifespan of a mechanical part?

In answering such questions, it becomes possible to identify situations where existing data could become quite useful.

A good outcome for this step is to identify and collect several potential use cases. You could even keep a use-case backlog to register potential ideas, and prioritize them in the subsequent Qualification step.

Qualification

The goal of this step is to qualify and prioritize potential use cases and business opportunities.

In a data-driven culture, use cases and ideas will constantly emerge. To ensure proper investment in these ideas, they need to be screened against clear criteria. The criteria applied should ensure that a proper implementation of a given use case is likely to succeed. In this context, success means the creation of the expected business value with a clear ROI (Return on Investment). Here are a few qualification criteria for use cases:

  • The business problem is predictive in nature. Are we looking for insight (i.e. answering the question of why something happened) or are we predicting an event or outcome? Predictive use cases are typically associated with greater business value because they allow action to be taken on a specific predicted outcome.
  • There is a clear path of action once a prediction has been made. For example, predicting that a user is likely to churn can trigger a specific action to help retain them.
  • The organization can set quantitative and qualitative goals to demonstrate a successful solution implementation, for example, reducing customer churn by 5%. Such goals should be derived from the organization’s business goals.
  • There is a clear integration scenario with the company’s business workflow. For example, a churn prediction solution could be integrated with an existing campaign management system that is used to retain and re-engage customers.
  • There is data with sufficient quality to support the use case. This should be qualified in conjunction with the data science team.
  • There is a clear way to make this data available to an analytics platform, be it cloud based or on-premises.
  • There is a clear way to establish an end-to-end data flow to facilitate the movement of data on an ongoing basis and operationalize the solution.
  • The desired ROI can be achieved, and within the desired timeframe.

Qualifying use cases using the above criteria can greatly improve their success rate and establish a good beachhead for future implementations.
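
As a rough illustration, the screening described above can be sketched as a checklist applied to each backlog entry. This is a minimal sketch, not a prescribed tool: the class name `UseCaseScreen`, its field names, and the two sample use cases are all hypothetical, and it assumes every criterion can be answered with a simple yes or no.

```python
from dataclasses import dataclass, fields


@dataclass
class UseCaseScreen:
    """One backlog entry with a yes/no answer for each qualification criterion."""
    name: str
    predictive: bool            # predicts an event or outcome, not only insight
    clear_action: bool          # a prediction triggers a specific action
    measurable_goals: bool      # goals such as "reduce churn by 5%" can be set
    integration_scenario: bool  # fits an existing business workflow
    data_quality: bool          # confirmed together with the data science team
    data_accessible: bool       # data can reach the analytics platform
    end_to_end_flow: bool       # an ongoing data flow can be established
    roi_achievable: bool        # desired ROI within the desired timeframe


def unmet_criteria(case: UseCaseScreen) -> list[str]:
    """Return the names of the criteria this use case does not satisfy."""
    return [f.name for f in fields(case)
            if f.name != "name" and not getattr(case, f.name)]


def qualified(backlog: list[UseCaseScreen]) -> list[str]:
    """Return the names of the use cases that satisfy every criterion."""
    return [case.name for case in backlog if not unmet_criteria(case)]


# Hypothetical backlog entries, for illustration only.
backlog = [
    UseCaseScreen("churn prediction", True, True, True, True, True, True, True, True),
    UseCaseScreen("sales dashboard", False, False, True, True, True, True, True, True),
]

for case in backlog:
    print(case.name, "->", unmet_criteria(case) or "qualified")
```

In practice some criteria, such as data quality or achievable ROI, are judgment calls rather than booleans, so a real screening would likely record notes or scores alongside each answer.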

Problem Formulation

Once we have discovered and qualified a promising use case, we should then scope and define the success criteria for an Advanced Analytics (AA) implementation project. At this point we are entering an important step in which we need to clearly communicate the specifics of the use case and its goals to the data scientists and the rest of the organization. In doing so, we increase the odds of a successful implementation and the delivery of business value. Another important goal of this step is to set the foundation for scoping the data problem (this will be discussed in a future post).

It is therefore essential that the problem be well formulated. An effective way of formulating a problem is to specify it as a set of qualitative and quantitative attributes. The following list shows some of these attributes, each filled in for an example churn use case:

  • Problem statement: High churn rate results in loss of revenue and high cost to the company.
  • Business goal: Reduce churn by 10% within 6 months.
  • Business metrics: Churn rate, measured as the percentage of existing customers who left the service in the latest period.
  • Predicting “likely to churn” customers and targeting them with a special campaign.
  • Proposed solution: Use customer attributes and behavioral data to predict the probability of churn.
  • Required data: Customer attributes and usage activity log.
  • Data sources: Customer profile data and website event log repository.
  • Data volume: Over 10 million customers every period.
  • Performance requirements: Precision > 80%; recall > 90%; execution time per batch < 1 hour.
  • Implementation resources: A data scientist and data engineers.
  • Timeline: 4 to 5 sprints of 2 weeks each.
  • Business stakeholders: Joe Smith (CIO); Mary Lu (Business Analyst).
  • Expected ROI: If the goal is met, ROI is expected within one year of use.

By completing this set of attributes, we can establish a framework for kicking off and tracking the progress of the project until its successful completion.
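
To make the example's business metric and performance requirements concrete, here is a minimal Python sketch of how they could be computed and checked. The function names and sample numbers are hypothetical. It assumes the "reduce churn by 10%" goal means a relative reduction in the churn rate, which the example does not specify, and it uses the standard definitions of precision and recall with "churned" as the positive class.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of existing customers who left the service during the period."""
    return customers_lost / customers_at_start


def relative_reduction(baseline_rate: float, current_rate: float) -> float:
    """Relative drop in churn rate (assumed reading of 'reduce churn by 10%')."""
    return (baseline_rate - current_rate) / baseline_rate


def precision_recall(actual: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Precision and recall, treating 'churned' (True) as the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def meets_requirements(precision: float, recall: float, batch_minutes: float,
                       min_precision: float = 0.80, min_recall: float = 0.90,
                       max_batch_minutes: float = 60.0) -> bool:
    """Check the example's thresholds: precision > 80%, recall > 90%, batch < 1 hour."""
    return (precision > min_precision
            and recall > min_recall
            and batch_minutes < max_batch_minutes)


# Made-up evaluation data: 10 customers who churned, 10 who stayed.
actual = [True] * 10 + [False] * 10
predicted = [True] * 10 + [True] * 2 + [False] * 8

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
print("requirements met:", meets_requirements(precision, recall, batch_minutes=45))
```

Agreeing on these definitions up front matters because the same model can pass or fail depending on how the positive class, the period, and the baseline are defined.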

Consumption Scenarios

In the final step of the Business Problem Formulation phase, we are interested in understanding how the newly created solution, once ready, will integrate with existing business processes. By doing this at an early stage of the project, we can better predict if and how the solution will work. Not only does this help in the qualification process, it also sets the stage for the business evaluation step (to be discussed in a future post) in which we test out these scenarios.

The list below includes some typical questions associated with consumption scenarios and how they would be answered for an example use case. In this example we are predicting customer churn with a view to reducing the churn rate and increasing retention:

  • When will it be used? Before running a new marketing campaign.
  • How often will it be used? On a daily basis, once a day per customer.
  • What is the input for the prediction? A list of existing customers, their attributes, and their activities over the last week.
  • What is the source of the input? Customer ID list and website activity log.
  • What is the expected output from the prediction? Probability of churn, on a per-customer basis.
  • Where/how will the output be used? To determine whether a customer should be included in the upcoming campaign.
  • How will the output be integrated in the existing business flow? By integrating it into the campaign management system.
  • How will business value be created? By targeting only users who are likely to churn and preventing them from churning, and by avoiding extra costs by not targeting customers who are likely to stay.

We would like to ensure that we can clearly answer each of these questions. If we can, then we may have a winner. Otherwise, we may want to move on to the next idea in our backlog.
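
The churn example's consumption scenario, where a per-customer churn probability decides who is included in the upcoming campaign, could look roughly like the sketch below. The function name, the customer IDs, the scores, and the 0.5 cut-off are all hypothetical. In practice the threshold would be chosen from the business goals and the cost of the campaign.

```python
def select_for_campaign(churn_probabilities: dict[str, float],
                        threshold: float = 0.5) -> list[str]:
    """Return customer IDs at or above the threshold, highest churn risk first."""
    at_risk = [cid for cid, p in churn_probabilities.items() if p >= threshold]
    return sorted(at_risk, key=lambda cid: churn_probabilities[cid], reverse=True)


# Made-up daily scores keyed by customer ID (the prediction output).
daily_scores = {"c1": 0.91, "c2": 0.12, "c3": 0.67}

# The resulting list is what would be handed to the campaign management system.
campaign_list = select_for_campaign(daily_scores)
print(campaign_list)
```

Customers below the threshold are left out, which is where the cost saving in the example comes from.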

In the next post in this series, we will look into the Data Science phase and how it can accomplish the business goals that have now been set.