Which Algorithm Family Can Answer My Question?

This post, authored by Brandon Rohrer, Senior Data Scientist at Microsoft, is the last in a three-part series introducing data science with no jargon. The first post was titled "What Can Data Science Do For Me?" and the second was "What Types of Questions Can Data Science Answer?"

A few data science questions come up again and again. They are listed here, each paired with the algorithm family that suits it best. If you don’t see yours or one like it, let us know and we’ll add it. Several of these questions have links to sample experiments or working examples in the Azure ML Marketplace.

● first choice ○ may also work well

Algorithm families: Two-class classification, Multi-class classification, Regression, Anomaly detection, Unsupervised learning, Reinforcement learning

Predictive Maintenance

Should I replace this part now?

When will this part fail?

Will this tire fail in the next thousand miles?

Is this unit behaving in an unusual way?

What is the remaining useful life of this aircraft engine?

Which vehicle needs servicing most urgently?

Is this pressure reading unusual?

Are these voltages normal for this season and time of day?

Is this internet message typical?

Is this network activity part of an attack?

Out of a thousand units, how many of this model of bearings will survive 10,000 hours of use?

How likely is this employee to be an insider security threat?

Which printer models fail the same way?

Which groups of sensors in this jet engine tend to vary with (and against) each other?

Marketing

Will my customer leave me for a competitor?

When will this customer make another purchase?

Will this customer renew their subscription?

Of all my customers, which 10% should receive an offer?

Should this customer receive a promotional offer?

Will this customer respond to this marketing campaign?

Which offer should this customer receive?

Is the customer amenable to upselling?

Will this customer tip for their taxi ride?

Does the $5 coupon or the 25% off coupon result in more return customers?

How likely is this person to make a purchase?

Is this review positive or negative?

Is the overall Twitter mood on my business positive or negative?

How many new followers will I get next week?

Which advertisement should be listed first for this reader?

Will this customer click on the top link?

Is this combination of purchases very different from what this customer has made in the past?

What will my fourth quarter sales be for the nation?

What fraction of pulls on this slot machine result in payout?

What other products is this customer likely to buy?

Which other customers have similar preferences to this one?

What is a natural way to divide this set of customers into groups?

How highly rated will this wine be?

Which viewers like the same kind of movies?

Where should I place this ad on the webpage so that the viewer is most likely to click it?

Finance

What will the price of this commodity be in thirty days?

What will the price of this stock be next week?

How many shares of this stock should I buy right now?

Will mortgage interest rates go up, down, or remain the same next week?

How many orders will there be from a region for this product next month?

Is this credit card charge fraudulent?

Does this applicant pose an acceptable credit risk?

Does this applicant pose an exceptionally high credit risk?

How likely is this customer to repay a car loan?

Will the U.S. enter a recession in the next year?

What are the most common patterns in gasoline price changes?

What is a natural way to break this set of companies up into groups?

How much are the stock prices in my portfolio likely to change in the next year?

Operational Efficiency

What will the demand for this item (or service) be next month?

How many bikes will be rented in the next hour?

How much beer will be consumed at this event?

What price should I set on this item?

Is this flight going to be on time?

What fraction of today’s flights will depart on time?

How many employees should be scheduled to work on Black Friday?

Is it time to order more of this product?

What management practices do successful CEOs have in common?

Energy Forecasting

How many kilowatts will be demanded from my wind farm 30 minutes from now?

During which days of the week does this electrical substation have similar electrical power demands?

Is this grid likely to face an overload situation in the next day?

What will the consumer demand be in this region over the next month?

Is the power usage in this grid unusual?

Internet of Things

Has this patient’s health suddenly taken a turn for the worse?

Is your heart rate within your typical training range?

What activity is the wearer of a fitness tracker engaged in?

Should the robot vacuum clean the living room or continue to charge?

Do I move this obstacle or navigate around it?

Which aircraft is causing this radar signature?

Who is the speaker in this recording?

What will the temperature be next Tuesday?

Will an earthquake occur in this city in the next year?

Should the thermostat adjust the temperature higher, lower, or leave it where it is?

Should I continue driving at the same speed, brake, or accelerate in response to that yellow light?

Text and Speech Processing

What is a natural way to break these documents into five topic groups?

Which topic category does this document belong to?

Which subject area does this news article belong to?

What groups of words tend to occur together in this set of documents? (What are the topics they cover?)

Which handwritten digit is this?

Which handwritten letter of the alphabet is this?

Which subject folder does this email belong to?

What is this person talking about?

What is the translation of this sentence from English into Chinese?

Is the person speaking into the phone authorized to use the phone?

Image Processing and Computer Vision

What are the 4 color groups that occur in this image?

Does this mammogram show breast cancer?

Is there a dog or a bench in this image?

What objects are in this image?

How many people are there in this photo?

What is the age and gender of the person in the photo?

What is this person doing in this video?

Does anyone behave suspiciously in this surveillance video?

In which direction should this robot move given what it sees?

After you choose the algorithm family that fits your question, the next step is to choose your algorithm and get to work. From here, it gets a bit more technical, but the final results are worth it. Visit How to Choose a Machine Learning Algorithm in Azure ML and the Machine Learning Algorithm Cheat Sheet for Azure ML to take the next step.
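If it helps to see how the wording of a question hints at its family, here is a rough Python sketch. The function name guess_family and every keyword rule in it are hypothetical illustrations chosen for this example; they are not taken from the table in this post, and plenty of real questions will not fit them.

```python
def guess_family(question):
    """Guess an algorithm family from the wording of a question.

    Every rule below is an illustrative assumption, not the post's
    actual table. Rules are checked in order; the first match wins.
    """
    q = question.strip().lower()

    # "Is this weird?" questions
    if any(word in q for word in ("unusual", "typical", "normal")):
        return "Anomaly detection"

    # "What should I do next?" questions that offer alternative actions
    if q.startswith(("should ", "do i ")) and " or " in q:
        return "Reinforcement learning"

    # "How much?" or "how many?" questions
    if any(p in q for p in ("how many", "how much", "what fraction", "when will")):
        return "Regression"

    # "How is this organized?" questions
    if any(p in q for p in ("natural way", "similar", "tend to")):
        return "Unsupervised learning"

    # Yes-or-no questions
    if q.startswith(("is ", "are ", "will ", "does ", "should ", "has ")):
        return "Two-class classification"

    # "Which one of several?" questions
    if q.startswith(("which ", "who ", "what ")):
        return "Multi-class classification"

    return None  # no rule matched


for example in (
    "Is this pressure reading unusual?",
    "How many bikes will be rented in the next hour?",
    "Will this customer renew their subscription?",
    "Which handwritten digit is this?",
):
    print(example, "->", guess_family(example))
```

A keyword match like this is only a starting point. The real choice depends on the data you have and on what kind of answer you need.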

Brandon
Follow me on Twitter or ping me on LinkedIn