Twenty Years of Machine Learning at Microsoft

This blog post is authored by John Platt, a Distinguished Scientist at Microsoft Research.

People may not realize it, but Microsoft has more than twenty years of experience in creating machine learning systems and applying them to real problems. That experience long predates the recent buzz around Big Data and Deep Learning, and it gives us a good perspective on a variety of technologies and on what it takes to actually deploy ML in production.

The story of ML at Microsoft began in 1992, with early work on Bayesian networks, language modeling, and speech recognition. By 1993, Eric Horvitz, David Heckerman, and Jack Breese had founded the Decision Theory Group in Microsoft Research, and XD Huang had started the Speech Recognition Group. In the 90s, we found that many problems, such as text categorization and email prioritization, could be solved with a combination of linear classifiers and Bayesian networks. That work produced the first content-based spam detector, along with a number of other prototypes and products.
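To give a flavor of that kind of work, here is a minimal sketch of a content-based spam filter: bag-of-words features fed to a simple probabilistic classifier. It uses scikit-learn and made-up toy messages purely for illustration; it is not the actual system built at Microsoft.

```python
# A toy content-based spam filter: bag-of-words counts + Naive Bayes.
# Illustrative sketch only; the messages and labels below are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now",               # spam
    "limited offer click here",           # spam
    "meeting agenda for tomorrow",        # not spam
    "please review the attached report",  # not spam
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# Turn each message into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Fit the classifier and score a new message.
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["free prize inside"])))  # expected: [1] (spam)
```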

As we were working on solving specific problems for Microsoft products, we also wanted to get our tools directly into the hands of our customers. Making usable tools requires more than just clever algorithms: we need to consider the end-to-end user experience. We added predictive analytics to the Commerce Server product to provide recommendation services to our customers. In 2005, we shipped SQL Server Data Mining, which allowed customers to build analytics on top of SQL Server.

As our algorithms became more sophisticated, we started solving tougher problems in fields related to ML, such as information retrieval, computer vision, and speech recognition. We blended the best ideas from ML and from these fields to make substantial forward progress. As I mentioned in my previous post, there are a number of such examples. Jamie Shotton, Antonio Criminisi, and others used decision forests to perform pixel-wise classification, both for human pose estimation and for medical imaging. Li Deng, Frank Seide, Dong Yu, and colleagues applied deep learning to speech recognition.

In addition to more sophisticated algorithms for existing problems, we have been exploring new frameworks for machine learning. The most common frameworks in ML are classification and regression, in which ML learns a mapping from a vector of data to either a label (classification) or a value (regression). But ML can do much more than produce labels or values. There’s a whole sub-field of ML called “structured output prediction”. An early example of this was “learning to rank”, where ML produces a ranked list of items (very useful for Bing, as I mentioned before). Another interesting framework is the construction of causal models, which we have used to model our advertising system. Yet another framework is generating programs directly from data, rather than going through a model.
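To make “learning to rank” concrete, here is a minimal sketch of the pairwise approach, in which ranking is reduced to classifying which of two documents should come first. It uses scikit-learn on synthetic data purely as an illustration; it is not the ranking system used in Bing.

```python
# Pairwise learning to rank, sketched on synthetic data:
# learn from preferences "document i should rank above document j",
# then sort new documents by the learned scoring function.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# 100 documents, each described by 4 relevance features, plus a hidden
# "true" relevance that we only observe through pairwise preferences.
X = rng.randn(100, 4)
true_scores = X @ np.array([2.0, 1.0, 0.0, -0.5])

# Build training pairs as feature differences, labeled by which doc wins.
pairs, labels = [], []
for _ in range(1000):
    i, j = rng.choice(len(X), size=2, replace=False)
    pairs.append(X[i] - X[j])
    labels.append(int(true_scores[i] > true_scores[j]))

model = LogisticRegression().fit(np.array(pairs), np.array(labels))

# The learned weights define a scoring function; sorting by it ranks documents.
scores = X @ model.coef_.ravel()
print("top-5 documents:", np.argsort(-scores)[:5])
```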

As ML researchers, we are super excited about Microsoft Azure ML. Azure ML will let customers build models that can be deployed as services in the cloud, rather than being restricted to one particular data management platform (such as SQL). Creating cloud services with ML should reduce the friction of getting ML into specific applications. As researchers, we would love to capture all of our experience and algorithms in the Azure ML product, so that our customers can use their creativity to build ML-based products.

In future blog posts, we will describe some of our current ML research topics. We can also go into more detail about some of the technologies mentioned above. If you find a particular research topic interesting, please let us know and we will try to get a guest blog post written by the creator of that technology. Thanks for reading!

John Platt
Learn more about my research. Follow me on Twitter.