In-database Machine Learning in SQL Server 2017

This post is authored by Sumit Kumar, Senior Program Manager, Microsoft and Nellie Gustafsson, Program Manager, Microsoft

We are excited to announce the general availability of SQL Server 2017 and Machine Learning Services. You can start using Python-based in-database Machine Learning Services in production now. With support for both R and Python, we have rebranded ‘R Services’ to ‘Machine Learning Services’. SQL Server now supports the two most popular data science languages and enables you to use the latest AI and ML packages from the open source world in-database, across all editions on Windows, making SQL Server 2017 a commercial database with built-in AI.

As we have covered in previous posts, there are many advantages to using this technology, such as the elimination of data movement, ease of deployment, improved security, and better scale and performance. These abilities make SQL Server a powerful enterprise platform for machine learning. Examples of what some customers have built using Machine Learning Services:

In-database Python integration

With full support for in-database Python in SQL Server 2017 Machine Learning Services, the vast population of Python developers and ML practitioners can now leverage the power of SQL Server alongside their Python code. SQL Server developers, in turn, gain access to the extensive Python ML and AI libraries from the open source ecosystem, along with the latest innovations from Microsoft (the revoscalepy and microsoftml libraries), for developing intelligent applications with in-database analytics.

Python operationalization with T-SQL

Full Python integration with the sp_execute_external_script infrastructure in SQL Server enables the enterprise-grade operationalization of Python models and scripts as simple stored procedures.

Streaming data from SQL to Python processes and MPI ring parallelization support provide much-improved performance for Python scripts.
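As a rough illustration of the stored-procedure pattern described above, the Python sketch below assembles the T-SQL for a procedure that wraps sp_execute_external_script. The procedure name (dbo.usp_trip_stats), table (dbo.trips), and column names are hypothetical placeholders, and the helper build_procedure is invented for this example. InputDataSet and OutputDataSet are the default data frame names that sp_execute_external_script exposes to the script.

```python
# Minimal sketch: build the T-SQL for a stored procedure that runs a Python
# script in-database. dbo.usp_trip_stats and dbo.trips are hypothetical
# placeholders, not objects from this post.

# Python that SQL Server would run. InputDataSet / OutputDataSet are the
# default pandas data frame names used by sp_execute_external_script.
PYTHON_SCRIPT = """
OutputDataSet = InputDataSet.describe().reset_index()
"""


def build_procedure(proc_name: str, input_query: str, script: str) -> str:
    """Return CREATE PROCEDURE text wrapping sp_execute_external_script."""
    # Single quotes must be doubled inside T-SQL string literals.
    quoted_script = script.replace("'", "''")
    quoted_query = input_query.replace("'", "''")
    return (
        f"CREATE PROCEDURE {proc_name}\n"
        "AS\n"
        "BEGIN\n"
        "    EXEC sp_execute_external_script\n"
        "        @language = N'Python',\n"
        f"        @script = N'{quoted_script}',\n"
        f"        @input_data_1 = N'{quoted_query}';\n"
        "END;"
    )


procedure_sql = build_procedure(
    "dbo.usp_trip_stats",
    "SELECT fare_amount, tip_amount FROM dbo.trips",
    PYTHON_SCRIPT,
)
print(procedure_sql)
```

Once a procedure like this exists, an application would call it like any other stored procedure, without needing to know that Python runs underneath.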

Python remote compute in SQL Server

With the SQL Server remote compute context, data scientists and developers can push the compute of Python code to the server from their development environments, to explore data and develop models without the need to move data.
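A minimal sketch of what this could look like from a development machine follows. It assumes the revoscalepy client libraries are installed locally, and the server, database, table, and column names are all hypothetical. The helpers make_connection_string and fit_in_sql_server are invented for this example. The revoscalepy import is deferred into the function so the rest of the snippet loads even where the package is absent.

```python
# Sketch only: push model fitting to SQL Server instead of pulling data out.
# Server, database, table, and column names below are hypothetical.


def make_connection_string(server: str, database: str) -> str:
    """Build an ODBC-style connection string using Windows authentication."""
    return (
        f"Driver=SQL Server;Server={server};Database={database};"
        "Trusted_Connection=True;"
    )


def fit_in_sql_server(connection_string: str, query: str, formula: str):
    """Fit a linear model with the computation running on the server."""
    # Imported lazily: revoscalepy is assumed to be installed with the
    # Machine Learning Services / Machine Learning Server Python libraries.
    from revoscalepy import (
        RxInSqlServer,
        RxSqlServerData,
        rx_lin_mod,
        rx_set_compute_context,
    )

    # Switch the compute context so rx_* functions execute in SQL Server.
    rx_set_compute_context(RxInSqlServer(connection_string=connection_string))
    data = RxSqlServerData(sql_query=query, connection_string=connection_string)
    return rx_lin_mod(formula=formula, data=data)


conn = make_connection_string("myserver", "mydb")
print(conn)
# Example call (requires a reachable SQL Server with Machine Learning Services):
# model = fit_in_sql_server(
#     conn,
#     "SELECT tip_amount, fare_amount FROM dbo.trips",
#     "tip_amount ~ fare_amount",
# )
```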

In-database Python integration is not limited to machine learning and AI solutions. It is equally useful for general-purpose data analysis work that combines Python and SQL and leverages the strengths of each language.

Orders of magnitude faster scoring

After demonstrating industry-leading batch scoring performance of over 1 million rows/sec, we are now introducing Native Scoring for even faster prediction! Some concurrent prediction scenarios require close to real-time response times. Native scoring is exposed through the new T-SQL PREDICT function (a system table-valued function), which supports models trained using the RevoScaleR and revoscalepy packages and makes it easy to embed this performant scoring functionality in regular T-SQL SELECT statements without invoking the R or Python runtime. Native scoring is also available in SQL Server on Linux.
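To make the shape of a native scoring query concrete, here is a small Python sketch that composes a SELECT statement around PREDICT. It assumes a serialized model has already been stored in a hypothetical dbo.models table with name and model columns. The data table (dbo.new_trips) and the output column name are also assumptions, since the actual output columns depend on the model being scored.

```python
# Sketch: compose a T-SQL query that scores rows with the PREDICT function.
# dbo.models, dbo.new_trips, and the score column name are hypothetical.


def build_native_scoring_query(
    model_name: str, data_table: str, score_column: str
) -> str:
    """Return T-SQL that loads a stored model and scores a table with it."""
    # Double single quotes so the model name is a valid T-SQL literal.
    safe_name = model_name.replace("'", "''")
    return (
        "DECLARE @model varbinary(max) = (\n"
        f"    SELECT model FROM dbo.models WHERE name = N'{safe_name}'\n"
        ");\n"
        f"SELECT d.*, p.{score_column}\n"
        f"FROM PREDICT(MODEL = @model, DATA = {data_table} AS d)\n"
        f"WITH ({score_column} float) AS p;"
    )


scoring_sql = build_native_scoring_query("tip_model", "dbo.new_trips", "Score")
print(scoring_sql)
```

Because the whole statement is plain T-SQL, it can be embedded in views, stored procedures, or application queries like any other SELECT.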

Real-time scoring is also available to SQL Server 2016 customers who upgrade in-database R to the latest release of Microsoft Machine Learning Server.

Improving R package management in SQL Server

One of the key values of R is its vibrant community with thousands of open source packages. We have further improved R package management in SQL Server. A rich set of R functions for package management gives users the ability to install, uninstall and manage packages in various roles and scopes. In addition, it is now possible to install R packages on SQL Server using T-SQL commands (CREATE EXTERNAL LIBRARY). This approach ensures availability of the previously installed packages when a server fails over.
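As an illustration of the T-SQL route, the sketch below builds a CREATE EXTERNAL LIBRARY statement from Python. The library name and the zip file path are hypothetical, and the helper build_external_library_sql is invented for this example. The statement form with CONTENT and LANGUAGE = 'R' follows the documented syntax as we understand it.

```python
# Sketch: generate the T-SQL that registers an R package with SQL Server.
# The library name and file path are hypothetical placeholders.


def build_external_library_sql(library_name: str, zip_path: str) -> str:
    """Return a CREATE EXTERNAL LIBRARY statement for an R package zip."""
    # Double single quotes so the path is a valid T-SQL string literal.
    safe_path = zip_path.replace("'", "''")
    return (
        f"CREATE EXTERNAL LIBRARY [{library_name}]\n"
        f"FROM (CONTENT = '{safe_path}')\n"
        "WITH (LANGUAGE = 'R');"
    )


library_sql = build_external_library_sql(
    "customPackage", r"C:\packages\customPackage.zip"
)
print(library_sql)
```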

Machine Learning Server

Along with the general availability of SQL Server 2017, we have also announced the general availability of the new Microsoft Machine Learning Server! This is the underlying software that is integrated into SQL Server as Machine Learning Services. Machine Learning Server is the transformation of Microsoft R Server into an even more flexible platform that offers a choice of R and Python languages and brings the best of algorithmic innovations from the open source world and Microsoft. Its multi-platform support enables customers to build portable models wherever their data is and operationalize the models on platforms like SQL Server, making the intelligence easily consumable by business applications.

Key new algorithmic innovations of Machine Learning Server benefitting the SQL Server scenarios are:

revoscalepy

This package has the Pythonic version of Microsoft’s proprietary Parallel External Memory Algorithms (APIs for linear and logistic regressions, decision tree, boosted tree and random forest) and a rich set of APIs for ETL, remote compute contexts and data sources. These are the same scalable and parallelized algorithms (with ‘rx’ prefix) that have been the differentiating value proposition of Microsoft R Server and allow scaling analytics to arbitrarily large datasets, way beyond the available memory.

microsoftml

This package is a set of state-of-the-art, battle-tested ML algorithms and transforms with Python bindings, including deep neural net, one-class SVM, fast tree, forest, and linear and logistic regressions. In addition, this package contains pre-trained models for extracting features from images using ResNet models and for sentiment analysis of English-language text, which dramatically simplifies the creation and deployment of complex AI scenarios on image and text data.

We have also simplified the pricing model to make it easier to acquire and use Machine Learning Server on Hadoop. Each SQL Server Enterprise Edition (EE) core under Software Assurance gives you rights to use Machine Learning Server on 5 nodes of Hadoop.

Call to action

SQL Server Machine Learning Services is available in all editions of SQL Server 2017 on Windows, and we encourage you to download and explore the above-mentioned enhancements in the free Express or Developer editions. Our R and Python getting started tutorials will walk you through building your first machine learning solutions with SQL Server. You can find additional tutorials on Microsoft Docs.