Python in SQL Server 2017: enhanced in-database machine learning

We are excited to share the preview release of in-database analytics and machine learning with Python in SQL Server. Python is one of the most popular languages for data science and has a rich ecosystem of powerful libraries.

Starting with the CTP 2.0 release of SQL Server 2017, you can bring Python-based intelligence to your data in SQL Server.

The addition of Python builds on the foundation laid for R Services in SQL Server 2016 and extends that mechanism to include Python support for in-database analytics and machine learning. We are renaming R Services to Machine Learning Services, and R and Python are two options under this feature.

The Python integration in SQL Server provides several advantages:

  • Elimination of data movement: You no longer need to move data from the database to your Python application or model. Instead, you can build Python applications in the database. This removes the security, compliance, governance, and integrity problems that come with moving large amounts of data around. The new capability brings Python to the data and runs code securely inside SQL Server, using the proven extensibility mechanism introduced in SQL Server 2016.
  • Easy deployment: Once you have the Python model ready, deploying it in production is as easy as embedding it in a T-SQL script. Any SQL client application can then take advantage of Python-based models and intelligence through a simple stored procedure call.
  • Enterprise-grade performance and scale: You can use SQL Server’s advanced capabilities, such as in-memory tables and columnstore indexes, with the high-performance scalable APIs in the RevoScalePy package. RevoScalePy is modeled after the RevoScaleR package in SQL Server R Services. Combining these with the latest innovations in the open source Python world lets you bring a wide selection of tools, performance, and scale to your SQL Python applications.
  • Rich extensibility: You can install and run any of the latest open source Python packages in SQL Server to build deep learning and AI applications on huge amounts of data in SQL Server. Installing a Python package in SQL Server is as simple as installing a Python package on your local machine.
  • Wide availability at no additional cost: Python integration is available in all editions of SQL Server 2017, including the Express edition.
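The deployment pattern described in the list above, a Python script embedded in T-SQL, can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the table dbo.Measurements, its value column, and the helper build_external_script_call are hypothetical. The sketch relies on the sp_execute_external_script system procedure and its default InputDataSet and OutputDataSet variable names, and it only builds the T-SQL text without connecting to a server.

```python
import textwrap

# Python code intended to run inside SQL Server. InputDataSet and OutputDataSet
# are the default data frame variable names for sp_execute_external_script.
PYTHON_SCRIPT = textwrap.dedent("""\
    # Hypothetical transformation: add a doubled copy of the 'value' column.
    OutputDataSet = InputDataSet.assign(doubled=InputDataSet["value"] * 2)
""")


def build_external_script_call(python_script, input_query):
    """Return T-SQL text that runs python_script over the rows of input_query."""
    # Double any single quotes so the text survives inside N'...' literals.
    script = python_script.replace("'", "''")
    query = input_query.replace("'", "''")
    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'Python',\n"
        "    @script = N'{script}',\n"
        "    @input_data_1 = N'{query}';"
    ).format(script=script, query=query)


# dbo.Measurements is a hypothetical table used only for illustration.
tsql = build_external_script_call(PYTHON_SCRIPT, "SELECT value FROM dbo.Measurements")
print(tsql)
```

The printed statement could be wrapped in a stored procedure, so that client applications never see the Python code itself.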

Data scientists, application developers, and database administrators can all benefit from this new capability.

  • Data scientists can build models using the full datasets on SQL Server instead of moving data to an IDE or being forced to work with samples of data. Working from a Python IDE, they can execute Python code that runs in SQL Server on the data in SQL Server and get the results back in the IDE. They no longer depend on application developers to deploy models for production use, which often involves translating models and scripts to a different application language. Models can be deployed to production easily by embedding them in T-SQL stored procedures, and any open source Python package for machine learning can be used in SQL Server. The usage pattern is identical to that of the now popular SQL Server R Services.
  • Application developers can take advantage of Python-based models by simply calling a stored procedure that has the Python script embedded in it. They don’t need a deep understanding of the inner workings of the Python models, and they don’t have to work closely with data scientists to translate a model into a line-of-business language before consuming it. They can even use both R and Python models in the same application, because both are exposed as stored procedure calls.
  • Database administrators can enable Python-based applications and set up policies to govern how the Python runtime behaves on SQL Server. They can manage, secure, and govern the Python runtime to control how critical system resources on the database machine are used. Security is enforced by mechanisms such as process isolation, limited system privileges for Python jobs, and firewall rules for network access.
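From the application developer's side, consuming a model comes down to an ordinary stored procedure call. The sketch below assumes a hypothetical procedure dbo.PredictChurn that wraps a Python model and takes a @customer_id parameter; neither name comes from SQL Server itself. It uses only the generic DB-API cursor methods execute and fetchall with a "?" parameter marker (the style used by ODBC-based drivers), and it includes a small in-memory stand-in cursor so the example runs without a database.

```python
class StubCursor:
    """In-memory stand-in for a DB-API cursor, so the sketch runs without a server."""

    def __init__(self, rows):
        self._rows = rows
        self.executed = []

    def execute(self, sql, params=()):
        # Record the call instead of sending it to a database.
        self.executed.append((sql, tuple(params)))
        return self

    def fetchall(self):
        return list(self._rows)


def score_customer(cursor, customer_id):
    """Call the hypothetical dbo.PredictChurn procedure and return its rows."""
    # The caller needs no knowledge of the Python model behind the procedure.
    cursor.execute("EXEC dbo.PredictChurn @customer_id = ?", (customer_id,))
    return cursor.fetchall()


# With a real driver, the cursor would come from a live database connection.
cursor = StubCursor(rows=[(42, 0.87)])
rows = score_customer(cursor, 42)
print(rows)
```

Because the model sits behind a procedure, an R-based model could be swapped in or called alongside it without changing this client code.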

The standard open source CPython interpreter (version 3.5) and some Python packages commonly used for data science are downloaded and installed during SQL Server setup if you choose the Python option in the feature tree.

Currently, a subset of packages from the popular Anaconda distribution is included along with Microsoft’s RevoScalePy package. The set of packages available for download will evolve as we move toward general availability of this feature. Users can easily install any additional open source Python package, including modern deep learning packages such as Cognitive Toolkit and TensorFlow, to run in SQL Server. Taking advantage of these packages, you can build and deploy GPU-powered deep learning database applications.
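Since the bundled package set may change, it can be useful to check which packages a given Python runtime can actually import. The following is a minimal sketch using the standard library's importlib.util.find_spec; the helper check_packages is hypothetical, and the import names revoscalepy, cntk, and tensorflow are assumed to be the ones used by the packages mentioned above.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if this runtime can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed import names for the packages mentioned above, plus one stdlib module.
wanted = ["revoscalepy", "cntk", "tensorflow", "json"]
for name, available in sorted(check_packages(wanted).items()):
    print("{:<12} {}".format(name, "available" if available else "missing"))
```

The same check could be run inside SQL Server as an embedded script to confirm what the in-database runtime sees, which may differ from a local development environment.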

Currently, Python support is in “preview” state for SQL Server 2017 on Windows only.

We are very excited about the possibilities this integration opens up for building intelligent database applications. Please watch the Python-based machine learning in SQL Server presentation and Joseph Sirosh's keynote at the Microsoft Data Amp 2017 event for demos and additional information. We encourage you to install SQL Server 2017 and share your feedback with us as we work toward general availability of this technology.

Thank you!

Sumit Kumar, Senior Program Manager, SQL Server Machine Learning Services

Nagesh Pabbisetty, Director of Program Management, Microsoft R Server and Machine Learning