We are excited to share the preview release of in-database analytics and machine learning with Python in SQL Server. Python is one of the most popular languages for data science and has a rich ecosystem of powerful libraries.
Starting with the CTP 2.0 release of SQL Server 2017, you can now bring Python-based intelligence to your data in SQL Server.
The addition of Python builds on the foundation laid for R Services in SQL Server 2016 and extends that mechanism to include Python support for in-database analytics and machine learning. We are renaming R Services to Machine Learning Services, and R and Python are two options under this feature.
The Python integration in SQL Server provides several advantages:
- Elimination of data movement: You no longer need to move data from the database to your Python application or model. Instead, you can build Python applications in the database. This eliminates barriers of security, compliance, governance, integrity, and a host of similar issues related to moving vast amounts of data around. This new capability brings Python to the data and runs code inside secure SQL Server using the proven extensibility mechanism built into SQL Server 2016.
- Easy deployment: Once you have the Python model ready, deploying it in production is now as easy as embedding it in a T-SQL script, and then any SQL client application can take advantage of Python-based models and intelligence by a simple stored procedure call.
- Enterprise-grade performance and scale: You can use SQL Server’s advanced capabilities like in-memory tables and columnstore indexes with the high-performance scalable APIs in the RevoScalePy package. RevoScalePy is modeled after the RevoScaleR package in SQL Server R Services. Using these with the latest innovations in the open source Python world allows you to bring unparalleled selection, performance, and scale to your SQL Python applications.
- Rich extensibility: You can install and run any of the latest open source Python packages in SQL Server to build deep learning and AI applications on huge amounts of data in SQL Server. Installing a Python package in SQL Server is as simple as installing a Python package on your local machine.
- Wide availability at no additional cost: Python integration is available in all editions of SQL Server 2017, including the Express edition.
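To make the "easy deployment" point above concrete, here is a minimal sketch of how a Python script is wrapped in T-SQL. It relies on the `sp_execute_external_script` system stored procedure, which by default passes the query result to the script as a pandas data frame named `InputDataSet` and returns whatever the script assigns to `OutputDataSet`. The helper names, the `dbo.Trips` table, and its columns are hypothetical and exist only for illustration.

```python
def tsql_literal(text):
    """Quote text as a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(python_script, input_query):
    """Return a T-SQL batch that runs python_script over the rows of input_query."""
    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'Python',\n"
        "    @script = " + tsql_literal(python_script) + ",\n"
        "    @input_data_1 = " + tsql_literal(input_query) + ";"
    )


# Hypothetical script: keep only the long trips from the input rows.
PYTHON_SCRIPT = "OutputDataSet = InputDataSet[InputDataSet['distance'] > 10]"
# Hypothetical table and columns.
INPUT_QUERY = "SELECT distance, fare FROM dbo.Trips"

TSQL = build_external_script_call(PYTHON_SCRIPT, INPUT_QUERY)
print(TSQL)
```

The generated batch can be placed in the body of a stored procedure, so that client applications only need to call that procedure and never handle the Python code themselves.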
Data scientists, application developers, and database administrators can all benefit from this new capability.
- Data scientists can build models using the full datasets in SQL Server instead of moving data to an IDE or being forced to work with samples of data. Working from your Python IDE, you can execute Python code that runs in SQL Server on the data in SQL Server and get the results in your IDE. You are no longer dependent on application developers to deploy your models for production use, which often involves translating models and scripts to a different application language. These models can be deployed to production easily by embedding them in T-SQL stored procedures. You can use any open source Python package for machine learning in SQL Server. The usage pattern is identical to the now popular SQL Server R Services.
- Application developers can take advantage of Python-based models by simply making a stored procedure call that has Python script embedded in it. You don’t need a deep understanding of the inner workings of the Python models, and you don’t have to translate them to a line-of-business language in close coordination with data scientists to consume them. You can even leverage both R and Python models in the same application, because both are simply stored procedure calls.
- Database administrators can enable Python-based applications and set up policies to govern how the Python runtime behaves on SQL Server. You can manage, secure, and govern the Python runtime to control how the critical system resources on the database machine are used. Security is ensured by mechanisms like process isolation, limited system privileges for Python jobs, and firewall rules for network access.
The standard open source CPython interpreter (version 3.5) and some Python packages commonly used for data science are downloaded and installed during SQL Server setup if you choose the Python option in the feature tree.
Currently, a subset of packages from the popular Anaconda distribution is included along with Microsoft’s RevoScalePy package. The set of packages available for download will evolve as we move toward general availability of this feature. Users can easily install any additional open source Python package, including the modern deep learning packages like Cognitive Toolkit and TensorFlow to run in SQL Server. Taking advantage of these packages, you can build and deploy GPU-powered deep learning database applications.
Currently, Python support is in “preview” state for SQL Server 2017 on Windows only.
We are very excited about the possibilities this integration opens up for building intelligent database applications. Please watch the Python-based machine learning in SQL Server presentation and Joseph Sirosh’s keynote at the Microsoft Data Amp 2017 event for demos and additional information. We encourage you to install SQL Server 2017. Please share your feedback with us as we work toward general availability of this technology.
Thank you!
Sumit Kumar, Senior Program Manager, SQL Server Machine Learning Services
Nagesh Pabbisetty, Director of Program Management, Microsoft R Server and Machine Learning
Sounds very useful! Is this feature likely to appear in SQL Server Azure any time soon?
Yes, we are working on enabling both R & Python in Azure SQL DB. I do not have a timeline to share at the moment. But feel free to reach out to me offline & we can discuss your scenarios and the roadmap in general.
yipeee… only good news today 🙂
This will be as useless as R Services is now. SQL Server just does not have the scalability to run any useful model.
R and Python already support loading data into data frame from SQL Server. What is this useful for?
This integration is about moving the R/Python compute to the SQL Server machine to eliminate data movement across machines. If you move millions/billions of rows to the client for modeling or scoring, then the network overhead will dominate end-to-end execution time. Moreover, the R/Python integration in SQL Server works with parallel query processing in SQL Server, security & resource governance.
For example, you can execute a query that runs in parallel (DOP = 8) and trains a model in parallel (with RevoScaleR or revoscalepy or MicrosoftML). This mode of execution can also be leveraged for scoring in parallel coupled with streaming capabilities in SQL Server. These features allow you to run more concurrent scripts with resource policies enforced from within SQL Server. This is hard to achieve if you run R / Python scripts on a standalone server.
Feel free to reach out to me offline & I will be happy to walk you through the integration, how it differs from running an R/Python script from a client, and the performance advantages.
I suggest you watch the video and read the text before making a post that makes you look like an ignorant idiot
Gr8. Does this have the memory limitation of R? Does the dataset need to completely fit in the server’s memory space?
We do not run R / Python within the SQL Server process or memory space. The R / Python processes run outside of the SQL Server address space & share the machine resources. This is also done for security reasons.
Yes, by default many of the data structures in R / Python are memory resident objects so the same limitations apply. However, Microsoft ships many algorithms as part of the R Server package (RevoScaleR or revoscalepy) that has a SQL Server data source object which can work with data that doesn’t fit in memory & supports parallel execution. Using the SQL data source object, you can run a parallel query in SQL Server that sends data to many R / Python processes in parallel to compute, say, a linmod/logit/tree model. This can also be used for scoring scenarios with streaming capability.
Feel free to reach out to me offline. I will be happy to walk through the samples & demo the performance aspects.
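The idea in the reply above, fitting a model on data that does not fit in memory by processing it in pieces, can be illustrated with a small pure-Python sketch. It is not how revoscalepy is implemented; it only shows the general technique of computing per-chunk sufficient statistics that can be merged, so chunks can be handled one at a time or by separate processes. All function names and the sample data are hypothetical.

```python
from functools import reduce
from itertools import islice


def read_chunks(rows, size):
    """Yield lists of at most `size` rows so only one chunk is in memory at a time."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def partial_stats(chunk):
    """Sufficient statistics for fitting y = a + b*x on one chunk of (x, y) rows."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in chunk:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge(left, right):
    """Combine the statistics of two chunks; the order of merging does not matter."""
    return tuple(a + b for a, b in zip(left, right))


def fit_line(stats):
    """Least-squares intercept and slope computed from the merged statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical data stream: points that lie exactly on y = 1 + 2x.
ROWS = ((float(x), 2.0 * x + 1.0) for x in range(10))

TOTAL = reduce(merge, (partial_stats(chunk) for chunk in read_chunks(ROWS, 3)))
INTERCEPT, SLOPE = fit_line(TOTAL)
print(INTERCEPT, SLOPE)
```

Because `merge` is a plain element-wise sum, each chunk's statistics could be computed by a different worker and combined at the end, which is the property that makes this kind of algorithm suitable for parallel and streaming execution.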
Why such an old version of Python?
Based on our testing, we found that many packages have incompatibility issues with the latest Python 3.6 release. We will look at feedback & figure out what version to ship with SQL Server 2017 RTM. Currently, we are shipping Python version 3.5.2.
Will SQL Server 2016 be upgraded to include support for Python as well?