How to Use External R Packages and Libraries in MAML
Azure ML Studio is a popular tool among data scientists, and it provides most of the functionality needed to create and maintain models for predictive analytics. However, there can be situations where the existing modules do not suffice for an experiment or a set of requirements. In these cases, ML Studio lets you extend its functionality through the R language by using the Execute R Script module. The same module is also useful when you already have your ML module written in R and want to import it into ML Studio.
This module accepts multiple input datasets and it yields a single dataset as output. You can type an R script into the R Script parameter of the Execute R Script module.
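As a minimal sketch of what such a script can look like: inside ML Studio, the module provides `maml.mapInputPort()` to read an input dataset and `maml.mapOutputPort()` to return the output dataset. The helper `add_row_id` and the sample data frame are hypothetical, and the `exists()` guards are only an assumption made so that the sketch also runs in a local R session, where the `maml.*` functions are not defined.

```r
# Hypothetical transformation: add a row identifier column to a data frame.
add_row_id <- function(df) {
  df$row_id <- seq_len(nrow(df))
  df
}

if (exists("maml.mapInputPort")) {
  # Inside the Execute R Script module: read the dataset on input port 1.
  dataset1 <- maml.mapInputPort(1)
} else {
  # Outside ML Studio (assumption for local testing): use a small sample.
  dataset1 <- data.frame(x = c(10, 20, 30))
}

data.set <- add_row_id(dataset1)

if (exists("maml.mapOutputPort")) {
  # Send the resulting data frame to the module's dataset output port.
  maml.mapOutputPort("data.set")
}
```

Keeping the transformation in its own function makes it easy to try the logic locally before pasting the script into the module.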
Several R packages are available in addition to the standard packages of the base installation. Currently, you cannot install R packages directly into ML Studio through the GUI. However, you can install them into individual workspaces via R code.
A list of the packages included in the current release is provided in the List of installed packages table below.
Listing all currently-installed packages
The list of installed packages can change. To get the complete list, include the following lines in the Execute R Script module to send the list to the output dataset:
# R code to be used with Execute R Script
out <- data.frame(installed.packages())
# Send the data frame to the module's output dataset
maml.mapOutputPort("out")
To view the output log, run the experiment, select the Execute R Script module, and click the View output log link near the bottom of the module parameter pane. At this point, MAML's R engine supports more than 400 R packages out of the box.
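To find out whether the packages a script relies on are already present, the installed list can be compared with the packages the script needs. The following is a small sketch; the `required` vector is only an example and should be replaced with your own package names.

```r
# Packages the script needs (example list; adjust to your own code).
required <- c("stats", "fpc", "mclust", "flexmix")

# Names of all packages visible to the current R session.
installed <- rownames(installed.packages())

# Packages that would have to be supplied as zipped files.
missing_pkgs <- setdiff(required, installed)

# One row per required package, flagged as installed or not.
out <- data.frame(package = required,
                  installed = required %in% installed,
                  stringsAsFactors = FALSE)
print(out)
```

Inside the Execute R Script module, `out` could then be sent to the output dataset with `maml.mapOutputPort("out")`.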
To install and use R packages that are required by your code but are not supported by ML Studio, use the following method.
Here are the key steps:
Zip up each package to be installed into your workspace for the experiment, taking it from the R package repository on your machine.
Note: You might need multiple packages even when your code uses only one external library, because that library might have dependencies that are also not present by default in MAML.
Zip all the zipped packages into another zip file so that all the required packages are bundled together and ready to go.
Click +New at the bottom of the page and upload the zip file created above as a dataset.
Verify that the file has been uploaded successfully.
Now, in your experiment, add the Execute R Script module if it is not already there, connect the dataset input port (if needed), and type or paste the R code that uses the library we are going to install from the external zipped packages.
Note: There are 3 input ports on this module.
- Drag and drop the uploaded zip file that contains the packages to be installed into the experiment.
- Connect the zip file from the previous step to the 3rd input port of the Execute R Script module.
- In the R code, just before using the external library, add the following code to install the package(s) it depends on and then the library itself. In this example, I am using the library fpc.
# Install package dependencies from the zip files extracted into "src"
install.packages("src/mclust.zip", lib = ".", repos = NULL, verbose = TRUE)
(success.mclust <- library("mclust", lib.loc = ".", logical.return = TRUE, verbose = TRUE))
install.packages("src/flexmix.zip", lib = ".", repos = NULL, verbose = TRUE)
(success.flexmix <- library("flexmix", lib.loc = ".", logical.return = TRUE, verbose = TRUE))
# Install the actual package
install.packages("src/fpc.zip", lib = ".", repos = NULL, verbose = TRUE)
# The outer parentheses print TRUE or FALSE to show whether loading succeeded
(success <- library("fpc", lib.loc = ".", logical.return = TRUE, verbose = TRUE))
library(fpc, lib.loc = ".")
- After this, run the experiment and use the intended library.
- Once the run is complete, we can look at the output of Execute R Script. Please note that there are two output ports.
- Output port 1 is used to visualize the result and to “Save as Dataset” if there is a dataset output, while output port 2 shows any standard output, such as verbose output and R plots. In this example, it shows that all the packages presented to the Execute R Script module through the 3rd input port inside a zipped file were extracted and placed into the path [“src”].
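When the bundle contains many packages, the per-package install calls shown above can be generalized. The following is a sketch only: `install_zipped_packages` is a hypothetical helper, it assumes the zip files are binary packages that `install.packages()` can unpack on the platform (as in the fpc example), and it assumes each zip file is named after its package. The `installer` parameter exists only so the helper can be tried without real package files.

```r
# Install every zipped package found in a folder into a local library.
# Returns the package names derived from the zip file names.
install_zipped_packages <- function(zip_dir = "src", lib = ".",
                                    installer = utils::install.packages) {
  # All zip files that were extracted into the bundle folder.
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  for (z in zips) {
    # repos = NULL tells install.packages to install from the local file.
    installer(z, lib = lib, repos = NULL)
  }
  # Assumption: the zip file name matches the package name (fpc.zip -> fpc).
  sub("\\.zip$", "", basename(zips))
}

# Possible use inside Execute R Script:
# pkgs <- install_zipped_packages()
# library("fpc", lib.loc = ".")
```

Calling `list.files("src")` in the script and checking the standard output on output port 2 is a quick way to confirm which files were actually extracted.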
Additionally, if you need to use your existing .R and .RData files in MAML, use the same zip-and-upload method as above. The zipped file connected to the 3rd input port is extracted into the path [“src”] in the workspace sandbox, so in the R code inside the Execute R Script module, refer to the files by name under that path.
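A minimal sketch of how the extracted files can be referenced, using base R's `source()` and `load()`. The helper `load_bundle` and the file names in the example call are hypothetical placeholders for your own files.

```r
# Load helper code and saved objects that were unpacked from the zip file.
# `bundle_dir` defaults to the "src" folder described above.
load_bundle <- function(r_file, rdata_file, bundle_dir = "src",
                        envir = globalenv()) {
  # Evaluate the .R script so the functions it defines exist in `envir`.
  source(file.path(bundle_dir, r_file), local = envir)
  # Restore the saved objects into `envir`; load() returns their names.
  load(file.path(bundle_dir, rdata_file), envir = envir)
}

# Example call inside Execute R Script (file names are placeholders):
# load_bundle("my_functions.R", "my_objects.RData")
```

The direct equivalents are `source("src/my_functions.R")` and `load("src/my_objects.RData")`; the wrapper only keeps the folder name in one place.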
Note: Please take into account the legalities of using the R language packages when using this functionality.