Build Your Own R Modules in Azure ML

This post is by Roope Astala, Senior Program Manager in Microsoft’s Information Management and Machine Learning team.

Azure ML currently offers almost 100 modules to solve a wide spectrum of data science problems that our customers may encounter. But what if you need more, or something a bit different from what we offer?

Custom R Modules

Custom R Modules give you a way to extend the built-in module set with your own. You can share these modules with friends or co-workers by putting them on GitHub.

Custom R modules are first-class citizens – they can be used in experiments and operationalized in web services just like built-in modules. You can use them for tasks such as:

  • Handling of domain-specific data formats.

  • Flexible data transformations.

  • Customized feature construction and extraction.

Within your R script, you can use hundreds of R packages preinstalled in Azure ML. You can even bundle your own packages with the module.

Example

As an example, let’s create a module that takes some JSON-formatted data and parses it into an Azure ML dataset. The module consists of 3 parts:

  • An R code file that defines what the module does.

  • Optionally, any accompanying files – e.g. configuration files or R code packages.

  • An XML file that defines which inputs, outputs, and parameters the module will have. In a sense, the XML is the skeleton of the module, and the R code its muscle.

The module has one input, a dataset whose first cell holds a JSON-formatted string, and one output, the contents of the JSON objects as a flattened dataset. It also has one parameter: a string that specifies the replacement value for JSON nulls. The corresponding R script is:

parse_json.R:

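parse_json <- function(data_in, nullvalue = "NA") {
  library(RJSONIO)  # provides fromJSON()
  library(plyr)     # provides ldply()
  # Parse the JSON string in the first cell, replacing JSON nulls with
  # nullvalue, then bind the parsed objects into one data frame.
  data_out <- ldply(fromJSON(as.character(data_in[1, 1]), nullValue = nullvalue, simplify = TRUE))
  return(data_out)
}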
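Before packaging the module, it can help to run the function on your own machine. The sketch below is one way to do that, assuming RJSONIO and plyr are installed locally. The sample data frame, its column name and the "unknown" replacement value are made up for illustration, and the exact column types of the result depend on how RJSONIO simplifies the parsed objects.

```r
# Same function as in parse_json.R, repeated so the sketch is self-contained.
parse_json <- function(data_in, nullvalue = "NA") {
  library(RJSONIO)
  library(plyr)
  data_out <- ldply(fromJSON(as.character(data_in[1, 1]), nullValue = nullvalue, simplify = TRUE))
  return(data_out)
}

# Hypothetical sample input: one row, one column holding a JSON array of objects.
sample_in <- data.frame(
  json = '[{"id": 1, "city": "Oslo"}, {"id": 2, "city": null}]',
  stringsAsFactors = FALSE
)

# Only run the function if both packages are available locally.
have_pkgs <- requireNamespace("RJSONIO", quietly = TRUE) &&
  requireNamespace("plyr", quietly = TRUE)

result <- NULL
if (have_pkgs) {
  result <- parse_json(sample_in, nullvalue = "unknown")
  print(result)  # one row per JSON object
  str(result)    # inspect the column types that came out of the parse
}
```

If the result looks right locally, the same function body can go into the module unchanged.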

The XML description defines the module's name, the R function to call when the module runs, the input and output datasets, and the input parameters.

parse_json.xml:

<Module name="Parse JSON Strings">
  <Owner>AzureML User</Owner>
  <Description>This is my module description.</Description>
  <Language name="R" sourceFile="parse_json.R" entryPoint="parse_json"/>
  <Ports>
    <Output id="data_out" name="Parsed dataset" type="DataTable">
      <Description>Combined Data</Description>
    </Output>
    <Input id="data_in" name="JSON formatted dataset" type="DataTable">
      <Description>Input dataset</Description>
    </Input>
  </Ports>
  <Arguments>
    <Arg id="nullvalue" name="Null replacement value" type="string" isOptional="true">
      <Description>Value used to replace JSON null value</Description>
    </Arg>
  </Arguments>
</Module>
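In this example, the entryPoint attribute names the R function, and the ids of the Input and Arg elements coincide with the function's argument names (data_in and nullvalue). The sketch below checks that correspondence in base R before you upload anything. It assumes that matching ids to argument names is the convention to follow, it uses an abridged copy of the XML with the descriptions dropped, and the helper attr_value is a made-up name for illustration. The regular expressions are a quick check, not a real XML parser.

```r
# Abridged module definition and R source, inlined so the sketch is self-contained.
# In practice you could read them with paste(readLines("parse_json.xml"), collapse = "\n").
xml_text <- '
<Module name="Parse JSON Strings">
  <Language name="R" sourceFile="parse_json.R" entryPoint="parse_json"/>
  <Ports>
    <Output id="data_out" name="Parsed dataset" type="DataTable"/>
    <Input id="data_in" name="JSON formatted dataset" type="DataTable"/>
  </Ports>
  <Arguments>
    <Arg id="nullvalue" name="Null replacement value" type="string" isOptional="true"/>
  </Arguments>
</Module>'

r_source <- '
parse_json <- function(data_in, nullvalue = "NA") {
  library(RJSONIO)
  library(plyr)
  ldply(fromJSON(as.character(data_in[1, 1]), nullValue = nullvalue, simplify = TRUE))
}'

# Hypothetical helper: extract the value of one attribute from a tag string.
attr_value <- function(tag, attr) {
  pattern <- paste0('(?s).*\\b', attr, '="([^"]*)".*')
  sub(pattern, "\\1", tag, perl = TRUE)
}

# Entry point named in the <Language> element.
language_tag <- regmatches(xml_text, regexpr("<Language\\b[^>]*>", xml_text, perl = TRUE))
entry_point <- attr_value(language_tag, "entryPoint")

# Ids of all <Input> and <Arg> elements, in document order.
in_tags <- regmatches(xml_text, gregexpr("<(Input|Arg)\\b[^>]*>", xml_text, perl = TRUE))[[1]]
xml_ids <- attr_value(in_tags, "id")

# Define the function in a scratch environment; library() calls in its body do not run here.
env <- new.env()
eval(parse(text = r_source), envir = env)
stopifnot(exists(entry_point, envir = env, inherits = FALSE))
arg_names <- names(formals(get(entry_point, envir = env)))

# Any id listed here has no matching function argument.
missing_ids <- setdiff(xml_ids, arg_names)
print(missing_ids)
```

An empty result means every input and parameter declared in the XML has a matching argument in the R function.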

To add the module to Azure ML, put the files into a zip package and upload it by selecting +NEW > Module in your Azure ML Studio workspace. Once uploaded, your module appears in the “Custom” category of the module palette, alongside the built-in modules.
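If you prefer to build the zip package from R, the sketch below shows one way to do it with utils::zip(), which relies on an external zip program being installed. The file contents are placeholders, the temporary paths are arbitrary, and storing the files flat at the top level of the archive is an assumption, not a documented requirement.

```r
# Work in a temporary directory so nothing is written to the current project.
pkg_dir <- file.path(tempdir(), "parse_json_module")
dir.create(pkg_dir, showWarnings = FALSE)

# Placeholder contents; in practice these are the real parse_json.R and parse_json.xml.
writeLines('parse_json <- function(data_in, nullvalue = "NA") data_in',
           file.path(pkg_dir, "parse_json.R"))
writeLines('<Module name="Parse JSON Strings"></Module>',
           file.path(pkg_dir, "parse_json.xml"))

module_files <- file.path(pkg_dir, c("parse_json.R", "parse_json.xml"))
zip_path <- file.path(tempdir(), "parse_json.zip")

# utils::zip() shells out to an external program, so check that one is available.
zip_cmd <- Sys.getenv("R_ZIPCMD", "zip")
zip_available <- nzchar(Sys.which(zip_cmd))

if (zip_available) {
  # Start from a clean archive if a previous run left one behind.
  if (file.exists(zip_path)) file.remove(zip_path)
  # "-j" stores the files without their directory paths (flat layout is an assumption).
  utils::zip(zip_path, files = module_files, flags = "-j")
  # List the archive contents to confirm both files were added.
  print(utils::unzip(zip_path, list = TRUE)$Name)
}
```

The resulting parse_json.zip is the file you would then select in the upload dialog.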

You can now use the new R module to build experiments, and deploy it to production by publishing your experiment as a web service.

Summary

Custom R Modules are a great way for you to extend Azure ML’s built-in modules. Such modules can be used in experiments, operationalized in web services and shared with your colleagues and the community. Although the example in this blog post is a simple one, custom R modules can be far more complex and can take multiple inputs, outputs, and parameters of different types. They also have access to the same user interface elements as built-in modules, such as column selectors and drop-down menus for parameters. In the future, we plan to add support for input and output types beyond datasets, such as learners and transformations.

Do give it a try and share your feedback with us below.

Roope