May 23, 2017

A comparison of Machine Learning in R and Azure ML studio – which approach should you use?

By Theta


R and Azure Machine Learning (AML) are both commonly used for machine learning tasks, either individually or together. There are three main ways to use them:

  1. R Only: A machine learning model is built in R and published to Azure as a web service. There is no need to build the model in AML.
  2. Hybrid: A machine learning model is first built in R and later used in AML through modules that allow R code to be written in AML Studio. This requires retraining the model in Azure.
  3. Azure Only: A machine learning model is built in AML only. R and Python code can be used at various steps within AML to extend its analytics capabilities.

How do you figure out which approach to take, and which machine learning tool to use? I’ve found it helpful to consider the following questions to establish the best fit.

Which tool is better during development?

R offers the most extensive set of techniques and algorithms, along with excellent visualization support. Azure ML narrows much of this gap because any R or Python algorithm can be imported into it. For visualizing data, however, one still needs R, Excel, or another tool.

For feature engineering and data cleaning, R provides a much larger set of techniques. Azure ML also has a reasonable set of built-in modules, which can be extended with its math and R scripting modules.
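As a rough illustration of extending Azure ML with scripting, the sketch below shows the kind of cleaning and feature-engineering step one might place in an Execute Python Script module, the Python counterpart to the R scripting module. It assumes the classic Azure ML Studio convention that the module calls a function named `azureml_main`, passes in up to two pandas DataFrames, and expects a tuple of DataFrames back. The median imputation and min-max scaling are illustrative choices, not a built-in AML recipe.

```python
import pandas as pd


# Entry point assumed by the (classic) Azure ML Studio "Execute Python Script"
# module: up to two input DataFrames in, a tuple of DataFrames out.
def azureml_main(dataframe1=None, dataframe2=None):
    df = dataframe1.copy()

    # Work on numeric columns only; other columns pass through unchanged.
    numeric_cols = df.select_dtypes(include="number").columns

    # Data cleaning: replace missing numeric values with the column median.
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Feature engineering: min-max scale each numeric column to [0, 1].
    col_min = df[numeric_cols].min()
    col_range = df[numeric_cols].max() - col_min
    # Constant columns would divide by zero, so use a range of 1 for them.
    col_range = col_range.replace(0, 1)
    df[numeric_cols] = (df[numeric_cols] - col_min) / col_range

    # The module expects a tuple (or sequence) of DataFrames as output.
    return (df,)
```

Outside Azure ML, the function can be exercised directly: calling it on `pd.DataFrame({"a": [1.0, None, 3.0]})` fills the missing value with the median (2.0) and scales the column to 0.0, 0.5, 1.0.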

Which tool is better in production, particularly when it comes to retraining and publishing a model?

Retraining (updating) an R model is simple, but publishing an R model to Azure is complex and can take a while the first time. It also requires a few dependencies that are external to R. By comparison, retraining and publishing Azure ML models is very simple.

Recommendation

For most machine learning applications, Azure ML offers the best of both worlds: it is simple to use, and its functionality can be extended with R or Python when needed. Model retraining and publishing are straightforward.