November 27, 2019
Enhancing Digital Twins Part 3: Predictive Maintenance with Azure Databricks
In this post, we will detail the R analyses we conducted on our Predictive Maintenance dataset from part 2 to help our digital twin project make more informed maintenance suggestions.
Part 3 of a 4-part series about displaying Predictive Maintenance insights in Digital Twins.
In part 1 of this series, we introduced the concept of Predictive Maintenance in digital twins. In part 2 we showed how we set up Databricks to prepare for Predictive Maintenance analyses.
1. Notebook analyses set up (R)
We created our notebook with its primary language set to R, but we will also make use of Python in part 4 to write the results of our R analyses to Azure Blob Storage.
To change the language of a Databricks cell to Scala, SQL, Python or R, prefix the cell with ‘%’ followed by the language name.
Cells written in the notebook’s primary language need no prefix.
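For example, since our notebook’s primary language is R, a Python cell would start with the ‘%python’ magic command, as in this minimal illustration:

```
%python
print("This cell runs Python, not the notebook's primary language")
```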
As we stated in part 1, we plan to use Apache Spark to handle all our dataset operations. To use Apache Spark in Databricks, we need to establish a connection with ‘sparklyr’. If we were working in RStudio, we would first need to install the ‘sparklyr’ package, but we’ve found that most CRAN packages come preinstalled in Databricks notebooks, so only loading is necessary.
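As a minimal sketch of that connection (sparklyr provides a dedicated ‘databricks’ method for use inside Databricks notebooks):

```r
library(sparklyr)

# Attach to the Spark session of the cluster running this notebook.
sc <- spark_connect(method = "databricks")
```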
We will reserve Cmd 1 for loading packages and initialisations. Insert a new cell to continue.
In the new cell, Cmd 2, we will read in our Predictive Maintenance dataset from Databricks’ FileStore, which we uploaded in part 2. Our dataset is located at ‘/FileStore/tables/maintenancedata.csv’.
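A hedged sketch of that cell, assuming the Spark table name ‘maintenance’ (our choice) and a comma-delimited file with a header row:

```r
# Read the uploaded CSV from the FileStore into a Spark DataFrame.
maintenance <- spark_read_csv(
  sc,
  name = "maintenance",
  path = "/FileStore/tables/maintenancedata.csv",
  header = TRUE,
  infer_schema = TRUE
)
```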
To run the notebook cells, click ‘Run All’ in the top toolbar. If no error tracebacks appear beneath the cells, everything succeeded and the notebook is initialised for further analyses.
We will continue to split each block of analysis into separate cells for better separation of concerns and readability.
Before we move on to conducting Predictive Maintenance, you might like to load all the packages we ended up using.
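A sketch of that loading block, where the exact list beyond sparklyr and GGally is our assumption of what these analyses need:

```r
library(sparklyr)   # Spark connection and dataset operations
library(dplyr)      # data manipulation
library(ggplot2)    # boxplots
library(survival)   # Kaplan-Meier estimates and survival regression
library(survminer)  # survival plot helpers (our assumption)
library(GGally)     # errors until installed on the cluster (see below)
```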
Notice that the ‘GGally’ package throws an error. It isn’t yet available in Databricks, so we are unable to load it. We will have to install it on the cluster that hosts our notebook: head to your cluster, navigate to ‘Libraries’, click ‘Install New’, and install the CRAN package ‘GGally’.
Once installed, loading the GGally package should no longer be an issue.
2. Predictive Maintenance analyses (R)
We will now run through a few of the analyses we used to construct our Predictive Maintenance report.
2.1. We used boxplots to study the distribution of variables based on a 5-number summary.
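A minimal sketch of those boxplot cells, assuming the dataset has columns named lifetime, branch and machine (hypothetical names), and that we first collect the Spark table into local R memory:

```r
library(dplyr)    # collect()
library(ggplot2)

# Bring the Spark table into local R memory for plotting.
maintenance_local <- collect(maintenance)

# Lifetime overall, then broken down by branch and by machine type.
ggplot(maintenance_local, aes(x = "", y = lifetime)) + geom_boxplot()
ggplot(maintenance_local, aes(x = branch, y = lifetime)) + geom_boxplot()
ggplot(maintenance_local, aes(x = machine, y = lifetime)) + geom_boxplot()
```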
The first graph above shows the statistical distribution of a machine’s lifetime. It reveals that machines break after 80 weeks on average, with most failures falling between 60 and 100 weeks. The second and third graphs show that branch C and Aircon Units tend to break several weeks before the others.
2.2.0. A survival plot that implements the Kaplan-Meier (KM) method helped us to estimate the time it takes for a machine to break.
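A hedged sketch of that fit, assuming a broken column where 1 marks a failed machine (0 = still running, i.e. censored):

```r
library(survival)
library(survminer)  # ggsurvplot (our assumption for plotting)

# Kaplan-Meier estimate over all machines; printing a survfit object
# reports the machine count, the number broken and the median survival.
maintenance_graph <- survfit(Surv(lifetime, broken) ~ 1,
                             data = maintenance_local)
print(maintenance_graph)

# Draw the survival curve.
ggsurvplot(maintenance_graph, data = maintenance_local)
```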
print(maintenance_graph) indicates there are 1000 machines, of which 397 are broken. The median survival is 80 weeks.
2.2.1. Another survival plot that implements the Kaplan-Meier (KM) method helped us to estimate the time it takes for a machine to break, grouped by branch.
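The grouped version only changes the right-hand side of the formula; a sketch for maintenance_graph2, with maintenance_graph3 (section 2.2.2) swapping in the machine column:

```r
# One Kaplan-Meier curve per branch; maintenance_graph3 is identical
# but with "machine" on the right-hand side of the formula.
maintenance_graph2 <- survfit(Surv(lifetime, broken) ~ branch,
                              data = maintenance_local)
print(maintenance_graph2)
ggsurvplot(maintenance_graph2, data = maintenance_local)
```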
print(maintenance_graph2) showed:
branch A – 336 machines, of which 123 are broken. The median survival is 80 weeks.
branch B – 356 machines, of which 150 are broken. The median survival is 80 weeks.
branch C – 308 machines, of which 124 are broken. The median survival is 81 weeks.
2.2.2. Another survival plot that implements the Kaplan-Meier (KM) method helped us to estimate the time it takes for a machine to break, grouped by machine type.
print(maintenance_graph3) showed:
Aircon Unit – 242 machines, of which 114 are broken. The median survival is 65 weeks.
Cooling Unit – 266 machines, of which 91 are broken. The median survival is 92 weeks.
Forklift – 254 machines, of which 116 are broken. The median survival is 80 weeks.
L.B.Door Motor – 238 machines, of which 76 are broken. The median survival is 88 weeks.
2.3. We performed a survival regression analysis to estimate the relationships between independent and dependent variables.
2.3.0. Using our survival regression analysis, we can now predict which machines will break next and therefore prioritise maintenance on them to avoid a break in business operations.
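A hedged sketch of how such a priority list could be built, assuming a parametric survreg model and hypothetical column names (id, broken); the covariates, distribution and thresholds are illustrative, not the exact ones we used:

```r
library(survival)
library(dplyr)

# Fit a parametric survival regression of lifetime on the covariates.
reg_fit <- survreg(Surv(lifetime, broken) ~ branch + machine,
                   data = maintenance_local, dist = "weibull")

# For machines still running, predict total lifetime and subtract the
# weeks already lived to estimate the remaining lifetime.
running <- filter(maintenance_local, broken == 0)
running$predictedLT <- predict(reg_fit, newdata = running, type = "response")
running$RemainingLT <- running$predictedLT - running$lifetime

# Machines expected to fail within roughly a month, most urgent first.
ActionsPriority <- running %>%
  filter(RemainingLT < 4) %>%
  arrange(RemainingLT) %>%
  select(id, RemainingLT)
print(ActionsPriority)
```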
print(ActionsPriority) showed:
18 machines that should be changed this month, ordered by remaining lifetime and listed with their IDs.
We might want to change the machines with less than a week of remaining lifetime, then re-run our program every few days to catch the next machines due to break. Here we have a data table that can be automatically emailed to the manager every week.
The machines are all classified into three classes based on RemainingLT: urgent, medium and good, as sketched below.
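A sketch of that classification with illustrative cut-offs (the real thresholds may differ):

```r
library(dplyr)

# Bucket machines by estimated remaining lifetime (in weeks);
# the cut-offs here are illustrative.
ActionsPriority <- ActionsPriority %>%
  mutate(class = case_when(
    RemainingLT < 1 ~ "urgent",  # change this week
    RemainingLT < 2 ~ "medium",  # change soon
    TRUE            ~ "good"
  ))
```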
As shown above, 4 machines are in urgent need of maintenance.
3. Predictive Maintenance results (R)
Now that we have enough information for our digital twin, we will initialise some variables which will be used to populate a JSON body.
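As a minimal sketch, assuming the jsonlite package and hypothetical field names, the variables could be folded into a JSON body like this:

```r
library(jsonlite)

# Collect the IDs of machines flagged urgent and wrap the report
# metadata into a JSON body for part 4.
urgent_ids <- ActionsPriority$id[ActionsPriority$class == "urgent"]
report_json <- toJSON(
  list(
    generatedOn = as.character(Sys.Date()),
    urgentCount = length(urgent_ids),
    urgentIds   = urgent_ids
  ),
  auto_unbox = TRUE,
  pretty = TRUE
)
```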
We will commit this JSON body to Azure Blob Storage in part 4.
In this post, we discussed how we used Apache Spark, through CRAN’s sparklyr package, to process our dataset in the cloud with R and produce a Predictive Maintenance report we can access from our digital twin project. We hope we’ve given you some inspiration for the analyses you can conduct in your own Predictive Maintenance reports, and for how to do this with Databricks.
In our final post, we will walk through how we committed our Predictive Maintenance report to Azure Blob Storage, extending the capabilities of our digital twin.