Building Machine Learning Models
Building machine learning models is an iterative process that involves several tasks, as depicted in the diagram below.
For the connected car solution, we would build two machine learning models: one for predictive maintenance and the other for driver behavior analysis.
The following list shows the high-level steps to build a machine learning model using Azure ML. The steps are generic and apply to building any machine learning model.
- Select the data sources that you need for building the model. In the connected car scenario, our data sources are Azure Blob (raw vehicle data), Azure DocumentDB (asset metadata) and a second Azure Blob that contains historical records for vehicle maintenance and driver classification. In the absence of historical records, it is still possible to build predictive models with unsupervised learning techniques and then correlate the outputs manually to derive insights. However, that process tends to be very complex, and most tools don’t support it; they expect you to provide labeled data (inputs and outputs). In the future, data generated from connected products will be a key asset, and we expect various data providers to offer such historical records (like trends) for analysis.
- Azure Data Factory is an optional data service added to the design to transfer and analyze the raw data and to create data processing pipelines that make the data consumable. Data Factory is particularly useful if you need to integrate with multiple systems and process their data to arrive at the desired output.
- The third step is preparing the data to be used by the model. This involves cleaning and filtering the data, normalizing it, creating labeled inputs for classification and, most importantly, creating relevant feature sets based on the use case requirements. Selecting a feature set and building the model is a complex exercise that requires thorough understanding of and expertise in machine learning, and it is outside the scope of this book. Preparing the data is the most crucial and time-consuming step in building the model. As part of this step, you also create the training and test sets. You train the model using the training set and test it iteratively using the test set. Azure ML provides visual composition tools to help you prepare the data. Azure ML is available over the web, so you can execute the entire end-to-end process without installing any additional software.
- Once the data is prepared, you start building the model in Azure ML by selecting the type of model (regression, classification, etc.) and the algorithms associated with it, using the data from the previous step. For instance, for a regression model, you could use neural network regression, decision forest regression, etc. You can evaluate all the models to understand which one performs best for your data set. As mentioned, this is an iterative step. For the connected car solution, we perform predictive maintenance using regression algorithms, and for behavior analysis we use multi-class classification. The regression model output would be a confidence score that indicates whether maintenance is required for the equipment or not. For behavior analysis, the model would classify the driver as aggressive, neutral, etc.
- The next step is publishing the machine learning model as a web service so that the application can consume it through an API call.
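The data preparation and model building steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how Azure ML implements these steps: the trip records, the two features (average speed and hard-braking count) and all function names are hypothetical, and a simple nearest-centroid classifier stands in for the multi-class classification algorithms you would pick in Azure ML.

```python
import random
from statistics import mean

# Hypothetical labeled trips: (avg_speed_kmh, hard_brakes_per_100km), label.
# In the real solution these would come from the prepared vehicle data.
TRIPS = [
    ((62, 1), "neutral"), ((58, 2), "neutral"), ((65, 0), "neutral"),
    ((60, 1), "neutral"), ((55, 2), "neutral"), ((63, 1), "neutral"),
    ((95, 9), "aggressive"), ((102, 11), "aggressive"), ((98, 8), "aggressive"),
    ((105, 12), "aggressive"), ((92, 10), "aggressive"), ((99, 9), "aggressive"),
]

def train_test_split(records, test_ratio=0.25, seed=42):
    """Shuffle the records and hold back a fraction of them for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def fit_min_max(rows):
    """Learn the per-feature minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def scale(row, lows, highs):
    """Normalize one row using the ranges learned from the training set."""
    return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                 for v, lo, hi in zip(row, lows, highs))

def train_centroids(rows, labels):
    """Train a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = tuple(mean(col) for col in zip(*members))
    return centroids

def predict(centroids, row):
    """Return the class whose centroid is closest to the given row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Prepare the data: split first, then normalize using training ranges only.
train, test = train_test_split(TRIPS)
lows, highs = fit_min_max([features for features, _ in train])
train_rows = [scale(features, lows, highs) for features, _ in train]
train_labels = [label for _, label in train]

# Build the model and evaluate it on the held-back test set.
model = train_centroids(train_rows, train_labels)
correct = sum(predict(model, scale(features, lows, highs)) == label
              for features, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The normalization ranges are learned from the training set only and then applied to the test set, so the test records stay unseen until evaluation. In practice you would repeat the split, feature selection and training with different algorithms and compare their scores, which is the iterative loop described above.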
As mentioned earlier, the real challenge is building the machine learning model and training it to predict a reasonable outcome. Getting reasonable predictions requires significant effort and repeated training over a period of time. Azure Stream Analytics lets you combine data from multiple streams, so you could combine real-time and historical data to arrive at an outcome. For instance, you can combine streams to detect an anomaly in real time using machine learning models.
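To make the idea of combining real-time and historical data concrete, here is a minimal sketch in plain Python. It is not Azure Stream Analytics code (which uses its own SQL-like query language); the vehicle IDs, the temperature readings and the z-score threshold are all hypothetical, and a simple statistical baseline stands in for a trained machine learning model.

```python
from statistics import mean, stdev

# Hypothetical historical engine temperatures (degrees Celsius) per vehicle.
HISTORY = {
    "car-1": [88, 90, 91, 89, 90, 92, 88, 90],
    "car-2": [95, 96, 94, 97, 95, 96, 95, 94],
}

# Hypothetical live readings arriving as (vehicle_id, temperature) events.
LIVE = [("car-1", 91), ("car-2", 96), ("car-1", 118), ("car-2", 95)]

def build_baselines(history):
    """Summarize each vehicle's history as (mean, standard deviation)."""
    return {vid: (mean(vals), stdev(vals)) for vid, vals in history.items()}

def detect_anomalies(stream, baselines, threshold=3.0):
    """Join each live reading with its vehicle's baseline and yield outliers."""
    for vehicle_id, value in stream:
        if vehicle_id not in baselines:
            continue  # no history for this vehicle, so nothing to compare with
        average, deviation = baselines[vehicle_id]
        z_score = (value - average) / deviation
        if abs(z_score) > threshold:
            yield vehicle_id, value, z_score

anomalies = list(detect_anomalies(LIVE, build_baselines(HISTORY)))
for vehicle_id, value, z_score in anomalies:
    print(f"{vehicle_id}: reading {value} deviates from history (z = {z_score:.1f})")
```

With this sample data, only the 118-degree reading from car-1 is flagged, because it lies far outside that vehicle's historical range. A production pipeline would replace the z-score check with a call to the published model.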
Currently, there are no pre-built machine learning models available for industries, and hence an offline process is required to build the model iteratively. In the future, we envision that machine learning models will be available as services for each industry, such as predictive maintenance for vehicles or specific machinery types. All you would then have to do is provide the data to the machine learning models for prediction. We discussed this concept in the earlier chapter, where we talked about the Solution Template in the first section.