This blog presents a predictive analysis use case built with Azure ML Studio. For this experiment we used the Blood donation dataset. The goal is to predict whether a person donated blood during the target period, where 1 indicates a donor and 0 a non-donor. Because the label takes only two values, we treat this as a binary (two-class) classification problem, and logistic regression is a natural first model to try.
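For reference, the standard two-class logistic regression model maps a feature vector x (the donor attributes selected as features) to the probability that the person is a donor (y = 1) through the sigmoid function, and it is fitted by minimizing the log loss over the n training rows. The 0.5 decision threshold below is the conventional default and is an assumption here, not a setting taken from this post; regularized variants add a penalty on w to the loss.

```latex
\begin{aligned}
p(\mathbf{x}) &= P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}} \\
\hat{y} &= \begin{cases} 1 & \text{if } p(\mathbf{x}) \ge 0.5 \\ 0 & \text{otherwise} \end{cases} \\
\mathcal{L}(\mathbf{w}, b) &= -\sum_{i=1}^{n} \Big[ y_i \log p(\mathbf{x}_i) + (1 - y_i) \log\big(1 - p(\mathbf{x}_i)\big) \Big]
\end{aligned}
```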
The sample dataset is shown in the following figure. For each donor it records the months since the last donation, the total number of donations (frequency), the total amount of blood donated, and the months since the first donation, along with the binary class label.
Azure ML Studio Workflow:
The workflow built in Azure ML Studio is shown below.
The workflow highlights the following steps:
• Step 1: Load dataset
◦ First, we loaded the raw blood donation data from the palette of datasets and modules: type Blood in the Search box at the top of the palette, then drag the Blood donation dataset and drop it onto the workflow (i.e. the experiment canvas). The detailed view of the dataset is shown below:
• Step 2: Pre-process the dataset
◦ Next, pre-process the dataset by handling missing values in the blood donation data. Search for Clean Missing Data in the Search box at the top of the palette, then drag the Clean Missing Data module and drop it onto the workflow. The missing-value configuration is shown below:
• Step 3: Select the features
◦ Now, a user needs to select the features (data columns) used to predict the blood donation class. Search for Select Columns in Dataset in the Search box at the top of the palette, then drag the module and drop it onto the workflow.
◦ Split Data: Split the data so that the model is trained on one subset and evaluated on rows it has not seen. Search for Split Data in the Search box at the top of the palette, then drag the module and drop it onto the workflow. As shown in the following figure, 75% of the rows go to the training set and the remaining 25% go to the test set.
• Step 4: Choose and apply a learning algorithm
◦ Train Model: To train the model, a user needs to select the target column, which in this case is the class column (whether or not the person is a donor). Search for Train Model in the Search box at the top of the palette, then drag the module and drop it onto the workflow. As shown in the following figure, select the class column as the target variable.
◦ Logistic Regression: Since this is a binary classification problem, we use logistic regression. Search for Two-Class Logistic Regression in the Search box at the top of the palette, then drag the module and drop it onto the workflow. Set the configuration parameters for the logistic regression as shown in the following figure.
• Step 5: Prediction and validation
◦ A user can validate the logistic regression model using the Score Model and Evaluate Model modules. The Score Model module applies the trained model to the test set and appends the predicted label, along with its probability, to each row, as shown in the following figure.
◦ A user can then measure the performance of the predictive model using the Evaluate Model module, which reports results such as ROC, precision/recall, and lift curves, as shown below:
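Outside the Studio, the same sequence of steps can be sketched in plain Python using only the standard library. This is an illustration under stated assumptions, not what Azure ML Studio runs internally. The rows are synthetic stand-ins generated in code (not the real Blood donation dataset). The column names Recency, Frequency, Monetary, Time and Class are assumed. Missing values are replaced with the column mean, which is an assumed cleaning mode. The model is fitted with plain batch gradient descent on the L2-regularized log loss, a swapped-in optimizer rather than the Studio module's own. All function names (make_synthetic_rows, clean_missing, split_rows, and so on) are hypothetical.

```python
import math
import random

# Assumed column names (modeled on the UCI blood-transfusion data), not read from Azure.
FEATURES = ["Recency", "Frequency", "Monetary", "Time"]
TARGET = "Class"

def sigmoid(z):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def make_synthetic_rows(n, seed=0):
    """Synthetic stand-in for the Blood donation data (NOT the real dataset)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        recency, frequency = rng.randint(0, 24), rng.randint(1, 20)
        # Assumed generating rule: recent, frequent donors are more likely to donate.
        label = 1 if rng.random() < sigmoid(1.0 - 0.15 * recency + 0.1 * frequency) else 0
        rows.append({"Recency": recency, "Frequency": frequency,
                     "Monetary": 250 * frequency,  # assumed 250 c.c. per donation
                     "Time": recency + rng.randint(0, 60), "Class": label})
    for r in rng.sample(rows, 5):  # blank a few values so the cleaning step has work
        r["Recency"] = None
    return rows

def clean_missing(rows, columns):
    """Step 2 (Clean Missing Data): replace None with the column mean (assumed mode)."""
    for col in columns:
        present = [r[col] for r in rows if r[col] is not None]
        mean = sum(present) / len(present)
        for r in rows:
            if r[col] is None:
                r[col] = mean
    return rows

def split_rows(rows, fraction=0.75, seed=42):
    """Step 3 (Split Data): random split, 75% training rows and 25% test rows."""
    shuffled = random.Random(seed).sample(rows, len(rows))
    cut = int(len(rows) * fraction)
    return shuffled[:cut], shuffled[cut:]

def standardize(train_X, test_X):
    """Z-score features using training statistics (not a module in the workflow)."""
    cols = list(zip(*train_X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(X):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scale(train_X), scale(test_X)

def train_logistic(X, y, lr=0.5, epochs=300, l2=0.01):
    """Step 4: batch gradient descent on the L2-regularized mean log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def score(X, w, b, threshold=0.5):
    """Step 5 (Score Model): predicted label and probability for each row."""
    probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
    return [1 if p >= threshold else 0 for p in probs], probs

def evaluate(y_true, labels, probs):
    """Step 5 (Evaluate Model): accuracy, precision, recall and ROC AUC."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, labels))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, labels))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, labels))
    pos = [s for s, t in zip(probs, y_true) if t == 1]
    neg = [s for s, t in zip(probs, y_true) if t == 0]
    # AUC: chance a random positive outscores a random negative (ties count half).
    pairs = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return {"accuracy": sum(t == p for t, p in zip(y_true, labels)) / len(y_true),
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "auc": pairs / (len(pos) * len(neg)) if pos and neg else float("nan")}

# Wire the steps together in the same order as the experiment canvas.
rows = clean_missing(make_synthetic_rows(400), FEATURES)
train, test = split_rows(rows)
X_train, X_test = standardize([[r[c] for c in FEATURES] for r in train],
                              [[r[c] for c in FEATURES] for r in test])
y_train, y_test = [r[TARGET] for r in train], [r[TARGET] for r in test]
w, b = train_logistic(X_train, y_train)
labels, probs = score(X_test, w, b)
metrics = evaluate(y_test, labels, probs)
print(metrics)
```

Each function loosely corresponds to one module on the canvas: clean_missing to Clean Missing Data, the FEATURES list to Select Columns in Dataset, split_rows to Split Data, train_logistic to Two-Class Logistic Regression plus Train Model, score to Score Model, and evaluate to Evaluate Model. The standardization step has no counterpart in the workflow; it is added only so that plain gradient descent converges on features with very different ranges.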