Is 4410 AzureML Regression Predict Auto Price-1
In this lab, you will learn how to use Azure Machine Learning Studio to develop a simple predictive
analytics model. To design an experiment, you assemble a set of modules that create, train, test, and
evaluate the model. You might also use additional modules to preprocess the data, perform feature
selection and/or reduction, split the data into training and test sets, and evaluate or cross-validate
the model. The following five basic steps can be used as a guide for creating an experiment.
Create a Model
Step 1: Get data
Step 2: Preprocess data
Step 3: Define features
Train the Model
Step 4: Choose and apply a learning algorithm
Test the Model
Step 5: Predict over new data
1. To remove the normalized-losses column, expand Data Transformation and then Manipulation in
the module palette, drag the Select Columns in Dataset module onto the canvas, and connect
it to the output port of the Automobile price data (Raw) dataset. This module lets you
select which columns of data to include in or exclude from the model.
2. Select the Select Columns in Dataset module and click Launch column selector in the
Properties panel (the right panel).
a. In the Select columns window, click WITH RULES, then choose All columns in the
Begin With drop-down. This directs Select Columns in Dataset to pass all columns
through (except for the ones you are about to exclude).
b. In the next row, select Exclude and column names, and then click inside the text box.
A list of columns is displayed; select normalized-losses and it will be added to the text
box. This is shown in the figure below.
c. Click the check mark in the bottom-right corner to close the column selector.
3. Next, we will deal with missing data in our dataset. Drag the Clean Missing Data module (type clean
…in the search box) onto the experiment canvas and connect it to the Select Columns in Dataset
module. Use the default properties, which replace each missing value with 0. See Figure
for details.
4. Now click RUN.
5. When the experiment completes successfully, each of the modules will have a green check
mark indicating its successful completion.
6. To verify that there are no more missing values, visualize the output port of the Automobile
price data module and the output port of the Clean Missing Data module, then compare the
“price” column and/or the “bore” column in the two views.
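The two preprocessing modules above can be mimicked in a few lines of plain Python. This is only a sketch to show what the modules do to the data, not how Azure ML Studio implements them: the three sample rows are made up, missing values are represented as None, and the helper names select_columns, clean_missing and count_missing are hypothetical.

```python
# Made-up sample rows standing in for the Automobile price data (Raw) dataset.
# None marks a missing value (an assumption for this sketch).
rows = [
    {"make": "audi", "normalized-losses": 164, "bore": 3.19, "price": 13950},
    {"make": "alfa-romero", "normalized-losses": None, "bore": 3.47, "price": 13495},
    {"make": "isuzu", "normalized-losses": None, "bore": None, "price": None},
]


def select_columns(rows, exclude):
    """Begin with all columns, then drop the excluded column names."""
    return [{k: v for k, v in row.items() if k not in exclude} for row in rows]


def clean_missing(rows, replacement=0):
    """Replace every missing (None) value with a fixed replacement value."""
    return [
        {k: (replacement if v is None else v) for k, v in row.items()}
        for row in rows
    ]


def count_missing(rows):
    """Count the missing values across all rows and columns."""
    return sum(v is None for row in rows for v in row.values())


selected = select_columns(rows, exclude={"normalized-losses"})
cleaned = clean_missing(selected)

print(count_missing(selected))  # missing values before cleaning
print(count_missing(cleaned))   # missing values after cleaning
```

Comparing the two counts plays the same role as visualizing the output ports before and after the Clean Missing Data module.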
4. Click the √ at the bottom right of the Select columns pop-up window (see the screen
capture above).
After you run the experiment, visualize the output to confirm the features you have selected for
predicting the price of the used cars. Notice that you have 8 columns (7 features and the label, which
is what you are predicting).
This is the output of the Split Data module: the 20% of the dataset held back for testing the model. You
will need to use an example from this set to test your web service.
Step 4: Choosing and Applying Machine Learning Algorithms
When constructing a predictive model, you first need to train the model, and then validate that the
model is effective. In this experiment, you will build a regression model, train the regression model and
use it to predict the price of an automobile. Specifically, you will train a simple linear regression
model. After the model has been trained, you will validate it.
We will split the dataset into a training set and a testing set. The following steps accomplish
this.
1. Split the data into training and testing sets: Select and drag the Split Data module to the
experiment canvas and connect it to the output of the last Select Columns in Dataset module. In
the Properties panel of the Split Data module, set the Splitting mode to Split Rows, Fraction of
rows in the first output dataset to 0.8, and Random seed to 0. You will
use this 80% (164 rows) of the data to train the model and hold back 20% for testing.
2. Run the experiment. This allows the Select Columns in Dataset and Split Data modules to pass
along column definitions to the modules you will be adding next. Confirm that you have the
correct number of rows by visualizing the module after you run it. Visualize the left (.8) and
right (.2) output ports to confirm that you have 164 and 41 rows respectively.
3. To select the learning algorithm, expand the Machine Learning category in the module palette
to the left of the canvas and then expand Initialize Model. This displays several categories of
modules that can be used to initialize a learning algorithm.
4. For this experiment, select the Linear Regression module under the Regression category and
drag it to the experiment canvas. (Alternatively, type regression in the search box, then drag
the Linear Regression module onto the Canvas area).
5. Similarly, type Train model in the search box and drag the Train Model module to the experiment
canvas. Connect the left output port of the Split Data module to the right input of the Train
Model module. Then connect the output of the Linear Regression module to the left input of the
Train Model module. Your canvas should look like the following diagram.
6. Next, tell the Train Model module what you want it to predict. Select the Train Model module,
click Launch column selector in the Properties panel, and select the price column (see below).
This is the label (not a feature) that your model is going to predict.
Figure shows this target selection.
7. Run the experiment. The result is a trained regression model that can be used to score new
samples to make predictions.
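To make steps 1 through 7 more concrete, here is a rough Python equivalent of splitting, training and scoring. It is a sketch under several assumptions: the 205 rows are synthetic (price is generated from a single made-up feature, engine-size, plus noise), the model uses one feature instead of seven, and the helpers split_rows and fit_line are hypothetical names. A seed of 0 here will not reproduce the same rows that the Split Data module selects.

```python
import random


def split_rows(rows, fraction=0.8, seed=0):
    """Shuffle rows with a fixed seed and cut them into two parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic stand-in for the dataset: 205 rows, one feature, one label.
rng = random.Random(1)
data = []
for _ in range(205):
    engine_size = rng.uniform(60, 330)
    price = 150.0 * engine_size - 5000.0 + rng.gauss(0, 1500)
    data.append({"engine-size": engine_size, "price": price})

# Steps 1-2: split 80/20 and confirm the row counts.
train, test = split_rows(data, fraction=0.8, seed=0)
print(len(train), len(test))  # 164 41

# Steps 3-7: train a linear regression on the training rows only.
intercept, slope = fit_line(
    [r["engine-size"] for r in train],
    [r["price"] for r in train],
)

# Score the held-back rows: pairs of (actual price, predicted price).
scored = [(r["price"], intercept + slope * r["engine-size"]) for r in test]
```

The key point is that the model only ever sees the 164 training rows; the 41 held-back rows are used afterwards to check how well it predicts prices it has not seen.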
The metrics returned for regression models are generally designed to estimate the amount of error. A
model is considered to fit the data well if the difference between observed and predicted values is
small. However, looking at the pattern of the residuals (the difference between any one predicted point
and its corresponding actual value) can tell you a lot about potential bias in the model.
The following metrics are reported for evaluating regression models. When you compare models,
they are ranked by the metric you select for evaluation.
The term "error" represents the difference between the predicted value (Score Label) and the true
value (price). The absolute value or the square of this difference is usually computed to capture the
total magnitude of error across all instances, as the difference between the predicted and true value
could be negative in some cases. The error metrics measure the predictive performance of a regression
model in terms of the mean deviation of its predictions from the true values. Lower error values mean
the model is more accurate in making predictions. An overall error metric of zero means that the model
fits the data perfectly.
The coefficient of determination, which is also known as R squared, is also a standard way of
measuring how well the model fits the data. It can be interpreted as the proportion of variation
explained by the model. A higher proportion is better in this case, where 1 indicates a perfect fit.
To summarize:
Mean absolute error (MAE) measures how close the predictions are to the actual outcomes;
thus, a lower score is better.
Root mean squared error (RMSE) creates a single value that summarizes the error in the
model. By squaring the difference, the metric disregards the difference between over-prediction
and under-prediction.
Relative absolute error (RAE) is the relative absolute difference between expected and actual
values; relative because the total absolute error is divided by the total absolute deviation of
the actual values from their arithmetic mean.
Relative squared error (RSE) similarly normalizes the total squared error of the predicted
values by dividing by the total squared deviation of the actual values from their mean.
Coefficient of determination, often referred to as R2, represents the predictive power of the
model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means
there is a perfect fit. However, caution should be used in interpreting R2 values, as low values can
be entirely normal and high values can be suspect.
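The five metrics summarized above can be computed by hand. The sketch below uses the standard textbook definitions, which we assume correspond to the values the evaluation reports; the function name regression_metrics and the four price pairs are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, RAE, RSE and R2 from paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n

    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]

    # Deviation of the actual values from their own mean (the baseline).
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)

    rse = sum(sq_err) / sq_dev
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RAE": sum(abs_err) / abs_dev,
        "RSE": rse,
        "R2": 1 - rse,
    }


# Made-up actual prices and predicted prices (Scored Labels).
actual = [10000.0, 15000.0, 20000.0, 25000.0]
predicted = [11000.0, 14000.0, 21000.0, 23000.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these four pairs, MAE is 1250 (the average miss in dollars), RAE is 0.25 and R2 is 0.944. Note that RMSE (about 1323) is larger than MAE because squaring gives the single 2000-dollar miss more weight.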
The next step is to set up a web service. The web service lets you enter information to predict the
price of a used car in the context of this case.
The following steps accomplish this:
1. Click SET UP WEB SERVICE (if Predictive Web Service is greyed out, you may have
to save, exit AzureML, and then reopen the experiment).
Next you will have the following (modules were rearranged to provide clarity):
In the Predictive experiment connect the Web service input as follows:
Next, remove “price” from the second Select Columns in Dataset module. Price is what we want to
predict when we provide the features for our model.
Click on the Select Columns in Dataset module and make the changes in the Properties panel:
Next, click RUN, then click Deploy web service (next to RUN). The following appears.
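Once the web service is deployed, a client sends it the feature values (without price) and receives a predicted price back. The sketch below only builds the request body and does not call any service. It rests on assumptions you should check against the API help page of your own web service: the JSON layout (Inputs, input1, ColumnNames, Values, GlobalParameters) follows the request-response sample format as we recall it, the seven column names are guesses at the features selected in this lab, and the sample row is made up rather than taken from your test set. The helper build_request is a hypothetical name.

```python
import json

# Assumed feature columns; replace with the seven features in your experiment.
FEATURE_COLUMNS = [
    "make", "body-style", "wheel-base", "engine-size",
    "horsepower", "peak-rpm", "highway-mpg",
]


def build_request(feature_row):
    """Build a request body holding one row of features (no price column)."""
    missing = [c for c in FEATURE_COLUMNS if c not in feature_row]
    if missing:
        raise ValueError(f"missing feature values: {missing}")
    values = [str(feature_row[c]) for c in FEATURE_COLUMNS]
    return {
        # "input1" is assumed to be the name of the web service input.
        "Inputs": {
            "input1": {"ColumnNames": FEATURE_COLUMNS, "Values": [values]},
        },
        "GlobalParameters": {},
    }


# Made-up example row; in the lab, use a row from the 20% test set instead.
sample = {
    "make": "toyota", "body-style": "sedan", "wheel-base": 95.7,
    "engine-size": 98, "horsepower": 70, "peak-rpm": 4800, "highway-mpg": 38,
}

body = json.dumps(build_request(sample), indent=2)
print(body)
```

Because price was removed from the second Select Columns in Dataset module, the request carries only the features; the predicted price comes back in the response.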
Deliverables:
Upload a Lab Report of your experiment. You must follow the report format provided in this
assignment.