Is 4410 AzureML Regression Predict Auto Price-1

This document provides instructions for using Azure Machine Learning Studio to develop a predictive model to predict automobile prices. The key steps are: 1. Get and preprocess the automobile price data, which includes removing missing values and selecting relevant features like make, engine size, and mileage. 2. Split the data into 80% for training and 20% for testing the model. 3. Choose a linear regression algorithm and train it on the selected features from the training data to develop a model that predicts price. 4. Use the trained model to make price predictions on the held-out 20% test data and evaluate the model's performance on unseen data.

Uploaded by

Truc Nguyen

Assignment 4: Using AzureML to Predict the Price of a Used Automobile

In this lab, you will learn how to use Azure Machine Learning Studio to develop a simple predictive
analytics model. To design an experiment, you assemble a set of components that are used to create,
train, test, and evaluate the model. You might also use additional modules to preprocess the
data, perform feature selection and/or reduction, split the data into training and test sets, and evaluate
or cross-validate the model. The following five basic steps can be used as a guide for creating an
experiment.

Create a Model
Step 1: Get data
Step 2: Preprocess data
Step 3: Define features
Train the Model
Step 4: Choose and apply a learning algorithm
Test the Model
Step 5: Predict over new data

Step 1: Getting the Data


Azure Machine Learning Studio provides a number of sample datasets. In addition, you can import
data from many different sources. In this example, you will use the included sample dataset called
Automobile price data (Raw), which contains automobile price data.
1. To start a new experiment, click +NEW at the bottom of the Machine Learning Studio window
and select EXPERIMENT. This gives you the option to start with a blank experiment, follow a
tutorial, or choose from the list of sample experiments from the Gallery you saw in the previous
section.
2. Select Blank Experiment and rename the experiment IS 4410 A4 YourLastNameFirstName
3. To the left of the Machine Learning Studio is a list of experiment items. Click Saved Datasets,
and type “automobile” in the search box. Find Automobile price data (Raw).
4. Drag the dataset into the experiment. You can also double-click the dataset to include it in the
Experiment.

Step 2: Preprocessing the Data


Before you start designing the experiment, it is important to preprocess the dataset. In most cases, the
raw data needs to be preprocessed before it can be used as input to train a predictive analytics model.
From the earlier exploration, you may have noticed that there are missing values in the data. These
missing values need to be cleaned before the data can be analyzed. For this experiment, you will
substitute the missing values with a designated value. In addition, the normalized-losses column will be
removed because it contains too many missing values. To view the data, right-click the output
port and select Visualize.

1. To remove the normalized-losses column, expand Data Transformation, then Manipulation,
in the module palette. Drag the Select Columns in Dataset module onto the canvas and connect
it to the output port of the Automobile price data (Raw) dataset. This module allows you to
select which columns of data you want to include or exclude in the model.
2. Select the Select Columns in Dataset module and click Launch column selector in the
Properties panel (the right panel).
a. In the Select columns window, click WITH RULES, then click All columns in the
filter drop-down called Begin With. This directs Select Columns in Dataset to pass all columns
through (except for the ones you are about to exclude).
b. In the next row, select Exclude and column names, and then click inside the text box.
A list of columns is displayed; select normalized-losses and it will be added to the text
box. This is shown in the figure below.
c. Click the check mark in the bottom-right corner to close the column selector.

3. Next, deal with the missing data in the dataset. Drag the Clean Missing Data module (type clean
… in the search box) to the experiment canvas and connect it to the Select Columns in Dataset
module. Use the default properties, which replace each missing value with 0. See Figure
for details.
4. Now click RUN.
5. When the experiment completes successfully, each of the modules will have a green check
mark indicating its successful completion.
6. To verify that there are no more missing values, visualize the output ports of the
Automobile price data (Raw) dataset and the Clean Missing Data module, and compare the
“price” and/or “bore” columns.
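
The two preprocessing operations above (excluding one column, then substituting 0 for missing values) can be sketched in plain Python. This is only an illustration of the idea, not what the Studio modules run internally: the three rows are made-up values, and using None to mark a missing entry is an assumption about the encoding.

```python
# Hypothetical miniature of the dataset; the values are illustrative only.
# None stands in for a missing entry (an assumption about the encoding).
rows = [
    {"make": "audi", "normalized-losses": 164, "bore": 3.19, "price": 13950},
    {"make": "mazda", "normalized-losses": None, "bore": None, "price": 10945},
    {"make": "bmw", "normalized-losses": None, "bore": 3.5, "price": None},
]


def drop_column(rows, name):
    """Return copies of the rows without the named column."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def replace_missing(rows, value=0):
    """Return copies of the rows with every missing entry replaced by `value`."""
    return [{k: (value if v is None else v) for k, v in row.items()} for row in rows]


def count_missing(rows):
    """Count the missing entries across all rows and columns."""
    return sum(v is None for row in rows for v in row.values())


# Mirror the order used in the experiment: exclude the column, then clean.
selected = drop_column(rows, "normalized-losses")
cleaned = replace_missing(selected, 0)

print("missing before cleaning:", count_missing(selected))
print("missing after cleaning:", count_missing(cleaned))
```

Comparing the counts before and after cleaning is the scripted equivalent of visualizing the two output ports side by side.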

Step 3: Defining the Features


In machine learning, features are individual measurable properties derived from the raw data that help the
algorithm learn the task at hand. Understanding the role played by each feature is important.
For example, some features are better than others at predicting the target (in this case, “price”). In
addition, some of the 24 feature columns (make, fuel-type, body-style, etc.) can be strongly
correlated with one another (e.g. city-mpg and highway-mpg). Adding highly correlated features as
inputs might not be useful, since they contain similar information.

For this exercise, you will build a
predictive model that uses a subset of the features of the Automobile price data (Raw) dataset to
predict the price of other automobiles. Each row represents an automobile, and each column is a feature of
that automobile. It is important to identify a good set of features for the
predictive model; often, this requires experimentation and knowledge of the problem domain. For
illustration purposes, you will use the Select Columns in Dataset module from the Data Transformation,
Manipulation group to select the following features: make, body-style, wheel-base, engine-size,
horsepower, peak-rpm, highway-mpg, and price (the label, i.e. what you want to predict).
1. Drag a second Select Columns in Dataset module to the experiment canvas. Connect it to the Clean
Missing Data module.
2. Click the Select Columns in Dataset module to select it, then click Launch column
selector in the Properties panel.
3. Select WITH RULES in the Select columns window, select No columns for Begin With,
then select Include and column names in the filter row. From the drop-down list, select the
following column names: make, body-style, wheel-base, engine-size, horsepower,
peak-rpm, highway-mpg, and price. This directs the module to pass through only these
columns.

4. Click the √ at the bottom right of the Select columns pop-up window (see the screen
capture above).
Run the experiment, then visualize the output to confirm the features you selected for predicting the
price of used cars. Notice that you have 8 columns (7 features and the label, which is what
you are predicting).
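
As a rough sketch of why correlated features carry similar information, the snippet below keeps only the eight chosen columns and computes the Pearson correlation between city-mpg and highway-mpg. The mileage numbers are invented for illustration and are not taken from the dataset; the helper names select_columns and pearson are hypothetical.

```python
import math

# The seven features and the label chosen in this step.
FEATURES = ["make", "body-style", "wheel-base", "engine-size",
            "horsepower", "peak-rpm", "highway-mpg"]
LABEL = "price"


def select_columns(row, names):
    """Keep only the named columns of one row (a dict)."""
    return {name: row[name] for name in names}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up mileage figures for six cars (illustrative only).
city_mpg = [21, 24, 19, 31, 27, 17]
highway_mpg = [27, 30, 25, 38, 33, 22]
r = pearson(city_mpg, highway_mpg)
print(f"correlation(city-mpg, highway-mpg) = {r:.3f}")

# A made-up row with one extra column that the selection drops.
row = {"make": "audi", "body-style": "sedan", "wheel-base": 99.8,
       "engine-size": 109, "horsepower": 102, "peak-rpm": 5500,
       "city-mpg": 24, "highway-mpg": 30, "price": 13950}
kept = select_columns(row, FEATURES + [LABEL])
print(len(kept), "columns kept")
```

A coefficient close to 1 means one column is nearly a linear function of the other, which is why only highway-mpg is kept in the feature list.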

This is the output of the Split Data module: the 20% of the dataset held back for testing the model. You
will need to use an example from this set to test your web service.

Step 4: Choosing and Applying Machine Learning Algorithms

When constructing a predictive model, you first need to train the model and then validate that the
model is effective. In this experiment, you will build a regression model, train it, and
use it to predict the price of an automobile. Specifically, you will train a simple linear regression
model. After the model has been trained, you will validate it.
First, split the dataset into a training set and a testing set. The following steps accomplish
this.

1. Split the data into training and testing sets: Select and drag the Split Data module to the
experiment canvas and connect it to the output of the last Select Columns in Dataset module. In the
Properties panel of the Split Data module, set the Splitting mode to Split Rows, Fraction of
rows in the first output dataset to 0.8, and Random seed to 0. You will
use this 80% (164 rows) of the data to train the model and hold back 20% for testing.
2. Run the experiment. This allows the Select Columns in Dataset and Split Data modules to pass along column
definitions to the modules you will be adding next. You should confirm that you have the
correct number of rows by visualizing the module after you run it. Visualize the left (.8) and
right (.2) output ports to confirm that you have 164 and 41 rows respectively.
3. To select the learning algorithm, expand the Machine Learning category in the module palette
to the left of the canvas and then expand Initialize Model. This displays several categories of
modules that can be used to initialize a learning algorithm.
4. For this experiment, select the Linear Regression module under the Regression category and
drag it to the experiment canvas. (Alternatively, type regression in the search box, then drag
the Linear Regression module onto the Canvas area).
5. Similarly, type Train model in the search box and drag the Train Model module to the experiment
canvas. Connect the left output port of the Split Data module to the right input of the Train
Model module, then connect the output of the Linear Regression module to the left input of the
Train Model module. Your canvas should look like the following diagram.
6. Next, tell the Train Model module what you want it to predict. Select the Train Model
module, click Launch column selector in the Properties panel,
and select the price column (see below). This is the label that your model is going to predict.
Figure shows this target selection.

7. Run the experiment. The result is a trained regression model that can be used to score new
samples to make predictions.
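
The split-then-train sequence above can be sketched outside the Studio as follows. Several things here are assumptions made for illustration: the 205 rows are synthetic (generated from a made-up price formula), only a single numeric feature is used, the line is fitted with the closed-form least-squares solution, and Python's own random generator will not reproduce the same row assignment as the Split Data module.

```python
import random


def split_rows(rows, fraction, seed):
    """Shuffle a copy of the rows and split it at the given fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic stand-in for the 205-row dataset: (engine-size, price) pairs
# drawn from a made-up linear relationship plus noise.
gen = random.Random(1)
rows = []
for _ in range(205):
    engine_size = gen.randint(60, 330)
    price = 100.0 * engine_size - 2000.0 + gen.gauss(0, 500)
    rows.append((engine_size, price))

# 80% for training, 20% held back for testing.
train, test = split_rows(rows, 0.8, seed=0)
print(len(train), "training rows,", len(test), "test rows")

# Train on the training rows only.
intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
print(f"price is roughly {intercept:.0f} + {slope:.1f} * engine-size")
```

The test rows are deliberately left untouched during fitting, so they can later show how the model behaves on data it has not seen.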

Step 5: Predicting Over New Data


Now that you’ve trained the model, you can use it to score the other 20% of your data and see how well
your model predicts on unseen data.
1. Drag the Score Model module to the experiment canvas and connect the left input port to the
output of the Train Model module, and the right input port to the test data output (right port, the
20% hold back) of the Split Data module. See Figure for details.
2. Run the experiment and view the output from the Score Model module (by right-clicking the
output port and selecting Visualize). The output will show the predicted values (Scored Labels)
for price along with the known values (price) from the test data. Each Scored Labels value is the price
that the model predicted for a car in the 20% of the data that was held back to test the
accuracy of the model.
3. Finally, to test the quality of the results, select and drag the Evaluate Model module to the
experiment canvas, and connect the left input port to the output of the Score Model module
(there are two input ports because the Evaluate Model module can be used to compare two
different models).
4. Run the experiment and view the output from the Evaluate Model module (right-click the
output port and select Visualize). The following statistics are shown for your model:
a. Mean Absolute Error (MAE): The Mean (average) of absolute errors (an error is the
difference between the predicted value and the actual value).
b. Root Mean Squared Error (RMSE): The square root of the average of squared errors.
c. Relative Absolute Error: The average of absolute errors relative to the absolute difference
between actual values and the average of all actual values.
d. Relative Squared Error: The average of squared errors relative to the squared difference
between the actual values and the average of all actual values.
e. Coefficient of Determination: Also known as the R squared value, this is a statistical
metric indicating how well a model fits the data.
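
The five definitions above can be written out as formulas. The notation is an assumption introduced here, not taken from the Studio output: y_i is the actual price of test row i, \hat{y}_i is its Scored Labels value, \bar{y} is the mean of the actual prices, and n is the number of test rows.

```latex
\begin{aligned}
\text{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\lvert \hat{y}_i - y_i \rvert \\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2} \\
\text{RAE}  &= \frac{\sum_{i=1}^{n}\lvert \hat{y}_i - y_i \rvert}{\sum_{i=1}^{n}\lvert y_i - \bar{y} \rvert} \\
\text{RSE}  &= \frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \\
R^2         &= 1 - \text{RSE}
\end{aligned}
```

MAE and RMSE are expressed in the units of the label (here, price), while RAE, RSE, and R² are unitless ratios.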
Metrics for regression models

The metrics returned for regression models are generally designed to estimate the amount of error. A
model is considered to fit the data well if the difference between observed and predicted values is
small. However, looking at the pattern of the residuals (the difference between any one predicted point
and its corresponding actual value) can tell you a lot about potential bias in the model.

The following metrics are reported for evaluating regression models. When you compare models,
they are ranked by the metric you select for evaluation.

The term "error" represents the difference between the predicted value (Scored Labels) and the true
value (price). The absolute value or the square of this difference is usually computed to capture the
total magnitude of error across all instances, as the difference between the predicted and true value
could be negative in some cases. The error metrics measure the predictive performance of a regression
model in terms of the mean deviation of its predictions from the true values. Lower error values mean
the model is more accurate in making predictions. An overall error metric of zero means that the model
fits the data perfectly.

The coefficient of determination, which is also known as R squared, is also a standard way of
measuring how well the model fits the data. It can be interpreted as the proportion of variation
explained by the model. A higher proportion is better in this case, where 1 indicates a perfect fit.

To summarize:

• Mean absolute error (MAE) measures how close the predictions are to the actual outcomes;
thus, a lower score is better.
• Root mean squared error (RMSE) creates a single value that summarizes the error in the
model. Because the differences are squared, the metric does not distinguish between over-prediction
and under-prediction.
• Relative absolute error (RAE) is the total absolute difference between predicted and actual
values; it is relative because that total is divided by the total absolute difference between the
actual values and their mean.
• Relative squared error (RSE) similarly normalizes the total squared error of the predicted
values by dividing by the total squared difference between the actual values and their mean.
• Coefficient of determination, often referred to as R², represents the predictive power of the
model, usually as a value between 0 and 1. Zero means the model explains none of the variation
(it does no better than always predicting the mean); 1 means
there is a perfect fit. However, caution should be used in interpreting R² values, as low values can
be entirely normal and high values can be suspect.
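
To make the summary concrete, here is a minimal sketch that computes all five metrics from a list of actual prices and a list of predicted prices, following the definitions given in this section. The four price pairs are made-up numbers, and the function name regression_metrics is hypothetical; the values shown by the Evaluate Model module for your experiment will differ.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, RAE, RSE and R squared for paired value lists."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Total absolute and squared prediction errors.
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Total absolute and squared deviations of the actual values from their mean.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sq_err / sq_dev
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,
        "RSE": rse,
        "R2": 1 - rse,
    }


# Made-up actual prices and Scored Labels for four test rows.
price = [10000, 12000, 15000, 20000]
scored = [11000, 11500, 16000, 19000]

metrics = regression_metrics(price, scored)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Passing the same list as both arguments gives zero for every error metric and 1 for R², which matches the perfect-fit case described above.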

The next step is to set up a web service. In this case, the web service lets you enter the features of a
used car and returns its predicted price.
The following steps accomplish this:

1. Click SET UP WEB SERVICE (if the Predictive Web Service option is greyed out, you may have
to save and exit AzureML, then reopen the experiment).
Next you will have the following (modules were rearranged to provide clarity):
In the Predictive experiment, connect the Web service input as follows:
Next, remove “price” from the second Select Columns in Dataset module. Price is what we want to
predict when we provide the features for our model.
Click the Select Columns in Dataset module and make the changes in the Properties panel:

Next, click RUN, then click Deploy Web Service (next to RUN). The following appears.

Click on New Web Services Experience.

The following screen appears. Click on Test endpoint


Input some data and report on the results you obtained.
Be sure you incorporate all screen captures into your report.
Enter the data to predict the price of the car. Be sure to report your results of this step.
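
When choosing a test example from the held-back 20%, it can help to separate the seven feature values you type into the test form from the known price you compare against afterwards. The sketch below shows that bookkeeping only. It does not call the web service or reproduce its request format, and both the row values and the predicted price are hypothetical placeholders.

```python
# The seven inputs the predictive experiment expects once price is removed.
FEATURES = ["make", "body-style", "wheel-base", "engine-size",
            "horsepower", "peak-rpm", "highway-mpg"]
LABEL = "price"


def to_service_input(row):
    """Keep only the feature columns; the label is what the service predicts."""
    missing = [name for name in FEATURES if name not in row]
    if missing:
        raise ValueError(f"missing feature(s): {missing}")
    return {name: row[name] for name in FEATURES}


# Hypothetical held-out row; replace with a row from your own test set.
held_out_row = {"make": "toyota", "body-style": "hatchback", "wheel-base": 95.7,
                "engine-size": 92, "horsepower": 62, "peak-rpm": 4800,
                "highway-mpg": 39, "price": 6338}

service_input = to_service_input(held_out_row)
known_price = held_out_row[LABEL]

# Placeholder for the value the Test endpoint page returns (not a real result).
predicted_price = 6500.0
print("inputs to enter:", service_input)
print("absolute error:", abs(predicted_price - known_price))
```

Recording the known price next to the returned prediction gives you a concrete error figure to include in the lab report.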

Deliverables:
Upload a lab report of your experiment. You must follow the report format provided in this
assignment.
