
Advanced Machine Learning Code: 18AI72

Module I - Advanced
Machine Learning

Dr. Varalatchoumy M
Prof. & Head – Dept. of AIML
Head – CHOSS,
Cambridge Institute of Technology, Bangalore
6.1 | OVERVIEW
Machine learning algorithms are a subset of artificial intelligence (AI) techniques that imitate the human learning process.
Humans learn how to perform a task through multiple experiences.
Similarly, machine learning algorithms develop multiple models (usually using multiple datasets), and each model is analogous to an experience.
Mitchell (2006) defined machine learning as follows:
A machine learns with respect to a particular task T, performance metric P, and experience E, if the system reliably improves its performance P at task T following experience E.
• Let the task T be a classification problem.
• Performance P can be measured through several metrics such as overall accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC).
• Experience E is analogous to the different classifiers generated by machine learning algorithms.
The major difference between statistical learning and machine learning is that statistical learning depends heavily on validation of model assumptions and hypothesis testing, whereas the objective of machine learning is to improve prediction accuracy.

For example, while developing a regression model, we check for assumptions such as normality of residuals, significance of regression parameters, and so on. However, in the case of a random forest built from classification trees, the most important objective is the accuracy/performance of the model.

Two broad classes of ML algorithms:
1. Supervised Learning: In supervised learning, the datasets have the values of input
variables (feature values) and the corresponding outcome variable. The algorithms learn
from the training dataset and predict the outcome variable for a new record with values
of input variables. Linear regression and logistic regression are examples of supervised
learning algorithms.
2. Unsupervised Learning: In this case, the datasets will have only input variable
values, but not the output. The algorithm learns the structure in the inputs. Clustering
and factor analysis are examples of unsupervised learning and will be discussed in
Chapter 7.
6.1.1 | How Do Machines Learn?
In supervised learning, the algorithm learns using a loss function (also called a cost function or error function), which is a function of the predicted output and the desired output. If h(X_i) is the predicted output and y_i is the desired output, then the loss function is

L = \sum_{i=1}^{n} \left( y_i - h(X_i) \right)^2

where n is the total number of records for which the predictions are made.
The function defined above is a sum of squared error (SSE).
SSE is the loss function for a regression model.
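As a quick illustration (a minimal sketch; the numbers here are made up), the SSE for a set of predictions can be computed with numpy:

import numpy as np

# Hypothetical desired outputs y_i and predicted outputs h(X_i)
y = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.5, 6.0])

# SSE: sum over all n records of (y_i - h(X_i))^2
sse = np.sum((y - y_hat) ** 2)
print(sse)   # 0.25 + 0.25 + 1.0 = 1.5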
The objective is to learn the values of the parameters (aka feature weights) that minimize the cost function.
Machine learning uses optimization algorithms for minimizing the loss function.
The most widely used optimization technique is gradient descent.
In the next section, we will discuss a regression problem and understand how the gradient descent algorithm minimizes the loss function and learns the model parameters.

6.2 | GRADIENT DESCENT ALGORITHM


In this section, we will discuss how the gradient descent (GD) algorithm can be used to estimate the values of the regression parameters, given a dataset with inputs and outputs.
If the predicted output for record i is \hat{y}_i = b + \sum_j w_j x_{ij}, where b is the bias and w_j are the feature weights, then the error is given by

e_i = y_i - \hat{y}_i
6.2.1 | Developing a Gradient Descent Algorithm for Linear Regression Model

• For a better understanding of the GD algorithm, we will implement it using the dataset Advertising.csv.
• The dataset contains examples of advertisement spends across multiple channels, such as Radio, TV, and Newspaper, and the corresponding sales revenue generated at different time periods.

The dataset has the following elements:


1. TV – Spend on TV advertisements
2. Radio – Spend on radio advertisements
3. Newspaper – Spend on newspaper advertisements
4. Sales – Sales revenue generated

For predicting future sales using spends on different advertisement channels, we can build a regression
model.
6.2.1.1 Loading the Dataset
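A minimal sketch of this step (assuming Advertising.csv is in the working directory; the DataFrame name ad_df is our choice):

import pandas as pd

# Load the advertising dataset into a DataFrame
ad_df = pd.read_csv("Advertising.csv")
# Inspect the first few records
print(ad_df.head())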
6.2.1.2 Set X and Y Variables
For building a regression model, the inputs TV, Radio, and Newspaper are taken as the X features, and Sales is taken as the outcome variable Y.
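For instance (a sketch assuming the DataFrame ad_df loaded above and the column names listed earlier):

# X features: spends on the three advertisement channels
X = ad_df[["TV", "Radio", "Newspaper"]]
# Y: the outcome variable
Y = ad_df["Sales"]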
6.2.1.3 Standardize X and Y
It is important to bring all variables to one scale. This can be done by subtracting the mean from each value of a variable and dividing by the standard deviation of that variable.
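This z-score standardization can be written as follows (a sketch operating on the X and Y defined above):

# Standardize each variable: subtract the mean, divide by the standard deviation
X_scaled = (X - X.mean()) / X.std()
Y_scaled = (Y - Y.mean()) / Y.std()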
import numpy as np
import random

def initialize(dim):
    # dim - the number of weights to be initialized besides the bias
    np.random.seed(seed=42)
    random.seed(42)
    # Initialize the bias.
    b = random.random()
    # Initialize the weights.
    w = np.random.rand(dim)
    return b, w
To initialize the bias and three weights (as we have three input variables: TV, Radio, and Newspaper), we can invoke the initialize() method as follows:
b, w = initialize(3)
print("Bias:", b, "Weights:", w)
Method 2: Predict Y Values from the Bias and Weights
Calculate the Y values for all the inputs, given the bias and weights. We will use matrix multiplication of the weights with the input variable values. The matmul() method in the numpy library can be used for matrix multiplication. Each row of X is multiplied with the weights column to produce the predicted outcome variable.
def predict_Y(b, w, X):
    # Inputs:
    # b - bias
    # w - weights
    # X - the input matrix
    return b + np.matmul(X, w)
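Given the standardized inputs, the predictions can be obtained, for example, as follows (assuming X_scaled from the standardization step):

Y_hat = predict_Y(b, w, X_scaled.values)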
6.2.1.5 Finding the Optimal Bias and Weights
The updates to the bias and weights need to be done iteratively until the cost reaches its minimum. This can take many iterations and is time-consuming. There are two approaches to stopping the iterations:
1. Run a fixed number of iterations and use the bias and weights at the end of these iterations as the optimal values.
2. Run iterations until the change in cost is small, that is, less than a predefined value (e.g., 0.001).

We will define a method run_gradient_descent(), which takes alpha and num_iterations as parameters and invokes the methods initialize(), predict_Y(), get_cost(), and update_beta().

Also, inside the method:

1. the variable gd_iterations_df keeps track of the cost every 10 iterations;
2. a default value of 0.01 for the learning parameter and 100 for the number of iterations will be used.
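Below is one possible reconstruction of these methods (a sketch, not necessarily the textbook's exact code: the cost is taken as the halved, averaged SSE, and the gradient expressions follow from it; X and Y are assumed to be the standardized values as numpy arrays, and initialize() and predict_Y() are as defined above):

import numpy as np
import pandas as pd

def get_cost(Y, Y_hat):
    # Cost: sum of squared errors averaged over 2n records
    return np.sum((Y - Y_hat) ** 2) / (2 * len(Y))

def update_beta(X, Y, Y_hat, b_0, w_0, learning_rate):
    # Gradients of the cost with respect to the bias and the weights
    n = len(Y)
    db = -np.sum(Y - Y_hat) / n
    dw = -np.matmul(Y - Y_hat, X) / n
    # Move against the gradient by a step of size learning_rate
    b_1 = b_0 - learning_rate * db
    w_1 = w_0 - learning_rate * dw
    return b_1, w_1

def run_gradient_descent(X, Y, alpha=0.01, num_iterations=100):
    b, w = initialize(X.shape[1])
    iter_nums, costs = [], []
    for each_iter in range(num_iterations):
        Y_hat = predict_Y(b, w, X)
        cost = get_cost(Y, Y_hat)
        b, w = update_beta(X, Y, Y_hat, b, w, alpha)
        # Track the cost every 10 iterations
        if each_iter % 10 == 0:
            iter_nums.append(each_iter)
            costs.append(cost)
    gd_iterations_df = pd.DataFrame({"iteration": iter_nums, "cost": costs})
    return gd_iterations_df, b, w

It can be invoked, for example, as run_gradient_descent(X_scaled.values, Y_scaled.values, alpha=0.01, num_iterations=100).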
6.3.1 | Steps for Building Machine Learning Models

The steps to be followed for building and validating a machine learning model and measuring its accuracy are as follows (the train/test split in step 2 is sketched in code after the list):
1. Identify the features and the outcome variable in the dataset.
2. Split the dataset into training and test sets.
3. Build the model using the training set.
4. Predict the outcome variable for the test set.
5. Compare the predicted and actual values of the outcome variable in the test set and measure accuracy using measures such as mean absolute percentage error (MAPE) or root mean square error (RMSE).
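A sketch of the split in step 2, reusing the standardized X and Y from Section 6.2 (the 80/20 split ratio and the random seed are assumptions):

from sklearn.model_selection import train_test_split

# Hold out 20% of the records as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, Y_scaled, train_size=0.8, random_state=42)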
6.3.1.2 Building Linear Regression Model with Train Dataset
Linear models are included in the sklearn.linear_model module. We will use the LinearRegression method to build the model and compare it with the results we obtained through our own implementation of the gradient descent algorithm.
https://scikit-learn.org/stable/modules/linear_model.html
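A minimal sketch of fitting the model (assuming the train/test split from the previous step):

from sklearn.linear_model import LinearRegression

# Fit a linear regression model on the training set
linreg = LinearRegression()
linreg.fit(X_train, y_train)
# The learned parameters, comparable to the bias and weights from GD
print("Intercept:", linreg.intercept_)
print("Coefficients:", linreg.coef_)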
6.3.1.4 Measuring Accuracy
Root Mean Square Error (RMSE) and R-squared are two key accuracy measures for linear regression models.
The sklearn.metrics package provides methods to measure various metrics.
For regression models, mean_squared_error and r2_score can be used to calculate the MSE and R-squared values, respectively.

## Importing metrics from sklearn
from sklearn import metrics
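For example (a sketch assuming the fitted model linreg and the test set from the steps above):

import numpy as np

# Predict on the test set and compute the accuracy measures
y_pred = linreg.predict(X_test)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = metrics.r2_score(y_test, y_pred)
print("RMSE:", rmse, "R-squared:", r2)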
6.3.2 | Bias-Variance Trade-off

Model errors can be decomposed into two components: bias and variance.
Understanding these two components is key to diagnosing model accuracy and avoiding overfitting or underfitting.
High bias can lead to an underfitting model, whereas high variance can lead to an overfitting model.

The term "variance" refers to the degree of change to be expected in the estimation of the target function as a result of using different sets of training data. The term "bias" refers to the disparity between the values predicted by the model and the values actually observed.
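One hypothetical way to observe the trade-off (an illustration of ours, not from the text; it reuses the train/test split above) is to compare training and test errors as model complexity grows, for example with polynomial features of increasing degree:

from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

for degree in [1, 3, 10]:
    # Higher degree = more complex, more flexible model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = metrics.mean_squared_error(y_train, model.predict(X_train))
    test_mse = metrics.mean_squared_error(y_test, model.predict(X_test))
    # Underfitting: both errors high; overfitting: train error low, test error high
    print(degree, train_mse, test_mse)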
