
Artificial intelligence is a technology that enables us to create intelligent systems

capable of simulating human intelligence.

Based on capabilities, AI can be classified into three types:

o Weak AI
o Generative AI
o Strong AI:- The future of AI is Strong AI, which is expected to be more
intelligent than humans.
Machine Learning Tutorial
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Supervised learning can be grouped further in two categories of algorithms:

o Classification
o Regression

Unsupervised Learning:- The training is provided to the machine with a set of


data that has not been labeled, classified, or categorized, and the algorithm
needs to act on that data without any supervision. The goal of unsupervised
learning is to restructure the input data into new features or a group of objects
with similar patterns.

In unsupervised learning, we don't have a predetermined result. The


machine tries to find useful insights from the huge amount of data. It can
be further classified into two categories of algorithms:

o Clustering
o Association

Reinforcement learning is a feedback-based learning method, in which a


learning agent gets a reward for each right action and gets a penalty for each
wrong action.

Machine learning life cycle involves seven major steps, which are given
below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment

Gathering Data

This step includes the below tasks:

o Identify various data sources


o Collect data
o Integrate the data obtained from different sources

Data Preparation

This step can be further divided into two processes:

o Data exploration:

It is used to understand the nature of data that we have to work


with. We need to understand the characteristics, format, and quality
of data.
A better understanding of data leads to an effective outcome. In
this step, we find correlations, general trends, and outliers.

o Data pre-processing:

Now the next step is preprocessing of data for its analysis.

Data wrangling is the process of cleaning and converting raw data into a usable
format. It is the process of cleaning the data, selecting the variables to use, and
transforming the data into a proper format to make it more suitable for analysis in
the next step.

In real-world applications, collected data may have various issues,


including:
o Missing Values
o Duplicate data
o Invalid data
o Noise

It is mandatory to detect and remove the above issues because they can negatively
affect the quality of the outcome.

Now the cleaned and prepared data is passed on to the analysis step. This
step involves:

o Selection of analytical techniques


o Building models
o Review the result

The aim of this step is to build a machine learning model to analyse the data
using various analytical techniques and review the outcome. It starts with the
determination of the type of the problems, where we select the machine learning
techniques such as Classification, Regression, Cluster
analysis, Association, etc. then build the model using prepared data, and
evaluate the model.

Now the next step is to train the model. In this step, we train our model to
improve its performance and obtain a better outcome for the problem. Training a model is
required so that it can understand the various patterns, rules, and features.

Once our machine learning model has been trained on a given dataset, then we
test the model. In this step, we check for the accuracy of our model by providing
a test dataset to it.

Machine Learning:- Machine learning is about extracting knowledge from the


data. It can be defined as,
Machine learning is a subfield of artificial intelligence, which enables machines to learn from past
data or experiences without being explicitly programmed.

Types of datasets

o Numerical data:- Such as house price, temperature, etc.


o Categorical data:- Such as Yes/No, True/False, Blue/green, etc.
o Ordinal data:- These data are similar to categorical data but can
be measured on the basis of comparison.

Text datasets: such as those used in NLP tasks.

Time Series Datasets:


Time series datasets consist of data points collected over time. They are
commonly used in forecasting, anomaly detection, and trend analysis.

Examples:

o Stock market data


o Weather data
o Sensor readings.

Tabular Datasets:
o Tabular datasets are structured data organized in tables or
spreadsheets. They contain rows representing instances or samples
and columns representing features or attributes. Tabular datasets
are used for tasks like regression and classification. The
dataset given earlier in the article is an example of a tabular
dataset.

In building ML applications, datasets are divided into two parts:

o Training dataset:
o Test Dataset

Data Preprocessing steps

o Getting the dataset


o Importing libraries
o Importing datasets
o Finding Missing Data
o Encoding Categorical Data
o Splitting dataset into training and test set
o Feature scaling

Matplotlib: The second library is matplotlib, which is a Python 2D plotting


library, and with this library, we need to import the sub-library pyplot. This library
is used to plot any type of chart in Python. It will be imported as
below:

import matplotlib.pyplot as mtp
To import a dataset using the read_csv() function from the pandas library, you
can do it in one line as follows: import pandas as pd; df =
pd.read_csv('file_path_or_url')

1. data_set= pd.read_csv('Dataset.csv')

2. Extracting dependent and independent variables:

In our dataset, there are three independent variables that are Country, Age,
and Salary, and one is a dependent variable which is Purchased.

Extracting independent variable:

To extract the independent variables, we will use the iloc[] method of the Pandas


library. It is used to extract the required rows and columns from the
dataset.

1. x= data_set.iloc[:,:-1].values
In the above code, the first colon(:) is used to take all the rows, and the second
colon(:) is for all the columns. Here we have used :-1, because we don't want to
take the last column as it contains the dependent variable. So by doing this, we
will get the matrix of features.

Extracting dependent variable:

To extract dependent variables, again, we will use Pandas .iloc[] method.

1. y= data_set.iloc[:,3].values

Ways to handle missing data:

There are mainly two ways to handle missing data, which are:

o By deleting the particular row: The first way is commonly used to


deal with null values. In this way, we just delete the
specific row or column which consists of null values. But this way is
not so efficient, and removing data may lead to a loss of information
which will not give an accurate output.

o By calculating the mean: In this way, we will calculate the mean


of the column or row which contains any missing value and put
it in the place of the missing value. This strategy is useful for
features which have numeric data such as age, salary, year, etc.
Here, we will use this approach.

To handle missing values, we will use the Scikit-learn library in our
code, which contains various classes for building machine learning
models. Here we will use the Imputer class
of the sklearn.preprocessing library. Below is the code for it:

1. #handling missing data (Replacing missing data with the mean value)
2. from sklearn.preprocessing import Imputer
3. imputer= Imputer(missing_values ='NaN', strategy='mean', axis = 0)
4. #Fitting imputer object to the independent variables x.
5. imputer= imputer.fit(x[:, 1:3])
6. #Replacing missing data with the calculated mean value
7. x[:, 1:3]= imputer.transform(x[:, 1:3])
As we can see in the above output, the missing values have been replaced with
the means of the remaining column values.
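Note: in newer versions of scikit-learn, the Imputer class has been removed and
replaced by SimpleImputer in the sklearn.impute module. A minimal equivalent
sketch, assuming the same x array with the numeric Age and Salary values in
columns 1 and 2:

# Equivalent imputation for newer scikit-learn versions
import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# columns 1 and 2 are assumed to hold the numeric Age and Salary values
x[:, 1:3] = imputer.fit_transform(x[:, 1:3])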

5) Encoding Categorical data:


Categorical data is data which has some categories. In our
dataset, there are two categorical variables: Country and Purchased.

Since a machine learning model works entirely on mathematics and


numbers, a categorical variable in the dataset may
create trouble while building the model. So it is necessary to encode these
categorical variables into numbers.

For Country variable:

Firstly, we will encode the Country variable into numbers. To


do this, we will use the LabelEncoder() class from the preprocessing library.

1. #Categorical data
2. #for Country Variable
3. from sklearn.preprocessing import LabelEncoder
4. label_encoder_x= LabelEncoder()
5. x[:, 0]= label_encoder_x.fit_transform(x[:, 0])

Explanation:

In the above code, we have imported the LabelEncoder class of the sklearn


library. This class has successfully encoded the categories into
digits.
But in our case, there are three country categories, and as we can see in
the above output, these categories are encoded into 0, 1, and 2. By
these values, the machine learning model may assume that there is
some correlation between these categories, which will produce the wrong
output. So to remove this issue, we will use dummy encoding.

Dummy Variables:

Dummy variables are those variables which have values 0 or 1. The


value 1 indicates the presence of that category in a particular column, and the rest
of the columns become 0. With dummy encoding, we will have a number of
columns equal to the number of categories.

In our dataset, we have 3 categories so it will produce three columns


having 0 and 1 values. For Dummy Encoding, we will
use OneHotEncoder class of preprocessing library.

1. #for Country Variable


2. from sklearn.preprocessing import LabelEncoder, OneHotEncoder
3. label_encoder_x= LabelEncoder()
4. x[:, 0]= label_encoder_x.fit_transform(x[:, 0])
5. #Encoding for dummy variables
6. onehot_encoder= OneHotEncoder(categorical_features= [0])
7. x= onehot_encoder.fit_transform(x).toarray()

For Purchased Variable:

1. labelencoder_y= LabelEncoder()
2. y= labelencoder_y.fit_transform(y)
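Note: in newer versions of scikit-learn, the categorical_features argument of
OneHotEncoder has been removed. A minimal sketch of an equivalent dummy
encoding using ColumnTransformer (an assumption, not part of the original code;
it presumes the Country column is still at index 0 of x):

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# one-hot encode column 0 (Country); keep the remaining columns unchanged
ct = ColumnTransformer([('country', OneHotEncoder(), [0])],
                       remainder='passthrough', sparse_threshold=0)
x = ct.fit_transform(x).astype(float)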

Splitting the Dataset into the Training set and Test


set.
In machine learning data preprocessing, we divide our dataset into a
training set and test set. This is one of the crucial steps of data
preprocessing as by doing this, we can enhance the performance of our
machine learning model.

Suppose we train our machine learning model on one dataset and then


test it on a completely different dataset. This will create difficulties
for the model, because the correlations it learned from the training
data may not hold in the new data.

If we train our model very well so that its training accuracy is very high, but its
performance drops when we provide a new dataset to it, that is a problem. So we always
try to make a machine learning model which performs well with the training set
and also with the test dataset. Here, we can define these datasets as:

o Training Set: the subset of the dataset used to train the model, for which
the outputs are already known.
o Test Set: the subset of the dataset used to check whether the trained model
predicts correct outputs on unseen examples.

For splitting the dataset, we will use the below lines of code:

from sklearn.model_selection import train_test_split


x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state=0)

Explanation:

o In the above code, the first line is used for splitting arrays of the
dataset into random train and test subsets.
o In the second line, we have used four variables for our output that
are
o x_train: features for the training data
o x_test: features for the testing data
o y_train: dependent variable for the training data
o y_test: dependent variable for the testing data
o In the train_test_split() function, we have passed four parameters, of
which the first two are the data arrays, and test_size specifies
the size of the test set. The test_size may be .5, .3, or .2, which tells
the dividing ratio of training and testing sets.
o The last parameter random_state is used to set a seed for a
random generator so that you always get the same result, and the
most used value for this is 42.

7) Feature Scaling
o Feature scaling is the final step of data preprocessing in machine
learning. It is a technique to standardize the independent variables
of the dataset in a specific range. In feature scaling, we put our
variables in the same range and on the same scale so that no
variable dominates the other variables.

o As we can see, the age and salary column values are not on the
same scale. Many machine learning models are based on Euclidean
distance, and if we do not scale the variables, then it will cause
some issues in our machine learning model.

o Euclidean distance between two points A(x1, y1) and B(x2, y2) is given as:
sqrt((x2 - x1)^2 + (y2 - y1)^2)


If we compute the distance between any two records using age and salary, the salary values will
dominate the age values, and this will produce an incorrect result. So to
remove this issue, we need to perform feature scaling for machine
learning.

There are two ways to perform feature scaling in machine learning:
standardization and normalization.

Here, we will use the standardization method for our dataset.
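For reference, the two approaches rescale a feature x as follows (standard
formulas, not specific to this dataset):

Standardization: x' = (x - mean(x)) / standard deviation(x)

Normalization (min-max scaling): x' = (x - min(x)) / (max(x) - min(x))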

For feature scaling, we will

import StandardScaler class of sklearn.preprocessing library as:

from sklearn.preprocessing import StandardScaler

Now, we will create the object of StandardScaler class for independent


variables or features. And then we will fit and transform the training
dataset.

1. st_x= StandardScaler()
2. x_train= st_x.fit_transform(x_train)

For the test dataset, we will directly apply the transform() function instead of


fit_transform() because the scaler has already been fitted on the training set.

x_test= st_x.transform(x_test)

Combining all the steps:

Now, in the end, we can combine all the steps together to make our
complete code more understandable.
1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd
5.
6. #importing datasets
7. data_set= pd.read_csv('Dataset.csv')
8.
9. #Extracting Independent Variable
10. x= data_set.iloc[:, :-1].values
11.
12. #Extracting Dependent variable
13. y= data_set.iloc[:, 3].values
14.
15. #handling missing data (Replacing missing data with the mean value)
16. from sklearn.preprocessing import Imputer
17. imputer= Imputer(missing_values ='NaN', strategy='mean', axis = 0)
18.
19. #Fitting imputer object to the independent variables x.
20. imputer= imputer.fit(x[:, 1:3])
21.
22. #Replacing missing data with the calculated mean value
23. x[:, 1:3]= imputer.transform(x[:, 1:3])
24.
25. #for Country Variable
26. from sklearn.preprocessing import LabelEncoder, OneHotEncoder
27. label_encoder_x= LabelEncoder()
28. x[:, 0]= label_encoder_x.fit_transform(x[:, 0])
29.
30. #Encoding for dummy variables
31. onehot_encoder= OneHotEncoder(categorical_features= [0])
32. x= onehot_encoder.fit_transform(x).toarray()
33.
34. #encoding for purchased variable
35. labelencoder_y= LabelEncoder()
36. y= labelencoder_y.fit_transform(y)
37.
38. # Splitting the dataset into training and test set.
39. from sklearn.model_selection import train_test_split
40. x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state=0)
41.
42. #Feature Scaling of datasets
43. from sklearn.preprocessing import StandardScaler
44. st_x= StandardScaler()
45. x_train= st_x.fit_transform(x_train)
46. x_test= st_x.transform(x_test)
Supervised Learning Dataset

Steps Involved in Supervised Learning:


o First, determine the type of training dataset.
o Collect/gather the labelled training data.
o Split the dataset into a training dataset, a test dataset, and a
validation dataset.
o Determine the input features of the training dataset, which should
have enough knowledge so that the model can accurately predict
the output.
o Determine the suitable algorithm for the model, such as support
vector machine, decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need
validation sets as the control parameters, which are a subset of the
training dataset.
o Evaluate the accuracy of the model by providing the test set. If the
model predicts the correct output, it means our model is accurate.

1. Regression

Regression algorithms are used if there is a relationship between the input


variable and the output variable. It is used for the prediction of continuous
variables, such as Weather forecasting, Market Trends, etc. Below are
some popular Regression algorithms which come under supervised
learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical,


which means there are two classes, such as Yes-No, Male-Female, True-
False, etc. A common example is spam filtering. Below are some popular
Classification algorithms which come under supervised learning:

o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines

Disadvantages of supervised learning:


o Supervised learning models are not suitable for handling
complex tasks.
o Supervised learning cannot predict the correct output if the test
data is different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the
classes of objects.

Unsupervised learning is a type of machine learning in which models are trained using
unlabeled dataset and are allowed to act on that data without any supervision.

The goal of unsupervised learning is to find the underlying structure of


dataset, group that data according to similarities, and represent
that dataset in a compressed format.

o Clustering: Clustering is a method of grouping the objects into


clusters such that objects with the most similarities remain in a
group and have little or no similarity with the objects of another
group. Cluster analysis finds the commonalities between the data
objects and categorizes them as per the presence and absence of
those commonalities.
o Association: An association rule is an unsupervised learning
method which is used for finding relationships between variables
in a large database. It determines the set of items that occur
together in the dataset. Association rules make marketing strategy
more effective. For example, people who buy item X (say, bread)
also tend to purchase item Y (butter/jam). A typical example of
an Association rule is Market Basket Analysis.
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
o In supervised learning, models need to find the mapping function to
map the input variable (X) with the output variable (Y).

Regression analysis is a statistical method used to model the relationship between a


dependent (target) variable and one or more independent (predictor) variables. It helps
understand how the value of the dependent variable changes in response to changes in
independent variables while keeping others fixed, predicting continuous values such as
temperature, age, salary, and price.

Regression is a supervised learning technique which helps in finding


the correlation between variables and enables us to predict the
continuous output variable based on one or more predictor
variables. It is mainly used for prediction, forecasting, time series
modeling, and determining the cause-and-effect relationship
between variables. In Regression, we plot a graph between the
variables which best fits the given datapoints; using this plot, the
machine learning model can make predictions about the data. In
simple words, "Regression shows a line or curve that passes
through all the datapoints on the target-predictor graph in such a
way that the vertical distance between the datapoints and the
regression line is minimum." The distance between the datapoints and
the line tells whether a model has captured a strong relationship or not.

o Prediction of rain using temperature and other factors


o Determining Market trends
o Prediction of road accidents due to rash driving.

Terminologies Related to the Regression Analysis:


o Dependent Variable: The main factor in Regression analysis which
we want to predict or understand is called the dependent variable. It
is also called target variable.
o Independent Variable: The factors which affect the dependent
variables or which are used to predict the values of the dependent
variables are called independent variable, also called as
a predictor.
o Outliers: An outlier is an observation which contains either a very low
value or a very high value in comparison to other observed values. An
outlier may hamper the result, so it should be avoided.
o Multicollinearity: If the independent variables are highly
correlated with each other, then such a condition
is called multicollinearity. It should not be present in the dataset,
because it creates problems while ranking the most influential
variables.
o Underfitting and Overfitting: If our algorithm works well with the
training dataset but not well with the test dataset, then such a problem is
called overfitting. And if our algorithm does not perform well even
with the training dataset, then such a problem is called underfitting.

Why do we use Regression Analysis?


As mentioned above, Regression analysis helps in the prediction of a
continuous variable. There are various scenarios in the real world where
we need future predictions, such as weather conditions, sales
prediction, marketing trends, etc. For such cases, we need a
technique which can make predictions more accurately, and
Regression analysis is such a statistical method, used in
machine learning and data science. Below are some other reasons for
using Regression analysis:

Types of regression

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression:
Linear regression is a straightforward statistical method used for predictive analysis,
modeling the linear relationship between continuous variables. It can be simple (with one
input variable) or multiple (with more than one input variable). The model helps predict
values, such as an employee's salary based on years of experience, by showing how
changes in independent variables affect the dependent variable.

o Below is the mathematical equation for Linear regression:

1. Y= aX+b

Here, Y = dependent variables (target variables),


X= Independent variables (predictor variables),
a and b are the linear coefficients

Some popular applications of linear regression are:

o Analyzing trends and sales estimates


o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic.

Logistic regression is a supervised learning algorithm used for classification problems


where the dependent variable is binary (e.g., 0 or 1, Yes or No). Unlike linear regression,
it predicts probabilities using the sigmoid function, making it ideal for modeling
outcomes like spam detection or binary decision-making.

o Logistic regression uses the sigmoid function or logistic function


to map predictions to probabilities. The function can be represented
as:

f(x) = 1 / (1 + e^(-x))

o f(x)= Output between the 0 and 1 value.


o x= input to the function
o e= base of the natural logarithm.

o It uses the concept of a threshold level: values above the threshold


level are rounded up to 1, and values below the threshold level are
rounded down to 0.

There are three types of logistic regression:

o Binary (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)
o Polynomial Regression is a type of regression which models
the non-linear dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve
between the values of x and the corresponding conditional values of y.

In Polynomial regression, the original features are transformed


into polynomial features of given degree and then modeled using
a linear model.

The Linear regression equation Y= b0+ b1x is transformed into the Polynomial


regression equation Y= b0+ b1x+ b2x^2+ b3x^3+ .....+ bnx^n.

Here Y is the predicted/target output, b0, b1,... bn are the regression


coefficients. x is our independent/input variable.

Note: This is different from Multiple Linear regression in such a way that in Polynomial
regression, a single element has different degrees instead of multiple variables with the
same degree.
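As an illustration (not from the original tutorial), polynomial regression can be
implemented in scikit-learn by generating polynomial features and then fitting an
ordinary linear model. A minimal sketch, assuming a single input feature x and a
target y:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[1], [2], [3], [4], [5]])      # example input feature
y = np.array([1, 4, 9, 16, 25])              # example target values

poly = PolynomialFeatures(degree=2)          # transforms x into [1, x, x^2]
x_poly = poly.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)                         # linear model on the polynomial features
print(model.predict(poly.transform([[6]])))  # prediction for a new value of x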
Support Vector Regression (SVR) is a regression algorithm which works for
continuous variables. Below are some keywords used in SVR:

o Hyperplane: In general SVM, it is a separation line between two


classes, but in SVR, it is a line which helps to predict the continuous
variables and cover most of the datapoints.
o Kernel: It is a function used to map a lower-dimensional data into
higher dimensional data.
o Boundary line: Boundary lines are the two lines apart from
hyperplane, which creates a margin for datapoints.
o Support vectors: Support vectors are the datapoints which are
nearest to the hyperplane and of the opposite class.

The main goal of SVR is to consider the maximum datapoints


within the boundary lines and the hyperplane (best-fit line) must
contain a maximum number of datapoints.

Here, the blue line is called the hyperplane, and the other two lines are known
as the boundary lines.
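A minimal sketch of SVR with scikit-learn (illustrative only; it assumes x_train and
y_train hold scaled features and a continuous target, and that epsilon sets the
width of the boundary lines around the hyperplane):

from sklearn.svm import SVR

regressor = SVR(kernel='rbf', epsilon=0.1)   # epsilon-margin around the best-fit hyperplane
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test)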

Decision Tree Regression is a supervised learning algorithm used for both classification and
regression problems, handling categorical and numerical data. It constructs a tree-like
structure where each internal node tests an attribute, branches represent outcomes, and leaf
nodes provide the final prediction. The tree starts from a root node and recursively splits into
child nodes until reaching a decision.
o Random forest is one of the most powerful supervised learning
algorithms which is capable of performing regression as well as
classification tasks.
o The Random Forest regression is an ensemble learning method
which combines multiple decision trees and predicts the final output
based on the average of each tree's output. The combined decision
trees are called base models, and the combination can be represented more
formally as:
o g(x)= f0(x)+ f1(x)+ f2(x)+....

o Random forest uses the Bagging or Bootstrap


Aggregation technique of ensemble learning, in which the aggregated
decision trees run in parallel and do not interact with each other.
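A minimal illustrative sketch with scikit-learn (the variable names reuse the earlier
preprocessing examples and assume a continuous target):

from sklearn.ensemble import RandomForestRegressor

# 100 trees, each trained on a bootstrap sample; their predictions are averaged
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test)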

Ridge Regression:

o Ridge regression is one of the most robust versions of linear


regression in which a small amount of bias is introduced so that we
can get better long term predictions.
o The amount of bias added to the model is known as the Ridge
Regression penalty. We can compute this penalty term by
multiplying lambda with the squared weight of each individual
feature.
o The equation for ridge regression will be:

Cost function = sum of squared errors + λ × (sum of the squared coefficient values)

o Ridge regression is a regularization technique, which is used to


reduce the complexity of the model. It is also called L2
regularization.

Lasso Regression:

o Lasso regression is another regularization technique to reduce the


complexity of the model.
o It is similar to Ridge Regression except that the penalty term
contains only the absolute weights instead of the square of the weights.
o Since it takes absolute values, it can shrink the slope to 0,
whereas Ridge Regression can only shrink it near to 0.
o It is also called L1 regularization. The equation for Lasso
regression will be:

Cost function = sum of squared errors + λ × (sum of the absolute coefficient values)
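A minimal illustrative sketch of both regularized models in scikit-learn (here alpha
plays the role of lambda; the variable names and alpha values are assumptions):

from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0)   # L2 penalty: sum of squared coefficients
ridge.fit(x_train, y_train)

lasso = Lasso(alpha=0.1)   # L1 penalty: sum of absolute coefficients, can shrink some to exactly 0
lasso.fit(x_train, y_train)

print(ridge.coef_, lasso.coef_)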

Mathematically, we can represent a linear regression as:

y= a0+a1x+ ε

Here,

Y= Dependent Variable (Target Variable)

X= Independent Variable (predictor Variable)

a0= intercept of the line (Gives an additional degree of freedom)

a1 = Linear regression coefficient (scale factor to each input value).

ε = random error
The values for x and y variables are training datasets for Linear Regression model
representation.

Type of Linear Regression

1. Simple Linear Regression:

If a single independent variable is used to predict the value of a numerical


dependent variable, then such a Linear Regression algorithm is called
Simple Linear Regression.

2. Multiple Linear regression:

If more than one independent variable is used to predict the value of a


numerical dependent variable, then such a Linear Regression algorithm is
called Multiple Linear Regression.

A linear line showing the relationship between the dependent and independent
variables is called a regression line. A regression line can show two types of
relationship:

Positive Linear Relationship:

If the dependent variable increases on the Y-axis and independent


variable increases on X-axis, then such a relationship is termed as a
Positive linear relationship.
Negative Linear Relationship:

If the dependent variable decreases on the Y-axis and independent


variable increases on the X-axis, then such a relationship is called a
negative linear relationship.
When working with linear regression, our main goal is to find the best fit line,
which means the error between the predicted values and actual values should be
minimized. The best fit line will have the least error.

Different values for the weights or coefficients of the line (a0, a1) give a different
regression line, so we need to calculate the best values for a0 and a1 to find
the best fit line. To calculate this, we use the cost function.

o We can use the cost function to find the accuracy of the mapping
function, which maps the input variable to the output variable. This
mapping function is also known as Hypothesis function.

For Linear Regression, we use the Mean Squared Error (MSE) cost
function, which is the average of the squared errors between the
predicted values and actual values. For the above linear equation, MSE
can be calculated as:

MSE = (1/N) × Σ (Yi − (a1xi + a0))^2

Where,

N = Total number of observations

Yi = Actual value

(a1xi + a0) = Predicted value.

Residuals: The distance between the actual values and predicted values is called
the residual. If the observed points are far from the regression line, then the residuals
will be high, and so the cost function will be high. If the scatter points are close to the
regression line, then the residuals will be small, and hence the cost function will be small.

Gradient Descent:

o Gradient descent is used to minimize the MSE by calculating the


gradient of the cost function.
o A regression model uses gradient descent to update the coefficients
of the line by reducing the cost function.
o This is done by randomly selecting coefficient values and then
iteratively updating them to reach the minimum of the cost function.
o Model Performance:
o The goodness of fit determines how well the regression line fits the
set of observations. The process of finding the best model out of
various models is called optimization. It can be achieved by the below
method:

1. R-squared method:

o R-squared is a statistical method that determines the goodness of


fit.
o It measures the strength of the relationship between the dependent
and independent variables on a scale of 0-100%.
o A high value of R-square indicates less difference between the
predicted values and actual values and hence represents a good
model.
o It is also called the coefficient of determination, or the coefficient of
multiple determination for multiple regression.
o It can be calculated from the below formula:

R-squared = Explained variation / Total variation

Assumptions of Linear Regression

Below are some important assumptions of Linear Regression. These are


some formal checks while building a Linear Regression model, which
ensures to get the best possible result from the given dataset.

Linear relationship between the features and target:

Linear regression assumes the linear relationship between the dependent


and independent variables.

Small or no multicollinearity between the features:

Multicollinearity means high correlation between the independent


variables. Due to multicollinearity, it may be difficult to find the true
relationship between the predictors and the target variable. Or we can say, it
is difficult to determine which predictor variable is affecting the target
variable and which is not. So, the model assumes either little or no
multicollinearity between the features or independent variables.

Homoscedasticity Assumption:

Homoscedasticity is a situation when the error term is the same for all the
values of independent variables. With homoscedasticity, there should be
no clear pattern distribution of data in the scatter plot.

Normal distribution of error terms:

Linear regression assumes that the error term should follow the normal
distribution pattern. If error terms are not normally distributed, then
confidence intervals will become either too wide or too narrow, which may
cause difficulties in finding coefficients.

It can be checked using a q-q plot. If the plot shows a straight line
without any deviation, it means the error is normally distributed.

No autocorrelations:

The linear regression model assumes no autocorrelation in the error terms. If


there is any correlation in the error terms, then it will drastically
reduce the accuracy of the model. Autocorrelation usually occurs if there
is a dependency between residual errors.

The key point in Simple Linear Regression is that the dependent


variable must be a continuous/real value. However, the independent
variable can be measured on continuous or categorical values.

Simple Linear regression algorithm has mainly two objectives

o Model the relationship between the two variables. Such as


the relationship between Income and expenditure, experience and
Salary, etc.
o Forecasting new observations. Such as Weather forecasting
according to temperature, Revenue of a company according to the
investments in a year, etc.
o Simple Linear Regression Model:
o The Simple Linear Regression model can be represented using the
below equation:
o y= a0+a1x+ ε
Where,

a0= It is the intercept of the Regression line (can be obtained putting


x=0)

a1= It is the slope of the regression line, which tells whether the line is
increasing or decreasing.

ε = The error term. (For a good model it will be negligible).

To implement the Simple Linear regression model in machine learning


using Python, we need to follow the below steps:

Step-1: Data Pre-processing

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
data_set= pd.read_csv('Salary_Data.csv')
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values
# Splitting the dataset into a training set and a test set (needed in the next steps)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state=0)

Step-2: Fitting the Simple Linear Regression to the Training Set:

Now the second step is to fit our model to the training dataset. To do so,
we will import the LinearRegression class of the linear_model library
from scikit-learn. After importing the class, we are going to create an
object of the class named regressor. The code for this is given
below:

#Fitting the Simple Linear Regression model to the training dataset


from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)

Step: 3. Prediction of test set result:

Our model has now learned the relationship between the dependent


variable (Salary) and the independent variable (Experience). So, now,
our model is ready to predict the output for new observations. In this
step, we will provide the test dataset (new observations) to the model to
check whether it can predict the correct output or not.
We will create prediction vectors y_pred and x_pred, which will contain the
predictions for the test dataset and the training set, respectively.

#Prediction of Test and Training set result


y_pred= regressor.predict(x_test)
x_pred= regressor.predict(x_train)

Step: 4. visualizing the Training set results:

Now in this step, we will visualize the training set result. To do so, we will
use the scatter() function of the pyplot library, which we have already
imported in the pre-processing step. The scatter () function will create a
scatter plot of observations.

After that, we will assign labels for the x-axis and y-axis using the xlabel() and
ylabel() functions.

Finally, we will represent all above things in a graph using show(). The
code is given below:

mtp.scatter(x_train, y_train, color="green")


mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()

Step: 5. visualizing the Test set results:

In the previous step, we have visualized the performance of our model on


the training set. Now, we will do the same for the Test set. The complete
code will remain the same as the above code, except in this, we will use
x_test, and y_test instead of x_train and y_train.

Here we are also changing the color of observations and regression line to
differentiate between the two plots, but it is optional.

#visualizing the Test set results


mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
Multiple Linear Regression is one of the important regression algorithms which models the linear
relationship between a single dependent continuous variable and more than one independent
variable.

Backward elimination is a feature selection technique while building a


machine learning model. It is used to remove those features that do not
have a significant effect on the dependent variable or prediction of output.
There are various ways to build a model in Machine Learning, which are:

1. All-in
2. Backward Elimination
3. Forward Selection
4. Bidirectional Elimination
5. Score Comparison

Steps for Backward Elimination method:


We will use the same model which we built in the previous chapter on
MLR. Below is the complete code for it:

#importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
#importing datasets
data_set= pd.read_csv('50_CompList.csv')
#Extracting Independent and dependent Variable
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 4].values
#Categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x= LabelEncoder()
x[:, 3]= labelencoder_x.fit_transform(x[:,3])
onehotencoder= OneHotEncoder(categorical_features= [3])
x= onehotencoder.fit_transform(x).toarray()
#Avoiding the dummy variable trap:
x = x[:, 1:]
#Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state=0)
#Fitting the MLR model to the training set:
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)
#Predicting the Test set result;
y_pred= regressor.predict(x_test)
#Checking the score
print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))
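The elimination loop itself is not shown above. Below is a minimal sketch of how it
is typically done with the statsmodels library (an assumption, not part of the
original code): repeatedly fit an OLS model and drop the predictor with the highest
p-value until every remaining p-value is below the chosen significance level.

import numpy as np
import statsmodels.api as sm

SL = 0.05                                       # chosen significance level
x_opt = sm.add_constant(x.astype(float))        # add the intercept column b0
while True:
    ols = sm.OLS(endog=y, exog=x_opt).fit()
    if ols.pvalues.max() <= SL:
        break                                   # every remaining feature is significant
    # drop the column whose coefficient has the highest p-value
    x_opt = np.delete(x_opt, ols.pvalues.argmax(), axis=1)
print(ols.summary())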

o Polynomial Regression is a regression algorithm that models the


relationship between a dependent(y) and independent variable(x) as
nth degree polynomial. The Polynomial Regression equation is given
below:

y= b0+ b1x1+ b2x1^2+ b3x1^3+ ......+ bnx1^n

o Hence, "In Polynomial regression, the original features are


converted into Polynomial features of required degree
(2,3,..,n) and then modeled using a linear model."
Note: A Polynomial Regression algorithm is also called Polynomial Linear Regression
because it does not depend on the variables, instead, it depends on the coefficients,
which are arranged in a linear fashion.

Equation of the Polynomial Regression Model:


Simple Linear Regression equation: y = b0+b1x .........(a)

Multiple Linear Regression equation: y= b0+ b1x1+ b2x2+


b3x3+ ....+ bnxn .........(b)

Polynomial Regression equation: y= b0+ b1x+ b2x^2+ b3x^3+ ....+


bnx^n ..........(c)

Types of ML Classification Algorithms:


o Linear Models
o Logistic Regression
o Support Vector Machines
o Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification

So for evaluating a Classification model, we have the following ways:

1. Log Loss or Cross-Entropy Loss:

o It is used for evaluating the performance of a classifier whose


output is a probability value between 0 and 1.
o For a good binary Classification model, the value of log loss should
be near to 0.
o The value of log loss increases if the predicted value deviates from
the actual value.
o The lower log loss represents the higher accuracy of the model.
o For Binary classification, cross-entropy can be calculated as:

1. −(y log(p) + (1 − y) log(1 − p))

Where y= Actual output, p= predicted output.

2. Confusion Matrix:

o The confusion matrix provides us a matrix/table as output and


describes the performance of the model.
o It is also known as the error matrix.
o The matrix consists of the prediction results in a summarized form,
showing the total numbers of correct predictions and incorrect
predictions. The matrix looks like the below table:

                         Actual Positive      Actual Negative

Predicted Positive       True Positive        False Positive

Predicted Negative       False Negative       True Negative

3. AUC-ROC curve:
o ROC curve stands for Receiver Operating Characteristics
Curve and AUC stands for Area Under the Curve.
o It is a graph that shows the performance of the classification model
at different thresholds.
o To visualize the performance of the multi-class classification model,
we use the AUC-ROC Curve.
o The ROC curve is plotted with TPR against FPR, with TPR (True
Positive Rate) on the Y-axis and FPR (False Positive Rate) on the X-axis.
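A minimal sketch of computing the ROC curve and AUC with scikit-learn
(illustrative; it assumes a fitted binary classifier clf that exposes predict_proba,
and the x_test/y_test arrays from the earlier examples):

from sklearn.metrics import roc_curve, roc_auc_score

# clf is assumed to be an already fitted binary classifier
y_scores = clf.predict_proba(x_test)[:, 1]           # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, y_scores)   # points of the ROC curve
auc = roc_auc_score(y_test, y_scores)                # area under that curve
print("AUC:", auc)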

Use cases of Classification Algorithms


Classification algorithms can be used in different places. Below are some
popular use cases of Classification Algorithms:

o Email Spam Detection


o Speech Recognition
o Identifications of Cancer tumor cells.
o Drugs Classification
o Biometric Identification, etc.

Logistic regression is a supervised learning algorithm used for classification


tasks, predicting a categorical outcome (e.g., Yes/No) based on independent
variables. Unlike linear regression, which predicts continuous values, logistic
regression outputs probabilities between 0 and 1, used to classify data into
discrete categories.

o Logistic Regression can be used to classify the observations using


different types of data and can easily determine the most effective
variables used for the classification. The below image is showing the
logistic function:
Note: Logistic regression uses the concept of predictive modeling as regression;
therefore, it is called logistic regression, but it is used to classify samples;
therefore, it falls under the classification algorithms.

Logistic Function (Sigmoid Function):


o The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The value of the logistic regression must be between 0 and 1, which
cannot go beyond this limit, so it forms a curve like the "S" form. The S-
form curve is called the Sigmoid function or the logistic function.
o In logistic regression, we use the concept of the threshold value, which
defines the probability of either 0 or 1. Values above the threshold
tend to 1, and values below the threshold tend to 0.

Logistic Regression Equation:


The Logistic regression equation can be obtained from the Linear
Regression equation. The mathematical steps to get Logistic Regression
equations are given below:

o We know the equation of the straight line can be written as:

y = b0 + b1x1 + b2x2 + ... + bnxn

o In Logistic Regression y can be between 0 and 1 only, so let's
divide the above equation by (1-y):

y / (1 - y)   (0 for y = 0, and infinity for y = 1)

o But we need a range between -[infinity] and +[infinity], so taking the


logarithm of the equation, it becomes:

log[ y / (1 - y) ] = b0 + b1x1 + b2x2 + ... + bnxn

Type of Logistic Regression:


On the basis of the categories, Logistic Regression can be classified into
three types:

o Binomial: In binomial Logistic regression, there can be only two possible


types of the dependent variables, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial Logistic regression, there can be 3 or more
possible unordered types of the dependent variable, such as "cat", "dogs",
or "sheep"
o Ordinal: In ordinal Logistic regression, there can be 3 or more possible
ordered types of dependent variables, such as "low", "Medium", or "High"
o K-Nearest Neighbour is one of the simplest Machine Learning
algorithms based on Supervised Learning technique.
o K-NN algorithm assumes the similarity between the new case/data
and the available cases and puts the new case into the category that is
most similar to the available categories.
o K-NN algorithm stores all the available data and classifies a new
data point based on similarity. This means that when new data
appears, it can be easily classified into a well-suited category by
using the K-NN algorithm.
o K-NN algorithm can be used for Regression as well as for
Classification but mostly it is used for the Classification problems.
o K-NN is a non-parametric algorithm, which means it does not
make any assumption on underlying data.
o It is also called a lazy learner algorithm because it does not learn
from the training set immediately instead it stores the dataset and
at the time of classification, it performs an action on the dataset.
o The KNN algorithm at the training phase just stores the dataset, and
when it gets new data, it classifies that data into a category
that is most similar to the new data.
o Example: Suppose we have an image of a creature that looks
similar to a cat and a dog, but we want to know whether it is a cat or a dog.
For this identification, we can use the KNN algorithm, as it works
on a similarity measure. Our KNN model will compare the features
of the new image with the cat and dog images and, based on the
most similar features, put it in either the cat or the dog category.

How does K-NN work?


The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of the neighbors


o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
o Step-4: Among these k neighbors, count the number of the data points in
each category.
o Step-5: Assign the new data points to that category for which the number
of the neighbor is maximum.
o Step-6: Our model is ready.
o Next, we will calculate the Euclidean distance between the data
points. The Euclidean distance is the distance between two points,
which we have already studied in geometry. Between two points
A(x1, y1) and B(x2, y2), it can be calculated as: sqrt((x2 - x1)^2 + (y2 - y1)^2)
How to select the value of K in the K-NN
Algorithm?
Below are some points to remember while selecting the value of K in the
K-NN algorithm:

o There is no particular way to determine the best value for "K", so we need
to try some values to find the best out of them. The most preferred value
for K is 5.
o A very low value for K such as K=1 or K=2, can be noisy and lead to the
effects of outliers in the model.
o Large values for K reduce the effect of noise, but a very large K may blur the
boundaries between the categories.

Advantages of KNN Algorithm:


o It is simple to implement.
o It is robust to the noisy training data
o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:


o We always need to determine the value of K, which may sometimes be
complex.
o The computation cost is high because of calculating the distance between
the data points for all the training samples.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)

#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
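The classifier itself is not shown above. A minimal sketch of fitting and evaluating
K-NN on this prepared data (K=5 and the Minkowski metric with p=2, i.e. Euclidean
distance, are common defaults chosen here as assumptions):

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)       # class of the majority of the 5 nearest neighbours
print(confusion_matrix(y_test, y_pred))   # summarizes correct and incorrect predictions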
Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems.
However, primarily, it is used for Classification problems in Machine Learning.

SVM algorithm can be used for Face detection, image classification,


text categorization, etc.

Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which
means that if a dataset can be classified into two classes by using a
single straight line, then such data is termed linearly separable
data, and the classifier used is called the Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separable
data, which means that if a dataset cannot be classified by using a
straight line, then such data is termed non-linear data, and the
classifier used is called the Non-linear SVM classifier.

Hyperplane: There can be multiple lines/decision boundaries to segregate the


classes in n-dimensional space, but we need to find out the best decision
boundary that helps to classify the data points. This best boundary is known as
the hyperplane of SVM.

How does SVM works?


Linear SVM:

The working of the SVM algorithm can be understood by using an


example. Suppose we have a dataset that has two tags (green and blue),
and the dataset has two features, x1 and x2. We want a classifier that can
classify the pair (x1, x2) of coordinates as either green or blue. Consider
the below image.
As it is a 2-D space, just by using a straight line we can easily separate these
two classes. But there can be multiple lines that can separate these classes.
Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this
best boundary or region is called a hyperplane. The SVM algorithm finds the
closest points of the lines from both classes. These points are called support
vectors. The distance between the vectors and the hyperplane is called
the margin. And the goal of SVM is to maximize this margin.
The hyperplane with the maximum margin is called the optimal hyperplane.
Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight


line, but for non-linear data, we cannot draw a single straight line.
Consider the below image:

So to separate these data points, we need to add one more dimension. For
linear data, we have used two dimensions x and y, so for non-linear data,
we will add a third dimension z. It can be calculated as:
z = x^2 + y^2
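A minimal sketch of both SVM variants with scikit-learn (illustrative; the kernel
choice is the assumption here, with the RBF kernel playing the role of the extra
dimension for non-linear data, and x_train/y_train are the prepared arrays from
the earlier examples):

from sklearn.svm import SVC

linear_clf = SVC(kernel='linear')     # linear SVM for linearly separable data
linear_clf.fit(x_train, y_train)

rbf_clf = SVC(kernel='rbf')           # non-linear SVM: the kernel maps data to a higher dimension
rbf_clf.fit(x_train, y_train)

print(rbf_clf.support_vectors_[:5])   # a few of the support vectors found by the model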

Naïve Bayes Classifier Algorithm


o Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes theorem and used for solving classification
problems.

Why is it called Naïve Bayes?


The Naïve Bayes algorithm is comprised of two words, Naïve and Bayes,
which can be described as:

o Naïve: It is called Naïve because it assumes that the occurrence of a


certain feature is independent of the occurrence of other features. For example,
if a fruit is identified on the basis of color, shape, and taste, then a red,
spherical, and sweet fruit is recognized as an apple. Hence each feature
individually contributes to identifying it as an apple, without depending on
the other features.
o Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is
used to determine the probability of a hypothesis with prior knowledge. It
depends on the conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = ( P(B|A) × P(A) ) / P(B)

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the


observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that


the hypothesis is true.

P(A) is Prior Probability: Probability of hypothesis before observing the


evidence.

P(B) is Marginal Probability: Probability of Evidence.
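A minimal sketch of a Naïve Bayes classifier with scikit-learn (illustrative; Gaussian
Naïve Bayes is chosen as an assumption, suitable for continuous features such as
age and salary, and the x_train/x_test arrays follow the earlier examples):

from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier.fit(x_train, y_train)      # estimates P(feature | class) assuming feature independence
y_pred = classifier.predict(x_test)   # picks the class with the highest posterior probability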

Regression vs. Classification in Machine


Learning
Regression and Classification algorithms are Supervised Learning
algorithms. Both the algorithms are used for prediction in Machine
learning and work with the labeled datasets. But the difference between
both is how they are used for different machine learning problems.

The main difference between Regression and Classification algorithms is


that Regression algorithms are used to predict continuous values,
such as price, salary, age, etc., and Classification algorithms are used
to predict/classify discrete values, such as Male or Female, True or
False, Spam or Not Spam, etc.

Consider the below diagram:


Regression Algorithm                               Classification Algorithm

In Regression, the output variable must be        In Classification, the output variable
of continuous nature or real value.               must be a discrete value.

1). Why does overfitting occur?


The possibility of overfitting occurs when the criteria used for training the
model is not as per the criteria used to judge the efficiency of a model.

2). What is the method to avoid overfitting?


Overfitting occurs when we have a small dataset and a model is trying to
learn from it. By using a large amount of data, overfitting can be avoided.
But if we have a small dataset and are forced to build a model based on
that, then we can use a technique known as cross-validation. In this
method, a model is usually given a dataset of known data on which the
training is run and a dataset of unknown data against which the
model is tested. The primary aim of cross-validation is to define a dataset
to "test" the model in the training phase. If there is sufficient data,
'Isotonic Regression' is used to prevent overfitting.

3). What is the meaning of Overfitting in Machine learning?


Overfitting can be seen in machine learning when a statistical model
describes random error or noise instead of the underlying relationship.
Overfitting is usually observed when a model is excessively complex. It
happens because of having too many parameters relative to the amount
of training data. A model that has been overfitted displays poor
performance on new data.
4). Differentiate between inductive learning and deductive
learning?
In inductive learning, the model learns by example from a set of
observed instances to draw a generalized conclusion. On the other side, in
deductive learning, the model starts from already known rules or
conclusions and applies them to draw conclusions about specific cases.

5). How is KNN different from k-means?


KNN or K nearest neighbors is a supervised algorithm which is used for
classification purposes. In KNN, a test sample is assigned the class of the
majority of its nearest neighbors. On the other side, K-means is an
unsupervised algorithm which is mainly used for clustering. K-means
clustering needs only a set of unlabeled points and a threshold. The
algorithm takes the unlabeled data and learns how to cluster it into
groups by computing the mean of the distances between the different
unlabeled points.

What are the different types of Algorithm methods in


Machine Learning?
The different types of algorithm methods in machine learning are:

o Supervised Learning
o Semi-supervised Learning
o Unsupervised Learning
o Transduction
o Reinforcement Learning

What is the trade-off between bias and variance?


Both bias and variance are errors. Bias is an error due to erroneous or
overly simplistic assumptions in the learning algorithm. It can lead to the
model under-fitting the data, making it hard to have high predictive
accuracy and generalize the knowledge from the training set to the test
set.

Variance is an error due to too much complexity in the learning algorithm.


It leads to the algorithm being highly sensitive to high degrees of variation
in the training data, which can lead the model to overfit the data.

To optimally reduce the number of errors, we will need to tradeoff bias


and variance.
Five popular algorithms are:

o Decision Trees
o Probabilistic Networks
o Neural Networks
o Support Vector Machines
o Nearest Neighbor

What do you mean by ensemble learning?


Numerous models, such as classifiers, are strategically made and combined to
solve a specific computational problem; this is known as ensemble learning.
The ensemble methods are also known as committee-based learning or learning
multiple classifier systems.

What is a model selection in Machine Learning?


The process of choosing among diverse mathematical models
which are used to describe the same data is known as Model Selection.
Model selection is applied to the fields of statistics, data mining,
and machine learning.

Describe Precision and Recall?


Precision and Recall are both measures used in the
information retrieval domain to measure how well an information
retrieval system retrieves the relevant data as requested by the user.

Precision can be described as a positive predictive value. It is the fraction of


relevant instances among the retrieved instances.

On the other side, recall is the fraction of relevant instances that have
been retrieved over the total amount of relevant instances. Recall is
also known as sensitivity.
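In terms of the confusion matrix entries described later (TP, FP, FN), these are
commonly computed with the standard formulas (not given in the original text):

Precision = TP / (TP + FP)

Recall (Sensitivity) = TP / (TP + FN)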

The classification methods that SVM can handle are:

o Combining binary classifiers


o Modifying binary to incorporate multiclass learning

What do you understand by the Confusion Matrix?


A confusion matrix is a table which is used for summarizing the
performance of a classification algorithm. It is also known as the error
matrix.
Where,

TN= True Negative


TP= True Positive
FN= False Negative
FP= False Positive
Explain True Positive, True Negative, False Positive, and False Negative in
Confusion Matrix with an example.
True Positive
When a model correctly predicts the positive class, it is said to be a true positive.
For example, Umpire gives a Batsman NOT OUT when he is NOT OUT.
True Negative
When a model correctly predicts the negative class, it is said to be a true
negative.
For example, Umpire gives a Batsman OUT when he is OUT.
False Positive
When a model incorrectly predicts the positive class, it is said to be a false
positive. It is also known as 'Type I' error.
For example, Umpire gives a Batsman NOT OUT when he is OUT.
False Negative
When a model incorrectly predicts the negative class, it is said to be a false
negative. It is also known as 'Type II' error.
For example, Umpire gives a Batsman OUT when he is NOT OUT.
What according to you, is more important between model accuracy and model
performance?

Model accuracy is a subset of model performance. The accuracy of the model is directly
proportional to the performance of the model. Thus, the better the performance of the model,
the more accurate are the predictions.

What is Bagging and Boosting?


o Bagging is a process in ensemble learning which is used for improving
unstable estimation or classification schemes.
o Boosting methods are used sequentially to reduce the bias of the
combined model.

What are the similarities and differences between bagging


and boosting in Machine Learning?
Similarities of Bagging and Boosting

o Both are ensemble methods to get N learners from 1 learner.


o Both generate several training data sets with random sampling.
o Both generate the final result by taking the average of the N learners.
o Both reduce variance and provide higher scalability.

Differences between Bagging and Boosting

o In Bagging, the individual models are built independently, whereas in
Boosting, each new model tries to perform well where the previous models fail.
o Only Boosting determines the weight for the data, to tip the scales in favor
of the most challenging cases.
o Only Boosting tries to reduce bias. Bagging may solve the
problem of over-fitting, while boosting can increase it.

Describe dimension reduction in machine learning.


Dimension reduction is the process which is used to reduce the number of
random variables under consideration.

Dimension reduction can be divided into feature selection and extraction.


Why instance-based learning algorithm sometimes
referred to as Lazy learning algorithm?
In machine learning, lazy learning can be described as a method where
induction and generalization processes are delayed until classification is
performed. Because of the same property, an instance-based learning
algorithm is sometimes called lazy learning algorithm.

What do you understand by the F1 score?


The F1 score represents a measure of a model's performance. It is
the harmonic mean of the precision and recall of a model.
Results tending to 1 are considered the best, and those tending
to 0 are the worst. It can be used in classification tests where true
negatives don't matter much.
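The standard formula (not given in the original text) is:

F1 = 2 × (Precision × Recall) / (Precision + Recall)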

What do you understand by Underfitting?


Underfitting is an issue when we have a high error in both the training set
and the testing set. Underfitted algorithms may be easier to interpret, but they
fail to make good predictions.

When does regularization become necessary in Machine


Learning?
Regularization is necessary whenever the model begins to
overfit. It adds a cost term for the extra features to the objective
function. Hence, it tries to push the coefficients for many variables towards zero
and reduce the cost term. It helps to reduce model complexity so that the
model can become better at predicting (generalizing).

What is Regularization? What kind of problems does


regularization solve?
A regularization is a form of regression, which constrains/ regularizes or
shrinks the coefficient estimates towards zero. In other words, it
discourages learning a more complex or flexible model to avoid the risk of
overfitting. It reduces the variance of the model, without a substantial
increase in its bias.

Regularization is used to address overfitting problems as it penalizes the


loss function by adding a multiple of an L1 (LASSO) or an L2 (Ridge) norm
of weights vector w.
Why do we need to convert categorical variables into
factor? Which functions are used to perform the
conversion?
Most machine learning algorithms require numbers as input. That is why we
convert categorical values into factors to get numerical values. We also
don't have to deal with dummy variables.

In R, the functions factor() and as.factor() are used to convert variables into
factors.

Do you think that treating a categorical variable as a


continuous variable would result in a better predictive
model?
For a better predictive model, the categorical variable can be considered
as a continuous variable only when the variable is ordinal in nature.
