AI&ML Unit 5
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only for the respective group /
learning community. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
22IT401
ARTIFICIAL
INTELLIGENCE AND
MACHINE LEARNING
Department: Information Technology
Batch/Year: 2022-2026 / II
Created by: Dr. T. Mahalingam
Dr. S. Selvakanmani
Date: 12-02-2024
Table of Contents
S.NO. | CONTENTS | SLIDE NO.
1 | CONTENTS | 5
2 | COURSE OBJECTIVES | 7
24 | ASSIGNMENT 1 - UNIT 4 | 64
30 | ASSESSMENT SCHEDULE | 72
PRE-REQUISITE CHART
22IT401-ARTIFICIAL INTELLIGENCE
AND MACHINE LEARNING
22MA401- Probability
and Statistics
Introduction: What Is AI, The Foundations Of Artificial Intelligence, The History Of Artificial
Intelligence, The State Of The Art. Intelligent Agents: Agents And Environments, Good Behaviour:
The Concept Of Rationality, The Nature Of Environments, And The Structure Of Agents. Solving
Problems By Searching: Problem-Solving Agents, Uninformed Search Strategies, Informed
(Heuristic) Search Strategies, Heuristic Functions. Beyond Classical Search: Local Search
Algorithms and Optimization Problems, Searching With Nondeterministic Actions And Partial
Observations, Online Search Agents And Unknown Environments. Constraint Satisfaction
Problems: Definition, Constraint Propagation, Backtracking Search, Local Search, The Structure Of
Problems.
List of Exercise/Experiments
1. Implementation of forward and backward chaining.
2. Implementation of unification algorithms.
22IT401 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
UNIT 3 LEARNING (L T P C: 3 0 0 3)
Learning from Examples: Forms of Learning, Supervised Learning, Learning Decision
Trees, Evaluating and Choosing the Best Hypothesis, The Theory of Learning, Regression
and Classification with Linear Models, Artificial Neural Networks. Applications: Human
computer interaction (HCI), Knowledge management technologies, AI for customer
relationship management, Expert systems, Data mining, text mining, and Web mining,
Other current topics.
List of Exercise/Experiments
1. Numpy Operations
2. NumPy arrays
4. NumPy Exercise:
(i) Write code to create a 4x3 matrix with values ranging from 2 to 13.
(ii) Write code to replace the odd numbers by -1 in the following array.
(iii) Perform the following operations on an array of mobile phone prices: 6999,
7500, 11999, 27899, 14999, 9999.
e) Apply GST of 18% on the mobile phone prices and update this array.
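A minimal sketch of how these NumPy exercises might be approached (the array names and the illustrative array used in step (ii) are assumptions, not part of the syllabus):
import numpy as np

# (i) a 4x3 matrix with values ranging from 2 to 13
matrix = np.arange(2, 14).reshape(4, 3)

# (ii) replace the odd numbers by -1 (shown on an illustrative array)
arr = np.arange(10)
arr[arr % 2 == 1] = -1

# (iii) e) apply GST of 18% to the mobile phone prices and update the array
prices = np.array([6999, 7500, 11999, 27899, 14999, 9999], dtype=float)
prices = prices * 1.18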
TOTAL : 45 PERIODS
22IT401 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
UNIT 4 FUNDAMENTALS OF MACHINE LEARNING (L T P C: 3 0 0 3)
Motivation for Machine Learning, Applications, Machine Learning, Learning associations,
Classification, Regression, The Origin of machine learning, Uses and abuses of machine
learning, Success cases, How do machines learn, Abstraction and knowledge
representation, Generalization, Factors to be considered, Assessing the success of
learning, Metrics for evaluation of classification method, Steps to apply machine learning
to data, Machine learning process, Input data and ML algorithm, Classification of machine
learning algorithms, General ML architecture, Group of algorithms, Reinforcement
learning, Supervised learning, Unsupervised learning, Semi-Supervised learning,
Algorithms, Ensemble learning, Matching data to an appropriate algorithm.
List of Exercise/Experiments
1. Build linear regression models to predict housing prices using Python, using a dataset
available in Google Colab.
2. Stock Ensemble-based Neural Network for Stock Market Prediction using Historical Stock
Data and Sentiment Analysis.
22IT401 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING (L T P C: 3 0 0 3)
List of Exercise/Experiments
Use Cases
Case Study 1: Churn Analysis and Prediction (Survival Modelling)
Cox-proportional models
Churn Prediction
Imbalanced Data
Neural Network
Case study 3: Sentiment Analysis or Topic Mining from New York Times
Part-of-Speech Tagging
A/B testing
User based
Item Based
Segmentation Strategies
Lifetime Value
Risk Profiling
Portfolio Optimization
Graph Construction
Route Optimization
5. COURSE OUTCOME
Course Code | Course Outcome Statement | Cognitive/Affective Level of the Course Outcome | Expected Level of Attainment
Course Outcome Statements in Cognitive Domain
6. CO-PO/PSO MAPPING
C211.5 (K3): 3, 2, 1, 1, 3, 2, 3
C211: 2.8, 2, 1.2, 1.2, 3, 0.8, 3
UNIT V
LECTURE PLAN – UNIT V
UNIT 5 INTRODUCTION
Sl. No | Topic | Proposed Lecture Period | Actual Lecture Period | No. of Periods | Pertaining CO(s) | Taxonomy Level | Mode of Delivery
1 | Supervised Learning, Regression, Linear regression, Multiple linear regression, A multiple regression analysis | 21.03.2024 (2) | 21.03.2024 (2) | 1 | CO2 | K2 | MD1
3 | Overfitting, Detecting overfit models: Cross validation | 26.03.2024 (1) | 26.03.2024 (1) | 1 | CO2 | K2 | MD1
5 | Decision trees: Background, Decision trees, Decision trees for credit card promotion | 11.04.2024 (2) | 11.04.2024 (2) | 1 | CO2 | K3 | MD1
LECTURE NOTES
UNIT 5
In supervised learning, the training data provided to the machine works as the supervisor that
teaches the machine to predict the output correctly. It applies the same concept as a student
learning under the supervision of a teacher.
Supervised learning is a process of providing input data as well as correct output data to the
machine learning model. The aim of a supervised learning algorithm is to find a mapping
function to map the input variable(x) with the output variable(y).
In the real-world, supervised learning can be used for Risk Assessment, Image classification,
Fraud Detection, spam filtering, etc.
In supervised learning, models are trained using a labelled dataset, where the model learns about
each type of data. Once the training process is completed, the model is tested on the basis of
test data (a held-out subset of the dataset), and then it predicts the output.
The working of Supervised learning can be easily understood by the below example and
diagram:
Suppose we have a dataset of different types of shapes which includes square, rectangle,
triangle, and Polygon. Now the first step is that we need to train the model for each shape.
o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify
the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies
the shape on the basis of the number of sides, and predicts the output.
o Determine the suitable algorithm for the model, such as support vector machine,
decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need validation sets as
the control parameters, which are the subset of training datasets.
o Evaluate the accuracy of the model by providing the test set. If the model predicts the
correct output, then our model is accurate.
1. Regression
Regression algorithms are used if there is a relationship between the input variable and the
output variable. It is used for the prediction of continuous variables, such as Weather
forecasting, Market Trends, etc. Below are some popular Regression algorithms which come
under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
2. Classification
Classification algorithms are used when the output variable is categorical, for example when there
are two classes such as Yes-No, Male-Female, True-False, etc., as in spam filtering.
Popular classification algorithms which come under supervised learning include:
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines
Advantages of Supervised learning:
o With the help of supervised learning, the model can predict the output on the basis of
prior experiences.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning model helps us to solve various real-world problems such as fraud
detection, spam filtering, etc.
Disadvantages of Supervised learning:
o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is different from
the training dataset.
We can understand the concept of regression analysis using the below example:
Example: Suppose there is a marketing company A, which runs various advertisements every
year and gets sales from them. The below list shows the advertisements made by the company in
the last 5 years and the corresponding sales:
Now, the company wants to do the advertisement of $200 in the year 2019 and wants to know
the prediction about the sales for this year. So to solve such type of prediction problems in
machine learning, we need regression analysis.
Regression is a supervised learning technique which helps in finding the correlation between
variables and enables us to predict the continuous output variable based on the one or more
predictor variables. It is mainly used for prediction, forecasting, time series modeling, and
determining the causal-effect relationship between variables.
In regression, we plot a graph between the variables that best fits the given datapoints; using
this plot, the machine learning model can make predictions about the data. In simple
words, "Regression shows a line or curve that passes through all the datapoints on target-
predictor graph in such a way that the vertical distance between the datapoints and the
regression line is minimum." The distance between datapoints and line tells whether a model
has captured a strong relationship or not.
o Regression estimates the relationship between the target and the independent variable.
Types of Regression
There are various types of regressions which are used in data science and machine learning.
Each type has its own importance on different scenarios, but at the core, all the regression
methods analyze the effect of the independent variable on dependent variables. Here we are
discussing some important types of regression which are given below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest and most widely used regression algorithms; it models the
relationship between continuous variables.
o It is represented by the equation Y = aX + b, where a is the slope of the line and b is the intercept.
o Example application: salary forecasting.
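A minimal sketch of fitting Y = aX + b with scikit-learn on made-up experience/salary data (the numbers below are illustrative only):
import numpy as np
from sklearn.linear_model import LinearRegression

# illustrative data: years of experience (X) vs. salary (Y)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([30000, 35000, 41000, 45000, 52000])

model = LinearRegression()
model.fit(X, y)
a, b = model.coef_[0], model.intercept_   # slope a and intercept b in Y = aX + b
print(model.predict([[6]]))               # salary forecast for 6 years of experience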
Regression models are used to describe relationships between variables by fitting a line to the
observed data. Regression allows you to estimate how a dependent variable changes as the
independent variable(s) change.
Multiple linear regression is used to estimate the relationship between two or more
independent variables and one dependent variable. You can use multiple linear regression
when you want to know:
1. How strong the relationship is between two or more independent variables and one
dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect
crop growth).
2. The value of the dependent variable at a certain value of the independent variables (e.g.
the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer
addition).
Multiple linear regression makes all of the same assumptions as simple linear regression:
Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t
change significantly across the values of the independent variable.
Independence of observations: the observations in the dataset were collected using
statistically valid sampling methods, and there are no hidden relationships among variables.
In multiple linear regression, it is possible that some of the independent variables are actually
correlated with one another, so it is important to check these before developing the regression
model. If two independent variables are too highly correlated (r2 > ~0.6), then only one of them
should be used in the regression model.
Linearity: the line of best fit through the data points is a straight line, rather than a curve or
some sort of grouping factor.
The multiple regression equation takes the form y = B0 + B1X1 + B2X2 + ... + BnXn + e, where
B0 is the intercept, each Bi is the regression coefficient of the corresponding independent
variable (one such term for however many independent variables you are testing), and e is the
model error. To find the best-fit line for each independent variable, multiple linear regression
calculates three things:
• The regression coefficients that lead to the smallest overall model error.
• The t statistic of the overall model.
• The associated p value (how likely it is that the t statistic would have occurred by
chance if the null hypothesis of no relationship between the independent and
dependent variables was true).
It then calculates the t statistic and p value for each regression coefficient in the model.
import pandas as pd
from sklearn.datasets import load_boston   # removed in recent scikit-learn; any regression dataset works
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def sklearn_to_df(data_loader):
    # convert a scikit-learn dataset object into a features DataFrame and a target Series
    X_data = data_loader.data
    X_columns = data_loader.feature_names
    X = pd.DataFrame(X_data, columns=X_columns)
    y_data = data_loader.target
    y = pd.Series(y_data, name='target')
    return X, y

X, y = sklearn_to_df(load_boston())

# hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# scikit-learn's LinearRegression stands in here for the custom
# MultipleLinearRegression class used in the original snippet
mulreg = LinearRegression()
mulreg.fit(X_train, y_train)
pred = mulreg.predict(X_test)

# calculate r2_score on the held-out test data
print(r2_score(y_test, pred))
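The scikit-learn snippet above yields the fitted coefficients and an R-squared score; to also obtain the t statistic and p value for each regression coefficient described earlier, a library such as statsmodels can be used. A minimal sketch, assuming X and y are the DataFrame and Series produced above:
import statsmodels.api as sm

X_const = sm.add_constant(X)          # add the intercept term
ols_model = sm.OLS(y, X_const).fit()
print(ols_model.summary())            # per-coefficient t statistics and p values
print(ols_model.rsquared)             # overall model fit (R-squared)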
When predicting a complex process's outcome, it is best to use multiple linear regression
instead of simple linear regression.
A simple linear regression can accurately capture the relationship between two variables in
simple relationships. On the other hand, multiple linear regression can capture more complex
interactions that require more thought.
A multiple regression model uses more than one independent variable. It does not suffer from
the same limitations as the simple regression equation, and it is thus able to fit curved and non-
linear relationships. The following are the uses of multiple linear regression.
1. Planning and Control.
2. Prediction or Forecasting.
Estimating relationships between variables can be exciting and useful. As with all other
regression models, the multiple regression model assesses relationships among variables in
terms of their ability to predict the value of the dependent variable.
Why and When to Use Multiple Regression Over a Simple OLS Regression?
When you're trying to predict something, it's usually helpful to start with a linear model. But
sometimes things aren't so simple.
Multiple regression is used when you want to predict a dependent variable using more than one
independent variable. It is the same family of models as ordinary least squares (OLS)
regression. A simple OLS regression, on the other hand, isolates the effect of a single explanatory
variable on a continuous dependent variable by relating changes in the dependent variable to
changes in the value of that explanatory variable.
MLR can use more than one explanatory variable at once. This allows you to make better
predictions about what might happen in your data if certain changes were made.
Examples : Example #1
Let us try and understand the concept of multiple regression analysis with the help of an
example. But, first, let us try to find out the relation between the distance covered by an
UBER driver and the age of the driver, and the number of years of experience of the
driver.
To calculate multiple regression, go to the “Data” tab in Excel and select the “Data Analysis”
option. For further procedure and calculation, refer to the: Analysis ToolPak in Excel article.
1. y = m1x1 + m2x2 + b
2. y = 604.17 × (−3.18) + 604.17 × (−4.06) + 0
3. y ≈ −4377
In this particular example, we will see which variable is the dependent variable and which
variable is the independent variable. The dependent variable in this regression equation is the
distance covered by the UBER driver, and the independent variables are the age of the driver
and the number of experiences he has in driving.
Example #2
Let us try and understand the concept of multiple regression analysis with the help of
another example. Let us try to find the relation between the GPA of a class of students,
the number of hours of study, and the student’s height.
Go to the “Data” tab in Excel and select the “Data Analysis” option for the calculation.
y = 1.08 × 0.03 + 1.08 × (−0.002) + 0
y ≈ 0.0325
In this particular example, we will see which variable is the dependent variable and which
variable is the independent variable. The dependent variable in this regression is the GPA,
and the independent variables are study hours and the height of the students
Overfitting, Detecting overfit models: Cross validation, Cross validation: The ideal
procedure, Parameter estimation,
Overfitting
A modeling error that occurs when a function corresponds too closely to a particular set
of data
What is Overfitting?
Overfitting is a term used in statistics that refers to a modeling error that occurs when a function
corresponds too closely to a particular set of data. As a result, overfitting may fail to fit
additional data, and this may affect the accuracy of predicting future observations.
Overfitting can be identified by checking validation metrics such as accuracy and loss.
Validation accuracy usually improves until a point where it stagnates or starts declining once
the model begins to overfit (a validation loss curve shows the mirror image: it falls and then
starts rising). During the upward trend, the model is still seeking a good fit; once that fit is
achieved and overfitting sets in, the trend stagnates or reverses.
Summary
• Overfitting is a modeling error that introduces bias to the model because it is too closely
related to the data set.
• Overfitting makes the model relevant to its data set only, and irrelevant to any other
data sets.
• Some of the methods used to prevent overfitting include ensembling, data
augmentation, data simplification, and cross-validation.
How to Detect Overfitting?
Detecting overfitting is almost impossible before you test the model on unseen data. Testing
addresses the inherent characteristic of overfitting, which is the inability to generalize to new
data sets. The data can, therefore, be separated into different subsets to make it easy for
training and testing. The data is split into two main parts, i.e., a test set and a training set.
The training set represents a majority of the available data (about 80%), and it trains the model.
The test set represents a small portion of the data set (about 20%), and it is used to test the
accuracy of the data it never interacted with before. By segmenting the dataset, we can examine
the performance of the model on each set of data to spot overfitting when it occurs, as well as
see how the training process works.
The performance can be measured using the percentage of accuracy observed in both data
sets to conclude on the presence of overfitting. If the model performs better on the training set
than on the test set, it means that the model is likely overfitting.
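A hedged sketch of this train/test comparison (the dataset and classifier below are arbitrary choices used only to show the pattern):
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# 80% training set, 20% test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = DecisionTreeClassifier(random_state=0)   # an unpruned tree tends to overfit
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
# a large gap (e.g. near-perfect training accuracy but much lower test accuracy) suggests overfitting
print(train_acc, test_acc)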
1. Training with more data
One of the ways to prevent overfitting is by training with more data. Such an option makes it
easy for algorithms to detect the signal better to minimize errors. As the user feeds more
training data into the model, it will be unable to overfit all the samples and will be forced to
generalize to obtain results.
Users should continually collect more data as a way of increasing the accuracy of the model.
However, this method is considered expensive, and, therefore, users should ensure that the data
being used is relevant and clean.
2. Data augmentation
An alternative to training with more data is data augmentation, which is less expensive
compared to the former. If you are unable to continually collect more data, you can make the
available data sets appear diverse.
Data augmentation makes a sample data look slightly different every time it is processed by
the model. The process makes each data set appear unique to the model and prevents the model
from learning the characteristics of the data sets.
Another option that works in the same way as data augmentation is adding noise to the input
and output data. Adding noise to the input makes the model become stable, without affecting
data quality and privacy, while adding noise to the output makes the data more diverse.
However, noise addition should be done with moderation so that the extent of the noise is not
so much as to make the data incorrect or too different.
3. Data simplification
Overfitting can occur due to the complexity of a model, such that, even with large volumes of
data, the model still manages to overfit the training dataset. The data simplification method is
used to reduce overfitting by decreasing the complexity of the model to make it simple enough
that it does not overfit.
Some of the actions that can be implemented include pruning a decision tree, reducing the
number of parameters in a neural network, and using dropout on a neural network. Simplifying
the model can also make the model lighter and run faster.
4. Ensembling
Ensembling is a machine learning technique that works by combining predictions from two or
more separate models. The most popular ensembling methods include boosting and bagging.
Boosting works by using simple base models to increase their aggregate complexity. It trains
a large number of weak learners arranged in a sequence, such that each learner in the
sequence learns from the mistakes of the learner before it.
Boosting combines all the weak learners in the sequence to bring out one strong learner. The
other ensembling method is bagging, which is the opposite of boosting. Bagging works by
training a large number of strong learners arranged in a parallel pattern and then combining
them to optimize their predictions.
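A minimal scikit-learn sketch of both ensembling styles described above (the estimators and parameter values are illustrative choices, not recommendations):
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

# bagging: many learners trained in parallel on bootstrap samples of the data
bagging = BaggingClassifier(n_estimators=50)        # default base learner is a decision tree

# boosting: a sequence of weak learners, each one correcting its predecessor's mistakes
boosting = GradientBoostingClassifier(n_estimators=100, max_depth=2)

# both follow the usual fit/predict interface, e.g.:
# bagging.fit(X_train, y_train); print(bagging.score(X_test, y_test))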
Another indicator of an overfit regression model is predicted R-squared, which has several
convenient features. First, it is included in the output as you fit the model, without any extra
steps on your part. Second, it is easy to interpret: you simply compare predicted R-squared to
the regular R-squared and see if there is a big difference.
What is cross-validation?
Cross-validation is a resampling technique that repeatedly trains candidate models on some
folds of the data and evaluates them on the remaining fold, so that several approaches can be
compared fairly. The idea is to select the approach that maximizes performance; this is the
model that will be deployed into production. Besides, you also want to get a reliable estimate
of that model's performance.
Suppose you do cross-validation to select a model. You test many alternatives using 5-fold
cross-validation. Then, a linear regression comes out on top.
Should you re-train the linear regression using all available data? or should you use the models
trained during cross-validation?
This part creates some confusion among data scientists — not only among beginners but also
among more seasoned professionals.
After cross-validation, you should re-train the best approach using all available data. Here’s a
quote taken from the legendary book Elements of Statistical Learning [1](parenthesis mine):
Our final chosen model [after cross-validation] is f(x), which we then fit to all the data.
Some practitioners keep the best models trained during cross-validation. Following the example
above, you’d keep 5 linear regression models. Then, during the deployment stage, you’d
average their predictions for each prediction.
Fewer data
By not re-training, you’re not using all available instances for creating a model.
This can lead to a sub-optimal model unless you have tons of data. Training with all available
instances is likely to generalize better.
Re-training is especially important in time series because the most recent observations are used
for testing. By not re-training in these, the model might miss newly emerged patterns.
Increased costs
One can argue that combining the 5 models trained during cross-validation leads to better
performance.
Yet, it’s important to understand the implications. You’re no longer using a simple,
interpretable, linear regression.
Your model is an ensemble whose individual models are trained by random subsampling.
Random subsampling is a way of introducing diversity in ensembles. Ensembles often perform
better than single models. But, they also lead to extra costs and lower transparency.
Keeping only one of the models trained during cross-validation would solve the problem of
increased costs. Yet, it's not clear which version of the model you should choose.
There are two reasons re-training can be skipped. If the data set is large or if re-training is too
costly. These two issues are often linked.
Here’s an example of how you can re-train the best model after cross-validation:
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# X, y: the full available training data (assumed already defined)
# 5-fold cross-validation
cv = KFold(n_splits=5)
# optimizing the number of trees of a RF
model = RandomForestRegressor()
param_search = {'n_estimators': [10, 50, 100]}
# with refit=True (the default), the best model is re-trained after cross-validation
gs = GridSearchCV(estimator=model, param_grid=param_search, cv=cv, refit=True)
gs.fit(X, y)
The goal is to optimize the number of trees in a Random Forest. This is done with
the GridSearchCV class from scikit-learn. You can set the parameter refit=True, and the best
model is re-trained after cross-validation automatically.
You can do this explicitly by getting the best parameters from GridSearchCV to initialize a
new model:
best_model = RandomForestRegressor(**gs.best_params_)
best_model.fit(X, y)
Cross-validation and re-training cover the first two points, but not the third.
Why is that?
Cross-validation is often repeated several times before selecting a final model. You test
different transformations and hyperparameters. So, you end up adjusting your method until
you’re happy with the result.
This can lead to overfitting because the details of the validation sets can leak into the model.
Thus, the performance estimate you get from cross-validation can be too optimistic. You can
read more about this in the article in reference [2].
This is one of the reasons why Kaggle competitions have two leaderboards, one public and
another private. This prevents competitors from overfitting the test set.
You should make an extra evaluation step. After cross-validation, you evaluate the selected
model in a held-out test set. The full workflow is like this:
1. Split the available data into a training set and a held-out test set;
2. Run cross-validation on the training set to compare approaches and select the best one;
3. Re-train the chosen model using the training data and evaluate it on the test set.
This provides you with an unbiased performance estimate;
4. Re-train the chosen model using all available data and deploy it.
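A compact sketch of this four-step workflow (X and y stand for the full feature matrix and target, and the model grid is illustrative):
from sklearn.model_selection import train_test_split, GridSearchCV, KFold
from sklearn.ensemble import RandomForestRegressor

# 1. hold out a test set that cross-validation never sees
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# 2. model selection with 5-fold cross-validation on the training data only
gs = GridSearchCV(RandomForestRegressor(),
                  {'n_estimators': [10, 50, 100]},
                  cv=KFold(n_splits=5))
gs.fit(X_train, y_train)

# 3. unbiased performance estimate on the held-out test set
print(gs.score(X_test, y_test))

# 4. re-train the chosen configuration on all available data before deployment
final_model = RandomForestRegressor(**gs.best_params_).fit(X, y)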
Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve the
classification problems. In classification problems, we have dependent variables in a
binary or discrete format such as 0 or 1.
o Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes
or No, True or False, Spam or not spam, etc.
o It uses the concept of threshold levels: values above the threshold level are rounded up to 1,
and values below the threshold level are rounded down to 0.
o There are three types of logistic regression:
o Binary (0/1, pass/fail)
o Multi(cats, dogs, lions)
o Ordinal(low, medium, high)
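A minimal sketch of binary logistic regression and the threshold idea described above (the 0.5 threshold and the toy data are illustrative assumptions):
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy data: hours studied vs. pass (1) / fail (0)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba(X)[:, 1]        # sigmoid outputs between 0 and 1
labels = (probs >= 0.5).astype(int)       # values above the threshold map to class 1
print(probs, labels)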
Imagine, you got a new job offer and need to decide whether you are going to take it or leave
it. You consider several factors; starting from Salary, distance and commute time to the
office, other perks and benefits, career growth, and so on. But you don’t choose all the
factors at one time. Your brain processes the information through a series of if-else branches
like the picture below:
You start thinking about the salary first; it becomes the main factor or starting point for
your analysis.
If salary criteria are met and it’s above $50K then do you think the commute time to the office
is more than an hour or less? If the office is nearby and you can reach it easily, then you start
thinking about whether the office offers you coffee and other perks. Gradually if all those
conditions are met, you finally go ahead and accept the offer.
The decision tree algorithm works exactly in the same fashion. In some sense, it is a real-
world replica of how a human brain makes decisions: a series of clarifying questions,
processed one at a time in a sequential manner.
To create a decision tree, the algorithm begins by considering all the available features (also
called “attributes”) of the input data. It then selects the feature that best splits the data into
different classes or categories. This process is repeated for each split, with the algorithm
choosing the feature that best divides the data at each step. The process continues until the tree
is fully grown, or until a stopping criterion is reached (such as a maximum tree depth or a
minimum number of samples in a leaf node).
Once the decision tree is created, it can be used to make predictions on new input data by
following the path down the tree based on the feature values of the input data.
Decision trees are easy to interpret and understand, and they can handle both continuous and
categorical data. However, they can be prone to overfitting, particularly if the tree is allowed
to grow too deep. To mitigate this, techniques such as pruning (removing branches from the
tree) or limiting the maximum depth of the tree can be used.
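A short sketch of these ideas using scikit-learn's CART implementation (the dataset and the depth limit are chosen only for illustration):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# limiting the maximum depth is one simple way to keep the tree from overfitting
tree = DecisionTreeClassifier(max_depth=3, criterion='entropy')
tree.fit(X, y)

# the learned splits can be inspected as readable if-then rules
print(export_text(tree))

print(tree.predict(X[:2]))   # a prediction follows one path from the root to a leaf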
• Root node: The top node of a decision tree, representing the entire population
or sample.
• Splitting: The process of dividing a node into two or more sub-nodes based on a
feature or attribute value.
• Decision node: A node that represents a decision to be made based on the value
of a feature or attribute.
• Leaf node: A terminal node that does not have any sub-nodes, representing a
classification or prediction.
• Pruning: The process of removing branches from a decision tree to reduce
overfitting and improve generalization to new data.
• Decision boundary: The line or plane that separates different classes or categories
in the data.
• Gini index: A measure of the purity of the nodes in a decision tree, based on
the proportion of samples belonging to a particular class.
• Information gain: A measure of the reduction in entropy (randomness or
uncertainty) caused by splitting the data based on a particular feature.
• Overfitting: The phenomenon where a model fits the training data too well and
does not generalize well to new data.
• Underfitting: The phenomenon where a model does not fit the training data well
and therefore performs poorly on both the training and test data.
A decision tree in general is termed a Classification and Regression Tree (CART). It can be
used for both classification problems as well as for continuous variable predictions too.
However, in this article, we will restrict ourselves to a real-world example in classification
only.
There are multiple real-life applications of Decision trees. Some examples include:
• Medical diagnosis: Make medical diagnoses based on a set of symptoms or test results.
• Credit approval: Banks and financial institutions can use decision trees to predict
the likelihood of an individual defaulting on a loan or credit card based on their
credit history and other factors.
• Marketing: Predict customer behavior and make targeted marketing campaigns
based on factors such as age, income, and purchasing history.
• Fraud detection: Identify fraudulent transactions in areas such as credit card use or
insurance claims.
• Oil reservoir characterization: Predict the characteristics of an oil reservoir based
on data such as rock type and porosity.
• Customer churn prediction: Predict the likelihood of a customer churning
(leaving a company) based on factors such as their usage patterns and customer
service interactions.
Growing a tree
The decision tree algorithm starts at the root node and progresses downward in search of the
purest set of data points. Speaking simply, the objective of a decision tree algorithm is to create
splits at different nodes such that the resulting nodes (set of observations or points) are as
homogeneous as possible.
As can be seen in the below figure, node A is an equal mix of blue and yellow dots and the
most impure node in that sense, node C is all blue and the purest set of data points, and node B
falls in-between node A and C.
Concept of Entropy and Splits:
In decision tree analysis, entropy is a measure of the impurity or randomness of a set of data.
It is commonly used to evaluate the quality of a split in a decision tree. The idea is that a split
that results in pure, homogeneous subsets (low entropy) is more useful for making accurate
predictions than a split that results in mixed or heterogeneous subsets (high entropy).
The entropy of a set is given by Entropy = - Σ p(i) log2 p(i), summed over all classes i,
where p(i) is the proportion of data points in the set that belong to class i.
For example, consider a set of data with two classes, A and B. If the data is perfectly balanced,
with 50% of the data points belonging to class A and 50% belonging to class B, the entropy
would be 1. If the data is completely imbalanced, with all data points belonging to class A or
all data points belonging to class B, the entropy would be 0.
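A small sketch that computes this entropy measure and reproduces the two cases just described:
import numpy as np

def entropy(class_proportions):
    # Entropy = -sum over classes of p(i) * log2(p(i)); zero-proportion classes contribute nothing
    p = np.array([p_i for p_i in class_proportions if p_i > 0])
    return max(0.0, float(-(p * np.log2(p)).sum()))

print(entropy([0.5, 0.5]))   # perfectly balanced two-class set -> 1.0
print(entropy([1.0, 0.0]))   # completely pure set -> 0.0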
In a decision tree, the entropy of a set of data is used to evaluate the quality of a split. A split
that results in pure, homogeneous subsets (low entropy) is more useful for making accurate
predictions than a split that results in mixed or heterogeneous subsets (high entropy). The goal
of the decision tree is to find the split that results in the lowest possible entropy, so that the
resulting subsets are as pure as possible.
While building a decision tree it becomes very important in choosing the right feature or
predictor for splitting and growing the treetop to down. To obtain the right set of features, the
concept of Information gain is used which is developed on the principle of maximum entropy
reduction while traversing from the top node to the bottom node by choosing the right set of
features.
The concept of information gain is presented below:
Say in a real-life problem, you need to decide which factor is more important among Energy
level and motivation for going to the gym. While exploring this the following set of responses
in the form of a decision tree were observed.
Therefore, it’s evident that information gain or reduction in entropy would be higher if we
chose Energy as the next feature. Therefore, the tree would select “Energy” as the next
splitting criteria.
The split with the highest information gain will be taken as the first split. The process will
continue until all children nodes are pure, or until the information gain is 0. That’s the reason,
decision tree algorithms are termed greedy algorithms. They build the tree until each and
every node becomes completely pure.
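Continuing the small entropy sketch from earlier, information gain for a candidate split is the parent node's entropy minus the weighted average entropy of its children. A toy illustration with made-up class counts:
def entropy_from_counts(counts):
    total = sum(counts)
    return entropy([c / total for c in counts])     # reuses entropy() sketched above

def information_gain(parent_counts, children_counts):
    total = sum(parent_counts)
    weighted_children = sum((sum(child) / total) * entropy_from_counts(child)
                            for child in children_counts)
    return entropy_from_counts(parent_counts) - weighted_children

# parent node with 9 positive / 5 negative samples, split into two candidate children
print(information_gain([9, 5], [[6, 1], [3, 4]]))   # roughly 0.15 bits of gain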
However, growing a tree to reach the purest set of nodes may not be always feasible owing to
computational challenges and overfitting problems on the training data. That is why the
concept of pruning comes into the picture. The growth of the decision tree can be restricted
by cutting the branches using hyperparameter tuning or by cost complexity pruning. The
details of this are outside the scope of this article. However, we should have a clear
understanding of these aspects as well while building a model using the CART algorithm.
For this family of models, the researcher needs to have at hand a dataset with some observations,
without also needing the labels/classes of those observations.
Unsupervised learning studies how systems can infer a function to describe a hidden structure
from unlabeled data. The system doesn’t predict the right output, but instead, it explores the data
and can draw inferences from datasets to describe hidden structures from unlabeled data.
Unsupervised models can be further grouped into clustering and association cases.
• Clustering: A clustering problem is where you want to unveil
the inherent groupings in the data, such as grouping animals based on some
characteristics/features e.g. number of legs.
Some examples of models that belong to this family are the following: PCA, K-means,
DBSCAN, mixture models etc.
K-Means Clustering is an unsupervised learning algorithm that is used to solve the clustering
problems in machine learning or data science. In this topic, we will learn what is K-means
clustering algorithm, how the algorithm works, along with the Python implementation of k-
means clustering.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a
way that each data point belongs to only one group of points with similar properties.
It allows us to cluster the data into different groups and is a convenient way to discover the
categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim
of this algorithm is to minimize the sum of distances between the data point and their
corresponding clusters.
The algorithm takes the unlabeled dataset as input, divides the dataset into k-number of
clusters, and repeats the process until it does not find the best clusters. The value of k should
be predetermined in this algorithm.
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. Those data points which are near to the
particular k-center, create a cluster.
Hence each cluster has datapoints with some commonalities, and it is away from other
clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those in the input dataset.)
Step-3: Assign each data point to their closest centroid, which will form the predefined K
clusters.
Step-4: Calculate the variance and place a new centroid of each cluster.
Step-5: Repeat the third step, which means reassign each datapoint to the new closest centroid
of each cluster.
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables is
given below:
o Let's take number k of clusters, i.e., K=2, to identify the dataset and to put them into
different clusters. It means here we will try to group these datasets into two different
clusters.
o We need to choose some random k points or centroid to form the cluster. These points
can be either the points from the dataset or any other point. So, here we are selecting
the below two points as k points, which are not the part of our dataset. Consider the
below image:
o Now we will assign each data point of the scatter plot to its closest K-point or
centroid. We will compute it by applying some mathematics that we have studied to
calculate the distance between two points. So, we will draw a median line between both the centroids.
From the above image, it is clear that points left side of the line is near to the K1 or blue
centroid, and points to the right of the line are close to the yellow centroid. Let's color them as
blue and yellow for clear visualization.
o As we need to find the closest cluster, so we will repeat the process by choosing a new
centroid. To choose the new centroids, we will compute the center of gravity of these
centroids, and will find new centroids as below:
o Next, we will reassign each datapoint to the new centroid. For this, we will repeat the
same process of finding a median line. The median will be like below image:
From the above image, we can see, one yellow point is on the left side of the line, and two blue
points are right to the line. So, these three points will be assigned to new centroids.
As reassignment has taken place, so we will again go to the step-4, which is finding new
centroids or K-points.
o We will repeat the process by finding the center of gravity of centroids, so the new
centroids will be as shown in the below image:
o As we got the new centroids so again will draw the median line and reassign the data
points. So, the image will be:
o We can see in the above image; there are no dissimilar data points on either side of the
line, which means our model is formed. Consider the below image:
o
As our model is ready, so we can now remove the assumed centroids, and the two final clusters
will be as shown in the below image:
The performance of the K-means clustering algorithm depends upon the highly efficient clusters
that it forms, but choosing the optimal number of clusters is a big task. There are several
ways to find the optimal number of clusters; the most widely used is the elbow method, which
plots the within-cluster sum of squares (WCSS) against the number of clusters and picks the
value of K at the 'elbow' of the curve, beyond which adding more clusters gives little improvement.
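A minimal scikit-learn sketch of K-means together with this elbow-style search for K (the blob data is synthetic, generated only for illustration):
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# within-cluster sum of squares (inertia) for several candidate values of K
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, km.inertia_)

# the "elbow" in the printed values suggests a good K; fit the final model with it
final = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(final.cluster_centers_)
print(final.labels_[:10])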
Trial-and-error search and delayed reward are the most relevant characteristics of reinforcement
learning. This family of models allows the automatic determination of the ideal behavior
within a specific context in order to maximize the desired performance.
Reward feedback is required for the model to learn which action is best and this is known as
“the reinforcement signal”.
A multi-armed bandit is a complicated slot machine wherein instead of 1, there are several
levers which a gambler can pull, with each lever giving a different return. The probability
distribution for the reward corresponding to each lever is different and is unknown to the
gambler.
The task is to identify which lever to pull in order to get maximum reward after a given set of
trials. This problem statement is like a single step Markov decision process, which I discussed
in this article. Each arm chosen is equivalent to an action, which then leads to an immediate
reward.
The below table shows the sample results for a 5-armed Bernoulli bandit with arms labelled as
1, 2, 3, 4 and 5:
This is called Bernoulli, as the reward returned is either 1 or 0. In this example, it looks like
the arm number 3 gives the maximum return and hence one idea is to keep playing this arm in
order to obtain the maximum reward (pure exploitation).
Just based on the knowledge from the given sample, 5 might look like a bad arm to play, but
we need to keep in mind that we have played this arm only once and maybe we should play it
a few more times (exploration) to be more confident. Only then should we decide which arm
to play (exploitation).
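A small simulation of this explore/exploit trade-off for a 5-armed Bernoulli bandit, using a simple epsilon-greedy strategy (the true reward probabilities below are invented for the demo):
import numpy as np

rng = np.random.default_rng(0)
true_probs = [0.20, 0.45, 0.70, 0.30, 0.10]   # unknown to the gambler
counts = np.zeros(5)        # how often each arm was pulled
values = np.zeros(5)        # running estimate of each arm's reward
epsilon = 0.1               # fraction of purely exploratory pulls

for t in range(2000):
    if rng.random() < epsilon:
        arm = rng.integers(5)              # explore: pick a random arm
    else:
        arm = int(np.argmax(values))       # exploit: pick the best arm so far
    reward = rng.random() < true_probs[arm]   # Bernoulli reward (0 or 1)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

print(values)   # estimates should approach true_probs for well-explored arms
print(counts)   # the best arm (index 2 here) should receive most of the pulls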
Use Cases
Bandit algorithms are being used in a lot of research projects in the industry. I have listed some
of their use cases in this section.
Clinical Trials
The well being of patients during clinical trials is as important as the actual results of the study.
Here, exploration is equivalent to identifying the best treatment, and exploitation is treating
patients as effectively as possible during the trial.
Network Routing
Routing is the process of selecting a path for traffic in a network, such as telephone networks
or computer networks (internet). Allocation of channels to the right users, such that the overall
throughput is maximised, can be formulated as a MABP.
Online Advertising
The goal of an advertising campaign is to maximise revenue from displaying ads. The
advertiser makes revenue every time an offer is clicked by a web user. Similar to MABP, there
is a trade-off between exploration, where the goal is to collect information on an ad’s
performance using click-through rates, and exploitation, where we stick with the ad that has
performed the best so far.
Game Designing
Building a hit game is challenging. MABP can be used to test experimental changes in game
play/interface and exploit the changes which show positive experiences for players.
A decision is an attribute that you (or your organization), as the decision-maker, explicitly
have the authority to alter. When making decisions in business, substantial things have to be
considered, such as how much to invest in a new project, how much to spend and sell for,
where to place a website, or what budget to devote to marketing.
When a decision tree is very complex and hard to explain, an influence diagram
is more useful, as it provides a higher-level description of what was found using the
decision tree.
Related mapping techniques include:
• Empathy mapping;
• Experience mapping;
• Customer journey mapping;
• Service design (blueprint) mapping.
In influence diagrams, the semantics are of two kinds - Arrows and Nodes.
o Arrow
An arrow will denote an influence. An arrow from A to B implies that understanding A will
directly affect our assumption or opinion for B. An effect communicates the pertinence
information, which may mean a causal interaction, or a flow of data, information, or money,
but need not.
o Node
A node is a predecessor at the beginning of an arc; a node at the end is a successor.
(1) A decision node is drawn as a rectangle (corresponding to a choice under the decision-maker's control);
(2) An uncertainty node is drawn as an oval (corresponding to the ambiguity it is based on);
(3) A deterministic node is drawn as a double oval (corresponding to a special kind of
uncertainty where the end decision is already known).
The influence diagram displays system dependency. There is an essential contrast between
the influence diagrams and the decision trees. Decision trees provide much more information
on a potential decision.
Influence diagrams are directly connected to and mostly used in combination with decision
trees. An influence diagram provides a summary of the knowledge in a decision tree.
A decision tree is a diagram of a set of connected choices with different outcomes. It allows a
person or organization, based on their costs, probabilities, and benefits, to evaluate available
options against each other.
Decision trees can become immensely complicated. A more compact influence diagram may
be a suitable substitute in these situations. Influence diagrams simplify the emphasis on
essential choices, inputs, and goals.
A decision tree can be used to develop automatic predictive models with machine learning,
data mining, and analytics applications.
Step 2: Customize anything you like according to your needs, from the text to the shapes.
Step 3: Once you are satisfied, just export your influence diagram in various formats, like
Microsoft Office, Graphs, PDF, PS, Visio and more.
Examples of The Influence Diagram
Source: EdrawMax
The first example is a basic influence diagram, which illustrates how the business operates
from investment to the final making profits. Since it is the simple influence diagram, the
information has been visualized and easy to understand.
The above example denotes the influence diagram of a store including aspects like
salesperson, cashier, product, money, etc. This diagram represents the key areas of decision
and uncertainty and is connected with arrows.
Risk Modeling
Credit risk modeling, the process of estimating the probability that someone will pay back a
loan, is one of the most important mathematical problems of the modern world. In this article,
we’ll explore from the ground up how machine learning is applied to credit risk modeling.
You don’t need to know anything about machine learning to understand this article!
To explain credit risk modeling with machine learning, we’ll first develop domain knowledge
about credit risk modeling. Then, we’ll introduce four fundamental machine learning systems
that can be used for credit risk modeling:
• K-Nearest Neighbors
• Logistic Regression
• Decision Trees
• Neural Networks
By the end of this article, you’ll understand how each of these algorithms can be applied to the
real-world problem of credit risk modeling, and you’ll be well on your way to understanding
the field of machine learning in general!
Let’s begin learning about what credit risk modeling is by looking at a simple situation.
The Situation
Say your buddy Ted needs ten bucks. You’ll want those bucks back, so he promises
he’ll repay you tomorrow when you see him again.
Sensitivity Analysis
Sensitivity analysis is a powerful tool used in many different disciplines to analyze the impact
of certain changes on a given system or model. It can be used for risk management, cost-
benefit analysis, statistical modeling, and other applications. By understanding the sensitivity
of a system to changes in its parameters, companies can make more informed decisions and
develop better strategies for success. In this article, we will discuss what sensitivity analysis
is, its benefits, steps for conducting a sensitivity analysis, applications, limitations, data
requirements, key considerations, and tools used.
What is Sensitivity Analysis?
Sensitivity analysis is a technique used to determine how much a system or model's output
changes when one or more of its inputs change. It is typically used to measure the effect of a
single change in one or more parameters on the output of a system. It is used to assess the
impact of input changes on a system's performance measure, such as cost or profit.
Sensitivity analysis helps companies understand the sensitivity of their systems or models to
changes in their inputs. This allows them to identify which parameters are most important to
their output and which can be safely changed without compromising their performance.
Sensitivity analysis can also be used to identify relationships between input parameters and
output measures.
Sensitivity analysis is a powerful tool for understanding the behavior of complex systems. It
can be used to identify areas of potential improvement, as well as to identify areas of risk. By
understanding the sensitivity of a system to changes in its inputs, companies can make
informed decisions about how to optimize their systems and models. Additionally, sensitivity
analysis can be used to identify areas of potential risk, allowing companies to take proactive
steps to mitigate those risks.
Finally, the sensitivity analysis should be documented and the results should be shared with
stakeholders. This will ensure that everyone is aware of the potential impacts of changes to
the system's parameters and can make informed decisions about how to adjust them in the
future.
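A minimal one-at-a-time sensitivity sketch on a toy profit model (the model and the parameter ranges are invented purely to show the pattern):
def profit(price, units, unit_cost, fixed_cost):
    # toy model: revenue minus variable and fixed costs
    return price * units - unit_cost * units - fixed_cost

base = {'price': 20.0, 'units': 1000, 'unit_cost': 12.0, 'fixed_cost': 3000.0}

# vary each input by +/-10% while holding the others at their base values
for name in base:
    for factor in (0.9, 1.1):
        scenario = dict(base, **{name: base[name] * factor})
        change = profit(**scenario) - profit(**base)
        print(f"{name} x{factor}: profit changes by {change:+.0f}")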
Applications of Sensitivity Analysis
Sensitivity analysis can be used in many different applications, including financial
management, risk management, decision-making, cost-benefit analysis, statistical modeling,
and machine learning. In financial management, sensitivity analysis can be used to identify
potential risks and opportunities in investments. In risk management, sensitivity analysis can
help identify potential risks associated with certain processes or activities. In decision-making,
sensitivity analysis can help managers decide which parameters are most important and which
can be safely adjusted without compromising their decision. In cost-benefit analysis, sensitivity
analysis helps identify opportunities for cost savings and increased efficiency. In statistical
modeling and machine learning, sensitivity analysis can help identify significant correlations
between input parameters and output measures.
Sensitivity analysis can also be used to identify areas of potential improvement in existing
processes or systems. By analyzing the sensitivity of different parameters, organizations can
identify areas where changes can be made to improve efficiency or reduce costs.
Additionally, sensitivity analysis can be used to identify potential areas of risk in a system or
process, allowing organizations to take proactive steps to mitigate those risks.
2. The analysis of variance for multiple regression, Examples for multiple regression
https://www.youtube.com/watch?v=aTZnuhTCFtI
https://www.youtube.com/watch?v=fYStutigCkE
3. Overfitting, Detecting overfit models: Cross validation
https://www.youtube.com/watch?v=vv0fWd09-js
https://www.youtube.com/watch?v=PF2wLKv2lsI
4. Cross validation: The ideal procedure, Parameter estimation, Logistic regression
https://www.youtube.com/watch?v=PK37PqkIOg4&list=PLfFghEzKVmjunyr8OPegxrX7y83IDuZNV
5. Decision trees: Background, Decision trees, Decision trees for credit card promotion
https://www.youtube.com/watch?v=MiJ9LjJBGaY&list=PLdKd-j64gDcC5TCZEqODMZtAotCfm5Zkh
https://www.youtube.com/watch?v=w1bFfpW_-LA
6. An algorithm for building decision trees, Attribute selection measure: Information gain, Entropy
https://www.youtube.com/watch?v=y6VwIcZAUkI
https://www.youtube.com/watch?v=coOTEc-0OGw
7. Decision Tree: Weekend example, Occam’s Razor, Converting a tree to rules
https://www.youtube.com/watch?v=SVwFJZeWdtg
https://www.youtube.com/watch?v=VOIIvr8tWf4
8. Unsupervised learning, Semi Supervised learning, Clustering, K-means clustering, Automated discovery
https://www.youtube.com/watch?v=KzJORp8bgqs
Assignments
Unit - V
Assignment Questions
Assignment Questions – Very Easy
Q. No. | ASSIGNMENT QUESTIONS | Marks | Knowledge level | CO
1 | Differentiate between supervised and unsupervised learning. | 5 | K2 | CO5
2 | Explain the concept of regression in supervised learning. | 5 | K2 | CO5
Assignment Questions
Course Outcomes:
CO5: Improve problem solving skills using the acquired knowledge in the areas of,
reasoning, natural language understanding, computer vision, automatic programming
and machine learning.
*Allotment of Marks
15 - 5 20
Part A – Questions & Answers
Unit – V
Part A - Questions & Answers
1. What is Supervised Learning?
Definition: Supervised learning is a type of machine learning where the algorithm
learns from labeled data, making predictions or decisions based on input-output
pairs.
Key Points:
It requires a dataset with labeled examples.
It includes regression and classification tasks.
2. Define Regression in Machine Learning.
Definition: Regression is a supervised learning technique used to predict continuous
values based on input features.
Key Points:
Linear regression is a common type of regression where the relationship between
the independent and dependent variables is approximated using a linear function.
3. Explain Multiple Linear Regression.
Definition: Multiple linear regression is a regression model that examines the linear
relationship between multiple independent variables and a single dependent
variable.
Key Points:
It extends linear regression to accommodate multiple predictors.
Each independent variable contributes to the prediction of the dependent variable
with its own coefficient.
4. What is Overfitting in Machine Learning?
Definition: Overfitting occurs when a model learns the training data too well,
capturing noise and irrelevant patterns that do not generalize to new data.
Key Points:
Overfit models perform well on training data but poorly on unseen data.
Regularization techniques can help mitigate overfitting.
5. How is Overfitting Detected?
Cross-validation is a common technique used to detect overfitting.
It involves splitting the dataset into training and validation sets multiple times and
evaluating the model's performance on each split.
6. Explain Logistic Regression.
Definition: Logistic regression is a type of regression used for classification tasks,
where the output is a probability value representing the likelihood of a particular
class.
Key Points:
It uses the logistic function (sigmoid function) to map predictions to probabilities.
It's suitable for binary classification tasks.
7. What is Decision Tree in Machine Learning?
Definition: A decision tree is a predictive model that maps observations about an
item to conclusions about its target value.
Key Points:
It consists of nodes that represent decision rules based on input features.
It's used for both classification and regression tasks.
8. Describe K-means Clustering.
Definition: K-means clustering is an unsupervised learning algorithm used to partition
a dataset into K distinct, non-overlapping clusters.
Key Points:
It assigns data points to the nearest cluster centroid iteratively.
It aims to minimize the within-cluster variance.
16. Explain Occam's Razor in the Context of Decision Trees.
Occam's Razor is a principle that suggests selecting the simplest explanation or
model that adequately explains the data. In decision trees, it translates to favoring
simpler tree structures over more complex ones to avoid overfitting and promote
generalization.
17. Describe an Algorithm for Building Decision Trees.
Decision tree algorithms, such as ID3 or CART, recursively partition the dataset based
on the attributes that best separate the target variable. They select attributes that
maximize information gain or minimize impurity at each node to construct the tree.
18. Define Attribute Selection Measure: Entropy.
Entropy is a measure of impurity or disorder in a dataset. In decision trees, it's used
to quantify the uncertainty in the target variable's distribution. A lower entropy
indicates less uncertainty and better attribute choice for splitting.
19. Explain Converting a Tree to Rules.
Converting a decision tree into rules involves translating the decision logic
represented by the tree's nodes and branches into a set of if-then rules. Each path
from the root to a leaf node corresponds to a rule, making the decision-making
process transparent and interpretable.
20. What is Automated Discovery?
Automated discovery refers to the process of automatically identifying patterns,
structures, or relationships in data without explicit human intervention. Machine
learning algorithms play a significant role in automated discovery by extracting
insights from large datasets.
21. Define Multi-Armed Bandit Algorithms.
Multi-Armed Bandit algorithms are a class of algorithms used in reinforcement
learning and online decision-making settings where an agent must repeatedly choose
between multiple actions, each with uncertain rewards. The goal is to maximize the
cumulative reward while balancing exploration and exploitation.
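A minimal epsilon-greedy sketch of the idea, simulating three Bernoulli arms with assumed reward probabilities, is shown below.

import numpy as np

rng = np.random.default_rng(0)
true_probs = [0.2, 0.5, 0.8]   # hypothetical reward probability of each arm
counts = np.zeros(3)
values = np.zeros(3)           # running estimate of each arm's mean reward
epsilon = 0.1                  # exploration rate

for _ in range(1000):
    # Explore with probability epsilon, otherwise exploit the best estimate so far.
    arm = rng.integers(3) if rng.random() < epsilon else int(np.argmax(values))
    reward = float(rng.random() < true_probs[arm])
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

print(values)   # the estimate for the most-pulled arm should approach its true probability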
22. Explain Influence Diagrams.
Influence diagrams are graphical models used to represent decision-making problems
under uncertainty. They depict the relationships between decisions, uncertainties,
and objectives, facilitating probabilistic reasoning and optimal decision-making
strategies.
23. What is Risk Modeling?
Risk modeling involves quantifying and assessing the potential risks associated with
specific actions, events, or decisions. It aims to provide insights into the likelihood
and impact of various risk factors, enabling informed decision-making and risk
management strategies.
24. Describe Decision Tree: Weekend Example.
Example: A decision tree could be used to predict whether a person goes for outdoor
activities on the weekend based on factors like weather conditions, availability of
transportation, and personal preferences. Each decision node represents a factor,
and each leaf node represents a decision (e.g., stay indoors or go outdoors).
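The decision logic of such a tree could be written out by hand as nested if-then rules, as in this hypothetical sketch; the factor names and outcomes simply mirror the example above.

def weekend_plan(weather, has_transport, prefers_outdoors):
    # Each condition corresponds to a decision node; each return is a leaf.
    if weather == "rainy":
        return "stay indoors"
    if not has_transport:
        return "stay indoors"
    return "go outdoors" if prefers_outdoors else "stay indoors"

print(weekend_plan("sunny", True, True))   # go outdoors
print(weekend_plan("rainy", True, True))   # stay indoors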
25. Explain Parameter Estimation.
Parameter estimation involves determining the values of parameters in a model to
best fit the observed data. In machine learning, it often involves optimizing a loss
function to find the parameters that minimize prediction error on the training data.
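A minimal sketch of this idea, estimating the slope and intercept of y = w * x + b by gradient descent on a squared-error loss over synthetic data, is given below; the learning rate and iteration count are arbitrary choices.

import numpy as np

# Synthetic data generated from known parameters w = 3, b = 2, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=100)

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    error = (w * x + b) - y
    w -= lr * 2 * (error * x).mean()   # gradient of the mean squared error w.r.t. w
    b -= lr * 2 * error.mean()         # gradient of the mean squared error w.r.t. b

print(w, b)   # estimates should land close to 3 and 2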
Part B – Questions
Unit – V
Part B Questions
Q. No. | Questions | K Level | CO Mapping
1. | Define supervised learning and provide an example of its application in real-world scenarios. | K4 | CO5
2. | Explain the concept of linear regression. Provide an example illustrating its use. | K2 | CO5
https://onlinecourses.nptel.ac.in/noc21_cs42/preview
An Introduction to Artificial Intelligence
By Prof. Mausam | IIT Delhi
https://www.coursera.org/learn/computational-thinking-problem-solving
https://www.coursera.org/learn/artificial-intelligence-education-for-teachers
https://www.coursera.org/specializations/ai-healthcare
https://www.coursera.org/learn/predictive-modeling-machine-learning
https://www.drdobbs.com/parallel/the-practical-application-of-prolog/184405220
REAL TIME APPLICATION – UNIT V
Neural networks find extensive applications in areas where traditional, explicitly
programmed computing does not fare well: problems where, instead of fixed programmed
outputs, you want the system to learn, adapt, and change its results in step with the
data you feed it. Neural networks are also used heavily wherever we deal with noisy or
incomplete data, and most real-world data is indeed noisy.
With their brain-like ability to learn and adapt, neural networks underpin many
Artificial Intelligence and, consequently, Machine Learning algorithms. Before we get to
how neural networks power Artificial Intelligence, let's first talk a bit about what
exactly Artificial Intelligence is.
For a long time, the word "intelligence" was associated only with the human brain.
Then scientists found ways of training computers that borrow from how our brain works.
Thus came Artificial Intelligence, which can essentially be defined as intelligence
originating from machines; Machine Learning, in turn, is about giving machines the
ability to "think", "learn", and "adapt".
With so much said and done, it's worth understanding what exactly the use cases of AI
are, and how neural networks help the cause. Let's dive into the applications of neural
networks across various domains: from social media and online shopping, to personal
finance, and finally, to the smart assistant on your phone.
You should remember that this list is in no way exhaustive, as the applications of
neural networks are widespread. Many of the systems that make machines learn deploy
one type of neural network or another.
Social Media
The ever-increasing data deluge surrounding social media gives the creators of these
platforms the unique opportunity to dabble with the unlimited data they have. No
wonder you get to see a new feature every fortnight. It’s only fair to say that all of this
would’ve been like a distant dream without Neural Networks to save the day.
Neural Networks and their learning algorithms find extensive applications in the world of
social media. Let’s see how:
Facebook
As soon as you upload a photo to Facebook, the service automatically highlights faces
and prompts you to tag your friends. How does it instantly identify which of your
friends is in the photo?
The answer is simple – Artificial Intelligence. In a video highlighting Facebook’s Artificial
Intelligence research, they discuss the applications of Neural Networks to power their
facial recognition software. Facebook is investing heavily in this area, not only within the
organization, but also through the acquisitions of facial-recognition startups
like Face.com (acquired in 2012 for a rumored $60M), Masquerade (acquired in 2016 for
an undisclosed sum), and Faciometrics (acquired in 2016 for an undisclosed sum).
In June 2016, Facebook announced a new Artificial Intelligence initiative that uses
various deep neural networks such as DeepText – an artificial intelligence engine
that can understand the textual content of thousands of posts per second, with
near-human accuracy.
Instagram
Instagram, acquired by Facebook back in 2012, uses deep learning, making use of
recurrent neural networks to identify the contextual meaning of an emoji, which has
been steadily replacing slang (for instance, a laughing emoji could replace "rofl").
By algorithmically identifying the sentiments behind emojis, Instagram creates and
auto-suggests emojis and emoji-related hashtags. This may seem like a minor
application of AI, but being able to interpret and analyse this emoji-to-text
translation at a larger scale sets the basis for further analysis of how people use
Instagram.
Online Shopping
Do you find yourself in situations where you’re set to buy something, but you end
up buying a lot more than planned, thanks to some super-awesome
recommendations?
Yeah, blame neural networks for that. By making use of neural networks and what they
learn, the e-commerce giants are creating Artificial Intelligence systems that know you
better than you know yourself. Let's see how:
Search
Your Amazon searches ("earphones", "pizza stone", "laptop charger", etc.) return a
list of the most relevant products related to your search, without wasting much
time. In a description of its product search technology, Amazon states that
its algorithms learn automatically to combine multiple relevant features. They use
past patterns and adapt to what is important for the customer in question.
And what makes the algorithms “learn”? You guessed it right – Neural Networks!
Recommendations
Amazon shows you recommendations via "customers who viewed this item also viewed"
and "customers who bought this item also bought", as well as through curated
recommendations on your homepage, at the bottom of item pages, and through emails.
Amazon makes use of artificial neural networks to train its algorithms to learn the
patterns and behaviour of its users. This, in turn, helps Amazon provide better, more
personalised recommendations.
CONTENT BEYOND SYLLABUS – UNIT V
1. Fuzzy Logic
2. Uncertainty
3. Explain how MYCIN works.
ASSESSMENT SCHEDULE
S.NO | Name of the Assessment | Start Date | End Date | Portion
PRESCRIBED TEXT BOOKS AND REFERENCE BOOKS
REFERENCES:
Mini Projects
Problem statement: A Kaggle dataset with taxi ride details, including drop_location,
drop_time, and pickup_time, is given. The task is to use any ML prediction technique
to predict the time taken for a ride from the given inputs.
Dataset: https://www.kaggle.com/code/angadchau/taxi-time-prediction
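A possible starting point is sketched below. It is only an outline: the file name, the column names (pickup_time, drop_time, drop_location), and the choice of a random-forest regressor are assumptions based on the problem statement and will need adjusting to the actual Kaggle dataset.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("taxi_rides.csv")   # hypothetical local copy of the Kaggle data

# Target: ride duration in minutes, derived from the pickup and drop timestamps.
df["pickup_time"] = pd.to_datetime(df["pickup_time"])
df["drop_time"] = pd.to_datetime(df["drop_time"])
df["duration_min"] = (df["drop_time"] - df["pickup_time"]).dt.total_seconds() / 60

# Simple example features; real work would add distance, traffic, time of day, etc.
df["pickup_hour"] = df["pickup_time"].dt.hour
X = pd.get_dummies(df[["pickup_hour", "drop_location"]], columns=["drop_location"])
y = df["duration_min"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print(model.score(X_test, y_test))   # R^2 score on held-out rides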
Thank you