AI&ML Unit 4
19MECC1701 - ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
Nair
Assistant Professor (SS)
Department of Mechanical Engineering
Dr. Mahalingam College of Engineering and Technology, Pollachi
Course Code: 19MECC1701    Course Title: Artificial Intelligence & Machine Learning
CO4: Explain the basic concepts and applications of Machine Learning. (Understand)
CO5: Explain the classification and clustering techniques for decision making. (Understand)
Introduction: Basic definitions, types of learning, hypothesis space and inductive bias,
Basic Concepts in Machine Learning
• Machine Learning is continuously growing in the IT world and gaining strength in different business sectors.
• Although Machine Learning is still in a developing phase, it is popular among all technologies. It is a field of study that makes computers capable of automatically learning and improving from experience.
• Hence, Machine Learning focuses on strengthening computer programs with the help of data collected from various observations.
• The need for machine learning is increasing day by day, because it is capable of doing tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot process huge amounts of data manually. For this we need computer systems, and this is where machine learning comes in to make things easy for us.
• We can train machine learning algorithms by providing them with huge amounts of data and letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be determined by the cost function. With the help of machine learning, we can save both time and money.
• The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, etc. Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly.
What is Machine Learning?
• Machine Learning is defined as a technology that is used to train machines to perform
various actions such as predictions, recommendations, estimations, etc., based on historical data
or past experience.
• Machine Learning enables computers to behave like human beings by training them with the help of past experience and data.
• There are three key aspects of Machine Learning, which are as follows:
• Task: A task is defined as the main problem in which we are interested. This task/problem can be related to predictions, recommendations, estimations, etc.
• Experience: It is defined as learning from historical or past data, which is used to estimate and resolve future tasks.
• Performance: It is defined as the capacity of a machine to resolve a machine learning task or problem and provide the best outcome for it. However, performance depends on the type of machine learning problem.
Types of Machine Learning:
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
• Reinforcement Learning

Supervised Learning
• After completion of training, we input the picture of a cat and ask the machine to
identify the object and predict the output. Now, the machine is well trained, so it will
check all the features of the object, such as height, shape, colour, eyes, ears, tail, etc., and
find that it's a cat. So, it will put it in the Cat category. This is the process of how the
machine identifies the objects in Supervised Learning.
• The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y).
• Some real-world applications of supervised learning are Risk Assessment, Fraud
Detection, Spam filtering, etc.
Categories of Supervised Machine Learning
• Supervised machine learning can be classified into two types of problems, which are given below:
• Classification
• Regression
• a) Classification
• Classification algorithms are used to solve classification problems in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. The classification algorithms predict the categories present in the dataset. Some real-world examples of classification algorithms are Spam Detection, Email filtering, etc.
• Some popular classification algorithms are given below:
• Random Forest Algorithm
• Decision Tree Algorithm
• Logistic Regression Algorithm
• Support Vector Machine Algorithm
Categories of Unsupervised Machine Learning
• Clustering
• Association
• Advantages:
• These algorithms can be used for more complicated tasks than supervised ones, because they work on unlabeled datasets.
• Unsupervised algorithms are preferable for various tasks, as getting an unlabeled dataset is easier than getting a labelled one.
• Disadvantages:
• The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithms are not trained with the exact output in advance.
• Working with unsupervised learning is more difficult, as it works with an unlabelled dataset that does not map to an output.
Semi-Supervised Learning
• Advantages:
• The algorithms are simple and easy to understand.
• They are highly efficient.
• They are used to overcome the drawbacks of Supervised and Unsupervised Learning algorithms.
• Disadvantages:
• Iteration results may not be stable.
• We cannot apply these algorithms to network-level data.
• Accuracy is low.
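• As a concrete illustration, below is a minimal semi-supervised learning sketch in Python, assuming scikit-learn's LabelSpreading algorithm (one of several possible choices); most labels of a standard dataset are hidden to mimic a partially labelled dataset:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Hide 70% of the labels; scikit-learn marks unlabeled samples with -1
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.7] = -1

# LabelSpreading propagates the few known labels to the unlabeled points
model = LabelSpreading().fit(X, y_partial)
print("Accuracy against the full labels:", model.score(X, y))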
• Video Games:
• RL algorithms are very popular in gaming applications, where they are used to achieve super-human performance. Some popular game-playing systems that use RL algorithms are AlphaGo and AlphaGo Zero.
• Resource Management:
• The "Resource Management with Deep Reinforcement Learning" paper showed that how to use RL in
computer to automatically learn and schedule resources to wait for different jobs in order to
minimize average job slowdown.
• Robotics:
• RL is widely used in robotics applications. Robots are used in industrial and manufacturing areas, and these robots are made more capable with reinforcement learning. Different industries have a vision of building intelligent robots using AI and machine learning technology.
• Text Mining
• Text mining, one of the great applications of NLP, is now being implemented with the help of Reinforcement Learning by Salesforce.
• Advantages
• It helps in solving complex real-world problems that are difficult to solve with general techniques.
• The learning model of RL is similar to human learning; hence, highly accurate results can be obtained.
• It helps in achieving long-term results.
• Disadvantages
• RL algorithms are not preferred for simple problems.
• RL algorithms require huge data and computations.
• Too much reinforcement learning can lead to an overload of states
which can weaken the results.
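• To make the idea concrete, here is a toy Q-learning sketch in Python (a minimal, made-up 5-state corridor, not any specific application above); the agent learns by trial and error to walk towards the rewarding state:

import numpy as np

# A toy Q-learning sketch: a 5-state corridor where state 4 holds reward +1.
# Actions: 0 = move left, 1 = move right. All numbers here are illustrative.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.3
rng = np.random.default_rng(0)

for episode in range(300):
    s = int(rng.integers(4))        # exploring starts: random non-goal state
    for _ in range(100):            # cap the episode length
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))   # explore
        else:
            a = int(Q[s].argmax())             # exploit
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, 4)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: move Q[s, a] towards r + gamma * max Q[s', a']
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == 4:                  # terminal (rewarding) state reached
            break

print("Greedy policy (0=left, 1=right):", Q.argmax(axis=1)[:4])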
Hypothesis
• In most supervised machine learning algorithms, our main goal is to find a hypothesis from the hypothesis space that maps the inputs to the proper outputs.
• The hypothesis (h) and hypothesis space (H) can be understood with a two-dimensional coordinate plane showing the distribution of the data (figure omitted).
• Now, assume we have some test data for which the ML algorithm predicts the outputs (figure omitted).
• If we divide this coordinate plane in such a way that it separates the classes, the chosen divider is one hypothesis h that helps to predict the output (figure omitted).
• Now, evaluate the model performance using the validation set. If the model performs well on the validation set, proceed to the next step; otherwise, check for issues.
• Common cross-validation methods include:
• Leave-P-out cross-validation
• K-fold cross-validation
• Cross-validation can also be used for meta-analysis, as it is already being used by data scientists in the field of medical statistics.
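• A minimal K-fold cross-validation sketch in Python, assuming scikit-learn; the dataset here is synthetic:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Generate a small synthetic classification dataset (100 samples, 5 features)
X, y = make_classification(n_samples=100, n_features=5, random_state=42)

model = LogisticRegression()

# cv=5 splits the data into 5 folds: train on 4, validate on the 5th, rotating
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())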
Linear Regression
• A simple linear regression model takes the form y = a0 + a1x + ε, where a0 is the intercept, a1 is the slope, and ε is the error term. Here, the values for the x and y variables are training datasets for the linear regression model representation.
• Linear regression can be further divided into two types of algorithms:
• Simple Linear Regression:
• If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called
Simple Linear Regression.
• Multiple Linear regression:
• If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm
is called Multiple Linear Regression.
• Simple Linear Regression is a type of regression algorithm that models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear (a sloped straight line), hence it is called Simple Linear Regression.
• The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on continuous or
categorical values.
• It models the relationship between the two variables, such as the relationship between income and expenditure, or experience and salary (see the sketch below).
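• A minimal Simple Linear Regression sketch in Python, assuming scikit-learn; the experience/salary figures are made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# x: years of experience (single independent variable), y: salary
x = np.array([[1], [2], [3], [4], [5], [6]])   # must be 2-D for scikit-learn
y = np.array([30000, 35000, 41000, 45000, 52000, 56000])

model = LinearRegression().fit(x, y)
print("Slope (a1):", model.coef_[0])
print("Intercept (a0):", model.intercept_)

# Predict the salary for 7 years of experience
print("Prediction for 7 years:", model.predict([[7]])[0])

• The fitted coefficients correspond to the slope a1 and the intercept a0 in y = a0 + a1x.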
• Overfitting occurs when a machine learning model is trained too well on the
training data, including capturing noise, outliers, and irrelevant patterns that
are specific to the training data.
• While this might result in a low error on the training set, the model performs
poorly on new, unseen data (test data), as it has effectively "memorized" the
data rather than learned the general patterns.
Characteristics of Overfitting:
• Low training error: The model fits the training data almost perfectly.
• High test/validation error: When tested on unseen data, the model performs
poorly.
• Complexity: The model may have too many parameters or be too flexible
(e.g., a very deep neural network or a high-degree polynomial regression).
Causes of Overfitting:
• Insufficient training data: With a small amount of data, the model may try to
learn every specific detail, including noise.
• Too many features: Including too many input variables or features may lead
the model to memorize specific relationships.
• Model complexity: A model that is too complex (e.g., too many layers in a
neural network or high-degree polynomial) can capture unnecessary patterns.
Underfitting occurs when a model is too simple and unable to capture the
underlying patterns in the training data.
This leads to poor performance on both the training set and the test set because
the model fails to understand the complexity of the data and learn the
relationships between input and output variables.
Underfitting
• Example: We can understand underfitting from the output of a linear regression model fitted to nonlinear data (figure omitted: the fitted straight line fails to follow the increasing, curved trend of the data).
Characteristics of Underfitting:
High training error: The model does not even fit the training data well.
High test/validation error: Since the model fails to capture the patterns, its
performance on new data is also poor.
Simplicity: The model might be too simple to learn the actual relationships in the
data (e.g., using linear regression on data with nonlinear relationships).
Causes of Underfitting:
Oversimplified model: Using a model that is too basic, like a linear regression model
for data that requires a more complex model (e.g., polynomial regression or a deep
neural network).
Lack of features: If the input data doesn’t contain enough information, the model
may not be able to learn.
Too much regularization: Applying too much regularization may make the model too
rigid, preventing it from learning important relationships.
How to Avoid Underfitting:
Increase model complexity: Use a more complex model that can capture more intricate patterns in the data.
Feature engineering: Add more relevant features to the input dataset to help the model learn the underlying relationships.
Example: Predicting Housing Prices
Scenario 1: Underfitting
If we apply a linear regression model (i.e., fitting a straight line to the data), it may not capture
the complex relationship between house size and price. In reality, house prices may increase non-
linearly with size (e.g., houses above a certain size may have premium pricing), but a linear
model oversimplifies the problem and cannot capture this.
• Result: The model performs poorly on both the training and test data. It is underfitting because
it is too simple to capture the pattern in the data.
Scenario 2: Overfitting
Now, let’s apply a polynomial regression model with a very high degree (e.g.,
a 10th-degree polynomial). This model may fit the training data very well,
creating a curve that perfectly passes through every single training point,
even capturing the noise in the data (such as anomalies or errors in pricing).
• Result: The model performs extremely well on the training data but very poorly on the test data. This is because it has overfitted the data and captured noise or random fluctuations that do not generalize to unseen houses.
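• Both scenarios can be reproduced with a short Python sketch, assuming scikit-learn; the house-size/price data below is synthetic, generated from a nonlinear curve plus noise:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
size = rng.uniform(50, 250, 40).reshape(-1, 1)                 # house size
price = 1000 * size.ravel() ** 1.3 + rng.normal(0, 20000, 40)  # nonlinear + noise

X_train, X_test, y_train, y_test = train_test_split(size, price, random_state=0)

for degree in (1, 10):   # degree 1 underfits, degree 10 overfits
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  "
          f"train MSE={mean_squared_error(y_train, model.predict(X_train)):.0f}  "
          f"test MSE={mean_squared_error(y_test, model.predict(X_test)):.0f}")

• The degree-1 model typically shows high error on both sets (underfitting), while the degree-10 model shows near-zero training error but a much larger test error (overfitting).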
Popular R Packages for Machine Learning
• lattice: The lattice package supports the creation of graphs displaying one variable or the relation between multiple variables, with conditioning.
• DataExplorer: This R package focuses on automating data visualization and data handling so that the user can pay attention to the data insights of the project.
• dplyr: This R package is used to summarize the tabular data of machine learning with rows and
columns. It applies the “split-apply-combine” approach.
• Esquisse: This R package is used to explore data quickly to get the information it holds. It also allows plotting bar graphs, histograms, curves, and scatter plots.
• caret: This R package attempts to streamline the process for creating predictive models.
• janitor: This R package has functions for examining and cleaning dirty data. It is basically built for
the purpose of user-friendliness for beginners and intermediate users.
• rpart: This R package helps create classification and regression models using a two-stage procedure. The resulting models are represented as binary trees.
• Root Node: Root node is from where the decision tree starts. It represents the
entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into
sub-nodes according to the given conditions.
• Branch/Sub-Tree: A subtree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from the
tree.
• Parent/Child node: The root node of the tree is called the parent node, and
other nodes are called the child nodes.
• In a decision tree, for predicting the class of the given dataset, the
algorithm starts from the root node of the tree.
• This algorithm compares the values of the root attribute with the attribute of the record (real dataset) and, based on the comparison, follows the corresponding branch and jumps to the next node.
• For the next node, the algorithm again compares the attribute value with those of the other sub-nodes and moves further.
• It continues the process until it reaches a leaf node of the tree. The complete process can be better understood from the sketch below:
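• A minimal sketch of this process in Python, assuming scikit-learn; the iris dataset stands in for any real dataset:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Fit a tree; each internal node tests one attribute, each leaf is a class
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned root-to-leaf decision rules
print(export_text(tree, feature_names=load_iris().feature_names))

# Predicting a class walks the tree from the root node down to a leaf node
print("Predicted class:", tree.predict([X[0]])[0])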
Instance-based learning
• Instance-based learning: It generates classification predictions using only specific instances.
• Instance-based learning algorithms do not maintain a set of abstractions derived
from specific instances.
• This approach extends the nearest neighbour algorithm, which has large storage
requirements.
Performance dimensions used for instance-based learning algorithms (table omitted)
Functions of instance-based learning
• Functions are as follows:
• Similarity: Similarity is a machine learning method that uses a nearest
neighbour approach to identify the similarity of two or more objects
to each other based on algorithmic distance functions.
• Classification: The process of categorizing a given set of data into classes. It can be performed on both structured and unstructured data. The process starts with predicting the class of the given data points; the classes are often referred to as targets, labels, or categories.
• Concept Description: Much of human learning involves acquiring
general concepts from past experiences. This description can then
be used to predict the class labels of unlabelled cases.
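• A minimal instance-based (k-nearest-neighbour) sketch in Python, assuming scikit-learn; note that fitting merely stores the instances, and similarity is computed at query time:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# "Training" is just memorizing the instances (no abstraction is built)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Each query computes distances to the stored instances at prediction time
print("Predicted class:", knn.predict([[5.1, 3.5, 1.4, 0.2]])[0])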
Advantages & Disadvantages of Instance-based Learning
• Advantage: It has the ability to adapt to previously unseen data, which means one can store a new instance or drop an old instance.
• Disadvantage: A large amount of memory is required to store the data, and each query involves identifying a local model from scratch.
Bayes theorem
• Bayes' theorem was given by an English statistician, philosopher, and Presbyterian minister named Thomas Bayes in the 18th century.
• Bayes presented his ideas in decision theory, which is extensively used in probability, an important branch of mathematics.
• Bayes' theorem is also widely used in Machine Learning, where we need to predict classes precisely and accurately. An important concept based on Bayes' theorem, the Bayesian method, is used to calculate conditional probability in Machine Learning applications such as classification tasks.
Bayes theorem
• Bayes' theorem is also known by other names, such as Bayes' rule or Bayes' law.
• Bayes' theorem helps to determine the probability of an event given uncertain knowledge.
• It is used to calculate the probability of one event occurring when another event has already occurred. It is a standard method to relate conditional probability and marginal probability.
What is Bayes Theorem?
• Bayes' theorem is one of the most popular concepts in machine learning; it helps to calculate the probability of one event occurring, given uncertain knowledge, when another event has already occurred.
• Bayes' theorem can be derived using the product rule and the conditional probability of event X given event Y:
• According to the product rule, we can express the probability of event X with known event Y as follows:
• P(X ∩ Y) = P(X|Y) P(Y)   {equation 1}
• Further, the probability of event Y with known event X:
• P(X ∩ Y) = P(Y|X) P(X)   {equation 2}
Bayes theorem
• Mathematically, Bayes' theorem is obtained by equating the right-hand sides of equations 1 and 2 and dividing by P(Y):
• P(X|Y) = P(Y|X) P(X) / P(Y)
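• A worked numeric sketch (the probabilities below are made up for illustration): suppose 20% of emails are spam, and the word "free" appears in 60% of spam emails but only 5% of non-spam emails.

# Bayes' theorem with made-up numbers:
# P(spam) = 0.2, P("free" | spam) = 0.6, P("free" | not spam) = 0.05.
p_spam = 0.2
p_free_given_spam = 0.6
p_free_given_ham = 0.05

# Total probability of seeing the word "free"
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | "free") = P("free" | spam) P(spam) / P("free")
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(f"P(spam | 'free') = {p_spam_given_free:.3f}")

• So, by Bayes' theorem, an email containing "free" is spam with probability 0.12 / 0.16 = 0.75.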
Collaborative filtering
• Collaborative filtering is used by most recommendation systems to find similar patterns or information about users; this technique can filter out items that a user may like on the basis of the ratings or reactions of similar users.
• An example of collaborative filtering is predicting the rating a particular user would give a movie, based on that user's ratings for other movies and other users' ratings for all movies.
• This concept is widely used in recommending movies, news, applications, and so many
other items.
Collaborative Filtering
• Let's take one example to understand more about what Collaborative Filtering is.
Example
• Let's assume we have user U1 who likes movies m1, m2, m4; user U2 who likes movies m1, m3, m4; and user U3 who likes movie m1.
• Our job is to recommend which new movie user U3 should watch next.
• Users U1, U2, and U3 all watch/like movie m1, so the three have the same taste. Users U1 and U2 also like/watched movie m4, so user U3 could like movie m4; hence we recommend movie m4. This is the flow of the logic.
Types of Filtering
Collaborative filtering is commonly divided into user-based and item-based filtering (both are described below).
Example: If User A and User B both rated several movies similarly, and User A liked a movie that User B hasn't seen, that movie will be recommended to User B.
Steps in Collaborative Filtering
Data Collection: Gather user-item interaction data, which is often in the form of a sparse matrix where rows represent users and columns represent items (e.g., movies or products). The entries in the matrix represent interactions such as ratings or purchases.
Similarity Calculation: Compute the similarity between users or items. This can be done using various measures, such as cosine similarity or Pearson correlation.
Recommendation Generation: Once the similarities between users or items are calculated,
recommendations are generated by identifying items that are highly rated by similar users (user-based)
or items that are similar to the ones the user has previously liked (item-based).
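• These steps can be sketched in a few lines of Python with made-up ratings (rows are users, columns are items, 0 means "not rated"); the similarity measure here is cosine similarity:

import numpy as np

# Ratings for the movie example above: rows U1, U2, U3; columns m1..m4
ratings = np.array([
    [5, 4, 0, 5],   # U1 likes m1, m2, m4
    [5, 0, 3, 4],   # U2 likes m1, m3, m4
    [5, 0, 0, 0],   # U3 likes m1 (the user we recommend for)
])

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = ratings[2]
sims = np.array([cosine_sim(target, ratings[i]) for i in range(2)])

# Predicted score for each unrated item = similarity-weighted average rating
for item in range(ratings.shape[1]):
    if target[item] == 0:
        score = sims @ ratings[:2, item] / sims.sum()
        print(f"Item m{item + 1}: predicted score {score:.2f}")

• For the movie example above, this sketch predicts the highest score for m4, matching the earlier intuition that U3 shares taste with U1 and U2.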
Example of Collaborative Filtering:
Consider a movie recommendation system like Netflix. Suppose we have a user, John, who has
watched and rated several action movies highly but hasn’t rated any comedy movies.
Collaborative filtering can recommend movies to John in two ways:
User-based: The system identifies other users who have watched and rated similar action movies. It
then looks for movies these similar users have rated highly but that John hasn’t watched yet, and
recommends those to John.
Item-based: The system looks at the action movies John has rated highly and finds other movies that
are similar in genre, actors, or themes, which are also highly rated by other users. These similar movies
are recommended to John.
Advantages of Collaborative Filtering:
No need for domain knowledge: Collaborative filtering relies purely on user interaction data, so it
doesn’t require knowledge about the specific features of items.
Personalized recommendations: By leveraging user preferences, collaborative filtering can provide highly
personalized suggestions tailored to individual tastes.
Consider an e-commerce platform like Amazon that wants to recommend products to its users
based on their previous purchases and browsing history. Here’s how collaborative filtering could be
applied:
• User-Based Collaborative Filtering: Amazon identifies users who have purchased similar items
(e.g., laptops) as User A. It finds that these users have also bought certain laptop accessories
that User A hasn’t bought yet, such as laptop stands or external hard drives. These accessories
are then recommended to User A.
• Item-Based Collaborative Filtering: If User A has bought a laptop, collaborative filtering looks
at other users who bought the same or similar laptops. It identifies products that these users
often purchase together with the laptop (e.g., a particular type of mouse or laptop bag) and
recommends these items to User A.
Feature reduction
• Feature reduction, also known as dimensionality reduction, is the process of reducing the number of features in a resource-heavy computation without losing important information.
• Reducing the number of features means the number of variables is reduced, making the computer's work easier and faster.
• Feature reduction can be divided into two processes: feature selection and feature extraction.
• There are many techniques by which feature reduction is accomplished.
Some of the most popular are generalized discriminant analysis,
autoencoders, non-negative matrix factorization, and principal
component analysis.
Feature reduction
• In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done.
• These factors are basically variables called features. The higher the
number of features, the harder it gets to visualize the training set
and then work on it.
• Sometimes, most of these features are correlated, and hence
redundant. This is where dimensionality reduction algorithms come
into play.
• Dimensionality reduction is the process of reducing the number
of random variables under consideration, by obtaining a set of
principal variables. It can be divided into feature selection and
feature extraction.
Why is this Useful?
• The purpose of using feature reduction is to reduce the number of features (or
variables) that the computer must process to perform its function. Feature reduction
leads to the need for fewer resources to complete computations or tasks. Less
computation time and less storage capacity needed means the computer can do more
work. During machine learning, feature reduction removes multicollinearity, which improves the machine learning model in use.
• Another benefit of feature reduction is that it makes data easier to visualize for
humans, particularly when the data is reduced to two or three dimensions which can
be easily displayed graphically. An interesting problem that feature reduction can help
with is called the curse of dimensionality. This refers to a group of phenomena in which
a problem will have so many dimensions that the data becomes sparse. Feature
reduction is used to decrease the number of dimensions, making the data less sparse
and more statistically significant for machine learning applications.
Example
• An intuitive example of dimensionality reduction can be discussed through a simple e-mail
classification problem, where we need to classify whether the e-mail is spam or not.
• This can involve a large number of features, such as whether or not the e-mail has a generic
title, the content of the e-mail, whether the e-mail uses a template, etc.
Dimensionality reduction/Feature reduction
• The various methods used for dimensionality reduction include:
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Generalized Discriminant Analysis (GDA)
Advantages of Dimensionality Reduction
• It helps in data compression, and hence reduces the required storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.
• Disadvantages of Dimensionality Reduction
• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to define datasets.
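• A minimal PCA sketch in Python, assuming scikit-learn; the 4-feature iris dataset is reduced to 2 principal components:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the data onto the 2 directions of greatest variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("Original shape:", X.shape)          # (150, 4)
print("Reduced shape:", X_reduced.shape)   # (150, 2)
print("Variance explained:", pca.explained_variance_ratio_.sum())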
THANK YOU