
UNIT – I

FUNDAMENTALS OF DEEP LEARNING


ARTIFICIAL INTELLIGENCE:
Artificial Intelligence is the simulation of the human brain, character, and behaviour by machines; here, the machines are computer systems.
History of Machine learning:
Machine learning was first conceived from the mathematical modeling of neural networks. A
paper by logician Walter Pitts and neuroscientist Warren McCulloch, published in 1943,
attempted to mathematically map out thought processes and decision making in human
cognition.
1950 — Alan Turing creates the "Turing Test" to determine whether a computer has real
intelligence. To pass the test, a computer must be able to fool a human into believing it is
also human.
1952 — Arthur Samuel wrote the first computer learning program. The program played the
game of checkers, and the IBM computer improved at the game the more it played,
studying which moves made up winning strategies and incorporating those moves into
its program.

1957 — Frank Rosenblatt designed the first neural network for computers (the perceptron),
which simulated the thought processes of the human brain.

1967 — The “nearest neighbor” algorithm was written, allowing computers to begin using
very basic pattern recognition. This could be used to map a route for traveling salesmen,
starting at a random city but ensuring they visit all cities during a short tour.
1997 — IBM’s Deep Blue beats the world champion at chess.

2014 – Facebook develops DeepFace, a software algorithm that is able to recognize or
verify individuals on photos to the same level as humans can.
2015 – Amazon launches its own machine learning platform.

Fig. Life cycle of Machine Learning
The term "machine learning" was first coined in 1959 by Arthur Samuel, a computer scientist at IBM
and a pioneer in AI and computer gaming.

Samuel designed a computer program for playing checkers. The more the program played,
the more it learned from experience, using algorithms to make predictions.

PROBABILISTIC MODELING
Machine learning algorithms today rely heavily on probabilistic models, which take into
consideration the uncertainty inherent in real-world data. These models make predictions
based on probability distributions, rather than absolute values, allowing for a more nuanced
and accurate understanding of complex systems. One common approach is Bayesian
inference, where prior knowledge is combined with observed data to make predictions.
Another approach is maximum likelihood estimation, which seeks to find the model
parameters that best fit the observed data.
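
To make these two approaches concrete, here is a minimal illustrative sketch in Python (not part of the original notes; the data and prior values are made up): it estimates the mean and variance of a Gaussian by maximum likelihood and performs a simple Bayesian update of a Beta prior from observed coin flips.

import numpy as np

rng = np.random.default_rng(0)

# Maximum likelihood estimation for a Gaussian:
# the MLE of the mean is the sample mean, and the MLE of the
# variance is the average squared deviation from that mean.
data = rng.normal(loc=5.0, scale=2.0, size=1000)   # synthetic observations
mu_mle = data.mean()
var_mle = ((data - mu_mle) ** 2).mean()
print(f"MLE mean = {mu_mle:.3f}, MLE variance = {var_mle:.3f}")

# Bayesian inference for a coin's bias:
# a Beta(a, b) prior combined with observed heads/tails gives a
# Beta(a + heads, b + tails) posterior (conjugate update).
a_prior, b_prior = 2, 2      # assumed prior belief
heads, tails = 30, 10        # assumed observations
a_post, b_post = a_prior + heads, b_prior + tails
print(f"Posterior mean of P(heads) = {a_post / (a_post + b_post):.3f}")
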
What are Probabilistic Models?
Probabilistic models are an essential component of machine learning, which aims to learn
patterns from data and make predictions on new, unseen data. They are statistical models
that capture the inherent uncertainty in data and incorporate it into their predictions.
Probabilistic models are used in various applications such as image and speech
recognition, natural language processing, and recommendation systems. In recent years,
significant progress has been made in developing probabilistic models that can handle large
datasets efficiently.
Categories Of Probabilistic Models
These models can be classified into the following categories:
 Generative models
 Discriminative models
 Graphical models
Generative models:
Generative models aim to model the joint distribution of the input and output variables.
These models generate new data based on the probability distribution of the original
dataset. Generative models are powerful because they can generate new data that resembles
the training data. They can be used for tasks such as image and speech synthesis, language
translation, and text generation.
Discriminative models
The discriminative model aims to model the conditional distribution of the output variable
given the input variable. They learn a decision boundary that separates the different classes
of the output variable. Discriminative models are useful when the focus is on making
accurate predictions rather than generating new data. They can be used for tasks such
as image recognition, speech recognition, and sentiment analysis.
Graphical models
These models use graphical representations to show the conditional dependence between
variables. They are commonly used for tasks such as image recognition, natural language
processing, and causal inference.
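
As a small illustrative sketch (assuming scikit-learn is installed; the dataset is synthetic), the snippet below contrasts a generative classifier (Gaussian Naive Bayes, which models how each class generates its features) with a discriminative classifier (logistic regression, which models the decision boundary directly).

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression   # discriminative: models P(y | x)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB            # generative: models P(x | y) and P(y)

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in [("Generative (GaussianNB)", GaussianNB()),
                    ("Discriminative (LogisticRegression)", LogisticRegression())]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
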
Disadvantages Of Probabilistic Models
There are also some disadvantages to using probabilistic models.
 One of the disadvantages is the potential for overfitting, where the model is too specific
to the training data and doesn’t perform well on new data.
 Not all data fits well into a probabilistic framework, which can limit the usefulness of
these models in certain applications.
 Another challenge is that probabilistic models can be computationally intensive and
require significant resources to develop and implement.
EARLY NEURAL NETWORKS:

The neural nets described by McCulloch and Pitts in 1943 had thresholds and weights, but they
weren't arranged into layers, and the researchers didn't specify any training mechanism. What
McCulloch and Pitts showed was that a neural net could, in principle, compute any function that a
digital computer could. The result was more neuroscience than computer science: the point was to
suggest that the human brain could be thought of as a computing device.

The first trainable neural network, the Perceptron, was demonstrated by the Cornell University
psychologist Frank Rosenblatt in 1957. The Perceptron’s design was much like that of the modern
neural net, except that it had only one layer with adjustable weights and thresholds, sandwiched
between input and output layers.
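
To make the idea concrete, below is a minimal perceptron sketch in NumPy (an illustrative toy, not Rosenblatt's original implementation): a single layer of adjustable weights and a threshold, trained with the classic perceptron update rule on the logical AND function, which is linearly separable.

import numpy as np

# Toy dataset: logical AND of two binary inputs (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)      # adjustable weights
b = 0.0              # threshold, expressed as a bias term
lr = 0.1             # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        prediction = 1 if (np.dot(w, xi) + b) > 0 else 0   # step activation
        error = target - prediction
        w += lr * error * xi       # perceptron update rule
        b += lr * error

print("weights:", w, "bias:", b)
print("predictions:", [1 if np.dot(w, xi) + b > 0 else 0 for xi in X])
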

Perceptrons were an active area of research in both psychology and the fledgling discipline of
computer science until 1969, when Minsky and Papert published a book titled "Perceptrons," which
demonstrated that executing certain fairly common computations on Perceptrons would be
impractically time consuming.

By the 1980s, however, researchers had developed algorithms for modifying neural nets' weights
and thresholds that were efficient enough for networks with more than one layer, removing many of
the limitations identified by Minsky and Papert. The field enjoyed a renaissance.

KERNEL METHODS:

The kernel method is a mathematical technique used in machine learning for analyzing data. It relies on a
kernel function that maps data from one space to another space.

It is generally used in Support Vector Machines (SVMs) where the algorithms classify data by finding the
hyperplane that separates the data points of different classes.

The most important benefit of the kernel method is that it can work with non-linearly separable data, and it
supports multiple kernel functions, depending on the type of data.

Because the linear classifier can solve a very limited class of problems, the kernel trick is employed to empower
the linear classifier, enabling the SVM to solve a larger class of problems.

What are the types of Kernel methods in SVM models?
Support vector machines use various kinds of kernel methods. Here are a few of them:
1. Linear Kernel
It is used when the data is linearly separable.
K(x1, x2) = x1 . x2

2. Polynomial Kernel
It is used when the data is not linearly separable.
K(x1, x2) = (x1 . x2 + 1)^d
3. Gaussian Kernel
The Gaussian kernel is an example of a radial basis function kernel. It can be represented with this equation:
K(xi, xj) = exp(-γ ||xi - xj||^2)
4. Exponential Kernel
Similar to RBF kernel, but it decays much more quickly.
K(x, y) = exp(-||x - y|| / (2σ^2))

5. Laplacian Kernel
Similar to RBF Kernel, it has a sharper peak and faster decay.
K(x, y) = exp(-||x - y||_1 / σ), where ||·||_1 is the L1 norm
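
The following sketch (an illustration assuming scikit-learn is available; the data are synthetic) evaluates several of these kernel functions on a few sample points and trains an SVM with the RBF (Gaussian) kernel on data that is not linearly separable.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel, laplacian_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)   # non-linearly separable data

# Kernel matrices between the first five points (each entry is K(xi, xj)).
A = X[:5]
print("Linear:\n", linear_kernel(A, A))
print("Polynomial:\n", polynomial_kernel(A, A, degree=3))
print("RBF:\n", rbf_kernel(A, A, gamma=0.5))
print("Laplacian:\n", laplacian_kernel(A, A, gamma=0.5))

# An SVM with the RBF kernel can handle the curved decision boundary.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", gamma=0.5).fit(X_train, y_train)
print("RBF-SVM test accuracy:", round(clf.score(X_test, y_test), 3))
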

What are the applications of kernel method?


1. Classification
2. Regression
3. Clustering
4. Anomaly detection
5. Feature extraction
6. Natural language processing
7. Computer vision
8. Time series analysis
9. Recommender systems
10. Bio-informatics
11. Signal processing
12. Robotics

DECISION TREES:

A decision tree is a predictive model that uses a flowchart-like structure to make decisions based on
input data. It divides data into branches and assigns outcomes to leaf nodes. Decision trees are used for
classification and regression tasks, providing easy-to-understand models.

A decision tree is a hierarchical model used in decision support that depicts decisions and their potential
outcomes, incorporating chance events, resource expenses, and utility. This algorithmic model utilizes
conditional control statements and is a non-parametric, supervised learning method useful for both
classification and regression tasks. The tree structure is comprised of a root node, branches, internal
nodes, and leaf nodes, forming a hierarchical, tree-like structure.

It is a tool that has applications spanning several different areas. Decision trees can be used for
classification as well as regression problems. The name itself suggests that it uses a flowchart-like tree
structure to show the predictions that result from a series of feature-based splits. It starts with a root node
and ends with a decision made by the leaf nodes.

Decision Tree Terminologies

Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further
after getting a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: A node that is divided into sub-nodes is called the parent node of those
sub-nodes, and the sub-nodes are called its child nodes; the root node is the topmost parent node.
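
A minimal decision tree sketch (illustrative only, assuming scikit-learn is available; the Iris dataset is used as a stand-in example) showing how a tree is fitted and how its root-to-leaf rules can be printed:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A shallow tree: the root node splits first, internal nodes split further,
# and leaf nodes carry the final class decision.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("Test accuracy:", round(tree.score(X_test, y_test), 3))
print(export_text(tree))   # textual view of the root, branches, and leaves
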

RANDOM FOREST ALGORITHM:

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can
be used for both Classification and Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the
performance of the model

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various
subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of
relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority
vote of predictions, predicts the final output.

A greater number of trees in the forest generally leads to higher accuracy and helps prevent the problem of overfitting.
Steps for Random Forest algorithm work:
Random Forest works in two phases: the first is to create the random forest by combining N
decision trees, and the second is to make predictions for each tree created in the first phase.
The working process can be explained in the steps below:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data points (Subsets).
Step-3: Choose the number N for decision trees that you want to build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new
data points to the category that wins the majority votes.
The working of the algorithm can be better understood by the example below:
Example: Suppose there is a dataset that contains multiple fruit images. This dataset is
given to the Random Forest classifier. The dataset is divided into subsets and given to each
decision tree. During the training phase, each decision tree produces a prediction result, and
when a new data point occurs, the Random Forest classifier predicts the final decision based
on the majority of results.
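
The sketch below (illustrative, assuming scikit-learn is available; the data are synthetic) mirrors these steps: it builds N decision trees on random subsets of the data and predicts by majority vote.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# n_estimators is the number N of decision trees; each tree is trained on a
# bootstrap sample (a random subset of data points), and a random subset of
# features is considered at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", round(forest.score(X_test, y_test), 3))
print("Prediction for first test sample:", forest.predict(X_test[:1]))
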

Applications of Random Forest
1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks can be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.

GBM in Machine Learning:

Gradient Boosting Machine (GBM) is one of the most popular forward learning ensemble
methods in machine learning. It is a powerful technique for building predictive models for
regression and classification tasks.

GBM helps us to get a predictive model in the form of an ensemble of weak prediction models
such as decision trees. When a decision tree is used as the weak learner, the resulting
algorithm is called gradient-boosted trees.

It enables us to combine the predictions from various learner models and build a final
predictive model that makes more accurate predictions.

Each tree in the ensemble considers a different subset of features when selecting the best
split. This means that each tree behaves differently, and hence captures different signals
from the same data.

Gradient boosting machines consist of three elements, as follows:

o Loss function
o Weak learners
o Additive model

Let's understand these three elements in detail

1. Loss function:

There is a large family of loss functions in machine learning, and the choice depends on the type of task being
solved and on the characteristics required of the conditional distribution, such as robustness. When using a loss
function in our task, we must specify both the loss function and the function that computes the corresponding
negative gradient. Once we have these two functions, they can be implemented in gradient boosting machines
easily. Several loss functions have already been proposed for GBM algorithms.
2. Weak Learner:

Weak learners are the base learner models that learn from past errors and help in building a strong predictive
model for boosting algorithms in machine learning. Generally, decision trees are used as the weak learners in
boosting algorithms.
3.Additive Model:

The additive model refers to how trees are added to the model: trees are added one at a time, so that the
existing trees in the model are not changed. A gradient descent procedure is used when adding each new tree
so that the overall loss is reduced.
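
A hedged sketch of these three elements in code (assuming scikit-learn is available; the data are synthetic): the loss function is the classification log-loss handled internally, the weak learners are shallow decision trees, and trees are added one at a time in an additive, gradient-descent fashion.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Weak learners: shallow trees (max_depth=2).
# Additive model: n_estimators trees are added one at a time.
# Loss function: the classification log-loss, reduced via its negative gradient.
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=2, random_state=0)
gbm.fit(X_train, y_train)

print("Test accuracy:", round(gbm.score(X_test, y_test), 3))
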

MACHINE LEARNING:

Machine learning is a subset of AI, which enables the machine to automatically learn
from data, improve performance from past experiences, and make predictions. Machine learning
contains a set of algorithms that work on a huge amount of data. Data is fed to these algorithms to train
them, and on the basis of training, they build the model & perform a specific task

These ML algorithms help to solve different business problems like Regression,
Classification, Forecasting, Clustering, and Associations, etc.

Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:

1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning

As its name suggests, Supervised machine learning is based on supervision. It means in the
supervised learning technique, we train the machines using the "labelled" dataset, and based
on the training, the machine predicts the output. Here, the labelled data specifies that some of
the inputs are already mapped to the output. More precisely, we can say that first we train the
machine with the input and corresponding output, and then we ask the machine to predict the
output using the test dataset.
Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given
below:
o Classification
o Regression

a) Classification

Classification algorithms are used to solve classification problems in which the output
variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. The
classification algorithms predict the categories present in the dataset. Some real-world
examples of classification algorithms are Spam Detection, Email filtering, etc.

b) Regression

Regression algorithms are used to solve regression problems in which there is a relationship
between input and output variables and the output is continuous. These are used to predict
continuous output variables, such as market trends, weather prediction, etc.
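
As an illustrative sketch of both problem types (assuming scikit-learn is available; the datasets are synthetic), the snippet below trains a classifier on labelled categorical outputs and a regressor on labelled continuous outputs.

from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Classification: categorical output (e.g. class 0 vs class 1).
Xc, yc = make_classification(n_samples=400, n_features=5, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(Xc_tr, yc_tr)
print("Classification accuracy:", round(clf.score(Xc_te, yc_te), 3))

# Regression: continuous output (e.g. a price or a temperature).
Xr, yr = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, test_size=0.25, random_state=0)
reg = LinearRegression().fit(Xr_tr, yr_tr)
print("Regression R^2 score:", round(reg.score(Xr_te, yr_te), 3))
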

2. Unsupervised Machine Learning

Unsupervised learning is different from the Supervised learning technique; as its name
suggests, there is no need for supervision. It means, in unsupervised machine learning, the
machine is trained using the unlabeled dataset, and the machine predicts the output without
any supervision.

In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.

Categories of Unsupervised Machine Learning

Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the data. It is
a way to group the objects into a cluster such that the objects with the most similarities
remain in one group and have fewer or no similarities with the objects of other groups. An
example of the clustering algorithm is grouping the customers by their purchasing behaviour.
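
For instance, a minimal clustering sketch (illustrative, assuming scikit-learn is available; the customer-like data are synthetic) groups unlabelled points into clusters with k-means:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabelled data: 300 points drawn around 3 hidden group centres.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means assigns each point to the nearest of k cluster centres.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("Cluster centres:\n", kmeans.cluster_centers_)
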

2) Association

Association rule learning is an unsupervised learning technique, which finds interesting
relations among variables within a large dataset. The main aim of this learning algorithm is to
find the dependency of one data item on another data item and map those variables
accordingly so that it can generate maximum profit. This algorithm is mainly applied
in Market Basket analysis, Web usage mining, continuous production, etc.
3. Semi-Supervised Learning

Semi-Supervised learning is a type of Machine Learning algorithm that lies between
Supervised and Unsupervised machine learning. It represents the intermediate ground
between Supervised (With Labelled training data) and Unsupervised learning (with no
labelled training data) algorithms and uses the combination of labelled and unlabeled datasets
during the training period.

4. Reinforcement Learning

Reinforcement learning works on a feedback-based process, in which an AI agent (a
software component) automatically explores its surroundings by trial and error, taking
actions, learning from experiences, and improving its performance. The agent gets rewarded
for each good action and punished for each bad action; hence, the goal of a reinforcement
learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data like supervised learning, and agents learn
from their experiences only.

EVALUATING MACHINE LEARNING MODELS:

Machine Learning Model Evaluation


Model evaluation is the process of using metrics to analyze the performance of a model. Model
development is a multi-step process, and a check should be kept on how well the model generalizes to
future predictions. Evaluating a model therefore plays a vital role, as it lets us judge the performance of
our model and analyze its key weaknesses. Common metrics include Accuracy, Precision, Recall, F1
score, Area Under the Curve, the Confusion Matrix, and Mean Squared Error. Cross Validation is one
technique that is followed during the training phase, and it is a model evaluation technique as well.

Accuracy
Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions.
This is the most fundamental metric used to evaluate the model.

PRECISION:
Precision is the ratio of true positives to the sum of true positives and false positives. It measures
how many of the predicted positive samples are actually positive.

RECALL:

Recall is the ratio of true positives to the sum of true positives and false negatives. It measures
how many of the actual positive samples are correctly identified.

F1 score
The F1 score is the harmonic mean of precision and recall. In the precision-recall trade-off,
increasing precision tends to decrease recall and vice versa. The goal of the F1 score is to
combine precision and recall into a single metric.

Confusion Matrix
A confusion matrix is an N x N matrix, where N is the number of target classes. It tabulates the
actual outputs against the predicted outputs.
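
The sketch below (illustrative, assuming scikit-learn is available; the label arrays are made up) computes these metrics for a small binary example, using the standard formulas Accuracy = (TP+TN)/(TP+TN+FP+FN), Precision = TP/(TP+FP), Recall = TP/(TP+FN), and F1 = 2·Precision·Recall/(Precision+Recall).

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual labels (made up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions (made up)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("Confusion matrix (rows = actual, columns = predicted):")
print(confusion_matrix(y_true, y_pred))
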

Cross Validation and Holdout


Cross Validation is a method in which we do not use the whole dataset for training; some part of
the dataset is reserved for testing the model. There are many types of Cross-Validation, of which
K-Fold Cross Validation is the most commonly used. In K-Fold Cross Validation the original
dataset is divided into k subsets, known as folds. The process is repeated k times, where in each
round one fold is used for testing and the remaining k-1 folds are used for training the model. In
this way each data point acts as a test sample as well as a training sample. This technique
generalizes the model well and reduces the error rate.

Holdout is the simplest approach. It is used in neural networks as well as in many classifiers. In
this technique, the dataset is divided into train and test datasets.
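
Both ideas in a short sketch (illustrative, assuming scikit-learn is available; the data are synthetic): a simple holdout split, followed by 5-fold cross-validation where each fold takes a turn as the test set.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = LogisticRegression()

# Holdout: one fixed train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)
print("Holdout accuracy:", round(model.score(X_test, y_test), 3))

# K-Fold cross-validation: k=5 folds, each used once as the test fold.
scores = cross_val_score(model, X, y, cv=5)
print("5-fold CV accuracies:", [round(s, 3) for s in scores])
print("Mean CV accuracy:", round(scores.mean(), 3))
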
UNDERFITTING AND OVERFITTING:
Bias and Variance in Machine Learning
 Bias: Assumptions made by a model to make the target function easier to learn. In practice it
shows up as the error rate on the training data: when the training error is high we call it high bias,
and when it is low we call it low bias.

 Variance: The difference between the error rate on the training data and the error rate on the
testing data. If the difference is high, the model has high variance; if it is low, the model has low
variance. Usually, we want low variance so that our model generalizes well.

Underfitting in Machine Learning


A statistical model or a machine learning algorithm is said to underfit when it cannot
capture the underlying trend of the data, i.e., it performs poorly on both the training data and
the testing data. (It's just like trying to fit undersized pants!) Underfitting destroys the accuracy
of our machine-learning model. Its occurrence simply means that the model or the algorithm
does not fit the data well enough. It usually happens when we have too little data to build an
accurate model, or when we try to fit a linear model to non-linear data. In such cases the model
is too simple to capture the patterns in the data, and it will probably make a lot of wrong
predictions. Underfitting can be reduced by increasing the model's complexity, adding more
relevant features, or reducing regularization.

Reasons for Underfitting


1. High bias and low variance.

2. The size of the training dataset used is not enough.
3. The model is too simple.
4. Training data is not cleaned and also contains noise in it.

Overfitting in Machine Learning


A statistical model is said to be overfitted when it makes accurate predictions on the training
data but not on the testing data. When a model trains too closely on the available data, it starts
learning from the noise and inaccurate entries in the data set, and testing with new data then
results in high variance. The model fails to categorize the data correctly, because of too many
details and noise. Overfitting is more likely with non-parametric and non-linear methods,
because these types of machine learning algorithms have more freedom in building the model
from the dataset and can therefore build unrealistic models. Solutions to avoid overfitting
include using a linear algorithm if we have linear data, or constraining parameters such as the
maximal depth if we are using decision trees.

Reasons for Overfitting:

1. High variance and low bias.
2. The model is too complex.
3. The size of the training data is too small.
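
The sketch below (illustrative, assuming scikit-learn is available; the data are synthetic) shows both failure modes by varying a decision tree's maximal depth: a very shallow tree underfits (poor on both train and test), while an unconstrained tree overfits (near-perfect on train, worse on test).

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, 3, None):   # None lets the tree grow until all leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
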

