
UNIT IV

MACHINE LEARNING BASICS

Overview of machine learning concepts - Overfitting and Underfitting -
Correctness - The Bias-Variance Trade-off - Feature Extraction and Selection -
Decision Trees - Linear Regression - Naive Bayes.
OVERVIEW OF MACHINE LEARNING CONCEPTS
• Artificial Intelligence and Machine Learning are closely related, and yet
they have some differences.
• Artificial Intelligence is an overarching concept that aims to create
machines that mimic human-level intelligence.
• Artificial Intelligence is a general concept that deals with creating
human-like critical thinking capability and reasoning skills for machines.
• Machine Learning, on the other hand, is a subset or specific application of
Artificial Intelligence that aims to create machines that can learn
autonomously from data.
• Machine Learning is specific rather than general: it allows a machine to
make predictions or take decisions on a specific problem using data.
MACHINE LEARNING: DEFINITION
• Machine learning is a branch of artificial intelligence (AI) and
computer science which focuses on the use of data and algorithms to
imitate the way that humans learn, gradually improving its accuracy.

• Machine learning (ML) is the subset of artificial intelligence (AI)
that focuses on building systems that learn or improve performance
based on the data they consume.

• A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E. For example,
in spam filtering, T is classifying emails as spam or not spam, P is the
fraction of emails classified correctly, and E is a corpus of emails
already labelled by users.
BROAD TYPES OF MACHINE LEARNING
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning
SUPERVISED MACHINE LEARNING

• Supervised learning is the type of machine learning in which machines
are trained using well-"labelled" training data, and on the basis of that
data, machines predict the output. Labelled data means input data that is
already tagged with the correct output.
• In supervised learning, the training data provided to the machines works
as the supervisor that teaches the machines to predict the output
correctly. It applies the same concept as a student learning under the
supervision of a teacher.
• Supervised learning is a process of providing input data as well as correct
output data to the machine learning model. The aim of a supervised
learning algorithm is to find a mapping function that maps the input
variable (x) to the output variable (y).
• In the real world, supervised learning can be used for risk assessment,
image classification, fraud detection, spam filtering, etc.
SUPERVISED MACHINE LEARNING
• Supervised Machine Learning includes Regression and Classification
algorithms.
• Regression models the relation between an independent and a dependent
variable: it demonstrates the impact on the dependent variable when the
independent variable is changed in any way.
• The independent variable is therefore called the explanatory variable,
and the dependent variable is called the factor of interest.

Some of the more popular algorithms in these categories are:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
Advantages of Supervised Learning:

• With the help of supervised learning, the model can predict the output on
the basis of prior experience.
• In supervised learning, we can have an exact idea about the classes of
objects.
• Supervised learning models help us solve various real-world problems such
as fraud detection, spam filtering, etc.

Disadvantages of Supervised Learning:

• Supervised learning models are not suitable for handling complex tasks.
• Supervised learning cannot predict the correct output if the test data is
different from the training dataset.
• Training requires a lot of computation time.
• In supervised learning, we need enough knowledge about the classes of
objects.
UNSUPERVISED LEARNING
• Unsupervised learning is a type of machine learning in which models
are trained using an unlabelled dataset and are allowed to act on that
data without any supervision.

• Unsupervised learning algorithms are used to draw inferences from
datasets consisting of input data without labelled responses.

• In unsupervised learning, no classification or categorization is
included in the observations.
Advantages of Unsupervised Learning

• Unsupervised learning is used for more complex tasks than supervised
learning because, in unsupervised learning, we don't have labelled input
data.
• Unsupervised learning is often preferable because unlabelled data is much
easier to obtain than labelled data.

Disadvantages of Unsupervised Learning

• Unsupervised learning is intrinsically more difficult than supervised
learning, as it has no corresponding output to learn from.
• The result of an unsupervised learning algorithm might be less accurate,
as the input data is not labelled and the algorithm does not know the
exact output in advance.
SOME OF THE MORE POPULAR ALGORITHMS IN THESE CATEGORIES ARE:

• K-means clustering (a minimal sketch follows this list)
• KNN (k-nearest neighbours)
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular Value Decomposition
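As a concrete illustration of the first algorithm above, here is a hedged
K-means sketch, assuming scikit-learn; the two "blobs" of toy data are
illustrative only:

    # Minimal K-means clustering sketch (scikit-learn assumed).
    import numpy as np
    from sklearn.cluster import KMeans

    # Unlabelled data: two obvious groups, but no labels are provided.
    X = np.array([[1, 1], [1.5, 2], [2, 1.5],
                  [8, 8], [8.5, 9], [9, 8.5]])

    # The algorithm groups the points into k clusters on its own.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # e.g. [0 0 0 1 1 1] (cluster ids are arbitrary)
    print(kmeans.cluster_centers_)  # one centroid per cluster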
WHAT IS MACHINE LEARNING USED FOR?

• Machine Learning is used in almost all modern technologies, and this is
only going to increase in the future.
• In fact, there are applications of Machine Learning in fields ranging
from smartphone technology to healthcare to social media, and so on.
• Smartphones use personal voice assistants such as Siri, Alexa and
Cortana.
• Machine Learning is also used in social media; take Facebook's 'People
you may know' as an example.
• Machine Learning is also very important in healthcare, where it can be
used to help diagnose a variety of problems in the medical field.
OVERFITTING AND UNDERFITTING
• Overfitting and underfitting are the two main problems that occur in
machine learning and degrade the performance of machine learning models.
• The main goal of each machine learning model is to generalize well.
Generalization is the ability of an ML model to produce suitable output
when given previously unseen inputs.
• In other words, after being trained on the dataset, the model can produce
reliable and accurate output on new data.
OVERFITTING

• Overfitting occurs when our machine learning model tries to cover all the
data points, or more data points than required, in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate
values present in the dataset, and all these factors reduce the efficiency
and accuracy of the model.
• An overfitted model has low bias and high variance.
OVERFITTING AND UNDERFITTING
• Before understanding overfitting and underfitting, let's define some basic
terms that will help in understanding this topic well:
• Signal: the true underlying pattern of the data that helps the machine
learning model learn from the data.
• Noise: unnecessary and irrelevant data that reduces the performance of
the model.
• Bias: a prediction error introduced into the model by oversimplifying the
machine learning algorithm; equivalently, the difference between the
predicted values and the actual values.
• Variance: the error that appears when the machine learning model performs
well on the training dataset but does not perform well on the test
dataset.
OVERFITTING AND UNDERFITTING

[Figure: scatter plot with an overfitted curve passing through every point.]
In the figure, the model tries to cover every data point in the scatter
plot. It may look efficient, but in reality it is not: the goal of a
regression model is to find the best-fit line, and here, instead of a best
fit, the wiggly curve will generate prediction errors on new data.
HOW TO AVOID OVERFITTING IN A MODEL

• Both overfitting and underfitting degrade the performance of a machine
learning model, but the more common cause is overfitting, and there are
several ways to reduce its occurrence (a brief cross-validation sketch
follows this list):
• Cross-validation
• Training with more data
• Removing features
• Early stopping of training
• Regularization
• Ensembling
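A hedged cross-validation sketch, assuming scikit-learn; the dataset and
model choice are illustrative only:

    # 5-fold cross-validation sketch (scikit-learn assumed).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Each fold is held out once as validation data, so a model that
    # merely memorises the training folds scores poorly on average.
    scores = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y, cv=5)
    print(scores.mean())  # average held-out accuracy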
UNDERFITTING

• Underfitting occurs when our machine learning model is not able to
capture the underlying trend of the data.
• To avoid overfitting, the feeding of training data can be stopped at an
early stage, as a result of which the model may not learn enough from the
training data.
• Consequently, it may fail to find the best fit for the dominant trend in
the data.
• In the case of underfitting, the model is not able to learn enough from
the training data, which reduces accuracy and produces unreliable
predictions.
UNDERFITTING

• An underfitted model has high bias and low variance.

[Figure: plot of a straight line that misses the trend of the data points.]
In the figure, the model is unable to capture the pattern of the data
points in the plot.

How to avoid underfitting (a short sketch using polynomial features follows):
• By increasing the training time of the model.
• By increasing the number of features.
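A hedged sketch of curing underfitting by increasing model capacity: a
straight line underfits quadratic data, and adding polynomial features (i.e.
increasing the number of features) helps. The toy data is illustrative only:

    # Fixing underfitting with extra (polynomial) features (scikit-learn assumed).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = np.linspace(-3, 3, 50).reshape(-1, 1)
    y = X.ravel() ** 2 + rng.normal(0, 0.5, 50)   # quadratic trend + noise

    linear = LinearRegression().fit(X, y)
    print("linear R^2:", linear.score(X, y))      # low: the line underfits

    X_poly = PolynomialFeatures(degree=2).fit_transform(X)
    poly = LinearRegression().fit(X_poly, y)
    print("poly   R^2:", poly.score(X_poly, y))   # much higher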
CORRECTNESS

• Data scientists know that when they build training sets, they need to
watch out for data leakage in order to ensure that a model is trained only
on the correct data.
• Data leakage occurs when models are trained on information that would not
actually have been available at prediction time in the real world.
• In time-series models, data leakage is typically caused by adding features
to your training set that occurred after the moment a given prediction
would have been made.
• When feature generation, predictions, and label generation occur at
different points in time, data leakage can easily be introduced into your
training sets.
CORRECTNESS

• Imagine you have an e-commerce website that makes product
recommendations. The features for this model might include:
• RFM metrics, such as the sum of products purchased by a user over the
last week, month, or year, recalculated every week.
• A summary of the items currently in a user's cart, updated in real time.
• If the real-time cart summary reflects items added after the moment of
prediction, it leaks future information into training; a time-aware split
(sketched below) avoids this.
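A hedged sketch of avoiding time-series leakage by splitting on time rather
than at random; the column names (event_time, purchases_last_week,
bought_product) and dates are illustrative assumptions, not from the source:

    # Time-aware train/test split to prevent leakage (pandas assumed).
    import pandas as pd

    df = pd.DataFrame({
        "event_time": pd.to_datetime(
            ["2023-01-02", "2023-01-09", "2023-01-16", "2023-01-23"]),
        "purchases_last_week": [3, 1, 4, 2],   # feature computed BEFORE event_time
        "bought_product": [0, 1, 0, 1],        # label observed AT event_time
    })

    cutoff = pd.Timestamp("2023-01-15")
    train = df[df["event_time"] <= cutoff]     # only features that already
    test = df[df["event_time"] > cutoff]       # existed by prediction time
    print(len(train), len(test))               # 2 2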
THE BIAS-VARIANCE TRADE-OFF
What is Bias?
• In general, a machine learning model analyses the data, finds patterns in
it, and makes predictions.
• While training, the model learns these patterns in the dataset and
applies them to test data for prediction.
• When making predictions, a difference occurs between the values predicted
by the model and the actual/expected values; this difference is known as
bias error, or error due to bias.
• Bias can be defined as the inability of a machine learning algorithm such
as Linear Regression to capture the true relationship between the data
points.
• Every algorithm begins with some amount of bias, because bias arises from
assumptions in the model that make the target function simpler to learn.
A model has either:

• Low bias: a low-bias model makes fewer assumptions about the form of the
target function.

• High bias: a high-bias model makes more assumptions, and so becomes
unable to capture the important features of our dataset. A high-bias model
also cannot perform well on new data.

• Generally, a linear algorithm has high bias, which makes it learn fast:
the simpler the algorithm, the more bias is likely to be introduced. A
nonlinear algorithm, by contrast, often has low bias.
THE BIAS-VARIANCE TRADE-OFF
• Bias is one type of error that occurs due to wrong assumptions about the
data, such as assuming the data is linear when in reality it follows a
complex function.
• Variance, on the other hand, is introduced by high sensitivity to
variations in the training data.
• Variance is also a type of error, since we want our model to be robust
against noise.
THE BIAS-VARIANCE TRADE-OFF
• Before coming to the mathematical definitions, we need to know about
random variables and expectations.
• Let's say f(x) is the true function that our data follows. We build
several models, each denoted \hat{f}(x).
• Each point of \hat{f}(x) is a random variable taking as many values as
there are models.
• To see how well the models approximate the true function f(x), we take
the expected value of \hat{f}(x), written E[\hat{f}(x)], and define:

Bias: f(x) - E[\hat{f}(x)]
Variance: E[\hat{f}(x)^2] - (E[\hat{f}(x)])^2 = E[(\hat{f}(x) - E[\hat{f}(x)])^2]

A short sketch estimating both quantities empirically follows.
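A hedged sketch of estimating bias and variance numerically: train many
models \hat{f} on resampled data and compare them to the true f at a single
point. The sine "true function" and the linear model are assumptions chosen
to make the bias visible:

    # Empirical bias/variance estimate at one point (NumPy assumed).
    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(x)            # the "true" function f(x)
    x0 = 1.0                           # point at which we measure bias/variance

    preds = []
    for _ in range(200):               # one f_hat per simulated training set
        X = rng.uniform(0, 3, 30)
        y = f(X) + rng.normal(0, 0.3, 30)
        coeffs = np.polyfit(X, y, 1)           # a deliberately simple model
        preds.append(np.polyval(coeffs, x0))

    preds = np.array(preds)
    print("bias    :", f(x0) - preds.mean())                # f - E[f_hat]
    print("variance:", ((preds - preds.mean()) ** 2).mean())  # E[(f_hat - E[f_hat])^2]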
THE BIAS-VARIANCE TRADE-OFF
• In Machine Learning, the performance and complexity of the model depend
not only on certain parameters, assumptions, and conditions,
• but also on the quality of the data used to train the model; that is why
cleaning and standardizing the data is a step everyone goes through.
• If the data is not cleaned and standardized, then no matter how finely
tuned the model's parameters and hyper-parameters are, the model will not
be able to provide the best solution.
SKEWNESS IN DATA
• In simple words, skewness is a measure of how much the probability
distribution of a random variable deviates from the normal distribution
(a probability distribution without any skewness).
SKEWNESS IN DATA
• If our data is positively skewed, it has a larger number of data points
with low values.
• So, when we train our model on this data, it will perform better at
predicting data points with lower values than at predicting those with
higher values (a short sketch measuring and reducing skewness follows).
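A hedged sketch of measuring skewness and reducing it with a log transform,
assuming SciPy; the lognormal sample is an illustrative stand-in for
positively skewed data:

    # Measuring and reducing positive skew (NumPy and SciPy assumed).
    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)
    data = rng.lognormal(mean=0, sigma=1, size=1000)  # positively skewed sample

    print("raw skewness:", skew(data))            # clearly > 0
    print("log skewness:", skew(np.log(data)))    # near 0 after a log transform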

• Bias-vs-Variance Trade-Off
• This is one of the most important concepts to understand for supervised
machine learning and predictive modelling use cases; the main goal is to
choose a model to train that offers the lowest combined bias and variance
error for the dataset or business use case.
Feature Extraction and Selection
FEATURE EXTRACTION

• Feature extraction is quite a complex concept concerning the translation
of raw data into the inputs that a particular Machine Learning algorithm
requires.
• Features must represent the information in the data in the format that
best fits the needs of the algorithm being used to solve the problem.
• Some of the most popular methods of feature extraction are:
• Bag-of-Words
• TF-IDF
FEATURE EXTRACTION
• Bag-of-Words: Bag-of-Words (BoW) is one of the most fundamental methods
of transforming tokens into a set of features.

• The BoW model is used in document classification, where each word is
used as a feature for training the classifier.

• For example, in review-based sentiment analysis, the presence of words
like 'fabulous' and 'excellent' indicates a positive review, while words
like 'annoying' and 'poor' point to a negative review.
FEATURE EXTRACTION
• There are three steps in creating a BoW model:

• 1. The first step is text pre-processing, which involves:
• converting the entire text into lower-case characters;
• removing all punctuation and unnecessary symbols.

• 2. The second step is to create a vocabulary of all the unique words in
the corpus. Suppose we have a set of movie reviews; consider three of
them:
• good movie
• not a good movie
• did not like

• 3. The third step is to represent each document as a vector of word
counts over this vocabulary (the sketch below does exactly this).
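A minimal Bag-of-Words sketch over the three reviews above, assuming
scikit-learn's CountVectorizer:

    # Bag-of-Words over the three example reviews (scikit-learn assumed).
    from sklearn.feature_extraction.text import CountVectorizer

    reviews = ["good movie", "not a good movie", "did not like"]

    # Lower-casing and punctuation removal (step 1) happen inside the
    # vectorizer; fitting builds the vocabulary of unique words (step 2).
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(reviews)

    print(vectorizer.get_feature_names_out())  # ['did' 'good' 'like' 'movie' 'not']
    print(X.toarray())                         # one count vector per review (step 3)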
FEATURE SELECTION

• A feature is an attribute that has an impact on the problem or is useful
for the problem; choosing the important features for the model is known
as feature selection.
• Every machine learning process depends on feature engineering, which
mainly consists of two processes: feature selection and feature
extraction.
• Feature selection is defined as "the process of automatically or manually
selecting the subset of the most appropriate and relevant features to be
used in model building." Feature selection is performed by either
including the important features or excluding the irrelevant features of
the dataset, without changing them.
NEED FOR FEATURE SELECTION

• We collect a huge amount of data to train our model and help it learn
better.
• Generally, a dataset consists of noisy data, irrelevant data, and some
portion of useful data.
• Moreover, a huge amount of data slows down the training process of the
model, and with noisy and irrelevant data the model may not predict or
perform well.
• Some benefits of using feature selection in machine learning:
• It helps in avoiding the curse of dimensionality.
• It helps simplify the model so that it can be more easily interpreted by
researchers.
• It reduces training time.
• It reduces overfitting and hence enhances generalization.
FEATURE SELECTION TECHNIQUES

• There are mainly two types of feature selection techniques:
• Supervised feature selection techniques consider the target variable and
can be used on labelled datasets.
• Unsupervised feature selection techniques ignore the target variable and
can be used on unlabelled datasets.
DECISION TREES

• Decision Tree is a supervised learning technique that can be used for
both classification and regression problems, but it is mostly preferred
for solving classification problems. It is a tree-structured classifier,
where internal nodes represent the features of a dataset, branches
represent the decision rules, and each leaf node represents the outcome.

• In a decision tree, there are two kinds of nodes: the decision node and
the leaf node. Decision nodes are used to make decisions and have multiple
branches, whereas leaf nodes are the outputs of those decisions and do not
contain any further branches.
• The decisions or tests are performed on the basis of the features of the
given dataset.

• It is a graphical representation for getting all the possible solutions
to a problem/decision based on given conditions.

• It is called a decision tree because, like a tree, it starts with a root
node, which expands into further branches to construct a tree-like
structure.
• To build the tree, we use the CART algorithm, which stands for
Classification And Regression Tree.
• A decision tree simply asks a question and, based on the answer (Yes/No),
further splits the tree into subtrees.
• The general structure of a decision tree (root node, decision nodes, leaf
nodes) is described by the terminology below.
Decision Tree Terminologies

• Root node: the node where the decision tree starts. It represents the
entire dataset, which then gets divided into two or more homogeneous sets.

• Leaf node: a final output node; the tree cannot be split further after a
leaf node.

• Splitting: the process of dividing the decision node/root node into
sub-nodes according to the given conditions.

• Branch/sub-tree: a tree formed by splitting the tree.

• Pruning: the process of removing unwanted branches from the tree.

• Parent/child node: the root node of the tree is called the parent node,
and the other nodes are called child nodes.
LINEAR REGRESSION
• Machine Learning is a branch of Artificial Intelligence that focuses on
the development of algorithms and statistical models that can learn from
and make predictions on data.

• Linear regression is a type of machine-learning algorithm, more
specifically a supervised machine-learning algorithm, that learns from
labelled datasets and maps the data points to the most optimized linear
function, which can then be used for prediction on new datasets.
Supervised learning has two types:

• Classification: predicts the class of the dataset based on the
independent input variables, where a class is a categorical or discrete
value, e.g. whether the image of an animal shows a cat or a dog.
• Regression: predicts continuous output variables based on the independent
input variables, e.g. predicting house prices from parameters such as
house age, distance from the main road, and so on.
What Is a Regression?
• Regression is a statistical method used in finance, investing, and other disciplines that
attempts to determine the strength and character of the relationship between one
dependent variable (usually denoted by Y) and a series of other variables (known as
independent variables).

• The general form of each type of regression model is:

Simple linear regression:

Y = a + bX + u

Multiple linear regression:

Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u

where:
Y = the dependent variable
X = the explanatory (independent) variable(s)
a = the y-intercept
b = the slope (beta coefficient) of the explanatory variable(s)
u = the regression residual, or error term
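A minimal sketch of estimating a (intercept) and b (slope) in Y = a + bX + u
with scikit-learn; the toy data, generated roughly as Y = 2X plus noise, is
an assumption for illustration:

    # Fitting a simple linear regression (NumPy and scikit-learn assumed).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1], [2], [3], [4], [5]])
    Y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])   # roughly Y = 0 + 2X + noise

    model = LinearRegression().fit(X, Y)
    print("a (intercept):", model.intercept_)       # close to 0
    print("b (slope)    :", model.coef_[0])         # close to 2
    print("prediction at X=6:", model.predict([[6]])[0])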
Applications of linear regression

• Market analysis.
• Financial analysis.
• Sports analysis.
• Environmental health.
• Medicine.
• Least squares.
• Predicting outcomes.
NAIVE BAYES
• The Naïve Bayes classifier is a supervised machine learning algorithm
used for classification tasks, such as text classification.

• It is also part of the family of generative learning algorithms, meaning
that it seeks to model the distribution of inputs for a given class or
category.

• Naïve Bayes is known as a probabilistic classifier because it is based on
Bayes' Theorem.
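For reference, Bayes' Theorem relates the posterior probability of a class y
given features x1, ..., xn to quantities that can be estimated from training
data; the "naïve" part is the added assumption that the features are
conditionally independent given the class:

P(y | x1, ..., xn) = P(y) * P(x1, ..., xn | y) / P(x1, ..., xn)
                   ≈ P(y) * P(x1 | y) * ... * P(xn | y) / P(x1, ..., xn)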
Advantages

• Less complex: compared to other classifiers, Naïve Bayes is considered a
simpler classifier since its parameters are easier to estimate. As a
result, it is one of the first algorithms taught in data science and
machine learning courses.

• Scales well: compared to logistic regression, Naïve Bayes is considered a
fast and efficient classifier that is fairly accurate when the conditional
independence assumption holds. It also has low storage requirements.

• Can handle high-dimensional data: use cases such as document
classification can have a very high number of dimensions, which can be
difficult for other classifiers to manage.
Disadvantages:

• Subject to zero frequency: zero frequency occurs when a categorical value
never appears in the training set, so the model assigns it zero
probability and wipes out the whole product of probabilities (smoothing
techniques such as Laplace smoothing address this).

• Unrealistic core assumption: while the conditional independence
assumption performs well overall, it does not always hold, leading to
incorrect classifications.
Applications:

• Spam filtering: spam classification is one of the most popular
applications of Naïve Bayes cited in the literature.

• Document classification: document and text classification go hand in
hand; another popular use case of Naïve Bayes is content classification,
e.g. the content categories of a news media website.

• Sentiment analysis: while this is another form of text classification,
sentiment analysis is commonly leveraged within marketing to better
understand and quantify opinions and attitudes around specific products
and brands.

• Mental state prediction: using MRI data, Naïve Bayes has been leveraged
to predict different cognitive states in humans; the goal of this research
was to assist in better understanding hidden cognitive states,
particularly among brain-injury patients.
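A hedged Naïve Bayes text-classification sketch in the spirit of the
spam-filtering application above, assuming scikit-learn; the tiny corpus and
labels are invented for illustration:

    # Naive Bayes spam classification sketch (scikit-learn assumed).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    texts = ["win money now", "limited offer win prize",
             "meeting at noon", "project report attached"]
    labels = ["spam", "spam", "ham", "ham"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)

    # alpha=1.0 is Laplace smoothing, which guards against the
    # zero-frequency problem mentioned under the disadvantages above.
    clf = MultinomialNB(alpha=1.0).fit(X, labels)
    print(clf.predict(vectorizer.transform(["win a prize now"])))  # ['spam']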
