Big-Data Unit-3
Machine Learning
Unit-3
Introduction
• It is a subfield of artificial intelligence.
• The goal of machine learning is to understand the structure of data and fit that data into models that can be understood and utilized by people.
• In traditional computing, algorithms are sets of instructions used by computers to calculate or solve the simple tasks assigned to them.
• Machine learning is used by many industries for automating tasks and performing complex data analysis.
• It focuses mainly on designing systems that can learn from data and make predictions based on a set of metrics.
Definition
• Machine learning is the science of getting computers to learn and act like humans do, and to improve their learning over time in an autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.
• Artificial Intelligence: a program that can sense, reason, act and adapt.
• Machine Learning: algorithms whose performance improves as they are exposed to more data over time.
• Deep Learning: a subset of machine learning in which multi-layered neural networks learn from large amounts of data.
• Machine Learning: It is a branch of artificial intelligence which aims to create intelligent systems that perform human-like jobs by learning from large amounts of relevant data.
• Deep Learning: It is a subset of machine learning in artificial intelligence whose networks are capable of learning, even unsupervised, from data that is unstructured. It is also known as deep neural learning or deep neural networks.
• Artificial Intelligence: It refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions.
How Machine Learning Differs from Traditional Programming
• Traditional Computing: Algorithms are sets of explicitly programmed instructions used by computers to calculate or solve a problem.
• Machine learning algorithms instead allow computers to train on data inputs and use statistical analysis in order to output values that fall within a specific range.
• Traditional Programming: Data and a program are run on the computer to produce the output.
• Machine Learning: Data and the desired output are run on the computer to create a program. This program can then be used in traditional programming (see the sketch below).
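As an illustration of this contrast, the following sketch (an assumption added for illustration, not part of the original notes) hard-codes a temperature-conversion rule and then lets scikit-learn's LinearRegression recover the same rule from example input/output pairs.

# A minimal sketch contrasting traditional programming with machine learning.
# Assumes NumPy and scikit-learn are installed; the conversion task is only illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: the rule (program) is written by hand.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Machine learning: the rule is learned from data (inputs and known outputs).
celsius = np.array([[0], [10], [20], [30], [40]])     # input data
fahrenheit = np.array([32, 50, 68, 86, 104])          # desired output
model = LinearRegression().fit(celsius, fahrenheit)   # "create a program" from data

print(celsius_to_fahrenheit(25))   # 77.0, from the hand-written rule
print(model.predict([[25]]))       # approximately 77, from the learned rule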
Goals of Machine learning
• The primary goal of machine learning is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly.
• The goal of machine learning generally is to understand the structure of data and fit that data into models that can be understood and utilized by people.
• The goal of machine learning is to facilitate computers in building models from sample data.
• The goal of machine learning is to develop general-purpose algorithms of practical value.
Application of Machine Learning
• Image recognition
• Speech recognition
• Online fraud detection
• Stock Market trading
• Automatic Language translation.
Machine Learning life cycle
• Machine learning life cycle is a cyclic process to build an efficient
machine learning project.
• The main purpose of the life cycle is to find a solution to the problem or project.
• Life cycle steps:
• Gathering Data
• Data preparation
• Data wrangling
• Analyse data
• Train the model
• Test the model
• Deployment
Gathering Data
• This step is to identify the data-related problems and obtain all the relevant data.
• Identify the different data sources, as data can be collected from various sources such as files, databases, the internet or mobile devices.
• The quality and quantity of the collected data will determine the efficiency of the output.
• The more data there is, the more accurate the prediction will be.
Data Preparation
• It is the step where we put our data into a suitable place and prepare it for use in machine learning training.
• In this step, we put all the data together and then randomize the ordering of the data.
• This step can be further divided into two processes:
• Data Exploration: understanding the nature of the data that we have to work with. We need to understand the characteristics, format and quality of the data. In this step we find correlations, general trends and outliers (a minimal sketch follows after this list).
• Data pre-processing: the next step is pre-processing of the data for its analysis.
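A minimal sketch of the exploration step, assuming pandas is available; the file name sales.csv and its columns are hypothetical and used only for illustration.

# A minimal data-exploration sketch; 'sales.csv' and its columns are hypothetical.
import pandas as pd

df = pd.read_csv("sales.csv")

print(df.shape)                      # how much data we have
print(df.dtypes)                     # format / characteristics of each column
print(df.describe())                 # summary statistics, useful for spotting outliers
print(df.corr(numeric_only=True))    # correlations between numeric columns

# Randomize the ordering of the data before training.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)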
Data Wrangling
• It is the process of cleaning and converting raw data into a usable format.
• It is the process of cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step.
• Collected data may have various issues such as missing values, duplicate data, invalid data and noise, so we use various filtering techniques to clean the data (see the sketch below).
• It is mandatory to remove these issues because they negatively affect the outcome.
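A minimal wrangling sketch, again assuming pandas and the hypothetical sales.csv file; the column names "target" and "price", and the choice of median imputation, are assumptions for illustration.

# A minimal data-wrangling sketch; the file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("sales.csv")

df = df.drop_duplicates()                                  # duplicate data
df = df.dropna(subset=["target"])                          # rows where the label is missing
df["price"] = df["price"].fillna(df["price"].median())     # impute missing feature values
df = df[df["price"] >= 0]                                  # drop invalid (negative) prices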
Data Analysis
• This step involves:
• Selection of analytical techniques
• Building models
• Review results.
• It starts with determining the type of problem, where we select a machine learning technique such as classification, regression, cluster analysis or association; we then build the model using the prepared data and evaluate it.
Train Model
• We train our model to improve its performance and obtain a better outcome for the problem.
• We use datasets to train the model using machine learning algorithms.
• Training a model is required so that it can understand the various patterns, rules and features.
Test Model
• We check the accuracy of our model by providing a test dataset to it.
• Testing the model determines the percentage accuracy of the model as per the requirements of the project or problem (see the sketch below).
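A minimal sketch covering the train and test steps together, assuming scikit-learn and its bundled Iris dataset; the choice of logistic regression and the 80/20 split are assumptions for illustration.

# A minimal train/test sketch using scikit-learn's bundled Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out part of the data so the model is tested on examples it has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)   # train the model on the training split
model.fit(X_train, y_train)

y_pred = model.predict(X_test)              # test the model on the held-out split
print("Accuracy:", accuracy_score(y_test, y_pred))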
Deployment
• The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.
Advantages of Machine Learning
• Identifies trends & patterns easily.
• No human interference is required- automation
• Continuous improvement
• Handles multidimensional and large amounts of multi-variety data.
Disadvantages of Machine Learning
• Data acquisition
• Time & resources
• Interpretation of results
• High error-susceptibility
Types of Machine learning
• Supervised Machine Learning
• Unsupervised Machine Learning
Supervised Machine Learning
• The type of learning algorithm where the input and the desired output are provided is known as a supervised learning algorithm.
• It uses labeled data to train machines so that they learn and establish relationships between the given inputs and outputs.
• The objective of a supervised learning model is to predict the correct label for newly presented input data.
• Y = f(X), where Y is the predicted output, determined by a mapping function f that assigns a class to an input value X.
• It is a fast learning mechanism with high accuracy.
Labeled Dataset
• The dataset where the output is known for a given input is called a labeled dataset.
• E.g. an image of a fruit together with the fruit's name. When a new image of a fruit is shown, the model compares it with the training set to predict the answer.
How does Supervised Machine Learning work?
• In this type the output is already known; there is a mapping of inputs to desired outputs. Hence, to create a model, the machine is fed with lots of training input data.
• 1. The training data helps to achieve accuracy for the created model. The generated model is then ready to be fed with new input data and predict the outcomes.
• 2. During training, the algorithm searches for patterns in the data that match the desired output. The training process continues until the model achieves a desired level of accuracy on the training data.
• 3. After training, a supervised learning algorithm takes in new, unseen inputs and determines which label the new inputs should be classified as, based on the prior training data.
Supervised Learning is Classified into
• Classification
• Regression
Classification
• It means to group the output into a class.
• If the output is discrete, Boolean or categorical, then it is a classification problem.
• Classification problems require the algorithm to predict a discrete value, identifying the input data as belonging to a specific category or group.
• This technique can be used to classify products by department, category, subcategory, etc.
• Email Spam Detection
• Problem: You want to build a model that can automatically classify incoming emails as either "Spam" or "Not Spam" (also known as "Ham"). A minimal sketch follows below.
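A minimal sketch of such a spam classifier, assuming scikit-learn; the tiny hand-written example emails and the choice of a Naïve Bayes model are assumptions for illustration.

# A minimal spam/ham classification sketch; the example emails are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "limited offer, claim your reward",
    "meeting rescheduled to monday", "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn raw text into word counts, then fit a probabilistic classifier on them.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["claim your free reward now"]))   # expected: ['spam']
print(model.predict(["report for monday meeting"]))    # expected: ['ham']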
Regression
• A regression problem has a real number (a number with a decimal point) as its output.
• It is mostly used for finding the relationship between variables and for forecasting (see the sketch below).
• Regression is a fundamental concept in supervised learning used to
predict a continuous outcome or target variable based on one or
more input features. Unlike classification, which predicts discrete class
labels, regression models predict numerical values.
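A minimal regression sketch, assuming scikit-learn and NumPy; the house-size and price values are invented for illustration.

# A minimal regression sketch; the size/price data points are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[50], [80], [110], [140]])     # input feature (square metres)
prices = np.array([150.0, 240.0, 330.0, 420.0])  # continuous target (thousands)

model = LinearRegression().fit(sizes, prices)
print(model.predict([[100]]))   # a numerical value, not a class label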
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
• K-Nearest Neighbor is one of the simplest Machine Learning algorithms based on Supervised Learning
technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category by using K-NN.
• K-NN can be used for Regression as well as for Classification, but it is mostly used for Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying (core) data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at the time of classification.
• The KNN algorithm, at the training phase, just stores the dataset; when it gets new data, it classifies that data into the category that is most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, but we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are similar to the cat and dog images, and based on the most similar features it will put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1; we need to decide which of these two categories this data point belongs to. To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.
How does K-NN work?
• The K-NN working can be explained on the basis of the below algorithm:
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance of K number of neighbors
• Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
• Step-4: Among these k neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.
Suppose we have a new data point and we need to put it in the required category.
o Firstly, we will choose the number of neighbors, so we will choose k = 5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2) it is calculated as d = sqrt((x2 - x1)^2 + (y2 - y1)^2).
o By calculating the Euclidean distance we get the nearest neighbors: say three nearest neighbors in category A and two nearest neighbors in category B.
o As the 3 nearest neighbors are from category A, this new data point must belong to category A. A from-scratch sketch of these steps follows below.
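A minimal from-scratch sketch of these steps for 2-D points; the category-A and category-B sample points are invented so that, as in the example above, three of the five nearest neighbours belong to category A.

# A minimal K-NN sketch for 2-D points; the sample points are invented.
import math
from collections import Counter

def knn_classify(train_points, new_point, k=5):
    # Step 2-3: compute Euclidean distances and keep the k nearest neighbours.
    nearest = sorted(train_points, key=lambda p: math.dist(p[0], new_point))[:k]
    # Step 4-5: count neighbours per category and pick the majority category.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 2), "A"), ((2, 3), "A"), ((3, 1), "A"),
         ((7, 8), "B"), ((8, 8), "B"), ((9, 7), "B")]
print(knn_classify(train, (2, 2), k=5))   # expected: 'A' (3 A-neighbours vs 2 B-neighbours)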
How to select the value of K in the K-NN Algorithm?
Below are some points to remember while selecting the value of K in the K-NN algorithm:
There is no particular way to determine the best value for "K", so we need to try some values to find the best of them; the most commonly preferred starting value for K is 5 (see the sketch below).
o A very low value for K, such as K=1 or K=2, can be noisy and make the model sensitive to outliers.
o Larger values for K are more robust to noise, but the model may then have difficulty capturing finer class boundaries, and the computation increases.
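A minimal sketch of trying several K values with cross-validation, assuming scikit-learn and the Iris dataset; the candidate K list is arbitrary.

# A minimal sketch for choosing K by cross-validation; the candidate values are arbitrary.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, scores.mean())   # pick the K with the best average accuracy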
• Advantages of KNN Algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
• Disadvantages of KNN Algorithm:
• It always needs a value of K to be determined, which may be complex at times.
• The computation cost is high because the distance between the new data point and all the training samples must be calculated.
Naive Bayes
• Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of
the probability of an object.
• Some popular applications of the Naïve Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional
probability.
o The formula for Bayes' theorem is given as (a numeric sketch follows below):
P(A|B) = P(B|A) × P(A) / P(B)
Where,
o P(A|B) is the posterior probability: the probability of hypothesis A given the observed evidence B.
o P(B|A) is the likelihood probability: the probability of the evidence B given that hypothesis A is true.
o P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
o P(B) is the marginal probability: the probability of the evidence.
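A minimal numeric sketch of applying the formula; the probabilities below are invented for a spam-filtering style example.

# A minimal Bayes' theorem sketch; the probabilities are invented for illustration.
p_spam = 0.2             # P(A): prior probability that any email is spam
p_word_given_spam = 0.6  # P(B|A): probability the word "free" appears in a spam email
p_word = 0.25            # P(B): probability the word "free" appears in any email

# P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # 0.48: posterior probability the email is spam given the word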
Decision Tree
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?
• There are various algorithms in Machine learning, so choosing the
best algorithm for the given dataset and problem is the main point to
remember while creating a machine learning model. Below are the
two reasons for using the Decision tree:
• Decision Trees usually mimic human thinking ability while making a
decision, so it is easy to understand.
• The logic behind the decision tree can be easily understood because it
shows a tree-like structure.
Decision Tree Terminologies
• Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more
homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into
sub-nodes according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from
the tree.
• Parent/Child node: The root node of the tree is called the parent node, and
other nodes are called the child nodes.
How does the Decision Tree
algorithm Work?
• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values of the best attribute.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; the final nodes are then called leaf nodes. A minimal sketch follows below.
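A minimal sketch using scikit-learn's decision tree on the Iris dataset; scikit-learn's default Gini impurity criterion stands in for the attribute selection measure mentioned above, and the depth limit is an arbitrary choice.

# A minimal decision-tree sketch on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion='gini' is scikit-learn's default attribute selection measure.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X, y)

# Print the learned tree: root node, splits, and leaf nodes.
print(export_text(tree, feature_names=load_iris().feature_names))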
Advantages of the Decision Tree
• It is simple to understand as it follows the same process which a human follows while making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• There is less requirement of data cleaning compared to other
algorithms.
Disadvantages of the Decision Tree
• The decision tree contains lots of layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using
the Random Forest algorithm.
• For more class labels, the computational complexity of the decision
tree may increase.
Support Vector Machine Algorithm
• Support Vector Machine or SVM is one of the most popular
Supervised Learning algorithms, which is used for Classification as
well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.
• The idea behind SVM is to find the optimal boundary (called a
hyperplane) that best separates data points of different classes in the
feature space.
Components of the SVM
• Hyperplane: In SVM, a hyperplane is a decision boundary that separates
different classes in the feature space. The hyperplane is a line in 2D
space, a plane in 3D space, and a hyperplane in higher-dimensional
spaces.
• Support Vectors: Support vectors are the data points that are closest to
the hyperplane. These points are critical in defining the position and
orientation of the hyperplane. The SVM algorithm seeks to maximize
the margin (distance) between the hyperplane and the support vectors.
• Margin: The margin is the distance between the hyperplane and the
nearest data points from either class. SVM aims to find the hyperplane
that maximizes this margin, ensuring that the model is as general as
possible.
Types of SVM
• Linear SVM: Used when the data can be separated by a straight line
(or a hyperplane in higher dimensions). The algorithm finds the
optimal hyperplane that separates the classes.
• Non-Linear SVM: Used when the data cannot be separated by a
straight line. In this case, SVM uses a technique called the kernel trick
to map the data into a higher-dimensional space where a hyperplane
can separate the classes.
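A minimal sketch of both variants using scikit-learn's SVC; the make_moons dataset is chosen only because it is not linearly separable, and the noise level is arbitrary.

# A minimal sketch of linear vs non-linear SVM; make_moons gives non-linearly-separable data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)   # straight-line boundary
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)         # kernel trick: non-linear boundary

print("linear:", linear_svm.score(X_test, y_test))
print("rbf:   ", rbf_svm.score(X_test, y_test))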
Advantages of SVM:
• Effective in High Dimensions: SVM works well when the number of
dimensions (features) is greater than the number of data points.
• Memory Efficient: SVM uses a subset of training points (support
vectors) to make decisions, which is memory efficient.
• Robust to Overfitting: Especially in high-dimensional space, if
appropriately regularized.
Disadvantages of SVM:
• Not Suitable for Large Datasets: SVM is computationally intensive and
may not perform well with very large datasets.
• Difficult to Choose the Right Kernel: The performance of SVM is highly
dependent on the choice of kernel and parameters.
• Interpretability: SVM models are not as easily interpretable as some
other models like decision trees.
Unsupervised Machine learning
• In this type of machine learning, models are trained on an unlabelled dataset and are allowed to act on that data without any supervision.
• Unsupervised learning cannot be directly applied to a regression or
classification problem because unlike supervised learning, we have
the input data but no corresponding output data. The goal of
unsupervised learning is to find the underlying structure of dataset,
group that data according to similarities, and represent that dataset
in a compressed format.
• Here, we take unlabeled input data, which means it is not categorized and no corresponding outputs are given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find hidden patterns in the data, and then a suitable algorithm such as k-means clustering, hierarchical clustering, etc. is applied.
• Once the suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects (see the sketch below).
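A minimal unsupervised sketch, assuming scikit-learn; make_blobs generates unlabeled synthetic points, and k-means groups them purely by similarity.

# A minimal k-means clustering sketch on unlabeled synthetic data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate points and discard the returned labels: the model only ever sees X.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
groups = kmeans.fit_predict(X)      # each point is assigned to one of 3 groups

print(groups[:10])                  # cluster index per point
print(kmeans.cluster_centers_)      # the centre of each discovered group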