Classification Algorithm

Uploaded by

r9492046

DATA MINING

Data Mining

Data mining is most commonly defined as
the process of using computers and
automation to search large sets of data for
patterns and trends, turning those findings
into business insights and predictions.
Data Mining

Data mining goes beyond the search
process, as it uses data to evaluate
future probabilities and develop
actionable analyses.
What Are the Benefits
of Data Mining?

Since we live and work in a data-centric world, it’s
essential to get as many advantages as possible.
Data mining provides us with the means of
resolving problems and issues in this challenging
information age.
Data mining benefits include:
• It helps companies gather reliable
information.
• It helps businesses make profitable
production and operational adjustments.
• It helps businesses make informed
decisions.
• It helps data scientists quickly set up automated
predictions of behaviors and trends and discover
hidden patterns.
• It helps data scientists analyze enormous
amounts of data quickly.
• It helps detect fraud and credit risks, build
risk models, and improve product safety.
Questions that can be answered
through Data Mining
• What kind of customers should
a business target in its next ad
campaign?
• What patterns in behavior are
connected to financial fraud?
• What are the buying patterns of
customers based on their demographics?
• What are the factors influencing the
success of marketing campaigns?
DATA ANALYST
Data Analyst
• A data analyst collects, cleans, and
interprets data sets in order to answer a
question or solve a problem. They work
in many industries, including business,
finance, criminal justice, science,
medicine, and government.
Data analysis can take different forms,
depending on the question you’re trying to
answer.
TYPES OF DATA ANALYSIS
-Descriptive analysis tells us what happened.
-Diagnostic analysis tells us why it happened.
-Predictive analysis forms projections about
the future.
-Prescriptive analysis creates actionable
advice on what to do next.
Phases / Steps in
Analyzing Data
• Identify the data you want to analyze
• Collect the data
• Clean the data in preparation for analysis
• Analyze the data
• Interpret the results of the analysis
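The five phases above can be sketched as a small pipeline. This is only an illustration using the Python standard library: the monthly sales figures and the function names (collect, clean, analyze, interpret) are hypothetical and are not taken from the slides.

```python
import csv
import io
import statistics

# Step 1: identify the data to analyze (hypothetical monthly sales figures).
RAW_CSV = """month,sales
Jan,120
Feb,
Mar,150
Apr,abc
May,180
"""

def collect(text):
    """Step 2: collect the data (here, parse CSV text into row dicts)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Step 3: clean the data by dropping rows with missing or non-numeric sales."""
    cleaned = []
    for row in rows:
        value = (row["sales"] or "").strip()
        if value.isdigit():
            cleaned.append({"month": row["month"], "sales": int(value)})
    return cleaned

def analyze(rows):
    """Step 4: analyze the data with simple descriptive statistics."""
    sales = [row["sales"] for row in rows]
    return {"count": len(sales), "mean": statistics.mean(sales), "max": max(sales)}

def interpret(summary):
    """Step 5: interpret the results as a plain-language statement."""
    return (f"{summary['count']} usable months, "
            f"average sales {summary['mean']}, best month {summary['max']}")

rows = clean(collect(RAW_CSV))
summary = analyze(rows)
print(interpret(summary))
```

Keeping each phase in its own function makes it easy to swap in a real data source or a different analysis later.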
Classification Algorithm
Data Mining Algorithms
-Classification algorithms:

• Naïve Bayes
• Support Vector Machine
• K-Nearest Neighbours
• Decision Tree
DATASET
• A dataset is a collection of related sets of
information that is composed of separate elements
but can be manipulated as a unit by a computer.

• Datasets are mostly used in fields like machine
learning, business, and government to gain
insights, make informed decisions, or train
algorithms.
• Datasets play a role in nearly every facet of our lives.
Today, many devices collect data that advertisers and
businesses turn into datasets to personalize their
advertisements to consumers. One consequence of this
over-reliance on datasets is that data mining techniques
have become ethically questionable: many social media
applications and websites have been criticized for data
privacy issues, data leaks, and so on. Data has
effectively become a currency, and many companies mine
user information without the user’s knowledge to create
datasets.
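To make the definition of a dataset concrete, here is a minimal sketch of separate elements (records) that a program handles as a unit. The Customer fields, the sample values, and the helper names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    """One element (record) of the dataset; the fields are illustrative."""
    age: int
    city: str
    purchased: bool  # the label a classifier could later learn to predict

# The dataset: separate records that can be processed as a single unit.
dataset = [
    Customer(25, "Manila", True),
    Customer(41, "Cebu", False),
    Customer(33, "Manila", True),
    Customer(58, "Davao", False),
]

def purchase_rate(records):
    """Share of records whose 'purchased' label is True."""
    return sum(r.purchased for r in records) / len(records)

def filter_by_city(records, city):
    """Sub-dataset containing only the records from one city."""
    return [r for r in records if r.city == city]

print(len(dataset))
print(purchase_rate(dataset))
print(purchase_rate(filter_by_city(dataset, "Manila")))
```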
Steps to Build a
Classification Model
Continuation in Building
a Classification Model
Classification Algorithm
• The Classification algorithm is a Supervised
Learning technique that is used to identify the
category of new observations on the basis of
training data. In Classification, a program learns
from the given dataset or observations and then
classifies each new observation into one of a
number of classes or groups, such as Yes or No,
0 or 1, Spam or Not Spam, cat or dog, etc. Classes
can also be called targets, labels, or categories.
• It is an important task in data mining
because it enables organizations to make
data-driven decisions. For example,
businesses can assign or classify
sentiments of customer feedback, reviews,
or social media posts to understand how
well their products or services are doing.
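The idea of learning from labelled training data and then classifying a new observation can be sketched with a tiny K-Nearest Neighbours classifier, one of the algorithms listed earlier. This is a minimal sketch, not a production implementation: the two features (number of links, number of exclamation marks), the training points, and the function name knn_predict are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(training_data, new_point, k=3):
    """Classify new_point by majority label among its k nearest training points.

    training_data is a list of (features, label) pairs; features are numeric tuples.
    """
    by_distance = sorted(training_data,
                         key=lambda pair: math.dist(pair[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: (links, exclamation marks) -> label.
training = [
    ((8, 5), "Spam"), ((7, 6), "Spam"), ((9, 4), "Spam"),
    ((1, 0), "Not Spam"), ((0, 1), "Not Spam"), ((2, 1), "Not Spam"),
]

print(knn_predict(training, (8, 4)))
print(knn_predict(training, (1, 1)))
```

The training pairs play the role of the "given dataset", and each call to knn_predict assigns one new observation to a class.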
Classification Technique
Categories


• Classification techniques can be divided
into two categories: binary classification and
multi-class classification. Binary
classification assigns each instance to one of
two classes, such as fraudulent or
non-fraudulent. Multi-class classification
assigns each instance to one of more than two
classes, such as happy, neutral, or sad.
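The distinction comes down to how many distinct classes appear in the labels. The short sketch below checks this for the two examples above; the function name classification_type is hypothetical.

```python
def classification_type(labels):
    """Return 'binary' for exactly two distinct classes, 'multi-class' for more."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("classification needs at least two classes")
    return "binary" if n_classes == 2 else "multi-class"

fraud_labels = ["fraudulent", "non-fraudulent", "non-fraudulent", "fraudulent"]
mood_labels = ["happy", "neutral", "sad", "happy"]

print(classification_type(fraud_labels))
print(classification_type(mood_labels))
```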
Some Types of
Classification Algorithms

• Random Forest
• Naïve Bayes
Random Forest Algorithm
• Random Forest is a classifier that builds a number of
decision trees on various subsets of the given dataset and
combines their predictions (a majority vote for
classification, an average for regression) to improve
predictive accuracy.

• A greater number of trees generally makes predictions
more stable and reduces the risk of overfitting compared
with a single decision tree, although the gains level off
as more trees are added.
RANDOM FOREST DIAGRAM
Assumptions for Random Forest

• Since the random forest combines
multiple trees to predict the class of the
dataset, it is possible that some decision
trees may predict the correct output,
while others may not. Together, however,
the trees are expected to predict the
correct output by majority vote.
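The idea that trees trained on different subsets vote on the final class can be sketched as follows. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is a one-split stump on a single numeric feature, random feature selection is left out, and the data and function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-split decision tree (a stump) on (x, label) pairs."""
    labels = [label for _, label in sample]
    majority = Counter(labels).most_common(1)[0][0]
    xs = sorted({x for x, _ in sample})
    best = None  # (errors, threshold, left_label, right_label)
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [label for x, label in sample if x < threshold]
        right = [label for x, label in sample if x >= threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # only one distinct x value: always predict the majority
        return lambda x: majority
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x < threshold else right_label

def train_forest(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]
        forest.append(train_stump(bootstrap))
    return forest

def predict(forest, x):
    """Final class is the majority vote over all trees."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

data = [(1, "No"), (2, "No"), (3, "No"), (7, "Yes"), (8, "Yes"), (9, "Yes")]
forest = train_forest(data)
print(predict(forest, 0), predict(forest, 10))
```

An individual stump trained on an unlucky sample may see only one class and always predict it, but the vote across the forest is dominated by the stumps that saw both classes.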
Why Use Random Forest
Random Forest Applications
Advantages and
Disadvantages
WEKA
Weka is a collection of machine learning
algorithms for solving real-world data mining
problems. It is written in Java and runs on
almost any platform. The algorithms can either
be applied directly to a dataset or called from
your own Java code; calling them from Python
requires a third-party wrapper.
• Naïve Bayes
• Support Vector Machine
• K-Nearest Neighbours
• Decision Tree
Naïve Bayes
Classification
Naïve Bayes and Data Mining /
Machine Learning
• Applying Bayes' theorem:

• P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
• P(Sunny|Yes) = 3/10 = 0.3
• P(Sunny) = 0.35
• P(Yes) = 0.71
• So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 ≈ 0.60

• P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
• P(Sunny|No) = 2/4 = 0.5
• P(No) = 0.29
• P(Sunny) = 0.35
• So P(No|Sunny) = 0.5 * 0.29 / 0.35 ≈ 0.41
• As the calculation above shows, P(Yes|Sunny) > P(No|Sunny).

• Hence, on a sunny day, the player can play the game.
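The same calculation can be reproduced in a few lines of Python. The counts below are an assumption inferred from the fractions on the slide (3/10 and 2/4, with 14 days in total so that P(Yes) ≈ 0.71); the underlying weather table is not shown in the text, and the function name posterior is hypothetical.

```python
from fractions import Fraction

# Assumed counts, inferred from the fractions 3/10 and 2/4 on the slide.
yes_days, no_days = 10, 4
sunny_and_yes, sunny_and_no = 3, 2
total_days = yes_days + no_days

def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(class | feature) = P(feature | class) * P(class) / P(feature)."""
    return likelihood * prior / evidence

p_sunny = Fraction(sunny_and_yes + sunny_and_no, total_days)

p_yes_given_sunny = posterior(Fraction(sunny_and_yes, yes_days),
                              Fraction(yes_days, total_days), p_sunny)
p_no_given_sunny = posterior(Fraction(sunny_and_no, no_days),
                             Fraction(no_days, total_days), p_sunny)

# Pick the class with the larger posterior probability.
decision = "Yes" if p_yes_given_sunny > p_no_given_sunny else "No"
print(float(p_yes_given_sunny), float(p_no_given_sunny), decision)
```

With exact fractions the posteriors come out as 3/5 and 2/5, which sum to 1; the slide's 0.41 reflects the two-decimal rounding of the inputs.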


Advantages and
Disadvantages
Where is Naïve Bayes used
Thank You
• Definition
• Why Use That Algorithm
• Advantages and Disadvantages
• Real World Case Example
• Algorithm Execution
