
BigData&Analytics Module5

This document is a module outline for a course on Big Data and Business Analytics. Module 5 covers decision trees, clustering/segmentation, supervised versus unsupervised learning. It defines key concepts like supervised learning using labeled training data for classification and regression. Unsupervised learning uses unlabeled data to find patterns and clusters. Examples of supervised learning applications include email spam filtering and loan approval classification using decision trees. Clustering can group customers into segments with similar interests.

Uploaded by

Mohamed Ehab


Big Data & Business Analytics


Module (05) – Decision Tree & Clustering/Segmentation
Big Data & Business Analytics
Course Name Module
Module 5: Name
Decision Tree & Clustering
Module 02
Module 5 Learning Objectives
Module Objectives:
Understand the concepts of decision trees with examples
Understand the concepts of clustering with examples
Understand the difference between supervised and unsupervised learning
Understand the supervised learning process; apply the concepts to classification machine
learning
Understand the unsupervised learning process; apply the concepts to clustering machine
learning

What to Study for Exam:


Module 5 Lecture Notes (with emphasis on above topics)

© 2020 Eslsca. All Rights Reserved 2


Module 5 1st Supervised versus Un-supervised Learning
Training data refers to the initial data that is used to develop a machine learning model,
from which the model creates and refines its rules. The quality of this data has profound
implications for the model’s prediction.

• Supervised learning is where the user and modeler are involved in choosing the data
features to be used for the model. Training data must be labeled to teach the
machine how to recognize the outcomes your model is designed to detect, e.g.
classifying text sentiment as positive, negative or neutral, or recognizing a human in an
image as male or female.
• Unsupervised learning uses unlabeled data to find patterns in the data, such as
inferences or clusters of data points. Examples include frequent topics within Facebook
posts and products that are likely to be purchased together.

https://www.cloudfactory.com/training-data-guide
https://www.datarevenue.com/en-blog/what-is-machine-learning-a-visual-explanation

Module 5 2nd Supervised Learning Process

https://steemit.com/steemstem/@noble-noah/data-mining-and-application-big-data-rules-the-world
Module 5 3rd Supervised Learning: Classification Example
Example: Electronic mail spam filtering
• One can train a model using a supervised machine learning algorithm on a
group of labeled e-mails
• i.e. e-mails that are marked either as spam or not-spam
• in order to predict whether a new e-mail belongs to one of the two
categories (spam or not-spam)

➢ This is an example of classification
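
The spam example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not name an algorithm, so a simple naive Bayes word-count classifier is used here as a stand-in, and the six training e-mails are invented.

```python
import math
from collections import Counter

# Hypothetical labeled training e-mails (invented for illustration).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("please review the project report", "not-spam"),
    ("lunch on monday with the team", "not-spam"),
]

def train(examples):
    """Count words per label and e-mails per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_emails = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        # Start from the prior: the share of training e-mails with this label.
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing avoids log(0) for words unseen in this label.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = train(TRAINING)
print(predict("free prize inside", word_counts, label_counts))
print(predict("project meeting on monday", word_counts, label_counts))
```

The labels in TRAINING play the role of the "marked as spam or not-spam" e-mails; the model only ever predicts one of the labels it was trained on.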

https://steemit.com/steemstem/@noble-noah/data-mining-and-application-big-data-rules-the-world

Module 5 3rd Supervised Learning: Classification Examples
Other Classification Examples:

o Identify risk cases, e.g. in weather, financial sector, fraud, etc.

o Word classification, e.g. sentiment analysis, optical character recognition (OCR)

o Identify a disease, e.g. diabetes vs no diabetes

o Image recognition, e.g. cat or dog or human

o Emotion detection, e.g. happy, sad, angry, disgusted, surprised

IBM Watson-Supervised Learning-Classification example, watch this video:


https://www.youtube.com/watch?v=U6rvaWaiZNg

Module 5 3rd Supervised Learning: Decision Trees
• A decision tree helps visualize decision points and the alternatives to consider.

• It helps to think through a decision and weigh the pros and cons of various options.

• It helps you see the implications of each choice and to give the rationale for your proposed
action.

A decision tree has two basic parts:

• Nodes contain ideas, assumptions, or facts
• Branches connect nodes with each other

Illustrative Video: https://www.youtube.com/watch?v=ydvnVw80I_8


Module 5 3rd Supervised Learning: Decision Trees- Loan Example
The goal is to use a decision tree to classify future loan applications into one of the
following classes:
• Yes (approved)
• No (not approved)

Applicant attributes are:
• Age: Young, Middle, Old
• Job_Status: True, False
• Owns_House: True, False
• Credit_Rating: Excellent, Good, Fair
Details in the link: https://iq.opengenus.org/cart-algorithm/
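
To make the idea of nodes and branches concrete, here is a minimal sketch of how a loan applicant and a small decision tree could be represented in Python. The tree below is hypothetical and hand-written for illustration; it is not the tree from the slide or from the linked article.

```python
# A node is either a class label (a leaf) or a dict holding the attribute
# tested at that node and one branch per attribute value.
# HYPOTHETICAL tree: invented for illustration only.
TREE = {
    "attribute": "Owns_House",
    "branches": {
        True: "Yes",
        False: {
            "attribute": "Job_Status",
            "branches": {True: "Yes", False: "No"},
        },
    },
}

def classify(node, applicant):
    """Follow branches from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):
        value = applicant[node["attribute"]]
        node = node["branches"][value]
    return node

applicant = {
    "Age": "Young",
    "Job_Status": True,
    "Owns_House": False,
    "Credit_Rating": "Fair",
}
print(classify(TREE, applicant))
```

Each dict is a node, each key under "branches" is a branch, and the strings "Yes" and "No" are the two classes.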
Module 5 3rd Supervised Learning: Decision Trees - Loan Example (For exam)
What would a decision tree look like for the given problem?

[Figure: one example of a decision tree for the loan problem]

The tree is constructed top-down as follows:
Step 1: Start at the root node with all training instances.
Step 2: Select an attribute on the basis of a splitting criterion (for example, the Gini Index).
Step 3: Choose as the root node the attribute whose split gives the lowest possible Gini Index.
https://iq.opengenus.org/cart-algorithm/
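
The root-selection steps can be sketched as follows. This assumes the usual definition of Gini impurity (1 minus the sum of squared class proportions) and a weighted average over the groups produced by a split. The eight training rows are invented for illustration and are not the data set from the linked article.

```python
from collections import Counter

COLUMNS = ("Age", "Job_Status", "Owns_House", "Credit_Rating", "Approved")
# Hypothetical training instances (invented for illustration).
DATA = [
    ("Young", False, False, "Fair", "No"),
    ("Young", True, False, "Good", "Yes"),
    ("Young", False, False, "Good", "No"),
    ("Middle", False, False, "Fair", "No"),
    ("Middle", True, True, "Good", "Yes"),
    ("Middle", False, True, "Excellent", "Yes"),
    ("Old", False, True, "Excellent", "Yes"),
    ("Old", False, False, "Fair", "No"),
]
ROWS = [dict(zip(COLUMNS, row)) for row in DATA]

def gini(rows, target="Approved"):
    """Gini impurity of a group: 1 minus the sum of squared class proportions."""
    counts = Counter(row[target] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())

def weighted_gini(rows, attribute):
    """Gini of a split: impurity of each group, weighted by group size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

# Step 1: start with all training instances.
# Step 2: score every candidate attribute with the splitting criterion.
attributes = ["Age", "Job_Status", "Owns_House", "Credit_Rating"]
scores = {a: weighted_gini(ROWS, a) for a in attributes}
# Step 3: the root is the attribute with the lowest weighted Gini Index.
root = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("root:", root)
```

With these invented rows, Credit_Rating gives the purest groups and would be chosen as the root; on real data the winner depends entirely on the training set. The same scoring is then repeated inside each branch.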
Module 5 3rd Supervised Learning: Regression Examples
Cases in which a numeric outcome can be modelled as a mathematical equation are referred to as regression
Examples:

o Predicting the failure of mechanical parts in automobile engines

o Predicting social media share scores

o Predicting performance scores, e.g. restaurant rating, revenues

o Estimating life expectancy

o Estimating population growth

o Temperature forecast
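
As a minimal sketch of "modelling a case as a mathematical equation", the code below fits a straight line y = slope * x + intercept by ordinary least squares and uses it to forecast the next value. Simple linear regression is only one possible regression model, and the population figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years since the first measurement, population in millions.
years = [0, 1, 2, 3, 4]
population = [100.0, 102.0, 104.0, 106.0, 108.0]

slope, intercept = fit_line(years, population)
forecast = slope * 5 + intercept  # estimate for year 5
print(slope, intercept, forecast)
```

Unlike classification, the output here is a number on a continuous scale rather than one of a fixed set of classes.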

Module 5 4th Un-Supervised Learning: Clustering

Break down the customers into groups that share similar interests and buying habits

What is a cluster?

o A cluster is a subset of similar objects

o A subset of objects such that the distance between any two objects in the cluster is less
than the distance between any object in the cluster and any object that is not located
inside it

o The ‘centroid’ of the cluster represents the average values the members of the cluster
share

o The simplest distance measurement for determining the clusters is the Euclidean distance,
d(x, y) = sqrt((x1 - y1)^2 + ... + (xn - yn)^2), where x is the reading and y is the centroid
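
A short sketch of the two quantities defined above, the centroid and the Euclidean distance between a reading and a centroid. The three points are invented for illustration.

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared differences, coordinate by coordinate."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    """Average value of each coordinate over all members of the cluster."""
    n = len(points)
    return tuple(sum(coordinate) / n for coordinate in zip(*points))

# Hypothetical cluster of three two-dimensional readings.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
center = centroid(cluster)
reading = (0.0, 0.0)
print(center)
print(euclidean_distance(reading, center))
```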

Examples:

o Target marketing

o Topic modelling
(For example, in the picture on the slide, an analysis of documents related to climate
change identified the following words as clusters: global, climate,
warming, change, temperature, carbon)

A clustering Example: Mall Customers Segmentation

The Mall Customers data set puts you in the shoes of the owner of a business in a
shopping mall. You have customer data, and on the basis of this data you have
to divide the customers into groups that share a similar:
o Annual income level
o Spending score

https://www.kaggle.com/soham11/customer-segmentation-kmeans


The clustering algorithm identifies 5 clusters.


The purple cluster needs further investigation, as the segment's annual income is high
but its spending score is low.
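
The segmentation can be sketched with a small k-means implementation, the algorithm named in the linked Kaggle notebook. Everything else here is an assumption made for illustration: the fifteen (annual income, spending score) pairs are invented, the starting centroids are picked by hand so the run is repeatable, and the thresholds used to flag the "high income, low spending" segment are arbitrary.

```python
import math

def kmeans(points, initial_centroids, max_iterations=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members
            else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (annual income in thousands, spending score 1-100).
customers = [
    (15, 10), (18, 14), (20, 8),    # low income, low spending
    (16, 80), (19, 85), (21, 90),   # low income, high spending
    (50, 50), (55, 48), (58, 52),   # medium income, medium spending
    (85, 88), (90, 92), (95, 85),   # high income, high spending
    (88, 12), (92, 15), (97, 10),   # high income, low spending
]
# Hand-picked starting centroids (every third customer), assumed for repeatability.
centroids, clusters = kmeans(customers, customers[::3])

# Arbitrary illustrative thresholds for "high income but low spending".
flagged = [c for c in centroids if c[0] > 70 and c[1] < 40]
for c, members in zip(centroids, clusters):
    print(tuple(round(v, 1) for v in c), len(members))
print("needs investigation:", flagged)
```

In practice the starting centroids are usually chosen at random and the number of clusters is itself a modelling decision; both are fixed here only to keep the sketch deterministic.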
Module 5 4th Un-Supervised Learning: Clustering (For exam)
A clustering Example: Mall Customers Segmentation

Examining the distribution of the purple segment shows that the majority are male and that
the segment's age density is concentrated between 33 and 48 (as shown in the figures).
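
Profiling a segment in this way amounts to summarizing its members. A minimal sketch, assuming each member is recorded as a (gender, age) pair; the six records are invented and are not taken from the Mall Customers data:

```python
from collections import Counter

# Hypothetical members of one segment: (gender, age).
segment = [
    ("Male", 35), ("Male", 41), ("Female", 38),
    ("Male", 47), ("Female", 33), ("Male", 44),
]

gender_counts = Counter(gender for gender, _ in segment)
majority_gender, majority_count = gender_counts.most_common(1)[0]
majority_share = majority_count / len(segment)

ages = [age for _, age in segment]
youngest, oldest = min(ages), max(ages)

print(majority_gender, round(majority_share, 2))
print(youngest, oldest)
```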
Module 5 5th Machine Learning Techniques Overview

Module 5 6th Rapid Miner Studio Videos
TurboPrep Data Cleansing
https://academy.rapidminer.com/learn/video/turbo-prep-data-cleansing

Creating a Decision Tree Model
https://academy.rapidminer.com/courses/creating-a-decision-tree-model

RapidMiner Automodel Classification
https://academy.rapidminer.com/courses/auto-model-classification

RapidMiner Automodel Clustering & Outlier
https://academy.rapidminer.com/learn/video/auto-model-clustering-outliers

Applying the Model
https://academy.rapidminer.com/learn/video/applying-the-model
Module 5 Questions

Module 05 Completed
