BigData&Analytics Module5
• Supervised learning is where the user and modeler choose the data features
to be used for the model. Training data must be labeled to teach the
machine how to recognize the outcomes the model is designed to detect, e.g.
classifying text sentiment as positive, negative, or neutral, or recognizing whether a
person in an image is male or female.
• Unsupervised learning uses unlabeled data to find patterns in the data, such as
inferences or clusters of data points. Examples include frequent topics within Facebook
posts and products that are likely to be purchased together.
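The "products purchased together" example can be sketched by counting how often each pair of products appears in the same basket. This is only a minimal illustration: the baskets and product names below are made up, and pair counting is one simple way to surface such patterns, not necessarily the method the course has in mind. No labels are involved, which is what makes it unsupervised.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data, not from the course).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Here the pair bought together most often is bread and butter, which appears in three of the four baskets.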
https://www.cloudfactory.com/training-data-guide
https://www.datarevenue.com/en-blog/what-is-machine-learning-a-visual-explanation
https://steemit.com/steemstem/@noble-noah/data-mining-and-application-big-data-rules-the-world
© 2020 Eslsca. All Rights Reserved 5
Big Data & Business Analytics
Module 5: Decision Tree & Clustering
Module 5 3rd Supervised Learning: Classification Example
Example: Electronic mail spam filtering
• One can train a model using a supervised machine learning algorithm on a
group of labeled e-mails,
• i.e. e-mails that are marked as either spam or not-spam,
• in order to predict which of the two categories (spam or not-spam) a new
e-mail belongs to.
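The spam-filtering idea above can be sketched with a very small naive Bayes classifier. This is a minimal sketch under stated assumptions: the four training e-mails are invented, and naive Bayes with add-one smoothing is one common choice for text classification, not necessarily the algorithm used in the course.

```python
import math
from collections import Counter

# Hypothetical labeled training e-mails (illustrative, not real data).
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("please review the project report", "not-spam"),
]

def train(examples):
    """Count words per label and e-mails per label."""
    word_counts = {"spam": Counter(), "not-spam": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set(word_counts["spam"]) | set(word_counts["not-spam"])
    total = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log(label_counts[label] / total)
        for word in text.split():
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

word_counts, label_counts = train(training)
print(predict("claim your free prize", word_counts, label_counts))
print(predict("project meeting on monday", word_counts, label_counts))
```

The first new e-mail shares its words with the spam examples and the second with the not-spam examples, so the model assigns them to those categories respectively.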
https://steemit.com/steemstem/@noble-noah/data-mining-and-application-big-data-rules-the-world
• A decision tree helps you think through a decision and weigh the pros and cons of the various options.
• It helps you see the implications of each choice and give the rationale for your proposed
action.
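A decision tree can be sketched as nested questions that end in a proposed action. The tree below is hand-built and entirely hypothetical (the questions, answers, and actions are made up for illustration). Walking it returns both the action and the answers that led to it, which serves as the rationale.

```python
# A hand-built decision tree (hypothetical example, not from the course).
# Internal nodes ask a yes/no question; leaves hold the proposed action.
tree = {
    "question": "is_raining",
    "yes": {"action": "stay home"},
    "no": {
        "question": "is_weekend",
        "yes": {"action": "go to the park"},
        "no": {"action": "go to work"},
    },
}

def decide(node, facts):
    """Walk the tree and return the action plus the answers that led to it."""
    path = []
    while "action" not in node:
        answer = "yes" if facts[node["question"]] else "no"
        path.append((node["question"], answer))
        node = node[answer]
    return node["action"], path

action, path = decide(tree, {"is_raining": False, "is_weekend": True})
print(action)
print(path)
```

In machine learning the questions and their order are learned from labeled data rather than written by hand, but the resulting structure is read the same way.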
o Temperature forecast
Examples:
o Target marketing
o Topic Modelling
(For example, in the picture shown, when
documents related to climate change were
analyzed, the following words were
identified as clusters: global, climate,
warming, change, temperature, carbon.)
The Mall Customer data puts you in the shoes of the owner of a business in a
shopping mall. You have customer data and, on the basis of this data, you have
to divide the customers into groups that share similar:
o Annual income level
o Shopping score
https://www.kaggle.com/soham11/customer-segmentation-kmeans
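Grouping customers by annual income and shopping score can be sketched with a plain k-means loop. This is a minimal sketch, not the linked notebook's code: the six customers, the choice of two clusters, and the starting centroids are all assumptions made for illustration.

```python
import math

# Hypothetical customers: (annual income in k$, shopping score 1-100).
customers = [
    (15, 80), (18, 85), (20, 78),   # low income, high score
    (80, 20), (85, 15), (90, 25),   # high income, low score
]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(v) / len(cluster) for v in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Assumed starting centroids: one customer taken from each apparent group.
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
print(centroids)
print(clusters)
```

Each resulting cluster is a customer segment, and its centroid summarizes the typical income and shopping score of that segment. On real data the number of clusters and the starting centroids need to be chosen with more care.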
Examining the distribution of the purple segment shows that the majority are males
and that ages are concentrated between 33 and 48 (as shown in the figures).
Module 5 5th Machine Learning Techniques Overview
Module Completed
Module 05