Machine Learning Introduction
• A machine can learn on its own from past data and automatically improve.
• For large organizations branding is important, and machine learning makes it easier to target a relatable customer base.
• It is similar to data mining because it also deals with huge amounts of data.
Difference Between Machine Learning and Artificial
Intelligence
• Artificial Intelligence (AI) and Machine Learning (ML) are two closely
related but distinct fields within the broader field of computer science.
• ML algorithms can identify patterns and trends in data and use them to
make predictions and decisions.
1956: The term “Artificial Intelligence” was originally used by John McCarthy, who also hosted the first AI conference.
1952: The term “Machine Learning” was first used by IBM computer scientist Arthur Samuel, a pioneer in artificial intelligence and computer games.
Unsupervised Learning
Unsupervised learning is used to discover undefined relationships, such as meaningful patterns in data.
It is about creating computer algorithms that can improve themselves.
It is expected that machine learning will shift toward unsupervised learning, allowing programmers to solve problems without
creating explicit models.
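One typical unsupervised technique, clustering, can be sketched as a tiny one-dimensional k-means loop. This is a minimal illustration, not a production algorithm; the data points are made up, and no human-added labels are involved:

```python
# Minimal sketch of unsupervised learning: 1-D k-means clustering (k=2).
points = [1.0, 1.5, 1.2, 8.0, 8.3, 7.9]
centers = [points[0], points[-1]]          # naive initialization

for _ in range(10):                        # a few refinement passes
    clusters = [[], []]
    for p in points:
        # assign each point to its nearest center
        nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
        clusters[nearest].append(p)
    # move each center to the mean of its cluster (both stay non-empty here)
    centers = [sum(c) / len(c) for c in clusters]

print([round(c, 2) for c in centers])      # two group centers emerge
```

The algorithm finds the two natural groups in the data purely from the numbers themselves, which is the essence of learning without labels.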
Reinforcement Learning
Reinforcement learning is related to unsupervised learning but receives feedback indicating whether each decision is
good or bad. The feedback contributes to improving the model.
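The feedback loop described above can be sketched as a tiny two-armed bandit: the agent tries actions and uses only the reward feedback to prefer the better one. The reward probabilities and the epsilon value below are invented for illustration:

```python
import random

# Sketch of reinforcement-style learning: an epsilon-greedy two-armed bandit.
random.seed(0)
rewards = {"A": 0.2, "B": 0.8}          # hidden success rates (made up)
values = {"A": 0.0, "B": 0.0}           # estimated value of each action
counts = {"A": 0, "B": 0}

for step in range(1000):
    # mostly exploit the best-known action, sometimes explore (epsilon = 0.1)
    if random.random() < 0.1:
        action = random.choice(["A", "B"])
    else:
        action = max(values, key=values.get)
    reward = 1 if random.random() < rewards[action] else 0  # feedback signal
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # running mean

print(max(values, key=values.get))      # the agent settles on the better action
```

The running-mean update is the simplest value estimate; real reinforcement learning algorithms build on the same idea of improving a policy from reward feedback.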
Self-Supervised Learning
Self-supervised learning is similar to unsupervised learning because it works with data without human-added labels.
The difference is that unsupervised learning uses clustering, grouping, and dimensionality reduction, while self-supervised learning generates its own labels from the data itself (for example, by predicting a hidden part of the input).
Examples of Machine Learning Problems
• Credit Card Fraud Detection: Given credit card transactions for a customer in a month,
identify those transactions that were made by the customer and those that were not. A
program with a model of this decision could refund those transactions that were fraudulent.
• Digit Recognition: Given zip codes handwritten on envelopes, identify the digit for each
handwritten character. A model of this problem would allow a computer program to read and
understand handwritten zip codes and sort envelopes by geographic region.
• Speech Understanding: Given an utterance from a user, identify the specific request made by
the user. A model of this problem would allow a program to understand and make an
attempt to fulfil that request. The iPhone with Siri has this capability.
• Face Detection: Given a digital photo album of many hundreds of digital photographs,
identify those photos that include a given person. A model of this decision process would
allow a program to organize photos by person. Some cameras and software such as iPhoto have
this capability.
• Product Recommendation: Given a purchase history for a customer and a large inventory of
products, identify those products in which that customer will be interested and likely to purchase. A
model of this decision process would allow a program to make recommendations to a customer and
motivate product purchases. Amazon has this capability. Also think of Facebook, GooglePlus
and LinkedIn, which recommend users to connect with after you sign up.
• Medical Diagnosis: Given the symptoms exhibited in a patient and a database of
anonymized patient records, predict whether the patient is likely to have an illness. A model of this
decision problem could be used by a program to provide decision support to medical professionals.
• Stock Trading: Given the current and past price movements for a stock, determine whether the
stock should be bought, held or sold. A model of this decision problem could provide decision
support to financial analysts.
• Customer Segmentation: Given the pattern of behaviour by a user during a trial period and the
past behaviours of all users, identify those users that will convert to the paid version of the product
and those that will not. A model of this decision problem would allow a program to trigger
customer interventions to persuade the customer to convert early or engage better in the trial.
• Shape Detection: Given a user hand drawing a shape on a touch screen and a database of known
shapes, determine which shape the user was trying to draw. A model of this decision would allow a
program to show the platonic version of that shape the user drew to make crisp diagrams. The
Instaviz iPhone app does this.
Differences between Learning and Designing:
• Learning is a process by which a system improves its performance from past experience; designing is a process to build a system based on various requirements.
• Learning does not require testing; designing requires testing.
• Learning gains experience from past data; a design gains experience only when data is fed to it.
• Learning represents the data with the help of various functions; designing requires no such representation of the data.
• Learning at times preprocesses the data and filters out noisy data; designing does not preprocess the data at all.
• Learning requires a measuring device; designing requires a problem description.
• Learning calls for learning skills; designing calls for designing skills.
• Clustering, description, and regression are used in the learning process; decision trees and tables are used in the designing process.
Training data vs Testing data
There are two key types of data used for machine learning: training data and testing
data.
They each have a specific function to perform when building and evaluating
machine learning models.
Machine learning algorithms are used to learn from data in datasets.
They discover patterns, gain knowledge, make choices, and examine those
decisions.
What is Training data?
• Training data is used to train the machine learning model, whereas testing
data is used to determine the performance of the trained model.
What is Testing Data?
• You will need unseen data to test your machine learning model after it
has been created (using your training data). This data is known as testing
data, and it may be used to assess the progress and efficiency of your
algorithm's training, as well as to modify or optimize it for better results.
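The split itself can be sketched in a few lines of plain Python. The 80/20 ratio below is a common but arbitrary choice, and the integer data stands in for real labeled examples:

```python
import random

# Sketch: splitting a dataset into training and testing portions (80/20).
# The model would learn from `train` only; `test` estimates performance
# on data the model has never seen.
random.seed(42)
data = list(range(100))            # stand-in for 100 labeled examples
random.shuffle(data)               # shuffle so the split is random

split = int(0.8 * len(data))
train, test = data[:split], data[split:]

print(len(train), len(test))       # 80 20
```

Shuffling before splitting matters: if the data were ordered (say, by class), a plain slice would give the model a biased view of the problem.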
Tree models
Rule models.
Rule models consist of a collection of implications or IF-THEN rules.
For tree-based models, the ‘if-part’ defines a segment and the ‘then-part’ defines the
behaviour of the model for this segment.
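The IF-THEN structure described above can be sketched as a hand-written rule model. The attribute names and thresholds are invented for illustration; a learned tree would pick them from data:

```python
# Sketch of a rule model: each IF-part defines a segment of the input space,
# and the THEN-part gives the model's output for that segment.
def predict_play(outlook: str, humidity: int) -> str:
    if outlook == "Overcast":                    # segment 1
        return "Yes"
    if outlook == "Sunny" and humidity > 80:     # segment 2
        return "No"
    return "Yes"                                 # default segment

print(predict_play("Sunny", 90))   # No
print(predict_play("Rainy", 50))   # Yes
```

A decision tree is exactly this structure learned automatically: each root-to-leaf path corresponds to one IF-THEN rule.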
Probabilistic Models
• Predictive probabilistic models use the idea of a conditional probability distribution P(Y | X), from
which Y can be predicted given X.
• Generative models estimate the joint distribution P(Y, X). Once we know the joint distribution,
we can derive any conditional or marginal distribution involving the same variables.
• Probabilistic models use the idea of probability to classify new entities; Naïve Bayes is an
example of a probabilistic classifier.
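The claim that a joint distribution yields any conditional can be shown concretely. The joint probabilities below are made-up numbers (summing to 1) purely for illustration:

```python
# Sketch: a generative model stores the joint distribution P(Y, X); a
# conditional such as P(Y | X = x) follows by normalizing over Y.
joint = {("Yes", "Sunny"): 0.25, ("No", "Sunny"): 0.10,
         ("Yes", "Rainy"): 0.15, ("No", "Rainy"): 0.50}

def conditional_y_given_x(x):
    # marginal P(X = x): sum the joint over all values of Y
    p_x = sum(p for (y, xv), p in joint.items() if xv == x)
    # conditional P(Y | X = x) = P(Y, X = x) / P(X = x)
    return {y: p / p_x for (y, xv), p in joint.items() if xv == x}

print(conditional_y_given_x("Sunny"))
```

The same dictionary also yields marginals directly, which is what makes generative models more flexible (though harder to estimate) than purely predictive ones.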
What is Bayes' Theorem in ML?
• Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B), where P(A|B) is the posterior
probability, P(B|A) the likelihood, P(A) the prior, and P(B) the evidence.
Working of the Naïve Bayes classifier can be understood with the help of the example
below:
• Suppose we have a dataset of weather conditions and a corresponding target variable
"Play". Using this dataset, we need to decide whether we should play on
a particular day according to the weather conditions. To solve this problem, we
need to follow the steps below:
• Problem: If the weather is sunny, should the player play or not?
Dataset of weather conditions:

    #    Outlook    Play
    0    Rainy      Yes
    1    Sunny      Yes
    2    Overcast   Yes
    3    Overcast   Yes
    4    Sunny      No
    5    Rainy      Yes
    6    Sunny      Yes
    7    Overcast   Yes
    8    Rainy      No
    9    Sunny      No
    10   Sunny      Yes
    11   Rainy      No
    12   Overcast   Yes

Frequency table for the weather conditions:

    Weather     Yes   No
    Overcast    5     0
    Rainy       2     2
    Sunny       3     2
    Total       10    4

Likelihood table for the weather conditions:

    Weather     No            Yes
    Overcast    0             5             5/14 = 0.35
    Rainy       2             2             4/14 = 0.29
    Sunny       2             3             5/14 = 0.35
    All         4/14 = 0.29   10/14 = 0.71

Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny) = (3/10) · (10/14) / (5/14) = 0.60
P(No|Sunny) = P(Sunny|No) · P(No) / P(Sunny) = (2/4) · (4/14) / (5/14) = 0.40

Since P(Yes|Sunny) > P(No|Sunny), the player should play on a sunny day.
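The calculation for the sunny-day question can be reproduced in a few lines. The counts come from the likelihood table above (14 records in total):

```python
# Sketch: the Naive Bayes calculation for the weather example, by hand.
p_sunny_given_yes = 3 / 10      # Sunny days among the 10 "Yes" days
p_sunny_given_no = 2 / 4        # Sunny days among the 4 "No" days
p_yes, p_no = 10 / 14, 4 / 14   # class priors
p_sunny = 5 / 14                # evidence P(Sunny)

# Bayes' theorem: posterior = likelihood * prior / evidence
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))  # 0.6 0.4
```

Since P(Yes|Sunny) exceeds P(No|Sunny), the classifier predicts "Yes" for a sunny day.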
• Naïve Bayes is one of the fastest and simplest ML algorithms for predicting the class of a dataset.
• It can be used for binary as well as multi-class classification.
• It performs well in multi-class prediction compared to many other algorithms.
• It is one of the most popular choices for text classification problems.
• Naïve Bayes assumes that all features are independent and unrelated, so it cannot learn relationships
between features.