Unit No. 1
SOPHIA ROBOT
FACEBOOK
FRAUD WITHIN BANK
GOAL OF MACHINE LEARNING
Input → ML → Output
• Image: identification, image search, …
• Audio: speech recognition, music classification, speaker identification, …
• Text: anti-spam, machine translation, sentiment analysis, summarization
WHY MACHINE LEARNING?
• A computer should also be able to take decisions autonomously and intelligently, based on the information given (just like humans).
• Humans learn from experience, whereas a computer only follows instructions.
• Machine learning takes a different approach from typical programming.
• In regular programming, the logic that needs intelligence is written by the developer and the computer merely follows that logic.
• Machine learning exploits the computer’s ability to perform operations much faster than humans.
• Give experience directly to the computer so that it can learn and prepare itself for action.
• Define the experience in a structured format.
• So, the computer learns from data (experience), and this process is called “Machine Learning”.
PROBLEMS THAT HUMANS CANNOT ANSWER
• Ability to Predict
• Ability to Classify
• Ability of Grouping
• Ability of Vision
• Language Understanding
MACHINE LEARNING APPLICATIONS
DEFINITIONS
• Machine learning is the training of a model from data that generalizes a decision against
a performance measure.
• “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell)
• For example, machine learning behaves similarly to the growth of a child: as a child grows, her experience E in performing task T increases, which results in a higher performance measure P.
LEARNING PROBLEMS
Data → Model (algorithms, parameters) → Output
MACHINE LEARNING -ARCHITECTURE
WHAT IS LEARNING
• Machine learning primarily requires two types of datasets: a training dataset and a testing dataset.
• Training dataset: It is usually prepared manually, with both the input data and the expected output data available. It is important that every piece of input data has an expected output data point.
• Testing dataset: Here we have the input data and are interested in predicting the expected output.
• In practice, three datasets are often used: the training dataset, the validation dataset and the testing dataset.
• Validation dataset: The data examples that are checked against the built classifier and that help tune the accuracy of the output.
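The split into training, validation and testing datasets can be sketched roughly as follows. This is only an illustration: the function name `split_dataset`, the 60/20/20 proportions and the toy data are assumptions, not something prescribed by the slides.

```python
import random

def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the examples and cut them into training, validation and testing parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is used for testing
    return train, validation, test

# Hypothetical labelled examples: (input, expected output) pairs.
data = [(x, 2 * x) for x in range(10)]
train, validation, test = split_dataset(data)
print(len(train), len(validation), len(test))  # 6 2 2
```

Shuffling before splitting is one common choice; it avoids all examples of one kind ending up in a single part when the data is stored in sorted order.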
TERMINOLOGY
• Phase 2—Validation and Test Phase: This phase is to measure how good the learning
model that has been trained is and estimate the model properties, such as error
measures, recall, precision, and others. This phase uses a validation dataset, and the
output is a sophisticated learning model.
• Phase 3—Application Phase: In this phase, the model is subject to the real-world data
for which the results need to be derived.
COMPONENTS OF MACHINE LEARNING SYSTEM - DATA
a. Training Dataset: The base dataset against which the model is built or trained.
b. Testing Dataset: The dataset that is used to validate the model built. This dataset is also referred to as a validating dataset.
c. Validation Dataset: The dataset that is used for final verification of the model (it can be treated more as user acceptance testing).
TERMS RELATED TO DATA
• A model is the representation of a real-life object. It mimics the behaviour of the object it represents.
• Model: a simplified description, especially a mathematical one, of a system or process, to assist calculations and predictions (Oxford Dictionary).
• Mathematical model: a representation in mathematical terms of the behaviour of real devices and objects.
• Models are the output of algorithms applied to a dataset.
CATEGORIES OF MODELS
• Logical Models
• Logical models are more algorithmic in nature and help us derive a set of rules by running the algorithms iteratively. A decision tree is one such example.
• Geometric Models
• Geometric models use geometric concepts such as lines, planes, and distances. These
models usually operate, or can operate, on high volumes of data
• Probabilistic Models
• Probabilistic models are statistical models that employ statistical techniques. These
models are based on a strategy that defines the relationship between two variables.
TYPES OF LEARNING PROBLEMS
• Classification
• Regression
• Clustering
• Optimization
• Simulation
CLASSIFICATION
• A classification problem is one where the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
• A classification model attempts to draw some conclusion from observed values.
• For example, emails are filtered as “spam” or “not spam”, and transactions are labelled “fraudulent” or “authorized”.
• In short, classification predicts categorical class labels: it constructs a model from the training set and the values (class labels) of the classifying attribute, and uses that model to classify new data.
• There are a number of classification models, including logistic regression, decision tree, random forest, gradient-boosted tree, multilayer perceptron, one-vs-rest, and Naive Bayes.
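As a rough illustration of one of the listed models, the sketch below trains a logistic regression classifier with plain gradient descent. The single feature (a count of suspicious words per email), the toy data, the learning rate and the function names are all assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Fit w and b so that sigmoid(w * x + b) approximates P(label = 1 | x)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n                # gradient descent step on the log loss
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return "spam" if sigmoid(w * x + b) >= 0.5 else "not spam"

# Hypothetical training set: suspicious-word counts and labels (1 = spam, 0 = not spam).
counts = [0, 1, 2, 3, 7, 8, 9, 10]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(counts, labels)
print(predict(w, b, 1), predict(w, b, 9))
```

The model outputs a probability, and the 0.5 threshold turns it into a categorical class label.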
CLASSIFICATION
REGRESSION
• A regression problem is one where the output variable is a real or continuous value, such as “salary” or “weight”.
• Many different models can be used; the simplest is linear regression, which tries to fit the data with the hyperplane that best approximates the points.
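With a single input variable the hyperplane is just a straight line, and the least-squares fit has a closed form. The sketch below shows this for a small made-up dataset; the function name `fit_line` and the numbers are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of experience vs. salary (arbitrary units).
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]
slope, intercept = fit_line(years, salary)
print(slope, intercept)        # 5.0 25.0
print(slope * 6 + intercept)   # predicted salary for 6 years: 55.0
```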
CLUSTERING
• Clustering is the method of identifying groups of similar data in a dataset.
• Entities in each group are more similar to the other entities of that group than to those of other groups.
• It is the process of finding meaningful structure and grouping similar things, i.e. data in the same group is similar to other data in that group and dissimilar to data in different groups.
• Clustering is applied to unlabelled datasets.
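One common way to form such groups is the k-means procedure, sketched below in a minimal form. The slides do not name a specific algorithm, so k-means is a choice made for this example; the points and the initial centroid guesses are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Group 2-D points around the given centroids by repeated assign/update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical unlabelled points and initial centroid guesses.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 9), (2, 9), (1, 8)]
clusters, centroids = kmeans(points, centroids=[(0, 0), (10, 10), (0, 10)])
print([len(c) for c in clusters])  # [3, 3, 3]
```

No labels are supplied anywhere; the groups emerge only from the distances between points.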
CLUSTERING
• The data points in the graph below clustered together can be classified into one single
group. We can distinguish the clusters, and we can identify that there are 3 clusters in the
below picture.
It is not necessary for clusters to be spherical.
MACHINE LEARNING TECHNIQUES
• For example, classification is a technique for grouping things that are similar.
• To actually do classification on some data, a data scientist would have to employ a specific
algorithm like Decision Trees (though there are many other classification algorithms to
choose from).
MACHINE LEARNING TECHNIQUES
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
SUPERVISED LEARNING
• Supervised Learning
• Supervised learning is similar to human learning in the presence of a supervisor or teacher.
• The supervisor/teacher’s role is to provide correct feedback to the learner.
• Example: A teacher shows a set of images of dogs and tells the student that these are dogs. The student learns from the images what the animal called a dog looks like.
• What the student learns are the properties that identify a dog, such as its face, colour, voice, etc.
• Similarly, parents show a child animals such as dogs and cats and help the child to recognize them.
SUPERVISED LEARNING
• In machine learning, a model learns from given examples presented in the form of data.
• The input to the model is data, and the attributes of the data are the properties through which the model learns.
• Just as a teacher does in human learning, the correct output is provided along with the data, which helps the model to learn.
SUPERVISED LEARNING
• Supervised learning occurs when an algorithm learns from example data and associated
target responses that can consist of numeric values or string labels, such as classes or tags,
in order to later predict the correct response when posed with new examples.
• The aim of supervised machine learning is to build a model that makes predictions based on
evidence in the presence of uncertainty.
• A supervised learning algorithm takes a known set of input data and known responses to
the data (output) and trains a model to generate reasonable predictions for the response to
new data.
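A very small instance of this idea is a nearest-neighbour predictor: it stores the known inputs with their known responses and answers a new input with the response of the closest stored example. The attributes (number of legs, weight) and the labels below are hypothetical.

```python
def nearest_neighbour_predict(train_inputs, train_labels, new_input):
    """Return the label of the training example closest to new_input."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best = min(range(len(train_inputs)),
               key=lambda i: squared_distance(train_inputs[i], new_input))
    return train_labels[best]

# Hypothetical training data: (number of legs, weight in kg) with known labels.
inputs = [(4, 30.0), (4, 25.0), (2, 1.5), (2, 2.0)]
labels = ["dog", "dog", "bird", "bird"]
print(nearest_neighbour_predict(inputs, labels, (4, 28.0)))  # dog
print(nearest_neighbour_predict(inputs, labels, (2, 1.8)))   # bird
```

The labels play the part of the teacher: without them the function would have nothing to return for a new example.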
UNSUPERVISED LEARNING
• This is learning without a teacher: a new concept is learned by comparing it with other concepts.
• It is basically the human ability to group similar elements.
• Examples:
• Humans group banana, apple, orange, etc. as fruits because they grow on trees and are eaten without cooking or any other processing (hence the common attributes are “grown on tree” and “eaten without cooking”).
• Humans group notebooks, pens, books and pencils as school stationery because they are useful in school (hence the common attribute is “useful in school”).
• It resembles the methods humans use to figure out that certain objects or events are from the same class, such as by observing the degree of similarity between objects.
• An important characteristic of unsupervised learning is finding the similarity between two events or objects.
UNSUPERVISED LEARNING
Fruit    Colour
banana   Yellow
guava    Yellow
banana   Green
guava    Green
UNSUPERVISED LEARNING
• Unsupervised learning is where you only have input data (X) and no corresponding output
variables.
• The goal for unsupervised learning is to model the underlying structure or distribution in
the data in order to learn more about the data.
• This is called unsupervised learning because, unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
REINFORCEMENT LEARNING
Testing Data → Trained Model → Performance Measurement

No. of Rooms   Size   Age of the House   Price (Actual)
3              700    12                 21L
5              1200   4                  60L
6              1500   6                  65L
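For a model that predicts a number, such as a house price, performance can be measured by comparing predicted and actual values. The sketch below uses two common error measures, mean absolute error and root mean squared error. The actual prices come from the testing data table; the predicted prices are invented for illustration.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; large errors weigh more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual_prices = [21, 60, 65]     # actual prices from the testing data, in lakh (L)
predicted_prices = [23, 58, 67]  # hypothetical outputs of a trained model
print(mean_absolute_error(actual_prices, predicted_prices))      # 2.0
print(root_mean_squared_error(actual_prices, predicted_prices))  # 2.0
```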
MEASURING ERROR (ERROR METRICS): FOR PREDICTION TYPE MODELS
• The number of correct and incorrect predictions are summarized with count values and
broken down by each class. This is the key to the confusion matrix.
• The confusion matrix shows the ways in which your classification model is confused when it
makes predictions.
• It gives us insight not only into the errors being made by a classifier but more importantly
the types of errors that are being made.
MEASURING ERROR (ERROR METRICS): FOR CLASSIFICATION TYPE MODELS
• A confusion matrix is formed from the four outcomes produced as a result of binary
classification.
• Four outcomes of classification
• A binary classifier predicts all data instances of a test dataset as either positive or negative.
• This classification (or prediction) produces four outcomes – true positive, true negative,
false positive and false negative.
• We usually denote them as TP, FP, TN, and FN instead of “the number of true positives”, and
so on.
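Counting the four outcomes is straightforward once the actual and predicted labels are available side by side. The sketch below assumes "cat" is the positive class; the label lists and the function name `confusion_counts` are made up for the example.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN and FN for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, fp, tn, fn

# Hypothetical labels for eight test instances.
actual = ["cat", "cat", "cat", "non-cat", "non-cat", "non-cat", "non-cat", "cat"]
predicted = ["cat", "cat", "non-cat", "non-cat", "cat", "non-cat", "non-cat", "cat"]
print(confusion_counts(actual, predicted, positive="cat"))  # (3, 1, 3, 1)
```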
                      Actual Class
                      Cat    Non-cat
Predicted  Cat        TP     FP
           Non-cat    FN     TN
• Assuming a sample of 27 animals — 8 cats, 6 dogs, and 13 rabbits, the resulting confusion
matrix could look like the table below:
PERFORMANCE MEASUREMENT METRICS
• Following are the metrics used for the Classification Models in the Machine Learning:
1. Accuracy
2. Recall / Sensitivity
3. Precision
4. Specificity
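CLASSIFICATION RATE / ACCURACY
• Accuracy is the ratio of the number of correctly classified examples to the total number of examples.
• Accuracy is given by the relation: Accuracy = (TP + TN) / (TP + TN + FP + FN)
RECALL / SENSITIVITY
• Recall/Sensitivity is defined as the ratio of the number of correctly classified positive examples to the total number of positive examples.
• High recall indicates that the class is correctly recognized (small number of FN).
• Recall is given by the relation: Recall = TP / (TP + FN)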
PRECISION
• To get the value of precision, we divide the number of correctly classified positive examples by the total number of predicted positive examples.
• High precision indicates that an example labelled as positive is indeed positive (small number of FP).
• Precision is given by the relation: Precision = TP / (TP + FP)
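The four metrics listed earlier can all be computed from the confusion-matrix counts using their standard definitions. The sketch below does so; the counts passed in are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics derived from the four confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # share of all predictions that are correct
        "recall": tp / (tp + fn),                     # share of actual positives that were found
        "precision": tp / (tp + fp),                  # share of predicted positives that are right
        "specificity": tn / (tn + fp),                # share of actual negatives that were found
    }

# Hypothetical counts for a binary classifier evaluated on 100 test instances.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting the metrics together helps, since a classifier can score well on accuracy while doing poorly on recall when one class is rare.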
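SPECIFICITY
• Specificity is the ratio of the number of correctly classified negative examples to the total number of negative examples: Specificity = TN / (TN + FP)
MACHINE LEARNING PROCESS
• The machine learning process starts with defining the data and ends with a model that has some defined level of accuracy.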
1. Define Problem
2. Collect data
3. Prepare Data
4. Split data into training, validation and testing sets
5. Algorithm Selection
6. Training the algorithm
7. Evaluate Test Data
8. Parameter Tuning
9. Start Using the model
MACHINE LEARNING PROCESS
1. Define Problem
• What is the problem?
• Why does this problem need a solution?
2. Collect the Data
• Scraping websites
• Surveys
• Sensors
• Website logs
MACHINE LEARNING PROCESS