Unit 1 Part 3
o Supervised Learning
o Unsupervised Learning
o Reinforcement Learning
o Semi Supervised Learning
o Classification
o Regression
o Clustering
o Association Rule
o Dimensionality Reduction
1. Supervised Machine Learning
Supervised learning works on the principle of input-output pairs. A function is
trained on a labeled training data set and is then applied to unseen data to make
predictions. Supervised learning is task-driven, and its performance is tested on
labeled data sets.
1. Classification
➢ Definition
In machine learning, classification is the problem of identifying to which of a set of
categories a new observation belongs, on the basis of a training set of data
containing observations (or instances) whose category membership is known.
➢ EXAMPLE
• Data in Table 1.1 is the training set of data. There are two attributes “Score1”
and “Score2”.
• The class label is called “Result”. The class label has two possible values
“Pass” and “Fail”.
• The data can be divided into two categories or classes: The set of data for
which the class label is “Pass” and the set of data for which the class label
is “Fail”.
• Let us assume that we have no knowledge about the data other than what is
given in the table.
• Now, the problem can be posed as follows: If we have some new data, say
“Score1 = 25” and “Score2 = 36”, what value should be assigned to “Result”
corresponding to the new data; in other words, to which of the two categories
or classes the new observation should be assigned?
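The question posed above can be sketched in code. The block below is a minimal illustration using the k-NN algorithm (one of the classification algorithms listed later in this section), not a prescribed method. The training rows are hypothetical stand-ins, since the values of Table 1.1 are not reproduced here, and the function name classify_knn and the choice k = 3 are assumptions made for the example.

```python
from collections import Counter
import math

# Hypothetical training rows (Score1, Score2, Result).
# These are illustrative values, NOT the contents of Table 1.1.
training_set = [
    (39, 36, "Fail"),
    (42, 47, "Pass"),
    (20, 30, "Fail"),
    (50, 45, "Pass"),
    (30, 25, "Fail"),
    (48, 52, "Pass"),
]


def classify_knn(score1, score2, data, k=3):
    """Return the majority class label among the k nearest training rows."""
    # Sort the training rows by Euclidean distance to the new observation.
    by_distance = sorted(
        data, key=lambda row: math.dist((score1, score2), row[:2])
    )
    nearest_labels = [label for _, _, label in by_distance[:k]]
    # The most frequent label among the k neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Classify the new observation Score1 = 25, Score2 = 36.
result = classify_knn(25, 36, training_set)
print(result)
```

With a different training set or a different value of k, the predicted label may change; the point is only that the class of a new observation is inferred from observations whose class is already known.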
iii) Speech recognition: In speech recognition, the input is acoustic and the classes
are words that can be uttered.
iv) Medical diagnosis: In medical diagnosis, the inputs are the relevant information
we have about the patient and the classes are the illnesses. The inputs contain the
patient’s age, gender, past medical history, and current symptoms. Some tests may
not have been applied to the patient, and thus these inputs would be missing.
vi) Compression: Classification rules can be used for compression. By fitting a rule
to the data, we get an explanation that is simpler than the data, requiring less
memory to store and less computation to process.
➢ Algorithms
There are several machine learning algorithms for classification. The following are
some of the well-known algorithms.
a) Logistic regression
b) Naive Bayes algorithm
c) k-NN algorithm
d) Decision tree algorithm
e) Support vector machine algorithm
f) Random forest algorithm
➢ Remarks
2. Regression
➢ Example
Consider the data on car prices given in Table 1.2.
Suppose we are required to estimate the price of a car that is 25 years old, has been
driven 53240 km, and weighs 1200 pounds.
• Simple linear regression: There is only one continuous independent variable x and
the assumed relation between the independent variable and the dependent variable y
is y = a + bx
• Multivariate linear regression: There is more than one independent variable, say
x_1, ..., x_n, and the assumed relation between the independent variables and the
dependent variable is
y = a_0 + a_1 x_1 + ... + a_n x_n
• Polynomial regression: There is only one continuous independent variable x and
the assumed model is
y = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n
• Logistic regression: The dependent variable is binary, that is, a variable which takes
only the values 0 and 1. The assumed model involves certain probability
distributions.
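As an illustration of the simple linear regression model y = a + bx, the sketch below estimates a and b by ordinary least squares. The fitting method is an assumption (the notes do not say how the coefficients are obtained), the function name fit_simple_linear is hypothetical, and the data points are made up for the example rather than taken from Table 1.2.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (a, b) for the model y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data lying exactly on the line y = 2 + 3x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]

a, b = fit_simple_linear(xs, ys)
predicted = a + b * 5.0  # estimate y for a new value x = 5
print(a, b, predicted)
```

Once a and b are known, estimating the dependent variable for a new x is a single evaluation of a + bx. The multivariate and polynomial models are used in the same way, with more coefficients to estimate.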
In unsupervised learning, the model is given an unlabeled dataset and produces its
output from that data alone. The model learns hidden patterns from the dataset by
itself, without any supervision.
Unsupervised learning models are mainly used to perform three tasks, which are as
follows:
o Clustering
Clustering is an unsupervised learning technique that groups the data points
into different clusters based on their similarities and differences. The
objects with the most similarities remain in the same group, and they have
few or no similarities with the objects in other groups.
Clustering algorithms are widely used in tasks such as image segmentation,
statistical data analysis, market segmentation, etc.
Some commonly used clustering algorithms are K-means clustering,
hierarchical clustering, DBSCAN, etc.
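The idea of grouping similar points can be sketched with a bare-bones version of K-means. This is a minimal illustration, not a production implementation: the function name k_means, the fixed starting centroids, the fixed iteration count, and the sample points are all assumptions made for the example (practical implementations usually pick starting centroids at random and stop when assignments no longer change).

```python
import math


def k_means(points, centroids, iterations=10):
    """Basic K-means: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are supplied anywhere in this sketch; the two groups emerge only from the distances between the points, which is what distinguishes clustering from classification.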
o Dimensionality Reduction
AI programs trained with reinforcement learning beat human players in board games
like Go and chess, as well as video games.
…………………………………………………………………………………
Unlike unsupervised learning, which is useful only in a limited set of situations,
semi-supervised learning (SSL) works for a variety of problems, from classification
and regression to clustering and association.
A semi-supervised learning approach uses small amounts of labeled data and also
large amounts of unlabeled data. This reduces expenses on manual annotation and
cuts data preparation time.