
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
— Tom Mitchell, Machine Learning Professor at Carnegie Mellon University

1.3 Overview of the Categories of Machine Learning


The three broad categories of machine learning are summarized in the following figure: supervised learning, unsupervised learning, and reinforcement learning. Note that in this class, we will primarily focus on supervised learning, which is the "most developed" branch of machine learning. While we will also cover various unsupervised learning algorithms, reinforcement learning will be outside the scope of this class.

Supervised learning: labeled data; direct feedback; predict outcome/future.
Unsupervised learning: no labels/targets; no feedback; find hidden structure in data.
Reinforcement learning: decision process; reward system; learn a series of actions.

Figure 3: Categories of Machine Learning (Source: Raschka & Mirjalili: Python Machine Learning, 2nd Ed.)

1.3.1 Supervised Learning

Supervised learning is the subcategory of machine learning that focuses on learning a classification or regression model, that is, learning from labeled training data (i.e., inputs that also contain the desired outputs or targets; basically, "examples" of what we want to predict).

Figure 4: Illustration of a binary classification problem (plus, minus) and two feature variables (x1 and x2). (Source: Raschka & Mirjalili: Python Machine Learning, 2nd Ed.)

Figure 5: Illustration of a linear regression model with one feature variable (x1) and the target variable y. The dashed line indicates the functional form of the linear regression model. (Source: Raschka & Mirjalili: Python Machine Learning, 2nd Ed.)
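To make this concrete, here is a minimal sketch of supervised learning on labeled data with two features, roughly in the spirit of Figure 4. It is not taken from the figure source; the toy data and the choice of logistic regression are assumptions, and it presumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Labeled training data: each row is one example with two features (x1, x2),
# and y holds the desired outputs (the "plus"/"minus" class labels).
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])

# Learn a classification model from the labeled examples.
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict the outcome for a new, unseen input.
print(clf.predict([[4.5, 7.0]]))
```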

1.3.2 Unsupervised learning

In contrast to supervised learning, unsupervised learning is a branch of machine learning that is concerned with unlabeled data. Common tasks in
unsupervised learning are clustering analysis (assigning group memberships) and dimensionality reduction (compressing data onto a lower-dimensional
subspace or manifold).
Figure 6: Illustration of clustering, where the dashed lines indicate potential group membership assignments of unlabeled data points. (Source: Raschka & Mirjalili: Python Machine Learning, 2nd Ed.)
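As an illustration of the dimensionality-reduction side of unsupervised learning, the following sketch (an assumption on my part, using scikit-learn's PCA on randomly generated data) compresses unlabeled 3-dimensional points onto a 2-dimensional subspace; no target variable is involved.

```python
import numpy as np
from sklearn.decomposition import PCA

# Unlabeled data: 3 features per point, no targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Compress the data onto a 2-dimensional subspace.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component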

1.3.3 Reinforcement learning

Reinforcement learning is the process of learning from rewards while performing a series of actions. In reinforcement learning, we do not tell the learner or agent, for example, a (ro)bot, which action to take but merely assign a reward to each action and/or the overall outcome. Instead of having a "correct/incorrect" label for each step, the learner must discover or learn a behavior that maximizes the reward over a series of actions. In that sense, it is not a supervised setting and is somewhat related to unsupervised learning; however, reinforcement learning really is its own category of machine learning. Reinforcement learning will not be covered further in this class.
Typical applications of reinforcement learning involve playing games (chess, Go, Atari video games) and various forms of robotics, e.g., drones, warehouse robots, and more recently self-driving cars.

Figure 7: Illustration of reinforcement learning: the agent takes an action in the environment and receives a new state and a reward in return. (Source: Raschka & Mirjalili: Python Machine Learning, 2nd Ed.)
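Although reinforcement learning is out of scope for this class, the interaction loop in Figure 7 can be sketched in a few lines of plain Python. The toy environment, its reward rule, and the helper name environment_step below are made up purely for illustration.

```python
import random

def environment_step(state, action):
    """Toy environment: moving right (+1) is rewarded, moving left (-1) is not."""
    next_state = state + action
    reward = 1 if action == 1 else 0
    return next_state, reward

state, total_reward = 0, 0
for _ in range(10):
    action = random.choice([-1, 1])                    # agent picks an action
    state, reward = environment_step(state, action)    # environment returns state and reward
    total_reward += reward                             # reward accumulates over the series of actions

print(total_reward)
```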

1.3.4 Semi-supervised learning

Loosely speaking, semi-supervised learning can be described as a mix between supervised and unsupervised learning. In semi-supervised learning tasks, some training examples contain outputs, but some do not. We then use the labeled training subset to label the unlabeled portion of the training set, which we then also utilize for model training.
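One common way to realize this idea is self-training. The sketch below is an illustrative assumption, not part of the original notes; it uses scikit-learn's SelfTrainingClassifier, which marks unlabeled examples with the label -1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Partially labeled data: -1 marks the examples without a label.
X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.2], [5.1, 5.0]])
y = np.array([0, 0, 1, 1, -1, -1])

# The labeled subset is used to iteratively assign labels to the unlabeled
# portion, which is then also used for training.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)

print(model.predict([[0.15, 0.05], [5.05, 5.05]]))
```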

What is Classification?

Classification tasks fall into a category of problems called “supervised learning.” Such problems involve developing models that learn
from historical data to make predictions on new instances.

More formally, supervised learning tasks are problems in which a function is learned to map an input to an output based on example input-output pairs. Thus, the objective is to approximate the mapping function f from the inputs X to the output y.
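The following sketch (an illustrative assumption using scikit-learn and synthetic data, not taken from the text) shows this input-output mapping being approximated from example pairs and then checked on held-out pairs.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Example input-output pairs (X, y) for a synthetic classification problem.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Approximate the mapping f: X -> y from the training pairs.
f_hat = DecisionTreeClassifier(random_state=0)
f_hat.fit(X_train, y_train)

# Measure how well the learned mapping predicts outputs for unseen inputs.
print(accuracy_score(y_test, f_hat.predict(X_test)))
```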
What is Clustering?
The task of grouping data points based on their similarity with each other is called Clustering or Cluster Analysis. This method falls under the branch of unsupervised learning, which aims at gaining insights from unlabelled data points; that is, unlike supervised learning, we do not have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset. It evaluates similarity based on a metric such as Euclidean distance, cosine similarity, or Manhattan distance, and then groups the points with the highest similarity scores together.
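For instance, the pairwise distances and similarities mentioned above can be computed as follows; this is a small sketch under the assumption that scikit-learn's pairwise metrics are available, with made-up data points.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    euclidean_distances,
    manhattan_distances,
    cosine_similarity,
)

# Three unlabeled data points with two features each.
X = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 9.0]])

print(euclidean_distances(X))  # small distance => high similarity
print(manhattan_distances(X))
print(cosine_similarity(X))    # close to 1 => high similarity
```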

What is regression in machine learning?

Regression in machine learning is a technique used to capture the relationships between independent and dependent variables, with the main purpose of predicting an outcome. It involves training an algorithm on example data to reveal the patterns that relate the input features to the target. With these patterns identified, the model can then make predictions for new data points or input values.
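As a minimal sketch (assuming scikit-learn and invented toy data; not part of the original text), a linear regression model with one feature, as in Figure 5, can be trained and used for prediction like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One independent variable x1 and a continuous dependent variable y.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # roughly y = 2 * x1

reg = LinearRegression()
reg.fit(X, y)

# Predict the outcome for a new input value.
print(reg.predict([[6.0]]))       # close to 12
print(reg.coef_, reg.intercept_)  # learned slope and intercept
```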

Clustering in Machine Learning

Clustering or cluster analysis is a machine learning technique that groups an unlabelled dataset. It can be defined as "a way of grouping the data points into different clusters consisting of similar data points; the objects with possible similarities remain in a group that has few or no similarities with another group."

It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, or behavior, and divides the data according to the presence or absence of those patterns.
It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with an unlabelled dataset.

After applying this clustering technique, each cluster or group is given a cluster ID. The ML system can use this ID to simplify the processing of large and complex datasets. The clustering technique is commonly used for statistical data analysis.

Example: Let's understand the clustering technique with the real-world example of a shopping mall. When we visit a shopping mall, we can observe that items with similar uses are grouped together: t-shirts are grouped in one section and trousers in another, and in the produce section apples, bananas, mangoes, etc. are grouped separately, so that we can easily find what we need. The clustering technique works in the same way. Another example of clustering is grouping documents by topic.

The clustering technique can be widely used in various tasks. Some most common uses of this technique are:

o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.

Types of Clustering Methods

Clustering methods are broadly divided into hard clustering (each data point belongs to only one group) and soft clustering (a data point can also belong to more than one group). Beyond this distinction, various other approaches to clustering exist. Below are the main clustering methods used in machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

Partitioning Clustering

It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the centroid-based method. The most
common example of partitioning clustering is the K-Means Clustering algorithm.

In this type, the dataset is divided into a set of k groups, where k is the pre-defined number of groups. The cluster centers are placed so that the distance between the data points and their own cluster's centroid is minimal compared to the distance to the centroids of the other clusters.
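A minimal k-means sketch, assuming scikit-learn and made-up data points (illustrative only, not the lecture's own code), for partitioning data into k pre-defined groups:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data points with two features (x1, x2).
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# k = 2 pre-defined groups; each point is assigned to its nearest centroid.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster assignment (cluster ID) per point
print(kmeans.cluster_centers_)  # learned centroids
```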

Density-Based Clustering

The density-based clustering method connects highly dense regions of the data space into clusters, so arbitrarily shaped clusters can be formed as long as the dense regions can be connected. The algorithm identifies dense areas in the data space and merges them into clusters, while sparser areas separate the clusters from one another.

These algorithms can have difficulty clustering the data points when the dataset has varying densities or high dimensionality.
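A well-known density-based algorithm is DBSCAN. The following sketch (assuming scikit-learn and invented toy data) connects dense regions into clusters and marks isolated points in sparse regions as noise (label -1):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated (sparse) point.
X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1],
              [5.0, 5.0], [5.1, 5.1], [4.9, 5.0],
              [9.0, 0.0]])

# eps controls how close points must be to count as "connected";
# min_samples controls how many neighbors make a region dense.
db = DBSCAN(eps=0.5, min_samples=2)
labels = db.fit_predict(X)

print(labels)  # e.g., [0 0 0 1 1 1 -1]; -1 marks noise
```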
Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided based on the probability that each data point belongs to a particular distribution. The grouping is done by assuming certain distributions, commonly the Gaussian distribution.

An example of this type is the Expectation-Maximization clustering algorithm, which uses Gaussian Mixture Models (GMM).
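A minimal sketch of distribution model-based clustering, assuming scikit-learn's GaussianMixture (which is fit via Expectation-Maximization) and synthetic data generated for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Unlabeled data drawn (by assumption) from two Gaussian-like groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

# Fit two Gaussian components with the EM algorithm.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(X)

print(gmm.predict(X[:5]))        # hard cluster assignments
print(gmm.predict_proba(X[:5]))  # soft (probabilistic) memberships
```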
Hierarchical Clustering

Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no need to pre-specify the number of clusters to be created. In this technique, the dataset is organized into a tree-like structure of nested clusters, also called a dendrogram. Any desired number of clusters can then be obtained by cutting the tree at the appropriate level. The most common example of this method is the agglomerative hierarchical clustering algorithm.
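A short agglomerative (bottom-up) hierarchical clustering sketch; it assumes SciPy's hierarchy utilities and made-up data points, and shows the build-the-tree-then-cut idea described above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Unlabeled data points with two features.
X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.1], [5.1, 5.0], [9.0, 9.0]])

# Build the tree (dendrogram) bottom-up by successively merging clusters.
Z = linkage(X, method="ward")

# "Cut" the tree to obtain a chosen number of clusters, here 3.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # cluster assignment per point
```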
