18CSE397T - Computational Data Analysis Unit - 3: Session - 8: SLO - 1
Introduction
With the rise in COVID-19 cases, many people are unable to get proper
medical advice because of a shortage of both human resources and
infrastructure. As engineers, we can contribute by providing a basic
diagnosis that helps identify people who may be suffering from COVID-19.
Machine learning algorithms can ease this task, and clustering algorithms
are particularly useful here.
To do this, we form two clusters from the symptoms of patients who are
COVID-19 positive or negative. We then predict whether a new patient is
suffering from COVID-19 by measuring the similarity/dissimilarity between the
patient's observed symptoms (features) and the symptoms of infected patients.
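A minimal sketch of the prediction step, assuming each patient is described by hypothetical binary symptom features and that the two clusters are already known (here simply listed by test result). The new patient is assigned to the cluster with the lower average mismatch dissimilarity. The feature names, the toy data, and the helper names `mismatch_dissimilarity`, `mean_dissimilarity` and `predict` are illustrative assumptions, not the course's own method.

```python
# Hypothetical binary symptom vectors: 1 = symptom present, 0 = absent.
# Assumed feature order (illustration only): fever, dry cough, fatigue, loss of smell.
positive_patients = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
negative_patients = [
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]


def mismatch_dissimilarity(a, b):
    """Fraction of features on which a and b disagree: (p - m) / p."""
    p = len(a)
    m = sum(1 for x, y in zip(a, b) if x == y)
    return (p - m) / p


def mean_dissimilarity(patient, group):
    """Average dissimilarity between one patient and every member of a cluster."""
    return sum(mismatch_dissimilarity(patient, member) for member in group) / len(group)


def predict(patient):
    """Assign the patient to the cluster whose members are, on average, most similar."""
    d_pos = mean_dissimilarity(patient, positive_patients)
    d_neg = mean_dissimilarity(patient, negative_patients)
    return "positive" if d_pos < d_neg else "negative"


print(predict([1, 1, 1, 0]))  # positive
```

Ties are sent to "negative" here, which is an arbitrary choice. For binary symptoms where the absence of a symptom carries little information, an asymmetric measure such as the Jaccard coefficient may be more suitable than simple matching.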
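Similarity measure: a numerical measure of how alike two data objects are. It
is higher when the objects are more alike.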
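Dissimilarity measure: a numerical measure of how different two data objects
are. It is lower when the objects are more alike. The dissimilarity matrix,
which stores d(i, j) for every pair of objects i and j, has the following
properties:
1. It is square and symmetric ($A^T = A$ for a square matrix $A$, where $A^T$
denotes its transpose).
2. The diagonal entries are zero, meaning that the dissimilarity between an
object and itself is zero.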
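The two properties above can be checked mechanically. This is a small sketch in plain Python, assuming a hypothetical one-dimensional distance |x - y| purely for illustration; the helper names `dissimilarity_matrix` and `is_valid_dissimilarity_matrix` are not from the course material.

```python
def dissimilarity_matrix(objects, d):
    """Build the n x n matrix whose entry [i][j] is d(objects[i], objects[j])."""
    n = len(objects)
    return [[d(objects[i], objects[j]) for j in range(n)] for i in range(n)]


def is_valid_dissimilarity_matrix(D, tol=1e-12):
    """Check both properties: square and symmetric, with a zero diagonal."""
    n = len(D)
    if any(len(row) != n for row in D):
        return False  # not square
    symmetric = all(abs(D[i][j] - D[j][i]) <= tol for i in range(n) for j in range(n))
    zero_diagonal = all(abs(D[i][i]) <= tol for i in range(n))
    return symmetric and zero_diagonal


# Hypothetical objects: four numbers compared with the distance |x - y|.
points = [2.0, 5.0, 9.0, 5.0]
D = dissimilarity_matrix(points, lambda x, y: abs(x - y))
for row in D:
    print(row)
print(is_valid_dissimilarity_matrix(D))  # True
```

Because of symmetry, many texts store only the lower triangle of the matrix.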
Nominal attributes can take two or more distinct states; e.g., an attribute
'color' can have values such as 'Red', 'Green', 'Yellow' and 'Blue'. The
dissimilarity between two data points with nominal attributes is the ratio of
the number of mismatches between them to the total number of attributes.
Let M be the total number of states of a nominal attribute. The states can be
numbered from 1 to M; however, this numbering does not imply any ordering and
cannot be used in mathematical operations.
Let m be the number of matches (attributes on which objects i and j have the
same state) and p the total number of attributes. The dissimilarity is then
$d(i, j) = \frac{p - m}{p}$
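A direct translation of d(i, j) = (p - m)/p, assuming each object is given as a tuple of nominal values in a fixed attribute order. Only equality tests are used, which respects the rule that nominal state numbers carry no order. The attribute names and values in the usage line are hypothetical.

```python
def nominal_dissimilarity(obj_i, obj_j):
    """d(i, j) = (p - m) / p, where p is the number of attributes
    and m is the number of attributes on which the two objects match."""
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must have the same number of attributes")
    p = len(obj_i)
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)
    return (p - m) / p


# Hypothetical objects with three nominal attributes: color, shape, material.
x = ("Red", "Circle", "Wood")
y = ("Red", "Square", "Metal")
print(nominal_dissimilarity(x, y))  # 2 mismatches out of 3, about 0.667
```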
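Ordinal attributes have states with a meaningful order, although the
magnitude of the difference between successive states is not known.
Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades,
height {tall, medium, short}.
To compute dissimilarity, each value of an ordinal attribute f is replaced by
its rank $r_{if} \in \{1, \dots, M_f\}$, where $M_f$ is the number of states of
f, and the rank is normalized to the range [0, 1]:
$z_{if} = \frac{r_{if} - 1}{M_f - 1}$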
Similarity can then be derived from dissimilarity as
$s(i, j) = 1 - d(i, j)$
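A sketch of the normalization and the similarity conversion, assuming (as is common) that the normalized values are then compared with a numeric distance; for a single attribute the Euclidean distance reduces to |z_i - z_j|. The function names are hypothetical.

```python
def normalize_rank(rank, num_states):
    """z_if = (r_if - 1) / (M_f - 1): maps ranks 1..M_f onto [0, 1]."""
    if num_states < 2:
        raise ValueError("an ordinal attribute needs at least two states")
    if not 1 <= rank <= num_states:
        raise ValueError("rank must lie between 1 and num_states")
    return (rank - 1) / (num_states - 1)


def ordinal_dissimilarity(rank_i, rank_j, num_states):
    """Dissimilarity on one ordinal attribute: distance between normalized ranks."""
    return abs(normalize_rank(rank_i, num_states) - normalize_rank(rank_j, num_states))


def similarity(d):
    """s(i, j) = 1 - d(i, j), meaningful when d lies in [0, 1]."""
    return 1 - d


# Potato-chip taste scored on a 1-10 scale, so M_f = 10.
d = ordinal_dissimilarity(3, 9, 10)
print(d, similarity(d))
```

Normalizing first keeps every ordinal attribute in [0, 1], so attributes with many states do not dominate those with few, and 1 - d stays within [0, 1].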
Example:
Object ID   Attribute
1           High
2           Low
3           Medium
4           High
This example has four objects, with IDs 1 to 4, described by one ordinal
attribute whose states are ranked Low = 1, Medium = 2, High = 3 (so M_f = 3).
We now normalize the ranks to the range [0, 1] using the formula above.
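Applying the normalization to the four objects in the table, assuming the order Low < Medium < High. The dissimilarity between two objects is taken as |z_i - z_j|, the one-attribute case of the Euclidean distance; that choice is an assumption for illustration.

```python
# Ordinal states of the attribute, listed from lowest to highest rank.
states = ["Low", "Medium", "High"]
rank = {state: r for r, state in enumerate(states, start=1)}  # Low=1, Medium=2, High=3
M = len(states)

objects = {1: "High", 2: "Low", 3: "Medium", 4: "High"}

# z_if = (r_if - 1) / (M_f - 1)
z = {oid: (rank[value] - 1) / (M - 1) for oid, value in objects.items()}
print(z)  # {1: 1.0, 2: 0.0, 3: 0.5, 4: 1.0}

# Lower-triangular dissimilarity matrix with d(i, j) = |z_i - z_j|
ids = sorted(objects)
for i in ids:
    print(i, [abs(z[i] - z[j]) for j in ids if j <= i])
# 1 [0.0]
# 2 [1.0, 0.0]
# 3 [0.5, 0.5, 0.0]
# 4 [0.0, 1.0, 0.5, 0.0]
```

Objects 1 and 4 share the state High, so their dissimilarity is 0, while the extreme states Low and High are at the maximum distance of 1.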