
A Term Paper Report

On

SEMI SUPERVISED MACHINE LEARNING ALGORITHMS
Submitted to
Department of Electronics and Communication Engineering
Amity School of Engineering and Technology

in partial fulfilment of the requirements for the award of the degree of

Bachelor of Technology
(Artificial Intelligence)

By

Diviit MG – A023119821006

under the guidance of

Mr. Jitendra Singh Jadon

DEPARTMENT OF ARTIFICIAL INTELLIGENCE


AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY
AMITY UNIVERSITY UTTAR PRADESH

JULY 2022

Declaration

I, Diviit MG, student of B.Tech (AI), Batch 2021-25, Amity School of Engineering
and Technology, Amity University Uttar Pradesh, hereby declare that the term paper
titled “Semi Supervised Machine Learning Algorithms”, which is submitted by me to the
Department of Artificial Intelligence, Amity School of Engineering and Technology,
Amity University Uttar Pradesh, Noida, in partial fulfilment of the requirements for the
award of the degree of Bachelor of Technology (Artificial Intelligence), has not
previously formed the basis for the award of any degree, diploma or other similar title
or recognition.

Name: Diviit MG
Enrolment No.: A023119821006
Date: 29-07-2022
Place: Noida

Certificate

On the basis of the declaration submitted by Diviit MG, enrolment no. A023119821006,
student of B.Tech. – Artificial Intelligence, I hereby certify that the term paper titled
“Semi Supervised Machine Learning Algorithms”, which is submitted to the Department of
Artificial Intelligence, Amity School of Engineering & Technology, Amity University
Uttar Pradesh, Noida, in partial fulfilment of the requirements for the award of the degree
of B.Tech. – Artificial Intelligence, is an original contribution to existing knowledge and
a faithful record of work carried out by him under my guidance and supervision.

To the best of my knowledge, this work has not been submitted in part or full for any
degree or diploma to this University or elsewhere.

Place: Noida
Date: 29-07-2022

Mr. Jitendra Singh Jadon


Department of AI, ASET, Amity
University Uttar Pradesh, Noida

Declaration Form

I, Diviit MG, student of Bachelor of Technology (AI), Batch 2021-25, Department of
Artificial Intelligence, Amity School of Engineering and Technology, Amity University Uttar
Pradesh, hereby declare that I have gone through the term paper guidelines, including the
policy on health and safety, the policy on plagiarism, etc.

Name: Diviit MG
Enrolment No.: A023119821006
Date: 29-07-2022
Place: Noida

Acknowledgement

I, Diviit MG, am taking this opportunity to express my gratitude to everyone who guided me
over the span of this project. I am grateful for their help and guidance throughout the
process, which enabled me to complete my research efficiently. First of all, I would like to
thank my faculty guide, Mr. Jitendra Singh Jadon, for his guidance, ideas and insights in
developing this project. Lastly, I would like to thank my parents, faculty members and
friends, who helped me gather information and guided me from time to time in preparing this
work.

Diviit MG
Bachelor of Technology (AI)
A023119821006
Amity School of Engineering and Technology
Amity University Uttar Pradesh

ABSTRACT

Artificial intelligence, one of the fastest-growing industries of the past decade, has gained
the attention of people across the world. Machine learning is a method used to feed
intelligence to machines, and semi-supervised learning is one of its types.
The objective of this paper is to explain semi-supervised machine learning by diving into the
basics of semi-supervised learning, along with a discussion of its important assumptions: the
smoothness, low-density, manifold and cluster assumptions. The paper then explains some of the
regularly used algorithms. The two most important sub-divisions under which most of the methods
fall are inductive and transductive. Methods that use the wrapper algorithm as their base, such
as self-training and co-training, are discussed in detail. Boosting algorithms are then covered,
followed by S3VM and graph-based methods. Basic knowledge of each method can be gained through
this paper.

INDEX
1. INTRODUCTION

2. SEMI SUPERVISED MACHINE LEARNING


a. Smoothness assumption
b. Low Density assumption
c. Manifold assumption
d. Cluster assumption

3. CLASSIFICATION OF SEMI SUPERVISED LEARNING METHODS


a. Inductive learning
b. Transductive learning
c. Inductive vs Transductive

4. WRAPPER METHOD

5. SELF TRAINING ALGORITHM

6. CO TRAINING ALGORITHM

7. BOOSTING
a. Adaptive boosting
b. Gradient boosting
c. Extreme gradient boosting

8. SUPPORT VECTOR MACHINE

9. GRAPH BASED METHODS


a. Min cut graph
b. Markov random field
c. Gaussian random field
d. Local global consistency

SEMI SUPERVISED MACHINE LEARNING ALGORITHMS

Introduction:

“The measure of intelligence is the ability to change”


~ Albert Einstein

Intelligence is the potential to gain knowledge and apply the skills one has learned at
different stages of life. Over the past couple of decades, people have figured out ways to
feed human-like intelligence to machines, so that machines can think, operate on their own
and make their own decisions. The term used for this is “ARTIFICIAL INTELLIGENCE”. One way
of training machines is through machine learning algorithms, and this research specifically
discusses the semi-supervised type of learning. Feeding intelligence to machines is an idea
that has circulated throughout history for a long time. Although it was purely imaginative
in early periods, with the evolution of technology the imagination started to become a
blooming truth. Inventions such as the calculator and the telephone each brought humans a
step closer to feeding intelligence to machines, but the major breakthrough came with the
introduction of computers. Computers enabled scientists to pursue the dream of creating
something revolutionary: an entirely operational, electronically built brain. As technology
and time progressed, knowledge of artificial intelligence began to expand, and what was once
only shown in movies started to become reality. Starting from the introduction of the word
“robot” to the world by Karel Čapek, scientists and researchers have come a long way, not
only creating robots but also giving proper justification to the meaning of the word. The
usage of, and requirement for, artificial intelligence across all platforms is growing
rapidly. To take a well-known example, the mobile phones we use depend on the AI of the
system for any number of processes, such as photo processing. Recently an artificial
intelligence called DALL-E 2 was introduced, which is capable of drawing images all by
itself. Many such discoveries have been made, and many more are going to be made, in the
field of artificial intelligence. But rather than simply gaping at these amazing discoveries,
one should also look at the algorithms and processes behind the screen, that is, how the
intelligence is being fed into the machine. One such methodology used for building the
intelligence of a machine is called machine learning. As time progressed, researchers and
scientists devised several different methods, each with its individual advantages, for the
intelligence development of a machine. These devised methods are called algorithms, and some
of the important ones, mainly semi-supervised machine learning algorithms, are discussed in
this research.

MACHINE LEARNING

Machine learning is a sub-type of artificial intelligence that trains a machine on a
past set of data. It builds up the machine's intelligence so that the predictions it makes
become more accurate. To understand how exactly machine learning works, consider a
day-to-day example: a teacher teaching a child to identify the colour of objects.
First, the teacher demonstrates the names of the different colours by pointing to
objects that have a particular colour. Then, as the student learns to identify colours
from the examples demonstrated by the teacher, the student is gradually asked to identify
the colour of different objects to test his or her understanding. Finally, the child is
able to identify an object's colour accurately. The exact same thing happens in machine
learning: first the machine is trained with a set of sample data, and then test data is
introduced to check the accuracy of the predictions made.

The data available can be classified into two types:

 labelled data

 unlabelled data

So, a machine trains on labelled data, unlabelled data, or both. Based on this, machine
learning is classified into different types.

 supervised learning, which uses only the labelled data.

 unsupervised learning, which only uses unlabelled data.

 semi-supervised learning, which uses both labelled and unlabelled data.

SEMI SUPERVISED MACHINE LEARNING


Semi-supervised learning is a type of machine learning that uses both labelled and
unlabelled datasets in processing. In most cases a small amount of labelled data is merged
with a large amount of unlabelled data. In simple words, semi-supervised learning groups
the large unlabelled dataset together and then labels these groups with the help of the
labelled data. A commonly used ratio of labelled to unlabelled data is 20:80; a sketch of
preparing such a split is shown below. But since there is no rule that states the exact
percentage of labelled and unlabelled data to be used, semi-supervised machine learning
instead depends on certain assumptions.
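To make the 20:80 idea concrete, here is a minimal Python sketch, assuming scikit-learn and
numpy; the synthetic dataset and the exact ratio are illustrative choices, not prescribed by
any method. It follows the common scikit-learn convention of marking unlabelled points with -1.

import numpy as np
from sklearn.datasets import make_classification

# Illustrative synthetic dataset; any labelled dataset would do.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
hide = rng.random(len(y)) < 0.8   # hide ~80% of the labels (the common 20:80 split)
y_semi = y.copy()
y_semi[hide] = -1                 # -1 marks an unlabelled point (scikit-learn convention)

print((y_semi == -1).mean())      # fraction of unlabelled points, roughly 0.8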

 SMOOTHNESS ASSUMPTION
The smoothness assumption states that points located close to each other are most
likely to fall under the same label. For example, let a1 ∈ X be a labelled data
point and a2, a3 be two unlabelled data points located close to each other.
According to this assumption, if a1 is near a2 then it will also be close to a3,
and hence all three will be grouped under the same label.

 LOW DENSITY ASSUMPTION

The low-density assumption states that the decision boundary should pass through a
low-density region; basically, the boundary should be placed in an area with few
points. The main reason for this is to avoid violating the smoothness assumption:
in high-density areas the points are closely accumulated, so by the smoothness
assumption they will mostly fall under the same label, and a boundary cutting
through them would split such points apart. In a low-density region this problem
is eliminated, and hence the smoothness assumption holds.

 MANIFOLD ASSUMPTION
A manifold is a topological space that is locally homeomorphic to a Euclidean space
(a homeomorphism being a continuous function with a continuous inverse); a topological
space, in turn, is a set of points that follows a certain set of rules. In general,
the input data has a higher dimension than the output. For example, a patient's
medical record may have a total of 25 parameters, which all map to a single output:
whether the person is healthy or not. According to this assumption, the input space
contains multiple lower-dimensional manifolds on which the data points lie, and points
on the same manifold should have the same label.

 CLUSTER ASSUMPTION
The dictionary meaning of cluster is to pool together similar things. This assumption
states that data points pooled in the same cluster will fall under the same label.
In every assumption discussed above, the common element is that data points are
grouped together based on certain conditions, such as closeness of points in the
smoothness assumption, or lying on the same lower-dimensional manifold in the
manifold assumption. Hence we can say that the cluster assumption is a more
generalized form of the other assumptions. The notion of similarity that is chosen
plays an important role in cluster formation, and the strength of a semi-supervised
learning method depends on how meaningfully the data points are clustered together.

The ultimate goal of semi-supervised learning is to squeeze information out of the
unlabelled data in order to form a more efficient learning method. The most difficult task
is finding the right semi-supervised method for a given situation: it is not guaranteed that
the unlabelled data contains information useful for label prediction beyond what the
labelled data already provides, and even when it does, that information can be difficult to
extract. Hence an algorithm should be devised in such a way that the useful information in
the unlabelled data is extracted efficiently. For the successful application of a method, it
is important that a meaningful similarity is found between the data points. This is where
the assumptions play a big role in devising an apt semi-supervised algorithm, by grouping
the data points according to their respective conditions.

CLASSIFICATION OF SEMI SUPERVISED LEARNING METHODS


Semi-supervised learning methods can be classified by the assumptions they use, by how they
utilize the unlabelled data, by their relation to supervised learning algorithms, and so on.
The two main categories of semi-supervised methods are inductive and transductive learning,
which are distinguished by how the data is utilized.

INDUCTIVE LEARNING
It is the traditional learning approach: a model is trained on sample data, and the accuracy
of its predictions is measured by testing it against a dataset the model has never been
trained on.
For example,
consider a problem where we have to classify vehicles into cars and bikes. The green points
represent the cars and the red ones correspond to the bikes.

The next step is to devise a model that trains on this data, so that when a test sample is
introduced later it can start predicting on its own.

Now a test sample is introduced, with the blue points representing the test points. When we
use the model to predict whether each data point is a car or a bike, the model gives the
following result.

(test data)

(predicted data)

TRANSDUCTIVE LEARNING
Transductive learning, on the other hand, trains on a dataset that has already been observed
and cannot operate on an unseen dataset. It is less ambitious when compared with inductive
learning. Most graph-based methods fall under the transductive category. In this method the
labels of the training dataset are learned, and the method then infers the labels of the
unlabelled points that were supplied along with it.

For example,
consider the same problem of separating bikes and cars. This time the model contains both
labelled and unlabelled data in the training set itself. The red colour represents bikes,
the green colour represents cars, and the white colour represents the unlabelled data
points. Whereas the inductive learning process uses separate sample and test datasets, in
the transductive method a single model containing all the data is formed. When the model is
processed, we get the following output.

(input data)

(predicted output)

INDUCTIVE LEARNING VS TRANSDUCTIVE LEARNING

 Inductive learning trains on the sample data and can apply the knowledge obtained to
predict on never-seen datasets. In the case of transductive learning, both sample and
test data are introduced in the training phase itself.
 Inductive learning builds a predictive model. Transductive learning, on the other hand,
is specific to the given data, and the algorithm must be re-run whenever new data points
are encountered.
 Since the transductive model must be re-run every time new points are encountered, its
cost is higher when compared with inductive learning. The sketch below contrasts the
two styles.
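The contrast can be sketched in a few lines of Python. This is a hedged illustration, not
part of the paper's experiments: the car/bike data is stood in for by a synthetic dataset,
inductive learning is represented by an SVM fit on the labelled points alone, and
transductive learning by scikit-learn's LabelPropagation, which only ever labels the points
it was given.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.semi_supervised import LabelPropagation

X, y = make_classification(n_samples=300, random_state=0)
labelled = np.random.default_rng(0).random(len(y)) > 0.8   # ~20% labelled
y_semi = np.where(labelled, y, -1)                         # -1 = unlabelled

# Inductive: train on the labelled sample, then predict on genuinely new points.
inductive = SVC().fit(X[labelled], y[labelled])
X_new, _ = make_classification(n_samples=10, random_state=1)  # never-seen test points
print(inductive.predict(X_new))

# Transductive: labelled and unlabelled points enter training together; the model's
# output is the labelling of exactly those unlabelled training points.
transductive = LabelPropagation().fit(X, y_semi)
print(transductive.transduction_[:10])   # labels assigned to the training points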

WRAPPER METHOD

The wrapper method is one of the well-established methods used for semi-supervised machine
learning. In general, the whole process can be explained in two steps. The first step is
training the model on the labelled dataset. The second is the pseudo-labelling step, in
which the trained model groups the unlabelled data and assigns labels to these points,
making them pseudo-labels. One of the major advantages of the wrapper method is that it
can utilize almost any supervised base learner. A base learner is a (possibly weak)
learner trained on the sample data, which acts as a classifier and provides the
predictions from which pseudo-labels are drawn. Different algorithms have been developed
with the wrapper method as their base; these algorithms mainly differ in
o the number of classifiers being used,
o the presence of different types of classifiers,
o the usage of single-view or multi-view data.
A minimal sketch of the underlying train-and-pseudo-label loop appears below.
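This sketch makes assumptions of my own choosing: logistic regression as the base learner
and a 0.95 confidence threshold. Any supervised learner exposing predict_proba could be
dropped in.

import numpy as np
from sklearn.linear_model import LogisticRegression

def wrapper_pseudo_label(X_lab, y_lab, X_unl, threshold=0.95, max_rounds=10):
    """Train on labelled data, pseudo-label confident unlabelled points, repeat."""
    model = LogisticRegression(max_iter=1000)
    for _ in range(max_rounds):
        model.fit(X_lab, y_lab)                       # step 1: supervised training
        if len(X_unl) == 0:
            break
        proba = model.predict_proba(X_unl)
        confident = proba.max(axis=1) >= threshold    # step 2: keep confident predictions
        if not confident.any():
            break
        pseudo = model.classes_[proba.argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unl[confident]])        # merge pseudo-labels into
        y_lab = np.concatenate([y_lab, pseudo[confident]])  # the labelled pool
        X_unl = X_unl[~confident]
    return model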

SELF TRAINING ALGORITHM


The self-training algorithm works according to the following steps. The first, most common
step is to train the model using the labelled dataset. The next step introduces the
unlabelled dataset for prediction. The third step filters the unlabelled data, keeping the
predictions with the highest confidence; these are termed pseudo-labels and are combined
with the labelled data. The process then repeats from step two, again and again, until no
unlabelled data is left behind or no remaining prediction is confident enough to be added.

(Figure: self-training flow – labelled data trains the model, confident predictions on the
unlabelled data become pseudo-labelled data, the pseudo-labelled data is merged back into
the labelled set, and the process repeats until all data satisfies the condition.)

The advancement of self-training can be traced through the years after its introduction.
Yarowsky first proposed self-training as a methodology for handling words that have
different meanings depending on the context in which they are used, for example whether
the word "feet" means the part of the body or the unit of measurement. Identification of
subjective nouns was given by Riloff. Classifiers to club dialogues into emotional and
non-emotional were put forward by Maeireizo. Self-training was also used for object
detection by Rosenberg. Self-training is likewise used in image classification, machine
translation and many other areas. Researchers are also analysing theories that show
similarities between self-training and graph-based methods.
The self-training algorithm requires a multitude of decisions: how pseudo-labels are
selected, how the pseudo-labelled data is re-used, and when the process should end.
Pseudo-label selection is of the greatest importance because it decides the data that will
be present in the training set. This selection depends entirely upon the predictions made,
so prediction quality is an important factor in the algorithm's performance. From this it
is also clear that algorithms that do not produce efficient probabilistic predictions
require some modification. This can be done in two ways: first, using grafting and Laplace
correction, which improve the probability estimates of the predictions; second, using
distance-based measures, where predictions are based on the Mahalanobis distance between
an unlabelled data point and the labelled data of a given group. A hedged sketch using
scikit-learn's self-training wrapper follows.
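scikit-learn packages exactly this loop as SelfTrainingClassifier; a minimal usage sketch,
in which the SVC base learner, the threshold and the synthetic data are illustrative
assumptions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=1)
y_semi = y.copy()
y_semi[np.random.default_rng(1).random(len(y)) < 0.8] = -1  # hide ~80% of labels

# The base learner must expose predict_proba so pseudo-labels can be filtered
# by confidence, as described above.
model = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
model.fit(X, y_semi)
print(model.termination_condition_)  # why the loop stopped, e.g. "no_change"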

CO TRAINING ALGORITHM
Co-training uses multiple supervised classifiers, and its main assumption is that the
features can be divided into two sets (views) that are not dependent on each other. In
general, two classifiers are trained on the labelled data, one along each view. In the
second step the unlabelled data is classified, and a mutual learning process takes place
between the two classifiers: each classifier teaches the other with the unlabelled examples
it is most confident about, and vice versa, until strong confidence is built in the
predictions. It is like two friends helping each other prepare for an exam: the classifiers
trade each other's data, which benefits both of them.
One of the main advantages is that the classifiers come to agree over a much larger set of
labelled and unlabelled data, which reduces the size of the search space. The most important
requirement for high efficiency of the co-training algorithm is that the two feature sets
(views) be adequately informative and independent, so that high confidence is built between
the two classifiers. Even in the absence of a natural feature split, an artificial split can
be created in order to obtain the two sets.
To promote diversity of classifiers, co-training approaches have mainly focused on different
views of the data, and these views can be of multiple types; co-training methods therefore
form a broad class of multi-view learning methods. But in most day-to-day conditions there
are no distinct views in the data, and this is where single-view co-training methods are
used. Apart from single-view and multi-view methods there is also co-regularization, in
which a single function is formed by combining the multiple classifiers. A minimal two-view
sketch is given below.
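This sketch rests on my own illustrative assumptions: naive Bayes classifiers for the two
views and a shared pseudo-label pool, which is a common simplification of the original
scheme rather than the exact algorithm.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, n_rounds=5, per_round=10):
    """Two views X1/X2 of the same points; y uses -1 for unlabelled."""
    c1, c2 = GaussianNB(), GaussianNB()
    y = y.copy()
    for _ in range(n_rounds):
        lab = y != -1
        c1.fit(X1[lab], y[lab])
        c2.fit(X2[lab], y[lab])
        for clf, X in ((c1, X1), (c2, X2)):
            unl = np.flatnonzero(y == -1)
            if len(unl) == 0:
                return c1, c2
            # Each classifier pseudo-labels the points it is most confident about;
            # in the next round the other classifier trains on them too.
            proba = clf.predict_proba(X[unl])
            top = unl[np.argsort(proba.max(axis=1))[-per_round:]]
            y[top] = clf.predict(X[top])
    return c1, c2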

BOOSTING
Boosting is a process that produces a strong classifier by combining weak classifiers. It is
done by first building a model from the training data. A second model is then formed which
corrects the errors of the first model. This procedure continues until the training data is
predicted accurately or a certain required number of models has been added. A contrasting
type of ensemble classifier, called bagging, also exists. Bagging combines the outputs of
independently trained base classifiers into a final prediction. But there are limitations in
the bagging method: because of the required independence of the base learners, a fully
semi-supervised bagging scheme can really only be utilized for self-training, while
co-training methods do not use bootstrapping, a desired characteristic of bagging. This is
where boosting holds the upper hand, since it can easily be adapted to any semi-supervised
learning scheme due to the introduction of pseudo-labelled data at each step.
TYPES OF BOOSTING
 Adaptive boosting
o AdaBoost is one of the oldest boosting models. As the name suggests, it has an
adaptive, self-correcting nature.
o Initially the weight is distributed equally across the dataset. Then, in each
cycle, more weight is given to the incorrectly predicted items, and the process
continues until an accurate prediction is obtained.
o It is adaptive under most conditions but loses its value when it encounters
high-dimensional data.

 Gradient boosting
o Gradient boosting aims for accurate results from the start, rather than
correcting wrong predictions afterwards like AdaBoost.
o Instead of giving more weight to the incorrect items, it creates new base
learners that fit the remaining errors of the previous ones.
o The results obtained are more accurate.

 Extreme gradient boosting

o XGBoost is a gradient boosting implementation that performs at high speed.
o It is mainly used for big datasets, since it can process them extensively.
o Due to the usage of multiple cores, learning and training can occur in parallel.

A hedged comparison of the first two variants is sketched below.

SUPPORT VECTOR MACHINE (SEMI SUPERVISED – S3VM)
Support vector machines are based on the maximum-margin method, which maximizes the distance
between the decision boundary and the data points. This method correlates with the
semi-supervised low-density assumption. SVMs are used in text categorization and in tasks
like digit recognition, and they also work very well on high-dimensional data. "Support
vector" means that a subset of the training data is used to represent the decision boundary.
The semi-supervised support vector machine adopts the supervised SVM, with the difference
that it needs to operate on both labelled and unlabelled data.
For S3VM the proportion of labelled data is small compared to the large amount of unlabelled
data. So, in general, we take the dataset and try to fit a hyperplane that maximizes the
margin between the positive and negative samples in feature space. The picture below shows
the positive and negative samples separated by the solid line (the hyperplane), with dotted
lines on either side. The distance between a dotted line and the solid line is called the
geometric margin.

So, conceptually, S3VM attempts to maximize the distance between the dotted lines and the
hyperplane, that is, to increase the geometric margin.

THE IDEA OF S3VM

 The first step is enumeration: all 2^u possible labellings of the unlabelled
instances X_u are enumerated, as if taking the power set.
 The next step is building a standard support vector machine (which uses only
labelled data) for each labelling, keeping the correct labels for the training
instances X_L.
 The SVM with the largest margin boundary is chosen. A brute-force sketch of this
idea is given below.
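A brute-force Python sketch of the enumeration idea; this is only feasible for a handful of
unlabelled points, since the search is over 2^u labellings. The linear kernel and the margin
formula 1/||w|| are standard choices assumed for this illustration.

import itertools
import numpy as np
from sklearn.svm import SVC

def brute_force_s3vm(X_lab, y_lab, X_unl):
    """Try every labelling of X_unl; keep the SVM with the largest geometric margin."""
    best_svm, best_margin = None, -np.inf
    X_all = np.vstack([X_lab, X_unl])
    for labels in itertools.product([-1, 1], repeat=len(X_unl)):
        y_all = np.concatenate([y_lab, labels])
        if len(np.unique(y_all)) < 2:
            continue                       # an SVM needs at least two classes
        svm = SVC(kernel="linear", C=1.0).fit(X_all, y_all)
        margin = 1.0 / np.linalg.norm(svm.coef_)   # geometric margin = 1 / ||w||
        if margin > best_margin:
            best_svm, best_margin = svm, margin
    return best_svm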

LOSS FUNCTIONS USED: SVM vs S3VM

 SVM
o SVM uses the hinge loss function:

C(x, f(x), y) = max(1 − y(ωᵀx + b), 0)

o The graph of the hinge loss looks like a hinge: it is zero for points classified
correctly with a margin of at least one, and increases linearly as the margin shrinks.

 S3VM

o In S3VM the hinge loss is modified; the transformed loss is called the hat loss:

C(x, f(x), ŷ) = max(1 − |ωᵀx + b|, 0)

o The equation is similar to the hinge loss, except that in place of y we have ŷ (the
predicted label) and a modulus appears. This is because, unlike SVM, S3VM uses
unlabelled data: predictions need to be made, and based on those predictions the
machine is re-trained in order to maximize the margin. That is the whole objective
of the hat loss.

o The graph in this case has a triangular shape: removing the modulus gives
±(ωᵀx + b), and these two cases form the positive and negative sides of the
triangle. A small numeric sketch of the two losses follows.
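The two losses are easy to compare numerically; a tiny numpy sketch:

import numpy as np

def hinge_loss(y, f):
    """SVM hinge loss for a labelled point: max(1 - y*f(x), 0)."""
    return np.maximum(1.0 - y * f, 0.0)

def hat_loss(f):
    """S3VM hat loss for an unlabelled point: max(1 - |f(x)|, 0).
    Zero once the point sits at distance >= 1 from the boundary on either side,
    peaking at f(x) = 0 -- the triangular shape described above."""
    return np.maximum(1.0 - np.abs(f), 0.0)

f = np.linspace(-2.0, 2.0, 9)      # decision values w^T x + b
print(hat_loss(f))                 # largest for points on the boundary (f = 0)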

One of the main assumptions made by S3VM is that the classes present (binary or
multi-class) are separated in such a way that the decision boundary can be placed in a
low-density region and does not pass through the most dense regions.

The left diagram clearly shows the S3VM boundary staying close to the true decision
boundary, while on the right-hand side, where the labelled data lie on opposite sides,
the S3VM boundary leaves a large gap in the unlabelled data, which is not desired.

Some problems need attention. The final predictions become imbalanced when the input
instances are not balanced properly, so that the decision falls towards the majority
class. Optimization is difficult, and it is likewise difficult to remove the
non-convexity problem; this also leads to long run times due to poor optimization.

There is an alternative method that addresses the optimization problem: SVMlight. It
operates using a mechanism called label switching. The process is that it sorts the
labelled and unlabelled data, and there are two loops present in the algorithm: the
outer loop gradually increases the learning pressure from the unlabelled data, while the
inner loop does the label switching. Label switching helps in reducing the hat loss.

GRAPH BASED SEMI SUPERVISED LEARNING


Graph-based semi-supervised learning is a transductive type of method: it provides results
for the datasets that have already been introduced as sample instances. The graph-based
method deals with three steps in total:
o creation
o weighting
o inference
The first two steps together constitute the graph-construction phase, where points are
joined according to some similarity and a weight matrix is formulated. After graph
construction, label prediction is the next step, and it balances two objectives: first,
penalizing predicted labels that do not match the true labels, and second, penalizing
disagreement between connected points. From this it is clear that the similarity between
points is of utmost importance for the construction of the graph. This similarity obviously
corresponds to the smoothness and manifold assumptions. The graph-based method is in this
way related to the supervised nearest-neighbour method, and hence it is also called a
semi-supervised nearest-neighbour method.

The smoothness and closeness of the points are expressed as a loss function plus a
regularizer:
ℓ — loss function over the supervised (labelled) data
ℓᵤ — loss function over pairs of data points (the unsupervised part)
In general, graph methods choose the labelling ŷ that minimizes

λ · Σᵢ₌₁ˡ ℓ(ŷᵢ, yᵢ) + Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ wᵢⱼ · ℓᵤ(ŷᵢ, ŷⱼ)

where λ is the relative weight of the supervised term.

Different graph-based model approaches are formulated based on differences in the
graph-construction and inference processes. Historically, work dealt with improving the
inference phase, with much less attention given to the construction phase, but it was later
shown that classifier performance deteriorates because of this imbalanced preference. The
inference phase deals with the predictions ŷ made over Xᵤ (the unlabelled data points); the
optimization problem then penalizes predicted labels on the labelled points that do not
match the true labels.

MIN CUT GRAPH


Blum and Chawla proposed this graph-based method. The graph is built by joining each point
to its k nearest neighbours, or to its ε-neighbourhood (points at distance < ε). After the
construction of the graph, the optimization problem is solved with the min-cut methodology.
Considering the binary case, the positive points are denoted as sources and the negative
ones as sinks. The principle of the min-cut methodology is to determine which edges'
removal blocks all paths from positive to negative, that is, from source towards sink.
This corresponds to λ approaching infinity and ℓᵤ(ŷᵢ, ŷⱼ) = 1{ŷᵢ ≠ ŷⱼ}, where 1 is an
indicator function. Since it is the binary case, assume that 1{ŷᵢ ≠ ŷⱼ} = (ŷᵢ − ŷⱼ)².
Then the objective function is as follows:

λ · Σᵢ₌₁ˡ (ŷᵢ − yᵢ)² + Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ wᵢⱼ · (ŷᵢ − ŷⱼ)²

The biggest problem with the min-cut method is that it gives no confidence estimates for
its classifications: it provides only hard classification. A toy sketch of the cut itself
is given below.
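A toy min-cut sketch using networkx; the graph, weights and node names are made up for
illustration. Edge weights act as flow capacities, and the infinite-capacity edges to the
source and sink play the role of λ → ∞ pinning the labelled points.

import networkx as nx

G = nx.Graph()
# Source "s" pinned to the positive labelled point, sink "t" to the negative one.
G.add_edge("s", "p1", capacity=float("inf"))
G.add_edge("t", "n1", capacity=float("inf"))
# Similarity edges w_ij between the data points, including unlabelled u1, u2.
G.add_edge("p1", "u1", capacity=0.9)
G.add_edge("u1", "u2", capacity=0.8)
G.add_edge("u2", "n1", capacity=0.3)

cut_value, (pos_side, neg_side) = nx.minimum_cut(G, "s", "t")
print(pos_side - {"s"})   # u1 and u2 are labelled positive: cutting the weak
                          # 0.3 edge is the cheapest way to separate source and sink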

BOLTZMANN MACHINES – Markov Random Field


Due to the disadvantage of the min-cut method regarding hard classification, a better and
more efficient method is needed. This gave rise to the Markov random field view of
graph-based semi-supervised methods.
The Hammersley–Clifford theorem states:
“A probability distribution P for random variables X₁…Xₙ corresponds to a Markov random
field with respect to a graph G if the joint probability function P can be factorized over
the cliques of G.”

P(X = x) = (1/Z) · Π₍c ∈ C_G₎ ψ_c(x_c)

where
Z — normalization constant
C_G — set of cliques of G
x_c — the random variables in clique c
ψ_c — the potential function of clique c

Hence, with the help of the Hammersley–Clifford theorem, the general formula of graph-based
methods can also be derived through a Markov random field.

GAUSSIAN RANDOM FIELD


The lack of a closed-form solution is the main disadvantage of the Markov random field
approach, which motivates adopting a method that clears this problem. A multivariate
Gaussian distribution can be obtained by using a quadratic loss function along with
real-valued predictions. Hence the following observations can be made:

 A closed-form solution is attained.
 The Gaussian probability distribution reduces the error of label prediction.
 Hence these random fields are termed Gaussian random fields.

The graph Laplacian is given as L = D − W, where D is the degree matrix. The predictions
form a harmonic function over the unlabelled datasets, with the true-label condition
satisfied on the labelled points. The label predicted for an unlabelled data point is then
equal to the (weighted) average of the predictions made for its neighbours:

ŷᵢ = (1/dᵢ) · Σ₍j ∈ N_vᵢ₎ wᵢⱼ · ŷⱼ

where N_vᵢ represents the neighbouring nodes of vᵢ and dᵢ its degree. Since the solution
obtained is unique, it is easy to obtain label predictions, with ŷᵢ ∈ [0, 1] for every i.
Even before the introduction of the Gaussian random field there was an algorithm called
label propagation, which works by forcing the estimated labels to traverse towards the
neighbouring nodes on the basis of edge weight. In matrix notation each step can be given
(reconstructing the standard form) as

Ŷ ← D⁻¹ W Ŷ, followed by resetting the labelled rows to their true labels.

This algorithm generally contains two main steps:

 transfer of each label towards the neighbouring nodes;
 matching the labelled data points back to their true labels.

The label propagation method also delivers the harmonic-function solution. Label
propagation is a random-walk approach with a transition matrix, where the walk stops when
a labelled node is reached; it resembles the Markov random-walk approach. A sketch of the
closed-form harmonic solution, to which this iteration converges, is given below.
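A compact numpy sketch of the closed-form solution; the block notation follows the
L = D − W definition above, and the weight matrix W is assumed to come from whatever graph
construction was used.

import numpy as np

def harmonic_solution(W, y_lab, lab_idx):
    """Solve L_uu f_u = -L_ul y_l for the unlabelled predictions f_u.
    Iterative label propagation converges to this same harmonic function."""
    n = W.shape[0]
    unl_idx = np.setdiff1d(np.arange(n), lab_idx)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    L_uu = L[np.ix_(unl_idx, unl_idx)]
    L_ul = L[np.ix_(unl_idx, lab_idx)]
    f_u = np.linalg.solve(L_uu, -L_ul @ y_lab)
    return unl_idx, f_u

For binary labels y_lab in {0, 1}, each entry of f_u lands in [0, 1], matching the ŷᵢ ∈
[0, 1] property above, and can be thresholded at 0.5.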

LOCAL-GLOBAL CONSISTENCY
The Gaussian random field approach has two major disadvantages.

 One is the hard attachment of true labels to the labelled points, which gives rise to
bad handling of label noise.
 The second is the irregularity caused in graphs by the influence of high-degree nodes.
To resolve this, a new method was devised by Zhou, called local and global consistency.
It is based on the observation, made across graph-based methods, that labels should be
regular both globally (over the manifolds) and locally (over the adjacent input space).

The first of the two problems is addressed by using a squared error between the true label
and the predicted label instead of a hard constraint. The second problem is resolved by
standardizing the penalty over the datasets on the basis of node degrees. One more
important thing is that the standardization also drags the predictions for unlabelled
data points closer to zero.

The method can be expressed in matrix notation like that of the min-cut method; the main
difference is that the normalized graph Laplacian is used. This method has a closed-form
solution and optimizes well. Apart from graph inference, efficient methods have also been
introduced for graph construction. A minimal sketch of the iteration follows.
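A minimal sketch of Zhou's iteration F ← αSF + (1 − α)Y with the symmetrically normalized
matrix S = D^(−1/2) W D^(−1/2); α = 0.99 is the value commonly quoted for this method, and
the small epsilon is only an assumption to guard against isolated nodes.

import numpy as np

def local_global_consistency(W, Y, alpha=0.99, n_iter=100):
    """Y holds one-hot rows for labelled points and all-zero rows for unlabelled ones."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))   # guard isolated nodes
    S = D_inv_sqrt @ W @ D_inv_sqrt                             # normalized weight matrix
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y    # spread labels, softly clamp labelled
    return F.argmax(axis=1)                      # predicted class per point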

CONCLUSION
This research paper is based on semi-supervised learning algorithms. The research started
with a small introduction to machine learning and then proceeded to what semi-supervised
learning is. The inductive and transductive types of semi-supervised methods were
discussed. Proceeding deeper, the different algorithm types, such as the well-known wrapper
method, co-training, self-training, S3VM and, finally, graph-based methods and graph
inference, were discussed in detail. The importance of the assumptions underlying the
success of semi-supervised machine learning, which form the basis of most semi-supervised
algorithms, was also discussed in detail. New methods for feeding intelligence to machines
are being devised continuously, and the basic principle of each new method is to overcome
the problems faced by the previous one, as we have seen in most of the cases discussed
above. Therefore, the simple summary of this report is that semi-supervised learning is a
sub-type of machine learning that efficiently utilizes both labelled and unlabelled data
for its predictions.

REFERENCES
 https://minds.wisconsin.edu/handle/1793/60444
 https://www.youtube.com/c/RanjiRaj18
 https://www.geeksforgeeks.org/ml-semi-supervised-learning/
 https://en.wikipedia.org/wiki/Semi-supervised_learning
 https://link.springer.com/article/10.1007/s10994-019-05855-6
 https://www.youtube.com/watch?v=65RV3O4UR3w

