Moving Points Algorithm
January 2024
Abstract—This paper attempts to give the reader an understanding of existing simple classifiers, introduced as long ago as 1943, along with their pros and cons. The aim is to perform an analysis of the simplest of classifiers and attempt to build a classifier that can classify linearly separable data while balancing accuracy and computational efficiency using geometrical ideas. We will not go into the depths of the mechanics of some classifiers and their various differing versions, since our focus is not on their mechanics; rather, we aim to understand the broader set of underlying concepts used in these classifiers. Instead, we will look at a train of successive classifiers closely related to each other, each being an improvement on its predecessor, and also briefly touch upon some present geometrical classifiers. Our ultimate aim is to deviate from the decades-old track of statistical inference and try to create a model based on geometry, starting, of course, with small and easy problems.

I. INTRODUCTION

The first mathematical model of a neuron was described by Warren McCulloch and Walter Pitts in 1943. This was followed by Frank Rosenblatt's implementation of a Perceptron. It was further improved upon by Prof. Bernard Widrow and his student Ted Hoff with the introduction of the Adaptive Linear Neuron. This was also the time when the geometrical model K-Nearest Neighbors was introduced. Each of these implementations or models, though better than its predecessor, still has some inherent flaws. The aim of this research is to understand and avoid them while building a new classifier. It is important to note that stating these models have inherent flaws by no means diminishes their importance as foundations of the study of neural activity, machine learning and artificial intelligence. These models are far from obsolete and are still implemented in various advanced machine learning libraries like Scikit-Learn [7]. Our aim is to improve the computational performance of these models while using a field of mathematics that is not generally associated with machine learning. This is an experiment to diverge from the branch of statistical inference and explore the relationship between other fields of mathematics and machine learning.

A. Problem Statement

The original implementations of machine learning models had been computationally non-optimum. Though various modern classes of objects such as NumPy boosted the processing speed of these classifiers, the base algorithms have some aspects which introduce redundancy or cause updates of less-than-optimum magnitude to take place. Some classifiers which generally perform well in terms of speed and accuracy are also included in our study. Our aim is not to beat any classifier or family of classifiers, but rather to perform a comparison to establish the potential of extensive usage of geometrical ideas in machine learning and classification problems.

II. EXISTING CLASSIFIERS

Machine Learning models can be broadly classified into two types, namely Classifiers and Regressors. Linear classifiers were the first step in the development of Artificial Intelligence. They gave a machine the ability to behave like a single neuron. Since 1943, from the inception of the first mathematical model of a neuron, we have come a long way, with powerful tools like Decision Trees, Logistic Regression and Support Vector Machines. We will go through three foundational classifiers, namely the McCulloch-Pitts neuron, the Perceptron and ADALINE. After that we will take a look at an existing geometrical classifier as well.

A. McCulloch-Pitts Mathematical Model of Neuron, 1943

Warren McCulloch, a neuroscientist, and Walter Pitts, a logician, together created the first mathematical model of a neuron in 1943 [6]. The model consists of two parts (or functions), which will be denoted g(x) and f(x). The function g(x) is an aggregator which takes a series of successive stimuli (or inputs) and calculates the net input. The series of stimuli can be represented as a matrix X with elements from the set S = {0, 1}. The net input is a singular, real and measurable value which acts as the input y to the function f(y). The function f(y) is the decider, or threshold function, which governs the decision of whether the neuron will fire or not.

In the context of this paper, the expression will fire refers to the act of a neuron sending an electrical or electromechanical impulse upon receiving a set of stimuli in a given time frame whose aggregate value is greater than or equal to the threshold value θ.
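This aggregate-and-fire behaviour can be sketched in a few lines of Python; the function names and the threshold value below are illustrative assumptions, not taken from the original paper.

```python
# Illustrative sketch of an M-P neuron: binary stimuli are summed by an
# aggregator g, and a threshold function f decides whether the neuron fires.
def g(x):
    """Aggregator: the net input is the sum of the binary stimuli."""
    return sum(x)

def f(y, theta):
    """Threshold function: fire (1) iff the net input reaches theta."""
    return 1 if y >= theta else 0

def mp_neuron(x, theta):
    """Compose aggregator and threshold: the full M-P decision."""
    return f(g(x), theta)
```

For example, `mp_neuron([1, 0, 1], theta=2)` fires because the net input 2 meets the threshold, while `mp_neuron([1, 0, 0], theta=2)` does not; note that θ must still be chosen by hand for each use case.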
The statement that a neuron will fire upon receiving an aggregate stimulus exactly equal to the threshold value is subjective and can change depending on the model and the differing use cases, for which such a model may adapt not by itself but due to the intervention of an outside human agent that is not considered to be a part of the system itself. This threshold function is what gives the implementations of the neuron, up to the Perceptron, their characteristic property of being all-or-none objects.

Fig. 1. A simple representation of the mathematical process in an M-P neuron

The reader should note that this does not summarize the original paper of McCulloch and Pitts, titled 'A Logical Calculus of Ideas Immanent in Nervous Activity'. It is in fact a summary of the inferences derived from an analysis of the original paper, presented for application in machine learning. This inference was arrived at through the intensive discussion and general acceptance of the all-or-none nature of biological neurons, which gave way to the study of neurological systems as a sum of different propositional logic. Interestingly, the original paper did not endeavour to interpret the neurological or physiological mechanism of the neuron for the purpose of direct implementation in machine learning practices; rather, it aimed at deriving a calculus that could consistently describe the action of a neuron with respect to synapses and dendrites, via the argument that the nature of any compound statement can be derived from its individual claims.

The M-P neuron does not have the ability to learn from inputs given at a previous time. This is a direct result of the fact that the authors did not describe any learning method for their model of the neuron. Such a neuron cannot undergo changes in response to stimuli received at a previous time; in the case of machine learning, such a neuron cannot retain information that corresponds to an input received at a previous time. This is where an M-P neuron differs from its natural counterparts: neurons present in sentient beings can undergo changes to retain information associated with the stimuli they receive. The mechanism of that process is beyond the scope of this research.

The application of this model is highly limited and can be described as follows: suppose we have a set of stimuli c_1, c_2, c_3, ..., c_j, where each has a value x_1, x_2, x_3, ..., x_j. Each x_i can take values from the set S = {0, 1}. The aggregator function g(x) is defined as:

g(x) = Σ_{i=1}^{j} x_i        (1)

The threshold function is defined as:

f(y) = { 0,  y < θ
       { 1,  y ≥ θ        (2)

The M-P neuron was not a model created for implementation in electronic or electromechanical devices. It served the purpose of providing insight into how naturally occurring neural networks work. Even if the model were implemented using modern software, it would not be regarded as an intelligent model, since the object itself lacks a learning rule or mechanism. The threshold value θ needs to be calculated manually for different use cases.

B. Perceptron by Frank Rosenblatt, 1957

Frank Rosenblatt was a project engineer at the Cornell Aeronautical Laboratory who worked alongside the US Navy to develop the first electromechanical implementation of a neuron, which he called a Perceptron [8], an object that could perceive. Rosenblatt wrote in 1958, "Yet we are about to witness the birth of such a machine – a machine capable of perceiving, recognizing and identifying its surroundings without any human training or control." What Rosenblatt proposed was not limited to an algorithmic model, as is popular in modern machine learning practice; rather, he described the workings of a machine capable of learning.

A machine such as the one described by Rosenblatt would consist of three systems:
1) The S-System (Sensory System)
2) The A-System (Association System)
3) The R-System (Response System)

Fig. 2. The basic structure of a Perceptron from the original paper of Rosenblatt, captioned General Organization of the Perceptron

The sensory system acts as the input function, the association system is synonymous with the learning rule, and the response system is the threshold function. The basic idea is that the S-System can send excitatory (positive) or inhibitory (negative) impulses to the A-System, which then builds an association between responses and stimuli. The notation A_{s.r} denotes the subset of the A-units activated by stimulus s and response r. Suppose stimulus S1 is presented and the subsets A_{1.1} and A_{1.2} are activated, meaning there are two possible responses, 1 and 2. The net values of both sets are compared; the set with the greater net value is considered dominant and the one with the lesser value is suppressed. The elements of the subset of the correct correlation gather an increment to their value, while the elements of suppressed subsets remain unchanged. Since the value has been incremented, the next time stimulus S1 is presented the net input corresponding to the previously greater subset will be even greater, ensuring a higher probability of selection of the previously established correct response, due to the reinforcement performed when stimulus S1 was presented at a previous time. The machine has associated stimulus S1 with whichever response was correct.
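The reinforcement scheme just described can be illustrated with a toy sketch; the subset values and increment size here are assumptions for illustration only, not a description of Rosenblatt's actual machine.

```python
# Toy sketch of A-System reinforcement: a stimulus activates one A-unit
# subset per candidate response; the subset with the greater net value wins,
# and only the winning (correct-correlation) subset is reinforced.
a_units = {1: 1.0, 2: 0.8}   # illustrative net values of subsets A_1.1, A_1.2

def present_stimulus(a_units, increment=0.1):
    winner = max(a_units, key=a_units.get)  # dominant response
    a_units[winner] += increment            # reinforce the winning subset only
    return winner

first = present_stimulus(a_units)   # response 1 dominates (1.0 > 0.8)
second = present_stimulus(a_units)  # response 1 dominates again (1.1 > 0.8)
```

After each presentation the dominant subset's net value grows while the suppressed subset is unchanged, so the same response keeps winning: the association between stimulus and response has been reinforced.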
It is important to note here that modern implementations of the Perceptron differ in their approach to performing updates. Most software-based implementations use the threshold function similarly, but they focus on incorrect classifications. The idea is that if a stimulus (or data point) is incorrectly classified, then the update should attempt to move the model in the direction of correctness. Instead of reinforcing correct classifications, they penalize incorrect ones. The learning algorithm of the perceptron is defined as:

w ← w + η(o − y)x        (3)

Fig. 3. Perceptron algorithm, from Convergence Proof for the Perceptron Algorithm, Michael Collins

The only aspect that slows down the performance of a Perceptron is that it has to use the threshold function to determine the update value, which means that the updates always take place with fixed values. The degree of incorrectness is not propagated to the neuron. This makes convergence slightly slower.

The derivation of this rule can be done by using least mean squares to calculate the error and differentiating the expression to perform regular gradient descent. The activation function is present in the Perceptron as well; the only difference is that it is an identity function, meaning σ(x) = x.

The benefit of using a continuous value for checking the error is that we can then incorporate a measure of the degree of incorrectness when updating the weights, hence making the classifier adaptive, as in the name. Thus, theoretically, convergence would be faster. This, however, brings a disadvantage: since we are trying to fit multiple continuous values to a fixed value, i.e. the class label, we cannot be sure when the model has sufficiently fit the data. This follows from:

Theorem II.1. No two real and distinct numbers, when multiplied by a non-zero real number, shall yield results that are equal.

Proof. We proceed by contradiction. Assume that we have two real and distinct numbers a and b and a non-zero real number c, and that a and b, when multiplied by c, yield the same number. Then

a · c = b · c

or

a = b,

which contradicts the assumption that a and b are distinct. Hence the theorem is true.
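A minimal sketch of update rule (3), assuming a NumPy environment and an arbitrary linearly separable dataset (an AND gate), illustrates the fixed-magnitude updates discussed in this section; the learning rate, epoch count and dataset are illustrative choices.

```python
import numpy as np

# Sketch of the perceptron update w <- w + eta*(o - y)*x from rule (3):
# only misclassified points (o != y) produce a non-zero update, and the
# update magnitude is fixed because y comes from the threshold function.
def train_perceptron(X, o, eta=0.1, epochs=50):
    X = np.c_[X, np.ones(len(X))]        # absorb the bias into w
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, o):
            y = 1 if w @ x >= 0 else 0   # threshold activation
            w += eta * (target - y) * x  # rule (3): penalize mistakes only
    return w

# AND gate: linearly separable, so the rule converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
o = np.array([0, 0, 0, 1])
w = train_perceptron(X, o)
pred = [(1 if w @ np.r_[x, 1] >= 0 else 0) for x in X]  # [0, 0, 0, 1]
```

Because the data are linearly separable, the loop converges and `pred` matches the targets; on non-separable data (as in Fig. 7) the fixed-magnitude updates never settle.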
Fig. 6. Two moving points start moving close to each other as the model trains.

Fig. 7. Model trains over a non-linearly separable dataset for 150 epochs.