Machine Learning Notes
Supervised learning: The learning algorithm is given labeled data and the
desired output. For example, pictures of dogs labeled “dog” will help the
algorithm identify the rules to classify pictures of dogs.
Unsupervised learning: The data given to the learning algorithm is
unlabeled, and the algorithm is asked to identify patterns in the input data. For
example, the recommendation system of an e-commerce website where the
learning algorithm discovers similar items often bought together.
Reinforcement learning: The algorithm interacts with a dynamic
environment that provides feedback in terms of rewards and punishments. For
example, self-driving cars being rewarded for staying on the road.
Supervised Learning
Supervised learning does the work of function approximation: we train an algorithm and, at the end of
the process, pick the function that best describes the input data, the one that for a given X makes the
best estimate of y (X -> y). Most of the time we are not able to find the true function that always makes
correct predictions; another reason is that the algorithm relies on assumptions made by humans about how
the computer should learn, and these assumptions introduce a bias. Bias is a topic I'll explain
in another post.
Here the input dataset acts as a teacher: we feed the computer training data containing the input
predictors and show it the correct answers (the output, or label, of the input predictors). From the
training dataset, the model learns the mapping function between the input predictors and the output variable.
Supervised learning algorithms try to model relationships and dependencies between the target prediction
output and the input features, so that we can predict the output values for new data based on the
relationships learned from previous datasets.
Supervised learning models are predictive models that predict either the value of a continuous variable
(like temperature or stock price), which we call regression, or the class of an input (like whether an
image shows a dog or a cat), which we call classification.
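As a rough sketch of both flavours, assuming scikit-learn is installed (the tiny datasets below are made up purely for illustration):

```python
# Minimal supervised-learning sketch with scikit-learn (assumed installed).
# The toy datasets below are made up purely for illustration.
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (e.g. a price) from one input feature.
X_reg = [[1], [2], [3], [4]]          # input predictor
y_reg = [2.1, 3.9, 6.2, 8.1]          # continuous target
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[5]]))             # estimated y for a new X

# Classification: predict a discrete class (e.g. dog = 1, cat = 0).
X_clf = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.3], [0.8, 0.9]]
y_clf = [0, 1, 0, 1]
clf = LogisticRegression().fit(X_clf, y_clf)
print(clf.predict([[0.85, 0.7]]))     # predicted class for a new sample
```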
In the case of semi-supervised learning algorithms, some of the training examples are missing
training labels, but they can nevertheless be used to improve the quality of a model. In weakly
supervised learning, the training labels are noisy, limited, or imprecise; however, these labels are
often cheaper to obtain, resulting in larger effective training sets.
Unsupervised learning
Unsupervised learning algorithms take a set of data that contains only inputs, and find structure
in the data, like grouping or clustering of data points. The algorithms, therefore, learn from
data that has not been labeled, classified, or categorized. Instead of responding to feedback,
unsupervised learning algorithms identify commonalities in the data and react based on the
presence or absence of such commonalities in each new piece of data. A central application of
unsupervised learning is in the field of density estimation in statistics, though unsupervised
learning encompasses other domains involving summarizing and explaining data features.
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to one or more predesignated criteria,
while observations drawn from different clusters are dissimilar. Different clustering techniques
make different assumptions on the structure of the data, often defined by some similarity
metric and evaluated, for example, by internal compactness, or the similarity between members
of the same cluster, and separation, the difference between clusters. Other methods are based
on estimated density and graph connectivity.
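As a minimal clustering sketch, assuming scikit-learn is installed (the 2-D points below are made up):

```python
# Minimal clustering sketch with scikit-learn's KMeans (assumed installed).
# The 2-D points are made up; a real use case would cluster customer or item features.
from sklearn.cluster import KMeans

points = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9], [0.8, 0.9], [5.2, 5.0]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

print(kmeans.labels_)           # cluster assigned to each unlabeled point
print(kmeans.cluster_centers_)  # learned cluster centers
```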
Semi-supervised learning
Semi-supervised learning falls between unsupervised learning (without any labeled training data)
and supervised learning (with completely labeled training data). Many machine-learning
researchers have found that unlabeled data, when used in conjunction with a small amount of
labeled data, can produce a considerable improvement in learning accuracy.
Reinforcement learning
Reinforcement learning is an area of machine learning concerned with how software
agents ought to take actions in an environment so as to maximize some notion of cumulative
reward. Due to its generality, the field is studied in many other disciplines, such as game
theory, control theory, operations research, information theory, simulation-based
optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In
machine learning, the environment is typically represented as a Markov Decision
Process (MDP). Many reinforcement learning algorithms use dynamic
programming techniques. Reinforcement learning algorithms do not assume knowledge of an
exact mathematical model of the MDP, and are used when exact models are infeasible.
Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game
against a human opponent.
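As a rough illustration of the reward-driven update, here is a toy Q-learning sketch; the one-dimensional "road" environment, rewards and hyperparameters are all made up for illustration:

```python
# Toy Q-learning sketch on a 1-D "road": states 0..4, reaching state 4 gives a reward.
# The environment, rewards and hyperparameters are made up for illustration only.
import random

n_states, actions = 5, [0, 1]                 # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]     # Q-value table, one row per state
alpha, gamma, epsilon = 0.5, 0.9, 0.1         # learning rate, discount, exploration rate

for episode in range(200):
    s = 0
    while s != n_states - 1:                  # episode ends at the goal state
        # epsilon-greedy action selection
        a = random.choice(actions) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        r = 1.0 if s_next == n_states - 1 else 0.0    # reward only at the goal
        # Q-learning update: move Q(s, a) toward reward plus discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print(Q)  # after training, action 1 (right) should have the higher value in each state
```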
Challenges in Applying Machine Learning:
Lack of Data: Many machine learning algorithms require large amounts of data
before they begin to give useful results. A good example of this is a neural
network. Neural networks are data-eating machines that require copious amounts
of training data. The larger the architecture, the more data is needed to produce
viable results. Reusing data is a bad idea, and data augmentation is useful to some
extent, but having more data is always the preferred solution. If you can get the
data, then use it.
Lack of Good Data: Despite appearances, this is not the same as the point above.
Let's imagine you think you can cheat by generating ten thousand fake
data points to feed into your neural network. What happens when you do?
1. It will train itself, and then when you come to test it on an unseen data set, it
will not perform well. You had the data but the quality of the data was not up
to scratch.
2. In the same way that having a lack of good features can cause your algorithm
to perform poorly, having a lack of good ground truth data can also limit the
capabilities of your model. No company is going to implement a machine
learning model that performs worse than human-level error.
3. Similarly, applying a model that was trained on a set of data in one situation
may not necessarily apply as well to a second situation. The best example of
this I have found so far is in breast cancer prediction.
4. Mammography databases have a lot of images in them, but they suffer from
one problem that has caused significant issues in recent years — almost all of
the x-rays are from white women. This may not sound like a big deal, but
actually, black women have been shown to be 42 percent more likely to die
from breast cancer due to a wide range of factors that may include
differences in detection and access to health care. Thus, training an algorithm
primarily on white women adversely impacts black women in this case.
5. What is needed in this specific case is a larger number of x-rays of black
patients in the training database, more features relevant to the cause of this 42
percent increased likelihood, and for the algorithm to be more equitable by
stratifying the dataset along the relevant axes.
Data Augmentation
Data augmentation is a method by which you can virtually increase the number of samples in
your dataset using data you already have. For image augmentation, it can be achieved
by performing geometric transformations, changes to color, brightness, contrast or by adding
some noise. Currently there are ongoing studies on interesting new methods in data
augmentation using Generative Adversarial Networks or by pairing samples.
Position augmentation
Scaling
Cropping
Flipping
Padding
Rotation
Translation
Affine transformation
Color augmentation
Brightness
Contrast
Saturation
Hue
Scaling
In scaling or resizing, the image is resized to the given size e.g. the width of the image can be
doubled.
Cropping
In cropping, a portion of the image is selected, e.g. a center crop of the image may be returned.
Flipping
In flipping, the image is flipped horizontally or vertically.
Padding
In padding, the image is padded with a given value on all sides.
Rotation
In rotation, the image is rotated by a random angle.
Translation
In translation, the image is moved either along the x-axis or y-axis.
Color augmentation
Color augmentation or color jittering deals with altering the color properties of an image by changing
its pixel values.
Brightness
One way to augment is to change the brightness of the image. The resultant image becomes darker
or lighter compared to the original one.
Contrast
The contrast is defined as the degree of separation between the darkest and brightest areas of an
image. The contrast of the image can also be changed.
Saturation
Saturation describes the intensity, or purity, of the colors in an image.
Hue
Hue can be described as the shade of the colors in an image.
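As one possible sketch of these augmentations, assuming torchvision and Pillow are installed ("photo.jpg" is a placeholder path):

```python
# Sketch of the augmentations above using torchvision.transforms (assumed installed).
# "photo.jpg" is a placeholder path; any RGB image would do.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize((256, 256)),                             # scaling / resizing
    transforms.RandomCrop(224),                                # cropping
    transforms.RandomHorizontalFlip(p=0.5),                    # flipping
    transforms.Pad(8),                                         # padding
    transforms.RandomRotation(degrees=15),                     # rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),          # color augmentation
])

image = Image.open("photo.jpg")
augmented = augment(image)   # each call produces a new randomly augmented copy
augmented.show()
```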
Topic: Eigenvectors and Eigenvalues
Eigenvectors and eigenvalues have many important applications in computer vision and machine
learning in general. Well known examples are PCA (Principal Component Analysis) for
dimensionality reduction or EigenFaces for face recognition. An interesting use of eigenvectors and
eigenvalues is also illustrated in my post about error ellipses. Furthermore, eigendecomposition
forms the base of the geometric interpretation of covariance matrices, discussed in a more recent
post. In this article, I will provide a gentle introduction to this mathematical concept, and will show
how to manually obtain the eigendecomposition of a 2D square matrix.
Eigenvectors (red) do not change direction when a linear transformation (e.g. scaling) is applied to
them. Other vectors (yellow) do.
The transformation in this case is a simple scaling with factor 2 in the horizontal direction and factor
0.5 in the vertical direction, such that the transformation matrix A is defined as:
A = [2 0; 0 0.5]
In general, an eigenvector v of a matrix A is a vector for which the following holds:
A v = λ v    (1)
where λ is a scalar value called the 'eigenvalue'. This means that the linear transformation of A on the
vector v is completely defined by the eigenvalue λ.
Rewriting equation (1) gives:
(A - λ I) v = 0    (2)
where I is the identity matrix. However, assuming that v is not the null-vector, equation (2) can only
hold if (A - λ I) is not invertible. If a square matrix is not invertible, that means that its determinant
must equal zero. Therefore, to find the eigenvectors of A, we simply have to solve the following equation:
det(A - λ I) = 0    (3)
In the following sections we will determine the eigenvectors and eigenvalues of a 2x2 matrix A by solving
equation (3). For a 2x2 matrix, det(A - λ I) = 0 expands to a quadratic equation in λ (the characteristic
polynomial). Since the discriminant of this quadratic is strictly positive in our example, two different
values for λ exist: λ1 and λ2.
We have now determined the two eigenvalues λ1 and λ2. Note that a square matrix of size n x n always has
exactly n eigenvalues, each with a corresponding eigenvector. The eigenvalue specifies by how much the
corresponding eigenvector is scaled under the transformation.
To find the eigenvectors, we substitute each eigenvalue back into equation (2). We first do this for
eigenvalue λ1, in order to find the corresponding first eigenvector by solving (A - λ1 I) v = 0.
Since this is simply the matrix notation for a system of equations, we can write it in its equivalent
form as a pair of linear equations in the two components of v.
Since an eigenvector simply represents an orientation (the corresponding eigenvalue represents the
magnitude), all scalar multiples of the eigenvector are vectors that are parallel to this eigenvector,
and are therefore equivalent (if we normalized the vectors, they would all be equal). Thus, instead of
further solving the above system of equations, we can freely choose a real value for one of the
components and determine the other one from either of the two equations.
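In practice, the eigendecomposition is usually obtained numerically. A minimal sketch with NumPy (assuming it is installed), using the scaling matrix from the example above:

```python
# Numerical eigendecomposition with NumPy, using the scaling matrix from the example above
# (factor 2 horizontally, factor 0.5 vertically).
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.5]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are the eigenvectors
print(eigenvalues)      # [2.  0.5]
print(eigenvectors)     # the horizontal and vertical unit vectors

# Check A v = lambda v for the first eigenvalue/eigenvector pair:
v, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(A @ v, lam * v))             # True
```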
The training dataset will be divided into two sections: one is the set of independent variables (the set
of input features) and the other is the dependent variable, which is to be predicted. For example, in a
dataset for mobile price prediction, the input feature set would be CPU speed, RAM, camera pixels and
battery, and the output feature would be the price, which depends on the input feature set.
Error function
The model is fit by minimizing an error function, for example the mean squared error between predictions
and actual values. Gradient descent then repeatedly updates the slope m and intercept b using the
gradients Gm and Gb of the error with respect to each parameter:
m = m - (learning rate * Gm)
b = b - (learning rate * Gb)
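As a rough illustration of this update rule, here is a minimal sketch for a simple linear model y = m*x + b, assuming a mean squared error loss; the toy data points and learning rate are made up:

```python
# Minimal gradient-descent sketch for a linear model y = m*x + b with a mean squared error loss.
# The toy data (generated from y = 2x + 1) and the learning rate are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b, learning_rate = 0.0, 0.0, 0.01

for step in range(2000):
    n = len(xs)
    # Gradients of the mean squared error with respect to m and b
    Gm = (-2.0 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
    Gb = (-2.0 / n) * sum((y - (m * x + b)) for x, y in zip(xs, ys))
    m = m - (learning_rate * Gm)   # update the slope
    b = b - (learning_rate * Gb)   # update the intercept

print(m, b)  # should approach 2 and 1
```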
Statistics
The entire subject of statistics is based around the idea that you have this big set of data, and you
want to analyse that set in terms of the relationships between the individual points in that data
set. I am going to look at a few of the measures you can do on a set of data, and what they tell
you about the data itself.
Variance = Σ (xi - x̄)² / n
where:
xi = the value of the ith point in the data set
x̄ = the mean value of the data set
n = the number of data points in the data set
The variance helps determine the data's spread size when compared to the mean value. As the
variance gets bigger, more variation in data values occurs, and there may be a larger gap between
one data value and another. If the data values are all close together, the variance will be smaller.
This is more difficult to grasp than are standard deviations, however, because variances represent
a squared result that may not be meaningfully expressed on the same graph as the original
dataset.
Standard deviations are usually easier to picture and apply. The standard deviation is expressed
in the same unit of measurement as the data, which isn't necessarily the case with the variance.
Using the standard deviation, statisticians may determine if the data has a normal curve or other
mathematical relationship. If the data behaves in a normal curve, then 68% of the data points will
fall within one standard deviation of the average, or mean data point. Bigger variances cause
more data points to fall outside the standard deviation. Smaller variances result in more data that
is close to average.
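A quick sketch of these quantities with NumPy (assumed installed); the data values are made up:

```python
# Mean, variance and standard deviation with NumPy; the data values are made up.
import numpy as np

data = np.array([4.0, 7.0, 8.0, 6.0, 5.0, 9.0, 3.0, 6.0])
mean = data.mean()
variance = data.var()    # mean of squared deviations from the mean
std_dev = data.std()     # square root of the variance, in the same units as the data
print(mean, variance, std_dev)

# Share of points within one standard deviation of the mean (about 68% for normal data)
within_one_sd = np.mean(np.abs(data - mean) <= std_dev)
print(within_one_sd)
```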
A Big Drawback
The biggest drawback of using standard deviation is that it can be impacted by outliers and
extreme values. Standard deviation assumes a normal distribution and calculates all uncertainty
as risk, even when it’s in the investor's favor—such as above average returns.
1.3 Covariance:
In mathematics and statistics, covariance is a measure of the relationship between two random
variables. The metric evaluates how much, and to what extent, the variables change together. In
other words, it is essentially a measure of the joint variability of two variables (note that the
covariance of a variable with itself equals its variance). However, the metric does not
assess the dependency between variables.
Unlike the correlation coefficient, covariance is measured in units. The units are computed by
multiplying the units of the two variables. The covariance can take any positive or negative value.
The values are interpreted as follows:
Positive covariance: Indicates that two variables tend to move in the same direction.
Negative covariance: Reveals that two variables tend to move in inverse directions.
Cov(X, Y) = Σ (xi - x̄)(yi - ȳ) / (n - 1)
Where:
xi, yi = the values of the ith data point for X and Y
x̄, ȳ = the mean values of X and Y
n = the number of data points
Covariance and correlation both primarily assess the relationship between variables. The closest
analogy to the relationship between them is the relationship between the variance and standard
deviation.
Covariance measures the total variation of two random variables from their expected values.
Using covariance, we can only gauge the direction of the relationship (whether the variables tend
to move in tandem or show an inverse relationship). However, it does not indicate the strength of
the relationship, nor the dependency between the variables.
On the other hand, correlation measures the strength of the relationship between variables.
Correlation is the scaled measure of covariance. It is dimensionless. In other words, the
correlation coefficient is always a pure value and not measured in any units.
The relationship between the two concepts can be expressed using the formula below:
Correlation(X, Y) = Cov(X, Y) / (σX * σY)
where σX and σY are the standard deviations of X and Y.
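As a small sketch of the relationship (assuming NumPy is installed; the two series are made up):

```python
# Covariance vs. correlation with NumPy; the two series are made up for illustration.
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6, 5.2])
y = np.array([8.0, 10.0, 12.5, 11.0, 14.0])

cov_xy = np.cov(x, y)[0, 1]          # sample covariance (in units of x times units of y)
corr_xy = np.corrcoef(x, y)[0, 1]    # dimensionless, between -1 and 1
print(cov_xy, corr_xy)

# Correlation is the covariance scaled by both standard deviations
print(cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1)))   # matches corr_xy
```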
Z Score:
In statistics, the standard score is the signed fractional number of standard deviations by
which the value of an observation or data point is above the mean value of what is being
observed or measured. Observed values above the mean have positive standard scores,
while values below the mean have negative standard scores.
It is calculated by subtracting the population mean from an individual raw score and then
dividing the difference by the population standard deviation. It is a dimensionless quantity.
This conversion process is called standardizing or normalizing (however, "normalizing" can
refer to many types of ratios; see normalization for more).
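As a small sketch of this calculation (assuming NumPy is installed; the population values are made up):

```python
# Standardizing values into z-scores: z = (x - mean) / standard deviation.
# The population values below are made up for illustration.
import numpy as np

population = np.array([10.0, 12.0, 9.0, 11.0, 14.0, 8.0, 13.0, 11.0])
mu = population.mean()
sigma = population.std()            # population standard deviation

x = 13.0
z = (x - mu) / sigma
print(z)                            # positive: the observation lies above the mean

z_all = (population - mu) / sigma   # standardize the whole data set
print(z_all.mean(), z_all.std())    # approximately 0 and 1 after standardizing
```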
Standard scores are also called z-values, z-scores, normal scores, and standardized
variables. They are most frequently used to compare an observation to a
theoretical deviate, such as a standard normal deviate.
Computing a z-score requires knowing the mean and standard deviation of the complete
population to which a data point belongs; if one only has a sample of observations from the
population, then the analogous computation with sample mean and sample standard
deviation yields the t-statistic.
F1 Score:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 * (Precision * Recall) / (Precision + Recall)
The F1 score is the harmonic mean of precision and recall: it is high only when both precision and
recall are high.
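As a small sketch using scikit-learn's metrics (assumed installed); the true and predicted labels are made up:

```python
# Precision, recall and F1 score with scikit-learn; the labels are made up for illustration.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                 # harmonic mean of precision and recall
print(precision, recall, f1)
```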