ML 3

Machine learning

Uploaded by Shagun Dhiman

MACHINE LEARNING

Machine learning is a subset of AI that enables machines to learn automatically from data, improve their performance with experience, and make predictions. It is currently used for tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.

Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining, as both deal with huge amounts of data.

Need for Machine Learning


The need for machine learning is increasing day by day because it can perform tasks that are too complex for a person to implement directly. As humans, we cannot manually process huge amounts of data, so we need computer systems, and machine learning makes this easy for us.

We can train machine learning algorithms by providing them with large amounts of data and letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be measured by a cost function. With the help of machine learning, we can save both time and money.

The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions on Facebook, and more. Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyse user interests and recommend products accordingly.
Following are some key points which show the importance of Machine Learning:

o Rapid increase in the production of data
o Solving complex problems that are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample labelled data to the machine learning system in order to train it, and on that basis, it predicts the output.

The system creates a model from the labelled data to understand the datasets and learn about each type of data. Once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns under the supervision of a teacher. A common example of supervised learning is spam filtering.

Supervised learning can be grouped further into two categories of algorithms:

o Classification
o Regression
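The supervised setup described above can be sketched in a few lines. This is a minimal, hand-rolled 1-nearest-neighbour classifier on invented labelled points; the data, labels, and function name are illustrative assumptions, not part of any particular library.

```python
# A minimal supervised-classification sketch: a 1-nearest-neighbour rule
# trained on hand-made labelled data (all points and labels are invented).

def nearest_neighbour_predict(train, query):
    """Return the label of the training point closest to `query`."""
    best_label, best_dist = None, float("inf")
    for (x, y), label in train:
        dist = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

# Labelled training data: points near (0, 0) are "cat", near (5, 5) are "dog".
train = [((0, 0), "cat"), ((1, 0), "cat"), ((0, 1), "cat"),
         ((5, 5), "dog"), ((4, 5), "dog"), ((5, 4), "dog")]

print(nearest_neighbour_predict(train, (0.5, 0.5)))  # near the "cat" cluster -> cat
print(nearest_neighbour_predict(train, (4.5, 4.8)))  # near the "dog" cluster -> dog
```

The key property of supervised learning is visible here: every training example carries its correct output, and prediction maps a new input to one of those known outputs.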

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The training is provided to the machine with the set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or a group of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result; the machine tries to find useful insights from huge amounts of data. It can be further classified into two categories of algorithms:

o Clustering
o Association
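Clustering, the first category above, can be sketched with a tiny hand-rolled k-means on unlabelled 1-D data. The values and starting centroids are invented for illustration; no labels are given to the algorithm.

```python
# A minimal k-means clustering sketch on unlabelled 1-D data (invented values).
# The algorithm alternates between assigning points to the nearest centroid
# and moving each centroid to the mean of its assigned points.

def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups: values near 1 and values near 10.
points = [0.9, 1.0, 1.1, 9.8, 10.0, 10.2]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)   # one centroid settles near 1.0, the other near 10.0
```

Note that the algorithm never sees a "correct" answer; the grouping emerges from the similarity of the data alone, which is what distinguishes unsupervised from supervised learning.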

3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the most reward points, and in doing so it improves its performance.

A robotic dog that automatically learns the movement of its legs is an example of reinforcement learning.
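The reward-and-penalty loop above can be sketched with tabular Q-learning, a standard reinforcement-learning algorithm, on an invented toy environment: an agent on a line of five cells earns +1 for reaching the rightmost cell and -1 for falling off the left end. All parameters here are illustrative assumptions.

```python
# A toy reinforcement-learning sketch (tabular Q-learning on an invented
# 5-cell line world). The agent learns from reward/penalty feedback alone.
import random

random.seed(0)
n_states, actions = 5, [-1, +1]          # move left or move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for _ in range(500):
    s = 2                                 # every episode starts in the middle
    while 0 < s < n_states - 1:
        # epsilon-greedy: usually exploit the best known action, sometimes explore
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = s + a
        reward = 1.0 if s2 == n_states - 1 else (-1.0 if s2 == 0 else 0.0)
        best_next = 0.0 if s2 in (0, n_states - 1) else max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])  # feedback update
        s = s2

# After training, the learned policy in every interior state is "move right".
policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in range(1, n_states - 1)}
print(policy)
```

No labelled examples are ever provided: the agent discovers the right behaviour purely from the rewards and penalties it collects while exploring.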

Applications of Machine learning


Machine learning is a buzzword in today's technology, and it is growing rapidly. We use machine learning in our daily lives, often without knowing it, in tools such as Google Maps, Google Assistant, and Alexa. Below are some of the most popular real-world applications of machine learning:

1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, and places in digital images. A popular use case of image recognition and face detection is automatic friend tagging suggestions.

Facebook provides a feature of automatic friend tagging suggestions: whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names, and the technology behind this is machine learning's face detection and recognition algorithms.

It is based on the Facebook project named "DeepFace," which is responsible for face recognition and person identification in pictures.

2. Speech Recognition
While using Google, we get the option to "Search by voice"; this falls under speech recognition, a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic prediction:

If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.

It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, in two ways:

o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time of day

Everyone who uses Google Maps helps to make the app better: it takes information from users and sends it back to its database to improve performance.

4. Product recommendations:

Machine learning is widely used by e-commerce and entertainment companies such as Amazon and Netflix for product recommendations. Whenever we search for a product on Amazon, we start getting advertisements for the same product while browsing the internet in the same browser, and this is because of machine learning.

Google understands user interest using various machine learning algorithms and suggests products accordingly.

Similarly, when we use Netflix, we find recommendations for series, movies, and so on, and this is also done with the help of machine learning.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars, where machine learning plays a significant role. Tesla, a popular car manufacturer, is working on self-driving cars, using supervised learning methods to train models to detect people and objects while driving.

6. Email Spam and Malware Filtering:


Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important mail arrives in our inbox marked with the important symbol, and spam emails land in our spam box; the technology behind this is machine learning. Below are some spam filters used by Gmail:

o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Machine learning algorithms such as the Multi-Layer Perceptron, decision trees, and the Naïve Bayes classifier are used for email spam filtering and malware detection.
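The Naïve Bayes approach mentioned above can be sketched on a tiny invented corpus. Real spam filters use far larger vocabularies and training sets; every message and word count here is a made-up illustration.

```python
# A minimal Naive Bayes spam-filter sketch trained on a tiny hand-made corpus.
import math
from collections import Counter

train = [("win money now", "spam"), ("free prize win", "spam"),
         ("meeting at noon", "ham"), ("project report attached", "ham")]

counts = {"spam": Counter(), "ham": Counter()}
docs = Counter()
for text, label in train:
    docs[label] += 1
    counts[label].update(text.split())

vocab = set(w for c in counts.values() for w in c)

def score(text, label):
    """Log-probability of `label` given the words, with add-one smoothing."""
    total = sum(counts[label].values())
    log_p = math.log(docs[label] / sum(docs.values()))        # class prior
    for w in text.split():
        log_p += math.log((counts[label][w] + 1) / (total + len(vocab)))
    return log_p

def classify(text):
    return max(("spam", "ham"), key=lambda lab: score(text, lab))

print(classify("win a free prize"))       # spammy words dominate -> spam
print(classify("report for the meeting")) # work words dominate -> ham
```

The classifier multiplies (in log space) per-word probabilities learned from labelled examples, which is exactly the "naïve" independence assumption that gives the method its name.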

7. Virtual Personal Assistant:

We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just by voice instruction, such as playing music, calling someone, opening an email, or scheduling an appointment.

Machine learning algorithms are an important part of these virtual assistants.

These assistants record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection:

Machine learning makes our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can take place in various ways, such as fake accounts, fake IDs, or money being stolen in the middle of a transaction. To detect this, a feed-forward neural network can check whether a transaction is genuine or fraudulent.

For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern, which changes for fraudulent transactions; the network detects this and makes our online transactions more secure.

9. Stock Market trading:

Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in share prices, so machine learning's long short-term memory (LSTM) neural networks are used to predict stock market trends.
10. Medical Diagnosis:

In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing rapidly and is able to build 3D models that can predict the exact position of lesions in the brain.

This helps in finding brain tumours and other brain-related diseases easily.

11. Automatic Language Translation:

Nowadays, if we visit a new place and are not familiar with the language, it is not a problem at all: machine learning helps us by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature. It is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.

The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used with image recognition to translate text from one language to another.

Machine learning Life cycle


The machine learning life cycle is a cyclic process for building an efficient machine learning project. Its main purpose is to find a solution to the problem at hand.

Machine learning life cycle involves seven major steps, which are given below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment

1. Gathering Data:

Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data related to the problem.

In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle, because the quantity and quality of the collected data determine the efficiency of the output: the more data we have, the more accurate the prediction will be.

This step includes the below tasks:

o Identify various data sources
o Collect data
o Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, called a dataset, which will be used in further steps.

2. Data preparation

After collecting the data, we need to prepare it for the next steps. Data preparation is the step in which we put our data into a suitable place and prepare it for use in machine learning training.

In this step, we first put all the data together and then randomize its ordering.

This step can be further divided into two processes:

o Data exploration:
It is used to understand the nature of the data we have to work with: its characteristics, format, and quality. A better understanding of the data leads to a more effective outcome. Here we look for correlations, general trends, and outliers.
o Data pre-processing:
The next step is to pre-process the data for analysis.

3. Data Wrangling

Data wrangling is the process of cleaning and converting raw data into a usable format. It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process, since cleaning is required to address quality issues.

The data we collect is not always useful to us, as some of it may be irrelevant. In real-world applications, collected data may have various issues, including:

o Missing values
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data.

It is essential to detect and remove these issues, because they can negatively affect the quality of the outcome.
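The three issues listed above can be handled with a few simple passes over the data. This sketch cleans invented records: it drops duplicates, removes an invalid row, and fills a missing value with the mean of the observed ones; the field names and values are illustrative assumptions.

```python
# A basic data-wrangling sketch on invented records: drop duplicates,
# remove rows with invalid values, and fill missing values with the mean.

raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},      # missing value
    {"id": 2, "age": None},      # duplicate row
    {"id": 3, "age": -5},        # invalid value (negative age)
    {"id": 4, "age": 40},
]

# 1. Drop exact duplicates (preserving order).
seen, deduped = set(), []
for row in raw:
    key = (row["id"], row["age"])
    if key not in seen:
        seen.add(key)
        deduped.append(row)

# 2. Drop invalid rows (keep missing values for now; they are filled below).
valid = [r for r in deduped if r["age"] is None or r["age"] >= 0]

# 3. Fill missing values with the mean of the observed ones.
observed = [r["age"] for r in valid if r["age"] is not None]
mean_age = sum(observed) / len(observed)
cleaned = [{**r, "age": r["age"] if r["age"] is not None else mean_age}
           for r in valid]

print(cleaned)   # three clean rows; the missing age is filled with the mean
```

Mean imputation is only one of several strategies (dropping the row or using the median are common alternatives); which filter to apply depends on the dataset and the problem.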

4. Data Analysis

Now the cleaned and prepared data is passed on to the analysis step. This step involves:

o Selection of analytical techniques


o Building models
o Review the result

The aim of this step is to build a machine learning model that analyses the data using various analytical techniques and to review the outcome. It starts with determining the type of problem, where we select machine learning techniques such as classification, regression, cluster analysis, or association; we then build the model using the prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning algorithms to build the model.

5. Train Model

The next step is to train the model. In this step, we train our model to improve its performance and obtain a better outcome for the problem.

We use datasets to train the model with various machine learning algorithms. Training is required so that the model can learn the various patterns, rules, and features.

6. Test Model

Once our machine learning model has been trained on a given dataset, we test it. In this step, we check the accuracy of our model by providing it with a test dataset.

Testing the model determines its percentage accuracy, as per the requirements of the project or problem.

7. Deployment

The last step of the machine learning life cycle is deployment, where we deploy the model into a real-world system.

If the prepared model produces accurate results as per our requirements with acceptable speed, we deploy it in the real system. But before deploying the project, we check whether it keeps improving its performance using the available data. The deployment phase is similar to preparing the final report for a project.

Supervised Machine Learning


Supervised learning is the type of machine learning in which machines are trained using well-"labelled" training data, and on the basis of that data, machines predict the output. Labelled data means input data that is already tagged with the correct output.

How Supervised Learning Works?


In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data. Once the training process is completed, the model is tested on held-out test data, and then it predicts the output.


Types of supervised Machine learning Algorithms:


Supervised learning can be further divided into two types of problems:
1. Regression

Regression algorithms are used when there is a relationship between the input variable and the output variable. They are used for the prediction of continuous variables, as in weather forecasting or market-trend analysis. Below are some popular regression algorithms that come under supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, meaning the output falls into classes such as Yes/No, Male/Female, or True/False. Popular classification algorithms include:

o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines

Advantages of Supervised learning:

o With the help of supervised learning, the model can predict the output on
the basis of prior experiences.
o In supervised learning, we can have an exact idea about the classes of
objects.
o Supervised learning model helps us to solve various real-world problems
such as fraud detection, spam filtering, etc.
Disadvantages of supervised learning:

o Supervised learning models are not suitable for handling very complex tasks.
o Supervised learning cannot predict the correct output if the test data is different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.

Regression:
Linear Regression in Machine Learning
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Since linear regression models a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.

The linear regression model provides a sloped straight line representing the relationship between the variables.

Mathematically, we can represent a linear regression as:

y= a0+a1x+ ε

Here,

Y = Dependent variable (target variable)
X = Independent variable (predictor variable)
a0 = Intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = Random error

The values of the x and y variables are the training data from which the linear regression model is built.
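Fitting y = a0 + a1x by ordinary least squares has a simple closed form: a1 = cov(x, y) / var(x) and a0 = ȳ − a1·x̄. This sketch uses invented noise-free data lying exactly on y = 2x + 1, so the fit recovers the true line exactly.

```python
# A sketch of fitting y = a0 + a1*x by ordinary least squares, using the
# closed-form solution a1 = cov(x, y) / var(x), a0 = mean(y) - a1*mean(x).
# The data are invented and lie exactly on y = 2x + 1, so the fit is exact.

x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]          # y = 2x + 1 with no noise

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
a1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
a0 = mean_y - a1 * mean_x

print(f"y = {a0:.1f} + {a1:.1f}x")   # recovers intercept 1 and slope 2
```

With noisy data the same formulas give the line that minimizes the sum of squared errors ε, rather than an exact fit.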

Types of Linear Regression


Linear regression can be further divided into two types of the algorithm:

o Simple Linear Regression:


If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called
Simple Linear Regression.
o Multiple Linear regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is
called Multiple Linear Regression.

Linear Regression Line


A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:

o Positive linear relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a positive linear relationship.
o Negative linear relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, the relationship is called a negative linear relationship.
Simple Linear Regression Model:
The Simple Linear Regression model can be represented using the below
equation:

y= a0+a1x+ ε

Where,

a0 = The intercept of the regression line (obtained by putting x = 0)
a1 = The slope of the regression line, which tells whether the line is increasing or decreasing
ε = The error term (for a good model it will be negligible)

Multiple Linear Regression


Multiple Linear Regression is one of the important regression algorithms which models the linear
relationship between a single dependent continuous variable and more than one independent
variable.

In Simple Linear Regression, a single independent/predictor variable (X) is used to model the response variable (Y). But there are many cases in which the response variable is affected by more than one predictor variable; for such cases, the Multiple Linear Regression algorithm is used.

Multiple Linear Regression is thus an extension of Simple Linear Regression: it takes more than one predictor variable to predict the response variable.

Example:

Prediction of CO2 emission based on engine size and number of cylinders in a car.

Some key points about MLR:


o For MLR, the dependent or target variable (Y) must be continuous/real-valued, but the predictor or independent variables may be continuous or categorical.
o Each feature variable must model a linear relationship with the dependent variable.
o MLR tries to fit a regression line through a multidimensional space of data points.

MLR equation:
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an extension of Simple Linear Regression, the same form applies, and the equation becomes:

Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn ............... (a)

Where,

Y = Output/response variable

b0, b1, b2, b3, ..., bn = Coefficients of the model

x1, x2, x3, x4, ... = Independent/feature variables
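Equation (a) can be fitted by solving the normal equations (XᵀX)b = XᵀY. This sketch does so by hand for two predictors on invented data lying exactly on Y = 1 + 2·x1 + 3·x2, so the recovered coefficients match exactly; a real implementation would use a linear-algebra library instead of the small elimination routine here.

```python
# A multiple-linear-regression sketch with two predictors, solving the
# normal equations (X^T X) b = X^T Y with a small Gauss-Jordan elimination.
# The invented data lie exactly on Y = 1 + 2*x1 + 3*x2, so the fit is exact.

rows = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3)]     # (x1, x2) pairs
Y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]        # exact responses

X = [[1.0, x1, x2] for x1, x2 in rows]              # design matrix with intercept column
k = len(X[0])

# Build the normal equations A b = c, where A = X^T X and c = X^T Y.
A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(k)] for i in range(k)]
c = [sum(X[r][i] * Y[r] for r in range(len(X))) for i in range(k)]

# Gauss-Jordan elimination: reduce A to the identity, leaving the solution in c.
for i in range(k):
    pivot = A[i][i]
    A[i] = [v / pivot for v in A[i]]
    c[i] /= pivot
    for r in range(k):
        if r != i and A[r][i] != 0:
            factor = A[r][i]
            A[r] = [u - factor * v for u, v in zip(A[r], A[i])]
            c[r] -= factor * c[i]

b0, b1, b2 = c
print(f"Y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")   # recovers 1, 2, 3
```

Adding more predictors only widens the design matrix; the normal-equations form of the fit is unchanged.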

Assumptions for Multiple Linear Regression:

o A linear relationship should exist between the target and predictor variables.
o The regression residuals must be normally distributed.
o MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.

Applications of Multiple Linear Regression:


There are mainly two applications of Multiple Linear Regression:

o Effectiveness of independent variables on prediction
o Predicting the impact of changes
What is Sum of Squares?
Sum of squares (SS) is a statistical tool that is used to identify the dispersion of
data as well as how well the data can fit the model in regression analysis. The
sum of squares got its name because it is calculated by finding the sum of the
squared differences.

The sum of squares is one of the most important outputs in regression analysis.
The general rule is that a smaller sum of squares indicates a better model, as
there is less variation in the data.

In finance, understanding the sum of squares is important because linear regression models are widely used in both theoretical and practical finance.

Types of Sum of Squares

In regression analysis, the three main types of sum of squares are the total sum
of squares, regression sum of squares, and residual sum of squares.

1. Total sum of squares

The total sum of squares measures the variation of the values of a dependent variable from the sample mean of the dependent variable. Essentially, the total sum of squares quantifies the total variation in a sample. It can be determined using the following formula:

SST = Σ (yi − ȳ)², for i = 1 to n

Where:

 yi – the value in a sample


 ȳ – the mean value of a sample
2. Regression sum of squares (also known as the sum of
squares due to regression or explained sum of squares)

The regression sum of squares describes how much of the variation in the modelled data the regression model captures. A regression sum of squares close to the total sum of squares indicates that the model explains most of the variation in the data.

The formula for calculating the regression sum of squares is:

SSR = Σ (ŷi − ȳ)², for i = 1 to n

Where:

 ŷi – the value estimated by the regression line
 ȳ – the mean value of a sample

3. Residual sum of squares (also known as the sum of squared


errors of prediction)

The residual sum of squares essentially measures the variation of the modeling errors. In other words, it depicts how much of the variation in the dependent variable a regression model cannot explain. Generally, a lower residual sum of squares indicates that the regression model explains the data better, while a higher residual sum of squares indicates that the model explains it poorly.

The residual sum of squares can be found using the formula below:

SSE = Σ (yi − ŷi)², for i = 1 to n

Where:

 yi – the observed value
 ŷi – the value estimated by the regression line

The relationship between the three types of sum of squares can be summarized by the following equation:

SST = SSR + SSE

That is, the total variation equals the variation explained by the regression plus the residual (unexplained) variation.
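The identity SST = SSR + SSE, which holds for an ordinary least-squares fit with an intercept, can be checked numerically. The small noisy dataset below is invented for illustration.

```python
# A sketch verifying SST = SSR + SSE for an ordinary least-squares line
# fitted to small invented data (with noise, so SSE > 0).

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]      # roughly y = 2x, with small noise

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)           # regression (explained)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual (unexplained)

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}")   # SST equals SSR + SSE
```

Because the decomposition is exact for OLS, the ratio SSR/SST is the familiar R² statistic: the fraction of total variation the model explains.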

What Is the Sum of Squares?


The term sum of squares refers to a statistical technique used in regression analysis to determine the dispersion of data points. The sum of squares can be used to find the function that best fits the data by deviating least from it. In a regression analysis, the goal is to determine how well a data series can be fitted to a function that might help to explain how the data series was generated. The sum of squares can also be used in the financial world to determine the variance in asset values.

KEY TAKEAWAYS

 The sum of squares measures the deviation of data points away from the mean value.
 A higher sum of squares indicates higher variability while a lower
result indicates low variability from the mean.
 To calculate the sum of squares, subtract the data points from the
mean, square the differences, and add them together.
 There are three types of sum of squares: total, residual, and regression.
 Investors can use the sum of squares to help make better decisions
about their investments.

Sum of Squares Formula


The following is the formula for the total sum of squares.

For a set X of n items:

Sum of squares = Σ (Xi − X̄)², for i = 1 to n

where:

Xi = the ith item in the set
X̄ = the mean of all items in the set
(Xi − X̄) = the deviation of each item from the mean

Understanding the Sum of Squares


The sum of squares is a statistical measure of deviation from the mean. It
is also known as variation. It is calculated by adding together the squared
differences of each data point. To determine the sum of squares, square
the distance between each data point and the line of best fit, then add
them together. The line of best fit will minimize this value.

A low sum of squares indicates little variation in the data, while a higher one indicates more variation. Variation refers to the difference of each data point from the mean. You can visualize this in a chart: if the line doesn't pass through all the data points, then there is some unexplained variability. We go into a little more detail about this in the next section below.

Analysts and investors can use the sum of squares to make better decisions about their investments. Keep in mind, though, that using it means you're making assumptions based on past performance. For instance, this measure can help you determine the level of volatility in a stock's price or how the share prices of two companies compare.

Let's say an analyst who wants to know whether Microsoft (MSFT) share
prices move in tandem with those of Apple (AAPL) can list out the daily
prices for both stocks for a certain period (say one, two, or 10 years) and
create a linear model or a chart. If the relationship between both variables
(i.e., the price of AAPL and MSFT) is not a straight line, then there are
variations in the data set that must be scrutinized.

Variation is a statistical measure that is calculated or measured by using


squared differences.

How to Calculate the Sum of Squares


You can see why the measurement is called the sum of squared
deviations, or the sum of squares for short. You can use the following
steps to calculate the sum of squares:

1. Gather all the data points.


2. Determine the mean/average
3. Subtract the mean/average from each individual data point.
4. Square each total from Step 3.
5. Add up the figures from Step 4.

In statistics, the mean is the average of a set of numbers, calculated by adding the values in the data set together and dividing by the number of values. But knowing the mean may not be enough to determine the sum of squares; it also helps to know the variation in a set of measurements. How far individual values are from the mean may provide insight into how well the observations fit the regression model that is created.

Types of Sum of Squares


The formula we highlighted earlier is used to calculate the total sum of
squares. The total sum of squares is used to arrive at other types. The
following are the other types of sum of squares.
Residual Sum of Squares

As noted above, if the line in the linear model created does not pass
through all the measurements of value, then some of the variability that
has been observed in the share prices is unexplained. The sum of squares
is used to calculate whether a linear relationship exists between two
variables, and any unexplained variability is referred to as the residual sum
of squares.

The RSS allows you to determine the amount of error left between a
regression function and the data set after the model has been run. You
can interpret a smaller RSS figure as a regression function that is well-fit to
the data while the opposite is true of a larger RSS figure.

Here is the formula for calculating the residual sum of squares:

SSE = Σ (yi − ŷi)², for i = 1 to n

where:

yi = observed value
ŷi = value estimated by the regression line
Regression Sum of Squares

The regression sum of squares denotes the relationship between the modeled data and the regression model. A regression model establishes whether there is a relationship between one or multiple variables. A regression sum of squares that accounts for most of the total sum of squares indicates a model that fits the data well, while a low regression sum of squares means the model explains little of the variation in the data.

Here is the formula for calculating the regression sum of squares:


SSR = Σ (ŷi − ȳ)², for i = 1 to n

where:

ŷi = value estimated by the regression line
ȳ = mean value of a sample

Adding the sum of the deviations alone without squaring will result in a
number equal to or close to zero since the negative deviations will almost
perfectly offset the positive deviations. To get a more realistic number, the
sum of deviations must be squared. The sum of squares will always be a
positive number because the square of any number, whether positive or
negative, is always positive.
Limitations of Using the Sum of Squares
Making an investment decision on what stock to purchase requires many
more observations than the ones listed here. An analyst may have to work
with years of data to know with a higher certainty how high or low the
variability of an asset is. As more data points are added to the set, the sum
of squares becomes larger as the values will be more spread out.

The most widely used measurements of variation are the standard


deviation and variance. However, to calculate either of the two metrics, the
sum of squares must first be calculated. The variance is the average of the
sum of squares (i.e., the sum of squares divided by the number of
observations). The standard deviation is the square root of the variance.

There are two methods of regression analysis that use the sum of squares: the linear least squares method and the non-linear least squares method. The least squares method refers to the fact that the regression function minimizes the sum of the squared deviations from the actual data points. In this way, it is possible to draw a function that statistically provides the best fit for the data. Note that a regression function can be either linear (a straight line) or non-linear (a curving line).

Example of Sum of Squares


Let's use Microsoft as an example to show how you can arrive at the sum
of squares.

Using the steps listed above, we gather the data. If we're looking at the company's closing prices over five trading days, we'll need the prices for that time frame:

 $74.01
 $74.77
 $73.94
 $73.61
 $73.40

Now let's figure out the average price. The sum of the prices is $369.73, so the mean or average price is $369.73 ÷ 5 ≈ $73.95.

Then, to figure out the sum of squares, we find the difference of each price from the average, square the differences, and add them together:

 SS = ($74.01 − $73.95)² + ($74.77 − $73.95)² + ($73.94 − $73.95)² + ($73.61 − $73.95)² + ($73.40 − $73.95)²
 SS = (0.06)² + (0.82)² + (−0.01)² + (−0.34)² + (−0.55)²
 SS = 1.0942

In the example above, a sum of squares of 1.0942 shows that the variability in MSFT's stock price over the five days is very low, and investors looking for price stability and low volatility may opt for MSFT.
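The five-price example above can be checked in a few lines, following the same steps: find the mean, subtract it from each price, square, and add. The article rounds the mean to $73.95 before squaring; doing the same reproduces its figure of 1.0942.

```python
# Reproducing the worked sum-of-squares example: five closing prices,
# mean rounded to cents as in the article, then squared deviations summed.

prices = [74.01, 74.77, 73.94, 73.61, 73.40]

mean = round(sum(prices) / len(prices), 2)      # 369.73 / 5 -> 73.95 (rounded)
ss = sum((p - mean) ** 2 for p in prices)

print(f"mean={mean}  sum of squares={ss:.4f}")  # sum of squares = 1.0942
```

Using the unrounded mean (73.946) gives a slightly smaller value of about 1.0941, which is why intermediate rounding should normally be avoided in practice.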

How Do You Define the Sum of Squares?


The sum of squares is a measure used in regression analysis to determine the variance of data points from the mean. A low sum of squares means low variation; a higher sum of squares indicates higher variance. This can help inform decisions by quantifying investment volatility or comparing groups of investments with one another.

How Do You Calculate the Sum of Squares?


In order to calculate the sum of squares, gather all your data points. Then
determine the mean or average by adding them all together and dividing
that figure by the total number of data points. Next, figure out the
differences between each data point and the mean. Then square those
differences and add them together to give you the sum of squares.
