ML 3
We can train machine learning algorithms by providing them with a huge amount of data and letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be measured by the cost function. With the help of machine learning, we can save both time and money.
The importance of machine learning can be easily understood by its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions on Facebook, and so on. Various top companies such as Netflix and Amazon have built machine learning models that use a vast amount of data to analyze user interests and recommend products accordingly.
Machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a machine learning method in which we provide sample labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.
The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision; it is similar to a student learning things under the supervision of a teacher. An example of supervised learning is spam filtering.
Supervised learning can be grouped further into two categories of algorithms:
o Classification
o Regression
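For illustration, here is a minimal sketch of supervised classification in Python using scikit-learn; the email-like features and the tiny labeled dataset are hypothetical, chosen only to mirror the spam-filtering example above.

# A minimal supervised-learning sketch (assumes scikit-learn is installed).
# Labeled examples are used to train a classifier, which then predicts labels
# for unseen inputs -- the "spam vs. not spam" idea in miniature.
from sklearn.linear_model import LogisticRegression

# Hypothetical features per email: [number_of_links, number_of_suspicious_words]
X_train = [[0, 1], [1, 0], [8, 7], [9, 6]]
y_train = [0, 0, 1, 1]           # 0 = not spam, 1 = spam

model = LogisticRegression()
model.fit(X_train, y_train)      # training under supervision (labels provided)

print(model.predict([[7, 8]]))   # expected to predict 1 (spam) for a link-heavy email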
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The machine is trained with a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or a group of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from the huge amount of data. It can be further classified into two categories of algorithms:
o Clustering
o Association
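As a small illustration of clustering, the sketch below uses scikit-learn's KMeans on hypothetical, unlabeled 2-D points; the algorithm groups them without being given any labels.

# A minimal unsupervised-learning sketch (assumes scikit-learn is installed).
# No labels are given; KMeans groups the points into clusters on its own.
from sklearn.cluster import KMeans

# Unlabeled 2-D points forming two visible groups (hypothetical data)
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
     [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster assignments discovered without supervision

print(labels)                    # e.g. [0 0 0 1 1 1] (cluster ids, not class labels)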
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the most reward points, and hence it improves its performance.
A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.
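As a rough sketch of the reward-and-penalty idea (using a hypothetical 5-state corridor environment rather than a robotic dog), the following Python snippet shows tabular Q-learning, where the agent's action values are updated from the rewards it receives.

# A minimal reinforcement-learning sketch: tabular Q-learning on a hypothetical
# 5-state corridor. Moving right toward state 4 earns a reward; state 4 ends the episode.
import random

n_states, actions = 5, [0, 1]          # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # explore occasionally, otherwise pick the action with the highest value
        a = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda x: Q[s][x])
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0                     # reward for reaching the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])      # Q-learning update
        s = s_next

print(Q[0])  # after training, the "move right" value should dominate in state 0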
Applications of Machine Learning
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, and digital images. A popular use case of image recognition and face detection is the automatic friend tagging suggestion:
It is based on the Facebook project named "Deep Face," which is responsible for face recognition and person identification in pictures.
2. Speech Recognition
While using Google, we get an option of "Search by voice"; this comes under speech recognition, a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in various speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, in two ways:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time of day
Everyone who uses Google Maps is helping to make the app better. It takes information from the user and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies, such as Amazon and Netflix, for product recommendations. Whenever we search for a product on Amazon, we start getting advertisements for the same product while browsing the internet in the same browser, and this is because of machine learning.
Google understands the user's interests using various machine learning algorithms and suggests products according to those interests.
Similarly, when we use Netflix, we find recommendations for series, movies, and so on, and this is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role in self-driving cars. Tesla, a well-known car manufacturer, is working on self-driving cars and uses machine learning methods to train its car models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is automatically filtered as important, normal, or spam, and the technology behind this is machine learning. Some of the spam filters used by Gmail are:
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
7. Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just through our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, and so on.
These assistants record our voice instructions, send them to a server on the cloud, decode them using machine learning algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways a fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.
For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Each genuine transaction follows a specific pattern, which changes for a fraudulent transaction; the network therefore detects it and makes our online transactions more secure.
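As a hedged illustration of the idea (not the actual system any bank uses), the snippet below sketches a small feed-forward neural network classifier in Python with scikit-learn; the transaction features and labels are made up for the example.

# A minimal feed-forward network classifying transactions as genuine (0) or fraudulent (1).
# The features and data are hypothetical; a real system would use far more examples.
from sklearn.neural_network import MLPClassifier

# Hypothetical features: [amount_in_thousands, transactions_in_last_hour]
X_train = [[0.5, 1], [1.2, 2], [0.8, 1], [9.5, 12], [8.7, 15], [9.9, 11]]
y_train = [0, 0, 0, 1, 1, 1]   # 0 = genuine, 1 = fraud

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)

print(clf.predict([[9.0, 14]]))  # a large, rapid-fire transaction -> likely flagged as fraud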
9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in share prices, so a machine learning long short-term memory (LSTM) neural network is used to predict stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.
11. Automatic Language Translation:
Nowadays, if we visit a new place and are not aware of the language, it is not a problem at all; machine learning helps us here as well by converting the text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature: it is a neural machine translation model that translates text into our familiar language, and this is known as automatic translation.
Machine Learning Life Cycle
The machine learning life cycle involves seven major steps, which are given below:
o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment
1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data related to the problem.
In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data will determine the efficiency of the output: the more data we have, the more accurate the prediction will be.
By performing the above task, we get a coherent set of data, also called a dataset, which will be used in the further steps.
2. Data preparation
After collecting the data, we need to prepare it for the further steps. Data preparation is a step where we put our data into a suitable place and prepare it for use in machine learning training.
In this step, we first put all the data together and then randomize its ordering. This step can be further divided into two processes:
o Data exploration:
It is used to understand the nature of data that we have to work with. We need to
understand the characteristics, format, and quality of data.
A better understanding of the data leads to an effective outcome. In this step, we find correlations, general trends, and outliers.
o Data pre-processing:
The next step is pre-processing the data to get it ready for analysis.
3. Data Wrangling
Data wrangling is the process of cleaning raw data and converting it into a usable format. It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process. Cleaning the data is required to address quality issues.
The data we collect is not always useful to us, as some of it may be irrelevant. In real-world applications, collected data may have various issues, including:
o Missing Values
o Duplicate data
o Invalid data
o Noise
It is mandatory to detect and remove the above issues because they can negatively affect the quality of the outcome.
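To make the wrangling step concrete, here is a minimal pandas sketch; the column names and values are hypothetical. It removes duplicates, drops invalid rows, and imputes missing values.

# A minimal data-wrangling sketch addressing the issues listed above.
import pandas as pd

raw = pd.DataFrame({
    "age":    [25, 25, None, 40, -3],          # None = missing value, -3 = invalid value
    "income": [50000, 50000, 62000, None, 45000],
})

clean = raw.drop_duplicates()                                     # remove duplicate rows
clean = clean[clean["age"].isna() | (clean["age"] > 0)].copy()    # drop rows with invalid ages
clean["age"] = clean["age"].fillna(clean["age"].median())         # impute missing ages
clean["income"] = clean["income"].fillna(clean["income"].median())# impute missing incomes

print(clean)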
4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step.
The aim of this step is to build a machine learning model to analyse the data using various analytical techniques and review the outcome. It starts with determining the type of problem, where we select a machine learning technique such as classification, regression, cluster analysis, or association, then build the model using the prepared data, and finally evaluate the model.
Hence, in this step, we take the data and use machine learning algorithms to build the
model.
5. Train Model
The next step is to train the model. In this step, we train our model to improve its performance and obtain a better outcome for the problem.
We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can learn the various patterns, rules, and features.
6. Test Model
Once our machine learning model has been trained on a given dataset, we test the model. In this step, we check the accuracy of our model by providing a test dataset to it.
Testing the model determines the percentage accuracy of the model as per the requirements of the project or problem.
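The train/test steps above can be sketched in a few lines of Python with scikit-learn; the synthetic dataset and the choice of a decision tree are illustrative assumptions, not a prescribed setup.

# A minimal sketch of training a model and then testing it on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic dataset standing in for real project data
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)   # train the model
accuracy = accuracy_score(y_test, model.predict(X_test))               # test the model
print(f"Accuracy on the test set: {accuracy:.2%}")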
7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.
If the above-prepared model produces accurate results as per our requirements with acceptable speed, then we deploy the model in the real system. But before deploying the project, we check whether it keeps improving its performance on the available data or not. The deployment phase is similar to making the final report for a project.
Types of Supervised Machine Learning Algorithms
1. Regression
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines
Advantages of Supervised Learning:
o With the help of supervised learning, the model can predict the output on the basis of prior experience.
o In supervised learning, we can have an exact idea about the classes of
objects.
o Supervised learning model helps us to solve various real-world problems
such as fraud detection, spam filtering, etc.
Disadvantages of supervised learning:
o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.
Regression:
Linear Regression in Machine Learning
The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence it is called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
y = a0 + a1x + ε
Here, the values of the x and y variables are the training data used for the linear regression model representation, a0 is the intercept of the line, a1 is the linear regression coefficient (the scale factor applied to the input value), and ε is the random error.
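For illustration, the coefficients a0 and a1 can be estimated by least squares, as in this small NumPy sketch with hypothetical training data.

# A minimal sketch of fitting y = a0 + a1*x by least squares with NumPy.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])   # roughly y = 1 + 2x plus a small error term

a1, a0 = np.polyfit(x, y, deg=1)           # slope (a1) and intercept (a0)
print(f"a0 = {a0:.2f}, a1 = {a1:.2f}")     # expected to be near a0 ≈ 1, a1 ≈ 2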
Multiple Linear Regression (MLR)
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an enhancement of Simple Linear Regression, the same idea applies, and the multiple linear regression equation becomes:
Y = b0 + b1x1 + b2x2 + b3x3 + ...... + bnxn ............... (a)
Where,
Y = output/response variable
b0 = intercept of the regression line
b1, b2, ..., bn = coefficients of the predictor variables
x1, x2, ..., xn = independent/predictor variables
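A minimal sketch of fitting the multiple linear regression equation with scikit-learn follows; the two predictor columns and the response values are hypothetical and were generated from Y = 1 + 2·x1 + 2.5·x2.

# A minimal multiple linear regression sketch: Y = b0 + b1*x1 + b2*x2.
from sklearn.linear_model import LinearRegression

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]   # columns: x1, x2 (hypothetical predictors)
Y = [8, 7.5, 17, 16.5, 26]                     # generated from Y = 1 + 2*x1 + 2.5*x2

model = LinearRegression().fit(X, Y)
print(model.intercept_, model.coef_)           # expected ~1.0 and ~[2.0, 2.5]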
Sum of Squares
The sum of squares is one of the most important outputs in regression analysis.
The general rule is that a smaller sum of squares indicates a better model, as
there is less variation in the data.
In regression analysis, the three main types of sum of squares are the total sum
of squares, regression sum of squares, and residual sum of squares.
The total sum of squares can be found using the formula below:
TSS = Σ (yi − ȳ)²
Where:
yi = the i-th observed value
ȳ = the mean of the observed values
The regression sum of squares describes how well a regression model represents
the modeled data. A higher regression sum of squares indicates that the model
does not fit the data well.
It can be found using the formula below:
SSR = Σ (ŷi − ȳ)²
Where:
ŷi = the value estimated by the regression line
ȳ = the mean of the observed values
The residual sum of squares can be found using the formula below:
SSE = Σ (yi − ŷi)²
Where:
yi = the observed value
ŷi = the value estimated by the regression line
The relationship between the three types of sum of squares can be summarized by the following equation:
Total sum of squares (TSS) = Regression sum of squares (SSR) + Residual sum of squares (SSE)
KEY TAKEAWAYS
A low sum of squares indicates little variation between the data values, while a higher one indicates more variation. Variation refers to the difference of each data point from the mean. You can visualize this in a chart: if the line doesn't pass through all the data points, then there is some unexplained variability. We go into a little more detail about this below.
Analysts and investors can use the sum of squares to make better decisions about their investments. Keep in mind, though, that using it means relying on assumptions about past performance. For instance, this measure can help you determine the level of volatility in a stock's price or how the share prices of two companies compare.
Let's say an analyst who wants to know whether Microsoft (MSFT) share
prices move in tandem with those of Apple (AAPL) can list out the daily
prices for both stocks for a certain period (say one, two, or 10 years) and
create a linear model or a chart. If the relationship between both variables
(i.e., the price of AAPL and MSFT) is not a straight line, then there are
variations in the data set that must be scrutinized.
As noted above, if the line in the linear model created does not pass
through all the measurements of value, then some of the variability that
has been observed in the share prices is unexplained. The sum of squares
is used to calculate whether a linear relationship exists between two
variables, and any unexplained variability is referred to as the residual sum
of squares.
The RSS allows you to determine the amount of error left between a
regression function and the data set after the model has been run. You
can interpret a smaller RSS figure as a regression function that is well-fit to
the data while the opposite is true of a larger RSS figure.
SSE = Σ (yi − ŷi)² (summed over i = 1 to n)
where:
yi = observed value
ŷi = value estimated by the regression line
Regression Sum of Squares
Adding the deviations alone without squaring them will result in a number equal to or close to zero, since the negative deviations will almost perfectly offset the positive deviations. To get a more meaningful number, the deviations must be squared before summing. The sum of squares will always be a positive number because the square of any number, whether positive or negative, is always positive.
Limitations of Using the Sum of Squares
Making an investment decision on what stock to purchase requires many
more observations than the ones listed here. An analyst may have to work
with years of data to know with a higher certainty how high or low the
variability of an asset is. As more data points are added to the set, the sum
of squares becomes larger as the values will be more spread out.
There are two methods of regression analysis that use the sum of squares:
the linear least squares method and the non-linear least squares method.
The least squares method refers to the fact that the regression function minimizes the sum of the squares of the deviations from the actual data points. In this way, it is possible to draw a function which statistically provides the best fit for the data. Note that a regression function can either be linear (a straight line) or non-linear (a curving line).
To work through an example, we first gather the data. If we're looking at Microsoft's (MSFT) share price over a five-day period, we'll need the closing prices for that time frame:
$74.01
$74.77
$73.94
$73.61
$73.40
Now let's figure out the average price. The sum of the total prices is
$369.73 and the mean or average price is $369.73 ÷ 5 = $73.95.
Then, to figure out the sum of squares, we find the difference of each price from the average, square the differences, and add them together:
SS = (74.01 − 73.95)² + (74.77 − 73.95)² + (73.94 − 73.95)² + (73.61 − 73.95)² + (73.40 − 73.95)²
SS = 0.0036 + 0.6724 + 0.0001 + 0.1156 + 0.3025 = 1.0942
In the example above, the value of 1.0942 shows that the variability in MSFT's stock price over the five days is very low, so investors looking for stocks characterized by price stability and low volatility may opt for MSFT.
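For readers who want to verify the arithmetic, this short Python snippet reproduces the sum of squares of 1.0942 from the five closing prices above.

# Quick check of the worked example: squared deviations from the (rounded) mean of $73.95.
prices = [74.01, 74.77, 73.94, 73.61, 73.40]
mean_price = round(sum(prices) / len(prices), 2)           # 73.95
sum_of_squares = sum((p - mean_price) ** 2 for p in prices)
print(mean_price, round(sum_of_squares, 4))                # 73.95 1.0942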