Erick Myers - Python Machine Learning: The Complete Guide to Everything You Need to Know About Python Machine Learning - Keras, NumPy, Scikit-Learn, TensorFlow, With Useful Exercises and Examples
INTRODUCTION
INTRODUCTION TO MACHINE LEARNING
DATA SCIENCE, ARTIFICIAL INTELLIGENCE
AND MACHINE LEARNING
MACHINE LEARNING PREPARATION
MACHINE LEARNING WORKING SECTION
Linear Regression
PRACTICAL LINEAR REGRESSION
APPENDIX
CONCLUSION
INTRODUCTION
Python
Summary
The details of machine learning will be discussed later, but briefly: if a machine can learn on its own from experience, or can make predictions, then the system can be called intelligent, or ML-enabled.
Machine learning has become an important topic for almost every engineering discipline. It is essential for data analysis, classification and prediction, and big data, data science and machine learning are all closely tied to artificial intelligence. Today, ordinary web apps and mobile apps apply various ML techniques so that the application becomes more intelligent and better at anticipating what you want. An ordinary application always behaves the same way, but an ML-enabled application is different: the more you use it, the smarter it seems to become. ML does not only add intelligence to an app; for classification, and for diagnosis-style problems of any kind, there is simply no substitute for it. In this course, along with building models, the mathematics behind them will be explained in plain language.
This course is for:
People interested in Artificial Intelligence, Big Data or Data Mining, ML practitioners, ML hobbyists and ML beginners. Anyone who has heard the name machine learning and is interested in applying it can take this course. Details are described below.
What you need to know before starting the course (* Marked
topics will be excluded from discussion):
Basic Python Programming *
Basic MATLAB Programming *
Basic JavaScript Programming *
Linear Algebra *
Pythonic syntactic sugar
Knowledge of object-oriented Python is a plus
Calculus (integral and differential)
Basic statistics, such as: mean, mode, median, variance, covariance, correlation, standard deviation ...
What will be discussed in this course?
Machine learning is a very broad subject; it cannot be covered completely in one course, and research on better and better models never stops. This course will introduce you to machine learning, but you will have to reach the advanced level on your own. Let's look at the topics (the full list will be updated later):
Necessary software installation
Anaconda Python Distribution Installation
Introduction to and installation of the PyCharm IDE
Make Sublime Text 3 useful for Python
Machine learning kick start
What is machine learning?
What is the application of machine learning?
What is regression?
What is linear and polynomial regression?
Prediction with Simple Linear Regression (using
Sklearn Module)
Prediction with Simple Linear Regression (Model
from Scratch)
Machine learning kick start 2
Supervised Learning
Unsupervised Learning
Two essential prediction algorithms
Why are these two algorithms necessary?
What is the Penalized Regression Method?
What is the Ensemble Method?
How to select an algorithm?
The general recipe for building predictive models
Identifying the problem through the dataset
Framing new prediction problems
What are attributes and labels? What are their synonyms?
Things to keep in mind about the dataset
Model and Cost Function
Model Representation
Cost Function
Cost Function Intuition - 1
Cost Function Intuition - 2
Overfitting - is your model giving "too good" a performance?
Parameter learning
Gradient Descent
Gradient Descent Intuition
Gradient Descent in Linear Regression
Frequently Asked Questions:
What will be the use of machine learning in my career?
Machine learning is a very wide area, stretching from artificial intelligence to pattern recognition. Every day there is an enormous amount of data to work with. Google, Microsoft and other large companies process this data through pattern recognition; that is why Google can offer search suggestions so easily and decide how to fix your typos. Suppose you regularly watch programming videos on YouTube; after a few days it will start suggesting videos it thinks you would like to watch.
It does not matter what career you are in. If you are a doctor and you know a little programming, some ML, some data science and some NLP (Natural Language Processing) or NLU (Natural Language Understanding), you can build an artificial "brain" that takes symptoms and suggests a likely disease. Wherever you go, that brain, packaged as a chatbot, could help treat minor illnesses.
Career aside, ML is one of the most interesting areas of computer science, and everybody should know the keywords used in it.
For whom is machine learning?
A science background helps a lot when learning machine learning. Ordinary software is built through explicit programming, but prediction problems cannot be solved that way. Without at least a little mathematical grounding you may have trouble understanding the underlying concepts; you can build models with very little math, but optimizing a model is close to impossible without it.
When should machine learning be used?
If your app needs to rate music / videos / blog post reviews automatically, or your website needs a smart spam blocker, or you want to decide which ad to show based on some parameters ... and so on.
What is the reason for discussing so many languages?
If a full-stack JavaScript developer wants to apply an ML method in his web app, he would otherwise have to learn Python first. To avoid that trouble, the same ideas will be shown on different platforms.
Which books will be followed?
Machine Learning in Python: Essential Techniques
for Predictive Analysis [Wiley] - Michael Bowles
Mastering Machine Learning with Scikit-Learn
[PACKT]
Data Science from Scratch [O'Reilly] - Joel Grus
Building Machine Learning Systems with Python
[PACKT]
Are there any TV series made with ML?
Machine learning can feel complicated and dry if you only study the theory. But if you also watch a movie or series related to the topic, interest multiplies. So here is a short list.
Person Of Interest
A very enjoyable TV series based on machine learning; it is enough to make you fall in love with the subject. Its main characters are the supremely talented programmer Harold Finch and his right-hand man John Reese. Harold Finch builds a machine that can predict an incident before it occurs, and their job is to prevent it.
In it, you will see:
Natural Language Understanding (Harold communicates with the machine in plain English)
Image Processing (facial recognition, object recognition, optical character recognition ...)
Artificial Neural Networks: pictures are often shown interconnected by lines; these are the connections of artificial neurons. A large part of this course will be about ANNs.
Silicon Valley
Although the series is primarily about a talented programmer and his data compression company, the ML application appears in the 3rd season.
The core question for a data compression algorithm is: how much information does a dataset actually contain? If the algorithm detects that a specific part of the dataset is redundant, nothing is lost even if that part is deleted. With that part removed, the compressed data will naturally be smaller than the original. The real challenge is extracting the information.
Suppose your class teacher only ever says the letter 'A' in class. A long string of 'A's is technically a dataset, but the amount of information in it is 0. When we compress that string, the output should be close to 0 bytes, since it carries no information at all. With a bad algorithm, however, the output file can end up equal to, or only slightly smaller than, the input.
Prediction is one of the applications of machine learning, so by using it in data compression we can extract information very effectively. But if our model performs badly, the system may throw away real information as if it were redundant.
In the 3rd season (no spoilers) we see that under some circumstances Richard is supposed to drop the machine learning system, but he says that doing so would make his compression algorithm useless. From this we can assume that information extraction using ML methodology was the core of Middle-Out (the fictional algorithm).
Silicon Valley is a very interesting and insightful TV series. Although it is not entirely about ML, its stories will make your time well spent.
INTRODUCTION TO MACHINE
LEARNING
What is machine learning?
Before starting machine learning, let's look at some definitions from
books. In this regard Arthur Samuel said
Field of study that gives computers the ability to learn without being
explicitly programmed.
Suppose a bipedal robot learns to walk on its own, without a specific hand-written walking program; then we can say a learning algorithm has been used. We could easily write an explicit program that makes a bipedal robot walk, but that walk could not be called intelligent: an embedded system programmed for one specific task can only do that task, so how could it be intelligent? If the behaviour of the device changes as its experience changes, then it can be called intelligent.
Tom Mitchell said:
A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E.
The definition can be confusing at first glance, so let's make it concrete with an example.
Say I built a machine that can play chess. We can then write down the parameters like this:
E = the experience, say the 500 full games of chess the machine has played
T = the task, playing chess
P = the performance measure, whether the machine wins or loses
According to the definition, if increasing the number of games played (E) increases the machine's win rate (P) at the task (T), then the machine really is learning.
And that is essentially impossible to achieve with explicit programming.
DATA SCIENCE, ARTIFICIAL
INTELLIGENCE AND MACHINE
LEARNING
Data Science
Data Science is the sum of statistics, machine learning and data
visualization. The job of a Data Scientist is to find answers to some
questions through a dataset.
Artificial Intelligence
Artificial intelligence is a collection of problems and problem-solving techniques used to tackle complex tasks: a computer playing cards or chess, natural language translation, and security and strategy management all incorporate AI. There is no requirement that an AI problem be based on a real dataset; it can be purely theoretical.
Machine Learning
Machine learning is a branch of artificial intelligence in which intelligent systems are created from datasets or from interactive experience. Machine learning technology is used in many fields, including cybersecurity, bioinformatics, natural language processing, computer vision and robotics.
The most basic task in machine learning is classification of data, such as deciding whether an email or a website comment is spam. A lot of current research focuses on deep learning / deep networks, notably convolutional neural networks.
At the industrial level machine learning is now very important, and everyone should know some ML methodology. Several topics in machine learning overlap with data science, but the main target of machine learning is to build a predictive model.
In a Word,
AI helps create an intelligent machine; ML is a subfield of AI that helps the machine learn from data; and data science uses learning algorithms, among other tools, to find patterns in data that can be used in the next application.
Data science, ML and AI often feel like the same thing because the differences between them are small. There is a common joke about data science:
A data scientist knows more computer science than a statistician, and more statistics than a computer scientist.
Types of learning algorithms
Supervised learning
First, teach the machine, then use its teaching.
Regression
Let's start with one of the most familiar ML problems. Suppose I have some data: the size of a house and its price.
If I plotted this dataset, I would see a graph like the one below.
Data-set:
Problem:
Using the dataset above, I was asked to find out: if your friend's house is 750 square feet, how much should it cost?
Solution:
I need to find an equation that gives the corresponding price for a given area, that is
y = f(x)
or
Price = f(Area)
So we have to find out what the function f() is. I will not yet explain here how to find f().
The information we get from the problem:
We are giving our algorithm a dataset where the correct answer is already given (the graph); that is, we know the actual price of houses of specific sizes.
By feeding this data to the algorithm, we can teach it that a house of this size has that price. This is called training data.
Based on this training data, the model can then estimate prices for house sizes that were not in the training data. For example, 3000 sq. ft. is not in the dataset, but the model I build can estimate the price of a 3000 sq. ft. home based on its previous experience.
This is a regression problem, because we are trying to estimate a new continuous value from previously seen values. The next example will make the contrast clear.
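To make the idea concrete, here is a minimal sketch using scikit-learn's LinearRegression. The house sizes and prices below are hypothetical, invented only for illustration; only the 750 and 3000 sq. ft. queries come from the text above.

from sklearn.linear_model import LinearRegression
import numpy as np

# Hypothetical training data: house size in sq. ft. and price
sizes = np.array([[500], [1000], [1500], [2000], [2500]])
prices = np.array([100000, 180000, 260000, 340000, 420000])

model = LinearRegression()
model.fit(sizes, prices)              # "teach" the model with the training data

# Predict prices for sizes that were not in the training data
print(model.predict([[750], [3000]]))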
Classification
Now let's look at another type of dataset, where for each tumor size we are told whether the tumor is malignant or benign.
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
x = np.array(range(10))
y = np.array(range(10))
plt.plot(x, y)
plt.show()
This shows the basics of working in IPython. New packages will be introduced later.
MACHINE LEARNING
WORKING SECTION
Before applying machine learning there are several things to know. In this chapter we will see how to build a predictive model, from choosing a machine learning algorithm onwards, following a simple recipe.
A scikit-learn workflow has three parts:
Input (training data)
Model
Predicted output
The most common thing to do when using the scikit-learn library is to create an object for the chosen estimator. Before doing that for the house-price problem, let's look at how training and testing data are separated, using a simple multiplication table as the dataset:
3 × 1 = 3
3 × 2 = 6
3 × 3 = 9
3 × 4 = 12
3 × 5 = 15
3 × 6 = 18
3 × 9 = 27
3 × 10 = 30
Now if I separate training data and testing data from this, the datasets will look like this.
Note: there are also dedicated algorithms for splitting a dataset into training and testing data; a well-chosen split makes a big difference. We will look into the details later.
Training data
3 × 1 = 3
3 × 2 = 6
3 × 3 = 9
3 × 4 = 12
3 × 6 = 18
3 × 9 = 27
3 × 10 = 30
Testing data
3 × 5 = 15
Notice that I did not give 3 × 5 to the training data. That means I will build the model with the remaining data, excluding 3 × 5. Then, given the inputs 3 and 5, will the model output something close to 15? If so, my algorithm selection, data preprocessing and training were appropriate. If not, I will have to start again from data preprocessing, possibly with a new algorithm.
Creating a model is an iterative process: steps have to be revisited, and it is wrong to expect a model to be perfect on the first attempt.
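A minimal sketch of this hold-out idea with scikit-learn; the choice of LinearRegression as the estimator is my assumption, the text only describes the split.

import numpy as np
from sklearn.linear_model import LinearRegression

# Multiplication-table dataset, with 3 x 5 held out for testing
X_train = np.array([[3, 1], [3, 2], [3, 3], [3, 4], [3, 6], [3, 9], [3, 10]])
y_train = np.array([3, 6, 9, 12, 18, 27, 30])

model = LinearRegression()
model.fit(X_train, y_train)

# Does the model give something close to 15 for the held-out pair (3, 5)?
print(model.predict([[3, 5]]))   # expected to be close to 15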
Workflow Guidelines
Keep the final step in mind from the very beginning
From the start of model building, think ahead to the end, because every step depends on the previous one. It is like a chain: if one link is wrong, you have to go back to the beginning. So choose the ultimate target, the dataset and the algorithm with care and thought.
You may have to go back to a previous step at any time
Suppose you have a good-quality dataset, but what you actually want is a model that outputs the sum of two numbers. The dataset you have simply does not match what you want, so you have to replace it with a dataset of additions, and then train the model again.
Data needs to be prepared
Raw data will never arrive in the form you need; for model training it must be preprocessed. Data preprocessing is the most time-consuming part.
The more data, the better
In practice, the more (and better) data you can feed the model, the better its prediction accuracy. This rule is indispensable.
A problem is only worth solving well
Do not cling to bad solutions. If you do not get adequate performance even after many attempts at a problem, step back and ask:
Am I asking the right question?
Do I have the data necessary to solve the problem?
Is the chosen algorithm appropriate?
If you cannot get satisfactory answers, it may be better not to force a solution at all: a model that is right 50% of the time and wrong the other 50% is worse than honestly admitting that the problem is not yet solvable.
"There are no right answers to wrong questions." - Ursula K. Le Guin
"Ask the right questions if you're going to find the right answers." - Vanessa Redgrave
"In school, we're rewarded for having the answer, not for asking a good question." - Richard Saul Wurman
The first step of building a machine learning model
Workflow revision
If we want to apply the general method, the first thing we need is a problem to solve with it. Let's state our chosen problem.
Problem statement
Predict whether a person will develop diabetes.
Looking at this statement, you might think we already have our question, so why talk about asking new questions? Is this not the problem we are going to solve?
The answer is no. It is a mistake to define a machine learning problem with just one line; this question has to be broken into small, specific tasks, and those have to be worked through:
Finding data
Data inspection and data cleaning
Data exploration
Converting the data into tidy data
All of this will be done in a Jupyter Notebook.
So what is tidy data?
Tidy Data
Tidy data is data that can easily be modelled and visualized and that has a specific structure.
Features of tidy data:
Each variable is a column
Each observation is a row
Each observational unit is a table
Converting a collected dataset into tidy form is somewhat time consuming; 50-80% of the time spent on machine learning projects goes into collecting, cleaning and organizing data.
Data collection
What are good sources of data?
Google
If you search Google you will of course find data, but be a little careful: there is a lot of junk, fake and outdated data out there, which may be fine for testing. For a serious project, try to collect verified data.
Government databases
Government databases are really good sources for data collection, because you can find fairly varied data there. Some government databases also have good documentation for checking the data.
Professional or company data sources
Very good sources. Some professional societies share their databases. Twitter shares tweets and its own analysis of tweets. Financial data is available from free company APIs; Yahoo!, for example, shares these kinds of datasets.
The company you work in
The company you work for may also be a good source of data.
University data repositories
Some universities offer datasets free of charge, such as the University of California, Irvine. They have their own data repositories from which you can collect data.
Kaggle
You cannot work in machine learning without knowing Kaggle's name. You can call it the Codeforces of data scientists: there are constant contests on data analysis, and it is unrivalled for high-grade datasets.
GitHub
Yes, there is a large amount of data available on GitHub. You can check out the Awesome Dataset Collection.
All of the above together
Sometimes a single source does not have all the data you need, and no single source can be relied on for everything. Then integrate data from all the sources and turn it into tidy data.
Where will we collect the dataset for our selected problem?
Pima Indian Diabetes Data
Data files
Dataset details
As mentioned, we will collect the diabetes dataset from the UCI Machine Learning repository.
Some features of this dataset:
Female patients at least 21 years old
768 observations (768 rows)
Each row has 10 columns
9 of the 10 columns are features, e.g. number of pregnancies, blood pressure, glucose, insulin level ... etc.
The remaining column is the label: whether the patient has diabetes (True / False)
Using this dataset we will work out the solution to the problem. Before that, let's take a look at some data rules.
Data Rule # 1
Whatever you are trying to predict should actually be present in the dataset
This rule sounds like pure common sense, but it is often not satisfied. Since we want to figure out how likely a person is to have diabetes, this dataset is perfect for our task, because there is a column that directly says whether the tested person was diagnosed with diabetes.
For many problems the dataset will not contain exactly what you want to predict. Then you will have to rework the dataset and arrange it so that it contains your target variable (the attribute being predicted, here whether the person has diabetes), or something close to it.
Data Rule # 2
No matter how good the dataset looks, it will never be in exactly the format you want to work with.
So the task after collecting data is data preprocessing, which is what we discuss now.
CSV (Comma Separated Values) download and the raw data
If you visit the UCI link, you can see links to two files, a .data file and a .names file.
The values in the .data file are comma separated, but the file extension is not .csv, and the file itself does not say what each value means (that is in the separate .names file).
So, for convenience, I uploaded .csv files in which the column headers indicate which property each column holds. Download the two files and place them on your PC.
csv pima dataset download (original)
csv pima dataset download (modified)
Note
original: the diabetes column is recorded as 1 / 0
modified: all 1 / 0 values replaced with TRUE / FALSE
Data exploration with the Pandas library
Do you know a bit about the IPython / Jupyter notebook? If not, take a look first.
Open cmd on Windows and start the notebook with the following command:
ipython notebook
If that command does not work, try jupyter notebook
Once your browser opens, choose New > Python 3 to open a new Python file and perform the tasks shown here.
Import required libraries
Before starting work, we add the necessary libraries with the code below:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Magic function of Jupyter Notebook for inline plotting
# (we do not want plots to open in separate windows)
%matplotlib inline
Data load and review
pd.read_csv(r'file_path')
Here we have imported the pandas library as pd, so to call any pandas function we do not need to write pandas; pd is enough.
If I had done this instead:
import pandas as PANDA
then to call the function I would write PANDA.read_csv('file_path').
The name read_csv makes the function's job obvious: it reads a csv file.
This function converts the csv file into a Pandas DataFrame, which can then be manipulated with the Pandas library.
In read_csv('file_path') I passed, as the argument, the path where the csv file lives on my PC. You must give the path where the file is on your PC.
data_frame.shape
Since the DataFrame's data is a matrix (a 2D array), we look at the shape attribute to see the number of rows and columns.
Output: rows - 768 (excluding the header) and columns - 10
data_frame.head(number)
Calling data_frame.head(3) prints the first 3 rows of the DataFrame.
data_frame.tail(number)
Calling data_frame.tail(4) prints the last 4 rows of the DataFrame.
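Putting these together, a minimal sketch; the file name pima-data.csv is an assumption, use whatever path your downloaded file has.

import pandas as pd

# Load the Pima diabetes csv into a DataFrame (adjust the path for your PC)
data_frame = pd.read_csv(r'pima-data.csv')

print(data_frame.shape)    # (rows, columns), e.g. (768, 10)
print(data_frame.head(3))  # first 3 rows
print(data_frame.tail(4))  # last 4 rows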
This chapter ends here, but it was only the first part of data preparation. In the next phase we will discuss the fundamentals of data preprocessing.
Data Processing - Last Part
"Organize, do not agonize." - Nancy Pelosi
Data Preparation (Data Preprocessing) - 2
Changing the DataFrame
A dataset will almost always have some missing data, and we have to handle those missing values. Maybe we will be lucky and have none, but if we take no precautions the program can crash.
Which columns should be excluded?
Columns that will not be used
Columns that exist but contain no data
If the same column appears multiple times, all but one must be deleted
Often two columns have different names but actually hold the same fact. For example, one column is labelled Length and another is labelled Size (centimeter); at first glance they look like two different things because the labels differ. On inspection, though, we notice that the Size values are simply each Length value multiplied by 100. Finding such duplicated columns by hand is neither practical nor efficient, and these extra columns only generate noise in the dataset. We will detect duplicate columns here using correlation.
What is a correlated column?
A column that carries the same information in a different format. In the example above, Length and Size are really the same thing; only the unit differs. So they are correlated columns.
Also exclude columns that add little or no information for the learning algorithm.
A little bit about linear regression
To understand the next example, we need some basics of linear regression.
Consider the following imaginary dataset.
Graph
If you are asked how much a 5 sq. ft. house will cost, you can say without hesitation: 25 lac.
How did you work that out?
It is easy: for every 1 sq. ft., the price increases by 5 lac.
If we want to set this up as a mathematical model, it looks like this:
price = size (sq. ft.) × 5 (Tk. in lac)
Or,
y = f(x) = α × x
Here the price is y, the size is x, α is 5, and the function f(x) tells us the price for a given x.
In reality the model is not this simple; there are many other factors. Here a single α is enough, but in general you may need several coefficients (β, γ, θ, ...) and even then only get close.
Now look at the following dataset.
Graph
If you are asked what the price will be when the house size is 6 sq. ft., you are in trouble, because the increase in price is no longer proportional to the increase in size. You could try to work with the successive differences, but the problem is no longer that easy.
If I were asked to write a mathematical model for this, I would be quite stuck. Is there any linear equation that maps the inputs 1, 2, ... 5 to 10, 12, ... 22?
We cannot build an exact model, but we may be able to build an approximate one, which comes quite close to the underlying equation.
Examples of correlated columns
Let's return to the famous House Price Prediction problem.
As said earlier, the correlation of a variable with itself is always 1. The values along the diagonal of the correlation table must therefore be 1, because each is the correlation of a column with itself. Off the diagonal, corr_value measures how one variable varies with another. Since we compute these values using libraries, there is no need to calculate them by hand.
The remaining choice is the colour of the heatmap. There is nothing to worry about: the built-in colour maps of the Matplotlib library are enough for now. If you want, you can pick your own colours from the documentation; for now we will use the default.
Matplotlib heat map colour guide
Matplotlib will set the colours in the sequence below when generating a heatmap.
Less correlated to more correlated:
Blue -> Cyan -> Yellow -> Red -> Dark Red (correlation 1)
Heatmap generating function
Let's write a function to generate a heatmap; the function looks like this:
# Here size means plot size
def corr_heatmap(data_frame, size=11):
    # Getting correlation using Pandas
    correlation = data_frame.corr()
    # Dividing the plot into subplots to control the size of the plot
    fig, heatmap = plt.subplots(figsize=(size, size))
    # Plotting the correlation heatmap
    heatmap.matshow(correlation)
    # Adding xticks and yticks
    plt.xticks(range(len(correlation.columns)), correlation.columns)
    plt.yticks(range(len(correlation.columns)), correlation.columns)
    # Displaying the graph
    plt.show()
Why use subplots?
You could generate the heatmap with just plt.matshow(correlation), but then you cannot control the figure size, so we create a figure of custom size with subplots and draw the heatmap on it.
What are xticks and yticks?
plt.xticks(range(len(correlation.columns)), correlation.columns)
This says that each block is 1 unit long, the ticks sit at 0, 1, 2 ... len(correlation.columns) - 1, and each tick is labelled with the corresponding column name from correlation.columns.
The same applies to plt.yticks.
What does plt.show() do?
You're kidding, right?
Plotting the heatmap with the corr_heatmap(data_frame, size) function
We went to the trouble of writing the function, so let's use it. With this one-line snippet we can plot the heatmap:
corr_heatmap(data_frame)
Generated heatmap, closer view
Remarkable
As expected, wherever two columns carry the same information, their correlation is 1. Along the diagonal the correlation of each column with itself is computed, so the diagonal blocks are dark red.
But notice that the correlation between skin and thickness is also 1 (dark red).
So skin and thickness are actually the same thing; only the unit has changed. Don't believe it?
Multiply each value of thickness by 0.0393701 and you will get the value of skin. 1 millimeter = 0.0393701 inch.
Now can you tell which column is in which unit?
Great, on to dataset cleaning
From the work above we realized that we have two columns of the same type. One of the attributes of tidy data is that each column must be unique, so we keep one of the duplicates and drop the rest from the dataset.
I am going to drop the skin variable here; if you prefer, you can drop thickness instead. It is entirely up to you.
# Deleting 'skin' column fully
del data_frame['skin']
# Checking if the action was successful or not
data_frame.head()
We have dropped a duplicate column. The work is not finished yet: the data still has to be molded. There is nothing to worry about, this is the last step of data preparation. So cheers!
Data Molding
Data type adjustment
Our dataset should be in a form that works with every algorithm. Otherwise we would need to tweak the data separately for each algorithm, which is a lot of trouble. So we take the trouble once, here, so that it does not become a headache later.
Data type checking
Before changing anything, check the data types once:
data_frame.head()
Looking at the sample rows, you will notice that almost all the values are float or integer, but there is one boolean-typed column.
Data type changing
We will turn True into 1 and False into 0. This can be done with the snippet below:
# Mapping the values
map_diabetes = {True: 1, False: 0}
# Setting the map to the data_frame
data_frame['diabetes'] = data_frame['diabetes'].map(map_diabetes)
# Looking at what we have done
data_frame.head()
Congratulations!
This molded and cleaned dataset will work with whichever algorithm we want.
But?
Data Rule # 3
Rare events are hard to predict with high accuracy
This is natural: a rare event means there are few examples of it in your dataset, and the fewer the examples, the worse the prediction for those events. It is better not to worry about this up front; get the ordinary predictions working first, and deal with rare events later, if at all.
Some other analysis
True / False ratio check
If we want to see what percentage of this dataset has diabetes and what percentage does not, we open the notebook and write the following code:
num_true = 0.0
num_false = 0.0
for item in data_frame['diabetes']:
    if item == True:
        num_true += 1
    else:
        num_false += 1
percent_true = (num_true / (num_true + num_false)) * 100
percent_false = (num_false / (num_true + num_false)) * 100
print("Number of True Cases: {0} ({1:2.2f}%)".format(num_true, percent_true))
print("Number of False Cases: {0} ({1:2.2f}%)".format(num_false, percent_false))
Output:
Number of True Cases: 268.0 (34.90%)
Number of False Cases: 500.0 (65.10%)
We can write the same thing in a more Pythonic way in four lines:
# Pythonic Way
num_true = len(data_frame.loc[data_frame['diabetes'] == True])
num_false = len(data_frame.loc[data_frame['diabetes'] == False])
print("Number of True Cases: {0} ({1:2.2f}%)".format(num_true, (num_true / (num_true + num_false)) * 100))
print("Number of False Cases: {0} ({1:2.2f}%)".format(num_false, (num_false / (num_true + num_false)) * 100))
Data Rule # 4
Keep a history of your data manipulation and check it regularly
There are two ways to do this:
Using the Jupyter Notebook itself
Using a version control system, such as Git, SVN, BitBucket, GitHub, GitLab etc.
Summary
What have we done in these two episodes?
Read the data with Pandas
Got an idea of correlation
Removed duplicate columns
Molded the data
Checked the true / false ratio
Algorithm Selection
Model Training and Model Performance Testing - 1
Model Performance Testing - Last Part
Linear Regression
We have now come to the third step of machine learning: how to determine which of the many learning algorithms will be the best choice.
If you look at the work sequence again, it stands as follows.
Overview of this chapter
Basic or Enhanced?
Enhanced
A variation of the basic algorithms
Usually better performance than basic (supposedly :P)
Extra conveniences
Complex
Basic
Easy
Easy to understand
Since we are just getting started, it is better to stay with Basic.
Three candidate algorithms after filtering
We now have three algorithms:
Naive Bayes
Logistic Regression
Decision Tree
I will pick one of these. After a short discussion of all three, we will decide which to use. These are three basic, classic machine learning algorithms; complex algorithms are largely built using them as building blocks. Let's start with Naive Bayes.
Naive Bayes
Gives a binary result
Weights the input variables / features (all features may not be equally important)
Let's see the next algorithm.
Decision Tree
Easy to understand
Works fast (roughly 100 times faster than typical algorithms)
The model stays stable even if the data changes a little
Debugging is comparatively easy
The biggest reason: this approach matches our problem nicely, because we are trying to find a likelihood, and estimating likelihood is exactly what it does :)
Summary
Many learning algorithms are available
The selection I made:
Learning Type - Supervised
Result - Binary Classification
Complexity - Non-Ensemble
Basic or Enhanced - Basic
Naive Bayes was selected for training, because it is easy, fast and stable
Model Training
So far we have worked through the solution statement, data collection, preprocessing and algorithm selection. Now we will see how to train the model. According to the flowchart, we are at the following step.
Overview of this chapter
# in older scikit-learn versions train_test_split lives in sklearn.cross_validation
from sklearn.model_selection import train_test_split

feature_column_names = ['num_preg', 'glucose_conc', 'diastolic_bp', 'thickness', 'insulin', 'bmi', 'diab_pred', 'age']
predicted_class_name = ['diabetes']

# Getting feature variable values
X = data_frame[feature_column_names].values
y = data_frame[predicted_class_name].values

# Saving 30% for testing
split_test_size = 0.30

# Splitting using scikit-learn train_test_split function
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=split_test_size, random_state=42)

With random_state=42 the split is guaranteed to be the same every time the program runs.
Is the split really 70-30? Let's check:
print("{0:0.2f}% in training set".format((len(X_train) / len(data_frame.index)) * 100))
print("{0:0.2f}% in test set".format((len(X_test) / len(data_frame.index)) * 100))
Output:
69.92% in training set
30.08% in test set
Close enough!
What about missing data? (0 values, not null values)
Sometimes a column has plenty of values, but when you check you see that many of them are 0, which is physically impossible (a blood pressure of 0, for example). How do we deal with this? There is a technique that replaces the 0 values with an average value so that the rows stay usable. Before applying it, let's see how many of our values actually are 0:
print("# rows in dataframe {0}".format(len(data_frame)))
print("# rows missing glucose_conc: {0}".format(len(data_frame.loc[data_frame['glucose_conc'] == 0])))
print("# rows missing diastolic_bp: {0}".format(len(data_frame.loc[data_frame['diastolic_bp'] == 0])))
print("# rows missing thickness: {0}".format(len(data_frame.loc[data_frame['thickness'] == 0])))
print("# rows missing insulin: {0}".format(len(data_frame.loc[data_frame['insulin'] == 0])))
print("# rows missing bmi: {0}".format(len(data_frame.loc[data_frame['bmi'] == 0])))
print("# rows missing diab_pred: {0}".format(len(data_frame.loc[data_frame['diab_pred'] == 0])))
print("# rows missing age: {0}".format(len(data_frame.loc[data_frame['age'] == 0])))
Output:
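Whatever the exact counts turn out to be, the mean-replacement of 0 values mentioned above can be sketched with scikit-learn. SimpleImputer and the missing_values=0 setting are my assumptions about how to do it, not something fixed by the text.

from sklearn.impute import SimpleImputer

# Replace 0 values with the mean of each column (fit on training data only)
# Note: this treats every 0 in every column as missing, including num_preg
fill_0 = SimpleImputer(missing_values=0, strategy='mean')
X_train = fill_0.fit_transform(X_train)
X_test = fill_0.transform(X_test)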
We already know that a ROC curve has TPR on the Y-axis and FPR on the X-axis. So we can easily place these four coordinates in ROC space.
Coordinates
Coordinate -> Model (X = FPR, Y = TPR)
G point -> Gaussian Naive Bayes (0.28, 0.63)
L point -> Logistic Regression (0.77, 0.77)
R point -> Random Forest (0.88, 0.24)
A point -> Artificial Neural Network (0.12, 0.76)
We'll plot these points now.
import numpy as np
import matplotlib.pyplot as plt

# (fpr, tpr) for each model
naive_bayes = np.array([0.28, 0.63])
logistic = np.array([0.77, 0.77])
random_forest = np.array([0.88, 0.24])
ann = np.array([0.12, 0.76])

# plotting
plt.scatter(naive_bayes[0], naive_bayes[1], label='Naive Bayes', facecolors='black', edgecolors='orange', s=300)
plt.scatter(logistic[0], logistic[1], label='Logistic Regression', facecolors='orange', edgecolors='orange', s=300)
plt.scatter(random_forest[0], random_forest[1], label='Random Forest', facecolors='blue', edgecolors='black', s=300)
plt.scatter(ann[0], ann[1], label='Artificial Neural Network', facecolors='red', edgecolors='black', s=300)

plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc='lower center')
plt.show()
This example is adapted from Wikipedia; there are a few extra points beyond Wikipedia's ROC curve.
I used separate scatter calls so that each point can be explained individually. If you have many models, or if you plot points for many parameter settings of the same model, your plotted ROC becomes an actual curve. Here there are only 4 models, so a line plot would not be meaningful and a scatter plot is used instead.
ROC curve explained
A 100% accurate model has FPR = 0 and TPR = 1. From this it is easy to see that in this example the ANN model is best, then Naive Bayes, then Logistic Regression, and the Random Forest performed worst.
To be clear (and this bears repeating): it is not always ANN > NB > LR > RF; performance differs with the type of dataset and problem. The numbers here are imagined purely for illustration.
The dashed line through the middle is called the line of no-discrimination. Points above it are good, and the further above the better; points below it are worse than random.
AUC or Area Under Curve
AUC is the area covered under a ROC curve. For a 100% accurate model the AUC is 1, i.e. the area of the whole graph.
There has been plenty of debate about measuring performance by AUC alone; these days most people prefer to look at the ROC itself, so I will not say much more about AUC.
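If you want to compute a ROC curve and AUC for one of our own models rather than the made-up points above, here is a minimal sketch with scikit-learn. It assumes a fitted classifier with a predict_proba method, such as the lr_model trained in the logistic regression section below, together with the X_test / y_test split from earlier.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Probability of the positive class for each test row
y_scores = lr_model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test.ravel(), y_scores)
print("AUC:", auc(fpr, tpr))

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()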
Overfitting
It was mentioned earlier that sometimes a model looks so good that its accuracy on the training data is around 95-99%, yet on the test data it is not even 40% accurate.
Why does that happen?
The dataset we train with contains noise as well as real signal; you will never get a 100% pure dataset.
A classic example: suppose I collect a dataset of how many hours I studied, how many hours I slept, and how many marks I got in the exam. I train a model on this dataset, and it notices that, in this data, more sleep goes with more marks. Based on that, before the next exam I sleep plenty and do not study at all (because the model "learned" that more sleep means more marks). You can guess what the result will be.
So why did the model give such a wrong prediction? Two reasons: (1) there is not enough data, and (2) there are too few columns in the dataset (only hours studied and hours slept). Marks can be good for many reasons: if the test is multiple choice, guessing can help; the questions may simply have been easy; and so on. I trained without those inputs, so the model did the only thing it could: without knowing the real causes, it adapted itself to the given dataset in whatever way made the error lowest.
Training a model means reducing error, and each model's parameters are set by mathematical analysis to reduce that error. Reducing error is normal and desirable, but if the model adapts itself to the noise of the dataset in order to push the error down, that creates real trouble.
We will look at overfitting in more detail in a few steps.
Reducing overfitting
One way to reduce overfitting is to collect more data and more useful columns; the purer the dataset, the better the prediction. Beyond the dataset itself, overfitting can also be reduced by tuning the algorithm well. We will look at one such method.
Regularization & the Regularization Hyperparameter
We can control how an algorithm learns. A machine learning algorithm is, underneath, a mathematical model, so the model's learning can be controlled through certain parameters.
Suppose a model's output is given by this formula:
Y = ax^3 + bx
We can control its learning by subtracting a term from the result, creating a regularized model:
Y = ax^3 + bx - λ × x
Here the regularization hyperparameter is λ.
Note that Y will now be slightly lower than the earlier prediction, so the accuracy on the training dataset will be a little lower. But that is good! Why? Because the model is no longer memorizing every point of the dataset; the regularization hyperparameter will not let it. The larger λ is, the bigger the penalty applied to the predicted value. So we can call this a penalized machine learning model.
Whenever the model tries to adapt itself to the dataset's noise just to reduce error, the λ term pulls it back with a penalty. Our job is to tune this λ so that accuracy is good on the testing dataset, not just on the training dataset.
Improving Accuracy via Regularization Hyperparameter Tuning in Logistic Regression Models
The topic title became a bit long. We just learned that we can reduce overfitting by adding regularization to the mathematical model. The regularization hyperparameter differs from model to model. Logistic regression is implemented in the scikit-learn library, which provides a convenient interface for its regularization hyperparameter.
Our task will be to collect prediction scores while changing the regularization hyperparameter's value, and then keep the hyperparameter value for which the score is highest.
We have seen the theory; now for the practical part. Take out the notebook and write the code.
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

lr_model = LogisticRegression(C=0.7, random_state=42)
lr_model.fit(X_train, y_train.ravel())
lr_predict_test = lr_model.predict(X_test)

# training metrics
print("Accuracy: {0:.4f}".format(metrics.accuracy_score(y_test, lr_predict_test)))
print("Confusion Matrix")
print(metrics.confusion_matrix(y_test, lr_predict_test, labels=[1, 0]))
Output
Accuracy: 0.7446
Confusion Matrix
[[ 44  36]
 [ 23 128]]
Classification Report
             precision    recall  f1-score   support
          1       0.66      0.55      0.60        80
          0       0.78      0.85      0.81       151
avg / total       0.74      0.74      0.74       231
Earlier we did the same kind of evaluation for the Naive Bayes model. Here C is our regularization hyperparameter; we started with one fixed value, and next we will check the accuracy for a range of its values.
C (Regularization Hyperparameter) Values
C_start = 0.1
C_end = 5
C_inc = 0.1

C_values, recall_scores = [], []

C_val = C_start
best_recall_score = 0
while C_val < C_end:
    C_values.append(C_val)
    lr_model_loop = LogisticRegression(C=C_val, random_state=42)
    lr_model_loop.fit(X_train, y_train.ravel())
    lr_predict_loop_test = lr_model_loop.predict(X_test)
    recall_score = metrics.recall_score(y_test, lr_predict_loop_test)
    recall_scores.append(recall_score)
    if recall_score > best_recall_score:
        best_recall_score = recall_score
        best_lr_predict_test = lr_predict_loop_test
    C_val = C_val + C_inc

best_score_C_val = C_values[recall_scores.index(best_recall_score)]
print("1st max value of {0:.3f} occured at C = {1:.3f}".format(best_recall_score, best_score_C_val))

%matplotlib inline
plt.plot(C_values, recall_scores, "-")
plt.xlabel("C value")
plt.ylabel("recall score")
Since C is the regularization hyperparameter, and I want to see the recall_score for different values of C (the higher the recall_score the better), I set C_start = 0.1 and C_end = 5 and increase C by 0.1 on each pass of the loop.
For every value of C, I check the recall on the test predictions; whenever the recall is greater than the best so far, the current score is assigned to best_recall_score.
If you followed the earlier material, the code is not difficult to understand. The two lists C_values and recall_scores simply store the values for plotting.
Output
Graph of how performance changes as the value of C increases.
Since you already have a good grip on the matplotlib library, let's just draw a graph.
import matplotlib.pyplot as plt
import numpy as np

beton = np.array([20, 30, 40, 50, 60])   # monthly income (thousand Tk.)
khaoa = np.array([5, 10, 15, 20, 25])    # times eating out per month

plt.xlabel('Income per month (thousand Tk.)')
plt.ylabel('Times eating out per month')
# Income vs eating out
plt.title("Income vs Eating Out")
plt.plot(beton, khaoa)
plt.show()
Graph
If I ask you, in the sixth month, how many times will you go out to eat? You can answer without difficulty: 30 times (assuming income keeps increasing at the same rate).
You just made a prediction, and a mathematical model can be made for it:
$$ \text{EatingOut} = \frac{\text{Income} - 10k}{10k} \times 5 $$
With this equation you can verify the dataset.
Here I created an equation; this is a linear model where you give an income as input and it tells you how many times you will eat out. The graph is the visual proof that the model is linear: your dataset fits the linear model without any doubt.
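A quick check of this formula against the arrays defined above; a minimal sketch, the formula and data are exactly those given in the text.

import numpy as np

beton = np.array([20, 30, 40, 50, 60])   # income per month, in thousands
khaoa = np.array([5, 10, 15, 20, 25])    # times eating out per month

predicted = (beton - 10) / 10 * 5        # EatingOut = (Income - 10k) / 10k * 5
print(predicted)                         # [ 5. 10. 15. 20. 25.] - matches khaoa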
Now let's think of another scenario.
Income per month vs how many times you ate out each month:
In this dataset your income rises in the early months, and at first your eating-out habit rises with it, but later you get it under control even though income keeps growing.
A scatter plot of this dataset:
If I now ask, how many times will you go out next month if your income is 15k? There is no linear pattern in the dataset any more, no single obvious equation, so you cannot predict so easily.
Non-linearity like this can be handled with models beyond the linear one; that is a later discussion. For now we will stick to the linear pattern. We understand linear regression; now, what is model representation?
Model Representation
Put simply, model representation is the notation we will use for the analysis we run on a dataset: what the symbols mean, how they are written, and how they fit together. Why is it necessary? Because when you go on to read theoretical machine learning books, the models there are written with all sorts of mathematical symbols; to understand them you also need to know the standard notation.
We need your friend's house dataset again, so here it is once more.
House size (in sq. ft.), which we call x, and house price (in Tk.), which we call y.
This dataset has 47 rows, so we write:
m = 47
x = "input" variable / feature
y = "output" variable / "target" value
(x, y) in this notation denotes one row; it can be any one of them.
If I mean the 20th row specifically, I write (x^(20), y^(20)).
Hypothesis
Let's see a diagram,
The question is,
How do we create this? Since today's chapter can be considered with
a linear model, we can choose a linear function.
Well, we're writing Hypothesis, who we write in the short hand.
Notice that the input variable here is just one, so let us say Univariate
Linear Regression.
Cost Function
We usually budget our income and spending, always trying to keep the spending as low as possible. It is exactly the same in machine learning: the whole effort goes into making the cost function as small as possible. Training the model means minimizing the cost function.
Before minimizing the cost function, we need to understand what it actually is, and before that, one more thing.
We have chosen the form of the hypothesis, h_θ(x) = θ_0 + θ_1 x. We know its form, but not yet what values θ_0 and θ_1 should take. Let's explore what the hypothesis looks like for different values.
First, make a scatter plot of your friend's dataset.
What we have to do is fit a straight line through this data. The parameters must be chosen so that the dataset matches our hypothesis (prediction) as well as possible, i.e. the cost is lowest, or the cost function is minimized.
Before finding the values of θ that fit the dataset, let's see how the graph of h changes as the parameters (θ) change.
Take one pair of values for θ_0 and θ_1; then the graph comes out like this.
Take another pair; then the graph changes like this.
It almost fits, but not quite. What we have to do is shift the line a bit. Remember y = mx + c? m is the slope and c is the constant whose job is to move the straight line along the positive Y-axis (for positive c).
Here θ_0 plays the role of c and θ_1 plays the role of m.
The green line is our new hypothesis; the red line is the previous one.
All understood, but we still have not defined the cost function? There is nothing to worry about; we have set up the hypothesis, and now we will look at the cost function.
The cost function is written here as J(θ_0, θ_1). If our model had more parameters, we would list them all inside J. For linear regression it is defined as
$$ J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$
That is, for each row we take the error e_i = h_θ(x^(i)) - y^(i), square it, add the squares up, and multiply by 1/(2 × m). For example, for the first row:
e_1 = h(x^(1)) - y^(1) = 342480 - 399900 = -57420
In the same way we can compute the remaining errors (by hand or with Python), plug them into the formula and calculate the cost:
e_1^2 + e_2^2 + e_3^2 + e_4^2 + e_5^2 = 14534002800
Multiplying by 1/(2 × m) with m = 5:
J(90000, 120) = 1453400280
In this way we calculate the cost for various combinations of θ_0 and θ_1; whichever combination gives the lowest cost is the one we use to build the linear model and run the performance test.
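A quick way to reproduce this arithmetic in Python. The five (size, price) rows below are my reconstruction from the worked numbers above (they give exactly e_1 = -57420 and a squared-error sum of 14534002800 for θ_0 = 90000, θ_1 = 120), so treat them as an assumption rather than the book's official table.

import numpy as np

# Reconstructed 5-row dataset: house size (sq. ft.) and price
x = np.array([2104, 1600, 2400, 1416, 3000])
y = np.array([399900, 329900, 369000, 232000, 539900])

theta0, theta1 = 90000, 120
m = len(x)

errors = theta0 + theta1 * x - y          # h(x) - y for every row
print(errors[0])                          # -57420, as in the text
print(np.sum(errors ** 2))                # 14534002800
print(np.sum(errors ** 2) / (2 * m))      # J(90000, 120) = 1453400280.0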
Frequently Asked Questions:
Why is the cost function multiplied by 1/2?
It just makes the later mathematical work (the derivative) simpler; there is nothing else to it. If you do not multiply by the half, nothing breaks.
What do Residual, MSE (Mean Squared Error), OLS (Ordinary Least Squares), Loss Function and Residual Sum of Squares (RSS) mean?
They are all names connected to the same squared-error idea we have been computing: a residual is h(x) - y for one row, RSS is the sum of squared residuals, MSE is that sum averaged over m, OLS is the method that picks θ by minimizing it, and loss function is the general term for the quantity being minimized.
Graph
For the dataset graphed here, take the simple one-parameter hypothesis
h_θ(x) = θ × x
Take θ = 0.1. The plot looks like this (hypothesis for θ = 0.1), and the cost is
J(0.1) = (1 / (2 × 3)) × (4^2 + 40^2 + 400^2) = 26936.0
Take θ = 0.2. The plot looks like this (hypothesis for θ = 0.2), and the cost is
J(0.2) = (1 / (2 × 3)) × (3^2 + 30^2 + 300^2) = 15151.5
Take θ = 0.3. Then
J(0.3) = (1 / (2 × 3)) × (2^2 + 20^2 + 200^2) = 6734.0
Take θ = 0.4. Then
J(0.4) = (1 / (2 × 3)) × (1^2 + 10^2 + 100^2) = 1683.5
And at θ = 0.5 the errors vanish, so the cost is 0: the minimum.
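These numbers can be reproduced with a few lines of Python. The three (x, y) points below are my reconstruction from the error terms above (x = 10, 100, 1000 with y = 0.5x reproduces every cost value exactly), so treat them as an assumption.

import numpy as np

x = np.array([10, 100, 1000])     # reconstructed inputs
y = np.array([5, 50, 500])        # reconstructed targets (y = 0.5 * x)
m = len(x)

for theta in [0.1, 0.2, 0.3, 0.4, 0.5]:
    cost = np.sum((theta * x - y) ** 2) / (2 * m)
    print("J({0}) = {1}".format(theta, cost))
# J(0.1) = 26936.0, J(0.2) = 15151.5, J(0.3) = 6734.0, J(0.4) = 1683.5, J(0.5) = 0.0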
We have finally learned quite a lot about the cost function. Now we will see cost function minimization using gradient descent.
Gradient Descent Algorithm
Do you remember calculus? Differentiation? That is what we will use now. If you do not mind, let's look at differentiation first.
Differentiation: a method for calculating the slope of a function at a specific point
The derivative of a function at a point is the slope of the tangent to the function at that point. Say y = f(x) is a function and we want to know the slope of the tangent at the point (x_1, y_1), i.e. what angle it makes with the x-axis. Then we differentiate f(x) with respect to the independent variable x, written with the operator dy/dx.
Let's see the picture below.
Slope
The formula for the slope is m = Δy / Δx.
Partial Derivative
We will mostly use partial derivatives, because a function does not always depend on just one variable. For example, consider z = f(x, y) = x^2 + xy + y^2; here the variable z depends on both x and y. If we want to track how z changes with respect to x and with respect to y separately, an ordinary derivative is not enough; we take partial derivatives:
z = x^2 + xy + y^2
∂z/∂x = 2x + y
∂z/∂y = 2y + x
If the cost function has a single parameter we take an ordinary derivative, but when the cost function has two or more parameters we must take partial derivatives. For now, we will try to understand gradient descent with a one-parameter cost function.
The question is, what will we do with this slope? With a little (!) calculus we can save an enormous amount of computation: we will try to minimize the cost using the concepts of differentiation and slope, and the algorithm we use for that is gradient descent.
Gradient Descent
Algorithm
repeat until convergence {
θ_j := θ_j - α × ∂/∂θ_j J(θ)
}
Mathematical notation vs programming:
"x and y are equal": math x = y, programming x == y
"assign the value of y to x": math x := y, programming x = y
"update x": math x := x + 1, programming x = x + 1
So := means that θ_j's value is updated on every iteration. Here α is the learning rate.
Gradient Descent Intuition
What is the algorithm actually saying? We already know that training a machine learning model means setting the parameters of the model so that our prediction is as good as possible. Let's try to understand, through a few graphs, what the gradient descent algorithm does.
Take our cost function J(θ_1).
Pick any value of θ_1 and differentiate at that point. If the slope is positive, J(θ_1) increases in that direction, so the update moves θ_1 in the opposite direction, where the value decreases. The picture below makes this clear.
Now take another point, to the left of the local minimum; there the slope is negative, so the update moves θ_1 to the right.
In other words, the derivative in gradient descent tells us in which direction the cost function decreases. This was for one parameter; it is not convenient to visualize hundreds of parameters, but in every case the idea is exactly the same.
The update continues until the minimum point is reached; at the minimum the slope is zero, so the algorithm stops changing θ automatically.
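A minimal numeric sketch of this one-parameter case, using the reconstructed three-point dataset from the cost-function section (again an assumption, used only for illustration).

import numpy as np

x = np.array([10, 100, 1000])
y = np.array([5, 50, 500])
m = len(x)

theta = 0.0          # starting guess
alpha = 1e-6         # learning rate, kept small because x contains large values

for i in range(2000):
    gradient = np.sum((theta * x - y) * x) / m   # dJ/dtheta for J = (1/2m) * sum((theta*x - y)^2)
    theta = theta - alpha * gradient

print(theta)         # converges towards 0.5, the minimum found earlier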
Remarkable
Notice that the input variable is no longer just one as before; now there are several features, not just a single x. We need notation that distinguishes each column, so we put the column number as a subscript of x and the row index as a superscript: the superscript is the row index and the subscript is the column index.
Example (for the first row only):
x_1^(1) = size (feet^2)
x_2^(1) = number of bedrooms
x_3^(1) = number of floors
x_4^(1) = age of home
y^(1) = price
Second example
If we want to arrange the second row's input variables in a matrix, it looks like the one below; since we are not singling out any specific column, we put all the variables of the row into one vector, so we do not have to write the subscripts separately.
Hopefully the matrix notation of the third and fourth rows is clear as well. With the notation understood, we go straight to model building.
Hypothesis
The previous hypothesis was
h_θ(x) = θ_0 + θ_1 x
This will not work on a multi-variable dataset. The way out is to multiply a new parameter onto each variable:
h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3 + θ_4 x_4 ... (1)
Now we can try different values of the θs, for example
h_θ(x) = 80 + 0.1 x_1 + 0.01 x_2 + 3 x_3 - 2 x_4 ... (2)
In general, with x_0 = 1,
h_θ(x) = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3 + θ_4 x_4 + ⋯ + θ_n x_n ... (3)
Now take θ and x as column vectors of the parameters and the features. To multiply them as matrices, however, one must be a row vector and the other a column vector, while both of ours are column vectors. So we transpose one of them: transposing simply rearranges a matrix's rows into columns (or its columns into rows).
We therefore modify the theta matrix to θ^T, so that
h_θ(x) = θ^T x
which for n = 1 reduces to the single-variable hypothesis.
You could apply this formula with a loop, but there is no need: using NumPy we can do it very easily through matrix operations.
What did the notation in the previous chapters mean? Let me show it again through a simple example.
Here is my dataset, where m = 3.
Linear regression formula:
h_θ(x) = θ^T x, where the dimension of our θ is 2 × 1, i.e. two rows and one column.
In matrix form, the hypothesis for every row is computed at once as X · θ.
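The matrices referred to here were lost in formatting, so here is a minimal NumPy sketch of the same idea with made-up numbers: an m = 3 dataset with a bias column of ones, and a 2 × 1 θ as stated above.

import numpy as np

# Feature matrix X: first column is x0 = 1 (bias), second is the input value
X = np.array([[1.0, 10.0],
              [1.0, 20.0],
              [1.0, 30.0]])       # m = 3 rows

theta = np.array([[5.0],          # theta_0
                  [2.0]])         # theta_1 -> shape (2, 1)

# Hypothesis for every row at once: h = X . theta
h = X.dot(theta)
print(h)                          # [[25.], [45.], [65.]]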
Homework
Write the gradient descent formula in matrix form.
We will write this matrix calculation in Python. This is why a solid foundation in linear algebra is needed to pick up machine learning calculations quickly: applying machine learning algorithms is much easier for those who understand linear algebra and calculus well.
How to Apply the Cost Calculation and Gradient Descent Algorithm Using NumPy
Now we will use NumPy to write the cost calculation and gradient descent for a 97-row dataset.
Cost calculation function in Python
# Here, X, y and theta are 2D NumPy arrays
def computeCost(X, y, theta):
    # Getting number of observations
    m = len(y)
    # Getting hypothesis output
    hypothesis = X.dot(theta)
    # Computing loss
    loss = hypothesis - y
    # Computing cost
    cost = np.sum(loss ** 2)
    # Returning cost
    return cost / (2 * m)
What the function does:
Puts the number of observations in m
Works out the hypothesis values
Gets the loss, the difference between the predicted value and the original value
Gets the cost, the sum of the squares of the loss
Returns the cost divided by 2m
Gradient descent calculation function in Python
def gradientDescent(X, y, theta, alpha, iterations):
    cost = []
    m = len(y)
    for i in range(iterations):
        # Calculating loss
        loss = X.dot(theta) - y
        # Calculating gradient
        gradient = X.T.dot(loss)
        # Updating theta
        theta = theta - (alpha / m) * gradient
        # Recording the costs
        cost.append(computeCost(X, y, theta))
        # Printing out
        print("Cost at iteration {0}: {1}".format(i, computeCost(X, y, theta)))
    return theta, cost
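A short usage example with a tiny made-up dataset (the data, learning rate and iteration count are invented purely to show the call, not values from the book).

import numpy as np

# Tiny made-up dataset: first column is the bias term x0 = 1
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[7.0], [9.0], [11.0]])      # generated from theta = (5, 2)
theta = np.zeros((2, 1))                  # start from all zeros

# Prints the cost at every iteration, then returns the fitted parameters
theta, cost_history = gradientDescent(X, y, theta, alpha=0.2, iterations=300)
print(theta)                              # converges to roughly [[5.], [2.]]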
Dot product and matrix multiplication
For the vectors a = î + 2ĵ + 3k̂ and b = î + 2ĵ + 3k̂, the dot product is a · b = 1×1 + 2×2 + 3×3 = 14.
For matrices, if C = AB, the number of columns of A (n_1) must equal the number of rows of B (m_2), i.e. n_1 = m_2.
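In NumPy both of these are done with dot; a minimal sketch with the vectors above and two small made-up matrices.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
print(np.dot(a, b))        # 14, the dot product

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3): n1 = 3 columns
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])     # shape (3, 2): m2 = 3 rows, so n1 == m2
print(A.dot(B))            # [[ 4  5], [10 11]] -> shape (2, 2)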
Now suppose we want to add a vector v to every row of a matrix x. One way is to stack copies of v; in NumPy that can be done like this:
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])
# Stacking 4 copies of 'v' on top of each other (4 rows, 1 copy per row)
vv = np.tile(v, (4, 1))
print(vv)
# Prints "[[1 0 1]
#          [1 0 1]
#          [1 0 1]
#          [1 0 1]]"
y = x + vv  # Adding elementwise
print(y)
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
Actually there was no need for all this work; NumPy handles it automatically, and that is NumPy's broadcasting:
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v
print(y)
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
See NumPy User Guide, Release 1.11.0 - Section 3.5.1 (General Broadcasting Rules) for more details about broadcasting.
Broadcasting applications
import numpy as np

## Example 1
# Computing the outer product of vectors
v = np.array([1, 2, 3])  # shape (3,)
w = np.array([4, 5])     # shape (2,)
# To compute an outer product, we first reshape 'v' to be a column vector of shape (3, 1)
# Then we can broadcast it against 'w' to produce an output of shape (3, 2),
# which is the outer product of 'v' and 'w':
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print(v.reshape(3, 1) * w)

# Add a vector to each row of a matrix
x = np.array([[1, 2, 3], [4, 5, 6]])
# [[2 4 6]
#  [5 7 9]]
print(x + v)

## Example 2
# Let's add vector 'w' to the columns of 'x' [x.T == x.transpose()]
z = x.T + w
print(z)
# prints
# [[ 5  9]
#  [ 6 10]
#  [ 7 11]]
# Now we transpose back to the original shape
print(z.T)
# prints
# [[5 6 7]
#  [9 10 11]]
These are NumPy's basic operations. In the next phase we will begin to apply the formulas using the NumPy library.
CONCLUSION
This tutorial introduced you to machine learning. You now know that machine learning is a technique for training machines to perform tasks that normally require a human brain, often faster and better than the average person. We have seen that machines can beat human champions at games such as chess and Go (AlphaGo), which seem very complicated. You have seen that machines can be trained for human-like activity in many areas and can help people lead better lives.
Machine learning can be supervised or unsupervised. If you have a smaller amount of training data and it is clearly labelled, choose supervised learning. Unsupervised learning generally gives better performance and results for larger datasets, and if a very large dataset is easily available, deep learning strategies are worth going for. You have also been introduced to reinforcement learning and deep reinforcement learning, and you now know what neural networks are, along with their applications and limitations.