Machine Learning - For Beginners Your Definitive Guide For Neural Networks, Algorithms, Random Forests and Decision Trees Made Simple-AUVA PRESS (2017)
See more books: https://www.auvapress.com/books
PREFACE
Chapter 1 Introduction To Machine Learning
What is learning?
When is machine learning important?
Applications of machine learning
Chapter 2 Introduction to Statistics and Probability Theory
Random variables
Distributions
Mean, variance, and standard deviation in statistical distribution
Marginalization
Bayes’ theorem
Chapter 3 Building Blocks of Machine Learning
Formal statistical learning frameworks
Empirical risk
PAC learning strategies
Generalization models for machine learning
Chapter 4 Basic Machine-Learning Algorithms
Challenges of machine learning
Types of learning
Chapter 5 Supervised Machine-Learning Algorithms
Decision trees
Random forest
KNN algorithm
Regression algorithms
Chapter 6 Unsupervised Machine Learning Algorithms
Clustering algorithm
Markov algorithm
Neural networks
Chapter 7 Reinforcement Machine Learning Algorithms
Q-learning
SARSA
CONCLUSION
FURTHER RESOURCES
ABOUT THE AUTHOR
PREFACE
In the following chapters, you will learn the ins and outs of machine
learning across a wide range of applications. You will explore some of the
basic tools from statistics and probability theory that define how
machine-learning problems are framed, and you will come to understand
fairly basic yet effective algorithms that solve complex machine-learning
problems. Let's jump in.
Chapter 1
Introduction To Machine Learning
More specifically, the biggest question that should concern us at this stage
is "How do we represent the training data or input so that experience can be
described?"
For example, when mice come across an unfamiliar food that looks or
smells good, they eat it in small quantities at first. If the food tastes bad or
makes them sick, the mice associate that food with illness and consequently
avoid it from then on. Likewise, this behavior can be found in other animals
in the wild when venturing into new territories.
On the other hand, if the food doesn’t have any ill effects on their bodies,
the mice continue eating the food. Obviously, a learning process has taken
place: the mice have used their experience eating the food to acquire basic
knowledge of the safety of that food, automatically predicting that it will
have the same effect on their bodies when encountered in the future.
#1: Finance
In the financial world, online learning and decision-making take place in a
wide variety of scenarios that can change depending on the business
environment, the available feedback, and the nature of the decision. These
decisions can involve stock trading, ad placement, route planning, or even
picking a heuristic move.
Machine learning can also help banks discover important new insights in
data, enabling them to compete better and increase their bottom lines. Such
data mining assists not only in determining the best investment
opportunities but also in identifying high-risk customer profiles and
detecting signs of fraud.
Well, machine learning is a broad field that intersects several others,
including statistics, probability theory, AI, and the algorithmic side of
computer science. It helps us build programs that can iteratively learn on
their own and extract meaningful insights from large sets of data. To unlock
the immense potential of machine learning in a given application and build
the best intelligent systems, a thorough understanding of statistics and
probability theory is necessary.
There are several reasons why statistics and probability theory are
essential ingredients of machine learning. Some of these reasons are:
Selecting the algorithm that provides the best balance of accuracy,
number of parameters, training time, and model complexity;
Selecting the appropriate parameter settings for the program and its
validation strategies;
Identifying the bias-variance tradeoffs that cause underfitting and
overfitting; and
Estimating the right confidence interval and level of uncertainty in
the program’s decisions.
Suppose you throw a die. Let X represent the random variable that relies
on the outcome of the throw. Obviously, the natural choice for X would be
to map the outcome denoted as i to the value of i. An X outcome of 1, for
instance, would map the event of throwing a one on the die to the value of
1. (We could also choose strange mappings such as a variable Y that maps
all outcomes to 0, but that would be a tedious and boring function.) The
probability (P) of outcome i of random variable X is denoted as either P(X
= i) or PX(i); in this way, you can avoid the formal notation of event spaces
by defining random variables that capture the appropriate events.
Distributions
Probability distribution is the probability of each outcome of a random
variable. Consider the following example:
Let random variable X again represent the outcome space of the die throw.
Assuming that the die isn't loaded, we would expect the probability
distribution of X to be uniform: P(X = i) = 1/6 for each outcome i in
{1, 2, 3, 4, 5, 6}.
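The shape of this distribution can be checked with a quick simulation. The sketch below (plain Python, with an arbitrarily chosen number of throws) tallies simulated throws of a fair die and compares the empirical frequencies with 1/6:

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is reproducible

# Simulate many throws of a fair six-sided die.
throws = [random.randint(1, 6) for _ in range(60_000)]
counts = Counter(throws)

# Empirical probability of each outcome i; each should be close to 1/6.
empirical = {i: counts[i] / len(throws) for i in range(1, 7)}
```

With enough throws, every empirical frequency settles near 1/6, mirroring the uniform distribution of an unloaded die.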
Conditional distributions
Using a conditional distribution, the distribution of a random variable when
the value of another random variable is known, we can base the probability
of an event on the given outcome of another event. The conditional
probability of random variable X = a given that random variable Y = b is
defined as follows:

P(X = a | Y = b) = P(X = a, Y = b) / P(Y = b)
Independence
A random variable is independent of another random variable when the
variable’s distribution doesn’t change in response to the value of the other
variable. In machine learning, we can make assumptions about data based
on independence. For instance, training samples i and j are assumed to be
drawn independently from an underlying distribution when the label of
sample i is unaffected by the features of sample j. In formal notation,
random variable X is independent of random variable Y only when
P(X) = P(X|Y)
The values of X and Y have been dropped because the statement is true for
all values of X and Y.
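This condition can be verified numerically on a discrete joint distribution. The sketch below, a minimal made-up example assuming two independent fair coin flips, checks that the conditional distribution P(X | Y) matches the marginal P(X) everywhere:

```python
# Checking the independence condition P(X) = P(X | Y) numerically.
# The joint table below describes two independent fair coin flips
# (a made-up example), with X and Y each taking the values 0 or 1.

joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # P(X = x, Y = y)

def p_x(a):
    # Marginal probability P(X = a).
    return sum(p for (x, y), p in joint.items() if x == a)

def p_y(b):
    # Marginal probability P(Y = b).
    return sum(p for (x, y), p in joint.items() if y == b)

def p_x_given_y(a, b):
    # Conditional probability P(X = a | Y = b) = P(X = a, Y = b) / P(Y = b).
    return joint[(a, b)] / p_y(b)

# Independence holds when the conditional equals the marginal everywhere.
independent = all(abs(p_x(a) - p_x_given_y(a, b)) < 1e-9
                  for a in (0, 1) for b in (0, 1))
```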
Mean, variance, and standard deviation in statistical
distribution
Mean
Also known as the expected value or the first moment, the mean (E) of the
distribution of random variable X is defined over each outcome a as
follows:

E[X] = Σ_a a · P(X = a)

In other words, the mean of a distribution is the sum of each outcome
weighted by its probability. The mean of a random variable X is useful for
assessing the expected value of the variable's distribution. For instance, if
you're a stockbroker, you may want to know the expected value of a
client's investments in a year's time. You may also want to investigate the
risk of your investment; in other words, how likely is it that the value of
your investment will deviate from its expectation?
Variance
To protect the client from unacceptable financial losses, the stockbroker
may also want to estimate the amount of risk involved in an investment by
calculating its variance. The variance, or second moment, of a probability
distribution is the degree of disparity within the distribution, defined in
formal notation as

Var(X) = E[(X - E[X])²]

that is, the expected squared deviation from the mean, rather than the raw
deviation E[X - E[X]], which is always zero.
Standard deviation
The variance of a random variable’s distribution measures only the average
degree of disparity, which is why its use is mostly confined to calculating
risk. To more accurately determine how much the random variable deviates
from its expected value, the standard deviation is used instead. The standard
deviation (σ) of the distribution of random variable X is the square root of
the distribution's variance:

σ(X) = √Var(X)
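The three quantities above can be computed directly for the fair-die distribution from earlier in the chapter; a minimal sketch:

```python
import math

# Distribution of a fair six-sided die: P(X = i) = 1/6 for i = 1..6.
dist = {i: 1 / 6 for i in range(1, 7)}

# Mean (expected value): the sum of each outcome weighted by its probability.
mean = sum(a * p for a, p in dist.items())

# Variance: the expected squared deviation from the mean.
variance = sum((a - mean) ** 2 * p for a, p in dist.items())

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)
```

For the fair die this gives a mean of 3.5, a variance of about 2.92, and a standard deviation of about 1.71.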
Consider the joint density P(x, y) of two random variables X and Y. Given
the joint density, we can recover p(x) by summing out y if Y is a discrete
random variable, or by integrating over y if Y is a continuous random
variable. Such an operation is called marginalization:

p(x) = Σ_y p(x, y)

where the sum runs over all values of y. Separately, we say that the
random variables X and Y are independent, meaning that the values X
assumes don't depend on the values Y assumes, exactly when the joint
density factorizes:

p(x, y) = p(x)p(y)
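For discrete variables, marginalization is just a sum over the joint table. The sketch below uses a made-up joint distribution over weather and temperature:

```python
# Marginalization over a discrete joint distribution: recover p(x) from
# p(x, y) by summing out y. The joint table here is a made-up example
# over weather (X) and temperature (Y).

joint = {
    ('rain', 'cold'): 0.3, ('rain', 'warm'): 0.1,
    ('sun', 'cold'): 0.2, ('sun', 'warm'): 0.4,
}

def marginal(x_value):
    # p(x) = sum over all y of p(x, y)
    return sum(p for (x, y), p in joint.items() if x == x_value)

p_rain = marginal('rain')  # 0.3 + 0.1 = 0.4
p_sun = marginal('sun')    # 0.2 + 0.4 = 0.6
```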
In this chapter, we explore the building blocks of machine learning that will
help you close the gap. By the end of the chapter, you should be in a
position to understand the fundamental elements of machine learning that
can help you turn ideas into working hardware and software primitives.
Formal statistical learning frameworks
Statistical learning contexts can simplify learning processes in computers.
Allow me to illustrate using an example.
Suppose you’ve landed on an island where the natives love to eat papaya, a
fruit with which you are unfamiliar. You’re standing in the marketplace,
trying to figure out which papayas are the best. Based on your previous
experiences with fresh fruit, you decide to examine the color and softness of
each papaya. The colors range from red to dark brown, while the softness
ranges from rock solid to mushy; in statistical terms, these are two sets of
inputs to help you predict the taste of each papaya.
Learner’s input
Learner’s output
Simple data-generalization model
Measures of success
f : X → Y

or

y_i = f(x_i)
The only useful notion of error that can be computed by the learner is the
training error: the error that the classifier makes on the training
sample.
Although PAC learning can be useful even when a model’s training data
accurately represent the distribution, it is especially vital for handling the
inevitable uncertainties of the data-access model. This model always
generates random training sets with a small chance of producing an
uninformative training set that has only one domain point.
Generalization models for machine learning
The data-access model that we discussed in the previous section can
readily be used to represent the wider scope of learning processes in general.
By generalization, we mean having two essential components:
When the reliability assumption has been met, we can expect the training
data supplied to a machine-learning algorithm to reliably reflect the
underlying distribution. However, such an assumption can be impractical,
placing unrealistic standards on the algorithm.
1. Search engines.
Probably the most familiar application of machine learning, search
engines are designed to find the Web pages that are most relevant to a
user’s query. To accomplish this goal, a search engine needs to figure
out which pages are relevant and which ones exactly match the query.
The program can draw its conclusions from a variety of inputs, such
as the words in each link, the content of each page, or how frequently
users follow the suggested links.
2. Collaborative filtering.
Collaborative filtering is the learning process by which online
retailers such as Amazon recommend items that customers may be
interested in buying. These programs are similar to search engines,
but relevancy is based on trends in a user’s past purchases rather than
on direct input from the user.
3. Automatic translation.
A particularly challenging task in machine learning is automatic
translation of documents. It’s one thing for a computational linguist to
develop a curated set of rules for a program to follow, but it’s quite
another for the program to understand the words in a document and
learn other rules of syntax and grammar on its own, especially since
documents aren’t always grammatically correct. One solution is to
design a program that learns how to translate languages by using
examples of translated documents.
For instance, we can annotate an audio sequence with text data, as
Apple's Siri does; recognize handwriting and signatures; or drive the
avatar behavior used in computer games. Such capabilities have
increasingly led to the development of robotic systems that are far more
efficient at their work than their predecessors.
6. Facial recognition.
These days, facial recognition is a primary component of many
security and access-control systems. Using photos or video
recordings, such a system is designed to allow only people it
recognizes into the building or information system it’s guarding. The
difficult part, of course, is verifying a person’s identity; a machine-
learning program is far more effective in this regard, since a system
that learns from its mistakes is much harder to fool than one with a
rigid set of standards.
Types of learning
Machine learning is a broad field with several different subfields of learning
processes. These subfields can be grouped into three main categories based
on their goals:
Supervised learning
Unsupervised learning
Reinforcement learning
Decision trees
Random forest
KNN
Regression algorithms
Clustering algorithm
Markov algorithm
Neural networks
Q-learning
SARSA
Decision trees use one of several algorithms to help them split a given set
into two or more subsets, producing increasingly homogeneous outcomes;
the exact algorithm chosen depends on the types of variables involved. The
following algorithms are among the most commonly used:
The Gini index. The Gini index works as follows: if two items with
the same value of a target variable are randomly selected, then they
should belong to the same class with the same probability. The
categorical value of the target variable is marked by another variable
as either success or failure, which the algorithm converts into a
binary split within the class. The Gini score of each sub-node is then
calculated as the sum of the squares of its respective probabilities of
success and failure, using the formula (p² + q²). The higher a node's
Gini score, the greater its homogeneity.
Compute the Gini for the split using the weighted Gini score of each
node of the split.
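The Gini computation described above can be sketched in a few lines; the sub-node success/failure counts below are hypothetical:

```python
# Gini score of a node for a binary target, as described above: with
# success probability p and failure probability q = 1 - p, the score is
# p**2 + q**2; a split's score is the weighted average over its sub-nodes.
# The counts below are hypothetical.

def gini_score(successes, failures):
    total = successes + failures
    if total == 0:
        return 0.0
    p = successes / total
    q = failures / total
    return p ** 2 + q ** 2

def gini_split(nodes):
    # nodes: a list of (successes, failures) pairs, one per sub-node.
    total = sum(s + f for s, f in nodes)
    return sum((s + f) / total * gini_score(s, f) for s, f in nodes)

pure = gini_score(10, 0)              # 1.0: a perfectly homogeneous node
mixed = gini_score(5, 5)              # 0.5: the least homogeneous node
split = gini_split([(8, 2), (1, 9)])  # weighted score of a two-node split
```

A pure sub-node scores 1.0, an even 50/50 node scores 0.5, and the split with the higher weighted score is the more homogeneous one.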
Salary Bands
The above salary bands are then split based on the 3 variables:
Variable: Age
Using the above aggregate data, the algorithm now builds a single mean
probabilistic model that maps the data to a predicted salary band for a new
instance from the training data. For simplicity, suppose we use that model
to predict the salary band of a 30-year-old Kenyan male who has a college
degree, which produces the following result:
The model also performs competitive learning, in the sense that it internally
uses competition between the model elements (the data instances) to make a
predictive decision. The overarching objective is to measure the similarities
between data instances and thereby find the nearest instances needed for
making predictions.
As a lazy learning process, the algorithm doesn't build a model until it is
asked for a prediction, which keeps the data relevant to the task at hand.
Let's look at a practical example to demonstrate how the model works.
Generate the input data. The input data forms part of the training data. In
the flower example, we have 150 observations of iris flowers from 3
different species, each recording the sepal length, sepal width, petal
length, and petal width.
Compute the distance between any two data instances (determine the kth
nearest neighbor) to determine the similarity levels.
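The two steps above can be sketched as a minimal KNN classifier. The feature vectors and species labels below are made-up stand-ins for the iris measurements, not the real dataset:

```python
import math
from collections import Counter

# A minimal KNN sketch: store the training data, and at prediction time
# vote among the k training points closest to the query in Euclidean
# distance. The feature vectors and labels below are made-up stand-ins
# for iris measurements.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    # train: a list of (features, label) pairs.
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]  # majority label among the k nearest

train = [((1.0, 1.0), 'setosa'), ((1.2, 0.9), 'setosa'),
         ((1.1, 1.2), 'setosa'),
         ((4.0, 4.2), 'virginica'), ((4.1, 3.9), 'virginica')]
```

Because the model is built lazily, nothing is computed until `knn_predict` is called with a query point.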
Suppose you want to estimate the sales growth in your company based on
current economic conditions. Using your company’s past and current
performance data, a regression algorithm shows you the growth trend. If
you know that your company is growing about twice as fast as the overall
economy, for instance, you can predict future sales growth.
Linear regression
Logistic regression
Polynomial regression
Stepwise regression
Ridge regression
Y = a + (b*X) + e
where a is the Y intercept and b is the slope of the line. The line is then used
to predict the value of the target variable based on the given predictor
variable.
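Fitting the line Y = a + (b*X) by ordinary least squares has a simple closed form: the slope is the covariance of X and Y divided by the variance of X, and the intercept follows from the means. A minimal sketch on made-up data:

```python
# Ordinary least squares for Y = a + (b*X) on made-up data.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.1, 8.0, 9.9]  # roughly y = 2x

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Slope: covariance of X and Y divided by the variance of X.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
# Intercept: the fitted line passes through the point of means.
a = mean_y - b * mean_x

def predict(x):
    return a + b * x
```

The fitted line can then predict the target for any new value of the predictor variable.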
odds = p/(1-p)
ln(odds) = ln[p/(1-p)]
logit(p) = ln[p/(1-p)] = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
Y = a + (b*X²)
Y = a + (bX)
Y = a + (bX) + e
Factors to consider when selecting a regression-analysis
technique
Below are some useful factors that you should consider when
choosing a particular regression analysis method:
How do you want to explore your data? Data exploration is an
unavoidable component of developing a predictive model and
therefore should be one of the initial considerations in selecting the
right technique.
Which statistical metrics do you want to include in your learning
algorithm? You may be interested in several metrics such as the
statistical significance of the parameters, R-squared, adjusted
R-squared, and the error term, in which case stepwise regression may
be the best technique for your purposes.
How do you want to evaluate the predictive model?
Cross-validation is the best approach to assessing the model; a simple
mean squared deviation between the observed and the predicted
values gives you a good idea of the prediction accuracy.
Does the dataset have multiple independent variables? If these
variables can confound the model, then you should avoid the
automatic selection method so you don’t have to use all the
variables simultaneously.
The algorithm selects a starting point for each main cluster.
For the sake of clarity, let these points be called centroids.
Each data point joins the cluster of the centroid with which it shares
the most attributes.
Each centroid is then recomputed as the center of its cluster's
members, based on the similarities between them.
With each new assignment of data points to centroids, further
refinement occurs until the centroids no longer change, which is called
convergence.
The sum of the squared differences between the data points within a
cluster and that cluster's centroid constitutes the value for that
cluster. Likewise, the total value of the clustering is the sum of these
squared values across all the clusters.
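The clustering steps above can be sketched as a minimal k-means loop; one-dimensional made-up data keep the example short:

```python
# A minimal k-means sketch following the steps above: assign each point
# to its closest centroid, recompute centroids as cluster means, and
# repeat until the centroids stop changing (convergence). The points
# and starting centroids are made up.

def kmeans_1d(points, centroids, max_iter=100):
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # convergence
            break
        centroids = new_centroids
    return centroids, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids, clusters = kmeans_1d(points, [0.0, 10.0])
```

On this data the centroids converge to roughly 1.0 and 8.0, splitting the points into the two obvious groups.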
Since reinforcement learners don’t require any expert supervision, the types
of problems that they are best suited for are usually complex. Examples of
these problems include the following:
Game playing. When playing a game, the learner finds the best
move based on various factors and the number of possible states
in the game. Reinforcement learning eliminates the need for
manually specifying all the rules.
Control problems. Unlike both supervised and unsupervised
learning, reinforcement learning can develop management
policies for complex issues such as elevator scheduling.
Now that you know the basic strategies of reinforcement learning, let’s
explore two of the most common algorithms.
Q-learning
Often used for temporal-difference learning, Q-learning is an off-policy
algorithm that learns an action-value function to provide the expected utility
of taking a given action in a particular state. Since the algorithm can use
any such function, the policy rule must specify how the learner selects a
course of action.
Once the action-value function has been determined, the optimal policy can
be constructed using the actions that have the highest values in each of the
states. One advantage of Q-learning is that it doesn't need a model of the
environment to compare the expected utility of all the available actions. The
algorithm also handles problems with stochastic transitions and rewards
without requiring any special adaptations.
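A minimal tabular Q-learning sketch on a made-up environment (a four-state corridor where reaching the last state earns a reward of 1) illustrates the off-policy update:

```python
import random

random.seed(1)
n_states = 4                      # states 0..3; state 3 is the goal
actions = ['left', 'right']
q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(s, a):
    # Deterministic corridor: 'right' moves toward the goal, 'left' away.
    s2 = min(s + 1, n_states - 1) if a == 'right' else max(s - 1, 0)
    reward = 1.0 if s2 == n_states - 1 else 0.0
    return s2, reward

for _ in range(300):              # episodes
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: q[(s, act)])
        s2, r = step(s, a)
        # Off-policy update: bootstrap from the best action in s2,
        # regardless of which action will actually be taken there.
        best_next = max(q[(s2, act)] for act in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2
```

After training, every state prefers 'right', which recovers the optimal policy of walking straight to the goal.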
SARSA
State-action-reward-state-action (SARSA) is an algorithm that describes a
Markov decision-process policy where the main function for updating the
q-value relies on the current state of the learner, the action that the learner
chooses, the reward that the learner gets for selecting the action, and the
state that the learner will be in after taking the action. Since the SARSA
algorithm learns the safest path to the solution, it can receive a higher
average reward per trial than the Q-learning algorithm does, even though it
doesn't choose the optimal path.
CONCLUSION
From search engines to facial-recognition systems to driverless cars,
machine learning is revolutionizing the once-dull AI field of IT. Smart
programmers stay ahead of the curve by staying well informed about the
latest technological advances, and now you have all the basic tools you
need to join the revolution and start designing machine-learning programs
yourself.
FURTHER RESOURCES
1. https://www.sas.com/en_us/insights/analytics/machine-
learning.html
2. http://www.kdnuggets.com/2016/08/10-algorithms-machine-
learning-engineers.html
3. http://cdn.intechopen.com/pdfs/10694.pdf
4. http://disp.ee.ntu.edu.tw/~pujols/Machine%20Learning%20Tutorial
.pdf
5. http://mlss08.rsise.anu.edu.au/files/smola.pdf
6. http://www.ulb.ac.be/di/map/gbonte/mod_stoch/syl.pdf
7. http://scribd-download.com/essentials-of-machine-learning-
algorithms-with-python-and-r-
codes_58a2f7506454a7a940b1e8ec_pdf.html
8. http://machinelearningmastery.com/supervised-and-unsupervised-
machine-learning-algorithms/
9. https://www.saylor.org/site/wp-content/uploads/2011/11/CS405-
6.2.1.2-WIKIPEDIA.pdf
10. https://page.mi.fu-berlin.de/rojas/neural/chapter/K5.pdf
11. http://www.cs.upc.edu/~bejar/apren/docum/trans/09-clusterej-
eng.pdf
12. http://webee.technion.ac.il/people/shimkin/LCS11/ch4_RL1.pdf
13. http://web.mst.edu/~gosavia/neural_networks_RL.pdf
ABOUT THE AUTHOR
Matt Gates is an associate university lecturer with more than 10 years of
teaching experience on academic subjects ranging from IT management,
software development, and machine learning to data modeling. He believes
that the next phase of IT development lies in AI, Machine Learning and
Automation.
Matt hopes to use his books to share his knowledge to impact thousands of
people.
In his spare time, Matt often engages in discussions on Reddit and other
forums and buys the latest gadgets on Amazon.
Matt’s Message
Thank you for reading! This book is a quick starter guide on Machine
Learning, written to help you build a basic understanding of what Machine
Learning is and of the various types of algorithms (including their pros
and cons) that are available for your further exploration.
If you would like to read more great books like this one, why not subscribe
to our website?
https://www.auvapress.com/vip
Thanks for reading! Please add your short review on Amazon and let me know your thoughts! –
Matt
Other Victor Finch Titles You Will Find Useful
Blockchain Technology
Victor Finch
ISBN: 978-1-5413-6684-8
Paperback: 102 Pages
eBook, Audiobook Available
Bitcoin
Victor Finch
ISBN: 978-1-5441-4139-8
Paperback: 98 Pages
eBook, Audiobook Available
Other Auva Press Titles You Will Find Useful
Smart Contracts
Victor Finch
ISBN: 978-1-5446-9150-3
Paperback: 106 Pages
eBook, Audiobook Available
Python
Ronald Olsen
ISBN: 978-1-5426-6789-0
Paperback: 152 Pages
eBook, Audiobook Available
AUVA PRESS
AUVA Press commits lots of effort in the content research, planning and production of quality books.
Every book is created with you in mind and you will receive the best possible valuable information in
clarity and accomplish your goals.
If you like what you have seen and have benefited from this helpful book, we would appreciate
your honest review on Amazon or on your favorite social media.
Your review is appreciated and will go a long way to motivate us in producing more quality books
for your reading pleasure and needs.
Visit Us Online
AUVA PRESS Books
https://www.auvapress.com/books
Register for Updates
https://www.auvapress.com/vip
Contact Us
AUVA Press books may be purchased in bulk for corporate, academic, gifts
or promotional use.
For information on translation, licenses, media requests, please visit our
contact page.
https://www.auvapress.com/contact
- END -