Machine Learning: The Ultimate Beginners Guide
Ryan Roberts
CONTENTS
Introduction
Chapter 1 – What is Machine Learning?
Chapter 2 – Applications of Machine Learning
Chapter 3 – Supervised versus Unsupervised Algorithms
Chapter 4 – Machine Learning Algorithms
Chapter 5 – Algorithms in Practice
Chapter 6 – Voice Control and Machine Learning
Conclusion
Thank you very much for getting this book!
I hope that you really enjoy reading it. If you want to help me produce more
materials like this, then please leave a positive review on Amazon.
It really does make a difference!
Introduction
Anyone who grew up watching sci-fi shows on television knows the allure of a computer that can
learn. The idea of a robot that can learn new information, develop a personality, and interact naturally
with humans has been part of the human imagination for centuries. Within the past few decades,
several technology companies have taken the first steps toward bringing those machines out of fiction
and into reality—and machine learning is the way they’ve been doing it.
There are, of course, far more practical applications for machine learning than a robot who can carry
on a conversation. Computers that are able to learn from their mistakes and construct new ways of
approaching unfamiliar data can be an invaluable asset in almost every industry and area of life. This
kind of technology is increasingly necessary as more and more data is collected in online interactions,
and is arguably the natural next evolution of modern computing.
Machine learning is certainly not a new concept. As early as the 1990s, Apple had developed a basic
intelligent program that could analyze and identify handwriting samples. This software was robust
enough that a similar program is still in use today by the U.S. Post Office.
What has brought machine learning so much attention recently are the advancements in usability. The
renewed interest in intelligent software can be seen in the recent proliferation of voice control
personal assistants—of which every major technology company now has a version—and the
expansion of the smart home sector.
While machine learning has been a steadily advancing field since its earliest developments, the new
attention it has received from the general public has accelerated the pace of development. Machine
learning seems poised to revolutionize how people interact with machines.
For anyone whose work requires them to stay on the cutting edge of technology, this means the time
has come to at least learn the basics of how machine learning works and what it is currently capable
of. While the concepts behind it are highly advanced, the premise and objectives are often very easily
comprehensible, even to a technological layman.
The chapters that follow will help bring you up to speed. They will outline just what
machine learning is, how it is employed, and some of the situations in which it can be most useful. They
will also touch on some specific use cases for a variety of industries, examples which may help you
to gain a deeper understanding of just what the technology can be used for.
While non-math types might be intimidated by words like “algorithm,” the truth is that some basic
understanding of the underpinnings of this software is important to truly grasp what it can
accomplish. This book will walk you through the basic functions and abilities of some of the most
common types of algorithms, without the mathematical component that can bog them down and make
them hard to process.
As with any relatively new technology, there is still much to be explored in the field of machine
learning. What is plain to see is that it is a technology with massive potential. The advancements so
far made by companies like Apple and Google are just the first step into a larger world. Learning
about intelligent machines now will help prepare you for this smarter future.
Chapter 1 – What is Machine Learning?
In the broadest sense, machine learning is the construction of an artificial intelligence that is capable
of learning new data which isn’t programmed into it. This is accomplished using one or more
algorithms, which can use a variety of methods to interpret and respond to given data.
Machine learning is a subset of the broader science of data mining. This is a general term applied to
the use of data in a purposeful way to make new connections with the information and use it to better
advantage. There are a lot of different methods included under the broad umbrella of data mining.
This can often include statistical algorithms and various forms of analysis along with machine
learning. It encompasses the practice and study of data storage and manipulation as well as the
application of methods from a variety of scientific areas to aid in identifying patterns not previously
known.
Machine learning has a few distinct advantages over the statistical modeling prevalent in data mining.
Statistical modeling is based on a theory that has been mathematically proven. This means that data
analyzed using it has to meet those same strong assumptions, limiting its use for analyzing broad
categories of data.
The idea behind machine learning is to use computers to probe data and find the structures or theories
that support it—to build the theory from the learning, rather than having it analyze according to a
theory. The difference between the two can be seen in how you would test for a successful model. In
statistical analysis, the process fails if it cannot fulfill a theoretical test that proves a hypothesis; in
machine learning, a validation error on newly inputted data is a sign of problems.
Machine learning uses an iterative approach, meaning it goes over the data multiple times until a
successful pattern is identified. It is also often able to learn from errors it made in previous iterations,
strengthening its ability to complete the given task on the next pass.
This makes it very helpful in all kinds of analytic pursuits, across various fields and disciplines.
The iterative approach makes it easy to automate the construction of
analytical models, potentially allowing you to run far more accurate and effective models while
greatly reducing the amount of time and effort spent in running them.
Ultimately, this style of algorithm overcomes one major shortcoming of computerized systems and
artificial intelligences up to this point. Though they are very effective and efficient at analyzing the data
they’re given, most previous software could not generate a solution for a problem
unless it had been programmed to look for it. Because of the iterative approach used in machine
learning, the algorithms can find hidden insights in the data, even if they haven’t been explicitly
programmed to look for them, greatly expanding the value of the software.
Many of the algorithms at use today in machine learning applications are hardly new. A lot of them
have been around a long time, by computing standards. Machine learning was born out of the idea that
computers could be self-improving, something that got its start with pattern recognition software.
While it might not be a new science, the machine learning of today is vastly different from that of the
past. Today’s programs are able to independently adapt when presented with new data, learning from
their previous attempts to process similar things. This means they’re able to produce more reliable
results and make better decisions. The ability to automatically analyze a lot of data at a much faster
rate is what has brought fresh momentum to the machine learning industry.
Even laypeople are likely familiar with some of the most impressive developments in machine
learning that have happened over the last decade. Voice controlled personal assistant software like
Cortana, Siri, Alexa, and Google Assistant are pertinent real-world examples of machine learning in
action, based on the idea that the service will become more adept at answering your questions the
longer you use it.
Predictive software is also firmly based in machine learning. Google’s self-driving car utilizes this
kind of program, as do the personalized product or viewing recommendations you’ll see on sites like
Amazon or Netflix. There is also a significant value for machine learning in fraud detection, as a
computer running the right algorithm could notice inconsistencies before even a well-trained human.
The further we move into the information age, the more important machine learning becomes. There is
more data available than ever, in even more forms and more places. It takes a new kind of analysis to
work through all of it, one that can process more information in less time and for less money.
As technology progresses, it takes faster, more efficient computers to handle the newest advances.
Getting these more powerful computers in turn makes it possible to produce even larger models,
pushing the industry forward again. This ever-shifting and evolving landscape necessitates machines
smart enough to keep up.
Using machine learning makes it possible to produce models with a minimum of effort that can
analyze both a larger volume of data and more complex data. The results of these models are typically
more accurate than those run using other methods, and the models can be run at a very large scale.
These precise, efficient models can help your organization identify new opportunities for profit, or
avoid a previously unseen risk. Machine learning can even automate certain aspects of the decision-making process,
limiting the need for human intervention on more trivial matters, thus letting your organization’s best
brains focus on more important work.
However valuable and impressive a machine learning system can be, it is nonetheless only as good as
the data that is fed into it. These kinds of systems will learn from interfacing with information. The
better-structured the information that the machine is learning from, the more effective the program
will ultimately be.
Implementing machine learning successfully in your business requires only two things: the algorithms
that will process the data, however basic or advanced, and the data that they will analyze. You should
think about each of these aspects in depth, including where it will come from and what you hope to
achieve using it, before deciding what kind of algorithm will suit your needs best.
Consider the scalability of your model, as well, and how much that is likely to be a factor. The whole
idea is to create a model that is able to adapt to new information, but you’ll be setting yourself up for
success if the algorithm is designed to work with the volume and type of data you plan to feed into it.
The structures you use for the information will also have a great impact on the program’s
effectiveness. Consider the ontology of the program. This can be loosely defined as the knowledge
and information architecture in place for accessing information and retrieving answers. It includes
things like the vocabulary and thesaurus structures, the established relationships between terms and
concepts, and the general taxonomy of the information. A good ontology is what makes a system more
user friendly, better able to allow for variations in phrasing and more capable of responding in unfamiliar
situations.
Artificial intelligence is in many ways the next natural step in the evolution of computing technology.
The theories and practices behind machine learning are currently producing some of the most natural
and successful AI programs, both on the customer-facing side of the equation and among upper-level
development.
Terminology
As is the case with any new technological development, there are some terms associated with
machine learning that you might not be familiar with if you haven’t spent much time researching the
topic. Learning these terms can help you to better navigate conversations and research surrounding
machine learning systems.
The corpus is the collection of information the program is attempting to process and interpret. In a
supervised system, it will contain the “answer keys,” if you will, telling the computer what results it
should come up with so it can adjust accordingly.
What would be called a dependent variable in statistics is called a label in machine learning. A
general variable is known as a feature in machine learning. This can consist of anything that you
want to use to sort or otherwise analyze the data, whether that’s a numerical variable like height or
weight or a more behavior-centered statistic, like purchase or web browsing history. The way that
information is structured for retrieval is known as knowledge engineering, and is an important
aspect of a machine learning algorithm.
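These terms are easiest to see in a concrete, if hypothetical, example. The short sketch below separates some made-up customer records into features (the measurable attributes) and labels (the values we want to predict); the record fields and values are invented for illustration.

```python
# Hypothetical customer records: each field except "purchased" is a feature;
# "purchased" is the label we would want a model to predict.
records = [
    {"height_cm": 170, "visits": 12, "purchased": True},
    {"height_cm": 158, "visits": 3,  "purchased": False},
    {"height_cm": 181, "visits": 8,  "purchased": True},
]

# Split the records into a feature matrix and a label list.
features = [(r["height_cm"], r["visits"]) for r in records]
labels = [r["purchased"] for r in records]

print(features)  # [(170, 12), (158, 3), (181, 8)]
print(labels)    # [True, False, True]
```

In a supervised setting, the `records` collection would serve as the corpus, with the labels acting as the “answer key” described above.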
While there are more specific terminologies employed in discussing various specific algorithms and
types of machine learning, those will be explored in later chapters more focused on those topics.
These terms should get you started as you begin to delve deeper into the intricacies of machine
learning.
Chapter 2 – Applications of Machine Learning
There’s certainly something impressive about the idea of machine learning even when explained in
the abstract, but you may have a more difficult time figuring out just how it could be applied to your
own life or business. In fact there is a role for machine learning in any industry that works with large
amounts of data—and in today’s world, that’s an increasing percentage of businesses.
Machine learning lets you gain insights from data in real time, regarding a whole host of different
topics and areas. From a marketing and sales standpoint, it could help you to gain an advantage over
your competitors by giving you early warning of shifts in the market. In the areas of health and safety,
it can be used to identify slow-building infrastructure or geological problems, allowing for better
preparation. The array of potential uses is vast, and is honestly only just beginning to be explored
thanks to the recent improvements to the technology.
In a general sense, machine learning is useful in any application that’s based on pattern recognition.
This includes obvious applications like facial recognition software and handwriting recognition, but
it is also the basis of speech-to-text and voice control software.
Pattern recognition can take the form of any kind of visual, audio, or numerical data, depending on the
intended use of the software. The competitive machine learning website Kaggle.com has a list of its
most popular current problems that utilize machine learning. These span a broad array of industries,
and give a good picture of the range of things machine learning can be used for.
There is a problem on Kaggle.com aimed at anticipating the hazard level of properties for an
insurance company; another designed for predicting what the revenue of a given restaurant would be;
still another that utilizes weather clues to predict the bike rental numbers for a given day. Though
they’re very different programs that work with very different data, all of them are classic examples of
machine learning systems in action.
Most of the practical applications of machine learning today are based on a supervised or at least
part-supervised algorithm. There’s more about what this means in chapter 3, but the general shape of
this kind of machine learning is the ability to identify objects and predict probabilities based on the
data it has been given.
You can find several examples of this kind of machine learning software in current use for free
online. The University of California at Irvine has an online repository of machine learning datasets,
each of which focuses on a select topic. Some of these applications are rather frivolous; the poker
hand predictor is more a novelty than anything, and various applications for identifying flora and
fauna based on given characteristics are useful to a very small subset of the population.
Other datasets in the UC Irvine repository have very significant uses. The car evaluation dataset can
predict how safe a car is based on the information about the car that you input. The heart disease
dataset can predict the levels of heart disease based on a patient’s diagnostic information. One that
may be more viable for at-home use is the internet advertisement dataset, which can predict whether
an image is an ad or not based on the details of the image.
No doubt you are already seeing a vast array of potential uses of this software, in a variety of
industries. The sections that follow in the rest of this chapter will go through some of the potential
areas where machine learning could be most beneficial and influential in the near future.
Financial services
There are two key ways that banks and other financial institutions use machine learning: to make new
insights into data, and to prevent fraud. To the first point, machine learning can be used to identify
new investment opportunities. An algorithm that is able to intelligently follow the market could
inform investors of good times to trade, giving these institutions a leg up on the competition.
On the other side of the equation is the value in security. With the growing prevalence of identity
theft, it’s more important than ever for banks to be able to detect fraudulent activity—both for their
own protection and that of their customers. An intelligent system can track a customer’s transactions
and use their history to determine whether a given charge seems legitimate or not. These systems are
better able to see patterns that might go undetected by human eyes, picking up on potential security
risks before they can do too much damage.
Government
There are a few different government institutions that could benefit from the implementation of
machine learning systems. It is already at work in the U.S. Postal Service, which uses it for
handwriting recognition, one of the first successful implementations of machine learning technology in
the general public.
Public safety and utilities management are two other areas where the use of smarter systems would be
a huge benefit. A lot of this is for the same reasons outlined above for financial institutions. Smarter
software makes it easier to detect fraud and identity theft, increasing the security of personal or
sensitive information.
There is a potential use for unsupervised as well as supervised systems within the overarching area
of city management and public safety. Machine learning systems can analyze massive quantities of
sensor data and seek out patterns. This can not only help city and state employees to find solutions to
the problems they’re currently facing, it can also help to predict issues that may happen, letting you
plan for them in advance—or avoid them altogether.
Health care
The invention of wearable sensors that can easily collect data on a person’s day to day vitals makes
machine learning systems incredibly useful in the health care industry. Decision tree style algorithms
tend to be especially helpful. Patients can be broken down by group to track how certain variables
affect the efficacy of treatment plans, or the likelihood of developing certain diseases. Such a system
can also help to track abnormalities in a given patient’s health and use what it’s learned from previous
cases to predict what the cause of the abnormality might be.
There is also great value to be gained from machine learning at the clinical trial stage of medical
procedures. The ability to quickly create accurate models based on real-world data can reduce the
number of physical trials that need to be run. This can speed the process of getting drugs from
development to implementation and reduce the risk factor to patients.
Marketing
Advertising and marketing were the first industries to see widespread practical applications of
machine learning. Any website that you go to that recommends things you might like based on your
search or purchase history is likely employing some form of machine learning. The site has analyzed
your shopping behavior and predicted what products you may be most interested in, using its
experience with other customers as a guide.
Marketing agencies use intelligent systems to capture and analyze customer data. When implemented
correctly, this can create a personalized shopping experience for the user. It is also extremely helpful
in setting up targeted marketing campaigns, allowing companies to isolate which specific sub-sets of
their customer base will be most interested in a given product or service.
You might think of a personalized shopping experience as being limited to online commerce, but this
kind of intelligent software is likely the future of retail. More and more brick and mortar stores are
putting these concepts into practice. Think of a grocery store perks card that sends you personalized
coupons; this uses the same basic concept as the recommended items bar on Amazon.
Chapter 3 – Supervised versus Unsupervised Algorithms
Supervised and unsupervised are two terms you’ll likely hear thrown around a lot in association with
machine learning, often without any explanation or context for what they’re supposed to mean. They
are two broad categories of how an algorithm learns. They’re not the only options, but they’re the most commonly
utilized learning methods.
The majority of practical applications of intelligent algorithms currently in use utilize a supervised
learning method. It is the simpler of the two methods to understand, although the unsupervised method
can be equally valuable. It ultimately comes down to choosing the right learning method for your
particular problem.
You can think of the difference between supervised and unsupervised algorithms by using an analogy
of a school room. In a supervised method, the computer is “taught” the correct answers to various
questions, the way a teacher would instruct a student in the classroom. Once it’s learned the answer,
the computer can then identify the correct answer, extrapolating that learned knowledge to new
questions.
In an unsupervised method, there is no teacher. The computer is not given the correct answers to
questions or problems. It takes more of the path you would if you were learning a subject on your
own. It analyzes the information it’s been given, compares it with other situations, and in this way
constructs a framework to address future problems.
The sections that follow in this chapter go into more detail about each of these learning methods, as
well as outlining some of the most common use cases for each. Gaining a deeper understanding of the
difference between supervised and unsupervised learning can help you pick the right algorithm for
your needs.
Supervised algorithms
The supervised algorithm is taught how to identify objects and patterns using labeled examples. It
does this through methods like regression, prediction, and classification. The patterns that it learns
from processing labeled data then help it to predict values of additional, unlabeled data that it later
encounters.
Supervised algorithms are the best option when you know what you want your desired output to be. It
is commonly used in situations where historical data can be a reliable indicator of future events.
Insurance companies may use these types of algorithms to predict which customers will be most
likely to file a claim, for example. They can also be useful in identifying deviations from an established
pattern, such as in fraud detection applications.
The ultimate goal of a supervised algorithm is to construct its framework of learned information so
well that when you input new data you can trust it to reliably interpret the variables and provide
accurate outputs. This means there are two equally important aspects to a successful supervised
algorithm: the program itself and the information that is used to teach it.
The program may not give such dependable answers during the early stages of its knowledge
acquisition. It is designed to learn from its mistakes, and takes an iterative approach to new data.
When it makes an incorrect prediction, it is corrected and stores that knowledge; it continues to make
predictions until it is able to do so correctly at a certain benchmark level of achievement.
While a supervised algorithm can still learn from new data that is inputted, its ability to evolve in a
structural sense is more limited than with the more open-ended unsupervised network. It is best suited
to applications where the addition of new knowledge isn’t likely to drastically change the underlying
structures.
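The correct-and-repeat loop described above can be sketched in a few lines. Below is a minimal perceptron, one of the simplest supervised algorithms, trained on made-up, linearly separable data; every name and number here is an illustrative assumption, not a standard. Each pass over the data nudges the weights whenever a prediction is wrong, and training stops once a full pass produces no mistakes (the benchmark).

```python
def train_perceptron(samples, labels, passes=20):
    """Iteratively correct a linear classifier until it makes no mistakes."""
    w0, w1, bias = 0.0, 0.0, 0.0
    for _ in range(passes):
        mistakes = 0
        for (x0, x1), y in zip(samples, labels):
            predicted = 1 if (w0 * x0 + w1 * x1 + bias) > 0 else 0
            error = y - predicted          # -1, 0, or +1
            if error != 0:
                mistakes += 1
                w0 += error * x0           # nudge weights toward the answer
                w1 += error * x1
                bias += error
        if mistakes == 0:                  # benchmark reached: one clean pass
            break
    return w0, w1, bias

# Tiny "answer key": the label is 1 when the two inputs are jointly large.
samples = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 1), (1, 3)]
labels = [0, 0, 0, 1, 1, 1]
w0, w1, b = train_perceptron(samples, labels)
print([1 if w0 * x0 + w1 * x1 + b > 0 else 0 for x0, x1 in samples])
# → [0, 0, 0, 1, 1, 1] (matches the labels)
```

Early passes make several wrong predictions; it is only by being corrected on each one that the model settles on weights that reproduce the answer key.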
Unsupervised algorithms
An unsupervised learning method trains the AI using data with no historical labels. In other words,
the computer doesn’t know what answers it’s supposed to output; instead, it analyzes the data, figures
out what it’s being shown, and returns what it believes to be an appropriate answer.
This makes unsupervised algorithms great for exploring large quantities of data and finding an
underlying structure, which is very useful for advertising and marketing purposes. Such a system can explore
a large group of customer information and identify segments with similar features, helping to direct
targeted marketing campaigns into previously unexplored directions.
The ultimate goal of an unsupervised network is to learn more about the data by gaining more
information about its distribution and the relationships between different data sets. It works only with
input variables, with no corresponding output variable. Algorithms that use this learning method take
a few different forms, including self-organizing maps, nearest neighbor mapping, and k-means
clustering.
Unsupervised algorithms can be very useful in analyzing transactional data. They can also help in
recommending items or content based on user habits, and can be useful in identifying outliers to your
data set. By separating the data into clusters, they make it easier to comprehend and analyze.
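K-means clustering, one of the unsupervised methods named above, can be sketched in plain Python. The spending figures and starting centers below are invented for illustration; the point is that no labels are supplied, yet the algorithm discovers the two customer segments on its own by alternating between assigning points to the nearest center and moving each center to the mean of its cluster.

```python
def k_means(points, centers, rounds=10):
    """Alternate assignment and update steps on one-dimensional data."""
    for _ in range(rounds):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

spending = [10, 12, 11, 95, 102, 99]       # two obvious segments (made up)
centers, clusters = k_means(spending, centers=[0.0, 50.0])
print(clusters)   # → [[10, 12, 11], [95, 102, 99]]
```

Real datasets would have many features rather than one number per customer, but the assign-then-update cycle works the same way in any number of dimensions.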
Semi-supervised algorithms
Falling in between the supervised and the unsupervised learning methods, this method utilizes both
labeled and unlabeled data when it’s training the program. This method has many of the same
applications and advantages as supervised learning, but with the added benefit of being more
adaptable to different structures.
There are quite a few real-world problems that are best dealt with using semi-supervised learning.
Sometimes it would be prohibitively time-consuming or expensive to label all the data that needs to
be given to the machine for it to learn, even if this same data was incredibly cheap and easy to collect
and store.
You can think of a semi-supervised learning method as a photo archive where some of the pictures
have labels and others don’t. The computer sees an image of a dog that’s labeled “dog,” and is then
able to identify it in other pictures, even when they don’t bear that same label.
In practice, unsupervised methods are used to discover the underlying structure, connections, and
variable sets in the data. These variables are then taught to the system in a supervised method, so they
can be used to make predictions on future unlabeled data it encounters. Again, the cycle is iterative.
Each time the computer learns a new variable, it is better able to make sense of similar data
encountered in the future.
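The photo-archive analogy above can be sketched as a simple self-training loop: each unlabeled point adopts the label of its nearest already-labeled neighbor, then joins the labeled pool so it can help label the next point. The one-dimensional "photo" positions and labels below are purely hypothetical.

```python
def self_train(labeled, unlabeled):
    """Iteratively label points by nearest labeled neighbor (1-D sketch)."""
    labeled = dict(labeled)                # point -> label
    pool = list(unlabeled)
    while pool:
        # Pick the unlabeled point closest to any labeled point.
        point = min(pool, key=lambda p: min(abs(p - q) for q in labeled))
        nearest = min(labeled, key=lambda q: abs(point - q))
        labeled[point] = labeled[nearest]  # adopt the neighbor's label
        pool.remove(point)
    return labeled

# Two clusters of "photos" on a line; only one example of each is labeled.
result = self_train({1.0: "dog", 9.0: "cat"}, [1.5, 2.0, 8.5, 8.0])
print(result)   # the points near 1.0 become "dog", those near 9.0 become "cat"
```

Note the iterative character: the point at 2.0 is labeled via the newly labeled point at 1.5, not directly from the original example, which is exactly the cycle described above.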
Reinforcement learning
This learning method is seen most often in robotics, though it can also be applied to navigation or
gaming. In it, the algorithm learns which actions produce the best results through an iterative process
of trial and error. This allows it to train itself by interacting with its environment, learning from past
experiences to choose the correct option at any given moment.
There are three main components in a reinforcement learning system: the agent making the decisions,
the environment it interacts with, and the actions it is capable of making. The main objective of this
learning method is for the agent to choose actions that maximize reward over a given period of time.
Navigation is likely the easiest real-world context to envision when it comes to this learning style.
The reward is the fastest route from point A to point B. The system can analyze past trips, comparing
the actual travel time with the expected travel time, then selecting the best anticipated route in the
future based on those results.
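The agent/environment/action loop can be sketched with tabular Q-learning on a toy navigation problem: an agent in a five-cell corridor learns, purely by trial, error, and reward, that moving right reaches the goal fastest. All the numbers here (rewards, learning rate, exploration rate) are illustrative choices, not standards.

```python
import random

random.seed(0)
n_cells, goal = 5, 4
# The Q-table scores each (state, action) pair; actions are left (-1), right (+1).
q = {(s, a): 0.0 for s in range(n_cells) for a in (-1, +1)}

for _ in range(500):                           # episodes of trial and error
    state = 0
    while state != goal:
        # Explore occasionally; otherwise exploit the best-known action.
        if random.random() < 0.2:
            action = random.choice((-1, +1))
        else:
            action = max((-1, +1), key=lambda a: q[(state, a)])
        nxt = min(max(state + action, 0), n_cells - 1)
        reward = 1.0 if nxt == goal else -0.1  # small step cost, goal reward
        best_next = max(q[(nxt, -1)], q[(nxt, +1)])
        # Standard Q-learning update with learning rate 0.5, discount 0.9.
        q[(state, action)] += 0.5 * (reward + 0.9 * best_next - q[(state, action)])
        state = nxt

# After training, the greedy policy should be "move right" (+1) in every cell.
policy = [max((-1, +1), key=lambda a: q[(s, a)]) for s in range(goal)]
print(policy)
```

The step cost of -0.1 is what encodes "fastest route": wandering accumulates penalties, so the learned values favor the shortest path to the reward.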
Chapter 4 – Machine Learning Algorithms
All of the methods outlined here have the same end goal: to gain insight and locate patterns and
relationships in the given data, then use the information that’s been learned to make a decision.
Different algorithms will have different approaches to achieving this goal, and will be most useful in
different contexts as a result.
To get the most out of the algorithm that you choose, you have to know how to pair it correctly with
your tools, processes, and intended results. Determining the right combination requires a working
knowledge of the different types of algorithms that are out there and which kinds of problems each is
best-suited for.
On the supervised side of things, common algorithms include regression, logistic regression, and
decision trees or random forests. The K-means is one of the most common unsupervised methods,
though various mapping techniques are also common, like self-organizing maps and nearest neighbor
mapping.
One of the things that will determine the correct algorithm for your project is the type of variable in
play. Some will use categorical variables, which identify a specific target, such as a yes/no output.
Others will be continuous variables, which change and evolve depending on the situation. Certain
algorithms work better with one than the other, while some algorithms are equally well-suited to all
types.
The sections that follow in this chapter don’t explore every single algorithm that’s out there, but they
will give you a basic introduction to some of the most common options. This will help you to better
understand the differences between them and just which processes and tools each deals with best.
Linear regression
This is a supervised algorithm that can be used to establish the connection and relationship between
independent and dependent variables by fitting the best line between them, known as the “regression
line.” It is very useful for estimating the value of homes or properties, as well as predicting total sales
or call numbers, based on a continuous variable.
This is known as a linear regression because the data points can be plotted on a single
line, represented by the linear equation Y = aX + b. In this equation, “Y” is the dependent variable (or
solution), “X” is the independent variable, “a” is the slope of the regression line, and “b” is the
intercept point of the line with the Y-axis.
People use a form of linear regression every day to make educated assumptions about the world
around them. Say you were to try to order a group of people according to their weight, but didn’t
have a scale on hand. You would probably look at other factors, like their height, build, and gender,
and use a combination of these visible parameters to estimate how they would be ranked.
Linear regression basically does the same thing. It uses typical correlations and data that’s given to
generate new data. If it has been well-trained, it will be able to return incredibly accurate results,
successfully predicting a wide range of different factors.
There are two main types of this style. Simple linear regression has only a single independent
variable, meaning it can be modeled in two-dimensional space. Multiple linear regression, on the
other hand, has more than one independent variable. The modeling for this method can be a bit more
complex; you may need to look toward polynomial or curvilinear best fit lines.
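Fitting the line Y = aX + b can be sketched directly with the ordinary least-squares formulas: the slope is the covariance of X and Y divided by the variance of X, and the intercept follows from the means. The house-size and price figures below are made up, and chosen to lie exactly on a line so the fit is easy to check.

```python
def fit_line(xs, ys):
    """Simple linear regression by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope a = covariance(X, Y) / variance(X); intercept b from the means.
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

sizes = [50, 70, 90, 110]          # square meters (hypothetical)
prices = [150, 190, 230, 270]      # thousands (hypothetical, exactly linear)
a, b = fit_line(sizes, prices)
print(a, b)          # → 2.0 50.0  (the data follows price = 2 * size + 50)
print(a * 80 + b)    # → 210.0, the predicted price for an 80 m² home
```

Real data would not sit perfectly on the line; the same formulas then give the line that minimizes the squared distance to all the points, which is what "best fit" means here.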
Logistic regression
Despite the name, this is a classification rather than a regression algorithm. It returns the likelihood of
a given occurrence taking place based on historical data, making it a supervised algorithm. This is
typically a yes/no or true/false kind of response, and the output will be expressed as a value between
0 and 1 that expresses probability.
This algorithm works by analyzing historical data concerning the topic in question. It then consults the
relevant data when a problem is posed to it. Say, for example, you input every single academic test
that you’ve taken in your years of schooling, labeling them by subject. The algorithm can then analyze
that data and tell you the probability that you will pass a test in any given subject. As you enter new
test scores, it will learn from and incorporate this new data, altering its response accordingly.
The inner workings of most logistic regression algorithms are more in-depth than this simple
example, analyzing a variety of data with varying degrees of intensity. Probability is predicted by
fitting the data to an appropriate logit function, which is why this method is alternatively known as
logit regression. The model can be improved by changing which features are considered, utilizing
regularization techniques, including interaction terms, or utilizing a non-linear model.
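A hedged sketch of the test-score example, assuming scikit-learn is available; the hours-studied data and the 4.5-hour query are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented history: hours studied before each past test, and whether
# that test was passed (1) or failed (0).
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

# The output is a probability between 0 and 1, as described above:
# the estimated chance of passing after 4.5 hours of study.
prob_pass = model.predict_proba([[4.5]])[0][1]
print(f"P(pass | 4.5 hours) = {prob_pass:.2f}")
```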
Decision trees
This is one of the most popular supervised learning algorithms in use right now. It excels at problems
that have to do with classification, creating powerful predictive models that are stable and easy to
interpret as well as being highly accurate. It is better able to adapt to non-linear relationships than
other supervised algorithms, working for both categorical and continuous dependent variables. This
versatility is what makes it so useful and popular.
The idea behind this algorithm is to split the data into ever-smaller segments based on designated
variables. This can help users to see the connections between various pieces of data, or to form
subsets of customers for marketing and advertising purposes. The main question that needs to be
answered when discussing these types of algorithms is how the tree determines which variables to
use and where the split between populations is made.
There is some specific terminology that comes into play when you’re using a decision tree. The root
node is the entire population or sample of data. A decision node is a subsequent node, or subset of
data, that moves on to further divisions. A node that does not divide further is called a leaf or a
terminal node. In terms of relationship between them, a node that splits is called a parent node,
while the resulting divisions are called child nodes. An entire section of a tree from root node to leaf
can be referred to as either a branch or a sub-tree.
The process of dividing a population into separate nodes is known as splitting. The opposite of this
—removing sections from a decision tree—is known as pruning. Deciding when and how to split or
prune nodes is the main work of a programmer who’s modeling using this kind of algorithm.
Decision trees can be very advantageous for programmers working with modern machine learning
problems. The interpretability is a high mark in its favor. Results are easy to understand, even for
those who come from a non-analytical background. Even if you know nothing about statistics, you can
interpret the results.
They are also one of the fastest available methods for identifying relationships between objects. They
are adept at determining the key variables that are most significant to dividing the population. The
method also doesn’t necessitate as much data cleaning as other algorithms. Outliers and missing
information won’t have the same degree of influence on the results of a decision tree as they would
on a linear regression or other similar method.
Decision trees are considered to be “non-parametric,” which means they do not make advance
assumptions about structure and distribution before analyzing the data. This can make them incredibly
useful for understanding which factors are the most influential on customer decisions.
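To make splitting concrete, here is a small sketch using scikit-learn’s DecisionTreeClassifier on invented customer data; export_text prints the fitted tree as the kind of readable if/else rules described above.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented data: [age, annual income in $1000s] for eight customers,
# and whether each bought the product (1) or not (0).
X = [[22, 30], [25, 45], [47, 60], [52, 80],
     [46, 52], [56, 75], [23, 25], [30, 50]]
y = [0, 0, 1, 1, 1, 1, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The fitted tree printed as plain decision rules, readable even
# without a statistics background.
print(export_text(tree, feature_names=["age", "income"]))

# Classify a new 50-year-old customer earning $70k.
buys = tree.predict([[50, 70]])[0]
```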
Random forest
A random forest is actually a variation on the decision tree, involving an ensemble of decision trees
working in tandem. It is designed to classify a new object based on its various attributes. Each tree in
the forest produces its own classification for the object, effectively “voting” for that class. The object
is then assigned to the classification that receives the most votes. There is obviously more involved in
the process than that, but this conveys the basics.
Each tree in the forest is grown as large as possible, with no pruning. New trees within the forest
are also grown organically out of the data that is fed to it, making it an entirely self-governing system,
useful for analyzing large collections of complex data.
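A brief sketch of the voting idea with scikit-learn’s RandomForestClassifier, using the bundled iris flower dataset as stand-in data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: 150 iris flowers, four measurements each, three species.
X, y = load_iris(return_X_y=True)

# Grow an ensemble of 100 trees; for a new flower, each tree votes for
# a species and the forest reports the majority class.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Accuracy on the training data (illustration only; a real evaluation
# would use a held-out test set).
accuracy = forest.score(X, y)
```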
Gradient boosting
Also called GBM, this algorithm is ideal for making predictions when given large amounts of data. It
is what is known as an ensemble learning algorithm, making predictions on a variety of different
factors and then combining them into one overall prediction, which will be more reliable and accurate
than any single evaluator.
Gradient boosting has performed well in competitions, especially compared to single-factor
predictors. These models are more robust and can cope more easily with outliers and missing
information. The speed of learning and processing tends to be the most significant drawback of this
algorithm.
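As a rough sketch of the ensemble idea, assuming scikit-learn, the snippet below boosts 100 small trees on invented data following y = x², where each new tree is fitted to the errors left by the trees before it.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Invented data: 200 samples of the curve y = x^2 on [-3, 3].
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2

# Each of the 100 shallow trees corrects the residual errors of the
# trees built before it; the learning rate damps each correction.
gbm = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                random_state=0)
gbm.fit(X, y)

# The combined prediction at x = 2 should land near 2^2 = 4.
estimate = gbm.predict([[2.0]])[0]
```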
K-Means
This is one popular type of unsupervised algorithm. The procedure it uses is designed to classify the
data it’s been given into a certain number of clusters, in such a way that the data points within each
cluster are homogeneous, while remaining heterogeneous relative to the points in other clusters.
The most common use of this algorithm is to break up data based on its general shape or spread to
determine how many distinct groups are present within the overall population. A designated number
of points, known as centroids, is chosen, one for each cluster. Each data point joins the cluster of its
closest centroid; the centroids are then recalculated and the points reassigned, repeating until the
assignments stabilize.
As you can probably tell from how the process is described, this is a very valuable algorithm when
you have a large amount of data and aren’t sure how to divide it into like groups. Because it is
unsupervised, the algorithm may surprise you with the connections and correlations it is able to
discover.
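The clustering procedure can be sketched with scikit-learn’s KMeans on two invented blobs of two-dimensional points; the data and the choice of k = 2 are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two invented, well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Ask for two clusters: each point is assigned to its nearest centroid,
# and the centroids are refined until the assignments settle.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```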
Chapter 5 – Algorithms in Practice
As was mentioned previously, the majority of companies that are using machine learning today are
utilizing a supervised learning method. Anything that is based in a recognition or classification model
is probably using some kind of supervised technique.
This doesn’t mean, however, that the unsupervised algorithm is unheard of or useless in the modern
market. There are plenty of instances when it is the right tool for the job, especially when there are
large quantities of unlabeled data involved in the sample.
When these kinds of algorithms are discussed in theoretical terms, they can sometimes be difficult to
process. The sections below will outline some real-world examples for both supervised and
unsupervised algorithms that may help you to make sense of which ones are best suited to which
purposes.
Supervised algorithms
There are two main kinds of problems that you tend to see solved with supervised algorithms:
regression and classification. In a regression problem, the output will be some kind of real variable,
like a number of dollars in sales, or a measurement of expected rainfall. In a classification problem,
the output will be the category into which the inputted data fits.
Many problems can be solved using both regression and classification, or with a combination of the
two. A bank using a smart piece of software to identify potentially risky customers, for example,
could use a linear regression to chart the customer’s financial data against past customers and
determine if he falls above a certain risk threshold. They could also use a logit regression to
determine his probability of overdrawing his account, or of making loan payments on time.
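The bank scenario might be sketched like this, with scikit-learn and entirely invented customer figures: a linear regression returns a real-valued risk score, while a logistic regression returns an overdraft probability.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Invented records: [annual income in $1000s, outstanding debt in $1000s],
# with a historical risk score and an overdraft flag for each customer.
X = np.array([[30, 20], [45, 10], [60, 5], [25, 30], [80, 2], [35, 25]])
risk_score = np.array([70.0, 40.0, 20.0, 85.0, 10.0, 75.0])
overdrew = np.array([1, 0, 0, 1, 0, 1])

# Regression problem: estimate a real-valued risk score for an applicant
# with a $50k income and $15k of debt.
reg = LinearRegression().fit(X, risk_score)
score = reg.predict([[50, 15]])[0]

# Classification problem: estimate the probability the same applicant
# overdraws, expressed as a value between 0 and 1.
clf = LogisticRegression().fit(X, overdrew)
prob = clf.predict_proba([[50, 15]])[0][1]
```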
Features that make personalized recommendations of new content or products—like the ones you see
on Facebook, Netflix, or YouTube—use a supervised learning technique, as well. The same is true of
spam filters and fraud detectors: anything that is designed to detect certain characteristics in an item
and reject or accept it based on those criteria.
Unsupervised algorithms
Just like with supervised algorithms, there are two main problems that are best solved using
unsupervised algorithms: clustering problems, and association problems. In a clustering problem, the
goal is to find groups within the larger collection of data. If the factors you want to group them by are
known, this could be accomplished using a supervised algorithm, like a decision tree. If instead you
want to see potential ways the data could be grouped, an unsupervised algorithm like K-means will
prove a better tool.
An association problem, on the other hand, is when you want to learn the rules that describe segments
of the population. These algorithms let you find out what combination of features make a person more
or less likely to make a certain decision. They’re again quite useful in marketing, but can also be used
to analyze political polling data and census data.
Some searches will use unsupervised algorithms, as well. A service that not only searches for
articles and pages from the internet but also groups them according to topic without prompting from
the user is utilizing some kind of unsupervised algorithm. The “without prompting” is key, however; if
the user is inputting labels, the system is being taught, and is therefore supervised.
In a very general sense, you could say that unsupervised algorithms are the best choice when you want
to explore your data. Because they are less prescriptive, they can lead to unexpected discoveries.
Whether or not that’s what you’re looking for will depend on what kind of problem you’re addressing.
Chapter 6 – Voice Control and Machine Learning
The idea of an ambient artificial intelligence that is always listening and ready to answer any request
is not quite so close as some marketing minds want you to believe—but it is certainly much closer
than it was ten years ago. The last decade has seen a spike in the development and release of voice
control software, with an impressive range of viable uses.
There are four main voice-controlled personal assistant programs currently on the market: Alexa, put
out by Amazon and available on all of their Echo and FireTV devices; Cortana, developed by
Microsoft, which is integrated on all devices running the Windows 10 operating system; Siri, from
Apple, which comes included with the iPhone; and Google Assistant, developed as a companion for
Google Home devices, which can also be installed as a third-party app on other devices.
The main intent and functionality of these four pieces of software is the same. All are intended to
allow users to gain information or perform a selection of common functions simply by speaking rather
than by pressing any buttons. Yet while the basic premise is the same, each company has turned its
focus in a slightly different direction, and each of these four programs has its own unique set of
advantages and drawbacks.
As with all machine learning systems, the artificial intelligence that runs each of these four programs
doesn’t simply give you answers to your questions; it also learns from the interaction itself. This
allows it to make adjustments in response to the speaking style and typical needs of the user.
This also makes this kind of program very complex, however, and advancements to the interfaces
have taken longer than some people expected. All four of these programs currently have some
limitations, the most frustrating of which is often their limited ability to recognize spoken phrases. In
some cases, especially with earlier iterations of these programs, there seemed to be less machine
learning going on than human learning, as users adapted the way they phrased their questions in order
to be understood.
Most machine learning systems in practice right now are rather static, which causes their usefulness
to degrade over time. A lot of initial effort is put in to train the AI on a huge corpus of data. This is
typically done offline, and the model is then deployed once training is complete.
Personal assistant software, across the board, requires a different kind of machine learning. It has to
be continuously re-trainable by non-experts, allowing the system not only to be effortlessly
maintained but to continue to advance and grow in response to user input. The extent to which this has
been achieved with the systems currently on the market varies depending on the program and who you
ask, but none have quite reached the point of being able to learn anything you ask it to do on
command.
Despite the current limitations, the four applications mentioned above are some of the most ambitious
user-facing applications of machine learning that you will see. Speech recognition is only the
beginning; they apply the principles of machine learning at multiple stages of the user interaction.
Stored information about a user’s search habits can help the program predict what kinds of results
they want to see, while information about their location can be used to give more accurate data on
things like the weather and local attractions.
Intelligent assistants are programmed with very specific users and use cases in mind. The information
architecture put in place in these systems is based on these specific use cases. As the programs
continue to evolve and more user data is able to be processed, their usefulness will no doubt expand
to encompass an even greater range of speaking styles and areas of influence. Because intelligent
systems are able to learn from their mistakes, the quality increase will accelerate the longer these
systems are in use.
Perhaps the most consistent complaint against all four of these systems is that the user can’t interact
with them in the same casual speaking tone they’d use with their friends. In fact, it’s estimated that
around 30% of the questions posed to these programs are not things the user actually needs an answer
to but are instead tests of the voice control, checking to see if it’s able to produce a given fact or
respond to a certain question.
Before ambient AI voice control systems can become a reality, the technology will need to advance to
the point that people feel they can naturally engage with the program, so its use is seamless in their
life. The four programs mentioned above have each been making progress toward addressing this
issue.
Alexa
Amazon’s Alexa personal assistant has a wide array of functionality, along with one of the most
streamlined and natural voice control systems currently available. This is partially because Amazon
focuses on using customer feedback to help the program learn. Since it’s housed in the cloud, it can be
perpetually updated and improved using this new information, utilizing machine learning at the system
level as well as in individual interactions with users.
The machine learning team at Amazon is responsible for some of the most advanced research in a
variety of fields related to artificial intelligence. Work toward their stated goal of making voice
interfaces ubiquitous has led to advancements in automatic speech recognition, natural language
understanding, and text-to-speech technology at an unprecedented rate.
The Alexa Skills kit encourages the development of new commands and abilities by third-party
programmers, as well, opening new avenues to explore the possibilities of machine learning. This
gives the user an active role in directing the program’s learning by installing selected skills that will
be of the most value.
Despite these improvements over previous voice control systems, Alexa is not so robust when it
comes to replying to natural conversational cues when you first get it out of the box. This is especially
true with certain accents and phrase variations. When you go off script, you may hear the response
“Sorry, I didn’t understand the question.” This is partially due to its limitations in vocabulary; it
simply won’t always be able to identify the meaning of every word you say to it.
Still, despite its limitations in phrasing and vocabulary, Alexa is one of the most comprehensive
voice control programs to date, and does allow more variation in phrasing than the first generation of
such programs. As the company continues to expand the software’s capabilities, even these
inconsistencies will no doubt improve. The skills are arguably the most uniquely valuable aspect of
the programming, allowing it to be used in a wide array of applications, including smart home control
and third-party options of the user’s choosing.
Siri
Apple’s version of voice control software, Siri, comes automatically on all iPhones, and is a fairly
similar program to Alexa. Apple was the first major company to make smart assistant integration
standard on its operating system. Siri gained a reputation for poor interpretation of commands early in
its life. Since Apple integrated machine learning techniques more thoroughly, however, this has
stopped being such a concern.
Apple has always been on the cutting edge of artificial intelligence. They started using AI as early as
the 1990s, when it became part of their handwriting recognition programs. Since then, machine
learning has been a part of their development in a very all-encompassing, behind the scenes way. Siri
is only the most recent outward-facing example of this.
Siri started as an adaptation of a stand-alone voice control app the company purchased in 2010.
Apple later moved the software over to a neural net based system, utilizing more machine learning
techniques including convolutional neural networks, gated recurrent units, long short-term memory
units, and n-grams. It was this shift that helped them to improve the system’s accuracy and reduce the
number of errors.
Apple didn’t publicize the shift, but it was obvious to anyone using the program. As soon as deep
machine learning techniques were implemented, Siri’s error rate was cut by a factor of two across all
languages, and by even more for most of them. This wasn’t just due to the algorithm itself but also to
the way the company was able to optimize it, seamlessly enough that users likely noticed nothing
except the reduced frequency of issues.
Apple doesn’t only use machine learning for Siri, though it’s certainly the most visible program in
that area of their operations. They’ve already implemented machine learning in other aspects of their
operating system, in subtle ways that many users won’t directly notice. The short list of apps you
might want to open next when you swipe your screen is one example of machine learning. This is also
how it reminds you of appointments made through your phone, even if you forgot to type them into
your calendar, or shows you who’s calling even if their number isn’t saved in your contacts, provided
they’ve emailed you recently.
All of these, and many of the Apple operating system’s other key features, are made possible or
enhanced by the adoption of deep machine learning. They also use it in behind the scenes
applications. It’s utilized to identify the most valuable feedback from beta testers and to detect fraud
in the Apple store. Intelligent systems also monitor the interior workings of your device, helping to
extend battery life between charges.
When it comes to machine learning, the take-away is that Apple is making huge strides and doing new
things with the technology, but that Siri is not necessarily the most advanced of these systems. Their
approach is more one of a comprehensive system, with the learning built into the framework; Siri is
only the most obvious customer-facing extension of this overall construction.
Cortana
Microsoft’s entry in the voice control area is called Cortana, and is in many respects very similar to
the two that have already been discussed above. Like Alexa and Siri, Cortana gets smarter and is
better able to respond to user requests and questions the longer that she’s used. While she’s
continuously learning, however, she has the same limitations of phrasing that plague the other voice
control systems on the market.
The machine learning department at Microsoft has made great strides with their recent work to
combine deep neural networks and probability models in the right ways, and deploy them across their
various projects, not just in their work with Cortana. Machine learning is becoming more and more a
part of everything users interact with, and Microsoft is aware of this, finding new ways to integrate it
across their various product offerings.
Like other similar products and companies, Microsoft is very well aware of the current issues with
speech recognition, notably the inability to converse with Cortana and similar programs in a way that
feels natural and human. They have made their own unique attempts to rectify this situation.
One of these is the “chit chat” function on Cortana. You should be able to use this to discuss things
with Cortana as though you’re talking to a friend. She can crack jokes, discuss current events, or
commiserate over the score of a recent sporting event. This capability is not very expansive as of yet;
she certainly can’t converse on as wide an array of topics as an actual person. Still, it’s a step toward
the more natural interactions that all of these styles of software are ultimately working to achieve.
Google Assistant
Google Assistant is different from the other machine learning applications explored in this chapter in
a number of ways. Firstly, its main use isn’t through an associated device but rather as a third-party
installation on another manufacturer’s devices. There is a device that comes automatically equipped
with it (Google Home, the company’s answer to the Amazon Echo), but it is also just as easy to use on
iPhones and other devices as an alternative to the voice control system that comes installed.
A lot of the abilities and limitations of Google Assistant, then, have more to do with the device that
it’s loaded on than they do with the software itself. Some devices will lock access to aspects of the
program. For example, Apple doesn’t allow its clock app to be available to third-party apps, which
means Google Assistant can’t be used to set alarms on those particular devices, even though the
system itself possesses the ability.
Functionally, Google Assistant is not much different from Siri. You’ll have to open it differently,
since it doesn’t come installed on iPhones, and it is marginally better at capturing commands and
information than either Microsoft’s or Apple’s voice control services.
The main benefits of Google Assistant will be for those who are already heavy Google users. If you
are a regular user of services like Google Maps, Google Calendar, and Gmail, Google Assistant will
give you voice control over all of them in one convenient package.
The second way that Google Assistant differs from other voice control programs is more conceptual
than practical, and requires a bit of knowledge of the software’s history to fully appreciate.
It began as an outcropping of the Google Search function. The implementation of machine learning
into searches let Google start to deliver more personalized results and content, anticipating user
needs and building a profile of their likes and dislikes in a classic example of the system’s most
common application.
The original idea was to integrate the more personalized Google Assistant as a secondary function of
the same familiar Google Search dialog, but the company quickly realized users felt uncomfortable
typing their personal information into the same place they’d go to search for recipes and news stories.
They switched to a new interface at this point, but the underpinnings of Google Assistant remain
rooted in the Google Search ideology.
Since the early days of the company’s machine learning experiments, Google has added a deeper
layer of artificial intelligence, with a more natural language processing center and algorithms that
learn key details about the user to better craft their experience. These include online habits, questions
they frequently ask, and favorite searches for restaurants, apps, or services, as well as identifying
information that can be easily gleaned from online activity, like age and gender.
Google’s ultimate vision has a wider scope than other personal assistant programs, however, aiming
for a more all-inclusive program that works the way the internet itself does. When you
open your browser to navigate to a website, you don’t have to download a new piece of software
before you can visit new sites. Aside from creating accounts for subscriptions or paid services, the
functionality for using the website comes included with every browser.
This is the image Google has in mind for their personal assistant. As opposed to the Alexa model,
which requires you to download specific skills to use certain add-on devices or add new functions,
Google wants to make a program that accesses all of those functions and devices automatically. The
goal is for it to be ready to give you answers for anything you ask for out of the box.
When this dream comes to fruition, it will translate to a new kind of intelligent internet system that’s
more open than programs like Alexa or Siri. Toward this end, it already works with over 70 different
smart home manufacturers, and has left the system open for new companies to easily add their own
collaborations.
On the user side of things, Google Assistant is working toward a more conversational interface. The goal
is to return search results as simple, one-sentence answers, and in some instances it can already
deliver on this. It also archives all conversations that use the interface so that the usefulness of the
product can be more significantly improved over time.
At the moment, Google Assistant is a slightly more robust alternative to the other voice control
offerings on the market, though it still suffers from many of the same drawbacks. As it develops, however, it is
likely that it will become more and more distinct from the products put out by Microsoft, Amazon,
and Apple, pushing toward a more inclusive end result that is closer in concept to the sci-fi computer
and ambient AI that have long been the end goal.
Conclusion
One of the most important things about a machine learning algorithm is that it is able to use the
information that has been fed into it to teach itself and make its own decisions. The open-ended nature
of this computing, especially in the case of unsupervised algorithms, is its main advantage. It also
makes the field somewhat unpredictable, an irony when many of the algorithms being constructed are
concerned with probability.
There is no doubt that machine learning has made incredible strides even since the first generation of
voice control software was released onto the market. The addition of deep neural nets and other
modern algorithms has improved their speech recognition, making it easier to interact with them in a
more natural way.
Statistical modeling has also become an entirely new field because of the advances already made in
machine learning. The kinds of models that intelligent algorithms can produce using an iterative
approach would take days or even weeks for a team of researchers to construct. This can save
businesses incredible amounts of both time and money.
As far as what else you can accomplish with machine learning, however, the possibilities are just
beginning to be explored. One of the most incredible things about this kind of software is that it can
come up with new ways to be useful in the process of completing its initial assignment. It may notice
connections and correlations in the data that all the human observers had thus far missed, and can
sometimes even solve problems it wasn’t asked to solve.
Experimenting with various types of machine learning algorithms is the best way to get to know just
what they’re capable of. Explore some of the available datasets on the University of California at
Irvine’s repository; play with creating new skills on the Alexa API. You may find yourself surprised
by how far machine learning has already come.
I hope that you really enjoyed reading my book. If you want to help me produce
more materials like this, then please leave a positive review on Amazon.