
Chapter 2 Notes

This document provides an introduction and overview of machine learning concepts. It discusses definitions of machine learning, the machine learning process, and different types of learning like supervised, unsupervised, and reinforcement learning. It also describes the basic components of the machine learning process, including data storage, abstraction, generalization, and evaluation. Finally, it discusses applications of machine learning and concepts related to understanding machine learning data like units of observation, examples, features, and different forms of data.

Chapter 2

INTRODUCTION ML

Machine Learning Process - Preliminaries for Machine Learning algorithms - Turning data into
Probabilities and Statistics for Machine Learning - Probability theory - Probability Distributions
- Decision Theory.
In this chapter, we consider different definitions of the term “machine learning” and explain
what is meant by “learning” in the context of machine learning. We also discuss the various
components of the machine learning process. There are also brief discussions about different
types of learning, such as supervised learning, unsupervised learning and reinforcement learning.
4.1 Introduction
4.1.1 Definition of machine learning
Arthur Samuel, an early American leader in the field of computer gaming and artificial
intelligence, coined the term “Machine Learning” in 1959 while at IBM. He defined machine
learning as “the field of study that gives computers the ability to learn without being explicitly
programmed.” However, there is no universally accepted definition for machine learning.
Different authors define the term differently. We give below two more definitions.
1. Machine learning is programming computers to optimize a performance criterion using
example data or past experience. We have a model defined up to some parameters, and learning
is the execution of a computer program to optimize the parameters of the model using the
training data or past experience. The model may be predictive to make predictions in the future,
or descriptive to gain knowledge from data, or both (see [2] p.3).

2. The field of study known as machine learning is concerned with the question of how to
construct computer programs that automatically improve with experience (see [4], Preface.).
Remarks In the above definitions we have used the term “model” and we will be using this term
in several contexts later in this book. It appears that there is no universally accepted one-sentence
definition of this term. Loosely, it may be understood as some mathematical expression or
equation, or some mathematical structures such as graphs and trees, or a division of sets into
disjoint subsets, or a set of logical “if ... then ... else ...” rules, or some such thing. It may be
noted that this is not an exhaustive list.

4.1.2 Definition of learning


Definition
A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks T, as measured by P, improves with
experience E.

Examples
i) Handwriting recognition learning problem
• Task T: Recognizing and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands recorded while observing a
human driver
iii) A chess learning problem
• Task T: Playing chess
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself

Definition
A computer program which learns from experience is called a machine learning program or
simply a learning program. Such a program is sometimes also referred to as a learner.

4.2 How machines learn


4.2.1 Basic components of learning process
The learning process, whether by a human or a machine, can be divided into four components,
namely, data storage, abstraction, generalization and evaluation. Figure 1.1 illustrates the various
components and the steps involved in the learning process.

[Figure 1.1: Components of learning process. Stages: Data storage, Abstraction, Generalization,
Evaluation. Labels between stages: Data, Concepts, Inferences.]


1. Data storage
Facilities for storing and retrieving huge amounts of data are an important component of the
learning process. Humans and computers alike utilize data storage as a foundation for advanced
reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical
signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices
to store data and use cables and other technology to retrieve data.

2. Abstraction
The second component of the learning process is known as abstraction. Abstraction is the
process of extracting knowledge about stored data. This involves creating general concepts about
the data as a whole. The creation of knowledge involves application of known models and
creation of new models.
The process of fitting a model to a dataset is known as training. When the model has been
trained, the data is transformed into an abstract form that summarizes the original information.

3. Generalization
The third component of the learning process is known as generalization. The term generalization
describes the process of turning the knowledge about stored data into a form that can be utilized
for future action. These actions are to be carried out on tasks that are similar, but not identical, to
those that have been seen before. In generalization, the goal is to discover those properties of
the data that will be most relevant to future tasks.

4. Evaluation
Evaluation is the last component of the learning process. It is the process of giving feedback to
the user to measure the utility of the learned knowledge. This feedback is then utilized to effect
improvements in the whole learning process.
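
The four components above can be illustrated with a very small sketch. The data values, the proportional model y = w * x and the function names are assumptions made only for this illustration; the notes do not prescribe any particular model.

```python
# Hypothetical data: (x, y) pairs in which y is roughly proportional to x.
stored_data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]   # 1. Data storage
held_out = [(5, 10.1), (6, 11.8)]                        # examples not used for training


def train(data):
    """2. Abstraction: summarize the data by fitting y = w * x (least squares)."""
    sum_xy = sum(x * y for x, y in data)
    sum_xx = sum(x * x for x, _ in data)
    return sum_xy / sum_xx


def predict(w, x):
    """3. Generalization: apply the learned summary to a new, unseen input."""
    return w * x


def evaluate(w, data):
    """4. Evaluation: mean absolute error on data that was not used for training."""
    return sum(abs(predict(w, x) - y) for x, y in data) / len(data)


w = train(stored_data)
mae = evaluate(w, held_out)
print(f"learned w = {w:.3f}, mean absolute error on held-out data = {mae:.3f}")
```

The single number w is the "abstract form" of the stored data; the error on the held-out pairs is the feedback that would be used to improve the process.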

4.3 Applications of machine learning


Application of machine learning methods to large databases is called data mining. In data
mining, a large volume of data is processed to construct a simple model with valuable use, for
example, having high predictive accuracy.
The following is a list of some of the typical applications of machine learning.
1. In retail business, machine learning is used to study consumer behavior.
2. In finance, banks analyze their past data to build models to use in credit applications, fraud
detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and maximizing
the quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast
enough by computers. The World Wide Web is huge; it is constantly growing and searching
for relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the
system designer need not foresee and provide solutions for all possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and robotics.
9. Machine learning methods are applied in the design of computer-controlled vehicles to steer
correctly when driving on a variety of roads.
10. Machine learning methods have been used to develop programmes for playing games such as
chess, backgammon and Go.

4.4 Understanding data


Since an important component of the machine learning process is data storage, we briefly
consider in this section the different types and forms of data that are encountered in the machine
learning process.
4.4.1 Unit of observation
By a unit of observation we mean the smallest entity with measured properties of interest for a
study.
Examples
• A person, an object or a thing
• A time point
• A geographic region

• A measurement
Sometimes, units of observation are combined to form units such as person-years.
4.4.2 Examples and features
Datasets that store the units of observation and their properties can be imagined as collections of
data consisting of the following:
• Examples
An “example” is an instance of the unit of observation for which properties have been recorded.
An “example” is also referred to as an “instance”, or “case” or “record.” (It may be noted that
the word “example” has been used here in a technical sense.)
• Features
A “feature” is a recorded property or a characteristic of examples. It is also referred to as an
“attribute” or a “variable.”
Examples for “examples” and “features”
1. Cancer detection
Consider the problem of developing an algorithm for detecting cancer. In this study we note
the following.
(a) The units of observation are the patients.
(b) The examples are members of a sample of cancer patients.
(c) The following attributes of the patients may be chosen as the features:
• gender
• age
• blood pressure
• the findings of the pathology report after a biopsy
2. Pet selection
Suppose we want to predict the type of pet a person will choose.
(a) The units are the persons.
(b) The examples are members of a sample of persons who own pets.

(c) The features might include age, home region, family income, etc. of persons who own pets.

Figure 4.2: Example for “examples” and “features” collected in a matrix format (data relates to
automobiles and their features)

3. Spam e-mail
Let it be required to build a learning algorithm to identify spam e-mail.
(a) The unit of observation could be e-mail messages.
(b) The examples would be specific messages.
(c) The features might consist of the words used in the messages.
Examples and features are generally collected in a “matrix format”. Fig. 4.2 shows such a data
set.
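
As a rough sketch of the matrix format, each row below is one example and each column is one feature. The feature names follow the automobile data mentioned in these notes; the three records themselves are invented for illustration and are not the contents of Fig. 4.2.

```python
# Column order of the matrix: one entry per feature.
features = ["year", "model", "price", "mileage", "color", "transmission"]

# Hypothetical examples (one dictionary per automobile).
examples = [
    {"year": 2012, "model": "A", "price": 15000, "mileage": 20000,
     "color": "red", "transmission": "manual"},
    {"year": 2010, "model": "B", "price": 11000, "mileage": 45000,
     "color": "blue", "transmission": "automatic"},
    {"year": 2015, "model": "A", "price": 18500, "mileage": 9000,
     "color": "white", "transmission": "automatic"},
]

# Rows are examples, columns are features.
matrix = [[example[name] for name in features] for example in examples]
n_examples, n_features = len(matrix), len(matrix[0])

print(f"{n_examples} examples x {n_features} features")
for row in matrix:
    print(row)
```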

4.4.3 Different forms of data


1. Numeric data
If a feature represents a characteristic measured in numbers, it is called a numeric feature.
2. Categorical or nominal data
A categorical feature is an attribute that can take on one of a limited, and usually fixed, number
of possible values on the basis of some qualitative property. A categorical feature is also called
a nominal feature.

3. Ordinal data
This denotes a nominal variable with categories falling in an ordered list. Examples include
clothing sizes such as small, medium, and large, or a measurement of customer satisfaction
on a scale from “not at all happy” to “very happy.”
Examples
In the data given in Fig. 4.2, the features “year”, “price” and “mileage” are numeric and the
features “model”, “color” and “transmission” are categorical.
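
The three forms of data are often turned into numbers in different ways before a learning algorithm sees them. The sketch below is one common convention, not something the notes prescribe: numeric values are used as they are, categorical values are one-hot encoded, and ordinal values are mapped to their rank. The category lists and helper names are assumptions for the example.

```python
def one_hot(value, categories):
    """Encode a categorical (nominal) value as a 0/1 indicator vector."""
    return [1 if value == category else 0 for category in categories]


def ordinal_code(value, ordered_categories):
    """Encode an ordinal value by its position in the ordered list."""
    return ordered_categories.index(value)


COLORS = ["blue", "red", "white"]      # no natural order: categorical
SIZES = ["small", "medium", "large"]   # ordered list: ordinal

price = 13500.0                        # numeric: usable directly
color_vector = one_hot("red", COLORS)
size_rank = ordinal_code("medium", SIZES)

print(price, color_vector, size_rank)
```

The rank encoding preserves the order small < medium < large, while the one-hot vector deliberately avoids implying any order among colors.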

4.5 Prerequisites
The learners of this tutorial are expected to know the basics of Python programming. Besides,
they need to have a solid understanding of computer programming fundamentals.
If you are new to this arena, we suggest you pick up tutorials based on these concepts first,
before you embark on Machine Learning.
Today’s Artificial Intelligence (AI) has far surpassed the hype of blockchain and quantum
computing. This is due to the fact that huge computing resources are easily available to the
common man. Developers now take advantage of this to create new Machine Learning
models and to re-train existing models for better performance and results. The easy
availability of High Performance Computing (HPC) has resulted in a sudden increase in demand
for IT professionals having Machine Learning skills.
In this tutorial, you will learn in detail about −
• What is the crux of machine learning?
• What are the different types in machine learning?
• What are the different algorithms available for developing machine learning models?
• What tools are available for developing these models?
• What are the programming language choices?
• What platforms support development and deployment of Machine Learning applications?
• What IDEs (Integrated Development Environments) are available?
• How to quickly upgrade your skills in this important area?

When you tag a face in a Facebook photo, it is AI that is running behind the scenes and
identifying faces in the picture. Face tagging is now common in applications
that display pictures with human faces. Why just human faces? There are several
applications that detect objects such as cats, dogs, bottles, cars, etc. We have autonomous
cars running on our roads that detect objects in real time to steer the car. When you
travel, you use Google Directions to learn the real-time traffic situation and follow the
best path suggested by Google at that point in time. This is yet another application
of AI working in real time.
Let us consider the example of the Google Translate application that we typically use while
visiting foreign countries. Google’s online translator app on your mobile helps you
communicate with local people speaking a language that is foreign to you.
There are several applications of AI that we use practically today. In fact, each one of us
uses AI in many parts of our lives, even without our knowledge. Today’s AI can perform
extremely complex jobs with great accuracy and speed. Let us discuss an example of a
complex task to understand what capabilities are expected in an AI application that you
would be developing today for your clients.
Example
We all use Google Directions during trips within the city for a daily commute
or even for inter-city travel. The Google Directions application suggests the fastest path to
our destination at that instant. When we follow this path, we usually find that
Google’s suggestion is right and that we save valuable time on the trip.
You can imagine the complexity involved in developing this kind of application,
considering that there are multiple paths to your destination and the application has to
judge the traffic situation on every possible path to give you a travel time estimate for
each such path. Besides, consider the fact that Google Directions covers the entire globe.
Undoubtedly, many AI and Machine Learning techniques are in use under the hood of
such applications.
Considering the continuous demand for the development of such applications, you will
now appreciate why there is a sudden demand for IT professionals with AI skills.
In our next chapter, we will learn what it takes to develop AI programs.
The journey of AI began in the 1950s, when computing power was a fraction of what it is
today. AI started out with predictions made by the machine in much the same way a statistician
makes predictions using a calculator. Thus, early AI development was based mainly on
statistical techniques.
In this chapter, let us discuss in detail what these statistical techniques are.

Machine Learning is a very broad field and requires skills across several domains.
The skills that you need to acquire to become an expert in Machine Learning are
listed below −

• Statistics
• Probability Theory
• Calculus
• Optimization techniques
• Visualization

4.6 Necessity of Various Skills of Machine Learning


To give you a brief idea of what skills you need to acquire, let us discuss some
examples −
Mathematical Notation
Most of the machine learning algorithms are heavily based on mathematics. The level
of mathematics that you need to know is probably just a beginner level. What is
important is that you should be able to read the notation that mathematicians use in
their equations. For example - if you are able to read the notation and comprehend
what it means, you are ready to learn machine learning. If not, you may need to
brush up on your mathematics knowledge.

Probability Theory
Here is an example to test your current knowledge of probability theory: Classifying
with conditional probabilities.

With these definitions, we can define the Bayesian classification rule −

• If P(c1|x, y) > P(c2|x, y), the class is c1.
• If P(c1|x, y) < P(c2|x, y), the class is c2.
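
The rule above can be sketched in a few lines. The definitions that precede the rule are not reproduced in these notes, so the sketch assumes the posteriors P(c|x, y) are obtained from Bayes' theorem (posterior proportional to likelihood times prior). All probability values and function names are made up for illustration.

```python
def posteriors(likelihoods, priors):
    """Return P(c | x, y) for each class c, assuming Bayes' theorem:
    P(c | x, y) = p(x, y | c) * P(c) / sum over classes of p(x, y | c) * P(c)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


def classify(post):
    """Apply the Bayesian classification rule to two classes c1 and c2."""
    if post["c1"] > post["c2"]:
        return "c1"
    if post["c1"] < post["c2"]:
        return "c2"
    return None  # equal posteriors: the rule as stated does not cover a tie


# Hypothetical values of p(x, y | c) at one observed point (x, y), and class priors.
likelihoods = {"c1": 0.02, "c2": 0.01}
priors = {"c1": 0.5, "c2": 0.5}

post = posteriors(likelihoods, priors)
label = classify(post)
print(post, "->", label)
```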

Optimization Problem
Here is an optimization function

Subject to the following constraints −

If you can read and understand the above, you are all set.
Visualization
In many cases, you will need to understand the various types of visualization plots to
understand your data distribution and interpret the results of the algorithm’s output.

Besides the above theoretical aspects of machine learning, you need good
programming skills to code those algorithms.
So what does it take to implement ML? Let us look into this in the next chapter.
To develop ML applications, you will have to decide on the platform, the IDE and the
language for development. There are several choices available. Most of these would
meet your requirements easily as all of them provide the implementation of AI
algorithms discussed so far.
If you are developing the ML algorithm on your own, the following aspects need to be
understood carefully −

• The language of your choice − this essentially depends on your proficiency in one of the
languages supported in ML development.
• The IDE that you use − this would depend on your familiarity with the existing IDEs
and your comfort level.
• Development platform − there are several platforms available for development and
deployment. Most of these are free to use. In some cases, you may have to incur a
license fee beyond a certain amount of usage.
Here is a brief list of languages, IDEs and platforms for your ready reference.

4.7 Language Choice


Here is a list of languages that support ML development −

• Python
• R
• MATLAB
• Octave
• Julia
• C++
• C
This list is not necessarily comprehensive; however, it covers many popular languages
used in machine learning development. Depending upon your comfort level, select a
language for the development, then develop your models and test them.

IDEs
Here is a list of IDEs which support ML development −

• RStudio
• PyCharm
• IPython/Jupyter Notebook
• Julia
• Spyder
• Anaconda
• Rodeo
• Google Colab
The above list is not necessarily comprehensive. Each one has its own merits and
demerits. The reader is encouraged to try out these different IDEs before narrowing
down to a single one.

Platforms
Here is a list of platforms on which ML applications can be deployed −

• IBM
• Microsoft Azure
• Google Cloud
• Amazon
• MLflow
Once again, this list is not exhaustive. The reader is encouraged to sign up for the
above-mentioned services and try them out.
This tutorial has introduced you to Machine Learning. Now, you know that Machine Learning
is a technique of training machines to perform the activities a human brain can do, albeit
faster and sometimes better than an average human being. Today we have seen that machines can beat
human champions in games such as chess and Go, which are considered very complex. You
have seen that machines can be trained to perform human activities in several areas and can aid
humans in living better lives.
Machine Learning can be Supervised or Unsupervised. If you have a smaller amount of
clearly labelled data for training, opt for Supervised Learning. Unsupervised Learning would
generally give better performance and results for large data sets. If you have a huge data set
easily available, go for deep learning techniques. You have also learned about Reinforcement
Learning and Deep Reinforcement Learning. You now know what Neural Networks are, their
applications and limitations.
Finally, when it comes to the development of machine learning models of your own, you looked
at the choices of various development languages, IDEs and platforms. The next thing that you need
to do is start learning and practicing each machine learning technique. The subject is vast in
breadth, but taken individually, each topic can be learned in a few
hours. The topics are largely independent of one another. You need to take up one topic at
a time, learn it, practice it and implement its algorithms using a language of your choice.
This is the best way to start studying Machine Learning. Practicing one topic at a time, you
would soon acquire the breadth that is eventually required of a Machine Learning expert.
4.8 Statistical Techniques
The development of today’s AI applications started with using the age-old traditional statistical
techniques. You may have used straight-line interpolation in school to predict a future value.
There are several other such statistical techniques which are successfully applied in developing
so-called AI programs. We say “so-called” because the AI programs that we have today are
much more complex and use techniques far beyond the statistical techniques used by the early
AI programs.
Some examples of statistical techniques that were used for developing AI applications in
those days and are still in practice are listed here −

• Regression
• Classification
• Clustering
• Probability Theory
• Decision Trees
Here we have listed only some primary techniques that are enough to get you started on AI
without scaring you with the vastness that AI demands. If you are developing AI applications
based on limited data, you would be using these statistical techniques.
However, today data is abundant. For analyzing the huge volumes of data that we now possess,
statistical techniques alone are not of much help, as they have limitations of their own. More
advanced methods such as deep learning have hence been developed to solve many complex problems.
As we move ahead in this tutorial, we will understand what Machine Learning is and how it is
used for developing such complex AI applications.
Consider the following figure that shows a plot of house price versus house size in sq. ft.

After plotting various data points on the XY plot, we draw a best-fit line to make predictions
for any other house given its size. You will feed the known data to the machine and ask it to
find the best-fit line. Once the best-fit line is found by the machine, you will test its suitability
by feeding in a known house size, i.e. the X-value in the above plot. The machine will now
return the estimated Y-value, i.e. the expected price of the house. The line can be
extrapolated to find out the price of a house which is 3000 sq. ft. or even larger. This is called
regression in statistics. In particular, this kind of regression is called linear regression, as the
relationship between the X & Y data points is linear.
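
A minimal sketch of finding the best-fit line by ordinary least squares is given below. The house sizes and prices are invented for the example (they lie exactly on a line so the result is easy to check), and the function names are illustrative only.

```python
def fit_line(sizes, prices):
    """Ordinary least squares for price = slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    s_xx = sum((x - mean_x) ** 2 for x in sizes)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(size, slope, intercept):
    """Read the expected price (Y) off the fitted line for a given size (X)."""
    return slope * size + intercept


# Hypothetical training data: size in sq. ft. (X) and price (Y).
sizes = [1000, 1500, 2000, 2500]
prices = [70000, 95000, 120000, 145000]

slope, intercept = fit_line(sizes, prices)
estimate_3000 = predict_price(3000, slope, intercept)  # extrapolation beyond the data
print(f"price = {slope:.2f} * size + {intercept:.2f}")
print(f"estimated price for 3000 sq. ft.: {estimate_3000:.2f}")
```

Extrapolating to 3000 sq. ft. assumes the linear trend continues outside the range of the training data, which real data may not honor.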
In many cases, the relationship between the X & Y data points may not be a straight line, and it
may be a curve with a complex equation. Your task would now be to find the best-fitting
curve, which can be extrapolated to predict future values. One such application plot is shown
in the figure below.

You will use statistical optimization techniques to find the equation of the best-fit curve
here. And this is exactly what Machine Learning is about: you use known optimization
techniques to find the best solution to your problem.
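
To make "using an optimization technique to find the best-fit curve" concrete, here is a small sketch that fits a curve of the assumed form y = a * x^2 + b by gradient descent on the mean squared error. The curve family, the data, the learning rate and the number of steps are all choices made for this illustration; the notes do not specify them.

```python
def fit_curve(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = a * x**2 + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residual of the current curve at every data point.
        residuals = [a * x * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to a and b.
        grad_a = 2.0 / n * sum(r * x * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Step against the gradient to reduce the error.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Hypothetical data generated from y = 2 * x**2 + 1 (no noise).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 * x * x + 1.0 for x in xs]

a, b = fit_curve(xs, ys)
print(f"fitted curve: y = {a:.4f} * x^2 + {b:.4f}")
```

For this particular model a closed-form least-squares solution also exists; gradient descent is used here only because it carries over to models where no closed form is available.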
Next, let us look at the different categories of Machine Learning.
Machine Learning is broadly categorized under the following headings −

Machine learning evolved from left to right as shown in the above diagram.
• Initially, researchers started out with Supervised Learning. This is the case of housing
price prediction discussed earlier.
• This was followed by unsupervised learning, where the machine is made to learn on its
own without any supervision.
• Scientists discovered further that it may be a good idea to reward the machine when it
does the job the expected way, and thus came Reinforcement Learning.
• Very soon, the available data became so large that the
conventional techniques developed so far failed to analyze it and provide
predictions.
• Thus came deep learning, in which Artificial Neural Networks (ANN), loosely modeled on the
human brain, are created in our binary computers.
• The machine now learns on its own using the high computing power and huge memory
resources that are available today.
• It is now observed that Deep Learning has solved many previously unsolvable
problems.
• The technique is now further advanced by giving incentives to Deep Learning networks
as rewards, and thus finally comes Deep Reinforcement Learning.
Let us now study each of these categories in more detail.
4.9 Supervised Learning
Supervised learning is analogous to training a child to walk. You will hold the child’s hand,
show him how to take his foot forward, walk yourself for a demonstration and so on, until the
child learns to walk on his own.
Regression
Similarly, in the case of supervised learning, you give concrete known examples to the
computer. You say that for given feature value x1 the output is y1, for x2 it is y2, for x3 it is y3,
and so on. Based on this data, you let the computer figure out an empirical relationship between
x and y.
Once the machine is trained in this way with a sufficient number of data points, you would
ask the machine to predict Y for a given X. Assuming that you know the real value of Y for this
given X, you will be able to deduce whether the machine’s prediction is correct.
Thus, you will test whether the machine has learned by using the known test data. Once you are
satisfied that the machine is able to do the predictions with a desired level of accuracy (say 80 to
90%) you can stop further training the machine.
Now, you can safely use the machine to do the predictions on unknown data points, or ask the
machine to predict Y for a given X for which you do not know the real value of Y. This training
comes under the regression that we talked about earlier.
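
The train-then-test procedure described above can be sketched as follows. The data points, the 10% tolerance used to count a prediction as "correct", and the 80% threshold are assumptions for the example (the threshold echoes the "say 80 to 90%" figure in the text).

```python
def fit_line(points):
    """Least-squares line y = slope * x + intercept through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def holdout_accuracy(model, test_points, tolerance=0.10):
    """Fraction of test points predicted within `tolerance` (relative) of the true y."""
    slope, intercept = model
    hits = sum(
        1 for x, y in test_points
        if abs(slope * x + intercept - y) <= tolerance * abs(y)
    )
    return hits / len(test_points)


# Hypothetical known examples, split into training data and test data.
train_points = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.9)]
test_points = [(6, 12.1), (7, 13.8), (8, 16.3)]

model = fit_line(train_points)
accuracy = holdout_accuracy(model, test_points)
ready_for_use = accuracy >= 0.8  # stop training once the desired level is reached
print(f"accuracy on test data: {accuracy:.0%}, ready: {ready_for_use}")
```

The key point is that the test points play no part in fitting the line, so the measured accuracy reflects how the model behaves on data it has not seen.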
Classification
You may also use machine learning techniques for classification problems. In classification
problems, you classify objects of similar nature into a single group. For example, given a set of 100
students, you may like to group them into three groups based on their heights: short,
medium and tall. Measuring the height of each student, you will place them in the proper group.

Now, when a new student comes in, you will put him in an appropriate group by measuring his
height. By following the principles in regression training, you will train the machine to classify
a student based on his feature – the height. When the machine learns how the groups are
formed, it will be able to classify any unknown new student correctly. Once again, you would
use the test data to verify that the machine has learned your technique of classification before
putting the developed model in production.
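
One simple way to realize the height example is a nearest-centroid classifier: learn the average height of each labelled group, then assign a new student to the group whose average is closest. This is only one of many possible classifiers, chosen here for brevity; the heights (in cm) and the group labels are hypothetical.

```python
from collections import defaultdict


def train_centroids(labelled_heights):
    """Learn the mean height of each group from (height, label) pairs."""
    groups = defaultdict(list)
    for height, label in labelled_heights:
        groups[label].append(height)
    return {label: sum(values) / len(values) for label, values in groups.items()}


def classify(height, centroids):
    """Assign the label whose mean height is closest to the given height."""
    return min(centroids, key=lambda label: abs(height - centroids[label]))


# Hypothetical labelled training data: the single feature is height in cm.
training_data = [
    (150, "short"), (155, "short"),
    (165, "medium"), (170, "medium"),
    (182, "tall"), (188, "tall"),
]

centroids = train_centroids(training_data)
for new_height in (158, 175, 190):
    print(new_height, "->", classify(new_height, centroids))
```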
Supervised Learning is where AI really began its journey. This technique has been applied
successfully in several cases. You have used such a model whenever you used handwriting
recognition on your device. Several algorithms have been developed for supervised learning.
You will learn about them in the following chapters.
4.10 Unsupervised Learning
In unsupervised learning, we do not specify a target variable to the machine; rather, we ask the
machine “What can you tell me about X?”. More specifically, given a huge data set X, we may ask
questions such as “What are the five best groups we can make out of X?” or “What
features occur together most frequently in X?”. To arrive at the answers to such questions, the
number of data points that the machine requires to deduce a
strategy can be very large. In the case of supervised learning, the machine can be trained with
as few as a few thousand data points. However, in the case of unsupervised learning, the
number of data points reasonably accepted for learning starts at a few million. These
days, data is generally abundantly available. The data ideally requires curating. However,
given the amount of data that is continuously flowing through a social network, data
curation is in most cases an impossible task.
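
A question such as "what are the k best groups we can make out of X?" is commonly approached with clustering. The sketch below uses k-means on one-dimensional data as one example of such a technique; the notes do not name a specific algorithm, and the data, the choice k = 2 and the simplistic initialisation are assumptions for illustration.

```python
def k_means(points, k, iterations=20):
    """Very small 1-D k-means: no labels are given, only the points themselves."""
    centroids = list(points[:k])  # simplistic initialisation: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled data with two visible groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = k_means(points, k=2)
print("centroids:", centroids)
print("clusters:", clusters)
```

Note that no target variable appears anywhere: the grouping emerges from the distances between the points alone.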
The following figure shows the boundary between the yellow and red dots as determined by
unsupervised machine learning. You can clearly see that the machine would be able to
determine the class of each of the black dots with fairly good accuracy.

Unsupervised learning has shown great success in many modern AI applications, such as
face detection, object detection, and so on.
Reinforcement Learning
Consider training a pet dog to bring a ball to us. We throw the ball a certain
distance and ask the dog to fetch it back to us. Every time the dog does this right, we reward the
dog. Slowly, the dog learns that doing the job right earns him a reward, and then the dog starts
doing the job the right way every time in future. Exactly this concept is applied in the “Reinforcement”
type of learning. The technique was initially developed for machines to play games. The
machine is given an algorithm to analyze all possible moves at each stage of the game. The
machine may select one of the moves at random. If the move is right, the machine is rewarded;
otherwise it may be penalized. Slowly, the machine will start differentiating between right and
wrong moves and after several iterations will learn to play the game with better
accuracy. The rate of winning the game would improve as the machine plays more and
more games.
The entire process may be depicted in the following diagram −

This technique of machine learning differs from supervised learning in that you need not
supply labelled input/output pairs. The focus is on finding the balance between exploring
new solutions and exploiting the learned solutions.
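
The balance between exploring and exploiting can be illustrated with an epsilon-greedy strategy on a simple "multi-armed bandit" problem. This is a deliberately reduced setting (one decision repeated many times, no game states), chosen as an assumption for the sketch; the reward probabilities, epsilon and step count are invented.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning: no labelled pairs, only rewards after each action."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions        # how often each action was tried
    estimates = [0.0] * n_actions   # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            # exploit: pick the action that currently looks best
            action = max(range(n_actions), key=lambda i: estimates[i])
        # The environment rewards the action with some (hidden) probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: action 2 is rewarded most often.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda i: estimates[i])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times each action was chosen:", counts)
```

With epsilon set to 0 the learner could lock onto the first action that happens to pay off; the small amount of random exploration is what lets it discover the better action.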
4.11 Deep Learning
Deep learning is a family of models based on Artificial Neural Networks (ANNs) with many
layers; Convolutional Neural Networks (CNNs) are one prominent example. There are several
architectures used in deep learning, such as deep neural networks, deep belief networks,
recurrent neural networks, and convolutional neural networks.
These networks have been successfully applied in solving the problems of computer vision,
speech recognition, natural language processing, bioinformatics, drug design, medical image
analysis, and games. There are several other fields in which deep learning is actively
applied. Deep learning requires huge processing power and very large amounts of data, both of
which are generally easily available these days.
We will talk about deep learning in more detail in the coming chapters.
Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) combines the techniques of both deep learning and
reinforcement learning. Reinforcement learning algorithms like Q-learning are now
combined with deep learning to create powerful DRL models. The technique has been applied with
great success in the fields of robotics, video games, finance and healthcare. Many previously
unsolvable problems are now solved by creating DRL models. There is a lot of research going on
in this area, and it is being actively pursued by industry.
So far, you have had a brief introduction to various machine learning models. Now let us explore
in slightly more depth the various algorithms that are available under these models.
Supervised learning is one of the important models of learning involved in training machines.
This chapter talks in detail about the same.
