Introduction to Machine Learning
1.3 MACHINE LEARNING IN RELATION TO OTHER FIELDS
Machine learning primarily uses concepts from Artificial Intelligence, Data Science, and Statistics.
It is the result of combining ideas from these diverse fields.
1.3.1 Machine Learning and Artificial Intelligence
Machine learning is an important branch of AI, which is a much broader subject. The aim of AI is
to develop intelligent agents. An agent can be a robot, a human, or any autonomous system.
Initially, the idea of AI was ambitious, that is, to develop intelligent systems like human beings.
The focus was on logic and logical inferences. The field has seen many ups and downs. The down
periods were called AI winters.
The resurgence in AI happened due to the development of data-driven systems. The aim is to find
relations and regularities present in the data. Machine learning is the subbranch of AI whose aim
is to extract patterns for prediction. It is a broad field that includes learning from examples and
other areas like reinforcement learning. The relationship of AI and machine learning is shown in
Figure 1.3. A machine learning model can take an unknown instance and generate results.
Figure 1.3: Relationship of AI with Machine Learning (labels: Artificial intelligence, Machine learning, Deep learning)
Deep learning is a subbranch of machine learning. In deep learning, the models are constructed
using neural network technology. Neural networks are based on models of the human neuron. Many
neurons form a network, connected through activation functions that trigger further neurons to
perform tasks.
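The idea of a neuron and its activation function can be sketched in a few lines. This is only an illustration: the sigmoid activation, the weights, and the bias values below are assumptions chosen for the example and do not come from the text.

```python
import math


def sigmoid(z):
    """Activation function: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical chain of two neurons: the output of the first feeds the second.
hidden = neuron([0.5, -1.0], weights=[0.8, 0.2], bias=0.1)
output = neuron([hidden], weights=[1.5], bias=-0.5)
print(hidden, output)
```

A real deep network stacks many such neurons in layers and learns the weights from data instead of fixing them by hand.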
1.3.2 Machine Learning, Data Science, Data Mining, and Data Analytics
Data science is an 'umbrella' term that encompasses many fields. Machine learning starts with
data. Therefore, data science and machine learning are interlinked. Machine learning is a branch
of data science. Data science deals with gathering of data for analysis. It is a broad field that
includes other fields.
Big Data Data science is concerned with the collection of data. Big data is a field of data science that
deals with the following characteristics of data:
1. Volume: A huge amount of data is generated by big companies like Facebook, Twitter, and
YouTube.
2. Variety: Data is available in a variety of forms like images and videos, and in different formats.
3. Velocity: It refers to the speed at which the data is generated and processed.
Big data is used by many machine learning algorithms for applications such as language
translation and image recognition. Big data influences the growth of subjects like deep learning. Deep
learning is a branch of machine learning that deals with constructing models using neural networks.
Data Mining Data mining has its origin in business. Just as mining the earth yields
precious resources, it is often believed that unearthing the data produces hidden
information that would otherwise have eluded the attention of the management. Nowadays, many
consider data mining and machine learning to be the same. The fields differ mainly in their aim:
data mining aims to extract the hidden patterns that are present in the data,
whereas machine learning aims to use them for prediction.
Data Analytics Another branch of data science is data analytics. It aims to extract useful
knowledge from crude data. There are different types of analytics. Predictive data analytics is used
for making predictions. Machine learning is closely related to this branch of analytics and shares
almost all algorithms.
Pattern Recognition It is an engineering field. It uses machine learning algorithms to extract
the features for pattern analysis and pattern classification. One can view pattern recognition as a
specific application of machine learning.
These relations are summarized in Figure 1.4.
Figure 1.4: Relationship of Machine Learning with Other Major Fields (labels: Data science, Data mining, Data analytics, Machine learning, Pattern recognition, Big data)
1.3.3 Machine Learning and Statistics
Statistics is a branch of mathematics that has a solid theoretical foundation regarding statistical learning.
Like machine learning (ML), it can learn from data. Both look for regularities in data, called patterns, but
the difference is that statistics starts from a hypothesis. Initially, statistics sets a hypothesis and
performs experiments to verify and validate the hypothesis in order to find relationships among data.
Statistics requires knowledge of the statistical procedures and the guidance of a good
statistician. It is mathematics intensive, and its models are often complicated equations that involve
many assumptions. Statistical methods are developed in relation to the data being analysed.
In addition, statistical methods are coherent and rigorous. They have strong theoretical foundations
and interpretations that require strong statistical knowledge.
Machine learning, comparatively, makes fewer assumptions and requires less statistical knowledge.
But it often requires interaction with various tools to automate the process of learning.
Nevertheless, there is a school of thought that machine learning is just the latest version of 'old
statistics', and hence this relationship should be recognized.
1.4 TYPES OF MACHINE LEARNING
What does the word 'learn' mean? Learning, like adaptation, occurs as the result of interaction of
the program with its environment. It can be compared with the interaction between a teacher and
a student. There are four types of machine learning as shown in Figure 1.5.
Figure 1.5: Types of Machine Learning (Supervised learning: classification, regression; Unsupervised learning: cluster analysis, association mining, dimension reduction; Semi-supervised learning; Reinforcement learning)
Before discussing the types of learning, it is necessary to discuss data.
Labelled and Unlabelled Data Data is a raw fact. Normally, data is represented in the form
of a table. Each row of the table represents a data point, which is also referred to as a sample
or an example. Features are attributes or characteristics of an object. Normally, the
columns of the table are attributes. Out of all attributes, one attribute is important and is called a
label. The label is the feature that we aim to predict. Thus, there are two types of data: labelled and
unlabelled.
Labelled Data To illustrate labelled data, let us take one example dataset called the Iris flower dataset
or Fisher's Iris dataset. The dataset has 150 samples, 50 for each class of Iris, with four attributes: the length and width
of sepals and petals. The target variable is called class. There are three classes: Iris setosa, Iris
virginica, and Iris versicolor.
The partial data of the Iris dataset is shown in Table 1.1.
Table 1.1: Iris Flower Dataset
S.No. | Length of sepal | Width of sepal | Length of petal | Width of petal | Class
1. | 5.5 | 4.2 | 1.4 | 0.2 | Setosa
2. | 7.0 | 3.2 | 4.7 | 1.4 | Versicolor
3. | 7.3 | 2.9 | 6.3 | 1.8 | Virginica
A dataset need not always consist of numbers. It can be images or video frames. Deep neural networks
can handle images with labels. In Figure 1.6, the deep neural network takes images of
dogs and cats with labels for classification.
Figure 1.6: (a) Labelled Dataset (b) Unlabelled Dataset (panel (a) columns: Input, Label; labels shown: dog, cat)
In unlabelled data, there are no labels in the dataset.
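The distinction between labelled and unlabelled data can be sketched as follows. The rows reuse the three samples of Table 1.1; reading the digits as decimal values and the key names such as `sepal_length` are assumptions made for this illustration.

```python
# Each row is one data point: four features plus the label (the class).
labelled = [
    {"sepal_length": 5.5, "sepal_width": 4.2, "petal_length": 1.4,
     "petal_width": 0.2, "label": "Setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7,
     "petal_width": 1.4, "label": "Versicolor"},
    {"sepal_length": 7.3, "sepal_width": 2.9, "petal_length": 6.3,
     "petal_width": 1.8, "label": "Virginica"},
]


def strip_labels(rows, label_key="label"):
    """Return the same data points with the label column removed."""
    return [{k: v for k, v in row.items() if k != label_key} for row in rows]


# The unlabelled version keeps the features but has nothing to predict against.
unlabelled = strip_labels(labelled)
print(unlabelled[0])
```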
1.4.1 Supervised Learning
Supervised algorithms use a labelled dataset. As the name suggests, there is a supervisor or teacher
component in supervised learning. A supervisor provides labelled data so that the model is
constructed, and gives test data for checking the model.
In supervised learning algorithms, learning takes place in two stages. In layman's terms, during the
first stage, the teacher communicates the information to the student that the student is supposed to
master. The student receives the information and understands it. During this stage, the teacher has no
knowledge of whether the information is grasped by the student.
This leads to the second stage of learning. The teacher then asks the student a set of questions
to find out how much information has been grasped by the student. Based on these questions,
the student is tested, and the teacher informs the student about the assessment. This kind of learning
is typically called supervised learning.
Supervised learning has two methods:
1. Classification
2. Regression
Classification
Classification is a supervised learning method. The input attributes of the classification algorithms
are called independent variables. The target attribute is called the label or dependent variable.
The relationship between the input and target variables is represented in the form of a structure
which is called a classification model. So, the focus of classification is to predict the 'label', which is
in a discrete form (a value from a finite set of values). An example is shown in Figure 1.7, where
a classification algorithm takes a set of labelled images, such as dogs and cats, to construct a
model that can later be used to classify an unknown test image.
Figure 1.7: An Example Classification System (labelled data and new test data feed a classification algorithm, which produces a classification model; output shown: Label is Cat)
In classification, learning takes place in two stages. During the first stage, called the training stage,
the learning algorithm takes a labelled dataset and starts learning. After the training set samples
are processed, the model is generated. In the second stage, the constructed model is tested with
a test or unknown sample and assigns it a label. This is the classification process.
This is illustrated in Figure 1.7. Initially, the classification learning algorithm learns
with the collection of labelled data and constructs the model. Then, a test case is selected, and the
model assigns a label.
Similarly, in the case of the Iris dataset, if the test sample is given as (6.3, 2.9, 5.6, 1.8, ?), where '?' is
the unknown class, the classification model will generate the label for it. One of the examples of classification is
image recognition, which includes classification of diseases like cancer,
classification of plants, etc.
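The two stages, training on labelled data and then labelling an unknown sample, can be sketched with a very small distance-based classifier. This is a nearest-neighbour rule chosen purely for illustration; it is not presented as the method behind Figure 1.7. The training rows are the three samples of Table 1.1, with the digits read as decimal values (an assumption).

```python
import math

# Stage 1 (training): for a nearest-neighbour rule, the "model" is simply
# the stored labelled samples (features, label).
TRAINING = [
    ((5.5, 4.2, 1.4, 0.2), "Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Versicolor"),
    ((7.3, 2.9, 6.3, 1.8), "Virginica"),
]


def classify(sample, training=TRAINING):
    """Stage 2 (testing): return the label of the closest training sample."""
    _, label = min(training, key=lambda row: math.dist(row[0], sample))
    return label


# The test sample from the text, with its unknown label left out.
print(classify((6.3, 2.9, 5.6, 1.8)))
```

With only three training rows this is a toy; a practical classifier would be trained on many more samples.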
The classification models can be categorized based on the implementation technology, like
decision trees, probabilistic methods, distance measures, and soft computing methods. Classification
models can also be classified as generative models and discriminative models. Generative models
deal with the process of data generation and its distribution. Probabilistic models are examples of
generative models. Discriminative models do not care about the generation of data. Instead, they
simply concentrate on classifying the given data.
Some of the key algorithms of classification are:
* Decision Tree
* Random Forest
* Support Vector Machines
* Naive Bayes
* Artificial Neural Network and Deep Learning networks like CNN
Regression Models
Regression models, unlike classification algorithms, predict continuous variables like price.
In other words, the prediction is a number. A fitted regression model is shown in Figure 1.8 for a dataset that
represents week as input x and product sales as y.
Figure 1.8: A Regression Model of the Form y = ax + b (x-axis: Week data (x); y-axis: Product sales data (y); regression line: y = 0.66x + 0.54)
The regression model takes input x and generates a model in the form of a fitted line of the
form y = f(x). Here, x is the independent variable that may be one or more attributes and y is the
dependent variable. In Figure 1.8, linear regression takes the training set and tries to fit it with a
line: product sales = 0.66 × week + 0.54. Here, 0.66 and 0.54 are the regression coefficients that are
learnt from data. The advantage of this model is that a prediction for product sales (y) can be made
for unknown week data (x). For example, the prediction for the unknown eighth week can be made by
substituting x as 8 in the regression formula to get y.
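The substitution described above, and the way such coefficients can be learnt, can be sketched as follows. The function names are hypothetical, and the training points are synthetic values placed exactly on the line of Figure 1.8, since the original data points are not given; ordinary least squares is used as one standard way to fit a line.

```python
def predict_sales(week, slope=0.66, intercept=0.54):
    """Evaluate the fitted line: product sales = slope * week + intercept."""
    return slope * week + intercept


def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Prediction for the unknown eighth week: 0.66 * 8 + 0.54, about 5.82.
print(predict_sales(8))

# Synthetic check: points generated from the line give back its coefficients.
weeks = [1, 2, 3, 4]
sales = [predict_sales(w) for w in weeks]
print(fit_line(weeks, sales))
```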
One of the most important regression algorithms is linear regression that is explained in the
next section.
Both regression and classification models are supervised algorithms. Both have a supervisor and
the concepts of training and testing are applicable to both. What is the difference between classification
and regression models? The main difference is that regression models predict continuous variables
such as product price, while classification concentrates on assigning labels such as class.
1.4.2 Unsupervised Learning
The second kind of learning is by self-instruction. As the name suggests, there is no supervisor or
teacher component. In the absence of a supervisor or teacher, self-instruction is the most common
kind of learning process. This process of self-instruction is based on the concept of trial and error.
Here, the program is supplied with objects, but no labels are defined. The algorithm itself
observes the examples and recognizes patterns based on the principles of grouping. Grouping is
done in such a way that similar objects form the same group.
Cluster analysis and dimensionality reduction algorithms are examples of unsupervised
algorithms.
Cluster Analysis
Cluster analysis is an example of unsupervised learning. It aims to group objects into disjoint
clusters or groups. Cluster analysis clusters objects based on their attributes. All the data objects
of a partition are similar in some aspect and vary significantly from the data objects in the other
partitions.
Some examples of clustering processes are segmentation of a region of interest in an
image, detection of abnormal growth in a medical image, and determining clusters of signatures
in a gene database.
An example of a clustering scheme is shown in Figure 1.9, where the clustering algorithm takes
a set of dog and cat images and groups them into two clusters: dogs and cats. It can be observed that
the samples belonging to a cluster are similar, while samples differ radically across clusters.
Figure 1.9: An Example Clustering Scheme (unlabelled data is grouped into Cluster 1 and Cluster 2)
Some of the key clustering algorithms are:
* k-means algorithm
* Hierarchical algorithms
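To illustrate how grouping by similarity works, here is a minimal sketch of the k-means idea. The six two-dimensional points are hypothetical, and starting from the first k points as centroids is a simplifying assumption made so that the run is deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iterations=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # assumption: first k points as the start
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical unlabelled data: two visibly separate groups.
data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

No labels are supplied at any point; the cluster numbers only say which points were grouped together.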
Dimensionality Reduction
Dimensionality reduction algorithms are examples of unsupervised algorithms. They take
higher-dimensional data as input and output the data in a lower dimension by taking advantage of the variance
of the data. The task is to reduce the dataset to a few features without losing generality.
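One common variance-based technique is principal component analysis (PCA). The text does not name a specific algorithm, so PCA is an assumption here. The sketch below handles only the two-dimensional case, where the direction of largest variance has a closed form, and the sample points are hypothetical.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their direction of largest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2 x 2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # Angle of the principal axis of a symmetric 2 x 2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # One number per point replaces the original two features.
    projected = [x * direction[0] + y * direction[1] for x, y in centred]
    return direction, projected


# Hypothetical data lying close to the line y = x.
direction, projected = pca_2d_to_1d([(2, 1.9), (4, 4.1), (6, 6.0), (8, 7.9)])
print(direction)
print(projected)
```

Because the points vary almost entirely along one direction, the single projected feature keeps nearly all of the information in the original two.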
The differences between supervised and unsupervised learning are listed in the following
Table 1.2.
Table 1.2: Differences between Supervised and Unsupervised Learning
S.No. | Supervised Learning | Unsupervised Learning
1. | There is a supervisor component | No supervisor component
2. | Uses labelled data | Uses unlabelled data
3. | Assigns categories or labels | Performs grouping process such that similar objects will be in one cluster
1.4.3 Semi-supervised Learning
There are circumstances where the dataset has a huge collection of unlabelled data and some
labelled data. Labelling is a costly process and difficult for humans to perform. Semi-supervised
algorithms use unlabelled data by assigning a pseudo-label. Then, the labelled and pseudo-labelled
datasets can be combined.
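Pseudo-labelling can be sketched as follows. The rule used here, copying the label of the nearest labelled value, is an assumption made for illustration, as are the one-feature data values; practical systems usually assign pseudo-labels with a model trained on the labelled part.

```python
def nearest_label(value, labelled):
    """Return the label of the labelled sample closest to `value`."""
    _, label = min(labelled, key=lambda pair: abs(pair[0] - value))
    return label


def pseudo_label(labelled, unlabelled):
    """Assign a pseudo-label to each unlabelled value and combine both sets."""
    pseudo = [(value, nearest_label(value, labelled)) for value in unlabelled]
    return labelled + pseudo


# Hypothetical one-feature data: a few labelled values, more unlabelled ones.
labelled = [(1.4, "Setosa"), (4.7, "Versicolor")]
unlabelled = [1.5, 1.3, 4.5, 5.0]
combined = pseudo_label(labelled, unlabelled)
print(combined)
```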
1.4.4 Reinforcement Learning
Reinforcement learning mimics human beings. Just as human beings use ears and eyes to perceive the
world and take actions, reinforcement learning allows the agent to interact with the environment
to get rewards. The agent can be a human, animal, robot, or any independent program. The rewards
enable the agent to gain experience. The agent aims to maximize the reward.
The reward can be positive or negative (punishment). When the rewards are higher, the behavior
gets reinforced and learning becomes possible.
Consider the following example of a Grid game as shown in Figure 1.10.
Figure 1.10: A Grid Game (legend: Block, Goal, Danger)
In this grid game, the gray tile indicates the danger, black is a block, and the tile with diagonal
lines is the goal. The aim is to start, say from the bottom-left grid cell, and use the actions left, right, top and
bottom to reach the goal state.
To solve this sort of problem, there is no data. The agent interacts with the environment to
get experience. In the above case, the agent tries to create a model by simulating many paths and
finding rewarding paths. This experience helps in constructing a model.
In summary, compared to supervised learning, there is no supervisor or
labelled dataset. Many sequential decisions need to be taken to reach the final decision. Therefore,
reinforcement algorithms are reward-based, goal-oriented algorithms.
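The reward-driven learning described for the grid game can be sketched with Q-learning, one widely used reinforcement learning algorithm that the text does not name. Everything specific below is an assumption for illustration: the 3 x 4 layout, the positions of the block, danger and goal tiles, the reward values, and the learning parameters.

```python
import random

ROWS, COLS = 3, 4
START, GOAL, DANGER, BLOCK = (2, 0), (0, 3), (1, 3), (1, 1)  # assumed layout
ACTIONS = {"top": (-1, 0), "bottom": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply an action; return (next_state, reward, episode_finished)."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    off_grid = not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS)
    if off_grid or nxt == BLOCK:
        nxt = state  # bumping into a wall or the block leaves the agent in place
    if nxt == GOAL:
        return nxt, 1.0, True    # positive reward
    if nxt == DANGER:
        return nxt, -1.0, True   # negative reward (punishment)
    return nxt, 0.0, False


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Learn action values by simulating many paths through the grid."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(ROWS) for c in range(COLS) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap on the length of one simulated path
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from the start state."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


print(greedy_path(q_learning()))
```

No labelled dataset is used: the table of action values is built only from the rewards met while simulating paths.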
1.5 CHALLENGES OF MACHINE LEARNING
What are the challenges of machine learning? Let us discuss them now.
Problems that can be Dealt with Machine Learning
Computers are better than humans at performing tasks like computation. For example, while calculating
the square root of large numbers, an average human may blink, but computers can display the result in
seconds. Computers can play games like chess and Go, and even beat professional players of those games.
However, humans are better than computers in many aspects like recognition. But deep
learning systems challenge human beings in this aspect as well. Machines can recognize human
faces in a second. Still, there are tasks where humans are better, as machine learning systems still
require quality data for model construction. The quality of a learning system depends on the
quality of data. This is a challenge. Some of the challenges are listed below:
1. Problems - Machine learning can deal with 'well-posed' problems where specifications
are complete and available. Computers cannot solve 'ill-posed' problems.
Consider one simple example (shown in Table 1.3):
Table 1.3: An Example
Can a model for this test data be multiplication? That is, y = x1 × x2. Well! It is true! But it is
equally true that y may be given by other functions of x1 and x2. So, there are three functions that fit the data.
This means that the problem is ill-posed. To solve this problem, one needs more examples to
check the model. Puzzles and games that do not have sufficient specification may become
ill-posed problems, and scientific computation has many ill-posed problems.
2. Huge data - This is a primary requirement of machine learning. Availability of quality
data is a challenge. Quality data means that it should be large and should not have data
problems such as missing data or incorrect data.
3. High computation power - With the availability of Big Data, the computational resource
requirement has also increased. Systems with a Graphics Processing Unit (GPU) or even a Tensor
Processing Unit (TPU) are required to execute machine learning algorithms. Also, machine
learning tasks have become complex and hence time complexity has increased, and that
can be solved only with high computing power.
4. Complexity of the algorithms - The selection of algorithms, describing the algorithms,
application of algorithms to solve machine learning tasks, and comparison of algorithms
have become necessary for machine learning professionals and data scientists now. Algorithms have
become a big topic of discussion, and it is a challenge for machine learning professionals to
design, select, and evaluate optimal algorithms.
5. Bias/Variance - Bias is the error that arises from overly simple assumptions in the model, and
variance is the error that arises from the model's sensitivity to the particular training data.
Reducing one tends to increase the other, which leads to a problem called the bias/variance
tradeoff. A model that fits the training data well but fails on test data lacks generalization;
this is called overfitting. The reverse problem is called underfitting,
where the model is too simple to fit even the training data well. Overfitting and
underfitting are great challenges for machine learning algorithms.
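The ill-posed problem in item 1 can be made concrete with a short sketch. The example rows are hypothetical, chosen so that the second input is always 1, and the three candidate functions (multiplication, division, and power) are assumptions made for this illustration.

```python
# Candidate models that map two inputs (x1, x2) to an output y.
CANDIDATES = {
    "multiplication": lambda x1, x2: x1 * x2,
    "division": lambda x1, x2: x1 / x2,
    "power": lambda x1, x2: x1 ** x2,
}


def consistent_models(examples):
    """Return the names of all candidates that reproduce every example."""
    return [
        name
        for name, f in CANDIDATES.items()
        if all(abs(f(x1, x2) - y) < 1e-9 for x1, x2, y in examples)
    ]


# Hypothetical examples (x1, x2, y): several models fit, so the problem is ill-posed.
examples = [(1, 1, 1), (2, 1, 2), (3, 1, 3)]
print(consistent_models(examples))

# One more example removes the ambiguity.
examples.append((2, 3, 6))
print(consistent_models(examples))
```

With the extra example only multiplication remains consistent, which is why more examples are needed to check a model.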
1.6 MACHINE LEARNING PROCESS
The emerging process model for data mining solutions in business organizations is CRISP-DM.
Since machine learning is like data mining, except for the aim, this process can be used for machine
learning. CRISP-DM stands for Cross Industry Standard Process - Data Mining. This process
involves six steps. The steps are shown in Figure 1.11.
Figure 1.11: A Machine Learning/Data Mining Process (steps: Understand the business, Understand the data, Data preprocessing, Modelling, Model evaluation, Model deployment)
1. Understanding the business - This step involves understanding the objectives and
requirements of the business organization. Generally, a single data mining algorithm is
enough for giving the solution. This step also involves the formulation of the problem
statement for the data mining process.
2. Understanding the data - It involves steps like data collection, study of the
characteristics of the data, formulation of hypotheses, and matching of patterns to the selected
hypothesis.
3. Preparation of data - This step involves producing the final dataset by cleaning the raw
data and preparing the data for the data mining process. Missing values may cause
problems during both training and testing phases. Missing data forces classifiers to produce
inaccurate results. This is a perennial problem for classification models. Hence, suitable
strategies should be adopted to handle the missing data.
4. Modelling - This step involves applying a data mining algorithm to the data
to obtain a model or pattern.
5. Evaluate - This step involves the evaluation of the data mining results using statistical
analysis and visualization methods. The performance of the classifier is determined by
evaluating the accuracy of the classifier. The process of classification is a fuzzy issue.
For example, classification of emails requires extensive domain knowledge and requires
domain experts. Hence, performance of the classifier is very crucial.
6. Deployment - This step involves the deployment of the results of the data mining algorithm
to improve the existing process or for a new situation.
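Two of the steps above can be sketched directly: handling missing data in step 3 and measuring accuracy in step 5. Mean imputation is only one possible strategy for missing values and is an assumption here, as are the function names and the sample values.

```python
def impute_mean(column):
    """Step 3 sketch: replace missing values (None) with the column mean."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]


def accuracy(predicted, actual):
    """Step 5 sketch: fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical feature column with one missing value.
print(impute_mean([1.0, None, 3.0]))

# Hypothetical classifier output compared with the true labels.
print(accuracy(["spam", "ham", "spam", "spam"], ["spam", "ham", "ham", "spam"]))
```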
1.7 MACHINE LEARNING APPLICATIONS
Machine learning technologies are used widely now in different domains. Machine learning
applications are everywhere! One encounters many machine learning applications in day-to-day
life. Some applications are listed below:
1. Sentiment analysis - This is an application of natural language processing (NLP) where
the words of documents are converted to sentiments like happy, sad, and angry, which are
captured effectively by emoticons. For movie reviews or product reviews, five stars or one
star are automatically attached using sentiment analysis programs.
2. Recommendation systems - These are systems that make personalized purchases possible.
For example, Amazon recommends related books or books bought by people
who have the same taste as you, and Netflix suggests shows or related movies of your
taste. The recommendation systems are based on machine learning.
3. Voice assistants - Products like Amazon Alexa, Microsoft Cortana, Apple Siri, and Google
Assistant are all examples of voice assistants. They take speech commands and perform
tasks. These chatbots are the result of machine learning technologies.
4. Technologies like Google Maps and those used by Uber are all examples of machine
learning which offer to locate and navigate shortest paths to reduce time.
The machine learning applications are enormous. The following Table 1.4 summarizes some of
the machine learning applications.
Table 1.4: Applications' Survey Table
S.No. | Domain | Applications
1. | Business | Predicting the bankruptcy of a business firm
2. | Banking | Prediction of bank loan defaulters and detecting credit card frauds
3. | Image Processing | Image search engines, object identification, image classification, and generating synthetic images
4. | Audio/Voice | Chatbots like Alexa and Microsoft Cortana; developing chatbots for customer support, speech to text, and text to voice
5. | Telecommunication | Trend analysis and identification of bogus calls, fraudulent calls and their callers, churn analysis
6. | Marketing | Retail sales analysis, market basket analysis, product performance analysis, market segmentation analysis, and study of travel patterns of customers for marketing tours
7. | Games | Game programs for Chess, Go, and Atari video games
8. | Natural Language Translation | Google Translate, text summarization, and sentiment analysis
9. | Web Analysis and Services | Identification of access patterns, detection of e-mail spam and viruses, personalized web services, search engines like Google, detection of promotion of user websites, and finding loyalty of users after web page layout modification
10. | Medicine | Prediction of diseases such as cancer or diabetes from given symptoms; prediction of effectiveness of the treatment using patient history; chatbots to interact with patients, such as IBM Watson, which uses machine learning technologies
11. | Multimedia and Security | Face recognition/identification, biometric projects like identification of a person from a large image or video database, and applications involving multimedia retrieval
12. | Scientific Domain | Discovery of new galaxies, identification of groups of houses based on house type/geographical location, identification of earthquake epicenters, and identification of similar land use
Summary
1. Machine learning can enable top management of an organization to extract the knowledge from the
data stored in various archives to facilitate decision making.
2. Machine learning is an important subbranch of Artificial Intelligence (AI).
3. A model is an explicit description of patterns within the data.
4. A model can be a formula, procedure or representation that can generate data decisions.
5. Humans predict by remembering the past, then formulate the strategy and make a prediction. In the
same manner, computers can predict by following the process.
6. Machine learning is an important branch of AI. AI is a much broader subject. The aim of AI is to
develop intelligent agents. An agent can be a robot, a human, or another autonomous system.
7. Deep learning is a branch of machine learning. The difference between machine learning and deep
learning is that models are constructed using neural network technology in deep learning. Neural
networks are models constructed based on the human neuron models.
8. Data science deals with gathering of data for analysis. It is a broad field that includes other fields.
9. Data analytics aims to extract useful knowledge from crude data. There are many types of analytics.
Predictive data analytics is an area that is dedicated to making predictions. Machine learning is
closely related to this branch of analytics and shares almost all algorithms.
10. There are two types of data: labelled data and unlabelled data. Data with a
label is called labelled data, and data without a label is called unlabelled data.
11. Supervised algorithms use a labelled dataset. As the name suggests, there is a supervisor or teacher
component in supervised learning. A supervisor provides the labelled data so that the model is
constructed and gives test data for checking the model.
12. Classification is a supervised learning method. The input attributes of the classification algorithms
are called independent variables. The target attribute is called the label or dependent variable. The
relationship between the input and target variables is represented in the form of a structure which is
called a classification model.
13. Cluster analysis is an example of unsupervised learning. It aims to assemble objects into disjoint
clusters or groups.
14. Semi-supervised algorithms assign a pseudo-label to unlabelled data.
15. Reinforcement learning allows the agent to interact with the environment to get rewards. The agent
can be a human, animal, robot, or any independent program. The rewards enable the agent to gain
experience.
16. The emerging process model for data mining solutions in business organizations is CRISP-DM.
This model stands for Cross Industry Standard Process - Data Mining.
17. Machine learning technologies are used widely now in different domains.
Machine Learning - A branch of AI concerned with enabling machines to learn automatically without
being explicitly programmed.
Data - A raw fact.
Model - An explicit description of patterns in data.
Experience - A collection of knowledge and heuristics in humans, and historical training data in the case
of machines.
Predictive Modelling - A technique of developing models and making predictions on unseen data.
Deep Learning - A branch of machine learning that deals with constructing models using neural
networks.
Data Science - A field of study that encompasses everything from the capturing of data to its analysis,
covering all stages of data management.
Data Analytics - A field of study that deals with analysis of data.
Big Data - A study of data that has the characteristics of volume, variety, and velocity.
Pattern Recognition - A field of study that analyses patterns using machine learning algorithms.
Statistics - A branch of mathematics that deals with learning from data using statistical methods.
Hypothesis - An initial assumption of an experiment.
Learning - Adapting to the environment, which happens because of interaction of an agent with the
environment.
Label - A target attribute.
Labelled Data - Data that is associated with a label.
Unlabelled Data - Data without labels.
Supervised Learning - A type of machine learning that uses labelled data and learns with the help
of a supervisor or teacher component.
Classification Program - A supervised learning method that takes an unknown input and assigns
a label to it. In simple words, it finds the category or class of the input.
Regression Analysis - A supervised method that predicts continuous variables based on the
input variables.
Unsupervised Learning - A type of machine learning that uses unlabelled data and groups the
data points into clusters using a trial and error approach.
Cluster Analysis - A type of unsupervised approach that groups objects based on attributes
so that similar objects or data points form a cluster.
Semi-supervised Learning - A type of machine learning that uses limited labelled and large
unlabelled data. It first labels unlabelled data using labelled data and combines them for learning
purposes.
Reinforcement Learning - A type of machine learning that uses the interaction of agents with the
environment, and the rewards it produces, as the experience for learning.
Well-posed Problem - A problem that has well-defined specifications. Otherwise, the problem is
called ill-posed.
Bias/Variance - Bias is the error caused by overly simple assumptions in the model, which makes it
miss the regularities in the data. Variance is the error caused by the model's sensitivity to the
particular training data. These lead to the problems called underfitting and overfitting.
Model Deployment - A method of deploying machine learning algorithms to improve the existing
business processes or for a new situation.
Short Questions
1. Why is machine learning needed for business organizations?
2. List out the factors that drive the popularity of machine learning.
3. What is a model?
4. Distinguish between the terms: Data, Information, Knowledge, and Intelligence.
5. How is machine learning linked to AI, Data Science, and Statistics?
6. List out the types of machine learning.
7. List out the differences between a model and a pattern. Patterns are local and a model is global for the entire
dataset. Justify.
8. Are classification and clustering the same or different? Justify.
9. List out the differences between labelled and unlabelled data.
10. Point out the differences between supervised and unsupervised learning.
11. What are the differences between classification and regression?
12. What is semi-supervised learning?
13. List out the differences between reinforcement learning and supervised learning.
14. List out important classification and clustering algorithms.
15. List out at least five major applications of machine learning.
Long Questions
Explain in detail the machine learning process model.
List out and briefly explain the classification algorithms.
List out and briefly explain the unsupervised algorithms.
Numerical Problems and Activities
Let us assume a regression algorithm generates a model y = 0.54 + 0.66 x for data pertaining to weekly
sales data of a product. Here, x is the week and y is the product sales. Find the prediction for the
5th and 8th week.
Give two examples of patterns and models.
Survey and find out at least five latest applications of machine learning.
Survey and list out at least five products that use machine learning.
Across
The initial assumption of the experiment is called a ______.
A study that deals with the analysis of data is called ______.
A domain of study that covers all the aspects of data management is called ______ science.
Data is a ______ fact.
CRISP-DM is a ______ model.
Pattern recognition is used for identifying patterns in ______ and videos.
Unsupervised learning uses ______ data.
Reinforcement learning uses feedback from environment for learning. (True/False)
Amazon Alexa is a ______ assistant.
Classification is an example of ______ learning.
______ data has the characteristics such as volume, variety and velocity.
Down
Learning is ______ to the environment.
A problem that has well-posed specification can be solved using machine learning algorithms. (Yes/No)
Cluster analysis is an example of ______ learning.
Learning from data is the aim of statistics. (Yes/No)
Predictive models can predict based on ______ data.
Regression can predict ______ variables.
Bias and variance cause overfitting and ______ of model.
Lack of generalization in machine learning happens because of bias. (Yes/No)
Supervised learning uses ______ data.
Machine learning is learning ______ without being explicitly programmed.
Model is a description of ______.
A semi-supervised algorithm assigns a pseudo-label for unlabelled data. (True/False)
Machine learning using neural networks from a domain called artificial neural network and ______ learning.
is known as the five-point summary.
Box plots are suitable for continuous variables and a nominal variable. Box plots can be used
to illustrate data distributions and summaries of data. They are a popular way of plotting five-number
summaries. A box plot is also known as a box and whisker plot.
The box contains the bulk of the data, which lies between the first and third quartiles. The line
inside the box indicates location, usually the median of the data. If the median is not equidistant
from the two ends of the box, then the data is skewed. The whiskers that project from the ends of the
box indicate the spread of the tails and the maximum and minimum of the data values.
Example: Find the 5-point summary of the list {13, 11, 2, 3, 4, 8, 9}.
Solution: In sorted order, the list is {2, 3, 4, 8, 9, 11, 13}. The minimum is 2 and the maximum
is 13. The $Q_1$, $Q_2$ and $Q_3$ are 3, 8 and 11, respectively. Hence, the 5-point summary is
{2, 3, 8, 11, 13}, that is, {minimum, $Q_1$, median, $Q_3$, maximum}.
Box plots are useful for describing the 5-point summary. The box plot for the set is given in
Figure 2.7.
[Box plot titled "English marks box plot"; x-axis: English, y-axis: Marks scored]
Figure 2.7: Box Plot for English Marks
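The 5-point summary in the example above can be checked with a short Python sketch. The function name `five_point_summary` is an illustrative assumption, not something defined in the text. Quartile conventions differ between tools; the default "exclusive" method of `statistics.quantiles` happens to reproduce the quartiles quoted in the solution, while other conventions may give slightly different values.

```python
import statistics


def five_point_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (data[0], q1, q2, q3, data[-1])


summary = five_point_summary([13, 11, 2, 3, 4, 8, 9])
print(summary)
```

For the list in the example, this gives 2, 3, 8, 11 and 13, which agrees with the solution.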
2.5.4 Shape
Skewness and Kurtosis (called moments) indicate the symmetry/asymmetry and peak location of
the dataset.
Skewness
The measures of direction and degree of symmetry are called measures of third order. Ideally,
skewness should be zero, as in an ideal normal distribution. More often, the given dataset may not
have perfect symmetry (consider the following Figure 2.8).
Figure 2.8: (a) Positive Skewed and (b) Negative Skewed Data
The dataset may also have either very high values or extremely low values. If the dataset has
far higher values, then it is said to be skewed to the right. On the other hand, if the dataset has far
more low values, then it is said to be skewed towards the left. If the tail is longer on the right-hand
side and the hump is on the left-hand side, it is called positive skew. Otherwise, it is called negative skew.
The given dataset may not have an equal distribution of data. The implication is that if the
data is skewed, then there is a greater chance of outliers in the dataset. This affects the mean and
median. Hence, this may affect the performance of the data mining algorithm. A perfect symmetry
means the skewness is zero. In the case of negative skew, the median is greater than the mean. In positive
skew, the mean is greater than the median.
Generally, for a negatively skewed distribution, the median is more than the mean.
The relationship between skew and the relative size of the mean and median can be summarized
by a convenient numerical skew index known as the Pearson 2 skewness coefficient.
$$\frac{3 \times (\mu - \text{median})}{\sigma} \qquad (2.12)$$
Also, the following measure is more commonly used to measure skewness. Let $x_1, x_2, \ldots, x_N$
be a set of 'N' values or observations. Then the skewness can be given as:
$$\frac{1}{N} \times \sum_{i=1}^{N} \frac{(x_i - \mu)^3}{\sigma^3} \qquad (2.13)$$
Here, $\mu$ is the population mean and $\sigma$ is the population standard deviation of the univariate
data. Sometimes, for bias correction, N - 1 is used instead of N.
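The two skewness measures above can be sketched in Python as follows. The function names and the sample `values` are hypothetical and chosen only for illustration. The sketch assumes the population standard deviation (`statistics.pstdev`) and divides by N, with no bias correction.

```python
import statistics


def pearson2_skewness(values):
    """Pearson 2 skewness coefficient: 3 * (mean - median) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return 3 * (mu - statistics.median(values)) / sigma


def moment_skewness(values):
    """Moment-based skewness: average of the cubed standardized deviations."""
    n = len(values)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / n


# Hypothetical sample with one large value, so the tail is on the right.
values = [2, 3, 3, 4, 4, 5, 14]
print(pearson2_skewness(values), moment_skewness(values))
```

Both measures come out positive for this sample, in line with the statement that in positive skew the mean is greater than the median. The two indices need not agree on every dataset, since they are different summaries of asymmetry.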
Kurtosis
Kurtosis also indicates the peaks of data. If the data has a high peak, then it indicates higher
kurtosis and vice versa.
Kurtosis is the measure of whether the data is heavy tailed or light tailed relative to the normal
distribution. It can be observed that the normal distribution has a bell-shaped curve with no long tails.
Low kurtosis tends to have light tails. The implication is that there is no outlier data. Let
$x_1, x_2, \ldots, x_N$ be a set of 'N' values or observations. Then, kurtosis is measured using the
formula given below:
$$\frac{\sum_{i=1}^{N} (x_i - \bar{x})^4 / N}{\sigma^4} \qquad (2.14)$$
N - 1 can be used instead of N in the numerator of Eq. (2.14) for
bias correction. Here, $\bar{x}$ and $\sigma$ are the mean and standard deviation of the univariate data,
respectively.
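As a rough illustration of Eq. (2.14), the following Python sketch computes kurtosis with N in the numerator (no bias correction) and the population standard deviation. The function name `kurtosis` and the sample `values` are assumptions made for this example.

```python
import statistics


def kurtosis(values):
    """Fourth central moment divided by sigma to the fourth power."""
    n = len(values)
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    return fourth_moment / sigma ** 4


# Hypothetical sample; the single large value produces a heavy right tail.
values = [2, 3, 3, 4, 4, 5, 14]
print(kurtosis(values))
```

Under this definition a normal distribution has a kurtosis of 3, so values well above 3 suggest heavier tails than normal and values below 3 suggest lighter tails.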
Some of the other useful measures for finding the shape of the univariate dataset are mean absolute
deviation (MAD) and coefficient of variation (CV).
Mean Absolute Deviation (MAD)
MAD is another dispersion measure and is robust to outliers. Normally, an outlier point is detected
by computing its deviation from the median and dividing it by MAD. Here, the absolute deviation
between the data and the mean is taken. Thus, the absolute deviation is given as:
$$|x - \mu| \qquad (2.15)$$
The sum of the absolute deviations is given as $\sum |x - \mu|$.
Therefore, the mean absolute deviation is given as:
$$\frac{\sum |x - \mu|}{N} \qquad (2.16)$$
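A minimal Python sketch of the mean absolute deviation follows: the absolute deviations from the mean are summed and divided by the number of observations N. The function name is an illustrative assumption, and the marks used are the English marks from the stem and leaf example later in this section.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mu = statistics.fmean(values)
    return sum(abs(x - mu) for x in values) / len(values)


marks = [45, 60, 60, 80, 85]
mad = mean_absolute_deviation(marks)
print(mad)
```

Here the mean is 66, the absolute deviations are 21, 6, 6, 14 and 19, and their average is 13.2.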
Coefficient of Variation (CV)
Coefficient of variation is used to compare datasets with different units. CV is the ratio of standard
deviation and mean, and %CV is the percentage of coefficient of variations.
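The ratio described above can be sketched in Python as follows. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form is intended.

```python
import statistics


def coefficient_of_variation(values):
    """Ratio of the standard deviation to the mean (unitless)."""
    return statistics.pstdev(values) / statistics.fmean(values)


marks = [45, 60, 60, 80, 85]
cv = coefficient_of_variation(marks)
percent_cv = 100 * cv  # %CV expresses the same ratio as a percentage
print(cv, percent_cv)
```

Because the units cancel in the ratio, CV values from datasets measured in different units can be compared directly.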
2.5.5 Special Univariate Plots
The ideal way to check the shape of the dataset is a stem and leaf plot. A stem and leaf plot is a
display that helps us to know the shape and distribution of the data. In this method, each value is
split into a 'stem' and a 'leaf'. The last digit is usually the leaf and the digits to the left of the leaf mostly
form the stem. For example, marks 45 are divided into stem 4 and leaf 5 in Figure 2.9.
The stem and leaf plot for the English subject marks, say, {45, 60, 60, 80, 85} is given in
Figure 2.9.
Stem | Leaf
4    | 5
6    | 0 0
8    | 0 5
Figure 2.9: Stem and Leaf Plot for English Marks
It can be seen from Figure 2.9 that the first column is stem and the second column is leaf.
For the given English marks, two students with 60 marks are shown in stem and leaf plot as stem-6
with 2 leaves with 0.
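The splitting of each value into a stem and a leaf can be sketched in Python as shown below. The function names are illustrative assumptions, and the sketch assumes non-negative integer marks, with the last digit taken as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group the last digit (leaf) of each value under its leading digits (stem)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)


def format_stem_and_leaf(values):
    """Render one 'stem | leaves' row per stem."""
    rows = stem_and_leaf(values)
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in rows.items()
    )


marks = [45, 60, 60, 80, 85]
print(format_stem_and_leaf(marks))
```

For the English marks, the two students with 60 marks appear as stem 6 with two leaves of 0.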
As discussed earlier, the ideal shape of the dataset is a bell-shaped curve. This corresponds
to normality. Most of the statistical tests are designed only for normal distribution of data.
A Q-Q plot can be used to assess the shape of the dataset. The Q-Q plot is a 2D scatter plot of
univariate data against theoretical normal distribution data, or of two datasets, in which the quantiles of
the first dataset are plotted against those of the second. The normal Q-Q plot for marks
x = [13 11 2 3 4 8 9] is given below in Figure 2.10.
[Q-Q plot of sample data versus standard normal; x-axis: Standard normal quantiles, y-axis: Quantiles of input sample]
Figure 2.10: Normal Q-Q Plot
Ideally, the points fall along the reference line (45 degrees) if the data follows a normal
distribution. If the deviation is more, then there is greater evidence that the dataset follows some different
distribution, that is, other than the normal distribution shape. In such a case, careful analysis of the
statistical investigations should be carried out before interpretation.
Thus, skewness, kurtosis, mean absolute deviation and coefficient of variation help in assessing
the univariate data.
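The points of a normal Q-Q plot can be computed without a plotting library, as in the Python sketch below. The function name `normal_qq_points` and the plotting positions (i - 0.5)/N are assumptions; the plotting positions are one common convention and may differ from those used to draw Figure 2.10.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted sample value with a standard normal quantile."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: (i - 0.5) / n for i = 1, ..., n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))


points = normal_qq_points([13, 11, 2, 3, 4, 8, 9])
for theoretical_quantile, sample_quantile in points:
    print(round(theoretical_quantile, 3), sample_quantile)
```

Plotting the second value of each pair against the first gives the Q-Q plot. The closer the points lie to a straight line, the more plausible a normal distribution is for the data.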
2.6 BIVARIATE DATA AND MULTIVARIATE DATA
Bivariate data involves two variables. Bivariate data deals with causes of relationships. The aim is
to find relationships among data. Consider the following Table 2.3, with data of the temperature in
a shop and sales of sweaters.
Table 2.3: Temperature in a Shop and Sales Data
Temperature (in centigrade)    Sales of sweaters
5                              200
10                             150
15                             140
20                             75
25                             [illegible]
30                             55
35                             20