ML Mid1 Notes
The field of machine learning is concerned with the question of how to construct computer
programs that improve their performance at some task through experience. Machine learning is
about making computers modify or adapt their actions (whether the task is making predictions,
or controlling a robot) so that these actions get more accurate with experience, where accuracy is
measured by how well the chosen actions reflect the correct ones. Put more precisely [1],
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
In general, to have a well-defined learning problem, we must identify these three features:
• The learning task
• The measure of performance
• The task experience
The key concept that we will need to think about for our machines is learning from experience.
Important aspects of the ‘learning from experience’ behaviour of humans and other animals that
are embedded in machine learning are remembering, adapting, and generalizing.
• Remembering and Adapting: Recognizing that, last time in a similar situation, a certain action
was attempted and worked, and therefore should be tried again; or that the same action failed
in a similar situation last time, and so something different should be tried.
• Generalizing: This aspect concerns recognizing similarity between different situations. It is
what makes learning useful, because we can apply our knowledge to situations not seen earlier.
Given a new situation, we recognize its similarity to situations faced earlier and take a decision
for it; this generalizing capability is characteristic of animal learning.
Machine learning concerns getting computers to alter or adapt their actions in a way that those
actions improve in terms of accuracy, with experience. Machine learning, like animal learning,
relies heavily on the notion of similarity in its search for valuable knowledge in data. The
computer program is the ‘machine’ in our context. The computer program is designed employing
learning from the task experience. Equivalently, we say that the machine is trained using the task
experience, or that the machine learns from the task experience. The terms learning machine,
learning algorithm, and learned knowledge all refer to the computer program designed for the
assigned task. As with any software system, understanding the inputs and outputs is more
important than knowing exactly what takes place in between, and that is how we will describe
machine learning here. The input is defined by the learning task. Four different types of learning
tasks appear in real-world applications (details given later in Section 1.7). In classification
learning, the machine is expected to learn a technique for classifying examples of
measurements/observations. In association learning, any relation between observations is
sought, not merely an association capable of predicting a specific class value. In clustering,
groups of observations that belong together are sought. In regression, the output to be predicted
is not a discrete class but a continuous numeric quantity. The classification and regression tasks
are carried out through the process of directed/supervised learning. For the examples of
measurements/observations, the outcome is known ‘a priori’; for classification problems, the
outcome is the class to which the example belongs; and for regression problems, the outcome is
the numeric value on the approximating curve that fits the data. The other form of learning is
undirected/unsupervised, wherein the outcome is not known ‘a priori’; clustering and association
learning belong to this category, as we shall see in later chapters. The experience with which the
machine will be trained (from which the machine will learn) may be available in the form of data
collected in databases. Most of the information that surrounds us manifests itself in the form of
data, which can be as basic as a set of measurements or observations characterized by vectors
with numerical values, or may be in forms that are more difficult to characterize as numerical
vectors: sets of images, documents, audio clips, video clips, graphs, and so on. For different
forms of raw data (text, images, waveforms, and so forth), it is common to represent the data in a
standard fixed-length vector format with numerical values. Such abstractions typically involve
significant loss of information, yet they are essential for a well-defined learning problem.
Thus, even when the raw data is an agglomerated mass that cannot be fragmented accurately into
individual experience examples characterized by numerical vectors, it is still very useful for
learning many things.
Numerical form of data representation allows us to deal with patterns geometrically, and thus we
shall study learning algorithms using linear algebra and analytic geometry. Characterizing the
similarity of patterns in state space can be done through some form of metric (distance) measure:
distance between two vectors is a measure of similarity between two corresponding patterns.
Many measures of ‘distance’ have been proposed in the literature. In another class of machine
learning problems, the input (experience) is available in the form of nominal (or categorical)
data, described in linguistic form (not numerical). For nominal form of data, there is no natural
notion of similarity. Each learning algorithm based on nominal data employs some nonmetric
method of similarity. In an alternative learning option, there is no training dataset, but human
knowledge (experience, expertise, heuristics) is available in linguistic form. This form of human
knowledge, when properly structured as a set of IF-THEN rules, can be embedded into a
learning machine. Having described the input to the software system, let us now look at the
output description. The output of an algorithm represents the learned knowledge. This
knowledge is in the form of a model of the structural patterns in the data. The model is deployed
by the user for decision-making; it gives the prediction with respect to the assigned task for
measurements/observations not in the task experience; a good model will generalize well to
observations unseen by the machine during training. A block diagrammatic representation of a
learning machine is shown in Fig. 1.1.
2. Applications of Machine learning in diverse fields
Machine learning is a growing technology used to mine knowledge from data (an activity
popularly known as the field of data mining). Wherever data exists, things can be learned from it.
Wherever there is an excess of data, the mechanics of learning must be automatic. Machine
learning technology is meant for automatic learning from voluminous datasets.
Google is by far the most popular and extensively used of all search engines. It offers access to
information from billions of web pages, which have been indexed on its server.
While going through the results of our Google query, many different advertisements show up
relating to our query. Tailoring ads to match the interests of users is a strategy adopted by Google
and is one of the typical services that every Internet search provider tries to offer. Mining
information on the World Wide Web is an area that is fast growing, almost exploding.
Many organizations use data mining for customer relationship management (CRM), which
facilitates the provision of more customized and personal service, addressing individual
requirements of customers. It is possible for organizations to tailor ads and promotions to the
profiles of customers by closely studying the patterns of browsing and buying on web stores.
Banks were quick to embrace data mining technology to examine the issue of fickle customers,
that is, customers who are likely to defect. Since they had successfully used machine learning to
assess credit risk, it was possible to reduce customer attrition as well. Cellular phone companies
handle churn by identifying the behavioural patterns of customers who would benefit from new
services, and then promoting such services in order to retain their customer base.
Data mining has greatly impacted the ways in which people use computers. On getting on to the
Internet, for instance, let us say we feel like checking our email. Unknown to us, many irritating
emails have already been filtered out by spam filters that use machine learning to identify spam.
Computer network security is a continually rising issue. While protectors keep hardening
networks, operating systems, and applications, attackers keep discovering weak spots in all these
areas. Systems for detecting intrusions are able to detect unusual patterns of activity. Data
mining is being applied to this issue in an attempt to find out semantic connections among
attacker traces in computer network data. Privacy-preserving data mining assists in protecting
privacy-sensitive information, such as credit card transaction records, healthcare records,
personal financial records, biological features, criminal/justice investigations, and ethnicity.
Of late, huge data collection and storage technologies have altered the landscape of scientific
data analysis. Major examples include applications which involve natural resources, the
prediction of floods and droughts, meteorology, astronomy, geography, geology, biology, and
other scientific and engineering data. Machine learning/data mining is present in all of these
examples.
Machine Vision: It is a field where pattern recognition has been applied with major successes.
A machine vision system captures images through a camera and analyzes these to be able to
describe the image. A machine vision system is applicable in the manufacturing industry, for
automating the assembly line or for automated visual inspection.
Biometric Recognition: It has been made clear by decades of research in pattern recognition
that the level of visual understanding and recognition that humans exhibit cannot be matched by
computer algorithms. Certain problems, such as biometric recognition (fingerprint
identification, face and gesture recognition, etc.), are being handled with success, but general
purpose image-representation systems are still not visible on the horizon.
Handwriting Recognition: It is another area where pattern recognition can be applied, with
major consequences in automation and information handling. Take first the simpler problem of
printed character recognition. The commercially available Optical Character Recognition or
OCR system has a light source, a document transport, as well as a detector. At the point where
the light-sensitive detector gives output, light intensity variation is translated into ‘numbers’. On
this image array, image processing and pattern recognition methods are applied to identify the
characters—that is, to categorize each character into the correct ‘letter’, ‘number’, and
‘punctuation’ class.
Medical Diagnosis: It also uses pattern recognition. Doctors make use of it while making
diagnostic decisions. The ultimate diagnosis is, of course, made by the doctor. Computer-aided
diagnosis has been applied to, and is of interest for, a range of medical data—X-rays, computed
tomographic images, ultrasound images, electrocardiograms (ECGs), and electroencephalograms
(EEGs).
Alignment of Biological Sequences: Alignment of sequences is done on the basis of the
fact that all living organisms are related by evolution. This means, nucleotide (DNA, RNA) and
amino acid (proteins) sequences of species that have evolved close to each other, should display
more similarities. An alignment is the procedure of lining up sequences to obtain a maximum
identity level, which also expresses the level of similarity between sequences. Biological
sequence analysis is significant in bioinformatics and modern biology.
Drug Design: It is usually based on a long and expensive process involving complex chemical
experiments that check whether a particular chemical compound could be a good candidate
for a specific drug; a positive result leads to further clinical experiments. For
several years, a new scheme based on computational simulations has been emerging.
Speech Recognition: It is an area that has been well researched. Speech is the most natural
means by which humans share, convey and exchange information. Intelligent machines that
recognize spoken information can be used in numerous applications, for example, to help control
machines by talking to them, i.e., entering data into a computer via a microphone. Speech
recognition can also enhance communication for people with hearing or speech impairments.
Text Mining: It concerns identification of patterns in text. The procedure involves analysis of
text for extraction of useful information for specific purposes. The amount of information
available on the Web and on corporate intranets, digital libraries, and news wires, and the pace at
which it spreads, is overwhelming. Integration of this information into the decision-making
process, at a fast pace, is essential in order to help businesses stay competitive in today’s market.
Text mining has reached the industrial world and is helping to exploit knowledge that, due to its
sheer size, is often beyond human consumption.
Natural Language Processing: Ever since the computer age dawned, computer science research
has been attempting to understand human language. In 1950, soon after the invention of the
computer, Alan Turing, one of the greatest computer scientists of the twentieth century,
suggested a test for computer intelligence. In a paper titled “Computing Machinery and
Intelligence”, he introduced this test. Over sixty years later, computers can perform
extraordinary feats that Alan Turing probably never imagined would be possible. Language is
obviously a critical component of how people communicate and how information is stored in the
business world and beyond. The goal of Natural Language Processing (NLP) is to analyze,
understand, and generate languages that humans use naturally so that eventually a computer will
‘naturally’ be able to interpret what the other person is saying. Voice automation is just starting,
with robot vacuum cleaners that respond to cleaning orders, and telephones and household
appliances that obey voice commands.
Fault Diagnostics: Preventive upkeep of motors, generators, and other electromechanical
devices can prevent or delay malfunctions that may otherwise interrupt industrial processes. Typical
defects or flaws include misalignment of shaft, mechanical slackening, defective bearings, and
unbalanced pumps.
Load Forecasting: It is quite essential to establish future power demand in the electricity supply
industry. In fact, the earlier the demand is known, the better. Precise estimates can be made with
the help of machine learning methods for the maximum and minimum load for each hour, day,
month, season, and year.
Control and Automation: A quiet revolution is ongoing in the manufacturing world which is
changing the look of factories. Computers are controlling and monitoring manufacturing
processes with a high degree of automation, facilitated by machine learning techniques. The
computer control includes control of all types of processes such as Computerized Numerical
Control (CNC), welding, electrochemical machining, etc., and control of industrial robots. High
degree of automation is applied in today’s Flexible Manufacturing Systems (FMS) that can be
readily rearranged to handle new market requirements. Flexible manufacturing systems,
combined with automatic assembly and product inspection on one hand, and CAD/CAM system
on the other, are the basic components of the modern Computer Integrated Manufacturing
System.
Business Intelligence: It is essential for businesses to be able to comprehend the commercial
control of their organization well, in terms of customer base, market, supply and resources, and
competition. Business Intelligence (BI) technologies offer not only historical and current
information but also predictive views of business operations. Data mining is the fundamental
core of business intelligence. In the absence of data mining, many businesses may be unable to
effectively perform market analyses, compare customer feedback on similar products, find the
strengths and weaknesses of their competitors, retain extremely valuable customers, and arrive at
intelligent business decisions.
Robotics and Automation: A robot is a machine capable of carrying out a series of complex tasks
automatically, programmed by a computer, e.g., automated visual inspection.
3. Occam's Razor Principle
Occam’s Razor is a principle that favours simplicity. It says that the simplest adequate solution is usually
the best one. In machine learning, this means that if we have two models that work about equally well, we
should choose the simpler one.
1. Start with Simpler Models: Rather than starting with a complex model, start with a simpler one. You
could begin with a linear regression or decision tree before moving to more complex models like
random forests or neural networks. This gives you a baseline to compare against and helps you
understand if the additional complexity is justified.
2. Regularization: Regularization techniques such as L1 (Lasso) and L2 (Ridge) can help prevent
overfitting by adding a penalty term to the loss function that constrains the magnitude of the
parameters. This discourages the model from relying too heavily on any one feature and makes the
model simpler and more generalizable.
3. Pruning: Pruning techniques are used in decision trees and neural networks to remove unnecessary
complexity. In decision trees, pruning can remove unimportant branches. In neural networks, pruning
can remove unnecessary weights or neurons.
4. Cross-Validation: Cross-validation helps you understand how well your model generalizes to unseen
data. If a model performs well on the training data but poorly on the validation data, it’s likely
overfitting, which indicates that the model might be too complex.
5. Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) or t-SNE can
reduce the number of features in your data, simplifying the model and helping to prevent overfitting.
6. Feature Selection: By selecting only the most important features for your model, you can reduce
complexity and improve interpretability. Techniques for feature selection include mutual information,
correlation coefficients, and recursive feature elimination.
7. Hyperparameter Tuning: Many machine learning models have hyperparameters that control their
complexity. For example, the depth of a decision tree, or the penalty term in a regularized regression.
Tuning these hyperparameters can help you find the right balance between simplicity and accuracy.
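To make points 1, 2, and 4 above concrete, here is a minimal sketch that compares a plain linear regression
with an L2-regularized (Ridge) model using cross-validation; it assumes a feature matrix X and target y are
already loaded, and the alpha value is illustrative only:
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Baseline: ordinary least squares, the simple starting point
baseline_scores = cross_val_score(LinearRegression(), X, y, cv=5)

# L2 (Ridge) regularization penalizes large coefficients, constraining model complexity
ridge_scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)

# Prefer the constrained (simpler) model unless the alternative clearly wins on held-out folds
print(baseline_scores.mean(), ridge_scores.mean())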
The following are some of the benefits of understanding and applying Occam’s Razor for data scientists:
Enhancing Interpretability: Simpler models are often more interpretable, which means it’s easier to
understand how they’re making predictions. This can be important for trust, transparency, and even
legal reasons in certain industries. For example, in healthcare or finance, being able to explain why a
model made a certain prediction could be crucial.
Avoiding Overfitting: As mentioned before, complex models can often fit the training data very well,
but they can also capture the noise in the data, leading to overfitting. An overfitted model performs well
on the training data but poorly on unseen data, which is a problem because the goal of machine learning
is to make accurate predictions on new, unseen data. By keeping models simpler, data scientists can
reduce the risk of overfitting.
Improving Generalizability: Simpler models are more likely to generalize well to unseen data. This is
because they are less likely to fit the noise in the training data and more likely to capture the underlying
trend or relationship.
Reducing Computational Resources: Simpler models typically require less computational resources
to train and predict. This can be a significant advantage in real-world settings, where resources might be
limited or expensive.
4. Discuss various Supervised learning algorithms.
• Supervised learning is the machine learning task of learning a function that maps an input
to an output based on example input-output pairs.
• Unsupervised learning is a type of machine learning algorithm used to draw inferences from
datasets consisting of input data without labeled responses. The most common unsupervised
learning method is cluster analysis, which is used for exploratory data analysis to find hidden
patterns or grouping in data.
• Classification
In machine learning and statistics, classification is the problem of identifying to
which of a set of categories (sub-populations) a new observation belongs, on the basis of a
training set of data containing observations (or instances) whose category membership is
known.
• Regression
Linear Regression is a machine learning algorithm based on supervised learning.
Linear regression performs the task of predicting a dependent variable value (y) based on a given
independent variable (x). So, this regression technique finds a linear relationship between x
(input) and y (output).
Predictive modeling or supervised learning aims at constructing models that can predict the
value of a target (dependent) variable from the known values of attributes (independent
variables).
Subgroup discovery is a data mining technique that discovers interesting associations
among different variables with respect to a property of interest.
Descriptive clustering
Clustering is an unsupervised machine learning approach, but it can also be used to
improve the accuracy of supervised machine learning algorithms, by clustering the
data points into similar groups and using these cluster labels as independent variables in the
supervised machine learning algorithm.
Associative rule discovery
Association rule learning is a rule-based machine learning method for discovering
interesting relations between variables in large databases.
Association rule mining, at a basic level, involves the use of machine learning models to
analyze data for patterns, or co-occurrence, in a database.
Support is an indication of how frequently the items appear in the data.
Confidence indicates the number of times the if-then statements are found true.
Precision and Recall
Precision can be seen as a measure of exactness or quality, whereas recall is a measure
of completeness or quantity.
Precision = (number of retrieved documents that are relevant) / (total number of documents retrieved)
Recall = (number of retrieved documents that are relevant) / (total number of relevant documents in the database)
Predictive clustering: unlabeled
Descriptive clustering
Inductive learning
Inductive learning takes the traditional sequence of a lesson and reverses things. Instead of
saying, “Here is the knowledge; now go practice it,” inductive learning says, “Here are some
objects, some data, some artifacts, some experiences… what knowledge can we gain from
them?”
Supervised learning algorithms
Linear regression (Ordinary Least Squares Regression or OLS Regression) is perhaps one of the most well-
known and best-understood algorithms in statistics and machine learning. Linear regression is a linear
model, e.g., a model that assumes a linear relationship between the input variables (x) and the single output
variable (y). The goal of linear regression is to train a linear model to predict a new y given a previously
unseen x with as little error as possible.
Implementation in Python
from sklearn.linear_model import LinearRegression
model = LinearRegression()   # X: feature matrix, Y: target vector (assumed to be defined)
model.fit(X, Y)
Logistic Regression
Logistic regression is one of the most widely used algorithms for classification. The logistic regression
model arises from the desire to model the probabilities of the output classes given a function that is linear
in x, at the same time ensuring that output probabilities sum up to one and remain between zero and one as
we would expect from probabilities.
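In the same style as the other snippets in this section, a logistic regression classifier can be constructed
with sklearn (a minimal sketch, assuming a feature matrix X and label vector Y are already defined):
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, Y)
probabilities = model.predict_proba(X)  # per-class probabilities between 0 and 1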
Support Vector Machine
The objective of the support vector machine (SVM) algorithm is to maximize the margin (shown as shaded
area in Figure 4-3), which is defined as the distance between the separating hyperplane (or decision
boundary) and the training samples that are closest to this hyperplane, the so-called support vectors. The
margin is calculated as the perpendicular distance from the line to only the closest points, as shown in Fig.
Hence, SVM calculates a maximum-margin boundary that leads to a homogeneous partition of all data
points.
The SVM regression and classification models can be constructed using the sklearn package of Python, as
shown in the following code snippets:
Regression
from sklearn.svm import SVR
model = SVR()
model.fit(X, Y)
Classification
from sklearn.svm import SVC
model = SVC()
model.fit(X, Y)
K-Nearest Neighbors
K-nearest neighbors (KNN) is considered a “lazy learner,” as there is no learning required in the model. For
a new data point, predictions are made by searching through the entire training set for the K most similar
instances (the neighbors) and summarizing the output variable for those K instances.
To determine which of the K instances in the training dataset are most similar to a new input, a distance
measure is used. The most popular distance measure is Euclidean distance, which is calculated as the square
root of the sum of the squared differences between a point a and a point b across all input attributes i:
d(a, b) = sqrt( Σ_i (a_i - b_i)² )
Euclidean distance is a good distance measure to use if the input variables are similar in type.
Another distance metric is Manhattan distance, in which the distance between point a and point b is the
sum of the absolute differences across all input attributes i:
d(a, b) = Σ_i |a_i - b_i|
Manhattan distance is a good measure to use if the input variables are not similar in type.
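As a quick sketch, both distance measures can be computed with NumPy for two example pattern vectors a
and b (the values are illustrative only):
import numpy as np
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])
euclidean = np.sqrt(np.sum((a - b) ** 2))  # square root of the sum of squared differences
manhattan = np.sum(np.abs(a - b))          # sum of absolute differences
print(euclidean, manhattan)                # ~3.606 and 5.0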
KNN regression and classification models can be constructed using the sklearn package of Python, as shown
in the following code:
Classification
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X, Y)
Regression
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()
model.fit(X, Y)
Decision Trees
The model can be represented by a binary tree (or decision tree), where each node is an input variable x with
a split point and each leaf contains an output variable y for prediction.
Figure 4-4 shows an example of a simple classification tree to predict whether a person is a male or a
female based on two inputs of height (in centimeters) and weight (in kilograms).
Classification
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X, Y)
Regression
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(X, Y)
Random forest
Random forest is a tweaked version of bagged decision trees. In order to understand the random forest
algorithm, let us first understand the bagging algorithm. Assuming we have a dataset of one thousand
instances, the steps of bagging are:
1. Create many random subsamples of the dataset (with replacement).
2. Train a decision tree model on each subsample.
3. Given a new dataset, calculate the average prediction from each model and aggregate the prediction by each
tree to assign the final label by majority vote.
Random forest regression and classification models can be constructed using the sklearn package of Python,
as shown in the following code:
Classification
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, Y)
Regression
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X, Y)
AdaBoost
Adaptive Boosting or AdaBoost is a boosting technique in which the basic idea is to try predictors
sequentially, and each subsequent model attempts to fix the errors of its predecessor. At each iteration, the
AdaBoost algorithm changes the sample distribution by modifying the weights attached to each of the
instances. It increases the weights of the wrongly predicted instances and decreases the ones of the correctly
predicted instances.
This process is repeated until the error function does not change, or until the maximum limit of the number
of estimators is reached.
Implementation in Python
AdaBoost regression and classification models can be constructed using the sklearn package of Python, as
shown in the following code snippet:
Classification
from sklearn.ensemble import AdaBoostClassifier
model = AdaBoostClassifier()
model.fit(X, Y)
Regression
from sklearn.ensemble import AdaBoostRegressor
model = AdaBoostRegressor()
model.fit(X, Y)
Gradient Boosting Method
Gradient boosting method (GBM) is another boosting technique similar to AdaBoost, where the general idea
is to try predictors sequentially. Gradient boosting works by sequentially adding the previous underfitted
predictions to the ensemble, ensuring the errors made previously are corrected.
A model (which can be referred to as the first weak learner) is built on a subset of data. Using this model,
predictions are made on the whole dataset.
Errors are calculated by comparing the predictions and actual values, and the loss is calculated using the loss
function.
A new model is created using the errors of the previous step as the target variable. The objective is to find
the best split in the data to minimize the error. The predictions made by this new model are combined with
the predictions of the previous. New errors are calculated using this predicted value and actual value.
This process is repeated until the error function does not change or until the maximum limit of the number
of estimators is reached.
Contrary to AdaBoost, which tweaks the instance weights at every iteration, this method tries to fit the
new predictor to the residual errors made by the previous predictor.
Classification
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier()
model.fit(X, Y)
Regression
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor()
model.fit(X, Y)
Evaluation Metrics for Classification
For simplicity, we will mostly discuss things in terms of a binary classification problem (i.e., only two
outcomes, such as true or false); some common terms are: true positive (a positive instance correctly
predicted as positive), true negative (a negative instance correctly predicted as negative), false positive (a
negative instance wrongly predicted as positive), and false negative (a positive instance wrongly predicted
as negative).
The difference between three commonly used evaluation metrics for classification, accuracy, precision, and
recall, is illustrated in Figure 4-8.
Accuracy
Accuracy is the number of correct predictions made as a ratio of all predictions made.
Precision
Precision is the percentage of positive instances out of the total predicted positive instances.
Recall
Recall (or sensitivity or true positive rate) is the percentage of positive instances out of the total actual
positive instances. Therefore, the denominator (true positive + false negative) is the actual number of
positive instances present in the dataset.
Area under ROC curve (AUC) is an evaluation metric for binary classification problems. ROC is a
probability curve, and AUC represents degree or measure of separability. It tells how much the model is
capable of distinguishing between classes.
ROC stands for Receiver Operating Characteristics, and the ROC curve is the graphical representation of the
effectiveness of the binary classification model. It plots the true positive rate (TPR) vs the false positive rate
(FPR) at different classification thresholds.
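These metrics can be computed with sklearn (a sketch, assuming y_true holds the actual labels, y_pred the
predicted labels, and y_score the predicted probabilities for the positive class):
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
accuracy = accuracy_score(y_true, y_pred)     # (TP + TN) / all predictions
precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
auc = roc_auc_score(y_true, y_score)          # area under the ROC curve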
Naïve Bayes
The Naïve Bayes algorithm is comprised of two words, Naïve and Bayes, which can be described as:
o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of
the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and
taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually
contributes to identifying it as an apple without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the
probability of a hypothesis with prior knowledge. It depends on the conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is Posterior probability: the probability of hypothesis A given the observed evidence B.
P(B|A) is Likelihood probability: the probability of the evidence given that the hypothesis is true.
P(A) is Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is Marginal probability: the probability of the evidence.
Working of Naïve Bayes' Classifier can be understood with the help of the below example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this
dataset, we need to decide whether we should play or not on a particular day according to the weather
conditions. To solve this problem, we need to follow the steps below:
Problem: If the weather is sunny, then the Player should play or not?
Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table for the weather conditions:
Weather    Yes    No
Overcast   5      0
Rainy      2      2
Sunny      3      2
Total      10     4
Likelihood table of the weather conditions:
Weather    No     Yes
Overcast   0      5      5/14 = 0.35
Rainy      2      2      4/14 = 0.29
Sunny      2      3      5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.30 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
Since P(Yes|Sunny) > P(No|Sunny), the player should play on a sunny day.
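The same calculation can be reproduced in a few lines of Python, using the counts from the frequency table
above (a sketch; the exact values differ slightly from the hand-worked numbers because those round
intermediate results):
p_yes, p_no = 10/14, 4/14          # 10 "Yes" days and 4 "No" days out of 14
p_sunny = 5/14                     # 5 Sunny days out of 14
p_sunny_given_yes = 3/10           # 3 of the 10 "Yes" days are Sunny
p_sunny_given_no = 2/4             # 2 of the 4 "No" days are Sunny
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny   # = 0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny      # = 0.40
print(p_yes_given_sunny > p_no_given_sunny)               # True: play on a sunny day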
Advantages of Naïve Bayes Classifier:
o Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets.
o It can be used for binary as well as multi-class classification.
o It performs well in multi-class predictions as compared to other algorithms.
o It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between features.
There are three types of Naive Bayes Model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a normal distribution. This means if
predictors take continuous values instead of discrete, then the model assumes that these values are
sampled from the Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially
distributed. It is primarily used for document classification problems, i.e., predicting which category a
particular document belongs to, such as Sports, Politics, Education, etc.
The classifier uses the frequency of words as the predictors.
o Bernoulli: The Bernoulli classifier works similar to the Multinomial classifier, but the predictor
variables are the independent Booleans variables. Such as if a particular word is present or not in a
document. This model is also famous for document classification tasks.
Now we will implement a Naive Bayes Algorithm using Python. So for this, we will use the
"user_data" dataset, which we have used in our other classification model. Therefore we can easily
compare the Naive Bayes model with the other models.
Steps to implement:
In this step, we will pre-process/prepare the data so that we can use it efficiently in our code. It is similar to
what we did in data pre-processing. The code for this is given below:
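A minimal sketch of the data-preparation step is given below; it assumes a user_data.csv file with feature
columns Age and EstimatedSalary and a target column Purchased (the file name and column names are
assumptions for illustration):
# Importing the dataset
import pandas as pd
dataset = pd.read_csv('user_data.csv')
x = dataset[['Age', 'EstimatedSalary']].values   # feature matrix
y = dataset['Purchased'].values                  # target vector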
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
After the pre-processing step, now we will fit the Naive Bayes model to the Training set. Below is the code
for it:
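A minimal sketch of the fitting step, assuming the preprocessed x_train and y_train from above:
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)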
In the above code, we have used the GaussianNB classifier to fit it to the training dataset. We can also use
other classifiers as per our requirement.
Output:
Now we will predict the test set result. For this, we will create a new predictor variable y_pred, and will use
the predict function to make the predictions.
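A sketch of the prediction step, assuming the fitted classifier from the previous snippet:
# Predicting the Test set results
y_pred = classifier.predict(x_test)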
The above output shows the result for the prediction vector y_pred and the real vector y_test. We can see
that some predictions are different from the real values; these are the incorrect predictions.
Now we will check the accuracy of the Naive Bayes classifier using the Confusion matrix. Below is the code
for it:
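A sketch using sklearn's confusion_matrix, assuming y_test and y_pred from above:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)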
Output:
As we can see in the above confusion matrix output, there are 7+3= 10 incorrect predictions, and 65+25=90
correct predictions.
Next we will visualize the training set result using Naïve Bayes Classifier. Below is the code for it:
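One way to produce such a plot is sketched below; it assumes exactly two scaled features (labelled Age and
Estimated Salary here, following the user_data example) and the classifier fitted above:
# Visualising the Training set results
import numpy as np
import matplotlib.pyplot as plt
x1_min, x1_max = x_train[:, 0].min() - 1, x_train[:, 0].max() + 1
x2_min, x2_max = x_train[:, 1].min() - 1, x_train[:, 1].max() + 1
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, 0.01),
                       np.arange(x2_min, x2_max, 0.01))
# Colour each grid point by its predicted class, then overlay the training points
plt.contourf(xx1, xx2,
             classifier.predict(np.c_[xx1.ravel(), xx2.ravel()]).reshape(xx1.shape),
             alpha=0.3)
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, edgecolors='k')
plt.xlabel('Age (scaled)')
plt.ylabel('Estimated Salary (scaled)')
plt.title('Naive Bayes (Training set)')
plt.show()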
Output:
The above output shows the result for the training set data. As we can see, the classifier has created a
Gaussian-shaped decision boundary to divide the "purchased" and "not purchased" classes. There are some
wrong predictions, which we have counted in the confusion matrix, but it is still a pretty good classifier.
9. Give a brief note on structured and unstructured data analysis in Machine learning.
Structured data
Structured data is particularly useful when you’re dealing with discrete, numeric data. Examples of this type
of data include financial operations, sales and marketing figures, and scientific modeling. You can also use
structured data in any case where records with multiple, short-entry text, numeric, and enumerated fields are
required, such as HR records, inventory listings, and housing data.
Unstructured data
Unstructured data is used when a record is required and the data won’t fit into a structured data format.
Examples include video monitoring, company documents, and social media posts. You can also use
unstructured data where it isn’t efficient to store the data in a structured format, such as Internet of Things
(IoT) sensor data, computer system logs, and chat transcripts.
Machine learning is a subset of AI, which enables the machine to automatically learn from data,
improve performance from past experiences, and make predictions. Machine learning contains a set of
algorithms that work on a huge amount of data. Data is fed to these algorithms to train them, and on the
basis of training, they build the model & perform a specific task.
These ML algorithms help to solve different business problems like Regression, Classification, Forecasting,
Clustering, and Associations, etc.
Based on the methods and way of learning, machine learning is divided into mainly four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
In this topic, we will provide a detailed description of the types of machine learning along with their
respective algorithms.
1. Supervised Machine Learning
As its name suggests, supervised machine learning is based on supervision. It means that in the supervised
learning technique, we train the machines using the "labelled" dataset, and based on the training, the
machine predicts the output. Here, the labelled data specifies that some of the inputs are already mapped to
the output. More precisely, we can say: first, we train the machine with the input and corresponding output,
and then we ask the machine to predict the output for the test dataset.
Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog
images. First, we train the machine to understand the images, using features such as the shape and size of
the tail of a cat and a dog, the shape of the eyes, colour, and height (dogs are taller, cats are smaller). After
completion of training, we input the picture of a cat and ask the machine to identify the object and predict
the output. Now, the machine is well trained, so it will check all the features of the object, such as height,
shape, colour, eyes, ears, tail, etc., and find that it's a cat. So, it will put it in the Cat category. This is the
process of how the machine identifies the objects in Supervised Learning.
The main goal of the supervised learning technique is to map the input variable(x) with the output
variable(y). Some real-world applications of supervised learning are Risk Assessment, Fraud Detection,
Spam filtering, etc.
Supervised machine learning can be classified into two types of problems, which are given below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve the classification problems in which the output variable is
categorical, such as "Yes" or No, Male or Female, Red or Blue, etc. The classification algorithms predict
the categories present in the dataset. Some real-world examples of classification algorithms are Spam
Detection, Email filtering, etc.
b) Regression
Regression algorithms are used to solve regression problems in which there is a linear relationship between
input and output variables. These are used to predict continuous output variables, such as market trends,
weather prediction, etc.
Advantages:
o Since supervised learning works with a labelled dataset, we can have an exact idea about the
classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
o These algorithms may not be able to solve complex tasks, and they may predict the wrong output if
the test data differs from the training data.
o Training requires a labelled dataset, which can be expensive and time-consuming to obtain.
Applications of Supervised Learning:
o Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In this process, image classification
is performed on different image data with pre-defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis purposes. It is done by using
medical images and past labelled data with labels for disease conditions. With such a process, the
machine can identify a disease for the new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for identifying fraud
transactions, fraud customers, etc. It is done by using historic data to identify the patterns that can
lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are used. These algorithms
classify an email as spam or not spam. The spam emails are sent to the spam folder.
o Speech Recognition - Supervised learning algorithms are also used in speech recognition. The
algorithm is trained with voice data, and various identifications can be done using the same, such as
voice-activated passwords, voice commands, etc.
2. Unsupervised Machine Learning
Unsupervised learning is different from the supervised learning technique; as its name suggests, there is no
need for supervision. It means, in unsupervised machine learning, the machine is trained using the unlabeled
dataset, and the machine predicts the output without any supervision.
In unsupervised learning, the models are trained with the data that is neither classified nor labelled, and the
model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset
according to the similarities, patterns, and differences. Machines are instructed to find the hidden
patterns from the input dataset.
Let's take an example to understand it more precisely; suppose there is a basket of fruit images, and we
input it into the machine learning model. The images are totally unknown to the model, and the task of the
machine is to find the patterns and categories of the objects.
So, now the machine will discover its patterns and differences, such as colour difference, shape difference,
and predict the output when it is tested with the test dataset.
Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the data. It is a way to group
the objects into a cluster such that the objects with the most similarities remain in one group and have fewer
or no similarities with the objects of other groups. An example of the clustering algorithm is grouping the
customers by their purchasing behaviour.
2) Association
Association rule learning is an unsupervised learning technique, which finds interesting relations among
variables within a large dataset. The main aim of this learning algorithm is to find the dependency of one
data item on another data item and map those variables accordingly so that it can generate maximum profit.
This algorithm is mainly applied in Market Basket analysis, Web usage mining, continuous production,
etc.
Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-growth
algorithm.
Advantages:
o These algorithms can be used for more complicated tasks than the supervised ones, because they
work on unlabeled data.
o Unsupervised algorithms are preferable for various tasks as getting the unlabeled dataset is easier as
compared to the labelled dataset.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate as the dataset is not labelled, and
algorithms are not trained with the exact output in prior.
o Working with Unsupervised learning is more difficult as it works with the unlabelled dataset that
does not map with the output.
Applications of Unsupervised Learning:
o Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright in
document network analysis of text data for scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised learning techniques
for building recommendation applications for different web applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised learning, which
can identify unusual data points within the dataset. It is used to discover fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition or SVD is used to extract particular
information from the database. For example, extracting information of each user located at a
particular location.
3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies between Supervised and
Unsupervised machine learning. It represents the intermediate ground between Supervised (With Labelled
training data) and Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.
Although semi-supervised learning is the middle ground between supervised and unsupervised learning and
operates on data that has a few labels, it mostly consists of unlabeled data. Labels are costly, but for
practical purposes a small number of labelled examples may be available. This setting differs from
supervised and unsupervised learning, which are based on the presence and absence of labels, respectively.
To overcome the drawbacks of supervised learning and unsupervised learning algorithms, the concept
of Semi-supervised learning is introduced. The main aim of semi-supervised learning is to effectively use
all the available data, rather than only labelled data like in supervised learning. Initially, similar data is
clustered along with an unsupervised learning algorithm, and further, it helps to label the unlabeled data into
labelled data. It is because labelled data is a comparatively more expensive acquisition than unlabeled data.
We can imagine these algorithms with an example. Supervised learning is where a student is under the
supervision of an instructor at home and college. Further, if that student is self-analysing the same concept
without any help from the instructor, it comes under unsupervised learning. Under semi-supervised learning,
the student has to revise himself after analyzing the same concept under the guidance of an instructor at
college.
Advantages:
o It is simple and easy to understand.
o It makes effective use of all the available data, combining a small amount of expensive labelled data
with plentiful unlabeled data.
Disadvantages:
o Iteration results may not be stable.
o We cannot apply these algorithms to network-level data.
o Accuracy is low.
4. Reinforcement Learning
In reinforcement learning, there is no labelled data as in supervised learning; the agent learns only from its
own experiences.
The reinforcement learning process is similar to the way a human being learns; for example, a child learns
various things through experiences in his day-to-day life. An example of reinforcement learning is playing a
game, where the game is the environment, the moves of the agent at each step define states, and the goal of
the agent is to get a high score. The agent receives feedback in terms of rewards and punishments.
Due to its way of working, reinforcement learning is employed in different fields such as game theory,
operations research, information theory, and multi-agent systems.
A reinforcement learning problem can be formalized using Markov Decision Process(MDP). In MDP, the
agent constantly interacts with the environment and performs actions; at each action, the environment
responds and generates a new state.
Applications of Reinforcement Learning:
o Video Games:
RL algorithms are much popular in gaming applications. It is used to gain super-human performance.
Some popular games that use RL algorithms are AlphaGO and AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how RL can be used
in computers to automatically learn to allocate and schedule resources for waiting jobs in order to
minimize average job slowdown.
o Robotics:
RL is widely being used in Robotics applications. Robots are used in the industrial and
manufacturing area, and these robots are made more powerful with reinforcement learning. There are
different industries that have their vision of building intelligent robots using AI and Machine
learning technology.
o Text Mining
Text-mining, one of the great applications of NLP, is now being implemented with the help of
Reinforcement Learning by Salesforce company.
Advantages
o It helps in solving complex real-world problems which are difficult to be solved by general
techniques.
o The learning model of RL is similar to the learning of human beings; hence most accurate results can
be found.
o Helps in achieving long term results.
Disadvantage
The curse of dimensionality limits reinforcement learning for real physical systems.
Descriptive statistics
After collecting data, one of the first things to do is to graph the data, calculate the mean and get an
overview of the distributions of the data. This is the task of descriptive statistics.
Thus, the goal of descriptive statistics is to gain an overview of the distribution of data sets. Descriptive
statistics helps to describe and illustrate data sets.
Definition
The term descriptive statistics covers statistical methods for describing data using statistical characteristics,
charts, graphics or tables.
It is important here that only the properties of the respective sample are described and evaluated. However,
no conclusions are drawn about other points in time or the population. This is the task of inferential statistics
or concluding statistics.
The first group of descriptive statistics are location parameters (measures of central tendency) such as the
mean, median, and mode. They are used to express the central tendency of a data set; they describe where
the center of a sample lies or where a large part of the sample is located.
The second group are measures of dispersion. They provide information about how much the values of a
variable in a sample differ from each other. Measures of dispersion can therefore describe how strongly the
values of a variable deviate from the mean value: Are the values rather close together, i.e. are they similar,
or are they far apart and thus differ greatly? A classic example is the standard deviation.
Which measures of location or dispersion are suitable for describing the data depends on the
respective scales of measurement of the variable. Here, a distinction can be made
between metric, ordinal and nominal scales of measurement.
Finally, a large area of descriptive statistics is diagrams such as the bar chart, the pie chart, or the histogram.
A random sample of 10 male basketball players is drawn, and their height is measured in meters.
Player    Body height (m)
1         1.62
2         1.72
3         1.55
4         1.70
5         1.78
6         1.65
7         1.64
8         1.64
9         1.66
10        1.74
From this sample, a table of descriptive statistics (the relevant location and dispersion measures) on the
height of the players can be computed.
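The same location and dispersion measures can be computed directly in Python (a sketch using the ten
heights above and the standard library's statistics module):
import statistics
heights = [1.62, 1.72, 1.55, 1.70, 1.78, 1.65, 1.64, 1.64, 1.66, 1.74]
print("mean:", statistics.mean(heights))       # central tendency
print("median:", statistics.median(heights))   # central tendency, robust to outliers
print("std dev:", statistics.stdev(heights))   # sample dispersion around the mean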
Inferential statistics
What is inferential statistics? In contrast to descriptive statistics, inferential statistics aims to make a
statement about the population. However, since it is almost impossible in most cases to survey the entire
population, a sample is used, i.e. a small data set originating from the population. With this sample a
statement about the population can be made. An example would be if a sample of 1,000 citizens is taken
from the population of all Canadian citizens.
Depending on which statement is to be made about the population or which question is to be answered about
the population, different statistical methods or hypothesis tests are used. The best known are the hypothesis
tests with which a group difference can be tested, such as the t-test, the chi-square test or the analysis of
variance. Then there are the hypothesis tests with which a correlation of variables can be tested, such as
correlation analysis and regression.
Inferential statistics is a branch of statistics that uses various analytical tools to draw conclusions about the
population from sample data. For a given hypothesis about the population, inferential statistics uses a sample
and gives an indication of the validity of the hypothesis based on the sample collected.
In the example above, a sample of 10 basketball players was drawn and then exactly this sample was
described, this is the task of descriptive statistics. If you want to make a statement about the population you
need the inferential statistics. For example, it could be of interest if basketball players are larger than the
average male population. To test this hypothesis, a one-sample t-test is calculated; the t-test compares the
sample mean with the mean of the population.
Furthermore, the question could arise whether basketball players are larger than football players. For this
purpose, a sample of football players is drawn, and then the mean value of the basketball players can be
compared with the mean value of the football players using an independent t-test. Now a statement can be
made, for example, whether basketball players are larger than football players in the population or not.
Since this statement is only made based on the samples and it can also be pure coincidence that the
basketball players are larger in exactly this sample, the statement can only be confirmed or rejected
with a certain probability.
Inferential statistics helps to develop a good understanding of the population data by analyzing the samples
obtained from it. It helps in making generalizations about the population by using various analytical tests
and tools. In order to pick out random samples that will represent the population accurately many sampling
techniques are used. Some of the important methods are simple random sampling, stratified sampling,
cluster sampling, and systematic sampling techniques.
Inferential statistics can be defined as a field of statistics that uses analytical tools for drawing conclusions
about a population by examining random samples. The goal of inferential statistics is to make
generalizations about a population. In inferential statistics, a statistic is taken from the sample data (e.g., the
sample mean) that is used to make inferences about the population parameter (e.g., the population mean).
Inferential statistics can be classified into hypothesis testing and regression analysis. Hypothesis testing also
includes the use of confidence intervals to test the parameters of a population. Given below are the different
types of inferential statistics.
Hypothesis Testing
Hypothesis testing is a type of inferential statistics that is used to test assumptions and draw conclusions
about the population from the available sample data. It involves setting up a null hypothesis and an
alternative hypothesis followed by conducting a statistical test of significance. A conclusion is drawn based
on the value of the test statistic, the critical value, and the confidence intervals. A hypothesis test can be
left-tailed, right-tailed, or two-tailed. Given below are certain important hypothesis tests that are used in
inferential statistics.
Z Test: A z test is used on data that follows a normal distribution and has a sample size greater than or equal
to 30. It is used to test if the means of the sample and population are equal when the population variance is
known. The right-tailed hypothesis can be set up as follows:
H0: μ = μ0, H1: μ > μ0, with test statistic z = (x̄ - μ0) / (σ / √n)
T Test: A t test is used when the data follows a Student's t distribution and the sample size is less than 30. It
is used to compare the sample and population mean when the population variance is unknown. The
hypothesis is set up in the same way, with test statistic:
t = (x̄ - μ0) / (s / √n), where s is the sample standard deviation
F Test: An f test is used to check if there is a difference between the variances of two samples or
populations. The right-tailed f hypothesis test can be set up as follows:
H0: σ1² = σ2², H1: σ1² > σ2², with test statistic f = s1² / s2² (the ratio of the sample variances)
Confidence Interval: A confidence interval helps in estimating the parameters of a population. For
example, a 95% confidence interval indicates that if a test is conducted 100 times with new samples under
the same conditions then the estimate can be expected to lie within the given interval 95 times. Furthermore,
a confidence interval is also useful in calculating the critical value in hypothesis testing.
Apart from these tests, other tests used in inferential statistics are the ANOVA test, Wilcoxon signed-rank
test, Mann-Whitney U test, Kruskal-Wallis H test, etc.
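As a sketch of the one-sample t-test described above, the ten basketball heights from the
descriptive-statistics example can be compared against a population mean height (the value 1.75 m used
here is an assumption for illustration):
from scipy import stats
heights = [1.62, 1.72, 1.55, 1.70, 1.78, 1.65, 1.64, 1.64, 1.66, 1.74]
t_statistic, p_value = stats.ttest_1samp(heights, popmean=1.75)
print(t_statistic, p_value)   # a small p-value suggests the sample mean differs from 1.75 m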
Regression Analysis
Regression analysis is used to quantify how one variable will change with respect to another variable. There
are many types of regressions available such as simple linear, multiple linear, nominal, logistic, and ordinal
regression. The most commonly used regression in inferential statistics is linear regression. Linear
regression checks the effect of a unit change of the independent variable in the dependent variable. Some
important formulas used in inferential statistics for simple linear regression are as follows:
Regression equation: ŷ = α + βx
Regression coefficients: β = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)², and α = ȳ - βx̄
Inferential Statistics Examples
Inferential statistics is very useful and cost-effective as it can make inferences about the population without
collecting the complete data. Some inferential statistics examples are given below:
Suppose the mean marks of a sample of 100 students from a particular country are known. Using this sample
information, the mean marks of all students in the country can be approximated using inferential statistics.
Suppose a coach wants to find out how many average cartwheels sophomores at his college can do
without stopping. A sample of a few students will be asked to perform cartwheels and the average will
be calculated. Inferential statistics will use this data to draw a conclusion about how many cartwheels
sophomores can perform on average.
Inferential Statistics vs Descriptive Statistics
Descriptive and inferential statistics are used to describe data and make generalizations about the population
from samples. The table given below lists the differences between inferential statistics and descriptive
statistics.
Descriptive statistics: measures of central tendency and measures of dispersion are the important tools used.
Inferential statistics: hypothesis testing and regression analysis are the analytical tools used.