
Machine Learning

Machine Learning is getting computers to program themselves. If programming is automation, then machine learning is automating the process of automation.

Definition by Tom Mitchell (1998):

Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.

Writing software is the bottleneck: we don't have enough good developers. Let the data do the
work instead of people. Machine learning is the way to make programming scalable.

Traditional Programming: data and a program are run on the computer to produce the output.
Machine Learning: data and the output are run on the computer to create a program. This program can then be used in traditional programming.

Machine learning is like farming or gardening: the seeds are the algorithms, the nutrients are the data, the gardener is you, and the plants are the programs.
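As a minimal sketch of this contrast (plain Python; the data and names are illustrative), the "learned program" below is created from data and outputs rather than written by hand:

```python
# Traditional programming: the human writes the rule.
def handwritten_program(x):
    return 2 * x

# Machine learning: the rule is created from data and outputs.
data = [1, 2, 3, 4]
output = [2.1, 3.9, 6.0, 8.0]  # behaviour we want to reproduce

# "Learning": estimate the multiplier that best explains the examples.
w = sum(o / d for d, o in zip(data, output)) / len(data)

def learned_program(x):
    return w * x

print(handwritten_program(5), learned_program(5))  # 10 vs roughly 10
```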

Features of Machine Learning:

• Machine learning uses data to detect various patterns in a given dataset.
• It can learn from past data and improve automatically.
• It is a data-driven technology.
• Machine learning is similar to data mining in that it also deals with huge amounts of data.

Applications of Machine Learning
Sample applications of machine learning:

Web search: ranking pages based on what you are most likely to click on.
Computational biology: rational design of drugs in the computer based on past experiments.
Finance: deciding who to send which credit card offers to, evaluating the risk of credit offers, and deciding where to invest money.
E-commerce: predicting customer churn; detecting whether or not a transaction is fraudulent.
Space exploration: space probes and radio astronomy.
Robotics: handling uncertainty in new environments; autonomous, self-driving cars.
Information extraction: asking questions over databases across the web.
Social networks: data on relationships and preferences; machine learning to extract value from that data.
Debugging: use in computer science problems such as debugging, a labour-intensive process where a model could suggest where the bug is likely to be.

Key Elements of Machine Learning


There are tens of thousands of machine learning algorithms and hundreds of new algorithms are
developed every year.

Every machine learning algorithm has three components:

Representation: how knowledge is represented. Examples include decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, model ensembles and others.
Evaluation: the way to evaluate candidate programs (hypotheses). Examples include accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, K-L divergence and others.
Optimization: the way candidate programs are generated, also known as the search process. Examples include combinatorial optimization, convex optimization and constrained optimization.

All machine learning algorithms are combinations of these three components, which gives us a framework for understanding any algorithm.

Every ML algorithm has three components:


– Representation
– Optimization
– Evaluation

Various Function Representations


• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden Markov Models (HMMs)
– Probabilistic Context-Free Grammars (PCFGs)
– Markov networks

Various Search/Optimization Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
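To make the first family concrete, here is a minimal gradient descent sketch (plain Python; the quadratic loss and step size are illustrative choices), repeatedly stepping against the gradient of a loss:

```python
# Minimal gradient descent sketch: minimise f(w) = (w - 3)^2.

def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)  # derivative of the loss

w = 0.0            # initial guess
learning_rate = 0.1

for step in range(100):
    w -= learning_rate * gradient(w)  # move against the gradient

print(w, loss(w))  # w converges towards 3, the loss towards 0
```

Perceptron and backpropagation training are elaborations of this same idea, applied to the weights of a neural network.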

Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.

ML in Practice
• Understand domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge

https://www.cmpe.boun.edu.tr/~ethem/i2ml3e/

A Taxonomy of Machine Learning Models


There is no simple way to classify machine learning algorithms. In this section, we present a
taxonomy of machine learning models adapted from the book Machine Learning by Peter Flach.
While the structure for classifying algorithms is based on the book, the explanation presented
below is created by us.

For a given problem, the collection of all possible outcomes represents the sample space or
instance space.

The basic idea for creating a taxonomy of algorithms is that we divide the instance space in one of three ways:

Using a logical expression.
Using the geometry of the instance space.
Using probability to classify the instance space.

The outcome of the transformation of the instance space by a machine learning algorithm using the
above techniques should be exhaustive (cover all possible outcomes) and mutually exclusive (non-
overlapping).

2. Logical models

2.1 Logical models - Tree models and Rule models


Logical models use a logical expression to divide the instance space into segments and hence
construct grouping models. A logical expression is an expression that returns a Boolean value, i.e.,
a True or False outcome. Once the data is grouped using a logical expression, the data is divided
into homogeneous groupings for the problem we are trying to solve. For example, for a
classification problem, all the instances in the group belong to one class.

There are mainly two kinds of logical models: Tree models and Rule models.

Rule models consist of a collection of implications or IF-THEN rules: the 'if-part' defines a segment of the instance space and the 'then-part' defines the behaviour of the model in that segment. Tree models follow the same reasoning.

Tree models can be seen as a particular type of rule model where the if-parts of the rules are organised in a tree structure. Both Tree models and Rule models use the same approach to supervised learning, which can be summarised in two strategies: we could first find the body of the rule (the concept) that covers a sufficiently homogeneous set of examples and then find a label to represent the body. Alternatively, we could approach it from the other direction, i.e., first select a class we want to learn and then find rules that cover examples of that class.

Consider a simple tree-based model of the survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard). The values under the leaves show the probability of survival and the percentage of observations in the leaf. The model can be summarised as: your chances of survival were good if you were (i) a female or (ii) a male younger than 9.5 years with sibsp less than 2.5.
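As a hedged illustration, a tree in this spirit could be fitted with scikit-learn; the tiny hand-made dataset below is illustrative, not the real Titanic data:

```python
# Illustrative sketch: fitting a small decision tree in the spirit of the
# Titanic example. The toy dataset below is made up for demonstration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [is_female, age, sibsp]; label: survived (1) or not (0).
X = [
    [1, 29, 0], [1, 4, 1], [0, 35, 0], [0, 8, 1],
    [0, 8, 4], [1, 58, 0], [0, 40, 0], [0, 6, 0],
]
y = [1, 1, 0, 1, 0, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned if-then structure of the tree.
print(export_text(tree, feature_names=["is_female", "age", "sibsp"]))
```

The printed rules have exactly the IF-THEN shape discussed above, with each path from root to leaf defining one segment of the instance space.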

2.2 Logical models and Concept learning


To understand logical models further, we need to understand the idea of Concept Learning.
Concept Learning involves learning logical expressions or concepts from examples. The idea of
Concept Learning fits in well with the idea of Machine learning, i.e., inferring a general function
from specific training examples. Concept learning forms the basis of both tree-based and rule-
based models. More formally, Concept Learning involves acquiring the definition of a general
category from a given set of positive and negative training examples of the category. A Formal
Definition for Concept Learning is “The inferring of a Boolean-valued function from training
examples of its input and output.” In concept learning, we only learn a description for the positive
class and label everything that doesn’t satisfy that description as negative.

A Concept Learning task called "EnjoySport" is defined by a set of data from some example days, each described by six attributes. The task is to learn to predict the value of EnjoySport for an arbitrary day based on the values of its attributes. The problem can be represented by a series of hypotheses, each described by a conjunction of constraints on the attributes. The training data represents a set of positive and negative examples of the target function. In this example, each hypothesis is a vector of six constraints, specifying the values of the six attributes – Sky, AirTemp, Humidity, Wind, Water, and Forecast. The training phase involves learning the set of days (as a conjunction of attributes) for which EnjoySport = yes.
Thus, the problem can be formulated as:

Given instances X, representing the set of all possible days, each described by the attributes:
Sky – (values: Sunny, Cloudy, Rainy),
AirTemp – (values: Warm, Cold),
Humidity – (values: Normal, High),
Wind – (values: Strong, Weak),
Water – (values: Warm, Cold),
Forecast – (values: Same, Change).
Try to identify a function that can predict the target variable Enjoy Sport as yes/no, i.e., 1 or 0.
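One classic algorithm for exactly this setting is Find-S (from Tom Mitchell's Machine Learning). Here is a minimal sketch, using the standard EnjoySport training examples; it maintains the most specific conjunctive hypothesis consistent with the positive examples:

```python
# Minimal Find-S sketch for the EnjoySport task.
# A hypothesis is a list of six constraints: a specific value, or "?" (any value).

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "no"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "yes"),
]

hypothesis = None  # start maximally specific: nothing accepted yet

for attributes, label in examples:
    if label != "yes":
        continue  # Find-S ignores negative examples
    if hypothesis is None:
        hypothesis = list(attributes)  # first positive example taken as-is
    else:
        # Generalise: keep matching values, replace mismatches with "?".
        hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]

print(hypothesis)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Everything matching the final conjunction is labelled positive; everything else is labelled negative, exactly as described above.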

2.3 Concept learning as a search problem and as Inductive Learning


We can also formulate Concept Learning as a search problem: searching through a predefined space of potential hypotheses to identify the hypothesis that best fits the training examples. Concept learning is also an example of Inductive Learning. Inductive learning, also known as discovery learning, is a process where the learner discovers rules by observing examples. Inductive learning is different from deductive learning, where students are given rules that they then need to apply. Inductive learning is based on the Inductive Learning Hypothesis, which postulates that any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples. This idea is the fundamental assumption of inductive learning.

3. Geometric models
In the previous section, we have seen that with logical models, such as decision trees, a logical
expression is used to partition the instance space. Two instances are similar when they end up in
the same logical segment. In this section, we consider models that define similarity by considering
the geometry of the instance space. In Geometric models, features could be described as points in
two dimensions (x- and y-axis) or a three-dimensional space (x, y, and z). Even when features are
not intrinsically geometric, they could be modelled in a geometric manner (for example,
temperature as a function of time can be modelled in two axes). In geometric models, there are two
ways we could impose similarity.

We could use geometric concepts like lines or planes to segment (classify) the instance space.
These are called Linear models.
Alternatively, we can use the geometric notion of distance to represent similarity: if two points are close together, they have similar feature values and can thus be classed as similar. We call such models Distance-based models.
3.1 Linear models
Linear models are relatively simple. In this case, the function is represented as a linear combination of its inputs. Thus, if x1 and x2 are two scalars or vectors of the same dimension and a and b are arbitrary scalars, then ax1 + bx2 represents a linear combination of x1 and x2. In the simplest case, where f(x) represents a straight line, we have an equation of the form f(x) = mx + c, where c represents the intercept and m represents the slope.

Linear models are parametric, which means that they have a fixed form with a small number of
numeric parameters that need to be learned from data. For example, in f(x) = mx + c, m and c are
the parameters that we are trying to learn from the data. This technique is different from tree or rule
models, where the structure of the model (e.g., which features to use in the tree, and where) is not
fixed in advance.
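As a minimal sketch of learning these two parameters, assuming some illustrative noisy data, a least-squares fit with numpy recovers m and c:

```python
# Sketch: estimating the parameters m and c of f(x) = m*x + c from data.
# The data below is illustrative noise around the line y = 0.5x + 2.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.4, 3.1, 3.4, 4.1, 4.4])

# np.polyfit with degree 1 performs a least-squares straight-line fit.
m, c = np.polyfit(x, y, 1)
print(m, c)  # roughly 0.48 and 2.04

f = lambda x_new: m * x_new + c
print(f(6.0))  # prediction from the learned linear model
```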

Linear models are stable, i.e., small variations in the training data have only a limited impact on the
learned model. In contrast, tree models tend to vary more with the training data, as the choice of a
different split at the root of the tree typically means that the rest of the tree is different as well. As a
result of having relatively few parameters, Linear models have low variance and high bias. This
implies that Linear models are less likely to overfit the training data than some other models.
However, they are more likely to underfit. For example, if we want to learn the boundaries between
countries based on labelled data, then linear models are not likely to give a good approximation.
This family also includes algorithms based on kernel methods, such as the support vector machine (SVM). Kernel methods use a kernel function to transform the data into another dimension where it can be separated more easily, for example by a hyperplane in the case of an SVM.

3.2 Distance-based models


Distance-based models are the second class of Geometric models. Like Linear models, distance-based models are based on the geometry of data. As the name implies, distance-based models work on the concept of distance. In the context of Machine learning, distance is not merely the physical distance between two points; we can also think of the distance between two points in terms of the mode of transport between them. Travelling between two cities by plane covers less physical distance than by train, because the plane is unrestricted. Similarly, in chess, the concept of distance depends on the piece used – for example, a Bishop can move diagonally. Thus, depending on the entity and the mode of travel, the concept of distance can be experienced differently. The distance metrics commonly used are Euclidean, Minkowski, Manhattan, and Mahalanobis.
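A short sketch computing these four metrics with numpy (the points and the toy dataset used for the covariance are illustrative):

```python
# Sketch: the four distance metrics mentioned above, computed with numpy.
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # straight-line distance: 5.0
manhattan = np.sum(np.abs(a - b))          # "city block" distance: 7.0

# Minkowski distance of order p generalises both (p=2: Euclidean, p=1: Manhattan).
p = 3
minkowski = np.sum(np.abs(a - b) ** p) ** (1.0 / p)

# Mahalanobis distance weights the difference by the covariance of the data.
data = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 6.5]])
cov_inv = np.linalg.inv(np.cov(data.T))
diff = a - b
mahalanobis = np.sqrt(diff @ cov_inv @ diff)

print(euclidean, manhattan, minkowski, mahalanobis)
```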

Distance is applied through the concept of neighbours and exemplars. Neighbours are points in
proximity with respect to the distance measure expressed through exemplars. Exemplars are either
centroids that find a centre of mass according to a chosen distance metric or medoids that find the
most centrally located data point. The most commonly used centroid is the arithmetic mean, which
minimises squared Euclidean distance to all other points.

Notes:

The centroid represents the geometric centre of a plane figure, i.e., the arithmetic mean position of all the points in the figure. This definition extends to any object in n-dimensional space: its centroid is the mean position of all the points.
Medoids are similar in concept to means or centroids. Medoids are most commonly used on data
when a mean or centroid cannot be defined. They are used in contexts where the centroid is not
representative of the dataset, such as in image data.
Examples of distance-based models include the nearest-neighbour models, which use the training
data as exemplars – for example, in classification. The K-means clustering algorithm also uses
exemplars to create clusters of similar data points.
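As a sketch of exemplar-based learning, here is a minimal K-means loop (numpy, naive initialisation, toy 2-D points): each centroid is the arithmetic mean of the points assigned to it, exactly the exemplar described above:

```python
# Minimal K-means sketch with centroid exemplars. The points are toy data.
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0],
                   [9.0, 9.5], [1.0, 0.5], [8.5, 9.0]])
k = 2
centroids = points[:k].copy()  # naive initialisation, fine for a sketch

for _ in range(10):
    # Assign each point to its nearest centroid (squared Euclidean distance).
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # Move each centroid to the arithmetic mean of its assigned points.
    centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])

print(labels)     # two clusters, e.g. [0 0 1 1 0 1]
print(centroids)  # the centre of mass of each cluster
```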

4. Probabilistic models
The third family of machine learning algorithms is the probabilistic models. We have seen before
that the k-nearest neighbour algorithm uses the idea of distance (e.g., Euclidean distance) to
classify entities, and logical models use a logical expression to partition the instance space. In this
section, we see how the probabilistic models use the idea of probability to classify new entities.

Probabilistic models see features and target variables as random variables. The process of
modelling represents and manipulates the level of uncertainty with respect to these variables.
There are two types of probabilistic models: Predictive and Generative. Predictive probability
models use the idea of a conditional probability distribution P (Y |X) from which Y can be predicted
from X. Generative models estimate the joint distribution P (Y, X). Once we know the joint
distribution for the generative models, we can derive any conditional or marginal distribution
involving the same variables. Thus, the generative model is capable of creating new data points
and their labels, knowing the joint probability distribution. The joint distribution looks for a
relationship between two variables. Once this relationship is inferred, it is possible to infer new data
points.

Naïve Bayes is an example of a probabilistic classifier.

The goal of any probabilistic classifier is, given a set of features (x_0 through x_n) and a set of classes (c_0 through c_k), to determine the probability of the features occurring in each class and to return the most likely class. Therefore, for each class, we need to calculate P(c_i | x_0, …, x_n).

We can do this using Bayes' rule, defined as

P(c_i | x_0, …, x_n) = P(x_0, …, x_n | c_i) · P(c_i) / P(x_0, …, x_n)

The Naïve Bayes algorithm is based on the idea of Conditional Probability. Conditional probability
is based on finding the probability that something will happen, given that something else has
already happened. The task of the algorithm then is to look at the evidence and to determine the
likelihood of a specific class and assign a label accordingly to each entity.
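A minimal Naïve Bayes sketch for categorical features, assuming the usual naïve independence assumption P(x_0, …, x_n | c) = P(x_0 | c) · … · P(x_n | c) and add-one smoothing; the toy weather-style data is illustrative:

```python
# Minimal Naïve Bayes sketch for categorical features.
from collections import Counter, defaultdict

# Toy training data (illustrative): (Outlook, Wind) -> Play
train = [
    (("Sunny", "Weak"), "yes"), (("Sunny", "Strong"), "no"),
    (("Rainy", "Weak"), "yes"), (("Rainy", "Strong"), "no"),
    (("Sunny", "Weak"), "yes"), (("Rainy", "Weak"), "yes"),
]

class_counts = Counter(label for _, label in train)
value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
vocab = defaultdict(set)             # feature index -> distinct values seen
for features, label in train:
    for j, value in enumerate(features):
        value_counts[(j, label)][value] += 1
        vocab[j].add(value)

def predict(features):
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(train)  # prior P(c)
        for j, value in enumerate(features):
            # Likelihood P(x_j | c) with add-one (Laplace) smoothing.
            score *= (value_counts[(j, c)][value] + 1) / (n_c + len(vocab[j]))
        scores[c] = score  # proportional to the posterior P(c | x)
    return max(scores, key=scores.get)

print(predict(("Sunny", "Weak")))  # "yes" is the most likely class here
```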

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

Supervised learning
Unsupervised learning
Reinforcement learning

1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.

The system creates a model using labeled data to understand the datasets and learn about each data point; once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns things under the supervision of a teacher. An example of supervised learning is spam filtering.
Supervised learning can be grouped further in two categories of algorithms:

Classification
Regression
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any supervision.

The machine is trained with a set of data that has not been labeled, classified, or categorized, and the algorithm must act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result; the machine tries to find useful insights from a huge amount of data. It can be further classified into two categories of algorithms:

Clustering
Association
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the most reward points, and in doing so it improves its performance.

A robotic dog that automatically learns the movement of its limbs is an example of Reinforcement learning.
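As a hedged illustration of this reward-and-penalty loop, here is a minimal Q-learning sketch on an invented one-dimensional corridor environment; the environment, states, and parameters are all illustrative choices:

```python
# Minimal Q-learning sketch on a toy 1-D corridor (illustrative environment).
# States 0..4; the agent receives a reward of +1 for reaching state 4, else 0.
import random

n_states, actions = 5, [-1, +1]        # move left or move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: reward plus discounted best future value.
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned policy should be "always move right" (+1 in every state).
print([max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)])
```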

Supervised Machine Learning


Supervised learning is the type of machine learning in which machines are trained using well-"labelled" training data, and on the basis of that data, machines predict the output. Labelled data means input data that is already tagged with the correct output.

In supervised learning, the training data provided to the machines works as a supervisor that teaches the machines to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function to map the input variable (x) to the output variable (y).
How Does Supervised Learning Work?
In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data. Once the training process is completed, the model is tested on test data (a held-out subset of the dataset, kept separate from the training data), and then it predicts the output.

The working of supervised learning can be easily understood by the following example:
Suppose we have a dataset of different types of shapes, including squares, rectangles, triangles, and polygons. The first step is to train the model on each shape:

If the given shape has four sides, and all the sides are equal, it will be labelled as a square.
If the given shape has three sides, it will be labelled as a triangle.
If the given shape has six equal sides, it will be labelled as a hexagon.

Now, after training, we test our model using the test set, and the task of the model is to identify the shape. The machine is already trained on all types of shapes, and when it encounters a new shape, it classifies the shape on the basis of its number of sides and predicts the output.
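A hedged sketch of this example with scikit-learn; the features (number of sides, whether all sides are equal) and the tiny dataset are illustrative choices:

```python
# Sketch of the shape example: a decision tree learns to classify shapes.
from sklearn.tree import DecisionTreeClassifier

# Features: [number of sides, all sides equal (1/0)]; labels are shape names.
X = [[4, 1], [4, 0], [3, 1], [3, 0], [6, 1]]
y = ["square", "rectangle", "triangle", "triangle", "hexagon"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# A new, unseen shape with four equal sides should be classified as a square.
print(model.predict([[4, 1]]))  # ['square']
```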

Steps Involved in Supervised Learning:


First, determine the type of training dataset.
Collect/gather the labelled training data.
Split the dataset into a training set, a test set, and a validation set.
Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
Determine a suitable algorithm for the model, such as a support vector machine or a decision tree.
Execute the algorithm on the training dataset. Sometimes we need a validation set (a subset of the training data) to tune control parameters.
Evaluate the accuracy of the model using the test set: if the model predicts the correct outputs, the model is accurate. These steps are sketched in code below.
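A minimal sketch of these steps with scikit-learn, using the built-in Iris dataset as a stand-in for collected labelled data (the validation split is omitted for brevity):

```python
# Sketch of the supervised learning workflow described above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Steps 1-2: obtain labelled data.
X, y = load_iris(return_X_y=True)

# Step 3: split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Steps 5-6: choose an algorithm and run it on the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 7: evaluate accuracy on the held-out test set.
print(accuracy_score(y_test, model.predict(X_test)))
```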

Types of supervised Machine learning Algorithms:

1. Regression

Regression algorithms are used when there is a relationship between the input variable and the output variable. They are used for the prediction of continuous variables, such as weather forecasting, market trends, etc. Below are some popular regression algorithms that come under supervised learning:

Linear Regression
Regression Trees
Non-Linear Regression
Bayesian Linear Regression
Polynomial Regression
2. Classification

Classification algorithms are used when the output variable is categorical, i.e., it takes one of a set of discrete classes such as Yes-No, Male-Female, True-False, etc. A typical example is spam filtering. Popular classification algorithms include:
Random Forest
Decision Trees
Logistic Regression
Support Vector Machines

Advantages of Supervised learning:


With the help of supervised learning, the model can predict the output on the basis of prior experience.
In supervised learning, we can have an exact idea about the classes of objects.
Supervised learning models help us solve various real-world problems, such as fraud detection and spam filtering.

Disadvantages of supervised learning:
Supervised learning models are not suitable for handling complex tasks.
Supervised learning cannot predict the correct output if the test data is different from the training dataset.
Training requires a lot of computation time.
In supervised learning, we need sufficient knowledge about the classes of objects.
Unsupervised Machine Learning
In the previous topic, we learned about supervised machine learning, in which models are trained using labeled data under supervision. But there may be many cases in which we do not have labeled data and need to find the hidden patterns in a given dataset. To solve such cases, we need unsupervised learning techniques.

What is Unsupervised Learning?


As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a labelled training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things. It can be defined as:

Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.
Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Example: suppose an unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm has never been trained on this dataset, which means it has no prior idea about its features. The task of the unsupervised learning algorithm is to identify the image features on its own. It will perform this task by clustering the image dataset into groups according to the similarities between images.
Why use Unsupervised Learning?
Below are some main reasons which describe the importance of Unsupervised Learning:

Unsupervised learning is helpful for finding useful insights in data.
Unsupervised learning is similar to how a human learns to think through their own experiences, which makes it closer to real AI.
Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
In the real world, we do not always have input data with corresponding outputs; to solve such cases, we need unsupervised learning.
Working of Unsupervised Learning
The working of unsupervised learning can be understood as follows: we take unlabeled input data, meaning it is not categorized and no corresponding outputs are given. This unlabeled input data is fed to the machine learning model in order to train it. The model first interprets the raw data to find hidden patterns and then applies a suitable algorithm, such as k-means clustering.

Once a suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.

Types of Unsupervised Learning Algorithm:


The unsupervised learning algorithm can be further categorized into two types of problems:

Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between data objects and categorizes them according to the presence and absence of those commonalities.
Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategy more effective: for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of an association rule is Market Basket Analysis, sketched below.
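A minimal market-basket sketch in plain Python, computing the support and confidence of the illustrative rule bread → butter on toy transactions:

```python
# Minimal market-basket sketch: support and confidence of "bread -> butter".
# The transactions below are toy data for illustration.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

n = len(transactions)
bread = sum(1 for t in transactions if "bread" in t)
both = sum(1 for t in transactions if {"bread", "butter"} <= t)

support = both / n         # how often bread and butter occur together
confidence = both / bread  # an estimate of P(butter | bread)

print(f"support={support:.2f}, confidence={confidence:.2f}")
# support=0.60, confidence=0.75
```

Algorithms such as Apriori (listed below) search for all rules whose support and confidence exceed chosen thresholds.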
Unsupervised Learning algorithms:
Below is the list of some popular unsupervised learning algorithms:

K-means clustering
Hierarchical clustering
Anomaly detection
Neural networks
Principal Component Analysis
Independent Component Analysis
Apriori algorithm
Singular value decomposition

Advantages of Unsupervised Learning


Unsupervised learning can be used for more complex tasks than supervised learning because, in unsupervised learning, we don't need labeled input data.
Unsupervised learning is often preferable because it is easier to obtain unlabeled data than labeled data.
Disadvantages of Unsupervised Learning
Unsupervised learning is intrinsically more difficult than supervised learning because it has no corresponding outputs to learn from.
The result of an unsupervised learning algorithm might be less accurate, as the input data is not labeled and the algorithm does not know the exact output in advance.

https://www.javatpoint.com/reinforcement-learning
https://bloomberg.github.io/foml/#lectures
