Machine Learning Lecture

This document provides an overview of machine learning, defining it as the ability of computers to learn from data without explicit programming. It categorizes machine learning into three types: supervised learning, unsupervised learning, and reinforcement learning, detailing their techniques, applications, and challenges. Additionally, it discusses various algorithms used in supervised and unsupervised learning, highlighting their practical applications in fields such as predictive analytics, customer segmentation, and robotics.

Lecture Note #1

GENERAL CONCEPTS OF MACHINE LEARNING


BS Information Technology III
Dr. Aris J. Ordoñez

1. What is Machine Learning?


In 1959, Arthur Samuel, a computer scientist who pioneered the study of artificial
intelligence, described machine learning as “the study that gives computers the ability to
learn without being explicitly programmed.”
Alan Turing’s seminal paper (Turing, 1950) introduced a benchmark standard for
demonstrating machine intelligence, such that a machine has to be intelligent and
responsive in a manner that cannot be differentiated from that of a human being.
Machine Learning is an application of artificial intelligence in which a computer/machine
learns from past experience (input data) and uses it to make predictions about new data. The
performance of such a system should be at least human level.
A more technical definition is given by Tom M. Mitchell (1997): “A computer program is
said to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves with experience
E.”
2. Machine Learning Categories

Machine Learning is generally categorized into three types: Supervised Learning,
Unsupervised Learning, and Reinforcement Learning.

a. Supervised Learning:
• Supervised learning, also known as supervised machine learning, is a
subcategory of machine learning and artificial intelligence. It is defined by
its use of labeled datasets to train algorithms to classify data or predict
outcomes accurately. As input data is fed into the model, it adjusts its
weights until the model has been fitted appropriately, which occurs as part
of the cross-validation process.
• In supervised learning the machine experiences the examples along with
the labels or targets for each example. The labels in the data help the
algorithm to correlate the features.
• Two of the most common supervised machine learning tasks are
classification and regression.
In classification problems the machine must learn to predict discrete
values. That is, the machine must predict the most probable category, class,
or label for new examples. Applications of classification include predicting
whether a stock's price will rise or fall, or deciding if a news article belongs to
the politics or leisure section.
In regression problems the machine must predict the value of a
continuous response variable. Examples of regression problems include
predicting the sales for a new product, or the salary for a job based on its
description.

b. Unsupervised Learning:

• Unsupervised learning, also known as unsupervised machine learning, uses
machine learning algorithms to analyze and cluster unlabeled datasets.
These algorithms discover hidden patterns or data groupings without the
need for human intervention. Its ability to discover similarities and
differences in information make it the ideal solution for exploratory data
analysis, cross-selling strategies, customer segmentation, and image
recognition.
• Unsupervised machine learning algorithms infer patterns from a dataset
without reference to known, or labeled, outcomes. Unlike supervised
machine learning, unsupervised machine learning methods cannot be
directly applied to a regression or a classification problem because you
have no idea what the values for the output data might be, making it
impossible for you to train the algorithm the way you normally would.
Unsupervised learning can instead be used to discover the underlying
structure of the data.
• Unsupervised machine learning purports to uncover previously unknown
patterns in data, but most of the time these patterns are poor
approximations of what supervised machine learning can achieve.
Additionally, since you do not know what the outcomes should be, there
is no way to determine how accurate they are, making supervised machine
learning more applicable to real-world problems.
• The best time to use unsupervised machine learning is when you do not
have data on desired outcomes, such as determining a target market for
an entirely new product that your business has never sold before.
However, if you are trying to get a better understanding of your existing
consumer base, supervised learning is the optimal technique.
• When we have unclassified and unlabeled data, the system attempts to
uncover patterns from the data. There is no label or target given for the
examples. One common task, called clustering, is to group similar examples
together.

c. Reinforcement Learning:

• Reinforcement machine learning is a behavioral machine learning model
that is similar to supervised learning, but the algorithm isn’t trained using
sample data. This model learns as it goes by using trial and error. A
sequence of successful outcomes will be reinforced to develop the best
recommendation or policy for a given problem.
• Reinforcement learning refers to goal-oriented algorithms, which learn
how to attain a complex objective (goal) or maximize along a particular
dimension over many steps. This method allows machines and software
agents to automatically determine the ideal behavior within a specific
context in order to maximize its performance. Simple reward feedback is
required for the agent to learn which action is best; this is known as the
reinforcement signal. For example, maximize the points won in a game
over many moves.
• Reinforcement learning is an area of Machine Learning which is about
taking suitable action to maximize reward in a particular situation. It is
employed by various software and machines to find the best possible
behavior or path it should take in a specific situation. Reinforcement
learning differs from supervised learning in a way that in supervised
learning the training data has the answer key with it so the model is trained
with the correct answer itself whereas in reinforcement learning, there is
no answer but the reinforcement agent decides what to do to perform the
given task. In the absence of a training dataset, it is bound to learn from its
experience.

3. Techniques of Supervised Machine Learning:

a. Regression is a technique used to predict the value of a response (dependent)
variable from one or more predictor (independent) variables.

b. The most commonly used regression techniques are Linear Regression and Logistic
Regression. (Covered later)

• In linear regression problems, the goal is to predict a real-valued variable y
from a given pattern X.
• In some problems the response variable is not normally distributed. In
logistic regression, the response variable describes the probability that the
outcome is the positive case. If the response variable is equal to or exceeds
a discrimination threshold, the positive class is predicted; otherwise, the
negative class is predicted.
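
The thresholding step described above can be sketched in a few lines of Python. This is only an illustration: the `weight`, `bias`, and `threshold` values are assumptions chosen for the example, not values from the lecture, and a real model would learn the weight and bias from labeled data.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, weight, bias, threshold=0.5):
    """Return (probability of the positive case, predicted label) for one feature x."""
    probability = sigmoid(weight * x + bias)
    # Positive class when the probability is equal to or exceeds the threshold.
    label = 1 if probability >= threshold else 0
    return probability, label

# Hypothetical parameters, used only for illustration.
probability, label = predict_class(x=2.0, weight=1.5, bias=-2.0)
print(round(probability, 3), label)
```

Raising the threshold makes the model more reluctant to predict the positive class; lowering it does the opposite.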

4. Implementations of Supervised Learning

a. Image- and object-recognition: Supervised learning algorithms can be used to
locate, isolate, and categorize objects out of videos or images, making them useful
when applied to various computer vision techniques and imagery analysis.
b. Predictive analytics: A widespread use case for supervised learning models is in
creating predictive analytics systems to provide deep insights into various
business data points. This allows enterprises to anticipate certain results based on
a given output variable, helping business leaders justify decisions or pivot for the
benefit of the organization.
c. Customer sentiment analysis: Using supervised machine learning algorithms,
organizations can extract and classify important pieces of information from large
volumes of data—including context, emotion, and intent—with very little human
intervention. This can be incredibly useful when gaining a better understanding of
customer interactions and can be used to improve brand engagement efforts.
d. Spam detection: Spam detection is another example of a supervised learning
model. Using supervised classification algorithms, organizations can train
models to recognize patterns or anomalies in new data and separate spam from
non-spam correspondence effectively.
5. Challenges of Supervised Learning

a. Supervised learning models can require certain levels of expertise to structure
accurately.
b. Training supervised learning models can be very time intensive.
c. Datasets can have a higher likelihood of human error, resulting in algorithms
learning incorrectly.
d. Unlike unsupervised learning models, supervised learning cannot cluster or
classify data on its own.

6. Unsupervised Learning Approaches

a. Clustering is a data mining technique which groups unlabeled data based on their
similarities or differences. Clustering algorithms are used to process raw,
unclassified data objects into groups represented by structures or patterns in the
information. Clustering algorithms can be categorized into a few types, specifically
exclusive, overlapping, hierarchical, and probabilistic.

b. An association rule is a rule-based method for finding relationships between
variables in a given dataset. These methods are frequently used for market basket
analysis, allowing companies to better understand relationships between
different products.

7. Applications of Unsupervised Machine Learning

a. News Sections: Google News uses unsupervised learning to categorize articles on
the same story from various online news outlets. For example, the results of a
presidential election could be categorized under their label for “US” news.
b. Computer vision: Unsupervised learning algorithms are used for visual perception
tasks, such as object recognition.
c. Medical imaging: Unsupervised machine learning provides essential features to
medical imaging devices, such as image detection, classification and
segmentation, used in radiology and pathology to diagnose patients quickly and
accurately.
d. Anomaly detection: Unsupervised learning models can comb through large
amounts of data and discover atypical data points within a dataset. These
anomalies can raise awareness around faulty equipment, human error, or
breaches in security.
e. Customer personas: Defining customer personas makes it easier to understand
common traits and business clients' purchasing habits. Unsupervised learning
allows businesses to build better buyer persona profiles, enabling organizations
to align their product messaging more appropriately.
f. Recommendation Engines: Using past purchase behavior data, unsupervised
learning can help to discover data trends that can be used to develop more
effective cross-selling strategies. This is used to make relevant add-on
recommendations to customers during the checkout process for online retailers.
8. Challenges of Unsupervised Learning

a. Computational complexity due to a high volume of training data


b. Longer training times
c. Higher risk of inaccurate results
d. Human intervention to validate output variables
e. Lack of transparency into the basis on which data was clustered

9. Types of Reinforcement Machine Learning


a. Positive Reinforcement occurs when an event that follows a particular
behavior increases the strength and the frequency of that behavior. In other
words, it has a positive effect on behavior.
• Maximizes Performance
• Sustain Change for a long period of time
• Too much Reinforcement can lead to an overload of states which can
diminish the results
b. Negative Reinforcement is defined as strengthening of behavior because a
negative condition is stopped or avoided.
• Increases Behavior
• Helps enforce a minimum standard of performance
• Provides only enough to meet the minimum behavior

10. Various Practical applications of Reinforcement Learning


a. RL can be used in robotics for industrial automation.
b. RL can be used in machine learning and data processing
c. RL can be used to create training systems that provide custom instruction and
materials according to the requirement of students.

11. RL can be used in large environments in the following situations:


a. A model of the environment is known, but an analytic solution is not available;
b. Only a simulation model of the environment is given (the subject of simulation-
based optimization)
c. The only way to collect information about the environment is to interact with it.

12. CONCLUSION

• Machine Learning is the machine’s ability to learn without being explicitly
programmed.
• Supervised Learning, Unsupervised Learning, and Reinforcement Learning are the
three types of Machine Learning.
• Linear and Logistic Regressions are the most common techniques for SL
• Clustering and Association are the most common approaches of UL
• Positive Reinforcement and Negative Reinforcement are the two types of learning
under RL.
Examples of Supervised Learning Algorithms
Various algorithms and computation techniques are used in supervised machine learning
processes. Below are brief explanations of some of the most commonly used learning methods,
typically calculated through use of programs like R or Python:
Neural networks
Primarily leveraged for deep learning algorithms, neural networks process training data by
mimicking the interconnectivity of the human brain through layers of nodes. Each node is made
up of inputs, weights, a bias (or threshold), and an output. If that output value exceeds a given
threshold, it “fires” or activates the node, passing data to the next layer in the network. Neural
networks learn this mapping function through supervised learning, adjusting based on the loss
function through the process of gradient descent. When the cost function is at or near zero, we
can be confident in the model’s accuracy to yield the correct answer.
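
As a rough illustration of the node described above, the sketch below computes a weighted sum of the inputs plus a bias and checks it against a threshold. The input values, weights, bias, and threshold are made up for the example. Training, that is adjusting the weights by gradient descent, is not shown.

```python
def node_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias term."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def fires(inputs, weights, bias, threshold=0.0):
    """The node activates when its output exceeds the threshold."""
    return node_output(inputs, weights, bias) > threshold

# Hypothetical values, used only for illustration.
inputs = [1.0, 0.0, 1.0]
weights = [0.5, -0.6, 0.2]
bias = -0.4

print(node_output(inputs, weights, bias), fires(inputs, weights, bias))
```

In a full network, the output of each node that fires becomes an input to the nodes in the next layer.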

Naive Bayes
Naive Bayes is a classification approach that adopts the principle of class conditional independence
from Bayes' theorem. This means that the presence of one feature does not impact the
presence of another in the probability of a given outcome, and each predictor has an equal effect
on that result. There are three types of Naïve Bayes classifiers: Multinomial Naïve Bayes, Bernoulli
Naïve Bayes, and Gaussian Naïve Bayes. This technique is primarily used in text classification,
spam identification, and recommendation systems.

Linear regression
Linear regression is used to identify the relationship between a dependent variable and one or
more independent variables and is typically leveraged to make predictions about future
outcomes. When there is only one independent variable and one dependent variable, it is known
as simple linear regression. As the number of independent variables increases, it is referred to as
multiple linear regression. For each type of linear regression, it seeks to plot a line of best fit,
which is calculated through the method of least squares. However, unlike other regression
models, this line is straight when plotted on a graph.
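
For simple linear regression, the least-squares line of best fit has a closed-form solution. The sketch below applies it to a small made-up dataset; the function name `fit_line` and the data are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

A prediction for a new x is then `slope * x + intercept`.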

Logistic regression
While linear regression is leveraged when dependent variables are continuous, logistic
regression is selected when the dependent variable is categorical, meaning it has binary
outputs, such as "true" and "false" or "yes" and "no." While both regression models seek to
understand relationships between data inputs, logistic regression is mainly used to solve binary
classification problems, such as spam identification.

Support vector machine (SVM)


A support vector machine is a popular supervised learning model developed by Vladimir Vapnik,
used for both data classification and regression. That said, it is typically leveraged for
classification problems, constructing a hyperplane where the distance between two classes of
data points is at its maximum. This hyperplane is known as the decision boundary, separating the
classes of data points (e.g., oranges vs. apples) on either side of the plane.

K-nearest neighbor
K-nearest neighbor, also known as the KNN algorithm, is a non-parametric algorithm that
classifies data points based on their proximity and association to other available data. This
algorithm assumes that similar data points can be found near each other. As a result, it seeks to
calculate the distance between data points, usually through Euclidean distance, and then it
assigns a category based on the most frequent category or average.

Its ease of use and low calculation time make it a preferred algorithm by data scientists, but as
the dataset grows, the processing time lengthens, making it less appealing for classification
tasks. KNN is typically used for recommendation engines and image recognition.
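
A minimal sketch of KNN classification, assuming Euclidean distance and a majority vote among the k nearest training points. The training points, labels, and the choice k=3 are invented for the example.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Label the query point with the most frequent label among its k nearest neighbors."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-class training data.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(train_points, train_labels, query=(2, 2)))
print(knn_classify(train_points, train_labels, query=(6, 5)))
```

Every prediction scans all of the training points, which is why the processing time grows with the size of the dataset.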

Random forest
Random forest is another flexible supervised machine learning algorithm used for both
classification and regression purposes. The "forest" references a collection of uncorrelated
decision trees, which are then merged together to reduce variance and create more accurate
predictions.

Common Unsupervised Learning Approaches


Unsupervised learning models are utilized for three main tasks—clustering, association, and
dimensionality reduction. Below we’ll define each learning method and highlight common
algorithms and approaches to conduct them effectively.

Clustering

Clustering is a data mining technique which groups unlabeled data based on their similarities or
differences. Clustering algorithms are used to process raw, unclassified data objects into groups
represented by structures or patterns in the information. Clustering algorithms can be
categorized into a few types, specifically exclusive, overlapping, hierarchical, and probabilistic.

Exclusive and Overlapping Clustering

Exclusive clustering is a form of grouping that stipulates a data point can exist only in one
cluster. This can also be referred to as “hard” clustering. The K-means clustering algorithm is an
example of exclusive clustering.

• K-means clustering is a common example of an exclusive clustering method where
data points are assigned into K groups, where K represents the number of clusters,
based on the distance from each group’s centroid. The data points closest to a given
centroid will be clustered under the same category. A larger K value will be indicative
of smaller groupings with more granularity whereas a smaller K value will have larger
groupings and less granularity. K-means clustering is commonly used in market
segmentation, document clustering, image segmentation, and image compression.
Overlapping clustering differs from exclusive clustering in that it allows data points to belong to
multiple clusters with separate degrees of membership. “Soft” or fuzzy k-means clustering is an
example of overlapping clustering.
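
The exclusive (hard) k-means procedure described above can be sketched as follows. This is a simplified version under stated assumptions: the initial centroids are simply the first K points, and the loop runs for a fixed number of iterations. Practical implementations usually use random or smarter initialization and stop when the assignments no longer change.

```python
import math

def k_means(points, k, iterations=10):
    """Return (centroids, assignments) after a fixed number of k-means iterations."""
    centroids = [points[i] for i in range(k)]  # naive initialization (assumption)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    assignments = [
        min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
    ]
    return centroids, assignments

# Made-up points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

Each point ends up in exactly one cluster, which is what makes this "hard" clustering.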

Hierarchical clustering

Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is an unsupervised
clustering algorithm that can be categorized in two ways: agglomerative or divisive.
Agglomerative clustering is considered a “bottom-up” approach. Its data points are isolated as
separate groupings initially, and then they are merged together iteratively on the basis of
similarity until one cluster has been achieved. Four different methods are commonly used to
measure similarity:

1. Ward’s linkage: This method states that the distance between two clusters is defined
by the increase in the sum of squares after the clusters are merged.
2. Average linkage: This method is defined by the mean distance between pairs of points,
one taken from each cluster.
3. Complete (or maximum) linkage: This method is defined by the maximum distance
between pairs of points, one taken from each cluster.
4. Single (or minimum) linkage: This method is defined by the minimum distance
between pairs of points, one taken from each cluster.

Euclidean distance is the most common metric used to calculate these distances; however,
other metrics, such as Manhattan distance, are also cited in clustering literature.
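
To make the single, complete, and average linkage definitions concrete, the sketch below computes each one for two small made-up clusters using Euclidean distance. Ward's linkage is left out to keep the example short.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Distances between every point in cluster_a and every point in cluster_b."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical clusters.
cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (4, 0)]

print(single_linkage(cluster_a, cluster_b))
print(complete_linkage(cluster_a, cluster_b))
print(average_linkage(cluster_a, cluster_b))
```

At each step, agglomerative clustering merges the two clusters with the smallest distance under the chosen linkage.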

Divisive clustering can be defined as the opposite of agglomerative clustering; instead it takes a
“top-down” approach. In this case, a single data cluster is divided based on the differences
between data points. Divisive clustering is not commonly used, but it is still worth noting in the
context of hierarchical clustering. These clustering processes are usually visualized using a
dendrogram, a tree-like diagram that documents the merging or splitting of data points at each
iteration.

Probabilistic clustering

A probabilistic model is an unsupervised technique that helps us solve density estimation or
“soft” clustering problems. In probabilistic clustering, data points are clustered based on the
likelihood that they belong to a particular distribution. The Gaussian Mixture Model (GMM) is
one of the most commonly used probabilistic clustering methods.

• Gaussian Mixture Models are classified as mixture models, which means that they are
made up of an unspecified number of probability distribution functions. GMMs are
primarily leveraged to determine which Gaussian, or normal, probability distribution a
given data point belongs to. If the mean or variance are known, then we can
determine which distribution a given data point belongs to. However, in GMMs, these
variables are not known, so we assume that a latent, or hidden, variable exists to
cluster data points appropriately. While it is not required to use the Expectation-
Maximization (EM) algorithm, it is commonly used to estimate the assignment
probabilities for a given data point to a particular data cluster.
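
The assignment probabilities mentioned above can be illustrated for a one-dimensional mixture. The sketch assumes the mixture weights, means, and variances are already given (made-up values here) and only computes the probability that a point belongs to each Gaussian, which corresponds to the expectation step of EM. Estimating the parameters themselves is not shown.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )

def responsibilities(x, weights, means, variances):
    """Probability that x was generated by each component of the mixture."""
    weighted = [
        w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)
    ]
    total = sum(weighted)
    return [value / total for value in weighted]

# Hypothetical two-component mixture.
weights = [0.5, 0.5]
means = [0.0, 5.0]
variances = [1.0, 1.0]

print(responsibilities(0.0, weights, means, variances))
print(responsibilities(2.5, weights, means, variances))
```

Because each point receives a probability for every component instead of a single label, this is a form of soft clustering.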

Association Rules

An association rule is a rule-based method for finding relationships between variables in a given
dataset. These methods are frequently used for market basket analysis, allowing companies to
better understand relationships between different products. Understanding consumption
habits of customers enables businesses to develop better cross-selling strategies and
recommendation engines. Examples of this can be seen in Amazon’s “Customers Who Bought
This Item Also Bought” or Spotify’s "Discover Weekly" playlist. While there are a few different
algorithms used to generate association rules, such as Apriori, Eclat, and FP-Growth, the Apriori
algorithm is most widely used.

Apriori algorithms

Apriori algorithms have been popularized through market basket analyses, leading to different
recommendation engines for music platforms and online retailers. They are used within
transactional datasets to identify frequent itemsets, or collections of items, to identify the
likelihood of consuming a product given the consumption of another product. For example, if I
play Black Sabbath’s radio on Spotify, starting with their song “Orchid”, one of the other songs
on this channel will likely be a Led Zeppelin song, such as “Over the Hills and Far Away.” This is
based on my prior listening habits as well as the ones of others. Apriori algorithms use a hash
tree to count itemsets, navigating through the dataset in a breadth-first manner.
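
The level-wise (breadth-first) search for frequent itemsets can be sketched as below. To stay short, this version counts itemsets with a plain scan over the transactions and does not use a hash tree. The transactions and the `min_support` value are invented for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset to its support (fraction of transactions containing it)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # candidates of size 1
    frequent = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(survivors)
        # Next level: candidates one item larger whose every subset is frequent.
        size = len(current[0]) + 1
        unions = {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        current = [
            c for c in unions
            if all(frozenset(s) in survivors for s in combinations(c, size - 1))
        ]
    return frequent

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)

# Confidence of the rule bread -> milk: support(bread, milk) / support(bread).
confidence = frequent[frozenset({"bread", "milk"})] / frequent[frozenset({"bread"})]
print(len(frequent), round(confidence, 3))
```

The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are frequent, so most large candidates are never counted.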

Dimensionality reduction

While more data generally yields more accurate results, it can also impact the performance of
machine learning algorithms (e.g. overfitting) and it can also make it difficult to visualize
datasets. Dimensionality reduction is a technique used when the number of features, or
dimensions, in a given dataset is too high. It reduces the number of data inputs to a
manageable size while also preserving the integrity of the dataset as much as possible. It is
commonly used in the preprocessing data stage, and there are a few different dimensionality
reduction methods that can be used, such as:

Principal component analysis

Principal component analysis (PCA) is a type of dimensionality reduction algorithm which is
used to reduce redundancies and to compress datasets through feature extraction. This
method uses a linear transformation to create a new data representation, yielding a set of
"principal components." The first principal component is the direction which maximizes the
variance of the dataset. While the second principal component also finds the maximum
variance in the data, it is completely uncorrelated to the first principal component, yielding a
direction that is perpendicular, or orthogonal, to the first component. This process repeats
based on the number of dimensions, where a next principal component is the direction
orthogonal to the prior components with the most variance.
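
For two-dimensional data, the first and second principal components can be found without a linear algebra library, because the direction of maximum variance of a 2x2 covariance matrix has a closed form. The sketch below is limited to that 2-D case and uses made-up points. For higher dimensions, an eigendecomposition or SVD routine would be used instead.

```python
import math

def principal_directions_2d(points):
    """Return unit vectors for the first and second principal components of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix.
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Angle of the direction of maximum variance.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    first = (math.cos(angle), math.sin(angle))
    # The second component is perpendicular (orthogonal) to the first.
    second = (-math.sin(angle), math.cos(angle))
    return first, second

# Hypothetical points scattered roughly along the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
first, second = principal_directions_2d(points)
print(first, second)
```

Projecting each centered point onto `first` alone would reduce the data from two dimensions to one while keeping most of its variance.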

Singular value decomposition

Singular value decomposition (SVD) is another dimensionality reduction approach which
factorizes a matrix, A, into three matrices. SVD is denoted by the formula A = U S V^T,
where U and V are orthogonal matrices and V^T is the transpose of V. S is a diagonal matrix,
and its diagonal values are the singular values of matrix A. Keeping only the largest singular
values gives a low-rank approximation of A. Similar to PCA, it is commonly used to reduce
noise and compress data, such as image files.

Autoencoders

Autoencoders leverage neural networks to compress data and then recreate a new
representation of the original data’s input. The hidden layer acts as a bottleneck
that compresses the input layer prior to reconstructing it
within the output layer. The stage from the input layer to the hidden layer is referred to as
“encoding” while the stage from the hidden layer to the output layer is known as “decoding.”
Reinforcement Learning Algorithms
There are three approaches to implement a Reinforcement Learning algorithm.

Value-Based:
In a value-based Reinforcement Learning method, you try to maximize a value
function V(s). Here, V(s) is the long-term return the agent expects to receive from the
current state s under policy π.
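
One common way to write this value function is as the expected discounted sum of future rewards. The discount factor γ and the reward notation r are assumptions of this sketch and are not introduced in the lecture itself:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r_{t+1} \;\middle|\; s_{0} = s \right],
\qquad 0 \le \gamma < 1
```

A value of γ close to 0 makes the agent focus on immediate rewards, while a value close to 1 makes it weigh long-term rewards more heavily.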

Policy-based:
In a policy-based RL method, you try to come up with such a policy that the action performed in
every state helps you to gain maximum reward in the future.

Two types of policy-based methods are:

• Deterministic: For any state, the same action is produced by the policy π.
• Stochastic: Every action has a certain probability.
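
The difference between the two policy types can be sketched with a toy example. The states, actions, and probabilities below are hypothetical and serve only to illustrate the idea.

```python
import random

def deterministic_policy(state):
    """Always returns the same action for a given state."""
    action_table = {"low_battery": "recharge", "charged": "explore"}
    return action_table[state]

def stochastic_policy(state, rng):
    """Samples an action according to a probability for each action in the state."""
    action_probabilities = {
        "low_battery": {"recharge": 0.9, "explore": 0.1},
        "charged": {"recharge": 0.2, "explore": 0.8},
    }
    actions = list(action_probabilities[state])
    weights = list(action_probabilities[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
print(deterministic_policy("charged"))
print([stochastic_policy("charged", rng) for _ in range(5)])
```

The deterministic policy returns "explore" every time for the "charged" state, whereas the stochastic policy can return either action on any call.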

Model-Based:

In this Reinforcement Learning method, you need to create a virtual model for each
environment. The agent learns to perform in that specific environment.
