Machine Learning Lecture
a. Supervised Learning:
• Supervised learning, also known as supervised machine learning, is a
subcategory of machine learning and artificial intelligence. It is defined by
its use of labeled datasets to train algorithms to classify data or predict
outcomes accurately. As input data is fed into the model, it adjusts its
weights until the model has been fitted appropriately, which occurs as part
of the cross-validation process.
• In supervised learning, the machine experiences the examples along with
the labels or targets for each example. The labels in the data help the
algorithm learn how the features correlate with the target.
• Two of the most common supervised machine learning tasks are
classification and regression.
In classification problems the machine must learn to predict discrete
values. That is, the machine must predict the most probable category, class,
or label for new examples. Applications of classification include predicting
whether a stock's price will rise or fall, or deciding if a news article belongs to
the politics or leisure section.
In regression problems the machine must predict the value of a
continuous response variable. Examples of regression problems include
predicting the sales for a new product, or the salary for a job based on its
description.
b. The most commonly used regression techniques are Linear Regression and Logistic
Regression. (Covered later)
b. Unsupervised Learning:
a. Clustering is a data mining technique which groups unlabeled data based on their
similarities or differences; the most common types of clustering algorithms, specifically
exclusive, overlapping, hierarchical, and probabilistic, are covered later.
c. Reinforcement Learning:
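Before turning to the individual algorithms below, here is a minimal sketch that makes the
classification/regression distinction from the supervised-learning outline above concrete. It
assumes scikit-learn is installed, and the feature values and labels are invented purely for
illustration.

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Classification: predict a discrete label (e.g., "up" vs "down") from features.
X_cls = [[0.2, 1.0], [0.8, 0.3], [0.1, 0.9], [0.9, 0.2]]
y_cls = ["up", "down", "up", "down"]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[0.15, 0.95]]))   # -> a category such as ["up"]

# Regression: predict a continuous value (e.g., a salary) from features.
X_reg = [[1.0], [2.0], [3.0], [4.0]]
y_reg = [30000, 40000, 50000, 60000]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[5.0]]))          # -> a number close to 70000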
Naive Bayes
Naive Bayes is a classification approach that adopts the principle of class-conditional independence
from Bayes' theorem. This means that the presence of one feature does not impact the
presence of another in the probability of a given outcome, and each predictor has an equal effect
on that result. There are three types of Naive Bayes classifiers: Multinomial Naive Bayes, Bernoulli
Naive Bayes, and Gaussian Naive Bayes. This technique is primarily used in text classification,
spam identification, and recommendation systems.
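As a brief illustration of the Gaussian variant, the sketch below uses scikit-learn (an assumption,
not part of the lecture) on invented two-feature examples labeled "spam" and "ham".

from sklearn.naive_bayes import GaussianNB

# Toy data: two numeric features per example, two classes.
X = [[3.0, 0.1], [2.8, 0.3], [0.2, 2.9], [0.1, 3.1]]
y = ["spam", "spam", "ham", "ham"]

model = GaussianNB().fit(X, y)            # one Gaussian per feature per class
print(model.predict([[2.5, 0.2]]))        # most probable class, e.g. ["spam"]
print(model.predict_proba([[2.5, 0.2]]))  # per-class probabilities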
Linear regression
Linear regression is used to identify the relationship between a dependent variable and one or
more independent variables and is typically leveraged to make predictions about future
outcomes. When there is only one independent variable and one dependent variable, it is known
as simple linear regression. As the number of independent variables increases, it is referred to as
multiple linear regression. Both types seek to plot a line of best fit, which is calculated through
the method of least squares. However, unlike other regression models, this line is straight when
plotted on a graph.
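To illustrate the method of least squares, the short NumPy sketch below fits a straight line
y ≈ slope · x + intercept to invented data; NumPy availability is assumed.

import numpy as np

# Toy data: one independent variable x, one dependent variable y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Least-squares solution of y ≈ slope * x + intercept.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)            # roughly 2 and 0 for this toy data
print(slope * 6.0 + intercept)     # prediction for a new x value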
Logistic regression
While linear regression is leveraged when dependent variables are continuous, logistic
regression is selected when the dependent variable is categorical, meaning it has binary
outputs, such as "true" and "false" or "yes" and "no." While both regression models seek to
understand relationships between data inputs, logistic regression is mainly used to solve binary
classification problems, such as spam identification.
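A minimal sketch of binary classification with logistic regression, assuming scikit-learn; the
single feature (a count of suspicious words in an email) and the labels are made up.

from sklearn.linear_model import LogisticRegression

# One feature per email; label 1 = spam, 0 = not spam.
X = [[0], [1], [2], [8], [9], [10]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(X, y)
print(model.predict([[7]]))         # categorical output, e.g. [1]
print(model.predict_proba([[7]]))   # probability of each class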
K-nearest neighbor
K-nearest neighbor, also known as the KNN algorithm, is a non-parametric algorithm that
classifies data points based on their proximity and association to other available data. This
algorithm assumes that similar data points can be found near each other. As a result, it
calculates the distance between data points, usually the Euclidean distance, and then assigns a
label based on the most frequent category among the nearest neighbors (or their average value,
for regression).
Its ease of use and low training time make it a preferred algorithm among data scientists, but as
the dataset grows, prediction time lengthens, making it less appealing for large classification
tasks. KNN is typically used for recommendation engines and image recognition.
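The from-scratch sketch below shows the core idea (Euclidean distances plus a majority vote over
the k nearest neighbors); in practice a library implementation such as scikit-learn's
KNeighborsClassifier would normally be used, and the points here are invented.

from collections import Counter
import math

# Labeled points: (feature vector, category).
points = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
          ((5.0, 5.2), "B"), ((4.8, 5.1), "B")]

def knn_predict(query, k=3):
    # Sort stored points by Euclidean distance from the query.
    nearest = sorted(points, key=lambda p: math.dist(query, p[0]))
    # Most frequent category among the k nearest neighbors.
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))   # -> "A"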
Random forest
Random forest is another flexible supervised machine learning algorithm used for both
classification and regression purposes. The "forest" references a collection of uncorrelated
decision trees, which are then merged together to reduce variance and create more accurate
predictions.
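As a sketch, assuming scikit-learn, the snippet below trains a small forest of decision trees on
invented data; n_estimators controls how many trees are merged.

from sklearn.ensemble import RandomForestClassifier

# Toy data: feature vectors and class labels.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.15, 0.85]]
y = [1, 1, 0, 0, 0, 1]

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(forest.predict([[0.25, 0.75]]))   # averaged vote of the trees, e.g. [1]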
Clustering
Clustering is a data mining technique which groups unlabeled data based on their similarities or
differences. Clustering algorithms are used to process raw, unclassified data objects into groups
represented by structures or patterns in the information. Clustering algorithms can be
categorized into a few types, specifically exclusive, overlapping, hierarchical, and probabilistic.
Exclusive clustering is a form of grouping that stipulates a data point can exist only in one
cluster. This can also be referred to as “hard” clustering. The K-means clustering algorithm is an
example of exclusive clustering.
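A minimal K-means sketch, assuming scikit-learn and invented, unlabeled points; note that each
point ends up in exactly one cluster, which is what makes this "hard" or exclusive clustering.

from sklearn.cluster import KMeans

# Unlabeled toy data with two obvious groups.
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
     [8.0, 8.1], [8.2, 7.9], [7.9, 8.0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # each point assigned to exactly one cluster
print(kmeans.cluster_centers_)   # one centroid per cluster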
Hierarchical clustering
Agglomerative hierarchical clustering takes a “bottom-up” approach: each data point starts in its
own cluster, and clusters are merged together iteratively based on similarity. The distance
between two clusters can be measured in several ways, the most common being:
1. Ward’s linkage: This method states that the distance between two clusters is defined
by the increase in the within-cluster sum of squares after the clusters are merged.
2. Average linkage: This method is defined by the mean distance between two points in
each cluster.
3. Complete (or maximum) linkage: This method is defined by the maximum distance
between two points in each cluster.
4. Single (or minimum) linkage: This method is defined by the minimum distance
between two points in each cluster.
Euclidean distance is the most common metric used to calculate these distances; however,
other metrics, such as Manhattan distance, are also cited in clustering literature.
Divisive clustering can be defined as the opposite of agglomerative clustering; instead it takes a
“top-down” approach. In this case, a single data cluster is divided based on the differences
between data points. Divisive clustering is not commonly used, but it is still worth noting in the
context of hierarchical clustering. These clustering processes are usually visualized using a
dendrogram, a tree-like diagram that documents the merging or splitting of data points at each
iteration.
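The linkage options above map directly onto SciPy's hierarchical-clustering functions; the sketch
below (SciPy assumed, toy data invented) builds an agglomerative tree with Ward's linkage and
cuts it into clusters. Swapping method to "average", "complete", or "single" selects the other
linkages.

from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Unlabeled toy data.
X = [[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.2, 4.9], [9.0, 1.0]]

# Agglomerative clustering with Ward's linkage and Euclidean distance.
Z = linkage(X, method="ward", metric="euclidean")
print(fcluster(Z, t=3, criterion="maxclust"))   # cut the tree into 3 clusters
# dendrogram(Z) would draw the tree-like merge diagram (matplotlib needed).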
Probabilistic clustering
• Gaussian Mixture Models are classified as mixture models, which means that they are
made up of an unspecified number of probability distribution functions. GMMs are
primarily leveraged to determine which Gaussian, or normal, probability distribution a
given data point belongs to. If the mean and variance are known, then we can
determine which distribution a given data point belongs to. However, in GMMs, these
variables are not known, so we assume that a latent, or hidden, variable exists to
cluster data points appropriately. While it is not required to use the Expectation-
Maximization (EM) algorithm, it is commonly used to estimate the assignment
probabilities for a given data point to a particular data cluster.
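A brief sketch of a Gaussian Mixture Model fit with scikit-learn (an assumption), whose fitting
routine uses the EM algorithm internally; the one-dimensional data points are invented.

from sklearn.mixture import GaussianMixture

# Unlabeled toy data, presumably drawn from two overlapping groups.
X = [[0.1], [0.3], [0.2], [4.9], [5.1], [5.0]]

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)                  # estimated mean of each Gaussian
print(gmm.predict_proba([[2.0]]))  # soft (probabilistic) cluster assignments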
Association Rules
An association rule is a rule-based method for finding relationships between variables in a given
dataset. These methods are frequently used for market basket analysis, allowing companies to
better understand relationships between different products. Understanding consumption
habits of customers enables businesses to develop better cross-selling strategies and
recommendation engines. Examples of this can be seen in Amazon’s “Customers Who Bought
This Item Also Bought” or Spotify’s "Discover Weekly" playlist. While there are a few different
algorithms used to generate association rules, such as Apriori, Eclat, and FP-Growth, the Apriori
algorithm is most widely used.
Apriori algorithms
Apriori algorithms have been popularized through market basket analyses, leading to different
recommendation engines for music platforms and online retailers. They are used within
transactional datasets to identify frequent itemsets, or collections of items, to identify the
likelihood of consuming a product given the consumption of another product. For example, if I
play Black Sabbath’s radio on Spotify, starting with their song “Orchid”, one of the other songs
on this channel will likely be a Led Zeppelin song, such as “Over the Hills and Far Away.” This is
based on my prior listening habits as well as those of others. Apriori algorithms use a hash
tree to count itemsets, navigating through the dataset in a breadth-first manner.
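The heavily simplified sketch below counts frequent 1- and 2-item itemsets over made-up
transactions; a real Apriori implementation (available, for example, in libraries such as mlxtend)
would instead grow candidate itemsets level by level, pruning any candidate whose subsets are
not frequent.

from itertools import combinations
from collections import Counter

# Toy "market basket" transactions; the items are invented.
transactions = [
    {"guitar strings", "picks", "tuner"},
    {"guitar strings", "picks"},
    {"drum sticks", "tuner"},
    {"guitar strings", "picks", "capo"},
]
min_support = 2   # an itemset is "frequent" if it appears in at least 2 baskets

counts = Counter()
for basket in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1

frequent = {s: c for s, c in counts.items() if c >= min_support}
print(frequent)   # ("guitar strings", "picks") appears together 3 times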
Dimensionality reduction
While more data generally yields more accurate results, it can also impact the performance of
machine learning algorithms (e.g., overfitting) and make it difficult to visualize
datasets. Dimensionality reduction is a technique used when the number of features, or
dimensions, in a given dataset is too high. It reduces the number of data inputs to a
manageable size while also preserving the integrity of the dataset as much as possible. It is
commonly used in the data preprocessing stage, and there are a few different dimensionality
reduction methods that can be used, such as:
Autoencoders
Autoencoders leverage neural networks to compress data and then reconstruct a new
representation of the original input. The hidden layer acts as a bottleneck that compresses the
input layer prior to reconstructing it within the output layer. The stage from the input layer to
the hidden layer is referred to as “encoding” while the stage from the hidden layer to the output
layer is known as “decoding.”
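A minimal autoencoder sketch is shown below, assuming PyTorch; the layer sizes (an 8-feature
input squeezed through a 2-unit bottleneck) and the random training data are arbitrary choices
for illustration.

import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoding: input layer -> bottleneck hidden layer.
        self.encoder = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
        # Decoding: bottleneck -> reconstructed output layer.
        self.decoder = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 8))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(32, 8)                                   # made-up input batch
loss_fn = nn.MSELoss()                                   # reconstruction error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)     # compare the reconstruction to the input
    loss.backward()
    optimizer.step()
print(loss.item())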
Reinforcement Learning Algorithms
There are three approaches to implementing a Reinforcement Learning algorithm.
Value-Based:
In a value-based Reinforcement Learning method, you try to maximize a value
function V(s). In this method, the agent expects a long-term return from the current state
under policy π.
Policy-based:
In a policy-based RL method, you try to come up with a policy such that the action performed in
each state helps you gain the maximum reward in the future.
• Deterministic: For any state, the same action is produced by the policy π.
• Stochastic: Every action has a certain probability of being selected.
Model-Based:
In this Reinforcement Learning method, you need to create a virtual model of the
environment. The agent then learns to perform in that specific environment.
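To make the value-based approach concrete, the sketch below runs value iteration on a made-up
three-state chain (moving "right" eventually reaches the rewarding state 2); the states, rewards,
and discount factor are all assumptions for illustration.

# Toy MDP: states 0-1-2, actions "left"/"right", reward 1 for reaching state 2.
states = [0, 1, 2]
actions = ["left", "right"]
gamma = 0.9   # discount factor for the long-term return

def step(state, action):
    next_state = min(state + 1, 2) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward

V = {s: 0.0 for s in states}
for _ in range(50):   # repeatedly back up the value function V(s)
    V = {s: max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in actions)
         for s in states}
print(V)   # V(s) approximates the best long-term return obtainable from s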