
Section 2: Introduction to Machine Learning

Open in Colab

Authors: Diana V. Vera-Garcia, MD; Sanaz Vahdati, MD; Yashbir Singh, PhD

Introduction to Machine Learning


Machine learning (ML) is a research field in computer science. ML programs identify patterns in data in order to solve problems, in contrast to traditional programs in which the programmer explicitly specifies the response for each expected input. These methods can be applied to the analysis of medical images. Machine learning is broadly classified into Supervised Learning, Unsupervised Learning, and Reinforcement Learning.

Figure 1. Machine Learning Overview

Supervised Machine Learning


Supervised machine learning is perhaps the most widely used form of machine learning. In supervised learning, an algorithm learns patterns from “labeled” examples of data, where the labels (either a category or a numerical value) have been provided by a human “supervisor.” A model is “trained” on a set of labeled data so that it can learn these patterns. The trained model can then take a new, never-before-seen example and predict its label from the patterns it has learned. Based on an examination of the training dataset, the learning algorithm creates an inference function that maps inputs to predicted output values. After adequate training, and if the input data carries enough signal, the system should be able to make good predictions on unseen data. It is also possible to compare the machine learning output to the ground truth and evaluate its performance by counting valid and erroneous predictions, allowing the model to be modified if necessary.

The general categories of supervised machine learning are classification and regression, as well as segmentation.

In this beginner section, we concentrate on gaining a broad understanding of supervised problems; the short code sketches included here are only illustrative, and fuller coding examples are left for later chapters.

Before starting into how these systems work, it is critical to understand 'features' and 'labels'. A feature is a property of the thing we are interested in. For instance, it might be the systolic blood pressure, the color of the patient's hair, or their height. The label is the property that we want the machine learner to figure out, such as the diagnosis of diabetes or the risk of developing diabetes.

Classification

Classification is a common supervised learning process in which the model learns a function from input-output instances and predicts the class or label of newly provided data. Classification may be divided into binary classification and multi-class classification. In binary classification, the model distinguishes between two groups, similar to answering a yes/no question such as 'Is this chest x-ray normal or not?'. In comparison, a multi-class model distinguishes between more than two classes, such as attempting to identify whether a chest x-ray indicates viral infection, bacterial infection, or a normal lung. In most cases, the classifier selects a single best choice, but it can also predict more than one class, possibly as a ranking. Note also that the set of classes is fixed at training time.

There are two types of “Learners” for classification tasks:

Lazy Learners:

These systems store the training data and wait until asked to classify an unknown example (classifying an unknown case is often referred to as 'inference') before doing anything with it. An example of this type of learner is the k-Nearest Neighbor (k-NN) algorithm. In k-NN, the label assigned to a test case is the most common class in the “neighborhood” of the “k” training examples nearest to it. As an example, let's suppose we have a number of fruits labeled as 'Apple', 'Orange', or 'Banana', and in addition to the labels we have 'features' for several examples of each fruit, including the color and the 'sphericity' (how close it is to a perfect sphere). When a new example comes in, we find its color is red and its sphericity is 0.9. We then look at all the labeled examples and find that 4 of the 5 examples closest to color 'red' and sphericity 0.9 are apples, so we call the fruit an apple. Because this search occurs for each example, k-NN has more intense computing requirements during inference but not at training time.
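As a minimal sketch of this fruit example (the feature values below are invented for illustration), scikit-learn's KNeighborsClassifier behaves exactly this way: fitting merely stores the examples, and the vote happens at prediction time.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: each fruit is described by two features,
# a "redness" score (0-1) and a sphericity score (0-1).
features = [
    [0.90, 0.85], [0.80, 0.90], [0.85, 0.95],   # apples: red and round
    [0.40, 0.90], [0.30, 0.95],                 # oranges: less red, very round
    [0.10, 0.20], [0.15, 0.25],                 # bananas: not red, not round
]
labels = ["Apple", "Apple", "Apple", "Orange", "Orange", "Banana", "Banana"]

# A lazy learner: fit() essentially just stores the training examples.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(features, labels)

# Inference: find the 5 nearest stored examples and take a majority vote.
print(knn.predict([[0.9, 0.9]]))  # a red, nearly spherical fruit -> 'Apple'
```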

Eager Learners:

Eager learners perform training on the training data, learning patterns from the examples. As a result, they devote more computing resources to
training and less time to predicting. Decision Trees, Naive Bayes algorithms, and Artificial Neural Networks are examples of eager learners.

Regression

Regression tackles the challenge of predicting a continuous variable from input data. A regression pattern can be further divided into linear and
non-linear regression; many real-world problems tend to be non-linear.

In medicine, regression is helpful to predict illness severity and forecast clinical progression (e.g., predicting the risk of cardiovascular disease
based on family history, hypertension, and lipid profile of a patient).

Figure 2. Schematic Description of Classification and Regression

Segmentation

Segmentation is the process of assigning each pixel in an image to either being part of an object or not, which ultimately results in the creation of a "mask." Segmentation can be broken down further into "semantic segmentation" and "instance segmentation."

During semantic segmentation, every pixel in an image is assigned a class label, such as cat, dog, or human. Instance segmentation, on the other hand, recognizes multiple instances of the same class as distinct individuals, so its output labels might be 'cat 1', 'cat 2', and 'cat 3'.
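As a toy illustration of masks and instances (not a real segmentation model; the "image" below is a made-up array), thresholding produces a semantic-style mask and connected-component labeling separates it into instances:

```python
import numpy as np
from scipy import ndimage

# Toy 6x6 "image" with two bright blobs on a dark background.
image = np.zeros((6, 6))
image[1:3, 1:3] = 1.0   # blob 1
image[4:6, 4:6] = 1.0   # blob 2

# Semantic-style segmentation: every pixel gets a label (object vs background).
mask = image > 0.5                    # binary mask of "object" pixels

# Instance-style segmentation: distinct connected objects get distinct ids.
instances, n_objects = ndimage.label(mask)
print(n_objects)    # 2 -> analogous to 'cat 1' and 'cat 2'
print(instances)    # 0 = background, 1 = first blob, 2 = second blob
```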

To gain a deeper comprehension of segmentation, refer to the Intermediate level of this topic {{make link once valid}}.

Supervised Machine Learning Algorithms


In this section, we go through the most common machine learning algorithms and their advantages and drawbacks. As previously noted, we will
avoid an in-depth analysis and instead provide readers with an overview of how these machine learning algorithms function.

First, let’s define certain terms to help you comprehend these algorithms:

Overfitting: This happens when a model learns patterns specific to the training examples rather than the overall population of interest. It can occur if the model has too many parameters relative to the number of training examples.

Generalization: This denotes a model that is not overfitted but has learned the critical aspects of the training set that are also present in instances outside the training set.

Underfitting: This occurs when the model's complexity is insufficient to learn the complexity of the patterns in the training data.

Normalization: This is the process of transforming the data so that the range of values for each variable is similar while preserving the
information in the data. For example, this can involve the transformation of values such that they lie within a range of 0 to 1.
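For instance, a minimal min-max normalization sketch (the blood pressure values are assumed purely for illustration):

```python
import numpy as np

# Systolic blood pressures in mmHg; the raw range is far from 0-1.
values = np.array([110.0, 125.0, 140.0, 180.0])

# Min-max normalization rescales each value into the range 0 to 1
# while preserving the ordering and relative spacing of the data.
normalized = (values - values.min()) / (values.max() - values.min())
print(normalized)  # [0.    0.214  0.429  1.   ] (approximately)
```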

Linear Models

Linear models have long been popular. In linear modeling, predictions for new input values are made using a linear function of the input features.

When compared to other traditional machine learning techniques, linear models provide a number of advantages, including interpretability and low computational demands. This simplicity comes at a cost: they perform poorly in many cases. These are typical examples of the trade-offs involved in selecting a model. As a rule, it is usually best to start with a simple model, because if it works, you get both interpretability and low computational cost.

Here are some of the popular Linear models:

Linear Regression

Linear regression, also known as Ordinary Least Squares (OLS), is commonly used to model the relationship between one or more independent input variables (x or x-vector) and the dependent output variable (y) by fitting a linear equation as follows:

y = β + θ₁x₁ + θ₂x₂ + … + θₙxₙ + e

In the equation above, “β” is the intercept, the “θ” values are the slopes (coefficients) of the regression line, and “e” is the error term. Note that while the model is linear, it may have more than one independent variable (x). Each coefficient θ acts as a weight for its independent variable (feature), reflecting that feature's relevance. The correlation coefficient, with a value from -1 to 1, reflects the strength of the linear relationship between variables.
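A minimal sketch of fitting an OLS model with scikit-learn; the patient ages, BMIs, and blood pressures below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: predict systolic blood pressure (y) from age and BMI (X).
X = np.array([[35, 22.0], [50, 27.5], [62, 30.1], [45, 25.0], [70, 29.0]])
y = np.array([115.0, 128.0, 142.0, 124.0, 150.0])

model = LinearRegression()
model.fit(X, y)

print(model.intercept_)             # the fitted intercept (beta)
print(model.coef_)                  # one fitted slope (theta) per feature
print(model.predict([[55, 26.0]]))  # prediction for a new patient
```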

Ridge Regression

In multiple linear regression, when the independent variables are highly correlated with one another (multicollinearity is the statistical term for correlation among multiple independent variables in a model), ordinary least squares coefficient estimates become unstable. Ridge regression is one method used in this situation: it adds a penalty that shrinks the coefficients, stabilizing the estimates.
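A sketch of ridge regression in scikit-learn; the two correlated features below are invented, and alpha controls the strength of the penalty:

```python
from sklearn.linear_model import Ridge

# Two highly correlated features (the second is nearly twice the first),
# the setting in which OLS coefficient estimates become unstable.
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.9]]
y = [3.1, 6.0, 9.2, 11.9, 15.1]

# alpha sets the L2 penalty that shrinks coefficients toward zero.
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
print(ridge.coef_, ridge.intercept_)
```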

K-Nearest Neighbor

The k-NN algorithm, a lazy learner, is a straightforward supervised machine learning algorithm. The model stores all the training examples in an N-dimensional feature space. To make a prediction for a new data point, the model finds the “k” closest examples to that point (as plotted in the feature space) and returns the average of their values. If it is a classification task, where an ‘average’ is not applicable, it returns the most common class, much like a majority vote.


Figure 3. A k-NN algorithm classifies the new data point (green circle) into the orange or blue group based on its nearest neighbors. If this is a regression task, it will return ⅔ blue and ⅓ orange. If this is a classification task, it will return the class blue.

The k-nearest neighbor algorithm assigns an unlabeled new example to the category of the known samples with which it shares the greatest similarity. k-NN models have the advantage of working effectively even with limited amounts of data, and they require essentially no training time. Their shortcoming is that they are slow at prediction, which is especially problematic when there are many examples and many features to consider.

Support Vector Machines

Support vector machines can solve both linear and nonlinear problems. Kernelized support vector machines (SVMs) are the topic of this section.

The decision boundary is determined by the training points that lie closest to the hyperplane dividing the classes; the data points in this proximity are called support vectors. Support vector machines transform the input data in a manner that maximizes the margin, thereby achieving the greatest possible separation between the two classes. For a new data point, the prediction is based on the distance of the new input from each of the support vectors.

If you look at the left of Figure 4, you can't perfectly separate the two classes with a straight line. The SVM looks for a mapping function that maximally separates the two classes, as on the right side of Figure 4. Support vector machines can operate in any number of dimensions, but this example has only two.

Figure 4. SVM

Support vector machine techniques have a big advantage over many other machine learning methods: they work well with a wide range of datasets. They are straightforward on problems with a small number of dimensions, and they can also be applied to problems with many dimensions, often with better results than other methods.
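A sketch of a kernelized SVM on an assumed toy problem where one class lies inside a circle, so no straight line separates the classes (as in Figure 4):

```python
import numpy as np
from sklearn.svm import SVC

# Toy nonlinear problem: class 1 if the point lies inside a circle.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

# The RBF kernel implicitly maps points into a higher-dimensional space
# where a maximum-margin separating hyperplane can be found.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

print(clf.predict([[0.0, 0.1], [0.9, 0.9]]))  # inside -> 1, outside -> 0
print(len(clf.support_vectors_))  # training points that define the boundary
```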

Naive Bayes Classifier

The Naive Bayes (NB) classifier is a probabilistic classifier derived from Bayes’ theorem. It is called ‘naive’ because the features are assumed to be independent of each other within a class (a critical assumption you must not forget). The posterior probability is proportional to the product of the prior probability and the likelihood, and it is used to make the class prediction (e.g., the most likely class is picked).

NB classifiers can be used for binary and multi-class classification. NB is a fast and simple algorithm for data classification and often performs well when the input data is categorical rather than numerical. One challenge of the NB classifier is its inability to directly model relationships between the features of a class.
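A Gaussian Naive Bayes sketch; the blood pressure and glucose values below are assumed purely for illustration:

```python
from sklearn.naive_bayes import GaussianNB

# Hypothetical features: [systolic BP, fasting glucose]; label: diabetes (0/1).
X = [[120, 90], [130, 95], [150, 160], [160, 170], [125, 100], [155, 165]]
y = [0, 0, 1, 1, 0, 1]

# "Naive" assumption: BP and glucose are treated as independent within a class.
nb = GaussianNB()
nb.fit(X, y)

print(nb.predict([[140, 150]]))        # most likely class
print(nb.predict_proba([[140, 150]]))  # posterior, proportional to prior x likelihood
```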

Decision Trees

Decision trees provide a degree of simplicity and may be used in a wide range of tasks that do not require a high level of complexity. A decision tree model is best understood as a hierarchy: starting at the top of the tree, referred to as the "root," an attribute of the example is examined and one of two branches is chosen; this process continues node by node until an end point is reached, which classifies the example. These end points are usually referred to as "leaves." This structure makes it easy to demonstrate how the model works and to explain the final decision step by step in a way non-experts can understand.

Deciding whether a patient needs surgery after a pelvic injury is a simple example of a decision tree (Figure 5). The root node asks, 'Does the x-ray demonstrate a fracture?' and branches into yes/no options. If the answer is no, the model arrives at the end point "Observation". If the answer is yes, the next node asks: Is surgery required? The final leaves, yes or no, determine whether the patient is taken to the Operating Room (OR).

Figure 5. Decision Trees Example

Decision trees can classify categorical data, numeric data, or a combination of the two. These algorithms are very explainable and can be
simple in their dynamics. Furthermore, because the fundamental structure of decision trees is unaffected by the values taken by each feature,
they may perform effectively with unnormalized datasets.

Decision trees have the disadvantage of being prone to overfitting and poor generalization. In other words, they perform correctly on the data that was used to train them but frequently fail when confronted with new data. Consequently, single decision trees are not the most commonly used approach for classification problems.
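A minimal decision tree sketch along the lines of the pelvic fracture example in Figure 5 (the features, values, and labels below are invented for illustration; export_text prints the learned rules step by step):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [fracture seen on x-ray (0/1), displacement in mm].
X = [[0, 0], [0, 0], [1, 1], [1, 2], [1, 8], [1, 12]]
y = ["Observation", "Observation", "Observation", "Observation", "OR", "OR"]

# max_depth limits tree complexity, one simple guard against overfitting.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The learned rules can be printed step by step for non-experts.
print(export_text(tree, feature_names=["fracture", "displacement_mm"]))
```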

Random Forest

Random forests are one way to address the tendency of individual decision trees to overfit. These algorithms are composed of many decision trees (hence the name 'forest'), each of which differs from the others, resulting in increased accuracy.

The fundamental idea behind a random forest is that every tree in the forest is constructed using a combination of data and features chosen at random. As a result, a number of different predictions will be produced, since each tree will probably overfit some of the data.

However, because every tree is produced from a different random sample, each one overfits in a different manner, and together they provide an accurate prediction for the majority of the data.

Additionally, the errors appear in different places for different trees. In light of this, taking a majority vote over the trees' predictions before coming to a conclusion mitigates the effects of overfitting (Figure 6).

Figure 6. A schematic representation of how Random Forest generates a final prediction.

Random forests provide all the convenience of decision trees and compensate for some of their weaknesses. They are currently among the most extensively used machine learning approaches for regression and classification because they are extremely powerful. Frequently, forests function effectively without much parameter tuning and do not require data scaling.

One disadvantage of random forests is that, with many deep trees, it is difficult to understand the influence of each feature in detail. As a result, if it is necessary to present the prediction visually for non-experts, a single decision tree is required.
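A random forest sketch on synthetic data (the dataset here is generated, not clinical):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A synthetic dataset stands in for real clinical features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Each of the 100 trees sees a bootstrap sample of the rows and a random
# subset of features at each split; their majority vote is the prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.predict(X[:3]))
print(forest.feature_importances_)  # coarse view of each feature's influence
```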

Ensembles

Ensemble algorithms utilize a meta approach to combine predictions of several classifiers to optimize the prediction such that it is better than
any of the models trained alone. Ensemble techniques include Bagging, Boosting, and Stacking.

In a “bagging algorithm” (bootstrap aggregation), each base classifier is trained on a randomly resampled bootstrap replicate of the initial training set. By building numerous base classifiers from these varied training sets, the performance of models that are vulnerable to variations in the data, such as decision trees, can be improved.

The main concept of the “boosting technique” is to combine many weak learners to form a strong learner. AdaBoost is one boosting algorithm; it updates the weights of the weak classifiers across iterations. After evaluating a weak model’s performance, the AdaBoost algorithm updates its weight in proportion to its error rate, continuing until the iterations end.

The “stacking technique” takes the predictions from primary models and uses them as inputs to a subsequent model to improve the performance of the overall system.
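A sketch of all three ensemble styles in scikit-learn, on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Bagging: many trees, each fit on a bootstrap sample of the training set.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)

# Boosting: weak learners added sequentially, reweighted by their errors.
boosting = AdaBoostClassifier(n_estimators=50)

# Stacking: base models' predictions become inputs to a final model.
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("ada", AdaBoostClassifier(n_estimators=10))],
    final_estimator=LogisticRegression(),
)

for model in (bagging, boosting, stacking):
    print(type(model).__name__, model.fit(X, y).score(X, y))
```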

Neural Networks

The artificial neuron, or "node," is the essential building block of neural networks; each one is an individual equation. In Figure 7, the nodes on the left represent input features, while the connecting lines carry a set of weights (w1, w2, w3); each node also has a bias (b) and an activation function (f). The output (y), represented by the node on the right, is the activation function applied to the weighted sum of the inputs plus the bias: y = f(w1x1 + w2x2 + w3x3 + b).
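A single node written out in NumPy, with assumed input values and weights, and a ReLU activation chosen for illustration:

```python
import numpy as np

def neuron(x, w, b):
    """One artificial node: activation of (weighted sum of inputs + bias)."""
    z = np.dot(w, x) + b          # weighted sum of the inputs plus the bias
    return max(0.0, z)            # ReLU activation, one common choice for f

x = np.array([0.5, -1.0, 2.0])    # input features (x1, x2, x3), assumed values
w = np.array([0.8, 0.2, 0.5])     # weights (w1, w2, w3), assumed values
b = 0.1                           # bias

print(neuron(x, w, b))            # y = f(w1*x1 + w2*x2 + w3*x3 + b) -> 1.3
```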

A neural network with more than one hidden layer is referred to as a 'deep' neural network but is otherwise an ordinary neural network. As long as the activation function is nonlinear, these models are capable of performing well on nonlinear tasks. Although they have gained popularity in many machine learning applications in recent years, a trained deep learning model is specific to the task and data for which it was built.


A node’s weights (w1, w2, w3) and bias (b) are parameters that represent any linear combination of the inputs (x1, x2, x3). After the sum of the weighted inputs is computed, an activation function is applied (please see Section 8 for a better understanding of activation functions). There are typically several nodes in a layer, and there are usually several layers between the input and the output layer. The output layer is the last layer and produces the result(s).

During training, these results are compared to the expected values (the training sample labels), and an error is computed. The weight optimizer
calculates how to change the network's various weights in order to reduce the error. When there is no significant improvement in error over
several iterations, the network is considered to have completed learning.

Before the learning process starts, the weights of a neural network are initialized with random values. This random initialization affects the trained model that is ultimately produced. That is to say, even when we use the same architecture and the same number of parameters, we may end up with rather different models depending on the random seed that we choose.
Figure 7. Example of a Neural Network

Unsupervised Machine Learning

Unsupervised machine learning methods attempt to find patterns in collections of data that are not labeled. Instead of determining a 'correct'
output, the system analyzes the input datasets to infer hidden patterns in the unlabeled data, which may be used to classify data into groups.

Techniques of unsupervised learning are often useful for:

Clustering: Divides a dataset into groups based on similarities.
Anomaly detection: Detects outliers in a dataset.
Association mining: Identifies items in a dataset that often appear together.
Dimensionality reduction: Reduces the number of features in a dataset.

Unsupervised learning algorithms include K-means, mean shift, affinity propagation, hierarchical clustering, DBSCAN, Gaussian mixture
modeling, Markov random fields, ISODATA and fuzzy C-means.
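As a minimal clustering sketch, here is K-means on synthetic unlabeled data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: 3 hidden groups, but no labels are given to the model.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# K-means infers group structure purely from the similarity of the inputs.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
groups = kmeans.fit_predict(X)

print(groups[:10])              # cluster id assigned to each example
print(kmeans.cluster_centers_)  # the discovered group centers
```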

Reinforcement Learning: Learning From Rewards

Reinforcement Learning is used to train a machine through trial and error, such as in learning to play a video game. The machine begins with
random inputs to the game, and it learns which combinations of inputs maximize its score.

Let us walk through an example: an “agent,” represented by the player, receives its first mission or task from the “environment,” in this case the video game. After several random inputs, the agent accomplishes the assigned mission, and the environment gives the agent a “reward” (life points); the agent then transitions into a new “state” (the next mission).

In general, an action may be any choice the agent intends to learn, and a state can be anything that may be helpful in making that choice. This loop establishes a pattern consisting of a state, an action, and a reward. In every iteration, the agent applies a mapping from the state to the probability of taking each action. This mapping created by the agent is called the "policy." Reinforcement learning describes how the agent adjusts its policy based on what it has learned. As stated previously, the agent's objective is to maximize the overall cumulative reward it receives over time.
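To make the state-action-reward loop concrete, here is a tiny tabular Q-learning sketch (not from this chapter; the corridor "game" and all constants are invented): an agent learns to walk right along a five-cell corridor, receiving a reward only at the last cell.

```python
import random

N_STATES = 5                # corridor cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]          # move left or move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

# The agent's estimate of long-term reward for each (state, action) pair.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(300):
    state = 0
    for step in range(100):  # cap episode length for safety
        # Explore occasionally; otherwise act greedily (the current policy).
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] >= Q[state][1] else 1
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state
        if state == N_STATES - 1:   # reached the goal: episode ends
            break

# The learned policy: the best action in each non-terminal state ("right").
print(["right" if Q[s][1] > Q[s][0] else "left" for s in range(N_STATES - 1)])
```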

Semi-Supervised Learning

Semi-supervised learning is a type of machine learning that conducts learning tasks using both labeled and unlabeled data. It makes use of the large amounts of unlabeled data available in many use cases, combined with relatively smaller sets of labeled data. It is conceptually positioned between supervised and unsupervised learning in terms of performance and efficiency.
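One common semi-supervised approach is self-training, sketched below with scikit-learn on synthetic data (by scikit-learn convention, unlabeled examples are marked with -1):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Pretend most labels are unknown: unlabeled points are marked as -1.
y_partial = y.copy()
y_partial[30:] = -1   # only the first 30 examples keep their labels

# The base model is retrained repeatedly, adding its own confident
# predictions on unlabeled data as new "pseudo-labels" each round.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y_partial)

print(model.score(X, y))  # evaluated against the true labels
```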

Conclusions

Machine learning is a powerful technique in which the data drives the ultimate performance of the system. The most common form is 'supervised' machine learning, in which a set of training examples carries 'labels' specifying what each example is. There are many ways to do supervised machine learning, with trade-offs among simplicity, understandability, and computational demands. Unsupervised methods can be useful when you don't know whether there is a pattern in the data: you can have the computer find patterns that might not be apparent.

Deep learning is the new rage in machine learning, as it applies great computational power to find subtle patterns, assuming you have enough examples. This technology has tremendous potential in medical imaging and will see increasing use in the coming years.
