
Aws ML PDF

This document provides an overview and introduction to artificial intelligence, machine learning, and deep learning. It defines machine learning as using algorithms to find patterns in data to make predictions without being explicitly programmed. Deep learning is based on neural networks modeled after the human brain. The machine learning lifecycle includes collecting data, processing it, splitting it into training and testing sets, training a model on the training set, testing it on the testing set, and deploying the model. Different types of machine learning include supervised learning (with labeled examples), unsupervised learning (without labels), and reinforcement learning (using rewards).

Uploaded by

Rajat Shrinet
Copyright: © All Rights Reserved

Overview

These notes follow the LinuxAcademy course structure, which can be found here:
https://linuxacademy.com/course/aws-certified-machine-learning-specialty/. I would
recommend viewing the course to gain full and detailed explanations.

I have created these notes as part of my personal learning and hope to be able to
help and inspire others.

As I am also learning, there may well be mistakes. Please do reach out and let me
know, so I can correct them.

Follow me on Instagram: https://www.instagram.com/adnans_techie_studies/

All my notes are hosted here: https://adnan.study

Connect with me on LinkedIn: https://www.linkedin.com/in/adnanrashid1/


Artificial Intelligence

• Advancements in compute power have brought a new wave of artificial intelligence

• AI is being used to analyse big data

• Machine Learning is a subset of Artificial Intelligence

• Deep Learning is a subset of ML

What is Machine Learning?

• Machine Learning provides the ability to learn without being explicitly programmed.

• It focuses on the development of programs that can access data and use it to learn for
themselves

• An example of ML is AlphaGo, which beat one of the best players of Go.

• Machine learning is when you load lots of data into a computer program and choose a model to
“fit” the data, which allows the computer (without your help) to come up with predictions.

• The way the computer makes the model is through algorithms, which can range from a simple
equation (like the equation of a line) to a very complex system of logic/math that gets the
computer to the best predictions.
Once the data is plotted we can then create a trend line that goes through the data set.

The trend line can then be used to predict the weight.

[Figure: weight plotted against HEIGHT with a straight trend line]

This would then become our training data for our inferences for height and weight, given that we have
one of the values.

[Figure: the same plot with TESTING DATA added]

The blue line may appear to be a better fit for predicting weights as opposed to the straight line.

We can use our test data to test the line and see how well it fits.

These differences are the actual observed weights vs the weights predicted by the line. We then take
those differences and add them up, giving the sum of the differences between the actual observed
weights and the predicted weights.

We could do the same for the curved line which goes through the actual data, however it is overfit to
our training data, which means it does not handle new data very well.

The green line is better for making predictions as it is more generalised.

This green line represents our machine learning model. This particular type of model is called linear
regression.

We would use other models depending on what we are trying to achieve, e.g. Logistic Regression,
Support Vector Machines and Decision Trees.

[Diagram: DATA → TRAIN (ALGORITHM: LINEAR REGRESSION) → MODEL → PREDICTION]

This is a very simplified view of what we are doing with machine learning.

We have only looked at two dimensions, which is easy to visualise, however when we
get beyond 3 it becomes more difficult. Having lots of dimensions is closer to
reality, and considering we cannot draw a 200 dimension graph, Machine Learning
can help towards solving these problems.
What is Deep Learning?

Deep learning is based on the principles of an organic brain with the aim to get machines to
learn in a similar way.

Neurons are chained together as a Neural Network with inputs and outputs.

Inside the neuron is an Activation Function

◦Function of code
◦It takes the inputs, decides what to do with the data, stores a value and passes it on
through the output

The neurons are usually all connected together

Examples of deep learning:

◦Self Driving Cars, Object detection, decision making

◦Object Classification, visual search, face recognition

◦Natural language processing, spam filters, Siri, Alexa, or Google Assistant

◦Healthcare: MRI scans, CT scans, records analysis


Machine Learning Lifecycle

This is a general process for a machine learning lifecycle.

We go through iterations of the lifecycle improving our inference.

[Lifecycle diagram: COLLECT DATA → PROCESS DATA → SPLIT DATA → TRAIN → TEST → DEPLOY → INFER → PREDICTIONS → IMPROVE]

Process Data

The data collected in the real world will be in various different formats.

The first and primary task is to bring it into a format which our ML algorithm will be able to understand.

In order to do this, we will also need to organise this data.

AGE   STARSIGN   DRINKCOFFEE   LIKESCATS
20    3          YES           YES
25    1          NO            YES
33    2          NO            NO
42    6          YES           YES

[Table annotations in the original: FEATURE, LABEL]
We may do the following to the data depending on the data set.

Feature reduction
We want as much data as possible when training our model, however we don't want to pass data that is
not related. This can be difficult as you may be looking for relationships in the data that you are
not aware of.

Encoding
In the previous image, the star signs are numeric values, therefore the string has been encoded. We
could look up the data in a separate table.

Formatting
The file format that we will use for providing the data to the ML algorithm.

Data
We believe there is a relationship somewhere in this dataset, but we need a deep understanding of the
data.

Features and Labels
We are using these features to try and understand if people like penguins or not, and this is our
labelled data.

Feature Engineering
We may map the features down to between 0 and 1 such that features can be compared.
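Mapping a feature down to the range 0 to 1 can be sketched as below. This is a minimal illustration using min-max scaling, which is one common way to do it; the notes do not say which scaling method is meant, and the function name `min_max_scale` is made up for this example.

```python
def min_max_scale(values):
    """Rescale numbers so the smallest becomes 0.0 and the largest becomes 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread to scale: map every value to 0.0 rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 25, 33, 42]  # AGE column from the example table
scaled_ages = min_max_scale(ages)
```

After scaling, a feature measured in years and a feature measured in centimetres both sit on the same 0 to 1 scale, which is what makes them comparable.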
Split Data
Once we are happy with the dataset, we would then split it into sections:
◦Training
◦Validation
◦Testing
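A rough sketch of such a split is shown below. The 70/15/15 proportions, the fixed seed and the function name `split_dataset` are assumptions made for illustration; the notes do not prescribe any particular ratio.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the rows, then cut them into training, validation and testing sections."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # shuffle so each section is a random sample
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    validation = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # whatever is left over
    return train, validation, test


train, validation, test = split_dataset(range(100))
```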

Train
The algorithm:
◦Can see and is directly influenced by the training data
◦Uses the validation data but is only indirectly influenced by it
◦Does not see testing data during training

Test
• Perform inference on testing data to see how well the model fits
• Is it overfit?

This data wasn't used to train the model, because we are looking to see how well the model works.

If the model is overfit, then it would be really good at predicting based on data it has already seen.

However, what we want is to make inferences on similar data.

Deploy
Host the model in an execution environment according to the requirements:
◦Batch
◦As a service
◦Infrastructure required
Supervised, Unsupervised, and Reinforcement

AGE   STARSIGN   DRINKCOFFEE   LIKESCATS
20    3          YES           YES
25    1          NO            YES
33    2          NO            NO
42    6          YES           YES
29    3          YES           YES

An example of supervised learning is the ability to show what it looks like when someone likes cats.

[Figure: weight plotted against HEIGHT]

This is another example of supervised data, where we can infer the weight based on the height.

[Figure: labelled examples of DOGS and CATS]

In this example we are providing different data as opposed to just cats, which will help train the
model to be able to make inferences.
Unsupervised learning involves finding relationships where we did not know there was one.

It is best used when we are trying to analyse data with lots of dimensions in order to find
relationships between the data points which we would not normally find using conventional methods.

[Figure labels: SCORE, REWARD, ACTION]

In reinforcement learning we give the robot a reward based on its actions.

As an example, we want it to pick up a cat. If it does select one, we would give it a reward of +1,
however if it picks up something else we would remove that reward, in order to try to enforce a
preference towards the cat.
Summary

A variety of algorithms can be used, such as:

◦Recurrent Neural Network
◦Convolutional Neural Network
◦Linear Regression
◦Latent Dirichlet Allocation
◦Support Vector Machines

Supervised
Supervised learning is where we use labelled data to classify unlabelled data using a machine
learning algorithm.

Unsupervised
Unsupervised learning involves looking for patterns when it is not initially evident that there are
any. It is best used with hundreds of dimensions where it is not possible to plot on graphs.

Reinforcement Learning
Reinforcement learning involves providing a reward when it does something correct and taking away
the reward when it is incorrect. It involves a lot of trial and error to get it right.
Optimisation

This line represents the machine learning model.

How do we know that this is the best line to fit the data?

We could have drawn it at different gradients.

Here we can get the sum of the residuals.

Some of the differences will be negative and some positive. If we square each one before we add them
together, they will always be positive.

We can then add the squares of the residuals to get the sum and see how it differs for different lines.

[Figure: sum of the squared residuals plotted against SLOPE OF MODEL LINE]

This is a plot of the sum of the squared residuals vs the slope of the line.

The job of the machine learning algorithm is to find the lowest point of the parabola.

The bottom of this curve shows the line with the best fit, as it has the least amount of differences.

[Figure: the same plot with the minimum marked as BEST FIT SLOPE FOR MODEL]

It is easy for us to see the bottom of the curve but the computer needs to be able to calculate this.

If you pick a point on the parabola you can then calculate the slope at that point.

You can then tell whether you are heading towards a reduction or an increase in slope, in order to
understand the gradient.

It is then possible to keep stepping until you get to the bottom of the graph.

This technique is called gradient descent.

Finding the bottom of the curve depends on the step size. If it is too large then we might miss the
bottom of the graph, and if it is too small it would be inefficient.

This technique is used for Linear Regression, Logistic Regression and also Support Vector Machines.
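The stepping idea can be sketched in a few lines. This is a simplified illustration, not a production optimiser: the height and weight numbers are made up, the model is a line through the origin so that there is only one slope to tune, and the names `sum_squared_residuals`, `gradient` and `gradient_descent` are hypothetical.

```python
heights = [1.0, 2.0, 3.0, 4.0]  # made-up example data
weights = [2.1, 3.9, 6.2, 7.8]


def sum_squared_residuals(slope):
    """Sum of squared differences between observed weights and slope * height."""
    return sum((w - slope * h) ** 2 for h, w in zip(heights, weights))


def gradient(slope):
    """Slope of the parabola at this point (derivative of the sum above)."""
    return sum(-2 * h * (w - slope * h) for h, w in zip(heights, weights))


def gradient_descent(start=0.0, learning_rate=0.01, steps=200):
    slope = start
    for _ in range(steps):
        # Step downhill: move against the gradient, scaled by the step size.
        slope -= learning_rate * gradient(slope)
    return slope


best_slope = gradient_descent()
```

With this data each step shrinks the distance to the bottom of the parabola, so the loop settles on the best-fit slope. A much larger `learning_rate` would overshoot the bottom on every step, which is the "too large" case described above.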


Summary

• Sum of the residuals
◦Draw a line through multiple data points and look at the difference between each data point and
the line

• Square the values
◦We then square the values, as some of the values are positive and some negative due to
being below the line.
◦If they are squared we get an overall positive number, which is the sum of the squared
residuals
◦We can then put all those sums on a graph by taking a number of different lines and
see which one is the least, which is the line that best fits the data points

• Graph
◦If we plot the sum of the squares vs the slope of the model line, we will end up with a parabola.
The algorithm needs to find the lowest point.
◦The bottom of the curve is where the gradient of the curve is 0 and this is the best fit.

• Gradient Descent
◦In order to find the bottom, the algorithm will pick a point, find the gradient there and
move in the direction where it is less steep
◦This technique is called gradient descent

• Learning Rate
◦The step size sets the learning rate
◦If the step size is too large it might miss the bottom of the graph, and too small is not
efficient.

• Important
◦The other thing to bear in mind is that there might be multiple dips in the curve
Regularisation

Your sample data may fit well, but real world data generally does not fit so well straight away.

Regularisation through regression
L1 Regularisation (Lasso Regression)
L2 Regularisation (Ridge Regression)

We apply regularisation when our model is overfit: it fits the training data really well but does not
generalise to real world inference.

A method to fix this is to apply regularisation, and it is achieved through regression.

• A technique for when our model does not fit real world data that well.

• Looking at the graph we can see that small differences can have a larger effect overall.
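As a small illustration of the L2 (ridge) idea, the sketch below fits the slope of a line through the origin with a penalty added for large slopes. The data, the penalty value and the function name `fit_slope` are assumptions for this example only; L1 (lasso) uses the absolute value of the slope as the penalty instead and is not shown here.

```python
def fit_slope(xs, ys, l2_penalty=0.0):
    """Slope minimising sum((y - slope * x)^2) + l2_penalty * slope^2.

    Setting the derivative to zero gives
    slope = sum(x * y) / (sum(x^2) + l2_penalty).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2_penalty)


xs = [1.0, 2.0, 3.0]  # made-up training data
ys = [2.0, 4.0, 7.0]

plain_slope = fit_slope(xs, ys)                  # ordinary least squares
ridge_slope = fit_slope(xs, ys, l2_penalty=5.0)  # penalised fit
```

The penalised slope is pulled towards zero, so the line follows the training points a little less closely in exchange for being less sensitive to them.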
Hyperparameters

These are parameters we can set to tune a model.

Hyperparameters are external settings we can set before we train a model and which influence how the
training occurs.

Parameters are internal to the algorithm and get tuned during the training.

[Diagram labels: HYPERPARAMETERS, MODEL, PARAMETERS]

Hyperparameters

◦Learning rate
◦Epochs
◦Batch size.

Learning Rate
◦Determines the size of the step taken during gradient descent optimisation.
◦It is typically set between 0 and 1

Batch Size
◦Batch size is the number of samples used to train at any one time.
◦It could be all of the data (batch), a single sample (stochastic) or some of the data
(mini-batch). It is often 32, 64 or 128.
◦It is possible to calculate based on infrastructure.
◦It is also based on the amount of data you have.
◦If you span over multiple servers then you might use a batch size that splits over that
infrastructure

Epochs
◦The number of times the algorithm will process the entire data set.
◦Each time it passes through the data, the intention is to improve the accuracy of the
algorithm.
◦Common values of these are high numbers - the number of times the algorithm will
sample the data set.
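How epochs and batch size interact can be sketched as a pair of nested loops. This only counts the update steps; the tiny data set, the chosen values and the helper name `iterate_minibatches` are made up for illustration, and a real trainer would compute a gradient where the comment is.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(10))  # stand-in for 10 training samples
epochs = 3                 # passes over the entire data set
batch_size = 4             # samples used per update

updates = 0
for epoch in range(epochs):
    for batch in iterate_minibatches(samples, batch_size):
        # A real trainer would compute the gradient on `batch` here and take
        # one step whose size is set by the learning rate.
        updates += 1
```

Setting `batch_size` to `len(samples)` gives one update per epoch (batch), and setting it to 1 gives one update per sample (stochastic).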
Cross Validation

Training data is seen by the training process and directly influences the model.

Validation data is not seen by the training process but indirectly influences the model in order to
tweak it.

Testing data is not seen by the training process but informs the user of success.

[Figure: data set split into TRAINING, VALIDATION and TESTING]

Cross validation is where we don't isolate validation data and instead split our training data into a
number of partitions, using the different sections in turn to perform the validation to get a better
fit for our model.

As a result we use all of the training data for training and for validation, which is called k-fold
validation.

This technique can also be used to compare different algorithms and to validate different data sets.
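The partitioning can be sketched as follows. This is a minimal version that only produces the index lists for each fold; the function name `k_fold_indices` is hypothetical and no shuffling is done.

```python
def k_fold_indices(n_samples, k):
    """Yield (training, validation) index lists, one pair per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(10, 5))
```

Each sample lands in the validation section exactly once and in the training section for every other fold.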
Feature Selection and Engineering

NAME     COUNTRY     AGE   HEIGHT   STARSIGN   LIKECOFFEE
ADNAN    UK          33    170      VIRGO      YES
SARAH    BRAZIL      23    133      GEMINI     NO
ALEXA    USA         5     2        PISCES     NO
ALI      INDIA       39    175      SCORPIO    YES
SPARKY   AUSTRALIA   5     35       JEDI       YES

This example dataset can be used to understand if people like coffee or not.

The first thing to do is remove anything in the data set which does not have anything to do
with the inference we are making, however this does require specific domain knowledge in
order to establish if we are taking away the correct features or not.

In this data set the name is not relevant and therefore can be removed. This also helps
towards making the algorithm more efficient, as it won't try to find a relationship between
someone's name and whether they like coffee. We need to be careful that we do not remove a
feature that would have been useful.

The result will be a faster trained model and also one that is more accurate.
COUNTRY     AGE   HEIGHT   LIKECOFFEE
UK          33    170      YES
BRAZIL      23    133      NO
USA         5     2        NO
INDIA       39    175      YES
AUSTRALIA   5     35       YES

The other way to establish what is relevant is by checking if there is any correlation between the
label and the feature.

This also needs domain level knowledge and trial and error.

NAME     COUNTRY     AGE   HEIGHT   STARSIGN   LIKECOFFEE
ADNAN    UK          33    170      VIRGO      YES
SARAH    BRAZIL      23    133      GEMINI     NO
ALEXA    USA         5     2        PISCES     NO
ALI      INDIA       39    175      SCORPIO    YES
SPARKY   AUSTRALIA   5     35       JEDI       YES

[Figure annotation: LOOKS SUSPICIOUS]

Gaps and anomalies will also influence the data set, where we can either remove the feature entirely
or do imputation.
Another strategy is to engineer new features.

COUNTRY     AGEHEIGHT   LIKECOFFEE
UK          5.15        YES
BRAZIL      5.78        NO
USA         0.4         NO
INDIA       4.48        YES
AUSTRALIA   7           YES

We may decide that there is a relationship between age and height and divide one by the other to
create a new column. It would then require running some training to understand if it was effective.

It would also reduce the amount of data for the algorithm to analyse.
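A sketch of engineering that column is shown below. It assumes the new feature is height divided by age, which matches the values shown in the table (for example 170 / 33 ≈ 5.15); the dictionary keys are made up for this example.

```python
rows = [
    {"country": "UK", "age": 33, "height": 170},
    {"country": "BRAZIL", "age": 23, "height": 133},
    {"country": "USA", "age": 5, "height": 2},
    {"country": "INDIA", "age": 39, "height": 175},
    {"country": "AUSTRALIA", "age": 5, "height": 35},
]

for row in rows:
    # Assumed definition of the engineered feature: height / age.
    row["age_height"] = round(row["height"] / row["age"], 2)

# Keep only the engineered column, replacing the two original features.
engineered = [{"country": r["country"], "age_height": r["age_height"]} for r in rows]
```

Rounding here gives 4.49 for INDIA, while the table shows 4.48, so the original figure appears to truncate rather than round.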

HOUSE   CITY       DATE             COFFEECONSUMED
RED     BRISBANE   8/23/18 12:33    NO
GREEN   LONDON     9/12/18 07:10    YES
BLUE    DALLAS     10/22/18 10:50   NO
GREEN   LONDON     11/10/18 12:35   YES
RED     BRISBANE   12/09/18 16:07   YES

We may not be as interested in which day people drink coffee; instead the time might be of more
relevance.

Principal Component Analysis (PCA)

Three features are about the limit on a graph.

Beyond that it starts to become difficult to represent the data visually.

PCA allows the ability to see the relationships between the data.

It is an unsupervised algorithm which performs dimension reduction.

Although we may lose some data, we need to maintain the principal components.

Using PCA it is possible to look at hundreds of dimensions.

[Figure: data points plotted against SCORE 1, SCORE 2 and SCORE 3]

PCA looks for the aspects of the data which influence it the most by finding the central point of the
data set.

Once we find that central point, the whole data set is moved such that it is centred around the origin.

We do that by finding the mean value on score 1, score 2 and score 3.

PCA generally does this for us, but once the data set is captured, we would need to draw around it.

The longest length represents the largest variation of the data set, which is principal component 1.

The next longest is principal component 2, followed by 3, which gives us the spread of data which
most influences the data set.

[Figure: axes PC1, PC2 and PC3 drawn through the data]

We can then leave out the 3rd component.

We would expect our data to be spread across PC1.

PCA is often used as a data preprocessing step.

PC1 and PC2 are usually used to plot on graphs to see the relationship between the data.
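The centring step described above (moving the data so that it sits around the origin) can be sketched as below. This covers only that first step, not the search for the principal components themselves; the scores are made-up numbers and `center` is a hypothetical helper name.

```python
def center(points):
    """Subtract the per-dimension mean so the data is centred on the origin."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    return centered, means


# Each row is one sample: [score 1, score 2, score 3] (made-up values).
scores = [[2.0, 4.0, 1.0], [4.0, 8.0, 3.0], [6.0, 6.0, 5.0]]
centered, means = center(scores)
```

After centring, every dimension has a mean of 0, so the directions of largest variation can be measured from the origin.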
Missing Data

Imagine we had surveyed a number of people on the street and got various data.

NAME     COUNTRY     AGE         HEIGHT   STARSIGN   LIKECOFFEE
ADNAN    UK          33          170      VIRGO      YES
SARAH    BRAZIL      (missing)   133      GEMINI     NO
ALEXA    USA         5           2        PISCES     NO
ALI      INDIA       39          175      SCORPIO    YES
SPARKY   AUSTRALIA   5           35       JEDI       YES

If we have missing data in that data set, we may need to calculate or impute a value. One way we
could do this is by taking the mean of all the other values for that particular feature.

IMPUTE THE MEAN: (33 + 5 + 39 + 5) / 4 = 20.5

NAME     COUNTRY     AGE   HEIGHT   STARSIGN   LIKECOFFEE
ADNAN    UK          33    170      VIRGO      YES
SARAH    BRAZIL      21    133      GEMINI     NO
ALEXA    USA         5     2        PISCES     NO
ALI      INDIA       39    175      SCORPIO    YES
SPARKY   AUSTRALIA   5     35       JEDI       YES

In this process, we are presenting the data for an ML algorithm to make an inference, but we don't
skew our data set by having no value or 0, which would impact the ML model.

If we have too much data missing, it may be better to remove that feature entirely as it would be of
little value, or remove the row if that particular row is just missing data.

We need domain level knowledge to find outliers. You might have correct data that is perhaps mixed
up, e.g. animal ages and heights vs human ages and heights.
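Mean imputation for the AGE column can be sketched as follows, using `None` to stand for the missing value. The variable names are made up for this example.

```python
ages = [33, None, 5, 39, 5]  # AGE column, with SARAH's age missing

known = [a for a in ages if a is not None]
mean_age = sum(known) / len(known)  # mean of the values we do have

# Fill each gap with the mean; leave real values untouched.
imputed = [a if a is not None else mean_age for a in ages]
```

The mean here is 20.5, which the table shows rounded up to 21.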
We may have a dataset where we are looking for faults on a car engine. It could be that it is generally
fine, but there are a few reports of something faulty. As this is not a frequent occurrence, it is likely that
this particular data will become lost in a sea of other data. As a result, it may not be recognised by the
ML model.

There are a variety of different strategies which can be taken to help:

• Try and source more data, because the thing you are looking for is not represented as well as you
would like.

• If it is not possible to get the data, another option is to oversample the data, but then faults will
be likely to look like whatever you have in your training data

• We can synthesise the data to understand what can vary and affect the data set. That way the ML
algorithm can approximate the data

• Finally we can try a different algorithm - often people use the same algorithm frequently because
they know that algorithm and understand it.
Label and One Hot Encoding

ML algorithms are mathematical constructs, therefore they do not work with strings and instead need
numbers.

NAME     COUNTRY     AGE
ADNAN    UK          33
SARAH    BRAZIL      23
ALEXA    USA         5
ALI      INDIA       39
SPARKY   AUSTRALIA   5

So we can encode the names and also the countries as integers, which is label encoding.

The problem with doing this is that the ML algorithm won't understand that it is a country and that it
does not need to look for a relationship between the numbers, but it will still try and find one.

COUNTRY     BRAZIL   AUSTRALIA   USA   UK
UK          0        0           0     1
BRAZIL      1        0           0     0
USA         0        0           1     0
AUSTRALIA   0        1           0     0

In this scenario one hot encoding comes into play, whereby new features are introduced into
the data set: each country becomes a feature, giving a table of 0's and 1's.
This way there is no numerical relationship between the countries and no
implied hierarchy between them.
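The two encodings can be compared side by side in a short sketch. The category order (alphabetical) is an arbitrary choice made for this example.

```python
countries = ["UK", "BRAZIL", "USA", "INDIA", "AUSTRALIA"]

# Fixed list of categories; alphabetical order is just a convention here.
categories = sorted(set(countries))

# Label encoding: one integer per country, which implies an ordering.
label_encoded = [categories.index(c) for c in countries]

# One hot encoding: one 0/1 column per country, with exactly one 1 per row.
one_hot = [[1 if c == cat else 0 for cat in categories] for c in countries]
```

With label encoding, UK (3) looks "bigger" than BRAZIL (1) even though that means nothing; the one hot rows carry no such ordering.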
Logistic Regression

Supervised ML algorithm

Data is provided along with example inferences.

We are looking for patterns in the data set with examples of what we are looking for, which can only
be a yes or no - a binary outcome.

Typically it is used to understand whether this data is an example of something or not.

In the example you can see someone's resting heart rate and whether they like cats or not.

RESTING HEART RATE   70   88   65   89   78   61   69   98   82
LIKES CATS           Y    N    Y    N    Y    Y    N    N    N

[Figure: YES / NO plotted against RESTING HEART RATE, axis from 65 to 110]

In the example you can visually see that a low heart rate indicates they do and higher heart rates
indicate they do not.

A way to do this would be to draw a line using linear regression to find the best fit.

A problem with this is that there may be outliers which can skew the line and therefore lead to the
wrong inferences.

[Figure: SIGMOID FUNCTION fitted to the same data]

Instead we could fit a sigmoid function, which is not skewed like the linear regression line but
instead looks for the cut off point between the yes and no.

There are methods to fine tune this to understand what is most important.
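The sigmoid cut-off can be sketched as below. The `weight` and `bias` values are made-up numbers chosen so that the cut-off sits at a heart rate of 80; they are not fitted from the example data, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def probability_likes_cats(heart_rate, weight=-0.2, bias=16.0):
    # Illustrative, unfitted values: weight * 80 + bias = 0, so the
    # probability crosses 0.5 at a resting heart rate of 80.
    return sigmoid(weight * heart_rate + bias)


def predict(heart_rate, threshold=0.5):
    """Turn the probability into a binary YES / NO outcome."""
    return "YES" if probability_likes_cats(heart_rate) >= threshold else "NO"
```

Because the curve flattens out at 0 and 1, a very high or very low heart rate barely moves the cut-off, unlike a straight line fitted through the same points.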
Linear Regression

Supervised model

We don't just provide the core data but also the output value that we would want to infer later on.

An example inference is numeric, where the output is a range.

This can be used for financial forecasting, marketing effectiveness, risk evaluation and more related
to business.

LATITUDE   COFFEE CONSUMED
4          6
7          2
20         0
28         0
38         24
45         35
59         18
70         49
76         24

[Figure: COFFEE CONSUMED plotted against LATITUDE with a fitted line]

The example data set might be latitude for where you live on the planet and then the
amount of coffee consumed.

We can then use techniques to understand exactly where that line sits.

Although it does not go through any of the green points, it provides a generalised, statistically
valid answer.
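One technique for working out where the line sits is the closed-form least squares fit, sketched below on the latitude and coffee values from the example table. The notes do not say which technique is meant, so this is just one option, and the names `fit_line` and `predict_coffee` are made up.

```python
latitude = [4, 7, 20, 28, 38, 45, 59, 70, 76]
coffee = [6, 2, 0, 0, 24, 35, 18, 49, 24]


def fit_line(xs, ys):
    """Least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(latitude, coffee)


def predict_coffee(lat):
    """Infer coffee consumed at a latitude using the fitted line."""
    return slope * lat + intercept
```

The fitted line always passes through the point of mean latitude and mean coffee consumed, even if it touches none of the individual data points.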
Support Vector Machines (SVM)

Supervised model

It is used to classify data.

It can be used for customer classification. As an example, if we already had a classified data set we
might want to identify the high value customers.

In this example we have 2 classifications, but we need to somehow draw a line in order to classify new
data points.

[Figure: two groups of points, with the points nearest the boundary marked as SUPPORT VECTOR]

How do we best identify where we should draw the line? By identifying the boundaries of our data sets.

We would draw a hyperplane between the support vectors such that when we have a new data point
we can allocate it appropriately.
Decision Trees
Supervised algorithm

We provide training data along with labels, and they can be considered example inferences.

We ask the algorithm to look at this data and find the patterns. When we give it unlabelled data, we
ask it to discover how it fits.

It can be used for customer analysis and medical conditions.

[Figure: decision tree with root node LIKE WALKING, internal node LIKE RUNNING, and leaf nodes
CAT PERSON and DOG PERSON]

Decision trees are essentially flow diagrams which have root nodes, internal nodes and leaf nodes.

The root node is where things start. An internal node asks another question, which flows down to
another node, and the final node on a branch is the leaf node.

We don't need to have the same number of leaves across the branches.

Decision tree outputs can be binary or numeric but are generally based on a numeric question.

We can also use decision tree decision points to find out a choice like favourite colour.

[Figure: three example decision points, each leading to DOG or CAT. BINARY: WALKS (Y / N).
NUMERIC: DISTANCE (km). CHOICE: COLOUR (RED / GREEN)]
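[Figure: decision tree with RUNNING at the root and WALKING and KMS nodes below it, next to the
training data. The table has columns for liking running, liking walking, kilometres walked,
favourite colour and type (DOG or CAT). Rows as extracted:]

NO    YES   1   GREEN   DOG
NO    NO    2   BLUE    CAT
YES   YES   1   RED     DOG
YES   NO    1   GREEN   CAT
YES   NO    3   GREEN   DOG
YES   YES   4   BLUE    DOG
NO    NO    3   RED     CAT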

How do we start our root? We would need to understand which feature aligns most
closely to the question we are asking.

In this example, when analysing the data set, we see that it is 'likes running' for who is a
dog person vs a cat person.

We would then filter the data based on that feature and identify what is the next most
important feature, which in turn would make up the next node, and go through the other
branches.

You may find some of the features were not selected as they had no correlation with the
question.

We won't see the actual decision tree when it is created, but we can give it new data and
categorise the data to see its behaviour.
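Choosing the root feature can be sketched by scoring how cleanly each feature separates the labels. This example uses Gini impurity, which is one common measure and an assumption here, since the notes do not name one; the rows are a small made-up data set rather than the table above.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when all labels agree, higher when they are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, feature, label="type"):
    """Weighted impurity of the groups produced by splitting on one feature."""
    n = len(rows)
    total = 0.0
    for value in set(r[feature] for r in rows):
        subset = [r[label] for r in rows if r[feature] == value]
        total += len(subset) / n * gini(subset)
    return total


# Made-up rows for illustration only.
rows = [
    {"likes_running": "YES", "fav_colour": "RED", "type": "DOG"},
    {"likes_running": "YES", "fav_colour": "GREEN", "type": "DOG"},
    {"likes_running": "YES", "fav_colour": "BLUE", "type": "DOG"},
    {"likes_running": "NO", "fav_colour": "RED", "type": "CAT"},
    {"likes_running": "NO", "fav_colour": "GREEN", "type": "CAT"},
    {"likes_running": "NO", "fav_colour": "BLUE", "type": "DOG"},
]

features = ["likes_running", "fav_colour"]
# The root is the feature whose split leaves the least mixed groups.
root_feature = min(features, key=lambda f: split_impurity(rows, f))
```

The same scoring would then be repeated on each filtered subset to pick the next node down each branch.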
Random Forest
Random forests are supervised algorithms.

We have pre-labelled the data and ask it to infer binary, classification or numeric outputs.

It is essentially a collection of decision trees.

The problem with a decision tree on its own is that it can be inaccurate.

Random forest is a way to make decision trees more accurate.

[Figure: example decision tree with RUNNING at the root and WALKING and KMS nodes below]

When you create a decision tree you need to know what question you will place in the root node. The
random forest will check 2 different features, chosen randomly, and follow down the branch.

We build the decision tree in this way and continue until we have a decision tree with a random
variance.

We repeat this entire process a random number of times. We then take the new data, run it through
all of the trees, look at the outputs and, based on the majority output, label the new data.

[Figure: votes from the individual trees, DOG WINS]
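The majority vote at the end can be sketched as below. The three "trees" are hand-written stand-ins with made-up rules, not trained trees, so this shows only the voting step and not how a forest is built.

```python
from collections import Counter


# Stand-ins for trained trees: each looks at different features.
def tree_a(sample):
    return "DOG" if sample["likes_running"] else "CAT"


def tree_b(sample):
    return "DOG" if sample["kms_walked"] >= 3 else "CAT"


def tree_c(sample):
    return "DOG" if sample["likes_walking"] else "CAT"


def forest_predict(trees, sample):
    """Run the sample through every tree and return the majority label."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


new_sample = {"likes_running": True, "likes_walking": False, "kms_walked": 4}
prediction = forest_predict([tree_a, tree_b, tree_c], new_sample)
```

Here two trees vote DOG and one votes CAT, so DOG wins. Using an odd number of trees avoids ties for a binary label.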
K-Means

Unsupervised ML algorithm

Data is provided without labels or example inferences.

We are looking for patterns in the data set by grouping data points that are close to each other.

Typically it is used to find a chosen number of classes (clusters) in the data.

[Figure: scatter plots of the data set before and after clustering, both axes from 10 to 100]

If we want to find 3 classes of data, the algorithm makes some random guesses and places 3 points
across the dataset.

It then goes through each data point and checks which centre point it is closest to, so that each
centre point collects all of its closest data points. At this point the data classification will be
wrong, so it will move each centre point to the middle of its class.

The algorithm will then go through the cycle again, including moving the centre points, until the
distributions make sense. We need to find equilibrium, where moving the centre points does not
affect the classification.

[Figure: ELBOW PLOT of reduction in TOTAL VARIATION against number of CLUSTERS (K), from 1 to 8]

To find out how many clusters to use, we can graph the number of clusters against the
reduction in variation.

With one cluster the reduction in variation is 0, and as we increase the number of clusters we will
eventually see an elbow in the plot where the variation does not change much.

So after a certain number of clusters, it is ineffective to add more as the change in variation is minimal.
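The assign-then-move cycle can be sketched in one dimension as follows. The starting centres are fixed here (rather than random guesses) so that the run is repeatable; the data values and the function name `k_means_1d` are made up for this example.

```python
def k_means_1d(values, initial_centres, max_iterations=100):
    """Cluster numbers by repeatedly assigning them to the nearest centre."""
    centres = list(initial_centres)
    k = len(centres)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Step 1: give every value to its closest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Step 2: move each centre to the middle (mean) of its cluster.
        new_centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break  # equilibrium: moving the centres changes nothing
        centres = new_centres
    return centres, clusters


values = [1, 2, 3, 10, 11, 12, 20, 21, 22]
centres, clusters = k_means_1d(values, initial_centres=[0, 5, 25])
```

The first pass puts 3 in the wrong group, and the later passes correct it once the centres have moved. With random starting points the result can differ from run to run, which is why the algorithm is often run several times.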


K-Nearest Neighbour

Supervised algorithm

Used for classifying new data based on data that is already classified.

K-means would have found some clusters within the dataset already, however the challenge is to know
which class to associate a new data point with.

K-nearest neighbours takes into account the number of nearest neighbours to consider.

The 'k' is a number: the number of neighbouring data points to take into account in order to
classify the new data point.

[Figure: clustered data with a new data point, both axes from 10 to 100]

It should be large enough to reduce the influence of individual outlying points, however small
enough that small clusters do not get overlooked.
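Classifying a new point by its neighbours can be sketched as below. The labelled points and class names are made up, and the function name `k_nearest_neighbours` is hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def k_nearest_neighbours(labelled_points, new_point, k=3):
    """Label new_point with the majority class among its k closest points."""
    by_distance = sorted(
        labelled_points, key=lambda item: math.dist(item[0], new_point)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up, already classified data: ((x, y), class)
labelled_points = [
    ((10, 10), "A"), ((12, 14), "A"), ((15, 11), "A"),
    ((60, 40), "B"), ((65, 45), "B"), ((70, 38), "B"),
]

label = k_nearest_neighbours(labelled_points, (14, 12), k=3)
```

With k=1 a single stray point could decide the class; with k equal to the whole data set the largest class would always win, which is the trade-off described above.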
Latent Dirichlet Allocation (LDA)

Unsupervised Algorithm

It is used for classification and sentiment analysis.

It is a description of the way documents are constructed. If you have a number of documents, those
documents are made up of a number of different topics, along with multiple words which can also be
in multiple topics.

LDA does not understand what is written in the document, but it does statistical analysis to get some
idea of the content.

[Figure: each Document is made up of Topics, and each Topic is made up of words]

There are data analysis steps which are done before any processing, which involve removing
'stop words' such as 'and'. These words do not help towards understanding the content.

We then apply stemming, so that words such as learned, learning and learn are all condensed into a
single word, i.e. learn. Once this is complete, we can then tokenise the words into an array.

Finally we choose the number of topics we would want LDA to find, and this is K.

So we take all the words in our array and, if we select 3 topics to find, the algorithm will randomly
assign a topic number to each of the words.
We then calculate how often each word appears in each topic.

WORD                       TOPIC 1   TOPIC 2   TOPIC 3
MACHINE LEARNING           22        33        43
FUN RUN                    32        34        23
DEEP LEARNING              44        23        34
LAMBDA                     51        43        23
STORAGE                    33        64        54
ARTIFICIAL INTELLIGENCE    45        33        23

Once that is complete we can then check each document and how often each topic appears there.

DOCUMENT                   TOPIC 1   TOPIC 2   TOPIC 3
STORAGE                    123       23        34
MACHINE LEARNING           43        143       45
LAMBDA                     24        35        132

We take the number of times a word appears in a topic and how many times it appeared for a
particular document and multiply them together.

TOPIC 1: 51 x 24  = 1224
TOPIC 2: 43 x 35  = 1505
TOPIC 3: 23 x 132 = 3036

Whichever one comes out higher, we then reallocate to that topic. This happens as many times as
necessary until all the topics and words are complete across the documents.

We can then see what those documents are mostly about.
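
The reallocation step can be sketched as below, using the LAMBDA counts from the example in these notes. This follows the simplified multiply-and-pick-the-highest description given here; it is not a full LDA implementation, and the function name `best_topic` is just an illustrative choice.

```python
# How often the word LAMBDA appears in each topic, and how often each
# topic appears in the document (counts taken from the notes' example).
word_topic_counts = [51, 43, 23]
doc_topic_counts = [24, 35, 132]

def best_topic(word_counts, doc_counts):
    """Return (1-based topic number, scores) for the highest product."""
    scores = [w * d for w, d in zip(word_counts, doc_counts)]
    return scores.index(max(scores)) + 1, scores

topic, scores = best_topic(word_topic_counts, doc_topic_counts)
print(scores, "-> reallocate to topic", topic)
```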
Neural Networks

On the left hand side is the input layer, followed by some hidden layers and then an output layer.

Data is processed at each layer of the network and activated in order to get an inference.

On the first layer, which is the input layer, we need to load data into all of those inputs. As
an example, if it was an image, each pixel would be put into its own input.

Random values are then allocated to the connections from the input neurons and these are
referred to as weights. These are the factors used to adjust the value before it gets to the
next layer.

[Figure: inputs multiplied by weights, summed with a bias and passed to an activation function.]

Each input is multiplied by its weight and the results are added together to give a value for
the next neuron.

We also add a bias to the sum and this is passed to an activation function.
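
As a rough sketch of what a single neuron does, the function below multiplies each input by its weight, adds the bias and applies ReLU (one of the activation functions covered in these notes). The input, weight and bias values are hypothetical and chosen only for illustration.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)  # ReLU: negative sums become 0

# Hypothetical values chosen only for illustration.
out = neuron_output(inputs=[1.0, 2.0, 3.0], weights=[0.6, 0.5, 0.3], bias=0.5)
print(out)
```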

[Figure: ReLU, Sigmoid and Tanh activation functions, plotted with the input on the x axis and the activation on the y axis.]

Three common activation functions are

• ReLU
◦Returns 0 for any negative value and passes positive values through unchanged

• Sigmoid
◦Places values between 0 and 1

• Tanh
◦Is similar to Sigmoid but trends down to negative 1 on the y axis.

If we plot the input as the x value on the function, the y value is the activation which is output.

We do not tend to use Sigmoid or Tanh generally; ReLU is most commonly used.
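
The three functions can be written directly with the standard library, which makes their ranges easy to check. This is a small sketch; the sample x values are arbitrary.

```python
import math

def relu(x):
    """0 for negative inputs, otherwise the input itself."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any input into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes any input into the range -1 to 1."""
    return math.tanh(x)

for x in (-2.5, 0.0, 2.5):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  sigmoid={sigmoid(x):.3f}  tanh={tanh(x):.3f}")
```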

The bias is there to prevent our neuron from being deactivated. If the result was 0 then it
would not influence anything - the more neurons you have turned off, the less effective the
network is.

Passing the data through the network in this way is called forward propagation. At this point
the output will be wrong because all of the weights are random.

[Figure: forward propagation produces an output, the loss function measures how correct it is, and back propagation feeds this back through the network.]

Once we get to the end we apply a loss function, which is an evaluation of how far the
calculated output is from the expected result.

The next step is known as back propagation and it uses gradient descent and a learning rate to
reduce the loss. It works backwards through the network to update the weights and biases.

Each full pass of forward and back propagation over the training data is an epoch, and
repeating this over many epochs is how the network learns.
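
To make the loop of forward propagation, loss, back propagation and epochs concrete, here is a minimal sketch that trains a single neuron with one weight and one bias using gradient descent. The training data, learning rate and number of epochs are assumptions chosen for the example, and the weight starts at 0 rather than a random value so the run is repeatable.

```python
# Hypothetical training data following y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0        # fixed start instead of random, for repeatability
learning_rate = 0.05

def mean_squared_error(w, b):
    """Loss function: average squared difference from the labels."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_loss = mean_squared_error(w, b)

for epoch in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = (w * x + b) - y                # forward propagation vs label
        grad_w += 2 * error * x / len(data)    # gradient of the loss for w
        grad_b += 2 * error / len(data)        # gradient of the loss for b
    w -= learning_rate * grad_w                # gradient descent update
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(f"w={w:.3f} b={b:.3f} loss {initial_loss:.3f} -> {final_loss:.6f}")
```

A real network repeats the same idea across many weights and layers, which is what the deep learning frameworks automate.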
Convolutional Neural Networks (CNN)

Supervised Algorithm

Mainly used for classification, and mostly image classification and image detection.

The hidden layers inside the network are known as the convolutional layers.

Images generally have particular characteristics such as edges, feathers, eyes and a beak, if
it was a penguin for example.

The different layers in the network will work towards identifying these different
characteristics.

For an image, we would use a convolutional filter.

We would take the first 9 pixels (3 x 3), apply the filter to them and write the calculated
outcome onto a new image. We continue this across the whole image.

This particular filter does edge detection. We can use multiple filters which are pre-trained
by others; this is called transfer learning.
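
The sliding 3 x 3 filter can be sketched in plain Python as below. The image and the filter values are assumptions made up for the example (the notes do not give the actual filter); the kernel used here is a simple vertical edge filter, and the operation is the unflipped sliding-window sum that CNN frameworks call convolution.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    output = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4 x 5 greyscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# Assumed vertical edge filter; not taken from the notes.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

edges = convolve2d(image, kernel)
print(edges)
```

The output is large where the window overlaps the dark-to-bright boundary and 0 where the image is flat.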
Recurrent Neural Network (RNN)

Supervised Algorithm

This can be used for stock predictions, time series data and voice recognition.

This robot helps in various scenarios.

[Figure: a table of activities by hour of the day, fed into an ML model that is not an RNN.]

Let's say we do these repeated activities at various times during the day. There is a linear
relationship here of activities.

However the next day, we miss an activity at a particular time and all the times are about to
change.

A model that is not an RNN is not going to handle this very well.

There is a pattern to the activities.

So on the left would be the input layer and we would map it to the next layer.

We would imagine they all have a weight, but the key part is that whatever the output is
becomes an input on the next round.

[Figure: an ML model whose output is fed back in as memory.]

The main thing is we take the output and feed it back into the model; it has a memory of
previous predictions to influence future predictions.

Recurrent neural networks (RNN) can remember a bit

Long short-term memory (LSTM) can remember a lot
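
The idea of feeding the output back in as memory can be sketched with a single recurrent unit. The weights `w_in` and `w_back` are hypothetical fixed values for illustration; a real RNN learns them during training.

```python
import math

def rnn_step(x, previous_output, w_in=0.5, w_back=0.8, bias=0.0):
    """One round: combine the new input with the previous output."""
    return math.tanh(w_in * x + w_back * previous_output + bias)

def run_sequence(inputs):
    output = 0.0                      # no memory before the first input
    outputs = []
    for x in inputs:
        output = rnn_step(x, output)  # the output becomes an input next round
        outputs.append(output)
    return outputs

outputs = run_sequence([1.0, 0.0, 0.0])
print(outputs)
```

Even though the second and third inputs are 0, the outputs stay above 0 because the first input is still remembered, and that memory fades with each round.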


Confusion Matrix

[Figure: the same data passed through SVM, Decision Trees and Logistic Regression models.]

Ability to visualise the output from the testing that we do

We can apply different algorithms to our data but the question would be, which algorithm is
best suited to our desired inference?

We can split our data into training and testing data and use Logistic Regression, SVM or
Decision Trees.

As we have labelled data, we can push the testing data through the models and get a result
but we want to establish which is best suited for our scenario.

One of the tools to be able to do this is called a confusion matrix. This matrix maps the model
predictions on one side against the known truths on the other, such that we can see the accuracy.

                                          KNOWN TRUTHS
PREDICTED                         LIKES DOGS          LIKES CATS
                                                      (does not like dogs)
LIKES DOGS                        True positives      False positives
LIKES CATS (does not like dogs)   False negatives     True negatives

You would see TP vs FP and FN vs TN.

Simply put, a false positive means the model predicted they do like dogs when they didn't, and
a false negative means the model predicted they don't like dogs when they did.

We would do this confusion matrix across the different algorithms to be able to see which
algorithm performs better.

SVM
                        KNOWN TRUTHS
PREDICTED         LIKES DOGS    LIKES CATS
LIKES DOGS        120           98
LIKES CATS        109           200

LOGISTIC REGRESSION
                        KNOWN TRUTHS
PREDICTED         LIKES DOGS    LIKES CATS
LIKES DOGS        240           40
LIKES CATS        45            202

It's not always clear which one is better unless we understand our question in more detail; we
would then choose based on our particular use case.
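
A confusion matrix is just four counters. The sketch below builds one from a list of known truths and a list of predictions; the labels and the sample data are hypothetical and only there to show the counting.

```python
def confusion_matrix(truths, predictions, positive="likes dogs"):
    """Count TP, FP, FN and TN for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, predicted in zip(truths, predictions):
        if predicted == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts

# Hypothetical test labels and model predictions.
truths      = ["likes dogs", "likes dogs", "likes cats", "likes cats", "likes dogs"]
predictions = ["likes dogs", "likes cats", "likes cats", "likes dogs", "likes dogs"]

matrix = confusion_matrix(truths, predictions)
print(matrix)
```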
Sensitivity and Specificity

Sensitivity is the True Positive Rate (TPR / Recall) and Specificity is the True Negative
Rate (TNR).

TPR is the correct positives out of the actual positives.

TNR is the correct negatives out of the actual negative results.

                    KNOWN TRUTHS
PREDICTED     YES                 NO
YES           True positives      False positives
NO            False negatives     True negatives

Sensitivity = True Positives / (True Positives + False Negatives)

The closer the sensitivity value is to 1, the more of the actual positives are being caught.

Specificity = True Negatives / (True Negatives + False Positives)

Banks are more interested in the sensitivity score since they are looking for fraudulent activities.

It is more important to catch fraud than to avoid falsely identifying it - a false identification
can be fixed, for example the account can be unblocked if it was not fraud. Therefore the ML
model should have higher sensitivity.

This could be similar to medical scenarios too; if it turns out to be a false identification, the
doctor can use additional methods to verify.

Specificity is used, for example, when we have a child watching videos on YouTube. False positives
are not acceptable; we can put up with videos that would have been suitable not being shown, but
displaying unsuitable content will cause issues.
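
As a small sketch, the two formulas can be applied to the SVM and Logistic Regression confusion matrices from these notes. It assumes the rows of those matrices are the model predictions and the columns are the known truths.

```python
def sensitivity(tp, fn):
    """True positive rate: correct positives out of actual positives."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: correct negatives out of actual negatives."""
    return tn / (tn + fp)

# Counts read from the two confusion matrices in these notes
# (assumption: rows are predictions, columns are known truths).
models = {
    "SVM": {"TP": 120, "FP": 98, "FN": 109, "TN": 200},
    "Logistic Regression": {"TP": 240, "FP": 40, "FN": 45, "TN": 202},
}

for name, m in models.items():
    sens = sensitivity(m["TP"], m["FN"])
    spec = specificity(m["TN"], m["FP"])
    print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f}")
```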
Accuracy and precision

Accuracy is the proportion of all the predictions that were correctly identified

Precision is the proportion of predicted positives that were correctly identified

We need to be careful how we frame the question when it comes to identifying these, and do so in
a technical manner.

Accuracy = (TP + TN) / Total

Precision = TP / (TP + FP)

An accuracy of 100% means the model is likely overfit and needs to be more generalised.

A precision of 1 is possible, and means there are no false positives

We can calculate the accuracy and precision for Logistic Regression against Decision Trees, for
example, then we can see the difference between each
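
Applying both formulas to the Logistic Regression confusion matrix from these notes (TP 240, FP 40, FN 45, TN 202, assuming rows are predictions and columns are known truths) gives a quick check of the arithmetic:

```python
def accuracy(tp, fp, fn, tn):
    """Proportion of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Proportion of positive predictions that were correct."""
    return tp / (tp + fp)

acc = accuracy(tp=240, fp=40, fn=45, tn=202)
prec = precision(tp=240, fp=40)
print(f"accuracy={acc:.3f} precision={prec:.3f}")
```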
ROC/AUC
If we consider a logistic regression graph for a binary situation, i.e. likes coffee vs does
not, we can then model this behaviour. It must however be binary, and we also need to identify
where that cut off is actually located.

[Figure: logistic regression curve with the probability of liking coffee on the y axis and a horizontal cut off line.]

If we move the line up, we are increasing specificity, which means we do not want any of the
positive classifications to be incorrect.

If we move it down then we are increasing sensitivity. We don't mind if some people are
captured who are false positives, because at least the true positives are captured and we can
address the rest with further checks and balances later.

The question is, where do we draw this line? It depends on what we want to show.

The other consideration is where the best balance between sensitivity and specificity is.
Going to one extreme or the other is not going to be useful as the model will always return
the same result.

The confusion matrix can be used to identify where that line should be, to understand TP and
TN; the same is done for FN and FP.

In this example there is some test data that has been labelled as likes to drink coffee vs
does not.

                                  KNOWN TRUTHS
PREDICTED                 LIKES COFFEE       DOES NOT LIKE COFFEE
LIKES COFFEE              True positive      False positive
DOES NOT LIKE COFFEE      False negative     True negative

Everything on the right of the vertical line will be classified as liking coffee and
everything on the left as not liking.

Here we can see we correctly identified all 5 as liking coffee and we got 3 for true
negatives. We did however end up with 2 false positives and 0 false negatives.

Confusion matrix: TP = 5, FP = 2, FN = 0, TN = 3

In this example, if we now move the horizontal line up, we can see the results of the
confusion matrix change. We misclassified a single point as negative and a single point as
positive.

Confusion matrix: TP = 4, FP = 1, FN = 1, TN = 4

In this example we move the horizontal line further up. We ended up capturing 3 true
positives and all 5 of the true negatives. We did end up with 2 false negatives and no false
positives.

Confusion matrix: TP = 3, FP = 0, FN = 2, TN = 5

We now have a selection of confusion matrices. We now need to understand what we do with these.

What is the best point for our cut off point with all our data?

We could have repeated the above at a variety of different points.

This is where ROC / AUC comes into play.

If we have a graph of FPR and TPR from our calculations of the confusion matrix, we can then
plot our results. For the first coffee example, with 5 true positives, 0 false negatives, 2
false positives and 3 true negatives, TPR = 5 / (5 + 0) = 1 and FPR = 2 / (2 + 3) = 0.4.

[Figure: ROC curve with the true positive rate (sensitivity) on the y axis and the false positive rate on the x axis.]

The curve is the ROC (Receiver Operating Characteristic). The point where the upper slope
meets the flat line at the top is the cut off point for the best model with max sensitivity.
The start of the slope is the best model for max specificity.

In both cases we need to identify where on the graph the points effectively change direction.

AUC is the area under the curve and it represents how good the model is overall at
distinguishing between the different classes. The larger the area under the curve, the better
it is at distinguishing.

ROC is useful to understand the balance between sensitivity and specificity, and the AUC for
overall separability between the classes.
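
The sketch below turns the three coffee confusion matrices from these notes into ROC points and estimates the AUC with the trapezoid rule. Adding the two end points (0, 0) and (1, 1) is an assumption made so the curve spans the whole graph; with only three cut off points the area is a rough estimate.

```python
def rates(tp, fp, fn, tn):
    """Return (FPR, TPR) for one confusion matrix."""
    return fp / (fp + tn), tp / (tp + fn)

# The three cut off points from the coffee example in these notes.
matrices = [
    {"tp": 5, "fp": 2, "fn": 0, "tn": 3},
    {"tp": 4, "fp": 1, "fn": 1, "tn": 4},
    {"tp": 3, "fp": 0, "fn": 2, "tn": 5},
]

# Assumed end points: classify nobody as positive, or everybody as positive.
points = sorted([(0.0, 0.0), (1.0, 1.0)] + [rates(**m) for m in matrices])

def auc(points):
    """Area under the curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

area = auc(points)
print(points, round(area, 2))
```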
Gini Impurity

In decision trees, the algorithm goes through the data looking for the feature that represents
the biggest split. This can be calculated in various ways and Gini impurity is one of them.

What splits the data best? We need to look at each of the features.

Gini Impurity = 1 - (Probability of Dog)^2 - (Probability of Cat)^2

Sample of the data:

WALKING   RUNNING   COLOR   TYPE
NO        NO        BLUE    CAT
YES       YES       RED     DOG
YES       NO        GREEN   CAT
YES       NO        GREEN   DOG
YES       YES       BLUE    DOG
NO        NO        RED     CAT

LIKES WALKING
  Yes (120 people): DOG 97, CAT 23    Gini impurity = 0.310
  No  (98 people):  DOG 30, CAT 68    Gini impurity = 0.425

Weighted average = (Yes likes walking / All people) x Gini impurity (Yes)
                 + (No likes walking / All people) x Gini impurity (No)

FEATURE           GINI IMPURITY
LIKES WALKING     0.362
LIKES RUNNING     0.384
FAVORITE COLOR    0.371

Likes walking has the lowest weighted Gini impurity, so it best separates people who like dogs
over cats. We will use likes walking as our root node.
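
The Gini figures for the likes walking split can be reproduced with a few lines of Python, using the dog and cat counts from the example in these notes. The function names are illustrative.

```python
def gini(counts):
    """Gini impurity of one branch: 1 minus the sum of squared class probabilities."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(branches):
    """Average the branch impurities, weighted by how many people are in each."""
    everyone = sum(sum(b) for b in branches)
    return sum(sum(b) / everyone * gini(b) for b in branches)

yes_branch = [97, 23]   # likes walking: dog, cat
no_branch = [30, 68]    # does not like walking: dog, cat

print(round(gini(yes_branch), 3))
print(round(gini(no_branch), 3))
print(round(weighted_gini([yes_branch, no_branch]), 3))
```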
F1 Score

F1 is a combination of recall and precision; it takes into consideration the false positives
and false negatives in the calculation.

F1 = 2 / ((1 / Precision) + (1 / Recall))

F1 = (Recall x Precision x 2) / (Recall + Precision)

               LOGISTIC REGRESSION    DECISION TREES
SENSITIVITY    0.543                  0.864
SPECIFICITY    0.835                  0.824
ACCURACY       0.839                  0.844
PRECISION      0.857                  0.839

• F1 score can be a better measure than accuracy

• Accuracy = (TP + TN) / Total

• Whenever you see F1, it is discussed in terms of recall rather than sensitivity; that's why
it's mentioned in that manner.

• If you have an uneven class distribution then this has proved to be a better way to analyse
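
A short sketch of the F1 calculation follows. The first call uses the Logistic Regression confusion matrix counts from these notes; the second uses a hypothetical, heavily imbalanced dataset to illustrate why F1 can be more informative than accuracy when the classes are uneven.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Logistic Regression counts from the confusion matrix in these notes.
score = f1_score(tp=240, fp=40, fn=45)

# Hypothetical imbalanced data: 1000 cases, only 10 of them positive.
tp, fp, fn, tn = 1, 1, 9, 989
imbalanced_accuracy = (tp + tn) / (tp + fp + fn + tn)
imbalanced_f1 = f1_score(tp, fp, fn)

print(round(score, 3), round(imbalanced_accuracy, 3), round(imbalanced_f1, 3))
```

In the imbalanced case the accuracy looks excellent even though the model finds only 1 of the 10 positives, while the F1 score stays low.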
AWS Services, ML and DL Frameworks

• An algorithm such as CNN along with a framework such as MXNet: these two put together make up
the model, which is then trained to create inferences

• TensorFlow has been developed by Google and powers suggested videos, spam filtering etc.

• AWS have done considerable work with MXNet and SageMaker. MXNet is very good at scaling
across cloud infrastructure.

• PyTorch is runner up to TensorFlow. For established machine learning, SciKit-Learn is an easier
framework to use and natively has support for many algorithms.

TensorFlow
• Ability to put placeholders for values in the TF graph when running various models

PyTorch
• Deep learning framework built on top of Python
• PyTorch is better for recurrent neural networks
• TensorFlow has a graph so it can see where it came from for back propagation
• PyTorch needs to keep track of what happened so it can improve the model
• The autograd feature stores where calculations come from

MXNet
• Used most by SageMaker
• Shares an architecture similar to PyTorch rather than TensorFlow
• NDArray is similar to a NumPy array (np.array) and is the tensor for MXNet
• MXNet is aware of the devices it runs on, so it can see GPU and CPU
• We need it to record and watch the tensor for when it comes to back propagation, and we do
this with autograd

SciKit-Learn
• It has a number of datasets built into it already
• Getting the data, formatted and enough of it, is the biggest challenge in ML
• Digits dataset
AWS Services
Glue
• AWS Glue Crawler can create a database definition from the data stored in S3
• Glue does not store any data but makes connections, including JDBC or DynamoDB
• It can perform some ETL tasks and has some ML capabilities
• The ML algorithm can de-duplicate table record sets
• We can load up CSVs and essentially grab the schema of the dataset
• We can also glue together different datasets to have a single view

[Figure: the AWS Glue Data Catalog alongside Kinesis, S3, Athena and SageMaker.]

Athena
• It provides an SQL interface into S3
• Source data from multiple S3 locations
• Athena looks at the schema of the data, which comes from Glue
• We can do feature engineering from the original dataset to then use for analysis or to train
our algorithm

[Figure: Athena running SQL over several S3 locations using schemas from Glue.]

Kinesis
• Ingesting large amounts of data, which might be from few or many data points
• Video Streams, Data Streams, Data Firehose, Data Analytics
• Video Streams allows streaming video from connected devices for analytics, ML and other
video processing.
• Data Streams is a catch-all, general endpoint to ingest large quantities of data, which might
be sent to EC2 instances which can do the logic, or to other services like Spark on EMR.
However it is more complex to configure.
• Data Firehose is an endpoint to stream data into S3, Redshift, ES or Splunk.
• Data Analytics can process streaming data from Kinesis Streams or Firehose at scale using SQL.

[Figure: a camera sending video through Kinesis Video Streams to Rekognition, and a mobile device sending data through Kinesis Data Streams with Lambda and SNS.]
S3
• Cost effective storage for large amounts of data
• Structured data
◦CSV
◦JSON
• Unstructured data
◦Text files
◦Images
• Data lake
◦Add data from many sources
◦Define the data schema at the time of analysis
◦Much lower cost than data warehouse solutions
◦Unsuitable for transactional systems
◦Needs cataloguing before analysis

[Figure: Kinesis Firehose and other sources loading data into S3, which is used by Athena, SageMaker and EMR.]

QuickSight
• Business Intelligence (BI) tool
• Visualise data from many sources
◦Dashboards
◦Email reports
◦Embedded reports
• End user targeted

EMR

• Managed service for hosting massively parallel compute tasks.
• Integrates with storage service S3
• Petabyte scale
• Uses 'big data' tools like Spark, Hadoop, HBase

[Figure: an EMR cluster with a master node, core nodes and task nodes.]


Amazon Rekognition

Image moderation
Facial analysis
Celebrity recognition
Face comparison
Text in image

Use Cases

Create a filter to prevent inappropriate images being sent via a messaging platform. This can
include nudity or offensive text.

Enhance metadata catalog of an image library to include the number of people in each image

Scan an image library to detect instances of famous people


Amazon Rekognition Video

[Figure: a new object in S3 invokes a Lambda function that calls Rekognition; Rekognition notifies SNS, which writes to SQS, and a second Lambda function gets the results.]

In this example we start off by storing a video in an S3 bucket

We have a Lambda function which is invoked based on the new object event

The Lambda function calls Rekognition, which goes to the S3 bucket and gets the data

Rekognition will go through the data and will send a message to an SNS Topic on completion,
which will be written to an SQS Queue.

Another Lambda function will see the message in the queue and go to Rekognition to get the
completed job.

Use Cases

Detect people of interest in a live video stream for a public safety application.

Create a metadata catalog for stock video footage library

Detect offensive content within videos uploaded to a social media platform


Amazon Polly

You can enter some plain text and it will be transformed into speech

Female or male voices

Custom lexicons, which give the ability to define your own specific words and pronunciations.

SSML (Speech Synthesis Markup Language) allows you to add syntax to change the way
something is spoken, i.e. you could put an effect like 'whispered' which would say it in a
whispered tone.

There is a variety of languages like French, German, Hindi, Italian, Romanian etc.
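
As a hedged sketch of the SSML 'whispered' effect, the helper below wraps text in Polly's `amazon:effect` tag and passes it to the `synthesize_speech` call of a boto3 Polly client. The client is passed in as a parameter so the function can be tried without AWS access; the voice `Joanna` and the MP3 output format are assumptions, and the whispered effect may not be available for every voice or engine.

```python
from xml.sax.saxutils import escape

def whispered_ssml(text):
    """Wrap text in the SSML tag Polly uses for a whispered effect."""
    return (
        '<speak><amazon:effect name="whispered">'
        + escape(text)
        + "</amazon:effect></speak>"
    )

def synthesize(polly_client, text, voice_id="Joanna"):
    """Call Polly with SSML input and return the audio bytes."""
    response = polly_client.synthesize_speech(
        Text=whispered_ssml(text),
        TextType="ssml",        # tell Polly the text is SSML, not plain text
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()

# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   audio = synthesize(boto3.client("polly"), "This is a secret")
#   open("secret.mp3", "wb").write(audio)
```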

Use Cases

Create accessibility tools to 'read' web content

Provide automatically generated announcements via a public address (PA) system.

Create an automated voice response (AVR) solution for a telephony system (including Connect)

Amazon Transcribe

You can either speak directly into the mic or pass it audio files, which are then transcribed to text

Use Cases

Create a call centre monitoring solution that integrates with other services to analyse caller
sentiment

Create a solution to enable text search of media with spoken words.

Provide a closed captioning solution for online video training


Amazon Translate

We can either pass in files or translate in real-time

There is a large variety of languages

Ability to add custom terminology also

Provides a variety of metrics, such as the ability to see successful request count, throttled request
count and character count along with others

Use Cases

Enhance an online customer chat application to translate conversations in real-time

Batch translate documents within a multilingual company.

Create a news publishing solution to convert posted stories to multiple languages


Amazon Lex

Automatic speech recognition (ASR)

Natural language understanding (NLU)

Use Cases

Create a chatbot that triages customer support requests directly on the product page of a website

Create an automated receptionist that directs people as they enter a building

Provide an interactive voice interface to any application


AWS Step Functions

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows.

It allows you to stitch together services such as Transcribe and Comprehend along with other
Lambda functions and services

[Figure: speech audio is uploaded to S3, and a chain of Lambda functions calls Amazon Transcribe and Amazon Comprehend.]

In this example we recorded some audio and uploaded it into S3.

We could then call a Lambda function based on an event, which would start an AWS Step Functions
workflow which will then orchestrate the desired behaviour between the different services.

We are triggering another Lambda function which in turn will go and speak with Transcribe and kick
off a job against the S3 bucket.

We could then use another function after a period of time to check if the job has completed or not
and based on the response we can decide what we want to do next.

Once we have the desired response we can then use another Lambda function to speak with Amazon
Comprehend, which allows you to extract key phrases, entities, sentiment and language amongst
other things.

This data can then be stored in a database or used by an application
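
One way to picture the workflow is as an Amazon States Language definition, built here as a Python dictionary and serialised to JSON. This is only a sketch: the state names, the Lambda function names in the ARNs, the 30 second wait and the `$.status` field returned by the status-check function are all assumptions made for illustration.

```python
import json

# Hypothetical Lambda ARN template; REGION and ACCOUNT_ID are placeholders.
LAMBDA_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:{}"

definition = {
    "Comment": "Transcribe audio from S3, then analyse the text with Comprehend",
    "StartAt": "StartTranscription",
    "States": {
        "StartTranscription": {
            "Type": "Task",
            "Resource": LAMBDA_ARN.format("start-transcribe-job"),
            "Next": "WaitForJob",
        },
        "WaitForJob": {"Type": "Wait", "Seconds": 30, "Next": "CheckJobStatus"},
        "CheckJobStatus": {
            "Type": "Task",
            "Resource": LAMBDA_ARN.format("check-transcribe-job"),
            "Next": "IsJobComplete",
        },
        "IsJobComplete": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "COMPLETED", "Next": "AnalyseText"},
                {"Variable": "$.status", "StringEquals": "FAILED", "Next": "JobFailed"},
            ],
            "Default": "WaitForJob",   # still running: wait and check again
        },
        "AnalyseText": {
            "Type": "Task",
            "Resource": LAMBDA_ARN.format("call-comprehend"),
            "End": True,
        },
        "JobFailed": {"Type": "Fail", "Error": "TranscriptionFailed"},
    },
}

definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The Wait, Task and Choice states together give the "check after a period of time and decide what to do next" loop described in the example.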


SageMaker Overview

Ability to build, train and deploy machine learning models quickly

It covers the entire machine learning workflow: label and prepare your data, choose an
algorithm, train the model, tune and optimise it for deployment, make predictions and take action.

Ability to pull in data from various different sources (S3, EFS, FSx) and we do this using
Channel Parameters.

AWS recommend the larger instance sizes for training

Some algorithms only support GPUs

GPU instances are more expensive, but faster

There is also managed spot training and you can keep checkpoints of the model state in S3.

This can be up to 90% cheaper than on-demand instances

• SageMaker --> Training Jobs
◦In an S3 bucket we have a collection of data, i.e. cats and dogs
◦We could then pick an Algorithm source, i.e. a built in algorithm from SageMaker
◦We can also choose the type of algorithm, like Image classification
◦Then we need to decide how we wish to input the data, i.e. File or Pipe
◦Ability to select here the type of instance sizes, VPC and encryption
◦There is also the ability to use hyperparameters here such as batch size, minimum epochs etc. AWS
pre-populates a lot of this when doing it in the console.
◦We can then set our training data, validation data and output

Once you have the above, you will end up with a model which can then be used to make inferences.
SageMaker - Batch / Realtime

• Real Time
◦It is possible to do real-time inferences by allowing the application to invoke the SageMaker
Endpoint which would then call on the model.

[Figure: an application calls InvokeEndpoint on a SageMaker endpoint, which uses the model stored in S3 and a container image from ECR.]

• Batch
◦Batch Transform jobs
◦We put in the data that we want to get inferences from
◦We could then push that into our classification model, for example, to understand if we have
a high value customer

[Figure: a Batch Transform job reads input from S3, runs it through the SageMaker model container and writes the results to S3.]
SageMaker - Deploy

• SageMaker --> Models


◦Once we have created our model, we can then set a container to host the model

• SageMaker --> Endpoint Configuration
◦We can then add our model to this endpoint configuration
◦Within this we will know the model and the instance type

• SageMaker --> Endpoints
◦We create a new endpoint here and use an existing configuration which was created

At this point you could run a command and give it a new file and use the model to create an
inference.

aws sagemaker-runtime invoke-endpoint --endpoint-name catdog --body fileb://cat.png --profile sandbox ./output.json
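
The same call can be made from Python with the `invoke_endpoint` method of a boto3 `sagemaker-runtime` client. The sketch below takes the client as a parameter so it can be tried without AWS access; the `image/png` content type is an assumption about what the deployed model accepts, and `catdog` is the endpoint name from the command in these notes.

```python
def classify_image(runtime_client, image_bytes, endpoint_name="catdog"):
    """Send image bytes to a SageMaker endpoint and return the raw response body."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="image/png",   # assumption: the model accepts PNG bytes
        Body=image_bytes,
    )
    return response["Body"].read()

# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   with open("cat.png", "rb") as f:
#       print(classify_image(runtime, f.read()))
```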
