DesignSafe Bootcamp V1

This document provides an introduction to machine learning, deep learning, and artificial intelligence concepts and how they can be applied to natural hazards engineering research. It discusses the platforms available at TACC and DesignSafe for using these capabilities, and provides a hands-on example of using artificial intelligence to classify building damage from images within DesignSafe.


Introduction to ML/DL and its Applications in Natural Hazards Engineering

Dan Stanzione, Mahyar Sharifi, Zhao Zhang, Niall Gaffney

DesignSafe
Texas Advanced Computing Center
The University of Texas at Austin
2/17/20

1
Welcome!
• We are glad to have you visiting TACC and DesignSafe, and
attending this tutorial!

• Intros, and a couple of questions.

2
TACC AT A GLANCE - 2020
Personnel
185 Staff (~70 PhD)
Facilities
12 MW Data center capacity
Two office buildings, Three
Datacenters, two visualization
facilities, and a chilling plant.
Systems and Services
>Seven Billion compute hours per year
>5 Billion files, >100 Petabytes of Data,
NSF Frontera (Track 1), Stampede2
(XSEDE Flagship), Jetstream (Cloud),
Chameleon (Cloud Testbed) system
Usage
>15,000 direct users in >4,000 projects,
>50,000 web/portal users, User
demand 8x available system time.
Thousands of training/outreach
participants annually
2/15/20 3
Goals for Today
• First, to give you a basic understanding of concepts in AI/ML/DL.
• To talk about how these concepts could be applied in the
context of Natural Hazards Engineering research (a lot
more on this tomorrow and Wednesday).
• To talk about the platforms we have available here at TACC
and in DesignSafe that allow you to use these capabilities.
• To provide a hands-on example of using AI to classify
building damage from images within DesignSafe.
• Hopefully, to get you to think about how to apply this in
your research.

4
Schedule
• 1-1:30 AI/ML/DL and HPC at TACC

• 1:30-2:15 Introduction to ML/DL

• 2:15-3:00 Introduction to Keras and TensorFlow

• 3-3:30 Break

• 3:30-5 Hands-on with ML/DL Examples in Natural Hazards


• Damage classification from images with Deep Learning

• Correlating damage with other factors with Random Forest and other ML
techniques
• Both with Hurricane Harvey datasets

5
AI/ML/DL Landscape

[Diagram: nested circles showing Deep Learning (e.g., CNN and RNN) inside
Machine Learning (e.g., logistic regression and SVM) inside Artificial
Intelligence (e.g., knowledge bases)]

• These terms get used somewhat interchangeably, but formally have
different definitions. . .
• Deep learning is a subset of machine learning, which is
a subset of AI.

6
AI/ML/DL - Hype vs. Reality
• There is an enormous amount of hype around these
terms today, but there is really nothing magical about
these techniques, or how we compute them.
• Neural networks, and in fact most of the concepts in
ML/DL, have been around for decades.
• The big difference is that in the last 10 years we have
gained the computational power to get interesting results
faster than traditional methods.
• We apply about a billion times more computation to
these networks than when I used them in grad school.

7
AI/ML/DL - Hype vs. Reality
• In fact, you can think of almost all ML/DL methods in use today as
simply *statistical* models of a system, rather than physics-based
models.
• So, the questions are:
• Can statistics outperform other prediction methods?

• Do you have enough data to build good statistics?


• What does your model do when it sees something it has
never seen before? (and will this happen?)
• In 2008, the financial people discovered, “yes this
happens”.
• For all intents and purposes, ML/DL is just “advanced curve
fitting”, but using more than straight lines on engineering paper.

8
Terminology
• AI - Artificial Intelligence
• ML - Machine Learning
• DL - Deep Learning
• NN - Neural Networks (layers of interconnected “neurons”, where
each neuron computes a weighted sum of signals from other
neurons, passed through some non-linear function; see the sketch
at the end of this list).
• CNN - Convolutional Neural Net
• RNN - Recurrent Neural Net
• C vs R refers to the topology of the neuron layers (see next slide)
• In general, “Deep” Learning is using NNs with lots of layers of
neurons.
• There are also lots of other techniques in AI that don’t use NN.
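
As a rough illustration of the neuron bullet above, here is a minimal NumPy sketch; the input values, weights, bias, and the choice of a sigmoid non-linearity are all made up for the example:

import numpy as np

def sigmoid(z):
    # a common non-linear activation function
    return 1.0 / (1.0 + np.exp(-z))

# signals arriving from three upstream neurons (illustrative values)
signals = np.array([0.5, -1.2, 3.0])
# one weight per incoming connection, plus a bias
weights = np.array([0.8, 0.1, -0.4])
bias = 0.2

# a neuron: weighted sum of incoming signals, passed through a non-linearity
output = sigmoid(np.dot(weights, signals) + bias)
print(output)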

9
Neural Networks
• Feed Forward


• Convolutional

• Recurrent

10
It’s likely the networks,
neurons, frameworks,
and algorithms will
change a lot in the next
few years.
Terminology
• Training:
• Present inputs and known outputs to the network, and
iteratively adjust the weights (through a process
called backpropagation) until the network produces
an answer close to the right answer.
• Inference
• Use a pre-trained network on input data to produce
an output.

12
Places (in research) you may wish to use AI
• Artificial Intelligence (AI) is a driving application for
exascale machines, along with simulation and big data

• Data Analytics: Classification, Regression, Clustering,
Dimensionality Reduction
• Inverse Problems: Model Reconstruction, Parameter Estimation, Denoising
• Surrogate Models: Approximate expensive Simulations, Approximate
Experiments, Fill in Missing Models in Simulations
• Design and Control: Optimize Design of Experiments, Control
Instruments, Navigate State Spaces, Learn from Sparse Rewards

Credit: Kathy Yelick, in Monterey Data Conference, 2019

13
Data-driven Decision Making Pipeline
[Diagram: telescopes and detectors produce observations; data processing
(data cleaning, calibration, ...) yields processed data; training (frequency,
robustness, ...) produces a model; serving (timeliness, ...) delivers
predictions, e.g., transient events, to a controller; data storage and
archiving (persistence, versioning) underpins the pipeline]

14
Software Support for Deep Learning
• While you can produce custom code for about any method, most
of what you need is easiest to get to from common frameworks.
• For Deep Learning, PyTorch, Keras, TensorFlow
• Many ML methods in data science frameworks like Pandas
• The typical language of choice for these methods is Python —
you don’t really need to know Python for today’s exercises.
• The best way to work interactively in Python is through Jupyter
notebooks.
• The best way to run Jupyter notebooks is in a container.
• You can launch Jupyter from within DesignSafe
• (Guess how we are going to do this today)

15
Front-end
• Command Line
• Jupyter Notebook

16
Hardware Support for Deep Learning

• There is no reason you can’t do DL just fine on a regular processor.


• But traditionally in technical computing, we’ve optimized for double
precision as the “gold standard”
• When you look at those weights — you don’t need 2^64 values to tell
you if a connection between neurons is “important” or not (arguably,
you could use off or on).
• Training networks means flowing data forward and back through
those networks, usually at single or half precision.
• GPUs are well-suited for this, and have been aggressively optimizing
“low precision” for many years.
• Currently, most frameworks have a pretty substantial performance
advantage on GPU, even correcting for cost.
• Purpose-built silicon is coming that will do this faster than either (e.g.
Google TPU), as well as instruction enhancements on CPUs.

17
AI Hardware at TACC
• In general, we support AI on every platform.
• For this purpose, we will focus mostly on GPUs
• Frontera
• Longhorn
• Maverick
• Chameleon
• Today you will use Frontera for your Deep Learning
exercises.

18
Frontera Single Precision Subsystem
• Frontera is the #5 supercomputer in
the world, with more than 450,000
processors achieving 40 PetaFlops at
double precision.
• It also has a smaller subsystem
optimized for single precision. System Features:
• 90 nodes/360 GPUs
• 2x Broadwell processors
• 128 GB RAM
• 4x NVIDIA Turing Quadro RTX 5000
GPUs per node
• 150 GB local SSD
• Infiniband connected to Frontera main
filesystems (50 Petabytes).

19
Longhorn
• System Features:
• 112 IBM nodes/448 GPUs
• Dual socket POWER9 processors
with 40 total cores (20x2), 2.4GHz
(3.0GHz) turbo, up to 4 hardware
threads per core
• 256 GB RAM
• 4 NVIDIA Tesla V100 GPUs per node
• 1TB HDD per node
• IBM Elastic Storage System (ESS)
with 5 PB capacity for /home and /
scratch file systems
• EDR InfiniBand with spine-and-leaf
interconnect

20
Software

                           Frontera (CPU)   Frontera (GPU)   Longhorn (GPU)
Keras/TensorFlow/Horovod         ✔                ✔                ✔
PyTorch/Horovod                  ✔                ✔                ✔
MXNet/Horovod                    ✔                ✔
Caffe/Intel MLSL                 ✔

21
Today’s exercises
• General ML:
• Will run a Jupyter notebook on a tailored container in
a dedicated DesignSafe virtual host, using Pandas.
• Deep Learning:
• Will run on a Jupyter notebook in a tailored container
on a Frontera RTX GPU node, launched via
DesignSafe, using Keras over TensorFlow to build,
train and infer with a CNN.
• Data sets from Hurricane Harvey reconnaissance.

22
A Glimpse of TACC DL Research
• Applications
• [DLS’19] Mattmann, Chris A., Zhang, Z., Deep Facial Recognition with TensorFlow, The 3rd
Deep Learning on Supercomputers Workshop, in conjunction with SC’19, Denver, CO
• [bioRxiv’19] Fang, L., Monroe, F., Novak, S.W., Kirk, L., Schiavon, C.R., Seungyoon, B.Y.,
Zhang, T., Wu, M., Kastner, K., Kubota, Y.
and Zhang, Z., Deep Learning-Based Point-Scanning Super-Resolution Imaging. bioRxiv,
p.740548. in submission to Nature Methods
• Algorithms — Scaling DL on O(1K) Processors
• [ICPP’18] You, Y., Zhang, Z., Hsieh, C.J., Demmel, J. and Keutzer, K., Imagenet training in
minutes.
In Proceedings of the 47th International Conference on Parallel Processing (p. 1). ACM. Best
Paper
• System Software — Scaling I/O to O(1K) Processors
• [IPDPS’20] Zhang, Z., Huang, L., Pauloski, J. G., Foster, Ian T., Efficient I/O for Neural
Network Training with Compressed Data, to appear in IPDPS’20
• Architecture — Gaming GPU or Server GPU?
• [CLUSTER’19] Zhang, Z. Huang, L., Huang, R., Xu, W., Katz, D. S., “Quantifying the Impact
of Memory Errors in Deep learning”, IEEE Cluster 2019, Albuquerque, NM

23
A Few Production DL Applications @ TACC
• Face recognition to fight human trafficking
• In Collaboration with NASA JPL

• ~100 GB image data,


• 2,622 celebrities, 1.2 million images
• TensorFlow + Horovod

[1]DLS’19 Mattmann, Chris A., Zhang, Z.. Deep Facial Recognition with TensorFlow,
The 3rd Deep Learning on Supercomputers Workshop, in conjunction with SC’19, Denver, CO
[2] Courtesy image from: https://megapixels.cc/datasets/msceleb/

24
A Few Production DL Applications @ TACC
• Face recognition
• VGG16 network

• Each run takes ~12 hours on 16 NVIDIA GTX 1080 Ti


GPUs

[1]DLS’19 Mattmann, Chris A., Zhang, Z.. Deep Facial Recognition with TensorFlow,
The 3rd Deep Learning on Supercomputers Workshop, in conjunction with SC’19, Denver, CO
[2] Courtesy image from: https://megapixels.cc/datasets/msceleb/

25
A Few Production DL Applications @ TACC
• Neural image resolution enhancement
• In collaboration with Salk Institute

• ~600 GB neural image dataset


• 300K image pairs
• TensorLayer+TensorFlow+Horovod ->PyTorch+FastAI

bioRxiv’19 Fang, L., Monroe, F., Novak, S.W., Kirk, L., Schiavon, C.R., Seungyoon, B.Y., Zhang, T., Wu, M., Kastner, K., Kubota, Y.
and Zhang, Z., 2019. Deep Learning-Based Point-Scanning Super-Resolution Imaging. bioRxiv, p.740548.

26
A Few Production DL Applications @ TACC
• Neural image resolution enhancement
• ResNet-34 based UNET

• Each run of the early version takes ~16 hours on 2


NVIDIA V100 GPUs

bioRxiv’19 Fang, L., Monroe, F., Novak, S.W., Kirk, L., Schiavon, C.R., Seungyoon, B.Y., Zhang, T., Wu, M., Kastner, K., Kubota, Y.
and Zhang, Z., 2019. Deep Learning-Based Point-Scanning Super-Resolution Imaging. bioRxiv, p.740548.

27
A Few Production DL Applications @ TACC
• Traffic Video Analysis
• In collaboration with the
Transportation Department
at City of Austin
• Use deep learning methods
to automatically recognize
moving objects from video
stream, e.g. car,
pedestrian, cyclist, bus
• Convert video into indexable
objects that can be
searched / analyzed later
• Traffic volume estimation,
location based safety study.

28
A Few Production DL Applications @ TACC
• Domain informational vocabulary extraction (DIVE)
• Text based Information Extraction System
• Corpus: Plant Biology journal articles
• Extracts biological entities that may be of interest
• Combination of methodologies used
• Attempts to infer new relationships between entities
• Structure information
• Textual data
• Allows for Expert User curation of extracted/inferred data
• In collaboration with American Society of Plant Biologists (ASPB)
• NeuroNER, an easy-to-use wrapper around TensorFlow 1.0
• Bidirectional Long Short Term Memory (LSTM) network
• Learns from forward and backward sequence of tokens
• Consists of 3 layers
• Token Embedding layer
• Label Prediction layer
• Label Sequence Optimization layer

29
Questions?

Then let’s get into the details. . .


Have you tried logging in yet?

[email protected]

30
Introduction to Conventional
Machine Learning

Mahyar Sharifi
Texas Advanced Computing Center

31
Types of Machine Learning
• Supervised Learning
• Learn a function to map an input to an output.
• Given: training data + desired outputs
• If the outputs are categorical: Classification
• If the outputs are numerical: Regression

• Unsupervised Learning
• Find previously unknown patterns
• Given: training data (without desired outputs)

• Reinforcement Learning
• Rewards from sequence of actions
• Maximize a cumulative reward.

32
Examples of Supervised ML Algorithms
• Linear Regression (Regression)

• Logistic Regression (Classification)

• K-Nearest Neighbor (Classification)

• Decision Tree (Regression or Classification)

• Random Forest (Regression or Classification)

33
Steps
• Importing Libraries
• Scikit-Learn, Numpy, Pandas
• Importing Dataset
• Pandas dataframe
• Handling Missing Data
• Handling Categorical Data
• Feature Scaling
• Splitting the Dataset to Training and Test set
• Fit Model to Train Dataset
• Compute Model Performance on Test dataset
• Confusion Matrix

34
Linear Regression

Y = b0 + b1*X1 + b2*X2 + … + bn*Xn

Given (x1,y1), (x2,y2), … , (xn,yn), learn a function f(x) to predict y given x.

• Assumptions of a linear regression:
1. Linearity
2. Homoscedasticity
3. Multivariate normality
4. Independence of errors
5. Lack of multicollinearity

[Figure: a fitted regression line through scattered (x, Y) data points]
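
A minimal scikit-learn sketch of fitting such a line; the data here is synthetic and invented purely for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic data: y is roughly 2*x + 1 plus noise
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=50)

model = LinearRegression()
model.fit(X, y)                    # learns b0 (intercept_) and b1 (coef_)
print(model.intercept_, model.coef_)
print(model.predict([[4.0]]))      # predict y for a new x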

35
Logistic Regression (1)

[Figure: a linear regression line Y = b0 + b1*X fit to a continuous Y, next to
the same straight line fit to a binary outcome Y (1/0), where it fits poorly]
36
Logistic Regression (2)

Y = b0 + b1*X

Sigmoid function: p = 1 / (1 + e^(-Y))

ln(p / (1 - p)) = b0 + b1*X

[Figure: the resulting S-shaped curve of the predicted probability p̂ against X,
running from 0 to 1]

37
Logistic Regression

[Figure: the sigmoid curve of p̂ against X, with a horizontal threshold
probability line that splits predictions into 0 and 1]

• Applying sigmoid function to linear regression.


• Compute probabilities for each data point.
• Predict y according to the probability and a threshold.
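
A hedged scikit-learn sketch of these three bullets, with synthetic one-feature data and the usual 0.5 threshold assumed:

import numpy as np
from sklearn.linear_model import LogisticRegression

# synthetic binary data: larger x values tend to be labeled 1
rng = np.random.RandomState(1)
X = rng.uniform(0, 10, size=(100, 1))
y = (X[:, 0] + rng.normal(scale=1.5, size=100) > 5).astype(int)

clf = LogisticRegression()
clf.fit(X, y)

# probability p = 1 / (1 + e^-(b0 + b1*x)) for each data point
p = clf.predict_proba(X)[:, 1]
# predict 1 where the probability exceeds the threshold
y_pred = (p > 0.5).astype(int)
print((y_pred == clf.predict(X)).all())   # predict() applies the same 0.5 cut-off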

38
K-Nearest Neighbor
• Step 1: Choose the number k of neighbors.
• Step 2: Find the k nearest neighbors of the new data point (Euclidean
distance, etc.).
• Step 3: Count the number of data points in each category.
• Step 4: Assign the new data point to the category where you counted
the most neighbors.

[Figure: a new data point plotted in (X1, X2) space between two clusters,
Category 1 and Category 2]
39
Decision Tree Classification

[Figure: points in (X1, X2) space partitioned by successive splits at
X2 < 60, X1 < 70, X1 < 50, and X2 < 40, alongside the corresponding
decision tree]

• An old method.
• Reborn with upgrades: Random Forest, Gradient Boosting, etc.

40
Random Forest Algorithm
• STEP 1: Pick at random K data points from the Training set.

• STEP 2: Build the Decision Tree to these K data points.

• STEP 3: Choose the number Ntree of trees you want to build
and repeat STEPS 1 & 2.

• STEP 4: For a new data point, make each one of your Ntree
trees predict the category to which the data point belongs,
and assign the new data point to the category that wins the
majority vote.

41
Confusion Matrix
• Accuracy = (TP + TN) / Total

K-Fold Cross Validation
• Split the training dataset into 10 folds and run 10 iterations, each
time training on 9 folds and testing on the held-out fold.
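
A short scikit-learn sketch of both ideas; the random-forest classifier and the synthetic dataset are placeholders, not the bootcamp data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# confusion matrix: accuracy = (TP + TN) / total
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)
print((cm[0, 0] + cm[1, 1]) / cm.sum())

# 10-fold cross validation: 10 train/test iterations over the training data
scores = cross_val_score(clf, X_train, y_train, cv=10)
print(scores.mean(), scores.std())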
42
Introduction to Deep
Learning

Zhao Zhang
Texas Advanced Computing Center

43
History
• 60s — Cybernetics

• 90s — Connectionism + Neural Networks

• 10s — Deep Learning


• Two key factors for the on-going renaissance
• Computing capability
• Data

44
Image Classification

ImageNet Classification Top-5 Error (%)

2010            28.2
2011            25.8
2012 AlexNet    16.4
2013            11.7
2014 VGG         7.3
2014 GoogLeNet   6.7
2015 ResNet      3.6

45
Image Classification

ImageNet Classification Top-5 Error (%)

2010 (shallow)              28.2
2011 (shallow)              25.8
2012 AlexNet (8 layers)     16.4
2013                        11.7
2014 VGG (19 layers)         7.3
2014 GoogLeNet (22 layers)   6.7
2015 ResNet (152 layers)     3.6

46
Introduction to Deep Learning
• Overview

• From Linear Regression to Neural Network

• DL and HPC

• TACC Hardware and Software

• Introduction to Keras, TensorFlow, and Horovod

47
Linear Regression
• Example: predicting house price from square footage

[Figure: scattered (square footage, price) points with a fitted line y = w*x;
the vertical gap at each point i is the residual w*x(i) - y'(i)]

• Determine a function y = w*x to minimize
Loss = 1/2 * ∑(w*x(i) - y'(i))²
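
A minimal NumPy sketch of minimizing this loss by gradient descent; the square footages, prices, and learning rate are invented for illustration:

import numpy as np

# invented data: price y' (in $1000s) vs. square footage x (in 1000 sq ft)
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y_true = np.array([310., 450., 590., 740., 910.])

w = 0.0
lr = 0.01
for step in range(1000):
    # Loss = 1/2 * sum((w*x(i) - y'(i))^2), so dLoss/dw = sum((w*x(i) - y'(i)) * x(i))
    grad = np.sum((w * x - y_true) * x)
    w -= lr * grad

print(w)   # learned price per 1000 sq ft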
48
Model Generalization
• In practice, we divide a labeled training dataset into two
parts, e.g., 80% and 20%, referred to as the training and
validation datasets, respectively.
• We derive the value of w using the training dataset.
• The value of w can be referred to as the model.
• Then we apply the model to the validation dataset and
compare the predictions with the labels.
• The difference between the prediction and the label is
referred to as the error or loss.
• A good model has low training error and low validation
error.
• This is referred to as good generalization.
49
In practice
• We may use a bias term: y = w*x + b, and may add a
regularization term to the loss, e.g., 0.5*𝜆*w²
• We use a vector X = {x1, x2, …, xn} as the set of
features
• We may use the gradient descent algorithm to find the
w with minimum error
• We may use cross-entropy as the error/loss instead of the
distance

50
From Linear Regression to Neural Network

[Diagram: input X feeds a linear model (w, b) producing the prediction
y = w*x + b, followed by activate(y); together these form a neuron, or unit]

• Activation function: converts a real value to a binary value (0 or 1),
in the analogy of a neuron firing or not

51
From Linear Regression to Neural Network

[Diagram: input X feeds a hidden layer of three units, which feeds an output
layer producing the prediction y; every unit has its own (w, b) followed by
activate(y'). The notation w1,0, b1,0 refers to the model in Layer 1, Unit 0]
52
From Linear Regression to Neural Network
• Now we have labeled data
• We can calculate y and the error with label y’
• We can then update w2,0
• How can we update w1,0, w1,1, w1,2?

[Diagram: the same network, highlighting the output unit (w2,0, b2,0) and the
hidden-layer weights that also need to be updated]
53
From Linear Regression to Neural Network
• The back-propagation algorithm
• W1,0 should be updated as W1,0 = W1,0 - 𝜆 * ∂Loss/∂W1,0
• By the Chain Rule:
∂Loss/∂W1,0 = ∂Loss/∂y2,0 * ∂y2,0/∂Activate1,0 * ∂Activate1,0/∂y1,0 * ∂y1,0/∂W1,0

[Diagram: the quantities the chain rule traverses, numbered 1-4:
(1) y1,0 = w1,0*X1 + b1,0
(2) X1,0 = Activate(y1,0)
(3) y2,0 = w2,0*X1,0 + b2,0
(4) Loss = 1/2 * (y2,0 - y')²]
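
A NumPy sketch of this chain rule on one hidden unit; the input, label, initial weights, and sigmoid activation are all assumptions made for the example:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y_label = 0.7, 1.0          # one made-up training example
w10, b10 = 0.5, 0.1            # layer 1, unit 0
w20, b20 = -0.3, 0.2           # layer 2, unit 0
lr = 0.1

# forward pass (steps 1-4 from the slide)
y10 = w10 * x + b10                  # (1) y1,0 = w1,0*X1 + b1,0
a10 = sigmoid(y10)                   # (2) X1,0 = Activate(y1,0)
y20 = w20 * a10 + b20                # (3) y2,0 = w2,0*X1,0 + b2,0
loss = 0.5 * (y20 - y_label) ** 2    # (4) Loss = 1/2 * (y2,0 - y')^2

# backward pass: the chain rule, term by term
dL_dy20 = y20 - y_label              # dLoss/dy2,0
dy20_da10 = w20                      # dy2,0/dActivate1,0
da10_dy10 = a10 * (1.0 - a10)        # dActivate1,0/dy1,0 (sigmoid derivative)
dy10_dw10 = x                        # dy1,0/dW1,0
dL_dw10 = dL_dy20 * dy20_da10 * da10_dy10 * dy10_dw10

w10 -= lr * dL_dw10                  # W1,0 = W1,0 - lr * dLoss/dW1,0
print(loss, dL_dw10, w10)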
54
From Linear Regression to Neural Network
• Stochastic Gradient Descent
• For each iteration, we take a small sample of size n (e.g.,
n=512) and update the parameters based on the
averaged gradients

[Diagram: the same network, trained on one mini-batch at a time]

55
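
A sketch of the mini-batch idea for the single-weight model from the earlier slides; the data, batch size of 4, and learning rate are invented (real training typically uses something like n = 512):

import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 5, size=100)
y_true = 3.0 * x + rng.normal(scale=0.2, size=100)

w, lr, batch = 0.0, 0.02, 4
for step in range(500):
    # pick a small random mini-batch instead of the full dataset
    idx = rng.choice(len(x), size=batch, replace=False)
    xb, yb = x[idx], y_true[idx]
    # average the per-example gradients of 1/2 * (w*x - y')^2
    grad = np.mean((w * xb - yb) * xb)
    w -= lr * grad

print(w)   # should end up near 3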
From Linear Regression to Neural Network
• The notion of Epoch
• The time by which every training data item is visited
once
• So for 1,200,000 images with a 512 mini-batch size,
an epoch takes roughly 2,400 iterations

• How many epochs is enough?


• Case by case
• A somewhat standard practice uses 100 epochs for
AlexNet and 90 epochs for ResNet-50

56
Convolutional Neural Network
• What we just saw is a multi-layer perceptron (MLP)
network
• If any layer performs a convolution operation, it is
called a convolutional neural network (see the sketch below)
• Often coupled with a pooling operation
• Example applications:
• Image classification
• Object detection
• Autonomous driving
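
As a sketch of the idea, here is a minimal convolutional network in Keras; the tf.keras import path, layer sizes, 224x224 RGB input, and 5 output categories are all illustrative assumptions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # convolution layer: 32 learned 3x3 filters slide over the image
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),            # pooling: downsample each feature map
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(5, activation='softmax'),  # e.g. 5 damage categories
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()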

https://ikhlestov.github.io/pages/machine-learning/convolutions-types/ 57
Recurrent Neural Network
• Recurrent Neural Network is another typical neural
network architecture, mainly used for ordered/sequence
input
• RNNs provide a way of using information about
Xt-i, …, Xt-1 for inferring Xt (see the sketch below)
• Example applications:
• Language models, i.e. auto correction
• Machine Translation
• Auto image captioning
• Speech Recognition
• Autogenerating Music
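
A minimal recurrent model in Keras along the same lines; the LSTM layer, the 10-step sequence of 8 features, and the single regression output are illustrative assumptions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # read a sequence of 10 steps with 8 features each, carrying a hidden
    # state forward so information about X(t-i), ..., X(t-1) informs X(t)
    LSTM(32, input_shape=(10, 8)),
    Dense(1),   # predict the next value in the sequence
])
model.compile(optimizer='adam', loss='mse')
model.summary()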

58
Generative Adversarial Network

Courtesy image from O’Reilly

59
Deep Reinforcement Learning

https://skymind.ai/wiki/deep-reinforcement-learning

60
AI/ML/DL and HPC
• Artificial Intelligence (AI) is a driving application for
exascale machines, along with simulation and big data

• Data Analytics: Classification, Regression, Clustering,
Dimensionality Reduction
• Inverse Problems: Model Reconstruction, Parameter Estimation, Denoising
• Surrogate Models: Approximate expensive Simulations, Approximate
Experiments, Fill in Missing Models in Simulations
• Design and Control: Optimize Design of Experiments, Control
Instruments, Navigate State Spaces, Learn from Sparse Rewards

Credit: Kathy Yelick, in Monterey Data Conference, 2019

61
Notions
• Neural Network Architecture
  • Multi-layer Perceptron
  • Convolutional Neural Network
  • Recurrent Neural Network
• Activation, Loss, and Optimization
  • Activation Function
  • Loss Function
  • Back-propagation
  • Gradient Descent
  • Stochastic Gradient Descent
• Training and Validating
  • Training Dataset
  • Validation/Test Dataset
  • Training Accuracy
  • Validation/Test Accuracy
  • Training Loss
  • Validation/Test Loss
  • Epoch
  • Iteration/Step

62
Introduction to Scikit-Learn
Mahyar Sharifi
Texas Advanced Computing Center
Scikit-Learn
• Simple and efficient tools for
predictive data analysis

• Accessible to everybody,
and reusable in various
contexts

• Built on NumPy, SciPy, and


matplotlib

• Open source, commercially


usable
https://scikit-learn.org/

64
Importing Libraries and Data
• import pandas as pd
• import numpy as np
• import matplotlib.pyplot as plt

• Harvey = pd.read_pickle('Merged_Harvey_2G.pkl')

65
Building X and Y
• Creating the matrix of features X and the dependent variable Y
• X = Harvey.iloc[:, [1,2,3,4,5,6,7]].values
• y = Harvey.iloc[:, 0]

[Table: the first column is the label Y; the remaining columns form the
matrix of features X]

66
Handling Categorical Variables
• from sklearn.preprocessing import LabelEncoder, OneHotEncoder
• labelencoder_X = LabelEncoder()

• X[:,3] = labelencoder_X.fit_transform(X[:,3]) #roof_shape


• onehotencoder = OneHotEncoder(categorical_features = [3])
• X = onehotencoder.fit_transform(X).toarray()

Roof_Shape Encode Roof_Shape Dummy1 Dummy2 Dummy3

Complex 1 Complex 1 0 0

Complex 1 Complex 1 0 0

Hip 2 Hip 0 1 0

Gable 3 Gable 0 0 1

Gable 3 Gable 0 0 1

Complex 1 Complex 1 0 0
67
Building Train and Test Data
• #Splitting the dataset into the Training set and Test set
• from sklearn.model_selection import train_test_split

• X_train, X_test, y_train, y_test


= train_test_split(X, y, test_size = 0.25, random_state = 4)

68
Feature Scaling
• from sklearn.preprocessing import StandardScaler
• sc = StandardScaler()
• X_train = sc.fit_transform(X_train)
• X_test = sc.transform(X_test)

• This process is not needed when fitting a decision tree,
as it is not based on Euclidean distance.

69
Fitting Models
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

from sklearn.ensemble import RandomForestClassifier


classifier = RandomForestClassifier(n_estimators=100, criterion = 'entropy',
random_state = 0)
classifier.fit(X_train, y_train)

• y_pred = classifier.predict(X_test)

70
Model Performance
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels
plot_confusion_matrix(y_test, y_pred,
classes=class_names, title='Confusion matrix')
plt.show()

71
Feature Importance
fi = pd.DataFrame({'feature': list(X_train.columns),
                   'importance': classifier.feature_importances_}
                  ).sort_values('importance', ascending=False)

72
Launching JupyterHub in
DesignSafe and setup for tutorial

73
Open and login to DesignSafe Website,
Click on Research Workbench,
Select Data Depot.

74
• Click on Community Data
• Open Machine_Learning_Bootcamp

75
• Select “ML_DesignSafe_Tutorial.ipynb”
• Click Copy
• Select “MyData” from dropdown
• Click copy to “username”

76
Launch DS JupyterHub
• Go to http://jupyter.designsafe-ci.org/
• Login using your DesignSafe/TACC account.
• Start your server.
• Go to “my data”.

77
Open “ML_DesignSafe_Tutorial.ipynb”
Introduction to Keras/
TensorFlow

Zhao Zhang
Texas Advanced Computing Center

79
TensorFlow
• Product of Google Brain team.
• Open source symbolic math library ideal for DL
computations.
• Build up computational graphs operating on
n-dimensional arrays (tensors)
• Low level API, difficult to program
• Initial release 2015
• Version 1.0.0 release Feb 2017
• Current 1.15.2 and 2.1.0 release Jan 2020

80
Keras
• Keras is a Python API wrapping lower level Deep
Learning (DL) frameworks including Tensorflow, Theano,
and CNTK.
• Philosophy: “Being able to go from idea to result with
the least possible delay is key to doing good research.”
• Original author: Google engineer François Chollet
• Provides many common building blocks for building DL
models: layers, optimizers, activation functions
• Convenience functions for processing common data
types: image and text

81
Keras Programming Interface
• Constructing Models — Sequential and Functional API
• Setup Input Stream — Data Generator API
• Instrumenting Training — Callback API
• Inference/Serving — Prediction API

82
Constructing Models — Functional API
• inputs = Input(shape=(8,))

• x = Dense(4, activation='relu')(inputs)

• x = Dense(8, activation='relu')(x)

• predictions = Dense(1, activation='sigmoid')(x)

• model = Model(inputs=inputs, outputs=predictions)

[Diagram: the 8-element input vector X = (X0, X1, …, X7) feeding the network]
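
Once constructed, the model is compiled and trained like any other Keras model. A hedged sketch continuing from the model above, with random placeholder data; the optimizer, loss, batch size, and array shapes are assumptions, not from the slides:

import numpy as np

# placeholder data matching the (8,) input and single sigmoid output above
X_train = np.random.rand(256, 8)
y_train = np.random.randint(0, 2, size=(256, 1))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=16, epochs=5, validation_split=0.2)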

83
Extend a Pre-trained Model

[Figure: VGG16 architecture, ending in 1x1x4096 fully connected layers and a
1x1x3 output layer]

credits: https://neurohive.io/en/popular-networks/vgg16/

84
Loading a Pre-trained Model
• input_tensor = Input(shape=(224,224,3))
• vgg_model = VGG16(weights='imagenet', include_top=False,
input_tensor=input_tensor)

• x = vgg_model.get_layer('block5_pool').output
• x = Flatten()(x)
• x = Dense(512, activation='relu')(x)
• x = Dense(2, activation='softmax')(x)

• model = Model(input=vgg_model.input, output=x)

85
Data Generator API
• A natural way to feed training data to models is to
• Place training items in the file system, with each
category in one directory
• Organize the validation data the same way
├── Train
│ ├── C0
│ │ ├── 001c4ec9-a5a3-4ef8-8154-876d3f54a7eb.jpg
│ │ ├── 0073a751-e33f-4748-b0e0-12ab9306fe8d.jpg
│ │ ├── …
│ └── C4
│ ├── 000f1691-619b-4d89-849c-33e416dff150.jpg
│ ├── 0056732e-9b52-4b7e-ac67-e41a05f37116.jpg
│ ├── …

86
Data Generator API
• datagen = ImageDataGenerator()

• train_it = datagen.flow_from_directory('Dataset_binary/
Train/', target_size=(224,224),
class_mode='categorical', batch_size=16, shuffle=True)

• val_it = datagen.flow_from_directory('Dataset_binary/
Validation/', target_size=(224,224),
class_mode='categorical', batch_size=1, shuffle=False)

87
Data Augmentation
• datagen = ImageDataGenerator(
• rotation_range=40,
• width_shift_range=0.2,
• height_shift_range=0.2,
• shear_range=0.2,
• zoom_range=0.2,
• horizontal_flip=True,
• fill_mode='nearest'
• )

88
Connecting Data Generator and Model
• model.compile(loss='categorical_crossentropy',
• optimizer=opt,
• metrics=['accuracy'])
• print(model.summary())

89
Connecting Data Generator and Model
Layer (type)                 Output Shape            Param #

input_1 (InputLayer)         (None, 224, 224, 3)     0

block1_conv1 (Conv2D)        (None, 224, 224, 64)    1792

block1_conv2 (Conv2D)        (None, 224, 224, 64)    36928

block1_pool (MaxPooling2D)   (None, 112, 112, 64)    0

block2_conv1 (Conv2D)        (None, 112, 112, 128)   73856

block2_conv2 (Conv2D)        (None, 112, 112, 128)   147584

block2_pool (MaxPooling2D)   (None, 56, 56, 128)     0

block3_conv1 (Conv2D)        (None, 56, 56, 256)     295168

block3_conv2 (Conv2D)        (None, 56, 56, 256)     590080

block3_conv3 (Conv2D)        (None, 56, 56, 256)     590080

block3_pool (MaxPooling2D)   (None, 28, 28, 256)     0

block4_conv1 (Conv2D)        (None, 28, 28, 512)     1180160

block4_conv2 (Conv2D)        (None, 28, 28, 512)     2359808

block4_conv3 (Conv2D)        (None, 28, 28, 512)     2359808

block4_pool (MaxPooling2D)   (None, 14, 14, 512)     0

block5_conv1 (Conv2D)        (None, 14, 14, 512)     2359808

block5_conv2 (Conv2D)        (None, 14, 14, 512)     2359808

block5_conv3 (Conv2D)        (None, 14, 14, 512)     2359808

block5_pool (MaxPooling2D)   (None, 7, 7, 512)       0

90


Callbacks
• Callbacks let you instrument the training process
• Examples:
• Checkpointing
• ReduceLROnPlateau
• EarlyStopping

91
Callbacks
• reduce_lr =
ReduceLROnPlateau(monitor='val_accuracy',
factor=0.1, patience=5, min_lr=1e-8)

• filepath="model-{epoch:02d}-{val_accuracy:.2f}.hdf5"
• checkpoint = ModelCheckpoint(filepath,
monitor='val_accuracy', verbose=1,
save_best_only=True, mode='max')

92
Training
• model.fit_generator(train_it,
• steps_per_epoch=83,
• callbacks = [reduce_lr, checkpoint],
• validation_data=val_it,
• validation_steps=363,
• epochs=5)

93
Training
• Epoch 1/5
• 1/83 [..............................] - ETA: 11:42 - loss: 9.0094 - accuracy: 0.2500
• 2/83 [..............................] - ETA: 5:50 - loss: 8.9650 - accuracy: 0.2812
• 3/83 [>.............................] - ETA: 5:43 - loss: 8.8843 - accuracy: 0.2917
• 4/83 [>.............................] - ETA: 5:28 - loss: 9.3953 - accuracy: 0.2656
• 5/83 [>.............................] - ETA: 5:29 - loss: 8.7584 - accuracy: 0.3125
• …
• 80/83 [===========================>..] - ETA: 12s - loss: 7.5598 - accuracy: 0.3901
• 81/83 [============================>.] - ETA: 8s - loss: 7.5693 - accuracy: 0.3899
• 82/83 [============================>.] - ETA: 4s - loss: 7.5603 - accuracy: 0.3897
• 83/83 [==============================] - 436s 5s/step - loss: 7.5605 - accuracy: 0.3903 - val_loss: 0.8355 -
val_accuracy: 0.5179

94
Inference/Serving
• l_model = load_model("models/model-12-0.71.hdf5")
• img = image.load_img('Dataset_2/Validation/
C4/8108cbbf-60ca-47d8-af13-2e3603a5c30e.jpg',
target_size=(224,224))
• img = np.expand_dims(img, axis=0)
• y_pred = l_model.predict(img)
• print(np.argmax(y_pred))

95
Tuning — Model Structure
• Number of layers
• Unit count
• Variable initialization

96
Tuning — Hyperparameter
• Learning rate
• Momentum
• Penalty in logistic regression
• Loss in SGD

97
Running the application and
setting up for the tutorial

98
Go to Designsafe

99
Click to launch workbench

100
Login if you need

101
Use your designsafe/TACC account

102
This is the workbench

103
You will find the app in Data Processing

104
Click Jupyter to get the form

105
106
107
Fill out email and duration

108
Click Run to submit

109
Job will appear in status

110
Job will go to Running

111
Go to your email and find the email from
DesignSafe

112
Copy password to clipboard and click link

113
Jupyter login will load

114
Enter password and Log in

115
All notebooks will be in your home

Do not delete this file until you are done

116
Hands-on of DL with
Natural Hazard Example

117
Hurricane Harvey Field Recon.
• Door to Door collection of perishable data on Wind-
induced Residential Building Damage in Texas.

• Event: Hurricane Harvey (2017)

• Published in DesignSafe
• David Roueche et al.
• DOI:10.17603/DS2DX22

118
119
Introducing Dataset
• Building Assessments focusing on the performance of
residential buildings.

120
Dataset
• Each row is a building, with various columns describing it:

• Building Performance (0-4 Scale)


• Age, Roof Cover and Shape, Photos, etc.

121
Classification Problem
• Using conventional ML algorithms, classify the
performance of a building given its properties.

• Algorithms: Decision tree, Random Forest

• DesignSafe JupyterHub is used to implement the


classification script.

• Scikit-Learn module within a Python notebook is used.

122
Problem Variables
• Y (Dependent Variable):
• Overall Building Condition (0, 1, or 2)

• X (Input Feature Matrix):


• Age (Numerical Variable)
• Max Wind Speed (Numerical Variable)
• Number of Stories (Numerical Variable)
• Roof Shape (Categorical Variable)
• Roof Cover (Categorical Variable)
• Wall Cladding (Categorical Variable)
• Structural Framing System (Categorical Variable)

123
Exercise 0
• Open ML-DesignSafe-Tutorial.ipynb

• Apply the Decision Tree and Random Forest classifiers to the dataset
with 2 classes.

• Find out which model performs better.

• Find the two most important features to this problem.

• Train for 3 class problem and compare your results.

124
Exercise 1
• Open DesignSafe-NaturalHazard-Tutorial-Train-1st.ipynb
• Run through the cells
• Train for the 1st time
• Tasks:
1. Monitor val_accuracy change along epochs
2. Monitor val_accuracy vs. train_accuracy

125
Exercise 2
• Open DesignSafe-NaturalHazard-Tutorial-
Train-2nd.ipynb
• Run through the cells
• Train for the 2nd time
• Tasks:
1. Pay attention to the data augmentation code
2. Monitor val_accuracy vs. train_accuracy and check
if overfitting exists

126
Exercise 3
• Open DesignSafe-NaturalHazard-Tutorial-
Train-3rd.ipynb
• Run through the cells
• Train for the 3rd time
• Tasks:
1. Pay attention to label smoothing in the loss function
2. Pay attention to the learning rate reducer
3. Monitor val_accuracy change along epochs

127
Exercise 4
• Open DesignSafe-NaturalHazard-Tutorial-Infer.ipynb
• Run through the cells
• Visualize a selected image, then predict using the trained
model
• Tasks:
1. See if predictions match labels
2. Randomly choose images and run predictions

128
Discussion
• What is limiting the model performance?

2 Category 3 Category 5 Category

val_acc 92% 72% 42%

• Model capacity
• Data
• Quality
• Imbalance among categories
• Others

129
