04_MLModelingBasics

The lecture covers the basics of machine learning and modeling, focusing on supervised learning, computational notebooks, and the use of Scikit-Learn for data analysis. Key topics include the structure of data, model validation techniques, and the concepts of overfitting and underfitting. The session also includes practical demonstrations using decision trees and discusses potential improvements through ensemble methods.

Lecture 4: Machine Learning and Modeling Basics

Outline
▪ IPython and Notebooks

▪ Supervised machine learning with classification/regression

▪ Deep learning with neural networks

Thanks to Christian Kaestner for preparing the material for these slides
© 2024 Fraunhofer USA, Inc. - Center Mid-Atlantic
© 2024 University of Maryland
Learning Goals
▪ Explain the benefits and drawbacks of notebooks
▪ Demonstrate effective use of computational notebooks
▪ Understand how machine learning learns models from labeled data (basic
model)
▪ Explain the steps of a typical machine learning pipeline and their
responsibilities and challenges
▪ Understand the role of hyper-parameters
▪ Appropriately use vocabulary for machine learning concepts
▪ Evaluate a machine-learned classifier

Computational Notebooks
▪ Origins in "literate programming", interleaving text and code, treating programs
as literature (Knuth'84)
▪ First notebook in Wolfram Mathematica 1.0 in 1988
▪ Document with text and code cells, showing execution results under cells
▪ Code of cells is executed, per cell, in a kernel
▪ Many notebook implementations and supported languages, Python + Jupyter
currently most popular

IPython and Jupyter Notebooks
▪ Interactive Python – command shell
▪ Persistent kernel – remembers variables/objects
▪ Notebook – markup text and code, typically alternating
▪ Able to display documentation and graphics after code blocks
▪ Downside – persistence
▪ E.g.: delete a variable's definition from a code block and re-run the block; the variable is
still stored in the kernel and callable
▪ Often used for prototyping data science projects; migration to production is a current topic of research
Google Colab
• Allows you to run Python code in your browser. Uses Jupyter
Notebooks.
• Comes preloaded with several popular data science libraries:
NumPy, scikit-learn, PyTorch, etc.
• Most features provided by IPython notebooks. Some very cool ones!
• Viewing a notebook is open to everyone; running it is free but requires a Google login
• Some demos will use Colab, others will use VMs

Scikit Learn: Introduction
▪ Most popular machine learning toolkit, used by data scientists every day
around the world.
▪ Created by David Cournapeau during a Google Summer of Code
▪ Brought together and implemented common ML algorithms from literature
• Provides tools for standard statistical machine learning.
• Actively under development, last major release May 2024 (minor release
July 2024)
• Easy modeling using standard ML algorithms such as classification,
regression, clustering and several more.
• Well documented API with examples

Scikit Learn: Datasets
• Well, the first thing you need for machine learning is data. Scikit-
learn provides us some assistance with this.
• The library comes with some predefined toy datasets that you
can load, while larger, more comprehensive datasets are also
supported which you must fetch
• Often, data is represented as a basic table, i.e., a two-dimensional grid
of data, in which the
• rows represent individual elements of the dataset
• columns represent quantities related to each of these elements.

Features and Arrays
▪ Features matrix
▪ A matrix of data is also called the features matrix, with shape
[n_samples, n_features]
▪ By convention, this features matrix is often stored in a variable
named X, is two dimensional in shape and usually a Numpy array.
▪ The samples (i.e., rows) always refer to the individual objects
described by the dataset. In the Iris dataset, the sample is a flower.
▪ The features (i.e., columns) always refer to the distinct
observations that describe each sample in a quantitative manner.
Usually numbers or booleans.

▪ Target array
▪ In addition to the feature matrix X, we also generally work with a
label or target array, which by convention we will usually call y.
▪ The target array is usually one dimensional, with length
n_samples, and is generally contained in a NumPy array.
▪ The target array may have continuous numerical values, or
discrete classes/labels.

Target array
▪ Essentially, the target array is usually the quantity
we want to predict from the data: in statistical terms, it is
the dependent variable.
▪ For example, with the Iris dataset, we may wish to construct a
model that can predict the species of flower based on the other
measurements; in this case, the species column would be
considered the target array.

Scikit Learn - Estimators
▪ Every machine learning algorithm in Scikit-Learn is implemented via the
Estimator API, which provides a consistent interface for a wide range of machine
learning applications.

▪ Most commonly, the steps in using the Scikit-Learn estimator API are as follows:
1. Choose a class of model by importing the appropriate estimator class from Scikit-Learn.
2. Choose model hyperparameters by instantiating this class with desired values.
3. Arrange data into a features matrix and target vector.
4. Fit the model to your data by calling the fit() method of the model instance.
5. Apply the Model to new data. For supervised learning, often we predict labels for unknown
data using the predict() method.
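
The five steps above can be sketched as follows. This is a minimal illustration, not the course's demo code; the choice of the Iris dataset, a decision tree, and `max_depth=3` are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier  # step 1: choose a class of model

# Step 3: arrange data into a features matrix X and a target vector y.
iris = load_iris()
X, y = iris.data, iris.target

# Step 2: choose hyperparameters by instantiating the class (values are illustrative).
model = DecisionTreeClassifier(max_depth=3, random_state=0)

# Step 4: fit the model to the data.
model.fit(X, y)

# Step 5: apply the model; here we simply predict labels for the first five samples.
predicted = model.predict(X[:5])
print(predicted)
```

Every other scikit-learn estimator follows the same instantiate / `fit` / `predict` pattern, so swapping in a different model class changes only the import and the constructor call.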

Scikit Learn - Model Validation with Holdout sets
▪ In principle, model validation is very simple: after choosing a model and its
hyperparameters, we can estimate how effective it is by applying it to data
with known labels and comparing the predictions to the known values.

▪ We want to evaluate the model on data it has not seen before, and so we will
split the data into a training set and a testing set.
▪ In some data sets there will be a second data set available. Other times we
will need to do the split ourselves. We hold back some subset of the data from
the training of the model, and then use this holdout set to check the model
performance.
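
A holdout split can be sketched with scikit-learn's `train_test_split`. The dataset, the 25% holdout fraction, and the decision tree are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold back 25% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Accuracy on the training data vs. accuracy on the holdout set.
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"train={train_accuracy:.2f} holdout={test_accuracy:.2f}")
```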
Scikit Learn - Model Validation via cross-validation

▪ One disadvantage of using holdout sets for model validation is that
we lose a portion of our data for model training. With a 50/50 split, for example, half
the dataset does not contribute to the training of the model!
This is particularly problematic for small datasets.

▪ One way to address this is to use cross-validation; that is, to do a
sequence of fits where each subset of the data is used both as a
training set and as a validation set.

▪ Cross-validation simply repeats the experiment multiple times,
using all the different parts of the training set as validation sets,
and gives a more accurate indication of how well the model
generalizes to unseen data.
▪ Downside: Multiplies the evaluation time
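
As a rough sketch, scikit-learn's `cross_val_score` runs the repeated fit-and-evaluate loop in one call. The dataset, the model, and `cv=5` are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# cv=5 fits the model five times, each time validating on a different fifth of the data.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

mean_accuracy = scores.mean()
print(scores, mean_accuracy)
```

The five fits are the cost mentioned above: evaluation takes roughly five times as long as a single holdout run.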

Define Machine Learning (simplified, applicable only to the next couple of
slides)

▪ learn a function (called model)

▪ by observing data
▪ Typically used when writing that function manually is hard because the
problem is hard or complex.
▪ Examples:
• Detecting cancer in an image
• Transcribing an audio file
• Detecting spam
• Predicting recidivism
• Detecting suspicious activity on a credit card
Supervised Machine Learning
▪ Given a training dataset $D$ containing instances $(x_i, y_i)$

▪ which consist of features $x_i$ and a corresponding outcome label $y_i$,

▪ learn a function $f(x) \to y$

▪ that "fits" the given training set and "generalizes" to other data.

Example: House Price Analysis
▪ Given data about a house and its neighborhood, what is the likely sales price
for this house?

Training Data for House Price Analysis
▪ Collect data from past sales

Crime Rate   %Large Lots   %Industrial   Near River   # Rooms   ...   Price
0.006        18            2.3           0            6         ...   240.000
0.027        0             7.0           0            6         ...   216.000
0.027        0             7.0           0            7         ...   347.000
0.032        0             2.1           0            6         ...   334.000
0.069        0             2.1           0            7         ...   362.000

Common Data Structures: DataFrames
• Package: Pandas (pd)
• Tabular data, 2 dimensional
• Named columns
• Heterogeneous data: different types
per column
• Converted into NumPy arrays for ingestion by scikit-learn (numeric DataFrames are
converted automatically; non-numeric columns must be encoded as numbers first)
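
A small sketch of moving from a DataFrame to the arrays used for modeling. The rows come from the house price table earlier in the lecture; the column names are invented for illustration, and storing the price as a plain number in thousands is an assumption about the table's "240.000" notation.

```python
import pandas as pd

# Column names are hypothetical; values are the five rows shown in the table.
houses = pd.DataFrame({
    "crime_rate": [0.006, 0.027, 0.027, 0.032, 0.069],
    "pct_large_lots": [18, 0, 0, 0, 0],
    "pct_industrial": [2.3, 7.0, 7.0, 2.1, 2.1],
    "near_river": [0, 0, 0, 0, 0],
    "rooms": [6, 6, 7, 6, 7],
    "price": [240.0, 216.0, 347.0, 334.0, 362.0],  # assumed unit: thousands
})

# Features matrix X: every column except the target.
X = houses.drop(columns="price").to_numpy()

# Target array y: the column we want to predict.
y = houses["price"].to_numpy()

print(X.shape, y.shape)
```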

Linear Regression

▪ In the one-dimensional case, it simply fits the function $f(x) = a x + b$ to best
explain all given data points (technically, to minimize some error
between $f(x)$ and $y$ across all given $(x, y)$ pairs, e.g., the sum of squared
errors)
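
A minimal sketch of a one-dimensional least-squares fit with NumPy. The data points are made up for illustration, and the names `a` (slope) and `b` (intercept) are this example's own notation.

```python
import numpy as np

# Hypothetical (x, y) pairs that lie roughly on a line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with a column of ones so that the intercept b is learned as well.
A = np.column_stack([x, np.ones_like(x)])

# Least squares: find a, b that minimize the sum of squared errors.
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)


def f(x_new):
    """The learned model: a line with slope a and intercept b."""
    return a * x_new + b


sum_squared_errors = float(np.sum((f(x) - y) ** 2))
print(a, b, sum_squared_errors)
```

With more features, the design matrix simply gains more columns; the fitting call stays the same.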

Linear Regression
▪ Generalizable to more dimensions/parameters

Does it Learn?
▪ Many different strategies to learn a function
▪ Anscombe’s quartet
▪ The average x value is 9 for each dataset
▪ The average y value is 7.50 for each dataset
▪ The variance for x is 11 and the variance for y
is 4.12
▪ The correlation between x and y is 0.816 for
each dataset
▪ Linear regression for each dataset follows the
equation y = 0.5x + 3
Parikh, 2014

Terminology
• The decisions in a model are called the model parameters
(constants in the resulting function, weights, coefficients); their
values are usually learned from the data
• Degrees of freedom ~ number of model parameters
• The parameters to the learning algorithm that are not learned from
the data (and are typically user defined) are called model
hyperparameters
• E.g., in today’s demo (decision tree)

Overfitting/Underfitting
▪ Overfitting: Model fits the training data exactly, but does not
generalize to unseen data (e.g., exact memorization)
▪ Underfitting: Model makes very general observations but fits
the data poorly (e.g., brightness in picture)
▪ Typically adjust degrees of freedom during model learning to
balance between overfitting and underfitting: can better learn the
training data with more freedom (more complex models); but with
too much freedom, will memorize details of the training data rather
than generalizing

Overfitting/Underfitting

Figure source: Geeksforgeeks.org

Demonstration – Decision Trees
▪ Nested if-else statements

Figure source: Towards AI

▪ Note: We are using decision trees as an example of a simple and
easy-to-understand learning algorithm. It is worth understanding at
least one learning approach in some detail, to get an intuitive sense
of the functioning and limitations of machine learning. Also, this
example will illustrate the role of hyperparameters and how they
relate to overfitting/underfitting.
Decision Tree Generation
▪ Attribute Selection Measures (ASM) – which attribute, at what value, splits the
dataset
▪ Information gain
▪ Entropy: $H(S) = -\sum_i p_i \log_2 p_i$
▪ Information gain = entropy of parent – (weighted sum of
entropy of children)

▪ Gini Index (Gini impurity): $G(S) = 1 - \sum_i p_i^2$

▪ Gain Ratio
▪ $p_i$: fraction of items labeled with class $i$ in the set $S$
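
The measures above can be sketched in plain Python. The function names are this example's own; the label counts are taken from the golf table used in this lecture (10 "yes", 6 "no", grouped by Outlook), and children are weighted by their share of the parent's items.

```python
from collections import Counter
from math import log2


def class_fractions(labels):
    """Fraction p_i of items labeled with each class i."""
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def entropy(labels):
    return -sum(p * log2(p) for p in class_fractions(labels))


def gini(labels):
    return 1.0 - sum(p * p for p in class_fractions(labels))


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    weighted = sum(len(child) / len(parent) * entropy(child) for child in children)
    return entropy(parent) - weighted


# "Play" labels of the golf data, split by Outlook (overcast, rainy, sunny).
play = ["yes"] * 10 + ["no"] * 6
by_outlook = [
    ["yes"] * 5 + ["no"] * 1,
    ["yes"] * 3 + ["no"] * 2,
    ["yes"] * 2 + ["no"] * 3,
]
gain_outlook = information_gain(play, by_outlook)
print(entropy(play), gini(play), gain_outlook)
```

A tree learner evaluates such a score for every candidate split and picks the one with the highest gain (or the lowest impurity).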
Example - Golfing
f(Outlook, Temperature, Humidity, Windy) = Play golf or not

Outlook    Temperature   Humidity   Windy   Play
overcast   hot           high       false   yes
overcast   hot           high       false   no
overcast   hot           high       false   yes
overcast   cool          normal     true    yes
overcast   mild          high       true    yes
overcast   hot           normal     false   yes
rainy      mild          high       false   yes
rainy      cool          normal     false   yes
rainy      cool          normal     true    no
rainy      mild          normal     false   yes
rainy      mild          high       true    no
sunny      hot           high       false   no
sunny      hot           high       true    no
sunny      mild          high       false   no
sunny      cool          normal     false   yes
sunny      mild          normal     true    yes

On Specifications
▪ No specification given for f(outlook, temperature, humidity, windy)
▪ Learning f from data!
▪ We do not expect perfect predictions; no possible model could predict
all training data correctly, because identical inputs appear with different labels:
Outlook Temperature Humidity Windy Play
overcast hot high false yes
overcast hot high false no
overcast hot high false yes
overcast cool normal true yes

▪ We are looking for models that generalize well
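
A minimal sketch of learning f from the golf table with scikit-learn. This is not the notebook used in the in-class demo; the one-hot encoding via `pd.get_dummies` and the `criterion="entropy"` setting are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

columns = ["Outlook", "Temperature", "Humidity", "Windy", "Play"]
rows = [
    ("overcast", "hot", "high", False, "yes"),
    ("overcast", "hot", "high", False, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("overcast", "cool", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("rainy", "mild", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
]
golf = pd.DataFrame(rows, columns=columns)

# One-hot encode the categorical features so the tree receives numeric input.
X = pd.get_dummies(golf[columns[:-1]]).astype(float)
y = golf["Play"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Even an unrestricted tree cannot reach 100% training accuracy here, because
# three rows share identical features but disagree on the label.
train_accuracy = tree.score(X, y)
print(train_accuracy)
print(export_text(tree, feature_names=list(X.columns)))
```

The printed rules are the "nested if-else statements" view of the learned model.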


In Class Demo
▪ Navigate to:
▪ https://colab.research.google.com/github/jGiltinan/SE4AI_DecisionTree/blob/master/golf.ipynb

Possible Improvements
• Averaging across multiple trees (ensemble methods, including
Boosting and Random forests) to avoid overfitting
• building different trees on different subsets of the training data or basing
decisions on different subsets of features

• Different decision selection criteria and heuristics: Gini impurity,
information gain, statistical tests, etc.
• Better handling of numeric data
• Extensions for graphs

Summary of Learning with Decision Trees
• Learning function fitting the data
• Generalizes to different decisions (categorical and numeric data)
and different outcomes (classification and regression)
• Customizable hyperparameter (here: max tree height, min
support, ...) to control learning process
• Many decisions influence qualities: accuracy/overfitting, learning
cost, model size, ...
• Resulting models easy to understand and interpret (up to a size),
mirroring human decision-making procedures
• Scales fairly well to large datasets
Evaluating Models (Supervised Learning)
▪ Basic Approach
▪ Given labeled data, how well can the function predict the outcome
labels?
▪ Basic measure accuracy: $\text{accuracy} = \frac{\text{correct predictions}}{\text{all predictions}}$
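
Accuracy is the share of predictions that match the known labels. A minimal sketch, with made-up labels and a function name chosen for this example:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the known outcome label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(truth == prediction for truth, prediction in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: two of the four predictions are correct.
example = accuracy(["yes", "no", "yes", "yes"], ["yes", "yes", "yes", "no"])
print(example)  # 0.5
```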

Separate Training and Validation Data
▪ Always test for generalization on unseen validation data
▪ Accuracy on training data (or similar measure) used during learning
to find model parameters

▪ high probability of overfitting

Detecting Overfitting
▪ Vary a hyperparameter and compare training accuracy
(blue) with validation accuracy (red) at different degrees of freedom

[Figure: training and validation accuracy plotted against degrees of freedom]
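
One way to produce such a comparison is to sweep a hyperparameter and record both accuracies. The sketch below assumes a synthetic dataset from `make_classification` with some label noise and uses the tree's `max_depth` as the stand-in for degrees of freedom; these choices are illustrative, not the course demo.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% randomly assigned labels, so memorizing is possible but unhelpful.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

results = []
for depth in [1, 2, 4, 8, None]:  # None lets the tree grow until all leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results.append((depth, tree.score(X_train, y_train), tree.score(X_val, y_val)))

for depth, train_acc, val_acc in results:
    print(f"max_depth={depth}: train={train_acc:.2f} validation={val_acc:.2f}")
```

A widening gap between the two columns as the depth grows is the typical sign of overfitting.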
Demo
▪ Navigate to:
▪ https://colab.research.google.com/github/jGiltinan/SE4AI_DecisionTree/blob/master/golf_TrainTestSplit.ipynb

Cross-validation
• Motivation
• Evaluate accuracy on different training and validation splits
• Evaluate with small amounts of validation data

• Method: Repeated partitioning of data into train and validation data, train and
evaluate model on each partition, average results
• Many split strategies, including
• leave-one-out: evaluate on each datapoint using all other data for training
• k-fold: k equal-sized partitions, evaluate on each while training on the others
• repeated random sub-sampling (Monte Carlo)
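
The three split strategies map onto scikit-learn splitter classes. The sketch below only counts how many train/validation rounds each one produces on a toy array of 10 samples; the sizes and parameters are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, ShuffleSplit

X = np.arange(20).reshape(10, 2)  # 10 toy samples with 2 features each

strategies = {
    "leave-one-out": LeaveOneOut(),
    "5-fold": KFold(n_splits=5, shuffle=True, random_state=0),
    "Monte Carlo": ShuffleSplit(n_splits=3, test_size=0.2, random_state=0),
}

# Each splitter yields (train_indices, validation_indices) pairs.
split_counts = {name: sum(1 for _ in cv.split(X)) for name, cv in strategies.items()}
print(split_counts)
```

Any of these objects can be passed as the `cv` argument of `cross_val_score` to average the results over the generated splits.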

Separate Training, Validation and Test Data
▪ Often a model is "tuned" manually or automatically on a validation
set (hyperparameter optimization)
▪ In this case, we can overfit on the validation set, separate test set is
needed for final evaluation
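
A three-way split can be sketched with two calls to `train_test_split`. The 60/20/20 proportions and the toy data are assumptions for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)  # 100 toy samples
y = np.arange(100) % 2

# First set aside the test set, which is touched only once, for the final evaluation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Then split the remainder into training data and validation data for tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```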

Academic Escalation: Overfitting on Benchmarks
▪ If many researchers publish best results on the same benchmark, collectively
they perform "hyperparameter optimization" on the test set

Summary
• Key concepts in machine learning: dataframes, model,
train/validation/test data
• A simple machine-learning algorithm: Decision trees
• Overfitting, underfitting, hyperparameter tuning
• Basic model accuracy measures and cross-validation
• Introduction to working with computational notebooks, typical
iterative workflow, benefits and limitations of notebooks

ML Modeling Basics: Neural Networks and Deep Learning
Learning Goals

• Give an overview of different AI problems and approaches


• Explain at high level how deep learning works
• Some neural network architecture techniques
• Introduction to explainability

Neural Networks
▪ Artificial neural networks are inspired by how biological neural networks work
("groups of chemically connected or functionally associated neurons" with
synapses forming connections)

Figure: from "Texture of the Nervous System of Man and the Vertebrates" by
Santiago Ramón y Cajal
Artificial Neural Networks
▪ Simulating biological neural networks of neurons (nodes) and
synapses (connections), popularized in 60s and 70s
▪ Basic building blocks: Artificial neurons, with n inputs and one
output; output is activated if at least m inputs are active

Threshold Logic Unit / Perceptron
▪ computing weighted sum of inputs +
step function: $z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = x^T w$

▪ Univariate, single input (constant,
vector, matrix, etc.)
▪ Heaviside step function: $\phi(z) = 0$ if $z < 0$, $1$ if $z \ge 0$

Threshold Logic Unit / Perceptron

▪ $o = \phi(x^T w + b)$ ($w$ and $b$ are parameters of the model)


▪ TLU/Perceptron: Binary
classification depending on if
weighted sum > threshold
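
A threshold logic unit can be sketched in a few lines of NumPy. The weights and bias below are hand-picked (not learned) so that the unit computes a logical AND; the names `w`, `b`, and `tlu` are this example's own.

```python
import numpy as np


def heaviside(z):
    """Step function: 0 for negative input, 1 otherwise."""
    return np.where(z >= 0, 1, 0)


def tlu(x, w, b):
    """Weighted sum of the inputs plus bias, followed by the step function."""
    return heaviside(x @ w + b)


inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked parameters: the weighted sum reaches the threshold only if both inputs are 1.
w = np.array([1.0, 1.0])
b = -1.5

and_outputs = tlu(inputs, w, b)
print(and_outputs)  # [0 0 0 1]
```

Moving the threshold (here encoded as the bias `b`) to -0.5 turns the same unit into a logical OR.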

Multiple Layers
▪ Layers are fully connected here, but layers may have different numbers of
neurons

Learning Model Parameters (Backpropagation)
▪ Intuition:
• Initialize all weights with random values
• Compute prediction, remembering all intermediate activations
• If output is not expected output (measuring error with a loss function),
• compute how much each connection contributed to the error on output layer
• repeat computation on each lower layer
• tweak weights a little toward the correct output (gradient descent)

• Continue training until weights stabilize


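▪ Works efficiently only for certain activation functions $\phi$, typically the logistic function $\phi(z) = 1/(1 + e^{-z})$ or the rectified linear
unit (ReLU) $\phi(z) = \max(0, z)$
▪ Why? Smooth with monotonic derivatives.
▪ Logistic function: $\phi'(z) = \phi(z)(1 - \phi(z))$
▪ ReLU: $\phi'(z) = 0$ for $z < 0$, $1$ for $z > 0$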
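
The weight-update idea can be sketched for the simplest case, a single logistic neuron, where no error needs to be propagated through lower layers. This is a simplification of backpropagation, not a full implementation; the OR training data, learning rate, and iteration count are assumptions for the example.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


# Training data: logical OR of two inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

# Initialize weights with small random values.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)
b = 0.0
learning_rate = 1.0

for _ in range(2000):
    prediction = logistic(X @ w + b)  # forward pass
    error = prediction - y  # gradient of the cross-entropy loss w.r.t. the weighted sum
    # Tweak the parameters a little against the gradient (gradient descent).
    w -= learning_rate * (X.T @ error) / len(y)
    b -= learning_rate * error.mean()

predictions = (logistic(X @ w + b) > 0.5).astype(int)
print(predictions)
```

In a multi-layer network, the same update is applied to every layer, with the error term for each layer computed from the layer above it via the chain rule.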
Deep Learning
• More layers (technically, > 2)
• Layers with different numbers of neurons
• Different kinds of connections
• fully connected (feed forward)
• not fully connected (e.g., convolutional networks)
• keeping state (e.g., recurrent neural networks)
• skipping layers
• and other possibilities!

On Terminology
• Deep learning: neural networks with many internal layers
• DNN architecture: network structure, how many layers, what
connections, which activation function $\phi$ (hyperparameters)
• Model parameters: weights associated with each input in each
neuron

Example Scenario
▪ Fashion-MNIST dataset of 70k 28x28-pixel grayscale images, 10
output classes

Example Scenario
• Fashion-MNIST dataset of 70k 28x28-pixel grayscale images, 10
output classes
• 28x28 = 784 inputs in the input layer (each 0..255)
• Example model with 3 layers, 300, 100, and 10 neurons
▪ How many parameters does this model have?

▪ $(784 \cdot 300 + 300) + (300 \cdot 100 + 100) + (100 \cdot 10 + 10) = 266{,}610$ parameters; assuming 32-bit float values, > 1 MB for the parameters alone
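
The parameter count for this fully connected example can be checked with a few lines of Python: each layer has one weight per input-output pair plus one bias per neuron. The helper name is this example's own.

```python
def count_parameters(layer_sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each fully connected layer."""
    return sum(
        n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# 784 inputs, then layers with 300, 100, and 10 neurons.
total = count_parameters([784, 300, 100, 10])

# Four bytes per parameter for 32-bit floats.
megabytes = total * 4 / 1e6

print(total, megabytes)  # 266610 parameters, about 1.07 MB
```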

Network Size
• 50 Layer ResNet network -- classifying 224x224 images into 1000
categories
• 26 million weights, computes 16 million activations during inference, 168 MB
to store weights as floats

• OpenAI’s GPT-2 (2019) -- text generation


• 48 layers, 1.5 billion weights (~12 GB to store weights)
• released model reduced to 117 million weights
• trained on 7-8 GPUs for 1 month with 40GB of internet text from 8 million web
pages
• Demo (sometimes down)
• GPT-3, released June 2020, has 175 billion weights (~650 GB)

• Llama 3 (2024) – text generation, up to 405 billion weights (~1.5 TB)
Convolutional neural network (Intuition)
▪ Connect local neurons

Figure: 3x3x3 convolutional kernel acting on a 3-channel input. Source:
https://machinethink.net/images/vggnet-convolutional-neural-net[…]

Pooling Layer
▪ Reduce dimensionality
▪ Max or Average
▪ Can be 1, 2, or 3D
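
A minimal NumPy sketch of 2D max pooling with a 2x2 window and stride 2. The window size and the toy input are assumptions for illustration; deep learning frameworks provide their own pooling layers.

```python
import numpy as np


def max_pool_2x2(image):
    """Keep the maximum of each non-overlapping 2x2 block (odd edges are cropped)."""
    height, width = image.shape
    cropped = image[: height - height % 2, : width - width % 2]
    blocks = cropped.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))


image = np.array([
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
])

pooled = max_pool_2x2(image)
print(pooled)  # [[ 4  8] [12 16]]: a 4x4 input is reduced to 2x2
```

Replacing `max` with `mean` in the last line of the function gives average pooling.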

Dropout Layer
▪ During training – randomly drop nodes each iteration
▪ During testing/validation – include all nodes
▪ (1-Probability of node dropout) is used as an
extra weighting factor
▪ Improves generalization and reduces overfitting (regularization)
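
The scheme described above can be sketched in NumPy as follows. The function name and the drop probability of 0.5 are assumptions for the example; note that common frameworks implement the equivalent "inverted" variant, which rescales during training instead of at test time.

```python
import numpy as np


def dropout(activations, drop_probability, training, rng):
    """Drop nodes at random during training; weight by (1 - p) otherwise."""
    if training:
        # Each node is kept with probability (1 - drop_probability).
        keep_mask = rng.random(activations.shape) >= drop_probability
        return activations * keep_mask
    # At test/validation time all nodes are included, scaled by the keep probability.
    return activations * (1.0 - drop_probability)


rng = np.random.default_rng(0)
activations = np.ones(8)

train_output = dropout(activations, 0.5, training=True, rng=rng)
test_output = dropout(activations, 0.5, training=False, rng=rng)
print(train_output, test_output)
```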

Srivastava et al., “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” 2014.

Loss functions
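▪ Regression:
▪ Mean Absolute Error: $\frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
▪ Mean Square Error: $\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

▪ Classification:
▪ Categorical cross entropy: $-\sum_{c} y_c \log \hat{y}_c$ (with one-hot label $y$ and predicted class probabilities $\hat{y}$)
▪ Sparse CCE – similar,
does not require one-hot encoding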
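
The loss functions listed above can be sketched in NumPy. The function names and sample values are this example's own, and the cross-entropy variants average over the examples in the batch; the sparse version takes integer class labels instead of one-hot vectors and yields the same value.

```python
import numpy as np


def mean_absolute_error(y, y_hat):
    return np.mean(np.abs(y - y_hat))


def mean_squared_error(y, y_hat):
    return np.mean((y - y_hat) ** 2)


def categorical_cross_entropy(y_onehot, probabilities):
    """Labels are one-hot rows; probabilities are predicted class distributions."""
    return -np.mean(np.sum(y_onehot * np.log(probabilities), axis=1))


def sparse_categorical_cross_entropy(labels, probabilities):
    """Labels are integer class indices; no one-hot encoding needed."""
    return -np.mean(np.log(probabilities[np.arange(len(labels)), labels]))


# Regression example with made-up values.
y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])
mae = mean_absolute_error(y, y_hat)
mse = mean_squared_error(y, y_hat)

# Classification example: two samples, three classes.
probabilities = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
onehot = np.eye(3)[labels]
cce = categorical_cross_entropy(onehot, probabilities)
sparse_cce = sparse_categorical_cross_entropy(labels, probabilities)
print(mae, mse, cce, sparse_cce)
```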

https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html

Optimization Algorithms
▪ Learning rate: constant which affects step size
when updating the weights
▪ Gradient descent: straightforward, but
computationally expensive
▪ Stochastic Gradient Descent: saves computation
by taking the gradient at a single example or small
batch
▪ Adagrad – adaptive gradient algorithm, learning
rate is variable w.r.t. parameters
▪ Adam – adaptive moment estimation, variable
w.r.t. parameters, uses history of gradients

Figure source: mc.ai
Reusing and Retraining Networks
• Incremental learning process enables continued training, retraining,
incremental updates
• A model that captures key abstractions may be good starting point
for adjustments (i.e., rather than starting with randomly initialized
parameters)
• Reused models may inherit bias from original model
• Lineage important. Model cards (introduced 2019) promoted for
documenting design rationale, training method, use, and evaluation,
e.g., Google Perspective Toxicity Model

Deep Learning Discussion
• Can approximate arbitrary functions
• Able to handle many input values (e.g., millions of pixels)
• Internal layers may automatically recognize higher-level structures
• Often used without explicit feature engineering
• Often huge number of parameters, expensive inference and training
• Often large training sets needed
• Too large and complex to understand what is learned, why, or how
decisions are made (compared to decision trees)
▪ Student experiences?
TensorFlow and Keras
▪ TensorFlow: developed by Google
▪ Symbolic math library for writing differentiable programs – ML and NN are target use cases
▪ Keras: neural-network library, written to be an interface between developers
and backend
▪ Mostly written by François Chollet
▪ Also popular: PyTorch
▪ Torch library in Python, from Facebook
▪ 2024: Now much more popular than TensorFlow and Keras

LIME (Local Interpretable Model-agnostic Explanations) Results
▪ Principle: Have a trained model predict many points locally around a datapoint
you want to explain.
▪ Make a linear approximation
▪ Works better on edge cases

“Why Should I Trust You?” Explaining the Predictions of Any Classifier, Ribeiro et al. 2016
https://github.com/marcotcr/lime

LIME Performance on Image Classifiers
▪ Also compatible with regression and NLP
▪ Virtual machine demonstration:
▪ Zipped file link
▪ Username: student
▪ Password: fraunhofer

“Why Should I Trust You?” Explaining the Predictions of Any Classifier, Ribeiro et al. 2016
https://github.com/marcotcr/lime
