Artificial Intelligence
Introduction to Machine Learning
2024
Lecturer
Mindaugas Bernatavičius
Multiple regression
Introduction to Machine Learning
Polynomial regression
What if your data is more complex than a straight line? Surprisingly, you can use a
linear model to fit nonlinear data. A simple way to do this is to add powers of
each feature as new features, then train a linear model on this extended set of
features. This technique is called Polynomial Regression.
We can use Scikit-Learn’s PolynomialFeatures() class to transform our training data, adding the square (second-degree polynomial) of each feature in the training set as a new feature.
The PolynomialFeatures(n) constructor accepts the degree of the polynomial function - for cubic functions you need to use 3 (or more) - anything below 3 would underfit / have a lot of bias if the true function that explains the data is indeed cubic. The degree n is yet another hyperparameter.
The added polynomial features increase the size of the model’s internal state, i.e. make it more powerful.
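A minimal sketch of the idea, assuming synthetic data generated from a quadratic function (the coefficients below are arbitrary illustration values):
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# synthetic data from y = 0.5x^2 + x + 2 plus noise
rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.randn(100)

# add the square of the feature as a new feature, then fit a plain linear model
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)              # columns: [x, x^2]
lin_reg = LinearRegression().fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)    # should land near 2 and [1, 0.5]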
Naming confusion. You will sometimes hear people saying “polynomial linear regression” - this is the correct full name for the model, as it is linear with respect to the weights that are learned. In reality polynomial regression is considered a special case of multiple linear regression, see:
https://stats.stackexchange.com/questions/92065/why-is-polynomial-regression-
considered-a-special-case-of-multiple-linear-regres and
https://en.wikipedia.org/wiki/Polynomial_regression
Problems being solved:
https://iq.opengenus.org/polynomial-regression-using-scikit-learn/
https://prutor.ai/implementation-of-polynomial-regression/
https://github.com/karthikeyanthanigai/Polynomial-regression...
https://www.kaggle.com/rahulkadam0909/polynomial
https://www.askpython.com/python/examples/polynomial-regression-in-python
https://codekarim.com/sites/default/files/ML0101EN-Reg-Polynomial-Regression-Co2-
py-v1.html
Exercise: take the forest fire dataset we have and apply polynomial regression. See
if a better model is produced. Does the polynomial model take longer to train
(longer to predict)?
Logistic regression
Logistic Regression (also called Logit Regression) is commonly used to estimate the
probability that an instance belongs to a particular class (what is the probability
that this email is spam). If the estimated probability is greater than 50%, then
the model predicts that the instance belongs to that class (called the positive
class, labeled “1”), and otherwise it predicts that it does not (i.e., it belongs
to the negative class, labeled “0”). This makes it a binary classifier (by
default).
Once again, we need to understand 3 things for each “model”: error, learning (training), predicting (inference).
Predicting:
Like the Linear Regression model, the Logistic Regression model computes a weighted sum of the input features (+ bias term).
Instead of outputting the result directly like the Linear Regression model, it outputs the logistic of this result.
The logistic is the sigmoid function - so the sigmoid function of the prediction is
the output.
The results are interpreted as probabilities that the item belongs to class 1 or
class 0.
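A hedged sketch of the inference step (the weights and bias below are arbitrary illustration values, not learned ones):
import numpy as np

def sigmoid(t):
    # logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

theta = np.array([0.8, -1.2])   # illustration weights - in practice learned during training
b = 0.1                         # bias term
x = np.array([2.0, 0.5])        # one instance

p = sigmoid(np.dot(theta, x) + b)   # estimated probability of the positive class
y_hat = int(p >= 0.5)               # predicted class label
print(p, y_hat)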
Error / cost:
Objective of training is to set parameter vector θ so that model estimates high
probabilities for positive instances (y = 1) and low probabilities for negative
instances (y = 0).
The cost function - log loss - calculated over the whole training set is the average cost over all training instances.
We usually maximize the log-likelihood function (LLF) (equivalently, minimize the log loss / cross-entropy) and the process is called “maximum likelihood estimation”.
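A minimal sketch of the log loss averaged over the training set (the clipping is there only to avoid log(0)):
import numpy as np

def log_loss(y_true, p_hat, eps=1e-12):
    # average negative log-likelihood over all training instances
    p_hat = np.clip(p_hat, eps, 1 - eps)
    return -np.mean(y_true * np.log(p_hat) + (1 - y_true) * np.log(1 - p_hat))

y_true = np.array([1, 0, 1, 1])          # labels
p_hat = np.array([0.9, 0.2, 0.6, 0.8])   # estimated probabilities
print(log_loss(y_true, p_hat))           # lower is better
Maximizing the log-likelihood is the same as minimizing this quantity.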
Learning / training:
Bad news is that there is no known closed-form equation to compute the value of θ that minimizes this cost function (there is no Normal Equation like the one the OLS algorithm solves).
Good news is that this cost function is convex (question: what is a convex
function, can you give examples).
Gradient Descent (or any other optimization algorithm (liblinear, lbfgs)) is
guaranteed to find the global minimum (if the learning rate is not too large and
you wait long enough) since the cost function is convex.
We will talk about gradient descent in depth in future slides. For now let’s just
define it as an optimization algorithm for minimizing any arbitrary function by
calculating contributions to the total error by each model weight and adjusting the
weights to the direction that minimizes the error by some magnitude (number),
called learning rate.
Regularization also applies to logistic regression and can improve model
performance on unseen data by preventing overfitting.
Sometimes confusion matrices can be very similar, with tradeoffs (Ebola screening vs. finding potential protestors against the Chinese government and/or a “vatnik detector” (what features would you use for that?)).
Other indicators of classification “accuracy”:
Ratio of the number of correct predictions (TN + TP) to the total number of predictions (P + N) = (TN + TP) / (P + N) ⇒ ACC (accuracy)
Precision (Positive predictive value: PPV) - ratio of true positives to the sum of true and false positives ⇒ TP / (TP + FP) ⇒ TP / all predicted positives
Negative predictive value (NPV) - ratio of true negatives to all predicted negatives ⇒ TN / (TN + FN)
Sensitivity (or recall or true positive rate) - ratio of number of true positives
to the number of actual positives ⇒ TP / (TP + FN)
Specificity (or true negative rate) is the ratio of the number of true negatives to
the number of actual negatives ⇒ TN / (TN + FP)
We choose the indicator that is most important for us based on the problem at hand
- for example in medicine, FN rate might need to be minimized even if the FP rate
increases in that case.
Changing the regularization strength C results in a change of the logit 𝑓(𝑥), different values of the probabilities 𝑝(𝑥), a different shape of the regression line, and possibly changes in other predicted outputs and classification performance. A high value of C tells the model to give high weight to
the training data, and a lower weight to the complexity penalty. A low value tells
the model to give more weight to this complexity penalty at the expense of fitting
to the training data. Basically, a high C means "trust this training data a lot",
while a low value says "This data may not be fully representative of the real world
data, so if it's telling you to make a parameter really large, don't listen to it".
Other important tuning parameters: solver, penalty (default=’l2’), tol -
https://stats.stackexchange.com/a/255380/162267
Exercise: can you improve the model w/o adjusting the C value? Try different
options.
See: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
See: https://towardsdatascience.com/dont-sweat-the-solver-stuff-aea7cddc3451
See: https://stackoverflow.com/questions/38640109/...
L1 is synonymous with Lasso regularization (sparse regression and classification)
L2 is synonymous with Ridge.
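A small sketch comparing a few of these settings on a toy dataset (the specific C values are arbitrary; liblinear is used because it supports the l1 penalty):
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# scaling helps the solvers converge
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(X_train, y_train)
    # small C -> stronger regularization -> more weights pushed to exactly zero (Lasso effect)
    print(C, clf.score(X_test, y_test), (clf.coef_ == 0).sum(), "zeroed weights")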
Note: you would need to study each of these solvers to understand how each of them
learns the weights as these are precisely learning algorithms for logistic
regression. Example: Newton's Method is important because it's an iterative process
that can approximate solutions to an equation with incredible accuracy.
The same is true of gradient descent, but it is not a Newton-type optimization method.
More on them: https://www.baeldung.com/cs/gradient-descent-vs-newtons-gradient-
descent
Outlier detection
OneClassSVM() can be used to find abnormal data - outliers (unsupervised).
Ref: https://scikit-learn.org/stable/modules/outlier_detection.html
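A minimal sketch of flagging abnormal points with OneClassSVM (the nu and gamma values are illustrative, not tuned):
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(200, 2)                              # "normal" data
X_outliers = rng.uniform(low=-6, high=6, size=(10, 2))

# nu roughly controls the expected fraction of outliers
detector = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(X)
print(detector.predict(X_outliers))                # -1 = flagged as abnormal, +1 = normal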
Computational Complexity
Inference: O(log2(m)), where m is the number of training instances, since a reasonably balanced tree has roughly that depth. So inference is very scalable, as decision trees are generally quite well balanced.
Training: the algorithm compares all features (or fewer if max_features is set) on all samples at each node, so O(n × m log2(m)). For small training sets (less than a few thousand instances), Scikit-Learn can speed up training by presorting the data (set presort=True). Doing that slows down training for larger training sets.
Introduction to Machine Learning
Decision Trees
Tuning - increasing min_* hyperparameters or reducing max_* hyperparameters will regularize the model.
max_depth - easy to understand, does not allow splitting indefinitely -
don’t want overfit - restrict the tree, want more power - allow the tree to grow.
min_samples_split - minimum number of samples a node must have before it can
be split.
min_samples_leaf - min number of samples leaf node must have.
min_weight_fraction_leaf - same, expressed as fraction of the total number of
weighted instances.
max_leaf_nodes - maximum number of leaf nodes,
max_features - maximum number of features that are evaluated for
splitting at each node.
Remember: decision tree training is stochastic - random - so if the random seed parameter is not specified you will get different trees. The resulting tree is also very sensitive to the hyperparameters.
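A short sketch contrasting an unrestricted tree with a regularized one (hyperparameter values chosen only for illustration):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# random_state fixes the stochastic parts so runs are reproducible
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
regularized = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                     random_state=42).fit(X_train, y_train)
print("unrestricted:", deep.score(X_test, y_test), "depth:", deep.get_depth())
print("regularized: ", regularized.score(X_test, y_test), "depth:", regularized.get_depth())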
Regression
How does it work? Predicts value per tree node. Predicted value for each region is
always the average target value of the instances in that region. The CART algorithm
works mostly the same way as earlier, except that instead of trying to split the
training set in a way that minimizes impurity, it now tries to split the training
set in a way that minimizes the MSE. Also see: https://www.youtube.com/watch?
v=g9c66TUylZ4
Demo: California housing dataset.
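One possible shape of that demo, assuming the scikit-learn California housing loader (downloads the data on first use):
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# CART chooses splits that minimize the MSE within the resulting regions;
# each leaf predicts the average target value of its instances
reg = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_train, y_train)
print(mean_squared_error(y_test, reg.predict(X_test)))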
Limitations:
Sensitive to rotations of the data since the decision boundaries are always orthogonal to the feature axes.
Advisable to use PCA when a simple linearly separable problem does not separate due
to rotation.
Sensitive to small variations in the training data - remove single datapoint that
was responsible for a single split and you will have a different tree.
Greedy algorithm limitations.
Exporting the learned decision tree as efficient code (learned rule extraction):
https://mljar.com/blog/extract-rules-decision-tree/
https://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-
from-scikit-learn-decision-tree
https://github.com/papkov/DecisionTreeToCpp
Introduction to Machine Learning
K-Nearest Neighbors
K-nearest neighbors (k-NN or KNN) is a non-parametric (again, the number of
parameters learned (not hyperparameters) grows with the training data and is not
set or fixed in place), supervised ML algorithm first developed in 1951 and later
expanded on.
KNN belongs to a subcategory of nonparametric models - instance-based. Models based
on instance-based learning are characterized by memorizing the training dataset,
and lazy learning is a special case of instance-based learning that is associated
with zero cost during the learning process.
Analogy: can be remembered by the proverb “tell me who your friends are and I will
tell you who you are”.
Can be used for classification and regression tasks and in scikit it is inherently
multiclass and multilabel.
The output depends on whether k-NN is used for classification or regression:
In k-NN classification output is class membership by plurality vote of its
neighbors, with object being assigned to the class most common among its k nearest
neighbors. If k = 1, then the object is simply assigned to the class of that single
nearest neighbor.
In k-NN regression the output is the average of the values of the k nearest neighbors.
Once again: learning, error and inference:
learning: there is no explicit training / learning step or process.
error: no error function, just distance metric between neighbors and k - how many
get a vote.
inference: find k-nearest neighbors of the sample that we want to classify, by
distance. Assign the class label by majority vote or value by averaging in
regression tasks.
See: https://www.youtube.com/watch?v=HVXime0nQeI
Demo: simple KNN model with scikit.
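A minimal sketch of such a demo (dataset and k chosen only for illustration):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# scaling matters because KNN is distance based
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(X_train), y_train)          # "training" just stores the data
print(knn.score(scaler.transform(X_test), y_test))   # inference does the actual work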
Introduction to Machine Learning
K-Nearest Neighbors
Advantages:
No training step or process.
Simple to understand and tune
Easily implemented from scratch: https://www.youtube.com/watch?v=ngLyX54e1LU
Main advantage of memory-based/instance-based approach - immediately adapts as we
collect new training data.
No inertia in the training process because there is no training process (no
partial_fit(), warm_start, etc, considerations).
Disadvantages:
If features represent different physical units / come in different scales, normalizing the training data can improve accuracy dramatically.
Computational complexity for classifying new samples grows linearly with the number
of samples (assuming constant feature count, in general O(nm)). But the algorithm
can be implemented using efficient data structures such as KD-trees.
Space complexity: can't discard training samples since no training step - storage
space can become a challenge for large datasets. Other models that have internal
state that represent the data they have been trained on have a compressed version
of knowledge about that data. KNN does not have an internal state so the full
dataset needs to be always available. Other models only require a new X sample to
classify X, but KNN requires the entire dataset on each prediction.
Sensitive to useless features. If you are using a KNN based pipeline, it would be useful to implement a feature importance check step or at least combinatorially try to pass different feature sets to KNN and see what works (you can use decision trees or random forests for the purpose, if not specific techniques - you will only need to use them one time to determine which features are most important, after that you would only need the KNN model).
Suggestion for mini-research: all of these statements can be experimentally verified.
Suggestion for mini-research: feature-importance cross reference. Can you prove or
disprove that all models will see the same features as important or not?
Introduction to Machine Learning
K-Nearest Neighbors
Tuning
KNN is sensitive to the local structure of the data. Useful technique: assign
weights to the contributions of the neighbors, so that the nearer neighbors
contribute more to the average than the more distant ones. For example, a common
weighting scheme consists in giving each neighbor a weight of 1/d (d is distance to
the neighbor). Implemented in scikit with weights: {'uniform', 'distance', callable}. If neighbors have identical distances, the algorithm will choose the class label that comes first in the training dataset.
How to choose K? Start with small number, then try and see. The range is usually
between 3 and 7, but can grow into 10’s or more.
How to choose the distance metric? Euclidean - common choice; Minkowski - same as Euclidean when p=2, Manhattan when p=1. Two more well-known distance metrics are cosine and Hamming, in addition to the previous ones (and some less known ones, like Mahalanobis). See: https://www.analyticsvidhya.com/blog/2020/02/4-types-of-
distance-metrics-in-machine-learning/
How to choose the algorithm?
Brute Force may be the most accurate method due to the consideration of all data
points.
For small data sets, Brute Force is justifiable; however, as the data grows, KD-trees / Ball Trees are better alternatives due to their speed and efficiency, see more:
https://medium.com/swlh/k-nearest-neighbor-ca2593d7a3c4 a nice explanation of kd-
tree: https://www.youtube.com/watch?v=Glp7THUpGow .
Note that you can use “auto”, which will attempt to select the appropriate algorithm
- this is described here:
https://scikit-learn.org/stable/modules/neighbors.html#choice-of-nearest-neighbors-
algorithm . This is a decision based on dimensions, not on dataset size (D < 15).
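A hedged sketch of tuning k, the weighting scheme, the metric and the algorithm together with a grid search (the grid values are arbitrary starting points):
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {
    "kneighborsclassifier__n_neighbors": [3, 5, 7, 11],
    "kneighborsclassifier__weights": ["uniform", "distance"],
    "kneighborsclassifier__metric": ["euclidean", "manhattan"],
    "kneighborsclassifier__algorithm": ["auto", "kd_tree", "ball_tree", "brute"],
}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)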
2 Level
1 Chapter
Today you will learn
Linear Algebra with Numpy
01
02
Linear Algebra Roadmap
Python Crash Course
00
Linear Algebra Intro
03
Linear Algebra in Computer Graphics
04
Linear Algebra in ML / DL
05
06
07
Linear Algebra in SE / CS
//
//
Linear Algebra Intro
Linear algebra touches on many fields in modern science and engineering and is used in many disciplines and human endeavours.
There is often a misconception about Linear Algebra as being about matrices,
vectors or eigenvectors and calculations on them, however the broader picture of
how these can be used is much more interesting and important.
Definition: L.A. is a branch of mathematics concerning linear equations and linear
maps, their representations and linear transformations of them. Ref:
https://en.wikipedia.org/wiki/Linear_algebra
Linear algebra can be understood from two perspectives: algebraic and geometric.
Ref: https://www.youtube.com/watch?v=kjBOesZCoqc
Ref: https://www.youtube.com/watch?v=ZKUqtErZCiU
… we also can remember the difference between conceptual math and
numeric/computational math.
Recommended courses:
https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
https://www.youtube.com/playlist?list=PLHXZ9OQGMqxfUl0tcqPNTJsb7R6BqSLo6
https://www.youtube.com/playlist?list=PLybg94GvOJ9En46TNCXL2n6SiqRc_iMB8
Python Crash Course
Linear Algebra Intro
Usage in general. In his classical book on the topic titled “Introduction to Linear
Algebra“, Gilbert Strang provides a chapter dedicated to the applications of linear
algebra. In it, he demonstrates specific mathematical tools rooted in linear
algebra. Briefly they are:
Matrices in Mechanical Engineering, such as a line of springs.
Graphs and Networks - analyzing networks with adjacency matrices (computer science)
Markov Matrices, Population, and Economics, such as population growth.
Linear Programming, the simplex optimization method.
Fourier Series: Linear Algebra for functions, used widely in signal processing.
Linear Algebra for statistics and probability, such as least squares (OLS) for
regression.
Computer Graphics, such as the various translations, rescalings and rotations of
images.
Physics: Einstein’s relativity with tensors.
Electronics and circuits.
Civil Engineering (?), Epidemiology (?)
Usage in datascience. Ref: https://www.youtube.com/watch?v=X0HXnHKPXSo
Code vectorization
Image recognition / image filters (edge detection, blurring)
Dimensionality reduction
Graph theory
Linear algebra is at the heart of Google’s “50billion-dollars-just-sitting-not-
invested-in-their-bank-account” business: https://www.youtube.com/watch?
v=qxEkY8OScYY
Python Crash Course
Linear Algebra with Numpy
This is all great of course, but how can NumPy help? Well, NumPy is very efficient when performing all the operations required in these applications, so if you work in any of these areas, NumPy can be used. But what if you do not yet have a job where you could apply these skills? Well, NumPy can still be used with any tutorial online to verify its results, extend the examples shown and learn linear algebra in a more entertaining way - not just writing down matrices with pen and paper.
Python Crash Course
Linear Algebra with Numpy
Numpy provides powerful methods for manipulating multidimensional arrays which is
exactly what is needed for L.A.
LA applications with numpy:
https://colab.research.google.com/drive/1k9OVrpvNnxt9JDQBdfZzDlk12QohbAQM?
usp=sharing
Specific functions from LA package:
https://github.com/derekbanas/NumPy-Tutorial/blob/master/NumPy%20Tut.ipynb
We will implement most of the practical demonstrations of matrices with numpy from
this video: https://www.youtube.com/watch?v=rowWM-MijXU
Additional resources:
https://www.youtube.com/watch?v=kZwSqZuBMGg
https://www.youtube.com/watch?v=tVQZvJwi-ec
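A tiny sketch of the core linear-algebra helpers in NumPy (the matrix and vector values are arbitrary and reused from the summary slide):
import numpy as np

A = np.array([[1, 5],
              [5, 6]])
b = np.array([5, 3])

print(A @ b)                  # matrix-vector product: the transformation A applied to b
print(np.linalg.det(A))       # determinant: how much the transformation scales area
print(np.linalg.solve(A, b))  # solve the linear system A x = b
vals, vecs = np.linalg.eig(A)
print(vals)                   # eigenvalues
print(vecs)                   # eigenvectors (one per column)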
Python Crash Course
Linear Algebra Roadmap (TBD)
Research on all the practical usages of LA. How is it used in Google’s Page Rank
algorithm? How is it used in computer networking? Biology? Computer graphics? Prime
/ trick your brain to understand that this is important. You don’t have to learn
any subject the way you learned it at school.
Matrices as transformations
Dot products as repeated transformations
Matrix multiplication
Determinants …
Eigenvectors and Eigenvalues and how they are used in dimensionality reduction.
An eigenvector of a matrix is a vector that is only scaled (not rotated) by the transformation represented by that matrix.
An eigenvalue is the scaling factor by which the transformation scales the eigenvector (2 would be 2x).
…
Python Crash Course
Linear Algebra in Computer Graphics
Ref: https://www.youtube.com/watch?v=vQ60rFwh2ig
DEMO: A few simple object rotations implemented using Pygame.
Additional research: recreate the same demo with FPS display - compare FPS when
numpy is used and when it is not used. When single matrix multiplication is used
and when it is not used.
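Not the Pygame demo itself, but a minimal NumPy sketch of the rotation matrix underlying it:
import numpy as np

def rotation_matrix(theta):
    # standard 2D counter-clockwise rotation by angle theta (radians)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

points = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])      # three 2D points, one per row
R = rotation_matrix(np.pi / 2)       # rotate by 90 degrees
print(points @ R.T)                  # apply the transformation to every point at once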
Python Crash Course
Linear Algebra in ML / DL
PCA / Dimensionality reduction
Synaptic weight calculations / Forward propagation algorithms.
Backprop
TBD
Python Crash Course
Linear Algebra in SE / CS
Googles Pagerank Algorithm
Python Crash Course
Summary
The basis of Linear Algebra is vectors [1, 2, … , n]. There are physics, maths and CS definitions of a vector.
Vectorization of features is the basis of ML/DL (most of ML and ~all DL models
accept numbers).
Vector addition: physical intuition - two forces acting on an object together: [a1
+ b1 , a2 + b2]
Matrices - transformations (of the space of vectors). Example: the matrix [[1, 5], [5, 6]] applied to the vector [5, 3].
Matrix multiplication - a combination of transformations: translation @ rotation → M. Example: ([[7, 5], [5, 6]] @ [[1, 5], [5, 6]]) applied to the vector [5, 3].
Dot product interpretation - similarity metric of two vectors. Input two vectors
and you get a scalar indicating how closely two vectors align, in terms of the
directions they point.
Dot product is not the same as matrix multiplication, but there is a special case:
https://math.stackexchange.com/a/4189120/438982
Determinant: when you apply a matrix to a space, by how much the area of that space is scaled.
Eigenvector is the vector that is only scaled by a matrix (application: PCA).
Prerequisites: Determinant, Matrices as transformations, Vectors, … .
Eigenvalue is the scaling factor of the matrix.
Python Crash Course
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
AutoML tools
01
02
Auto-Sklearn
AutoML
00
What is AutoML
Google Automl Tables
03
AutoML (Automated Machine Learning) is the process of automating the end-to-end
process of applying machine learning to real-world problems. Because we think about
ML (and all of its branches) as automation - this is meta-automation, or automation
of automation. AutoML involves automating tasks such as: data pre-processing,
feature selection/engineering, model selection, hyperparameter tuning, model
evaluation, causal discovery. Great presentation: https://www.youtube.com/watch?
v=SEwxvjfxxmE
AutoML tools typically work by searching for the best combination of data pre-
processing techniques, feature selection methods, algorithms, and hyperparameters
for a given machine learning problem. The search can be performed using techniques
such as grid search, random search, Bayesian optimization, and genetic algorithms.
The objective is to find a model that performs well on a validation dataset while
minimizing the risk of overfitting.
Areas of research:
NAS - Automated Neural Architecture Search (notably Auto-Pytorch, AutoKeras).
HPO - Automated Hyperparameter Optimization
Meta Learning - learning to learn, the science of systematically observing how
different machine learning approaches perform on a wide range of learning tasks,
and then learning from this experience, or meta-data, to learn new tasks much
faster than otherwise possible.
Multi-Objective DL - multiple objectives like accuracy and performance to be
determined automatically.
What is AutoML
AutoML
//
AutoML tools
AutoML
//
AutoML tools
AutoML
Ref: https://www.automl.org/automl/auto-sklearn/
Ref: https://arxiv.org/abs/2007.04074#
Ref: https://neptune.ai/blog/a-quickstart-guide-to-auto-sklearn-automl-for-machine-
learning-practitioners
Classifiers:
https://github.com/automl/auto-sklearn/tree/master/autosklearn/pipeline/
components/classification
Regressors:
https://github.com/automl/auto-sklearn/tree/master/autosklearn/pipeline/
components/regression
Data preprocessors:
https://github.com/automl/auto-sklearn/tree/master/autosklearn/pipeline/
components/data_preprocessing
Feature preprocessors:
https://github.com/automl/auto-sklearn/tree/master/autosklearn/pipeline/
components/feature_preprocessing
When learning, you can try to find better model-hyperparameter combinations than the AutoML library does. This is not that easy. How much of a time restriction do you have to impose on the AutoML meta-model to still beat it easily?
Auto-Sklearn
AutoML
import autosklearn.classification
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)
# search for the best pipeline for at most 60 seconds
model = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=60)
model.fit(X_train, y_train)
print(sklearn.metrics.accuracy_score(y_test, model.predict(X_test)))
AutoML
Detailed course plan
Slides, tasks and so on
Additional information
2 Level
1 Chapter
Today you will learn
Numpy basics
01
02
Exercises and Project Work
Python Crash Course
00
Numpy definition
Numpy definition
NumPy is a scientific computing library, containing many mathematical, array and string functions that are very useful and much faster than pure Python.
Along with all the basic math functions you'll also find functions for Linear Algebra, Statistics, Random Number Generation (distributions), Fourier Transforms, etc.
It is centered around single and multi dimensional array objects.
Used by numerous other Python Data Science libraries sitting on top of Numpy
(Scipy, Pandas). Some even consider numpy to be foundational knowledge for DS in
Python ecosystem and consider numpy to be THE library that pushed Python into prominence in the scientific and data communities (it has a lot to do with
performance since both these fields use big data).
Ref: https://numpy.org/ and https://github.com/numpy/numpy
Let’s see basic operations before diving into performance and advanced cases.
Python Crash Course
Numpy basics
Installing numpy is simple, just do it with pip (available on Colab already)
Importing numpy uses the convention import numpy as np (don’t be evil: import
tensorflow as np :D )
Version np.__version__: 1.19.5
Numpy at its core provides a fast and powerful multidimensional array object along with some powerful functions around it! Every array has a shape (its size along each dimension) - see the images above.
When to use numpy: in general when we need to process large arrays fast (this
usecase includes even webapps, scripts, etc.) or when need some functionality that
numpy provides like: linear algebra, stats functions, random distributions, etc.
(this is mainly for data science projects).
Creating arrays:
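A few common ways to create arrays (illustrative values):
import numpy as np

a = np.array([1, 2, 3])                 # from a Python list
b = np.zeros((2, 3))                    # 2x3 array of zeros
c = np.ones((3, 3), dtype=np.int32)     # explicit dtype
d = np.arange(0, 10, 2)                 # like range(): 0, 2, 4, 6, 8
e = np.linspace(0, 1, 5)                # 5 evenly spaced values between 0 and 1
f = np.random.rand(2, 2)                # uniform random values in [0, 1)
print(a.shape, b.shape, c.dtype, d, e, f.shape)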
Python Crash Course
Numpy basics
Creating arrays continued.
Python Crash Course
Numpy basics
Printing arrays: notice that the number of brackets on the sides indicates the number of dimensions: [[[ -- 3 brackets - 3 dimensions
Additionally summarization / truncation can be switched off using print options,
see: https://stackoverflow.com/questions/2891790/how-to-pretty-print-a-numpy-array-
without-scientific-notation-and-with-given-pre
Python Crash Course
Numpy basics
Array operations: element wise addition, division. Element wise multiplication and
dot product.
See dot product visualization: https://gfycat.com/ajarselfassuredgoldfish-matrix-
multiplication-literature-subject
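A short sketch of the difference between element-wise operations and the dot product:
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A + B)    # element-wise addition
print(A / B)    # element-wise division
print(A * B)    # element-wise multiplication (NOT the dot product)
print(A @ B)    # matrix (dot) product, same as np.dot(A, B)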
Python Crash Course
Numpy basics
Additional operations
Python Crash Course
Numpy basics
Universal functions: functions for performing mathematical operations - function
that operates on ndarrays in an element-by-element fashion, supporting array
broadcasting and other standard features, list:
https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs
Python Crash Course
Numpy basics
//
Python Crash Course
Numpy basics
Indexing and slicing - access elements efficiently.
Common way to reverse the array by omitting the start, end and then stepping back:
a[::-1]
<U7 - little endian unicode dtype:
https://numpy.org/doc/stable/reference/arrays.dtypes.html
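A small sketch of common indexing and slicing patterns (values are illustrative):
import numpy as np

a = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]
print(a[2], a[-1])         # single elements
print(a[2:7:2])            # start:stop:step -> [2 4 6]
print(a[::-1])             # reversed array

m = np.arange(12).reshape(3, 4)
print(m[1, 2])             # row 1, column 2
print(m[:, 0])             # first column
print(m[m > 5])            # boolean mask selection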
Python Crash Course
Indexing and slicing continued.
Numpy basics
Python Crash Course
Numpy basics
Python Crash Course
Iterating over the array.
The .flatten() function performs row based flattening (row by row) by default.
Numpy basics
Python Crash Course
Iterating over the array continued.
nditer() is different than .flatten() - .nditer() is just for iteration,
while .flatten() returns a flattened array.
Array will be writable if special flags are passed
Numpy basics
Python Crash Course
Reshaping the array.
Ravel is similar to flatten, but does not create a copy when it is not needed.
Specifying -1 when reshaping means that you do not know how big will that dimension
be, it’s called: “one unknown dimension”
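A minimal sketch of reshape(), the -1 placeholder, flatten() and ravel():
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)     # reshaped array (a view when possible)
c = a.reshape(2, -1)    # -1 = "one unknown dimension", inferred here as 6
print(b.shape, c.shape)

print(b.flatten())      # always returns a copy
print(b.ravel())        # returns a view when no copy is needed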
Numpy basics
Python Crash Course
Splitting the array
np.split(x, [4, 7]) → split the array at index 4 and at index 7
Numpy basics
Python Crash Course
Image manipulation
Images are N-dimensional arrays: RGB (255, 0, 0), 3-channel images, 3D array.
Grayscale (0.0 - 1.0) - 1 channel, 2D array.
In terms of np.shape: (X, Y, 1) - grayscale, (X, Y, 3) - RGB.
Numpy basics
Python Crash Course
Array views - shallow copies. A view points to the same underlying data, so any modifications made through the view will be reflected.
We can change the data in the original through the view.
Reshaping a view does not reshape the original, so we can have multiple shapes over the same data.
Calling reshape() returns a new view, while assigning to the .shape property reshapes the array in place.
Numpy basics
Python Crash Course
Deep copies. Not efficient to make them, but when you need them you need them.
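A compact sketch contrasting views and deep copies:
import numpy as np

a = np.arange(6)
v = a.view()          # shallow copy: shares the underlying data
v[0] = 99
print(a[0])           # 99 - the original is affected

v.shape = (2, 3)      # reshaping the view...
print(a.shape)        # (6,) - ...does not reshape the original

c = a.copy()          # deep copy: independent data
c[1] = -1
print(a[1])           # 1 - the original is unchanged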
Exercises and Project Work
Python Crash Course
Please complete:
This quiz: https://www.w3schools.com/python/numpy/numpy_quiz.asp
These exercises: https://www.w3schools.com/python/numpy/numpy_exercises.asp (can be
done part by part)
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
Advanced Topics
01
02
//
Scikit Tips and Advanced Topics
00
Tips
//
03
04
//
05
06
07
//
//
08
//
//
09
//
Know your Data (toy, real and generated):
https://scikit-learn.org/stable/datasets.html
Know your GridSearch and friends (AutoML):
https://scikit-learn.org/stable/modules/classes.html#hyper-parameter-optimizers
Know your Estimator API, know which one is most appropriate for which situation:
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Know your DummyModels (strategy == uniform) & random guesser creation:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.dummy
Know your Scalers - to scale the data automatically:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing
Know your Pipeline - a simple wrapper to chain data transformations and model training steps into one single estimator (see the sketch after this list)
Know your Incremental learning (“incrementally trainable models”, partial fit):
https://stackoverflow.com/questions/49841324/what-does-calling-fit-multiple-times-
on-the-same-model-do
Know your Model Persistence:
https://scikit-learn.org/stable/modules/model_persistence.html
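The Pipeline sketch referenced above (scaler and model chosen only for illustration):
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# scaling and model training wrapped into one estimator object
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())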
Tips
Scikit Tips and Advanced Topics
Estimators:
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Tips
Scikit Tips and Advanced Topics
Unsupervised Neural Networks? RBM for feature extraction
This can be considered an advanced topic in ML or DL, but the nice thing is we can
implement examples with scikit and learn about it with this simple library then
progress to more advanced implementations (like reimplementing it with Keras).
Explanation and terminology:
https://scikit-learn.org/stable/modules/neural_networks_unsupervised.html
Tutorial with scikit:
https://scikit-learn.org/stable/auto_examples/neural_networks/plot_rbm_logistic_cla
ssification.html
GPU support:
Experimental array API: https://scikit-learn.org/stable/modules/array_api.html
Hummingbird: https://github.com/microsoft/hummingbird and small tutorial:
https://www.youtube.com/watch?v=GbC1BujV-J4
Alternatives like the praised NVIDIA RAPIDS cuML lib:
https://docs.rapids.ai/api/cuml/stable/estimator_intro.html#Linear-regression-and-
R^2-score … they also present themselves as a window to distributed GPU based ML
and analytics: https://developer.nvidia.com/blog/scikit-learn-tutorial-beginners-
guide-to-gpu-accelerated-ml-pipelines/
Advanced Topics
Scikit Tips and Advanced Topics
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
Organizing larger programs
Distributing packages
01
02
03
Package plugins (optional)
Python Crash Course
00
Modularity
06
Git for version control
07
Module management with pip and venv
08
Version management with pyenv
09
Project management with pipenv (optional)
04
Standard library modules
10
Project management with poetry (optional)
05
Logging and config files
11
Distributable applications (optional)
Modularity
Creating larger programs with Python requires understanding of how modules work.
A broad discussion about modularity is a prerequisite for a discussion of project
architecture and organization.
Functions, modules (python) and packages (python) are all constructs in Python that
promote code modularization (and reusability/dry-ness):
The most fine-grained modularity level in Python is a function. As a piece of reusable code it replaces repeating logic in our Python script by extracting that logic, putting it in one place with a name and then giving us the ability to reuse it.
Functions can be grouped into classes (optional).
A group of functions (and variables, constants, classes) defined in a file with a .py extension is called a module. Modules allow us to separate project functionality into parts that are each responsible for one part of the application. (“Like goes with like”)
A group of modules is called a package and is usually associated with a directory
containing >= 1 modules.
Libraries and products/projects are the highest level we care about (threads,
processes, microservices, applications … dynamic modularity - modularity during the
execution of said application. We are talking about modularity in terms of code
(static)).
Since we already covered functions quite extensively, we will jump into modules
next. The important thing is that you should think about the new mechanisms we are
about to learn as extensions of previous mechanisms - generalization.
Python Crash Course
Modularity
We can say that there are 3 types of modules in python:
written in Python itself (files that we create with .py)
written in C and loaded dynamically at run-time (re, numpy)
built-in modules in the interpreter (itertools, time, sys, os)
A module can be imported and used by calling the functions inside - as a library
using the import statement. Or it can be run.
Import does not make all the names automatically available in the calling module’s symbol table - you still need to reference them explicitly. It only adds the imported module name to the calling module’s symbol table. You can check the symbol table with an empty call to dir().
Module code is executed on import, but only once - on first import. All top level
statements are run (def is also a statement so it is run, remember why we can’t use
mutable data as default parameters? See: https://florimond.dev/en/posts/20... ).
To recognize whether the module is being imported or executed we use the __name__ check. When the module is imported, the check prevents the code inside the module from running and just makes the functions / classes available. But when we launch it directly, i.e. python blah.py, then it will be executed. It is recommended to make all your Python modules (so all .py files) importable, since testing is easier that way.
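A minimal sketch of the check (my_module.py and median() are illustrative names):
# my_module.py
def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    return ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2

if __name__ == "__main__":
    # runs only when executed directly (python my_module.py),
    # not when the module is imported from another file
    print(median([3, 1, 2, 5, 4]))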
Ref: https://www.w3schools.com/python/python_modules.asp ;
https://docs.python.org/3/tutorial/modules.html
Demo:
Creating a module
Importing via repl
Recognizing if we want to call or import the module with the dunder __name__
Importing into the app (another file)
Aliases (from functions import median as med, average as avg)
Go to definition (win: ctrl + click or ctrl + b) and back (win: ctrl + alt + ←)
dir(), locals(), globals()
Python Crash Course
Modularity
What happens when we import modules? The interpreter searches for a <module_name>.py file in a list of directories:
The directory from which the input script was run or the current directory if the
interpreter is being run interactively (REPL)
The list of directories contained in the PYTHONPATH environment variable, if it is
set. (The format for PYTHONPATH is OS-dependent but should mimic the PATH
environment variable.)
An installation-dependent list of directories configured at the time Python is
installed (sys.path)
This implies that if you have a module import error you need to inspect the sys
path and probably append it.
You can always check where a module resides using the module_name.__file__ dunder attribute.
You can use various ways to import: from <module_name> import <name> as
<alt_name>[, <name> as <alt_name> …] see: https://realpython.com/python-modules-
packages/
You can also import inside a function / conditionally - then the module will be imported only when the function is first called. It is NOT re-imported every time you call the function. This can sometimes be beneficial for small performance gains when a module is rarely needed, but usually it is not necessary, and most style guides recommend importing modules at the top.
Python Crash Course
Modularity
The difference between script, module and application?
Python is sometimes called a scripting language, but that is a contextual term. You
can write small scripts that automate tasks with python. However python is much
more than batch or bash scripts are in windows or linux, so it should not be
considered as “only a scripting language”. The presence of complex libraries,
frameworks and entire ecosystems of tools prove that.
Since it is recommended to make every python file importable, your scripts should
be also - that means script ~~ file.
Python Crash Course
Modularity
Packages are directories that hold one or more related modules - it’s a special
kind of module that can hold other modules.
Packages allow for a hierarchical structuring of the module namespace using dot
notation. In the same way that modules help avoid collisions between global
variable names, packages help avoid collisions between module names.
We have 2 types of packages, ref: https://stackoverflow.com/questions/37139786/is-
init-py-not-required-for-packages-in-python-3-3
Regular packages → contain __init__.py file, you should prefer these. It contains
package initialization code, often empty.
Namespace packages → do not contain __init__.py file, these have a specific
usecase, serve for package nesting
Python Crash Course
Modularity
__init__.py is useful for library developers to hide the internal structure of the library and make imports easier (example: the requests lib imports get, post and other methods in its __init__ file). Additionally, __all__ = [ … ] can be defined there to control what a wildcard import exposes.
Sibling imports / Subpackages
use ..
example: from ..cacl.functions import print_hi
If you will try to import / launch files directly from non-project-root then you
will need to use os.getcwd(), sys.path.append() and knowledge of file system paths.
Python Crash Course
# src.__init__.py
from src.cacl.functions import *
from src.math.my_math import *
# main.py
from src import print_hi
from src import print_hi_with_exclamation
Modularity
When writing modular code we need to document it well.
We use docstrings inside the function for documentation.
This allows us to read the documentation of an imported function using help()
Docstring conventions: https://www.python.org/dev/peps/pep-0257/
Sphinx uses it w/ some additional syntax (used by scrapy, bs4, etc.).
Also noticeable: https://about.readthedocs.com/
To be able to launch a file w/o constantly writing python <filename.py> we can use a shebang line on *nix (read: “unix-like”) type systems.
And from Python 3.3 it should also work on Windows (via the py launcher).
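A small sketch combining the shebang and a docstring readable via help() (the module and function names are illustrative):
#!/usr/bin/env python3
"""Small illustrative module - the shebang line above lets *nix systems run it directly."""

def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

if __name__ == "__main__":
    help(average)   # prints the docstring, just like after importing the function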
Python Crash Course
Modularity
Python Crash Course
Modularity
Python Crash Course
Modularity
Executable directory
Putting a __main__.py file at the root of the directory makes it an executable
directory
Python will execute it if launched with python dirname, and when launched this way the directory is put at the beginning of sys.path
Executable zip
We can zip executable directories and distribute them
Python knows how to launch them (conceptually similar to java jar)
Executable package
if you put a __main__.py inside a package python will execute it if launched with
python -m package_name
example (note, this is not an executable package, only a well known example of calling a python module directly):
$ echo '{"a": {}, "b": {}}' | python -m json.tool
import sys
# importing project root will allow all directories to be searched
# ... even if you try to launch the tests from tests directory: python .\test_x.py
# ... or from the root directory: python .\tests\test_x.py
# ... or from anywhere: python .\tests\abc\def\test_x.py
sys.path.append(r"C:\Users\Mindaugas\Desktop\Projects\CAAI\Proj")
# sys.path.append(r"../")
# sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__),
r'..\')))
# sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__),
"..\\..\\")))#
from src.some.string_returner import return_string
print(return_string())
Package plugins (optional)
This is quite a fuzzy topic in many programming language ecosystems - how do you
develop a system of independent modules that can be added separately to an
application after it already has been developed for weeks, months or years? How do
you need to modify your application to make this happen?
Often when creating a Python application or library you’ll want the ability to
provide customizations or extra features via plugins.
Because Python packages can be separately distributed, your application / library
may want to automatically discover all plugins available.
Define extension points that the plugins can implement.
Everything in this topic revolves around importlib package.
The package will use discovery techniques to load the packages at runtime.
namespace packages and pkgutil mechanism → core package treats subpackage as
extension point, commonly called plugin. The core package scans the subpackage at
runtime to see which plugins have been configured. The plugins can exist in
different directories on the path, but the folder structure of the plugin must
match the folder structure of the core package.
setuptools entry points mechanism → a setup.py file is defined where a set
of extension points are defined. At runtime the core package iterates over the extension points and calls the extensions when needed.
3rd party, library based or custom mechanisms → the most flexible, but non-
standardized.
Important note: there is a PEP 518 standard (aka: pyproject.toml) that replaced
setuptools packaging mechanism and it became the de facto packaging method. In the
future it is recommended to take a look into that.
There are more ways to implement extendability in your python project. You can
think yourself, how would you create a plugin system for the users of your
framework / app.
https://packaging.python.org/guides/creating-and-discovering-plugins/
https://stackoverflow.com/questions/932069/building-a-minimal-plugin-architecture-
in-python
https://www.computerandnet.com/2021/09/collection-of-python-plugin-frameworks.html
(existing plugin frameworks in python)
https://github.com/mitsuhiko/pluginbase
https://www.youtube.com/watch?v=cbot48lckOs
Python Crash Course
Distributing packages
Steps
Ensure unique package name - one that does not exist in pypi.
Create a new project in the IDE of your choice.
Create the following project structure:
<lib_name>/
setup.cfg
pyproject.toml
<... other files like readme …>
<lib_name>/
__init__.py
<code_file>.py
# setup.cfg
[options]
packages = find:
python_requires = >=3.7
include_package_data = True

# pyproject.toml
[build-system]
requires = [ "setuptools>=54", "wheel" ]
build-backend = "setuptools.build_meta"
Distributing packages
Steps (.cont):
Run pip list to see if the package was installed like any other package
To uninstall package from local file system: pip uninstall <package name>
Test the package (you can delete the .egg-info and build dirs)
Run pip install build and python -m build
If the build succeeds (see screenshot) run: pip install twine
… and python -m twine upload --repository testpypi dist/*
… you will need credentials (2FA and API tokens need to be used)
Test it in another project by installing it:
.. pip install -i https://test.pypi.org/simple/ mindaugas-test-library==0.0.1
After confirming that the package works you can publish it to pypi:
… python -m twine upload --repository pypi dist/*
If installation fails - need to troubleshoot.
… for example if “__init__.py” filename is incorrect only dist-info
… will be generated. Updating existing lib is not possible
… even after deleting it in pypi (“names are forever” in pypi).
… You need to change the version number. See :
https://stackoverflow.com/questions/21064581/how-to-overwrite-pypi-package-when-
doing-upload-from-command-line and
https://www.reddit.com/r/Python/comments/35xr2q/
howto_overwrite_package_when_reupload_to_pypi/
eprint("Hi")
Distributing packages
Take note that static files sometimes need to be included with the distributed library, so make sure that the locally installed package we just created has them:
Python Crash Course
Distributing packages
Note, dependencies of our package were not installed automatically with the package we created, but there are ways to declare them so that they are. See:
https://stackoverflow.com/a/48314070/1964707
Python Crash Course
Distributing packages
How about distutils, setuptools (setup.py) which many modules (tinygrad and others)
are still using?
See this:
Standard library modules
Important ones:
sys → doing work with python interpreter and runtime:
https://docs.python.org/3/library/sys.html, sys.argv, sys.path, sys.exit()
os → doing work with the operating system:
https://docs.python.org/3/library/os.html, os.environ, os.getcwd(), os.getpid()
io → work with input output: https://docs.python.org/3/library/io.html,
io.open(), readline()
time → work with time: https://docs.python.org/3/library/time.html ,
time.sleep(), time.time()
re → standard regex module: https://docs.python.org/3/library/re.html ,
re.match(), re.split()
pickle → the popular binary serialization tool
https://docs.python.org/3/library/pickle.html , pickle.dump(), pickle.load()
statistics → standard package for simple statistics
https://docs.python.org/3/library/statistics.html statistics.mode(), median(), etc.
logging → standard logging package … will see soon.
… math, random, datetime, argparse, csv, pprint, functools, decimal, marshal and so
on.
For large projects we need to know how to log errors and notices so that the application operator / administrator can effectively troubleshoot, fix and/or report issues (to maintainers). print() calls are ephemeral (not saved), contaminate the command line and are limited in features.
In many projects we are logging to at least two files (or more): error.log and
<some-name>.log (statspagescrapper.log) files.
Python comes with the builtin modules for logging so there is no need to install
anything else for many applications (but: structlog module).
What is there to know about logging: levels, how much to log and standard-ish log
structure.
5 standard logging levels: DEBUG, INFO, WARNING (default), ERROR, CRITICAL, see:
https://www.youtube.com/watch?v=W1vOdzHCa-I there is an RFC for syslog standard:
https://datatracker.ietf.org/doc/html/rfc5424 .
How much should you log? The amount of logging is up to the developer and depends on how advanced the application is. Start by logging all handleable exceptions and
possibly I/O calls (calling an external service, the database, getting console
args). The general rule - log all significant events (lifecycle events) while
providing the ability to turn on different logging levels for more extensive
logging.
Refs: https://realpython.com/python-logging/ and
https://docs.python.org/3/howto/logging.html
DEMO: logging, logging to file, logging attributes
(https://docs.python.org/3/library/logging.html#logrecord-attributes), dedicated
loggers.
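A minimal sketch of the demo ideas (the log file name reuses the example above; the format fields are standard LogRecord attributes):
import logging

# log to a file; WARNING is the default level, here we lower it to DEBUG
logging.basicConfig(
    filename="statspagescrapper.log",
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logger = logging.getLogger(__name__)   # dedicated logger per module

logger.debug("detailed diagnostic info")
logger.info("application lifecycle event")
logger.warning("something unexpected, but the app continues")
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("handled exception, logged with traceback at ERROR level")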
Python Crash Course
Logging and config files
There is an approach to app development called “12 factor apps” that is gaining
popularity. They have a chapter about logging: https://12factor.net/logs
Python Crash Course
Logging and config files
There are also alternative ways of logging configuration when you delegate the
configuration to the app: https://docs.python-guide.org/writing/logging/
We will take a look at configuration options next.
The visual below is just to make the slide more memorable.
Python Crash Course
Logging and config files
db_server=localhost
db_port=3363
db_user=root
db_pass=gfdg78746565@$545
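One way to read such a key=value file is the standard configparser module; a minimal sketch, assuming the keys above are placed under a [database] section in a config.ini file:
import configparser

# config.ini (assumed layout):
# [database]
# db_server=localhost
# db_port=3363
# db_user=root
# db_pass=gfdg78746565@$545
config = configparser.ConfigParser()
config.read("config.ini")
db_server = config["database"]["db_server"]
db_port = config.getint("database", "db_port")
print(db_server, db_port)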
Git works by creating a separate database in our project .git - it’s a file based
database.
Git does not store diffs in the database, but we can think like that for the time
being to make it simpler (more: https://stackoverflow.com/questions/10398744/does-
git-store-diff-information-in-commit-objects ). The project directory will only
contain the files at any particular moment, but the database will remember all the
history from the first to the last commit and everything in between.
Advanced intro of git internals: https://www.youtube.com/watch?v=P6jD966jzlk
There are common workflows / work cycles with git and we will take a look at one
now.
Python Crash Course
Git for version control
There are many Git GUI tools, however I would recommend using the CLI, as once you
learn the CLI you will be able to choose any GUI tool you like. If you learn GUI-
first, then your knowledge might be a bit limited.
Workflow:
git init → if you are the initial founder of the project
git clone → if you join the existing project
git --version → check the version
git status → continuously runnable command. State of the repo.
git add <f/d/. > → stages the changes.
git rm --cached . → unstages the changes. (-r for multiple files)
git reset → remove change from staging, not file
git commit -m ‘’ → creates a point in time to return if needed
git update-ref -d HEAD → revert the first commit (rarely used)
git log → shows the history of the current branch
Exercise: create simple git repo locally with single commit. Send git log output.
Python Crash Course
Git for version control
Workflow (continued):
git branch -a → list branches
git branch <name> → create branch
git checkout <name> → switch to branch
created branch will have the same code as the parent branch initially
changes made in a branch belongs to that branch
changes need to be committed or stashed before switching branches
stash is like a temporary storage where you can put unfinished work
a file committed to a branch belongs to that branch (a new file does not prevent switching)
git stash → stash current uncommitted changes (to not lose
them!)
git stash -- filename.ext → git stash a single file w/o any message
git stash push -m "more" → add more to the stash
git stash list → list stashes
git stash pop 0 → apply the changes and drop the stash (if conflict
happens, drop will not be automatic)
git stash drop 0 → drop w/o applying
git stash apply 0 → apply the stash w/o dropping it
git stash show -p 1 → check what is inside the stash before applying it
git merge <branch> → merge the branch (have to be on branch into which
the merge is performed)
git branch -d <name> → delete the branch (-D force delete)
git branch -rd origin/<name> → delete branch reference locally if you deleted
branch first on github
git branch -m old new → rename branch from old to new
git push -d origin <name> → delete remote branch:
https://stackoverflow.com/a/2003515/1964707
git checkout <commit-hash> → get back to the past
git checkout <branch-name> → back to the future
git diff HEAD^ HEAD → diff current head with the previous commit
git diff sha1 sha2 → diff commits (use chronological order)
git diff main ticket1 → compare branches (before merging for example)
git show <commit-hash> → show extended details of the commit (including
changes made as a diff)
git revert <commit-hash> → undoes changes that are made by this specific
commit (rem / add / code /files )
Python Crash Course
Git for version control
Common gotchas:
Don’t commit passwords in tracked files. Create an .example file that the users of your project will check out and enter passwords into.
Use .gitignore to indicate which files are not to be tracked by git (like config
files with passwords).
Use .gitkeep to keep track of project directories that are kept empty (only to
indicate project structure) in git (/log, /tmp, /bin).
Do not create git repos inside other git repos - git repos are not nestable (we can
use git submodules for that, it’s an advanced topic).
Never commit your library directories e.g.: venv
Exercise: create github account and push code (from the previous exercise) to it.
Send POW.
… you need to create a token (classic) to push code to github (password auth not
supported when pushing for years)
… use personal access token: https://docs.github.com/en/authentication/keeping-
your-account-and-data-secure/managing-your-personal-access-tokens#using-a-personal-
access-token-on-the-command-line
Python Crash Course
Git for version control
Github (cont.):
Opensource workflow:
OSS project are hosted in git.
We need to fork them and create a PR (pull request) if we want to contribute.
After the PR it will be reviewed and either rejected or approved.
Python Crash Course
Git for version control
Question from students - git rebase:
With the rebase command, you can take all the changes that were committed on one
branch and replay them on a different branch
Rebase is used to make git history cleaner and more understandable.
There are advantages of using rebase, but there are disadvantages and cases when
you should not use rebase:
Do not use rebase on public branch that multiple people use - might cause problems
(do not rebase into master or from master)
Some teams do not permit a rebase at all - check when you come into a new company
Why we use rebase - squashing (history cleanup, smaller log and so on) and rebasing
from master (pulling w/o merge) when it’s safe. The last part should arguably be the
default: https://stackoverflow.com/a/4675513/1964707
Python Crash Course
Git for version control
Demo squash:
create main, add file, create ticket1 branch, add couple commits then
git log --oneline → a less verbose version of git log
git merge-base ticket1 main → get original branching point from main
git rebase -i <sha>
choose the commit you want others to be squashed into
leave a single commit message (all can be left as well)
git reflog
Disadvantages:
Since commits are squashed you can’t travel back to each one
Python Crash Course
Git for version control
Demo rebasing from master:
create main, add a file, create the ticket1 branch, add a couple of commits, then
switch to main and create yet another file (we are simulating the situation where a
teammate comes in and says there is an important new piece of code, potentially a
"hotfix", in master that we want to incorporate right away so that we can see whether
our code is compatible with that new hotfix and there are no issues)
don’t forget to commit in main
switch to the other branch
check the log
git rebase main
check the log and the merge base!
Pyenv
manage different python versions and easily switch between them
usually used on a development machine
don’t confuse w/ pyvenv, a venv predecessor, now deprecated
useful to test application against multiple versions of python
Installation for mac and linux: https://github.com/pyenv/pyenv
Installation for windows: https://github.com/pyenv-win/pyenv-win#installation
More: https://realpython.com/intro-to-pyenv/...
Pipenv
a new possible replacement/improvement for pip
aims to combine Pipfile (a replacement for the requirements.txt file that uses TOML
format and stores dev, test and prod dependencies in separate sections of the same
file), pip and virtualenv into one command on the command line.
some literature still recommends using pip for beginner python developers - the
workflow is easier
more: https://pipenv.pypa.io/en/latest/ … however, to start, better use this
documentation: https://docs.python-guide.org/dev/virtualenvs/
see this for an extended community discussion:
https://stackoverflow.com/a/41573588/1964707
Poetry
a new replacement/improvement for pip and pipenv
a tool for dependency management and packaging in Python
allows you to declare libraries your project depends on and it will manage
(install/update) them for you
offers a lockfile to ensure repeatable installs / dependable builds, and can build
your project for distribution
poetry can be used for both application creation and library creation (it is
sometimes said that pipenv focuses more on applications rather than libs).
Installation + PATH
https://python-poetry.org/docs/#installation
remember poetry is not prepackaged with python and is a global tool if installed
the default way (global to all users or to your user) and there are important
implications to that:
it will not be managed by pyenv and will be bound to your system python.
it does create virtual environments, but it does that outside of the project (which
is something I dislike, because if you delete the project the dependencies will not
be deleted automatically. Note, this is subject to change).
because it is bound to your system python, it will declare your global python
version as the minimal supported version [tool.poetry.dependencies]
Commands (windows):
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content
| py -
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content
| py - --uninstall (don’t forget to remove pypoetry from PATH)
Python Crash Course
Project management with poetry (optional)
Commands
poetry new <project-name> → create a new project with poetry
poetry init → initialize poetry into the project that was
already started (I usually use this approach)
poetry install → install all dependencies (usually run after
cloning the project), does not remove packages
poetry install --sync → if you want to uninstall all packages
removed and install new packages added from/to pyproject.toml
poetry add <package-name> → to add a new package to the dependency list
of our app
poetry remove <package-name> → uninstall the package
poetry update → updates the packages, but respects the
version constraints in pyproject.toml
poetry show --tree → show dependencies as a tree
poetry add requests → add dependency
poetry add pytest --group test → add dependency to group test (group
names can be chosen at will, no need to specify group when uninstalling)
poetry show --only main → list non development dependencies (only the
main dependencies)
poetry run python app.py → run the app (you may encounter specialized
commands, e.g. poetry run flask run, which require additional configuration)
Python Crash Course
Distributable applications (optional)
Raw modules (python files via GitHub gists, or direct download)
Executable Zips (covered)
Docker containers
Creating installable software and .exe files: PyInstaller and py2exe, see:
https://packaging.python.org/en/latest/overview/#bringing-your-own-python-
executable
Python Crash Course
Practical Project 1 (PP1)
We have learned quite a lot of Python (and some additional concepts) in this part -
we want to solidify our knowledge by practicing!
To complete this practice assignment you need to create a python web scraper
project.
Requirements:
Scrape one or more websites, that are publicly available - it can be a minimal
project or you can go as in depth as you like.
It must navigate in at least one of two ways (or both): in depth (e.g. items page +
item page) or in breadth (pagination) - this is the minimal requirement to get a
positive grade. You don't have to implement both, but that would be good practice.
You can use any library / framework, but if you use scrapy, bs4, selenium,
requests-html please use recommended python project structure.
It must contain a config file - you decide which parameters need to be configurable;
simple options: url, selectors, port, logging level, how much time to wait before
accessing the next page, print scraped information to console or to file or both, etc.
It must log errors to a centralized file - at least one log file, for example
main.log.
Code is hosted on github (can be private, but please invite the teacher as a
collaborator to verify the project) with at least 3 commits, containing a readme file
with launch instructions (document how to launch the project easily) and
requirements.txt (or equivalent).
Optional requirements (future lectures will cover these topics):
Write some unit/integration tests (pytest)
Incorporate excel file processing
Save values to a database
Anything else you might think of or want to try…
Python Crash Course
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
Classes and objects
Class methods
01
02
03
Class variables
Python Crash Course
00
The essence of object-oriented programming
04
Static methods
05
Inheritance
06
07
Multiple inheritance
Encapsulation
08
Composition
09
Polymorphism
The essence of object-oriented programming
Python Crash Course
Object-oriented programming is one of the programming paradigms (this is separate from
the declarative vs. imperative distinction):
Procedural (structured) programming: splitting the program into functions and using
if / switch and other blocks. There are no classes / objects that hold data and
methods together. This is what we have done so far.
Functional programming: functions as variables, immutable data structures, pure
functions, declarativeness (SQL is a declarative language, so is Regex; the declarative
style is contrasted with the imperative one: map() + filter() vs. a for() loop).
Object-oriented programming: everything is modeled with objects, as in the "real world"
(objects have properties and methods). A template called a class is used to create an
object. There are 4 (or 5?) core OOP principles, which we will review shortly.
Today, many languages do not fall neatly into these 3 categories (or the categories
of declarative vs. imperative style). Many languages are hybrid / multiparadigm
languages, e.g.: Python - for loops (procedural style), classes/objects (oop),
closures/lambda/map (functional).
These paradigms are for humans (mainly productivity) - computers / CPUs do not care
(maybe compilers and interpreters do). The creators of OOP thought that huge enterprise
apps would be much easier to understand in OOP style (although that is not where the
benefits came from - garbage collection was much more productivity-enhancing).
The essence of object-oriented programming
Python Crash Course
What is a class? A class holds data and methods, i.e. it encapsulates properties
and behavior under one name. So we can simply treat a class as a "container" or a
grouping of related data and the methods that work with it. This is the first way a
class can be defined - static classes fit this definition, but not the second one.
A class can also be a template for objects. From a class we can create many objects
that share the same structure (structure = properties + methods) - the same shape -
but hold different values of that data. E.g. an Employee class.
Every employee will have an id, a first name and a last name - but the concrete data
will differ and will often be unique (or at least specific) to each object. Classes are
often explained with the cookie-cutter analogy.
class ClassName:
pass
object_name = ClassName()
However, if we want to assign property values at object creation time - which is often
the most convenient way - we have to define the __init__ method. This is our first
'double underscore' (or 'dunder') method, sometimes called init, the constructor, or
dunder init.
The variable self refers to the object created from the class whose methods are being
called (similar to this in other programming languages).
Class variables
Python Crash Course
These are variables that belong to the class and are shared by all objects
created from that class.
In other languages these are called static variables (although Python also has static
variables).
They are contrasted with instance properties / variables (instance == object).
Class attributes are initialized when the class is created, so they cannot depend on
things that are initialized later, e.g. instance attributes.
Instance attributes belong to each object individually. Their values are unique to that
object - just as your first and last name are assigned to you and another person will
have a different name, instance methods and attributes may differ from those of all
other objects (of course they may coincide, but they do not have to).
Class attributes belong to the class, i.e. the template from which the objects are made.
Class attributes are shared by all objects created from that class. If we
changed such an attribute, the change would be reflected in all objects created from
that class. If we assigned through an object name, e.g. emp1.a = 5, a new variable
would be created at the instance level / scope.
A class can hold both class and instance variables at the same time (they are
not exclusive, they can coexist).
Class variables
Python Crash Course
Class variables
Python Crash Course
A Python class variable exists at the class level:
print(Classname.__dict__)
print(objectName.__dict__)
… however, the name lookup rules are as follows: if the interpreter does not find the
variable on the object, it looks for it on that object's class. This fits into Python's
general name resolution order - LEGB (local, enclosing, global, built-in), see:
https://realpython.com/python-scope-legb-rule/#class-and-instance-attributes-scope
class variable - local scope
object_name.class_var_name → local scope (object) to enclosing scope (class)
x = 5
def test():
# in this case we would reach even the built-in scope
print(abs)
test()
x = 5
def test():
# x will be searched in local scope (function),
# ... then enclosing scope (which is also global scope)
print(x)
test()
Class variables
Python Crash Course
This is important to understand, because if we try to change a class attribute we have
two options: change it through the class name or through an object name. If we use an
object name, a new variable will be created in the object's scope (fields can be
assigned to objects dynamically).
So, if we want raise_amount to be selectively replaceable / overridden for a single
object - so that we could make an exception for one object - we refer to it inside the
class as self.raise_amount. This makes it easy to implement a "default case" (see the
sketch below).
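A minimal sketch of the class-variable vs. instance-override behaviour described above
(the Employee class, salary values and 1.04 / 1.10 rates are illustrative, not taken
from the slides):

import_free_example = None  # plain Python, no imports needed

class Employee:
    raise_amount = 1.04  # class variable, shared by all instances

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def apply_raise(self):
        # self.raise_amount falls back to the class variable unless
        # the object has its own instance attribute with that name
        self.salary = self.salary * self.raise_amount

emp1 = Employee("Jonas", 1000)
emp2 = Employee("Ona", 1000)

emp1.raise_amount = 1.10      # creates an instance attribute only on emp1 (the exception)
Employee.raise_amount = 1.05  # changes the default for every other object

emp1.apply_raise()
emp2.apply_raise()
print(emp1.salary, emp2.salary)          # 1100.0 1050.0
print(emp1.__dict__, Employee.raise_amount)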
Class variables
Python Crash Course
In a situation where we cannot think of a reason to override the class variable for one
or a few objects, we use the class name:
Class methods
Python Crash Course
Class methods are created with the @classmethod decorator.
That way, the first argument passed into such a method is the class, not the object
self (see the sketch below).
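A minimal sketch of a class method; the set_raise_amount and from_string alternative
constructor are illustrative, commonly used patterns rather than code from the slides:

class Employee:
    raise_amount = 1.04  # class variable

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    @classmethod
    def set_raise_amount(cls, amount):
        # cls is the class itself, so this changes the shared default
        cls.raise_amount = amount

    @classmethod
    def from_string(cls, text):
        # alternative constructor: "Jonas,1000" -> Employee("Jonas", 1000.0)
        name, salary = text.split(",")
        return cls(name, float(salary))

Employee.set_raise_amount(1.06)
emp = Employee.from_string("Jonas,1000")
print(emp.name, emp.salary, Employee.raise_amount)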
Inheritance
Python Crash Course
Inheritance
a code-reuse principle: the child class inherits data and methods from the parent
class and no longer has to define them itself.
supports the DRY (don't repeat yourself) principle in programming (no boilerplate,
bottom-up design).
we can also think of it through the lens of abstract modeling (more of a top-down
design), but students often simply get lost that way (Animal → Cat → Tiger or
Person → Employee → Teacher). An IS-A relationship.
Books often start talking about abstract categories that real-world things belong to
in order to explain inheritance. However, a more productive approach is often to simply
demonstrate how code gets reused (DRY).
Inheritance
Python Crash Course
Method resolution order (MRO):
Inheritance
Python Crash Course
Let's create one more class:
We can check whether an object is an instance of a given class with isinstance().
We can check whether two classes are related by inheritance with issubclass().
We can find out the name of the class an object belongs to in various
ways: https://stackoverflow.com/questions/510972/getting-the-class-name-of-an-
instance
Multiple inheritance
Python Crash Course
Python is one of the few languages that support multiple inheritance.
Ref: https://www.programiz.com/python-programming/multiple-inheritance
In that case the MRO: https://www.programiz.com/python-programming/multiple-
inheritance#resolution - C3 linearization is used to resolve the MRO:
https://stackoverflow.com/questions/55692832/method-resolution-order-mro
The MRO explains why you may need ClassName.__init__() vs. super().__init__(),
see: https://stackoverflow.com/a/42413830/1964707
Let's look at the types of inheritance in general:
Multiple inheritance
Python Crash Course
If you subclass (verb) a class that uses multiple inheritance, your super() can in
fact delegate to a sibling, not a parent! When using multiple inheritance, call the
parent constructors using the class names directly throughout the inheritance chain:
Employee.__init__()
The Diamond Problem:
https://en.wikipedia.org/wiki/Multiple_inheritance#The_diamond_problem
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def __str__(self):
return f"{self.name} is {self.age} years old"
class Employee(Person):
def __init__(self, name, age, rate, num_of_hours):
print('Employee%s' % super().__init__)
super().__init__(name, age)
self.rate = rate
self.num_of_hours = num_of_hours
def show_finance(self):
return self.rate * self.num_of_hours
class Student(Person):
def __init__(self, name, age, scholarship):
super().__init__(name, age)
self.scholarship = scholarship
def show_finance(self):
return self.scholarship
# class Employee(Person):
# def __init__(self, name, age, rate, num_of_hours):
# Person.__init__(self, name, age)
# self.rate = rate
# self.num_of_hours = num_of_hours
#
# def show_finance(self):
# return self.rate * self.num_of_hours
#
#
# class Student(Person):
# def __init__(self, name, age, scholarship):
# Person.__init__(self, name, age)
# self.scholarship = scholarship
#
# def show_finance(self):
# return self.scholarship
class WorkingStudent(Employee, Student):
    # NOTE: the WorkingStudent class header and __init__ were missing from the
    # extracted slide; this is a minimal reconstruction. Person.__init__ is called
    # directly because, with the super()-based Employee/Student above, super()
    # inside Employee would delegate to the sibling Student (per the MRO) and fail.
    def __init__(self, name, age, rate, num_of_hours, scholarship):
        Person.__init__(self, name, age)
        self.rate = rate
        self.num_of_hours = num_of_hours
        self.scholarship = scholarship

    def show_finance(self):
        return self.rate * self.num_of_hours + self.scholarship
print(WorkingStudent.__mro__)
os4 = WorkingStudent("Monica", 24, 9.5, 70, 500)
Inheritance
Python Crash Course
Summary:
When should we use inheritance? First of all, we have to make sure we need to use OOP
at all. For small programs it is sometimes unnecessary. OOP was created as a layer of
abstraction to manage large projects. For scripting and small programs it would be
entirely sufficient to represent the data with collections (e.g. a list of dicts)
and the logic with "free floating" / global functions.
Use it when we see that we have many different kinds of data (domain objects) that need
to be represented - the classic example is data taken from tables about
Employees, Account, ShopingCartItem, Item, Auction, SalesEvent, Event, Building.
For existing project - use whatever they use and whichever way they use it. Even if
it means learning to do things incorrectly.
And once we have decided to use OOP, inheritance is applied either "bottom-up" or
"top-down". Top-down means it becomes clear while planning. Bottom-up means you are
simply writing the application and notice that there are 3 classes ("the rule of 3")
that are very similar (Employee, Customer, GuestUser) and all have id, name, surname -
so you can save code by creating a superclass Person that holds the shared properties
(id, name, surname), while the specific properties and the methods that process them
stay in the separate classes.
Encapsulation
Python Crash Course
Encapsulation is a class implementation principle whereby an object's data and methods
are encapsulated, i.e. kept inside the class.
Reading and modifying the data is done in a controlled way, using special
methods and access modifiers.
This is often associated with getter and setter (accessor and mutator) methods in other
languages.
In Python, decorators are used to implement getters and setters (see the sketch after
the links below):
The property decorator lets us define a method that can be called like a property with
the dot operator.
The deleter is used to destroy / null out data inside the object.
More:
https://www.geeksforgeeks.org/getter-and-setter-in-python/
https://stackoverflow.com/questions/2627002/whats-the-pythonic-way-to-use-getters-
and-setters
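A minimal sketch of the property / setter / deleter pattern described above (the
Employee class and its name attribute are illustrative):

class Employee:
    def __init__(self, name):
        self._name = name  # "protected" by convention (single underscore)

    @property
    def name(self):
        # getter: called as emp.name, with the dot operator, no parentheses
        return self._name

    @name.setter
    def name(self, value):
        # setter: controlled modification, e.g. validation
        if not value:
            raise ValueError("name cannot be empty")
        self._name = value

    @name.deleter
    def name(self):
        # deleter: invoked by `del emp.name`
        self._name = None

emp = Employee("Jonas")
print(emp.name)   # Jonas
emp.name = "Ona"  # goes through the setter
del emp.name      # goes through the deleter
print(emp.name)   # None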
Encapsulation
Python Crash Course
Python also has access modifiers. There are 3 of them: public, protected and private:
vars with public access modifiers can be accessed anywhere inside or outside the
class. They are written normally: self.name
protected members can still be accessed from anywhere (unless wrapped with the
@property decorator) - they are written with a single underscore: self._name. By
convention they are not accessed from outside the class / module, although the
interpreter allows it.
private variables can only be accessed inside the class - written with double
underscore: self.__name
Ref: https://stackabuse.com/object-oriented-programming-in-python/#accessmodifiers
In general, the private / public distinction is not as important as in Java / PHP or
other OOP languages. Some books / online tutorials do not even mention it. By
comparison, in other languages any normal tutorial would have a sizeable discussion
about it.
Private methods exist as well.
"In python, nothing is truly private", see:
https://stackoverflow.com/questions/70528/why-are-pythons-private-methods-not-
actually-private
Composition
Python Crash Course
An object holds another object inside itself as a field in order to fulfil its
functionality, delegating part of it.
A HAS-A relationship, with the classic Car → Engine example (a car has an engine; see
the sketch below).
Passing one object into another from the outside, via the constructor or a setter
method, is called Dependency Injection (DI). It is a creational design pattern.
Depending on whether the dependency is passed through the constructor or a setter,
2 DI types are distinguished:
setter injection
constructor injection
A single class can use either of the DI types, or both.
The dependency can be called the "dependent object": the Engine belongs to the Car.
Ref: https://realpython.com/inheritance-composition-python/#composition-in-python
Simple variable injection is usually not considered composition in the literature.
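A minimal sketch of the Car → Engine composition with the constructor and setter
injection described above:

class Engine:
    def start(self):
        return "engine started"

class Car:
    def __init__(self, engine):
        # constructor injection: the Engine is created outside and passed in
        self.engine = engine  # HAS-A relationship

    def set_engine(self, engine):
        # setter injection: the dependency can also be swapped later
        self.engine = engine

    def drive(self):
        # Car delegates part of its functionality to the contained Engine
        return self.engine.start() + ", driving"

car = Car(Engine())
print(car.drive())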
Composition
Python Crash Course
Sometimes we distinguish between aggregation and composition.
Composition: when a class creates a dependent object inside of itself (inside a
constructor).
Aggregation: when a class receives a dependent object externally (dependency
injection).
Polymorphism
Python Crash Course
Many-formedness (from Greek).
Substitutability between the child and the parent class (polymorphism through
inheritance):
a child object is put into a collection declared with the parent type (a polymorphic
collection).
a child object is passed into a method even though the method accepts the parent type
(a polymorphic function).
This is called subtype polymorphism.
Polymorphism is harder to demonstrate in dynamically typed languages, because there you
can put any type into any collection and function parameters accept anything.
… but now we know the definition and the use of this concept.
Ref: https://www.geeksforgeeks.org/polymorphism-in-python/
—--
Polymorphism always goes "downwards in the inheritance hierarchy": if a class inherits
something from its parent class, then both in polymorphic collections and in
polymorphic functions we can rely on that and use the inherited methods and properties.
Some sources distinguish dynamic and static polymorphism (method overloading) and
explain polymorphism as the process of resolving (binding) and calling the required
method. My explanation, as you can see, is somewhat different, but it boils down to the
same thing - substitutability between the parent and child class (polymorphism through
inheritance) relies on the methods being inherited by the child class, which is why it
can stand in for the parent in every situation.
A good discussion: https://softwareengineering.stackexchange.com/questions/335704/how-
many-types-of-polymorphism-are-there-in-the-python-language
Polymorphism
Python Crash Course
Substitutability in functions - in input and output parameters:
Polymorphism
Python Crash Course
Duck typing is an alternative to type hinting (... or static types).
Duck typing: if it quacks like a duck …
In python we can pass objects to functions and, as long as the object being passed
has what the function uses, there is no issue.
We lose the duck-typing capabilities if we restrict the types with type hinting
(mypy). But we regain some of that flexibility from polymorphism.
So duck typing is (arguably) a non-inheritance-based polymorphism - it does not pass
the isinstance() test, but allows for liberal parameter passing and containment of
various types inside collections (see the sketch below).
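A minimal sketch contrasting subtype polymorphism and duck typing (the Animal / Dog /
Robot classes are illustrative):

class Animal:
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):
        return "woof"

class Robot:  # not related to Animal at all
    def speak(self):
        return "beep"

def make_it_speak(thing):
    # duck typing: anything with a .speak() method works,
    # regardless of whether isinstance(thing, Animal) is True
    print(thing.speak())

for obj in [Animal(), Dog(), Robot()]:  # a "polymorphic" collection
    make_it_speak(obj)

print(isinstance(Dog(), Animal))    # True  - subtype polymorphism
print(isinstance(Robot(), Animal))  # False - works only via duck typing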
Summary
Python Crash Course
Magic methods
Python Crash Course
The plus (+) operator in Python is overloaded - adding two numbers gives different
semantics (with the same syntax) than adding two str objects.
What happens if we use the addition symbol between two objects we wrote ourselves?
We can define that with dunder - double underscore - methods.
We have already seen the dunder init method - the constructor.
Other methods:
__repr__() → a representation of the object for logging and debugging purposes, aimed
at developers. It is recommended to return the expression of the constructor call.
__str__() → a human-readable, reader-friendly representation of the object (its
internals), like toString() in other languages.
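A minimal sketch of __repr__, __str__ and an overloaded + operator (the Money class is
illustrative):

class Money:
    def __init__(self, amount):
        self.amount = amount

    def __repr__(self):
        # developer-oriented: an expression that recreates the object
        return f"Money({self.amount!r})"

    def __str__(self):
        # human-readable representation
        return f"{self.amount:.2f} EUR"

    def __add__(self, other):
        # defines the semantics of + for our own objects
        return Money(self.amount + other.amount)

m = Money(10) + Money(2.5)
print(repr(m))  # Money(12.5)
print(m)        # 12.50 EUR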
Magic methods
Python Crash Course
Ref: https://levelup.gitconnected.com/python-dunder-methods-ea98ceabad15
Once we start supporting one comparison operation, it is advisable to support all
(or most) of them.
Magic methods
Python Crash Course
Student question: where can we see __repr__ in action?
in the REPL
in Colab, when you do not call print(custom_obj)
Comparing objects
Python Crash Course
Objects are compared with the __eq__ dunder method. This method determines the
semantics of object comparison, i.e. what it means for two of our objects to be equal.
By default the objects' id() values are compared, but we often want to compare whether
objects are equal by their values (or greater, smaller, and so on).
Ref: https://stackoverflow.com/questions/1227121/compare-object-instances-for-
equality-by-their-attributes
Ref: https://www.kite.com/python/answers/how-to-compare-two-objects-in-python
Comparing objects
Python Crash Course
Remember the question - which two operations are important for sorting: comparison and
swapping.
Once we can compare objects, we can start talking about collections of them, and then
about operations on collections of objects.
Let's look at these operations:
sorting → requires the __eq__ and __lt__ dunder methods (for a plain .sort(), without a
key)
searching → requires the __eq__ dunder method, more variations:
https://stackoverflow.com/questions/9542738/python-find-in-list . When we define
__eq__ we can use the in operator or the list.index() method for searching.
Digression: set search is much faster than list search, because set uses hashing.
But set does not allow duplicate values.
mapping → same as for simple variables
filtering → same as for simple variables
Remember that, depending on how the data is represented, when using OOP we can perform
these operations on various object properties:
those that are simple scalar variables (int, str, etc.),
those that are collections: lists, dicts (the sum or mean of a list - an employee's
salary, the number of completed tasks, a student's grades, etc.),
those that are inner objects related by aggregation / composition,
those that are various combinations of the three things above.
Comparing objects
Python Crash Course
Define __hash__() if you want to use your objects as keys in dict or as values in
set.
Rule: if __eq__() returns True for two objects they should have the same __hash__()
value.
If objects are hashable (i.e. have a __hash__() method defined) they should be
immutable. Why? Because when you add a hashable object into a dictionary it is
placed into a particular bucket by its hash value; if you change the internal
object properties later, the hash will change and you will not be able to
retrieve the key, because the hash will be different due to the changed property
values.
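A minimal sketch of __eq__, __lt__ and __hash__ working together for sorting, searching
and dict keys (the Student class is illustrative):

class Student:
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade

    def __eq__(self, other):
        # value-based equality instead of the default id() comparison
        return (self.name, self.grade) == (other.name, other.grade)

    def __lt__(self, other):
        # needed for a plain .sort() / sorted() without a key
        return self.grade < other.grade

    def __hash__(self):
        # equal objects must have equal hashes; hash only immutable values
        return hash((self.name, self.grade))

    def __repr__(self):
        return f"Student({self.name!r}, {self.grade})"

students = [Student("Ona", 9), Student("Jonas", 7)]
students.sort()                         # uses __lt__
print(students)
print(Student("Ona", 9) in students)    # uses __eq__ -> True
print({Student("Ona", 9): "group A"})   # usable as a dict key thanks to __hash__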
Criticism of OOP
Python Crash Course
https://www.youtube.com/watch?v=goy4lZfDtCE
https://www.youtube.com/watch?v=QM1iUe6IofM
https://www.youtube.com/watch?v=tD5NrevFtbU
Course plan
You can get familiar with it using this link
Additional information
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Some call this structured data or relational data. I would argue that we should call it
structured only if it complies with 1NF from relational theory. However, this is not an
established definition.
What is tabular data
Neural Networks for Tabular Data
[1] Even this can be argued against as this tweet shows:
https://twitter.com/math_rachel/status/990375128314736640
Good / bad loans, overpriced / underpriced flats, etc.
We have seen many tabular datasets already!
CC Fraud
California and Boston house price prediction
Wildfire area prediction
Adult dataset for 50K salary prediction
What is tabular data
Neural Networks for Tabular Data
Every column can be of a different category of data. Most general categories -
quantitative (numerical) vs. qualitative (categorical) data. See:
https://studyonline.unsw.edu.au/blog/types-of-data . Google’able term: datatypes in
statistics.
Tabular data types
Neural Networks for Tabular Data
In statistics, there are four data measurement scales: nominal, ordinal, interval
and ratio.
Nominal - simply labels w/o quantification, names of things (cat, dog), like hair
color.
Dichotomous (2 or N possibilities)
Non-overlapping (south, north)
Sometimes we might want to split categories: wagon - passenger and cargo; maybe the
model would work better if we split cargo wagons into heavy cargo vs. light cargo.
Ordinal
implies order
no scale to compare - what is the difference between happy and very happy?
does not have a mean, but does have a mode (most frequent value) and a median (middle
of the sorted set).
more examples: https://www.graphpad.com/support/...
Interval
order and a scale - the difference between 50 and 60 degrees is the same as between
60 and 70 degrees
[TODO] no true zero - 0 does not mean absence (20 degrees C is not twice as hot as
10 degrees C; don't believe that? Convert to Fahrenheit).
impossible to calculate a ratio, but central tendency and dispersion (stddev) can be
calculated
examples: time is an interval measure, duration (a time interval) is not; temperature
is interval data. 2x 10:00 is not 20:00, but 2x 1h = 2h (duration).
Ratio - numbers
both descriptive and inferential statistics can be applied
examples: weight, height, income, duration, speed
Tabular data types
Neural Networks for Tabular Data
These data scales can be understood as concentric circles:
N > O > I > R
…. can they?
Tabular data types
Neural Networks for Tabular Data
https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-
ratio/
Tabular data types
Neural Networks for Tabular Data
https://www.mymarketresearchmethods.com/data-types-in-statistics/
Tabular data types
Neural Networks for Tabular Data
Cardinality is the measure of uniqueness in the set.
High cardinality - highly unique values, most values differ.
Low cardinality - highly repeating, most values are the same.
In the relational database world we know that indexes should be added on columns with
higher cardinality - ideally unique and normally distributed values - to provide
maximal partitioning of the data.
Sometimes useful when choosing which encoding to use (one-hot encoding on high
cardinality data might not be optimal).
Tabular data types
Neural Networks for Tabular Data
Note: tabular data can contain text, images (image url), audio and other types of
data or references to some other type of data. To deal with this we have 2 options:
eliminate that data if we don’t know how to feed it to the model / model it.
multi input model
Example: dermoscopy image together with tabular (meta)data.
Ref: https://towardsdatascience.com/integrating-image-and-tabular-data-for-deep-
learning-9281397c7318
Tabular data types
Neural Networks for Tabular Data
During the 2010s, deep learning revolutionized computer vision and natural language
processing, but plain old tabular datasets have proved a tougher nut to crack.
In general, deep learning (neural networks stacked in many layers, sometimes
hundreds or thousands of them) is effective because it can learn deep hierarchical
representations of data.
Language and the visual world have structure that can be analysed at hierarchical
levels (words, phrases / edges, corners), and at a higher level (sentences, grammar
/ objects, relationships between objects). Images are "structured" data because they
have local structure (nearby pixel values tend to be highly correlated), so that, for
example, convolution operations can model them well - this is not the case with
tabular data - one row can be completely different from another and a group of
records can be very diverse.
Before deep learning started to be effective in the 2010s, language processing and
image analysis relied on hand-crafting features that reflected certain
properties of the data, but today, models like BERT (for language) and DenseNets
(for image analysis) are able to learn very informative representations of the
data, removing the need for feature engineering.
In addition, image and language data have local structure which lends itself well
to certain types of operations, such as convolutions, which are implemented in all
standard neural network libraries.
For tabular data, there is generally speaking no local or hierarchical structure
(although there could be, in specific cases). For this reason, many people think
that deep learning is irrelevant to tabular data. Instead, past experience seems to
indicate that versions of decision tree ensembles (random forests, gradient
boosting etc.) are the most reliable methods for tabular data.
But we have a desire not to multiply approaches, but to reduce them - deep learning
provided the hope of reducing the model zoo; however, if we can't integrate tabular
data then there is no reduction.
Problems with Tabular Data
Neural Networks for Tabular Data
** Note: no SOTA architecture means that AutoML for DL on tabular data is more complicated.
One-hot encoding
Performs well in many situations
Typically used, popular
Disadvantage: with high cardinality data it produces a lot of columns, which slows
down learning significantly. Useless features also multiply, potentially harming the
precision of inference.
Preparation
Neural Networks for Tabular Data
Demo: one-hot and ordinal encoding, with the adult dataset as an example of a diverse
dataset (a minimal sketch follows below).
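A minimal sketch of one-hot vs. ordinal encoding with scikit-learn (the toy workclass
column is illustrative, not the adult dataset itself):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({"workclass": ["Private", "State-gov", "Private", "Self-emp"]})

# one-hot: one binary column per category (can explode for high cardinality)
onehot = OneHotEncoder()
print(onehot.fit_transform(df[["workclass"]]).toarray())
print(onehot.categories_)

# ordinal: a single integer column, which implies an (arbitrary) order
ordinal = OrdinalEncoder()
print(ordinal.fit_transform(df[["workclass"]]))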
Preparation
Neural Networks for Tabular Data
Entity embeddings are vectors in Euclidean space that are close in that space when
the underlying feature represented by one vector is "close" to the other feature.
They are learned during model training - adjusted during backpropagation.
Two big advantages: they produce more precise models, and even high-cardinality
columns can be represented with a small feature vector (less memory than one-hot
encoding, which is especially relevant for high-cardinality columns).
Applications: word embeddings in NLP, collaborative filtering, encoding categorical
features.
Short video overview: https://www.youtube.com/watch?v=186HUTBQnpY
Paper: https://arxiv.org/pdf/1604.06737.pdf
Article: https://medium.com/@apiltamang/learning-entity-embeddings-in-one-breath-
b35da807b596
We are going to talk more about it when we reach the last lecture of part 6 and NLP
part of the course.
Preparation
Neural Networks for Tabular Data
Embeddings came from NLP, but are now also used for categorical data (see the sketch below).
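A minimal sketch of an entity-embedding layer for one categorical column in Keras
(the cardinality, embedding size and dummy data are illustrative; the embedding table
is learned during training via backpropagation):

import numpy as np
import tensorflow as tf

n_categories = 100   # cardinality of the categorical column
embedding_dim = 8    # much smaller than a 100-wide one-hot vector

cat_in = tf.keras.Input(shape=(1,), name="category")
emb = tf.keras.layers.Embedding(input_dim=n_categories, output_dim=embedding_dim)(cat_in)
emb = tf.keras.layers.Flatten()(emb)
out = tf.keras.layers.Dense(1)(emb)

model = tf.keras.Model(cat_in, out)
model.compile(optimizer="adam", loss="mse")

# ordinal-encoded category ids (0..99) and a dummy target, just to show the shapes
x = np.random.randint(0, n_categories, size=(32, 1))
y = np.random.rand(32, 1)
model.fit(x, y, epochs=1, verbose=0)
print(model.layers[1].get_weights()[0].shape)  # (100, 8) learned embedding table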
Preparation
Neural Networks for Tabular Data
Summary for encoding categorical values:
Preparation
Introduction to Deep Learning
We talked about how to encode different kinds of qualitative data (nominal and
ordinal).
What if the data contains text - one or more sentences, maybe even more?
Stacking DL models - NLP for tabular data, using one model only for the text column
and others for the numeric columns. But even when doing that, the problem does not
disappear completely - we still need to feed "text" to a neural network.
Encoding w/ Universal Sentence Encoder
The Universal Sentence Encoder encodes text into high dimensional vectors that can
be used for text classification, semantic similarity, clustering and other natural
language tasks. The model is trained and optimized for greater-than-word length
text, such as sentences, phrases or short paragraphs.
https://tfhub.dev/google/universal-sentence-encoder/1
https://www.tensorflow.org/hub/tutorials/
semantic_similarity_with_tf_hub_universal_encoder see this
https://amitness.com/2020/02/tensorflow-hub-for-transfer-learning/
Featurize with pretrained NLP model (BERT) - take the embeddings from the
pretrained model, see: https://towardsdatascience.com/nlp-extract-contextualized-
word-embeddings-from-bert-keras-tf-67ef29f60a7b
Use term frequency-inverse document frequency (TF-IDF) - a numerical expression of
word importance - sometimes recommended.
Stacking w/ specialised text models (create specialised model used in prep).
A simple Count vectorization approach (scikit has that available)
We will talk about preprocessing of text data in the future part on NLP. For now,
just remember that dealing with text might not be as simple as it appears!
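A minimal sketch of the count-vectorization and TF-IDF options mentioned above, using
scikit-learn (the two example sentences are illustrative):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

texts = [
    "the loan was approved quickly",
    "the loan application was rejected",
]

# simple count vectorization: one column per word, values are word counts
counts = CountVectorizer()
print(counts.fit_transform(texts).toarray())
print(counts.get_feature_names_out())

# TF-IDF: down-weights words that appear in many documents
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(texts).toarray().round(2))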
Preparation
Introduction to Deep Learning
For numerical data everything is as usual - normalizing / standardizing the data
makes gradients more evenly distributed, helping convergence and reliability
across datasets, so we usually do that.
Should we normalize encoded categorical (nominal and ordinal) data? The only case
where it could potentially matter is when you use ordinal encoding for many
"labels" (1 … 200). Then your standardized features and the encoded feature would
differ by about 2 orders of magnitude. Keep that in mind. But in general you can
standardize / normalize.
Preparation
Introduction to Deep Learning
Dates - not only is this a feature that is almost always transformed, but usually
there is also a lot of feature engineering that can be done.
See: https://stackoverflow.com/questions/46428870/how-to-handle-date-variable-in-
machine-learning-data-pre-processing
Let's think about unix timestamps vs. split dates: 2001.06.16 → | 2001 | 6 | 16 |
vs. 80866730. When using timestamps we are losing information about the month
(unless you add it back); the cyclical nature of time is no longer inferable.
Feature engineering tips:
sometimes adding the day of week is helpful for the model.
mark holidays / non-working days
When in doubt you can always just try creating a model without a particular column
(a minimal sketch of these date features follows below).
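A minimal pandas sketch of the date splitting and day-of-week feature engineering
described above (the column names and example dates are illustrative):

import pandas as pd

df = pd.DataFrame({"date": ["2001-06-16", "2001-06-17", "2001-12-24"]})
df["date"] = pd.to_datetime(df["date"])

# split the date into components instead of keeping a single unix timestamp
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["day_of_week"] = df["date"].dt.dayofweek   # 0 = Monday
df["is_weekend"] = df["day_of_week"] >= 5     # simple non-working-day flag

print(df.drop(columns=["date"]))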
Preparation
Introduction to Deep Learning
Our NNs will now have multiple layers and many neurons. They can "remember" stuff,
so it is important to validate the models we train appropriately.
Not only choosing the hyperparameters (activation function, layer count, etc.), but
… also shuffling, k-fold cross-validation and holdout are imperative.
Do you remember what (k-fold) cross-validation is?
Train on k-1 folds, hold one as validation, average the scores.
https://www.youtube.com/watch?v=fSytzGwwBVw
https://www.datarobot.com/wiki/training-validation-holdout/
Helps us avoid hitting a biased training sample
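A minimal sketch of k-fold cross-validation with scikit-learn (the toy data and the
Ridge model are illustrative):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X = np.random.rand(100, 5)
y = X @ np.array([1.0, 2.0, 0.0, -1.0, 0.5]) + 0.1 * np.random.randn(100)

# 5 folds: train on 4, validate on the held-out fold, repeat, average the scores
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(Ridge(), X, y, cv=cv, scoring="r2")
print(scores, scores.mean())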
Preparation
Introduction to Deep Learning
Start with a low-capacity network - start small to get a baseline. A few hidden
neurons.
Choose output activation function for classification:
Design
Introduction to Deep Learning
For regression - if the target is uniform or Gaussian / normal, use RMSE; for
Poisson-like targets, a Poisson loss; if the output is "zero-inflated", use the
Tweedie loss (could we improve the forest fire area predictions even with the 0's
kept?).
Sorting + binning helps see target distribution, there are tests to see what the
distribution actually is (Shapiro–Wilk test, Kolmogorov–Smirnov test and others).
More: http://proceedings.mlr.press/v80/imani18a/imani18a.pdf
Scikit Tweedie regressor and keras tweedie loss:
https://datascience.stackexchange.com/...
Design
Introduction to Deep Learning
And the fact that we should adapt the loss functions using the target distribution
is very interesting.
We compared classification models not only using auROC or F1 score, but also the
confusion matrix, to tell us which class of data we are most incorrect on. This
informs our analysis of the residuals and can tell us how to tune our network.
We can do similar things for regression models by tracking how the distribution of
targets is learned (rather than simply tracking RMSE or R2). We can have similar
RMSE for different models, but the errors might show weakness in learning different
parts of the distribution. If, for example, our model makes most of its mistakes
when predicting the lowest flat prices, we can say that this is a systematic (rather
than random) error.
Making mistakes in a systematic way (on high/low flat prices) informs us that we might
need more samples of low-priced flats, we might want to take a look at feature
engineering that could help predict low prices, or even have an ensemble model
dedicated to predicting low-priced houses.
TODO :: implement tracking of distribution learning.
Design
Introduction to Deep Learning
What about hidden activation functions? So many to choose from!
ELU - how a negative value is squashed is determined by the hyperparameter alpha.
This means you can determine it in the tuning process.
Design
Introduction to Deep Learning
Dropout: a percentage of neurons is turned off, but only during training.
This technique prevents overfitting. If you don't see overfitting, don't use it.
Additionally, dropout mitigates neuron death, as the image on the side explains
(we could confirm that by training ReLU NN models).
Analogy with a sommelier.
Design
Introduction to Deep Learning
Ideally, before training a model, we would establish a benchmark using well known
techniques / models (multiple ones even). It would be used as a comparison.
Each model should be optimized (specific preprocessing, feature engineering) for a
better benchmark. Doing automated HP tuning is also an option.
This is usually not done, but if you wanted the absolute best model - as in
competitions - you would use multiple models anyway.
Training
Introduction to Deep Learning
** The table on the side does not indicate that models will always perform in this
order. Linear models do not always perform better than KNN, nor XGB better than other
ensemble models. This is just an example of a table in which you might track your
benchmarking results for model selection.
Tuning
Various options exist:
Random search
Grid search
Bayesian optimization
Genetic algorithms
AutoML
Random search
RandomizedSearchCV from sklearn.
Grid search
GridSearchCV from sklearn.
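A minimal sketch of grid search vs. randomized search with scikit-learn (the
RandomForestRegressor, parameter ranges and random data are illustrative):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X = np.random.rand(200, 5)
y = np.random.rand(200)

params = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, 8]}

# grid search: tries every combination (3 x 3 = 9 candidates per CV round)
grid = GridSearchCV(RandomForestRegressor(random_state=0), params, cv=3)
grid.fit(X, y)
print(grid.best_params_)

# random search: samples a fixed number of combinations from the same space
rand = RandomizedSearchCV(RandomForestRegressor(random_state=0), params,
                          n_iter=5, cv=3, random_state=0)
rand.fit(X, y)
print(rand.best_params_)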
Training
Introduction to Deep Learning
Batch size:
Too small - underfitting and slow
Too large - more RAM, may never find the global minimum
… but that is old news.
1% rule - a new thing! (or find a better value with grid search).
Journeyman
Master
Resources
Introduction to Deep Learning
//
Resources
Introduction to Deep Learning
DANets: Deep Abstract Networks for Tabular Data Classification and Regression :
https://arxiv.org/abs/2112.02962
Applying Deep Learning To Airbnb Search https://arxiv.org/abs/1810.09591
Resources
Introduction to Deep Learning
New (2020.12 initial commit) pytorch-based framework for tabular data:
https://github.com/manujosephv/pytorch_tabular
Resources
Introduction to Deep Learning
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Additionally, it is not even necessary for the user to interact with the system -
we might know what kind of movies are liked in the geographical region the user's IP
is from; that would also be a content-based recommender (market-segmentation-based
recommendation). Also, we will discuss latent likeability / feedback factors later.
Content based
Recommender Systems
With collaborative filtering we have many users in the system and we essentially
extrapolate like this:
if a user has similar tastes in movies (likes the same movies and dislikes the same
movies) as another user,
we will recommend other unseen movies that the other users with similar tastes liked.
The matrix of users and products - the interactions matrix - can be very large and very
sparse (most users have not rated most movies). This matrix can be approximated by two
other matrices: a user factor matrix and an item factor matrix. Recommendation is then
just a multiplication of the latent factor matrices (a minimal sketch follows below).
This type of recommender does not require market segmentation (no user profile of
inherent properties - age, income, etc.) of users or metadata about the items.
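A minimal numpy sketch of the interaction-matrix factorization idea described above
(the ratings, factor dimension and simple gradient-descent fitting are illustrative;
real systems typically use alternating least squares or more robust optimizers):

import numpy as np

# sparse-ish user x item rating matrix (0 = not rated)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

k = 2  # number of latent factors
rng = np.random.default_rng(0)
U = rng.random((R.shape[0], k))   # user factor matrix
V = rng.random((R.shape[1], k))   # item factor matrix

# a few steps of gradient descent on the observed entries only
mask = R > 0
for _ in range(2000):
    err = mask * (R - U @ V.T)
    U += 0.01 * err @ V
    V += 0.01 * err.T @ U

# predicted scores for unseen items are just the product of the latent factors
print(np.round(U @ V.T, 1))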
Collaborative filtering
Recommender Systems
Compared to content-based filtering: item-similarity vs. user-similarity
Collaborative filtering
Recommender Systems
Knowledge based recommenders can be thought of as partially automated
recommendations.
They explicitly ask the user about his preferences.
These systems are often used when building a user-product matrix or item similarity
matrix is too difficult. Difficulties arise when the system is rarely used -
for example, how would you recommend a house to a person when he buys one or two in
his lifetime? Learning without repeated interactions is close to impossible.
This is usually done via a survey / wizard, or by simply leaving the possibility to
filter content on the site.
Questioning users can be done after the fact, or prior (imagine a salesperson
approaching you and asking what you are looking for in a house you want to buy).
Many online teaching / publishing platforms use this approach.
Knowledge based recommenders
Recommender Systems
A hybrid approach can be constructed in two major ways:
creating a system that will use one of the 3 approaches depending on the situation:
not many users (impossible to use collaborative filtering) but a new user has rated
several movies - use a content-based recommender. No info at all - add a 2-question
survey at the beginning of registration, or recommend by country / IP / gender if they
are known, or recommend the most popular items.
“synchronous” hybrid: use all 3 when possible and combine the results. Research
suggests that this approach might be the best. This is similar to an ensemble
model.
A hybrid approach is (or at least was) used by the YouTube recommendation system
Hybrid approach
Recommender Systems
There are different algorithms to implement either content or collaboration based
recommenders, from computing vectors and their similarities algebraically, to ML
approaches (KNN) and of course deep learning approaches.
For DL approaches we simply use a neural net to predict how likely the user is to
evaluate the item favorably - predicted user rating vs. actual user rating would be
the metric.
The advantages of DL models would include them being more precise.
The disadvantages: they have to be trained, they might not be explainable, and they
can be slower than some other approaches.
The distribution of total evaluations should be taken into account (Tweedie), so the
classification problem would be of the asymmetric type.
This kind of recommender might be considered closer to content based filtering,
although it is content + knowledge.
Deep learning models
Recommender Systems
Skewness and sparsity
The matrix of users and products offered is sparse and skewed (few interactions in
the user-item space). There can be hundreds of users and thousands of items, and
most of the matrix values are zero - so costly computations and wasted memory.
This is because:
most customers interact with 1%-2% of your products.
most products are reviewed by few people who just like reviewing / buying products
and are not necessarily representative of the population of buyers or any specific
buyer.
This contributes to sparsity. Skewness is due to:
few users being very prolific while most are very quiet
some products being very popular and others not (the fact that a user likes an item
that all / most people like does not tell us much about his preferences).
You can't just naively take all reviews for all products.
Pitfalls of recommenders
Recommender Systems
Cold-start (ramp-up)
When a new item or a new user is added to the system, the system does not have any
possibility to judge the future behavior of the user or the product based on
information from the past.
We might also consider the problem when the system is completely fresh (a new system),
but this is not what is usually meant by the cold-start problem in the literature.
Pitfalls of recommenders
Recommender Systems
Popularity based - cold start for new item, but not for new user.
Content based - cold-start problem for new user (because we know the item
properties)
Collaborative filtering - both.
Lack of explicit feedback - inferred preference!
We must sometimes rely on implicit features / implicit user behaviors when evaluating
their preferences. It is common for less than 1% of people to evaluate a watched video
via a thumbs up.
Pitfalls of recommenders
Recommender Systems
Lack of explicit feedback - inferred preference (cont.)!
How to overcome this problem - implicit data, examples:
Assume that if the user watched the whole video - he likes it
Assume that if the user listened to a song for 2 times in a row (in 30min) - he
likes it.
If the meal was finished, then assume the user likes it.
"Likes it" does not mean that positive emotions were produced - just that the user
wants more of that thing, or something similar or complementary. Even with negative
news that makes the user angry, we can still use the term "likes it".
Even though we can overcome the lack of explicit data with implicit data, explicit
data should still be the ground truth from the ethical standpoint (and, when possible,
Deep Learning based approaches should use explicit evaluations for their loss
calculation). Think: should you recommend a video to a user if it gives him a strong
negative response but the user is addicted to that negative response?
Another thing: serving the user or educating the user - should the system inject
"counter recommendations"? How can a company / system choose - what can it optimise
in this situation: time spent on the platform, which might be higher if the user is
exposed to content he/she does not "like".
Pitfalls of recommenders
Recommender Systems
Context-aware recommender systems (CARS) take into account the context of the
experience and the recommendation. This is especially true when the recommender uses
user ratings / evaluations.
There are also 3 ways of injecting context into a recommender in CARS:
contextual pre-filtering (next slides),
post-filtering,
modeling - deep learning models can pay attention to some context data.
see: https://link.springer.com/chapter/10.1007/978-1-4899-7637-6_6 (the author is
Lithuanian, University of Minnesota: https://scholar.google.com/citations?
user=oWCSRZ0AAAAJ&hl=en workshop: https://www.youtube.com/watch?v=ZrMxfbZhLT8 ).
Advanced models
Recommender Systems
Ref: https://medium.com/@andresespinosapc/the-basics-of-context-aware-
recommendations-5dd7a939049b
Recommender Systems
Detailed course plan
Slides, tasks and so on
Additional information
GAN applications
DRL applications
Tabular data (oftentimes DL is overkill here; use it when there are lots of features
and loads of data)
Classification - good vs. bad loan / spam vs. ham
Regression - house price (numeric continuous)
Clustering - marketing segments
Anomaly detection - spiked negative comments for product.
Success story: predicting patient mortality (tabular data).
Textual data (RNNs)
Predicting article categories (classification).
NLP on written data: the author's emotions from comments.
Autocomplete (gmail uses that).
Translation between languages.
Headlines generated based on the content of the article.
Applications of neural networks by datatype
Introduction to Deep Learning
Image data
Object recognition, age regression, emotion detection, gender classification.
Pixelized image segmentation - convolutional encoder decoder.
Image captioning - CNN + RNN.
Image resolution enhancement.
GANs to fill the missing part of the image. Seeing how people will look when old.
Synthetic celebrities.
GANs are also used for image generation based on text (caption).
Audio data
CNN + RNN to recognize songs (vehicles / animals): Shazam
CNN + RNN speech-to-text transcription
CNN + RNN encoder-decoder nets for real-time translation
Speech synthesis: dilated causal convolutional neural network. We can synthesize
our own voice as well.
Applications of neural networks by datatype
Introduction to Deep Learning
Video data
Similar things to image, but also prediction
Automatic sign language translation
CNN encoder-decoder for video restoration and colorization:
https://www.youtube.com/watch?v=h7GX3wEfxcg https://www.youtube.com/watch?
v=ELmVmJEt4L4
Video generation: deep fakes and deep dreaming.
** Not necessary, can increase your mark but will not decrease it if not complete.
Glossary, recap, practical project 5
Introduction to Deep Learning
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
Algorithm classification, Big O
Datastructure classification
01
02
03
Datastructure definition
Python Crash Course
00
Algorithm definition
04
Examples of algorithms and datastructures
05
Learning algorithms and datastructures
06
Design patterns
07
Design principles
08
Microbenchmarking
Algorithm definition
What is an algorithm?
Steps to take to wash your hair.
To drive to work.
Steps that a robot takes to cut out ventilation holes.
Definition: sequence of steps to solve a problem or complete a task.
In CS/SE algorithms additionally have clear beginning and end, sequence, input and
output.
But what is a program? Also - a sequence of steps to solve a problem… And so is a
function?
So what is the difference?
Algorithms are designed, not written - whereas programs are written.
Algorithms are invented, discussed, proposed in the design phase of SDLC. Programs
- in the implementation phase.
Algorithms are analyzed, programs are tested (a priori analysis and a posteriori
testing).
Algorithms are language, OS and platform independent; programs are written in a
particular language.
Important: distinguish between the algorithm and its implementation (RSA is proven
to be secure, but an implementation might have bugs)
An algorithm is conceptually the size of a function. Algorithms are usually
encapsulated in one or a few functions. Commonly: ~10-100 LOC
Python Crash Course
Algorithm classification, Big O
Algorithms can be represented as:
Text, words and sentences
Pseudocode (case analysis: SWED interview)
Flowcharts
Start, end - oval
Input / Output (memory extenal) - parallelogram
Branch / loop - rhombus
Operation in memory - rectangle
Implementation in code (function, program)
Python Crash Course
Algorithm classification, Big O
We can classify algorithms in many ways (like balls in a tray - by color, by size
or by size and color):
By problem solved: sort, search, mathematical / numeric (number generation, square
root, fast inv square root, fib), graph, tree algs, etc.
By implementation technique: recursive / iterative (left_pad()), divide and conquer
(binary search, merge sort), dynamic programming (defined as optimization of
recursive algorithms), greedy, brute force, backtracking (maze solving, next best
move in chess).
By performance metrics / complexity (time, space): O(n^2), O(n) and so on.
For example reversing a string has at least 10 possible implementations:
https://trycatchblog.com/sinhadroid/top-10-ways-to-reverse-a-string/
If we have many ways to do things, which way do we choose? Can we decide which is
better? Yes.
There are qualities we want to achieve, things we value about an algorithm that make
it better.
Faster is better, less memory used is better, simpler is better, not requiring I/O
is better, less energy consumed is better.
How do we decide which algorithm is "faster"?
If we count the seconds then we will need to perform the analysis on a single
computer!
We do not count seconds, we perform complexity analysis - let's turn to that next.
Python Crash Course
All the qualities / desirable characteristics:
Complexity:
Time → time function, not time itself (ms, microsecs). Simple statement takes 1
unit of time and we take the sum of them at the end and pay attention to the most
significant terms.
Space → space function, not space itself (MB, KB). Same as time - sum, take the
most significant value.
Stability - is the initial order preserved or destroyed? Why is that important? When
sorting objects we might want to preserve the order between equal objects with regard
to properties that we are not comparing against. Why? So that we might sort again with
the same algorithm, but by a different property (id and name).
Qualities we care about in programs, not (usually) in algorithms: I/O (network
(latency, bandwidth, packet counts), disk (r/w speed, latency)), power consumed (in
general the faster the program, the less CPU it uses (but this becomes untrue when
multithreading is involved)), CPU (registers, cache size and misses*), processor
count (can we parallelize), stack size. In practice we might need to choose a
slower algorithm in case we don't have much memory, or we might choose an algorithm
that performs the least number of operations to preserve memory.
Time complexity is usually the most important (when memory is not very constrained,
which is usually the case in web development and data science, unless you do it on
IoT devices). We perform asymptotic complexity analysis to understand the complexity
class of the algorithm.
Asymptotic analysis works because for large N we don't care about constants or
lower-order terms - they will be completely dominated by the most significant terms.
Other algorithms in deep learning: optimizers (adam, nadam, adagrad, etc.), forward
propagation.
Machine learning: how a kNN algorithm learns is described by an algorithm, how k-means does clustering - also. And so on.
Python Crash Course
Example:
Algorithm classification, Big O
Python Crash Course
Example: https://www.bigocheatsheet.com/
Algorithm classification, Big O
Python Crash Course
Case analysis addUpToN(n) - linear vs. constant time.
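A minimal sketch of this case analysis, assuming addUpToN(n) means summing the integers 1..n: the loop does n additions (O(n)), the closed-form formula does a constant amount of work (O(1)).
def add_up_to_n_linear(n):
    total = 0
    for i in range(1, n + 1):   # n iterations -> O(n) time
        total += i
    return total

def add_up_to_n_constant(n):
    return n * (n + 1) // 2     # one expression -> O(1) time

print(add_up_to_n_linear(100), add_up_to_n_constant(100))  # both print 5050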
Algorithm classification, Big O
Python Crash Course
Classification:
Ref: https://adrianmejia.com/most-popular-algorithms-time-complexity-every-
programmer-should-know-free-online-tutorial-course/
Heuristics:
Algorithm classification, Big O
Python Crash Course
Student question:
Why not measure in seconds?
Answer:
We can use seconds, but we need good isolation. Measuring in seconds is called
timing the function.
However seconds depend on computer hardware.
They also depend on the load on the computer - the time it takes for the function
to finish can vary between runs. If the variation between runs is bigger than the
variation between algorithms we can’t decide.
Data dependence - we want a measurement that abstracts the dependence on data.
… so we measure the proportion between input size increase and the increase in
steps taken.
Exercise:
Algorithm classification, Big O
Python Crash Course
Answer to the exercise question:
Sorting algorithms
Slow: bubble sort / insertion sort / selection sort:
bubble sort is not a useful algorithm in practice, so why learn it? It is simple (i); it has 2 simple optimizations that students can sometimes discover themselves, which allows introducing the concept of optimization (ii); it connects the important concepts of swapping two values and comparing them, which are fundamental to sorting algorithms (iii); we can easily use it when discussing the difference between software engineering and computer science (iv); and there are a few interesting problems around it, like implementing a counter that counts how many iterations / swaps were made, or sorting custom objects. See the sketch after this list.
Fast: merge sort, quicksort, timsort
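A minimal bubble sort sketch with the two simple optimizations mentioned above (shrinking the inner range and stopping early when no swap happened) plus comparison/swap counters; the exact variant students discover may differ.
def bubble_sort(items):
    items = list(items)
    comparisons = swaps = 0
    for i in range(len(items) - 1):
        swapped = False
        for j in range(len(items) - 1 - i):      # optimization 1: the tail is already sorted
            comparisons += 1
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swaps += 1
                swapped = True
        if not swapped:                          # optimization 2: already sorted, stop early
            break
    return items, comparisons, swaps

print(bubble_sort([5, 1, 4, 2, 8]))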
Trees:
BST: https://qvault.io/python/binary-search-tree-in-python/
Invert binary tree (classic problem): https://medium.com/@theodoreyoong/coding-
short-inverting-a-binary-tree-in-python-f178e50e4dac
B-tree (balanced trees): https://www.youtube.com/watch?v=UzHl2VzyZS4
We can connect roads (edges) and intersections (vertices) and calculate the travel path when there is a single unidirectionally connected edge (introducing asymmetry S→F vs. F→S). We can also have a weighted graph signifying how long it takes, on a 5min interval average, to travel through that edge.
Python Crash Course
Examples of algorithms and datastructures
Graph tasks:
Python Crash Course
Examples of algorithms and datastructures
Graphs
The part of data science that deals with graphs is called Network Science. It grew out of Graph Theory.
Historically: the Euler circuit theorem (Königsberg bridge problem).
NetworkX is a python library for working with graphs.
It can visualize networks
And answer some questions for analysis
See: https://networkx.org/documentation/latest/tutorial.html
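A minimal NetworkX sketch (assuming networkx and matplotlib are installed): build a small weighted road graph, answer a shortest-path question and draw it.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_edge("A", "B", weight=5)   # edge = road, node = intersection, weight = travel time
G.add_edge("B", "C", weight=2)
G.add_edge("A", "C", weight=9)

print(nx.shortest_path(G, "A", "C", weight="weight"))   # ['A', 'B', 'C']
nx.draw(G, with_labels=True)
plt.show()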
However for massive amounts of data we would need Apache Spark, Neo4J
NetworkX does not scale to gigabytes of data.
Apache Spark uses graph frames for working with graphs.
https://spark.apache.org/graphx/
There are more tools and examples:
for visualizing graphs: https://towardsdatascience.com/visualizing-networks-in-
python-d70f4cbeb259
analyzing: https://towardsdatascience.com/visualising-graph-data-with-python-
igraph-b3cc81a495cf
Python Crash Course
Microbenchmarking
Although we care about the scalability of algorithms, we also care about the actual time taken by implementations of the algorithms (functions or programs).
This can be done in many ways in python, ref:
https://stackoverflow.com/a/2866456/1964707
Notes:
Do not overcomplicate the code between t0 and t1 - we want to time just the part of
the algorithm that varies, we do not care about the initial allocation for example,
because it is the same for both algorithms.
Be fair - you should time both variations only after they have achieved the same result. If the first part creates a string and the second one just a list and you time that, but only later convert it to a string, then you are timing it unfairly.
You can also visualize the workings of the algorithms in https://pythontutor.com/
Python Crash Course
import time
############## 1 ##############
mystr = "Mindaugas"
t0 = time.time()
for i in range(0, 10000):
    finalstr = ""
    for i in range(len(mystr)-1, -1, -1):
        finalstr += mystr[i]
t1 = time.time()
print(t1-t0)
print(finalstr)
############## 2 ##############
mystr = "Mindaugas"
t0 = time.time()
for i in range(0, 10000):
    mylst = []
    for i in range(len(mystr)-1, -1, -1):
        # print(f'{i}:{mystr[i]}')
        mylst.append(mystr[i])
    final = ''.join(mylst)
t1 = time.time()
print(t1-t0)
print(final)
# print(''.join(mylst)) # unfair advantage
Programming competitions:
Top competitions: https://www.quora.com/Which-are-the-best-coding-competitions
https://www.youtube.com/watch?v=MVLSQB5Durg → How to Ace Top Programming
Competitions
https://www.youtube.com/watch?v=xAeiXy8-9Y8 → How To start
List of competitive programming competitions: https://clist.by/
Guide for beginners: https://github.com/Errichto/youtube/wiki/How-to-practice%3F
Python Crash Course
Design patterns
Since we are DS / ML / DL students it is quite uncommon to learn design patterns. However, a quick mention will not hurt too much.
DPs are standard solutions to object oriented design problems, they mostly make
sense in OO languages (see, for an exception:
https://stackoverflow.com/questions/4112796/are-there-any-design-patterns-in-c +
functional languages, like “functional strategy”).
GoF DPs are the ones to know, but there are way more:
https://en.wikipedia.org/wiki/Software_design_pattern#Classification_and_list
Frameworks and libraries use design patterns internally.
Design pattern among other concepts (in the layers of the abstraction cake):
Architectural patterns - high / application level solutions to problems (MVC,
Microservices): https://martinfowler.com/eaaCatalog/ and
https://learn.microsoft.com/en-us/azure/architecture/patterns/
Design patterns - intermediate level (Singleton).
SOLID design principles
OOP principles - inheritance, polymorphism, composition interfaces / abstract
classes … / data structures
Algorithms (similar to the level of functions) - “you don’t need algorithms”
means: “you will not need to create custom ones” (creation vs. usage). Usage will
be unavoidable.
Idioms / syntax mechanism - low level language specific constructs (foreach(),
class X {}).
Groups of design patterns:
Creational. Recommended: Dependency Injection, Singleton (not really for python),
Factory (method) / (Fluent) Builder. Advanced: Object Pool - when you don’t want to
invoke GC.
Structural. Recommended: Facade (requests), Decorator.
Behavioral. Recommended: Iterator, Strategy (simplified in functional programming with the functional strategy pattern - see the sketch after this list).
Concurrency. Recommended: Thread pool (useful when you want to call a throttled API w/o being blocked)
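A minimal sketch of the functional strategy pattern mentioned above: the interchangeable behavior is just a function that gets passed in, no class hierarchy needed (the names here are hypothetical).
from typing import Callable, List

def total_price(prices: List[float], discount: Callable[[float], float]) -> float:
    # the "strategy" is the discount function - callers can swap it freely
    return sum(discount(p) for p in prices)

def no_discount(price: float) -> float:
    return price

def black_friday(price: float) -> float:
    return price * 0.8

print(total_price([10, 20], no_discount))    # 30.0
print(total_price([10, 20], black_friday))   # 24.0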
Sources: https://www.youtube.com/@ArjanCodes (probably the best resource on design patterns)
https://refactoring.guru/design-patterns/python
https://sourcemaking.com/design_patterns/abstract_factory/python/1
https://python-patterns.guide/gang-of-four/iterator/
Python Crash Course
Design principles
Design principles are also not often taught to DS / ML / DL students.
SOLID: https://towardsdatascience.com/solid-coding-in-python-1281392a6a94
Single-Responsibility Principle (SRP) → this can be applied to functions, classes, even programs (unix design philosophy).
Open-Closed Principle (OCP) → new functionality should not touch old code (add new functionality by just adding new code)
Liskov Substitution Principle (LSP) → child class instances should be usable wherever a parent class instance is expected (subclasses must be substitutable for their base classes).
Interface Segregation Principle (ISP) → better to have more, smaller interfaces; separate your interfaces: FileManager vs. Reader / Writer
Dependency Inversion Principle (DIP) → bubbleSort([List of Sortable]) vs. bubbleSort([List of Person]) - see the sketch below.
Some like it shorter: keep your classes small and create interfaces!
Recommended video: https://www.youtube.com/watch?v=pTB30aXS77U , code for this:
https://github.com/ArjanCodes/betterpython/tree/main/9%20-%20solid
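A minimal sketch of the DIP bullet above: the sort depends on a Sortable abstraction (here a typing.Protocol), not on a concrete Person class, so built-ins and custom classes both work; all names are hypothetical.
from typing import List, Protocol

class Sortable(Protocol):
    def __lt__(self, other) -> bool: ...

class Person:
    def __init__(self, name: str):
        self.name = name
    def __lt__(self, other: "Person") -> bool:   # Person satisfies the Sortable abstraction
        return self.name < other.name

def sort_items(items: List[Sortable]) -> List[Sortable]:
    return sorted(items)                          # depends only on the abstraction

print(sort_items([3, 1, 2]))
print([p.name for p in sort_items([Person("Jonas"), Person("Asta")])])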
Python Crash Course
We have types of programming:
Mobile
Web: Fe / Be / FS
Desktop
Data roles: Data / ML / DL (Quants?)
Ops/Infra: Devops / Secops / Admin / Netops
Game dev
System (OS, Webserver, RDBMS, language/compilers)
Low level / electronics: embedded, drivers
Even more succinctly: Applications, Systems, Data, Ops, Games (keep in mind that
this is an arbitrary differentiation, imprecise model)
It is important to note that thus far we have only worked with word-level models. Another level of model is the character-level model. These models are more expensive to train (slower and require more memory). However, they could potentially be the future as better algorithms and more powerful hardware are developed. (TODO: sub-word models, byte-level models, super-word level)
As A. Karpathy wrote in the same article: "Currently it seems that word-level
models work better than character-level models, but this is surely a temporary
thing." See: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ . Why is that?
I could guess that some forms of words can be rare in text (“Mokinys apsimetė
ąžuolu”), but a character level model might be able to learn the form itself from
other examples (“Silkę užkando obuoliu”) and adapt it even if the identical word
did not exist in the entire corpus - morphology.
Additionally punctuation could be learnable via character level model, not “Let's
Eat Grandma”, “skųsti negalima tylėti”.
Generating Shakespeare with Char RNN
Natural Language Processing
… in modern times tokenization became a much more complex problem with the rise of LLMs. As mentioned, there are more levels than just word and character. This is visible by just looking at the new tokenizers Keras provides: https://keras.io/api/keras_nlp/tokenizers/
Generating Shakespeare with Char RNN
Natural Language Processing
TimeDistributed() is a wrapper type in Keras (just like Bidirectional() - also a wrapper).
TimeDistributed(Dense()) applies the same Dense() layer to every timestep in the input sequence. Every input should be at least 3D, and the dimension at index one of the first input will be considered to be the temporal dimension: [[], [], []] → [seq1, seq2, seq3] → the output will be generated for each input timestep independently, so we will have output: [out_4_seq1, out_4_seq2, out_4_seq3]
It is a wrapper used for seq-to-seq problems (it can also be used for others: one-to-one, one-to-seq). It is most commonly applied to a Dense() or Conv1D() layer, but it is applicable to any keras.layers.Layer.
There are two key points to remember when using the TimeDistributed wrapper:
Input must be (at least) 3D. This often means that you will need to configure your
last LSTM layer prior to TimeDistributed wrapped Dense layer to return sequences
(return_sequences=True).
The output will be 3D. This means that if your TimeDistributed wrapped Dense layer is your output layer and you are predicting a sequence, you will need to reshape your y array to be 3D as well.
If we were to take this example model.add(TimeDistributed(Dense(1))) it would mean
the following:
apply the same dense layer at each timestep of the input sequence.
the single output value in the output layer is key. It highlights that we intend to
output one time step for each time step in the input.
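A minimal seq-to-seq sketch of the points above (toy shapes, random data, just to show the 3D-in / 3D-out contract):
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_steps, n_features = 9, 1
model = keras.Sequential([
    layers.LSTM(32, return_sequences=True, input_shape=(n_steps, n_features)),
    layers.TimeDistributed(layers.Dense(1)),   # the same Dense applied at every timestep
])
model.compile(loss="mse", optimizer="adam")

X = np.random.rand(100, n_steps, n_features)   # 3D input: (batch, time, features)
y = np.random.rand(100, n_steps, 1)            # 3D target: one value per timestep
model.fit(X, y, epochs=1, verbose=0)
print(model.output_shape)                      # (None, 9, 1)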
TimeDistributed wrapper
Natural Language Processing
When should you use TimeDistributed?
If your data is dependent on time, like time series data or data comprising different frames of a video, then TimeDistributed(Dense()) is said to be more effective than a simple Dense layer (could we create an MWE proving that?).
TimeDistributed(Dense()) applies the same dense layer to every time step during
GRU/LSTM Cell unrolling. That’s why the error function will be between the
predicted label sequence and the actual label sequence.
Using return_sequences=False, the Dense layer will get applied only once in the
last cell. This is normally the case when RNNs are used for classification
problems. (seq-to-one)
If return_sequences=True, then the Dense layer is used to apply at every timestep
just like TimeDistributedDense.
TimeDistributed might be hard to understand right away and easy to forget if you
are not using it often. See this answer for more information:
https://stackoverflow.com/a/42758532/1964707 also
https://stackoverflow.com/questions/47305618/what-is-the-role-of-timedistributed-
layer-in-keras/47309453#47309453
TimeDistributed wrapper
Natural Language Processing
For a char-level model we return the probability of the next character for each position in the sequence, so we use a Dense(count_of_all_possibilities) layer + softmax + categorical_cross_entropy - as if we were solving a classification problem.
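A minimal sketch of such an output head, assuming a hypothetical vocabulary of 39 distinct characters and one-hot encoded inputs:
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 39   # hypothetical number of distinct characters in the corpus
model = keras.Sequential([
    layers.GRU(128, return_sequences=True, input_shape=(None, vocab_size)),
    layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax")),
])
# one "class" per character at every position, hence a classification-style loss
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()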
TimeDistributed wrapper
Natural Language Processing
Until now, we have used only stateless RNNs: at each training iteration
(batch_size) the model starts with a hidden state full of zeros, then it updates
this state at each time step, and after the last time step, it throws it away, as
it is not needed anymore. What if we told the RNN to preserve this final state
after processing one training batch and use it as the initial state for the next
training batch? This way the model can learn long-term patterns despite only
backpropagating through short sequences. This is called a stateful RNN.
So for long sequences we know we can use:
LSTMs / GRUs over SimpleRNNs
Stateful RNNs
First, note that a stateful RNN only makes sense if each input sequence in a batch
starts exactly where the corresponding sequence in the previous batch left off. So
the first thing we need to do to build a stateful RNN is to use sequential and
nonoverlapping input sequences (rather than the shuffled and overlapping sequences
we used to train stateless RNNs). When creating the Dataset, we must therefore use
shift=n_steps (instead of shift=1) when calling the window() method. Moreover, we
must obviously not call the shuffle() method.
Stateful RNNs
Natural Language Processing
When training stateful RNNs, the state needs to be reset after each epoch; this can be performed with the Keras callback API.
Also, after this model is trained, it will only be possible to use it to make predictions for batches of the same size as were used during training. To avoid this restriction, create an identical stateless model and copy the stateful model's weights to it. Does this affect inference performance? We can test it - create the necessary batches and then pass them through the same network and through a new network with the weights copied (it shouldn't make a difference, but it's good to think about how to prove / test it).
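A minimal sketch of both points (resetting via a callback and copying weights into an identical stateless model); TF 2.x Keras is assumed, and the layer sizes and batch size of 32 are arbitrary choices.
from tensorflow import keras

class ResetStatesCallback(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.model.reset_states()              # clear the carried-over hidden state each epoch

vocab_size, batch_size = 39, 32
stateful_model = keras.Sequential([
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     batch_input_shape=[batch_size, None, vocab_size]),
    keras.layers.TimeDistributed(keras.layers.Dense(vocab_size, activation="softmax")),
])
stateful_model.compile(loss="categorical_crossentropy", optimizer="adam")
# stateful_model.fit(dataset, epochs=10, callbacks=[ResetStatesCallback()])

# After training: identical architecture, but stateless and without a fixed batch size
stateless_model = keras.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, vocab_size]),
    keras.layers.TimeDistributed(keras.layers.Dense(vocab_size, activation="softmax")),
])
stateless_model.set_weights(stateful_model.get_weights())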
Stateful RNNs
Natural Language Processing
Sometimes called NMT (neural machine translation). Let’s take a look at a simple
neural machine translation model that will translate English sentences to German.
To understand how difficult natural language translation is, think about the naive approach - map words between two languages 1-1 and, when you have a sentence, just convert each word using that mapping. This totally ignores the grammar of the language; for some very simple sentences it would be fine, but for most it would not:
This car is better -> Šis automobilis yra geresnis.
Contrary to popular belief, Lorem Ipsum is not simply random text. -> Priešingai
populiariam tikėjimui ... (semi-good although most people with some proficiency in
Lithuanian would say "Priešingai populiariam įsitikinimui". We have to use
different synonyms that are context-dependent).
So essentially NMT is learning the grammar of the language and even broader context-dependencies (in the semantic realm).
And because of the difficulty of the task, neither
level 0: the naive model, nor
level 1: rule-based translation models solve it well. We need
level 2: statistical learning / neural networks.
Beyond classification: Translation
Natural Language Processing
Let's try that same sentence now: "Gardu. Nepigu. Interjero sprendimai pritaikyti
greitam pavalgymui, bet ne ilgesniam pasisėdėjimui su draugais."
Is targeting a specific language better than being a jack-of-all-trades model?
Note: https://www.lrt.lt/naujienos/mokslas-ir-it/11/1085161/lietuviska-masininio-
vertimo-sistema-pralenke-google-microsoft-ir-kitus-technikos-milzinus (question:
does google translate use one or many models for translation). Currently LLMs are
multilingual, although trained on English mostly.
Try it: https://translate.tilde.com/#/ (another company that does something with
NLP in Lithuania: Tildė, TokenMill, Semantika.lt (VDU))
https://www.deepl.com/en/translator
Beyond classification: Translation
Natural Language Processing
Most of us were introduced to machine translation when Google came up with the service. But the concept has been around since the middle of the last century.
Research work in Machine Translation (MT) started as early as the 1950s, primarily in the United States. These early systems relied on huge bilingual dictionaries, hand-coded rules, and universal principles underlying natural language.
In 1954, IBM held the first ever public demonstration of machine translation. The system had a pretty small vocabulary of only 250 words and could translate only 49 hand-picked Russian sentences into English. The number seems minuscule now, but the system is widely regarded as an important milestone in the progress of machine translation.
The paper is an interesting read still: http://www.hutchinsweb.me.uk/GU-IBM-
2005.pdf
Soon, two schools of thought emerged:
Empirical trial-and-error approaches, using statistical methods, and
Theoretical approaches involving fundamental linguistic research (rule-based
approaches)
In 1964, the Automatic Language Processing Advisory Committee (ALPAC) was
established by the United States government to evaluate the progress in Machine
Translation. ALPAC did a little prodding around and published a report in November
1966 on the state of MT. Below are the key highlights from that report:
It raised serious questions on the feasibility of machine translation and termed it
hopeless
Funding was discouraged for MT research in the report
It was quite a depressing report for the researchers working in this field
Most of them left the field and started new careers
Not exactly a glowing recommendation!
A long dry period followed this miserable report. Finally, in 1981, a new system
called the METEO System was deployed in Canada for translation of weather forecasts
issued in French into English. It was quite a successful project which stayed in
operation until 2001.
Brief history
Natural Language Processing
The world’s first web translation tool, Babel Fish, was launched by the AltaVista
search engine in 1997.
And then came the breakthrough we are all familiar with now – Google Translate. It
has since changed the way we work (and even learn) with different languages.
Brief history
Natural Language Processing
In short: launched in 2006, used statistical models rather than rule-based models; in 2016 it started using neural machine translation (SMT -> NMT), which was an offshoot of the Google Brain project (Jeff Dean and Andrew Ng). It was the first demonstration of zero-shot transfer learning in MT: the engine was able to translate Korean⇄Japanese having only been trained on Korean⇄English and English⇄Japanese.
https://en.wikipedia.org/wiki/Google_Translate
https://en.wikipedia.org/wiki/Google_Neural_Machine_Translation
https://arxiv.org/abs/1609.08144 (... our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections + many more optimizations for accuracy, training and inference speed)
Let's see a video about how google translate works: https://www.youtube.com/watch?
v=AIpXjFwVdIE
A much longer video about Google Translate, with a few important points at the beginning: https://www.youtube.com/watch?v=nR74lBO5M3s Note, this video talks about language recognition, which is an important ML application. We did not train a model to recognize a language, but it is important for Google's translation project since, as you have probably seen, it tries to guess which language you are using. How would you phrase this problem - is it a regression, a classification, a summarization, or a translation problem?
The application fails to translate Monty Python's "The funniest joke in the world": "Wenn ist das Nunstück git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput!" with a FATAL ERROR (it did so in 2017 and still did in 2021; it no longer does in 2022).
Google translate
Natural Language Processing
The most common metric used in NMT is the BiLingual Evaluation Understudy (BLEU)
score, which compares each translation produced by the model with several good
translations produced by humans: it counts the number of n-grams (sequences of n
words) that appear in any of the target translations and adjusts the score to take
into account the frequency of the produced n-grams in the target translations.
Can we compare to some superhuman translation score? We could, potentially, by measuring comprehensibility: translate instructions and see whether people follow them more correctly when they are translated by several professional interpreters vs. by NMT. However, this raises issues, like: where does translation end and enhancement through rephrasing begin (does the translator add anything)? Would this enhancement count as "beyond translation"?
With the BLEU score a superhuman translation would never be recognized.
More on this: https://www.youtube.com/watch?v=DejHQYAGb7Q
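A minimal sketch using NLTK's implementation (assuming nltk is installed); the sentences are toy data:
from nltk.translate.bleu_score import sentence_bleu

references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
candidate = "the cat sat on the mat".split()
# candidate n-grams are matched against the n-grams of the human reference translations
print(sentence_bleu(references, candidate))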
BLEU score
Natural Language Processing
Encoder-decoder is a sequence-to-sequence architecture (some authors describe it as encoder: sequence-to-vector + decoder: vector-to-sequence); it takes in sequences for predict() and outputs sequences. It came into view around 2014:
The classic paper on this is: https://arxiv.org/abs/1409.3215
Video introduction: https://www.youtube.com/watch?v=_i3aqgKVNQI
Sequence-to-sequence learning (Seq2Seq) is about training models to convert
sequences from one domain (e.g. sentences in English) to sequences in another
domain (e.g. the same sentences translated to French).
"the cat sat on the mat" -> [Seq2Seq model] -> "le chat etait assis sur le tapis"
NearestNeighbors(n_neighbors=5, algorithm='brute', metric='euclidean').fit(feature_list)
How about the distance metric? We said we were going to prefer the cosine distance metric, not Euclidean? Actually we can imitate the cosine distance somewhat by normalizing the data and then using Euclidean distance, see: https://stackoverflow.com/questions/34144632/using-cosine-distance-with-scikit-learn-kneighborsclassifier
The outputs will not be cosine-like, but the behavior will be similar.
Of course, I would invite you to verify it, not just believe it blindly!
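A minimal sketch of that trick (random vectors stand in for the real CNN features): after L2-normalizing, the Euclidean neighbor ranking matches the cosine ranking.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import normalize

feature_list = np.random.rand(1000, 2048)        # stand-in for CNN feature vectors
features_norm = normalize(feature_list)          # L2-normalize every vector to unit length

knn = NearestNeighbors(n_neighbors=5, algorithm='brute', metric='euclidean')
knn.fit(features_norm)
distances, indices = knn.kneighbors(normalize(np.random.rand(1, 2048)))
print(indices)                                   # same neighbors cosine distance would return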
KNN for RIS
Reverse Image Search
t-SNE visualizing proximate images
Reverse Image Search
t-SNE (t-distributed stochastic neighbor embedding, pronounced: tee-snee) is a dimensionality reduction technique that is similar to PCA but has nicer mathematical properties. It is very useful and often used in the industry to visualize higher dimensional data projected into a low dimensional graph / cluster plot.
STUDY TIP: better to learn the internals of PCA first and then try to understand t-SNE, as it is a bit more complicated and a bit less important than PCA.
Refs:
https://www.datacamp.com/community/tutorials/introduction-t-sne
https://www.youtube.com/watch?v=NEaUSP4YerM
https://www.youtube.com/watch?v=RJVL80Gg3lA - more technical explanation
What is perplexity? The video explains it as a measure of density between points, which affects how the final t-SNE plot is calculated. It is a tunable parameter; there are some indications that higher values produce clearer shapes, and it usually varies between 5 and 50. Getting the most from t-SNE may mean analyzing multiple plots with different perplexities. See:
https://distill.pub/2016/misread-tsne/
https://scikit-learn.org/stable/auto_examples/manifold/plot_t_sne_perplexity.html
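A minimal scikit-learn sketch (random features and labels as stand-ins); rerun it with several perplexity values, as the refs above suggest:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.rand(500, 2048)        # stand-in for CNN feature vectors
labels = np.random.randint(0, 10, 500)      # stand-in class labels, used only for coloring

embedded = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=5)
plt.show()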
t-SNE visualizing proximate images
Reverse Image Search
Comparing t-SNE vs. PCA.
What if you got the picture below in a job interview and were asked to explain it?
We have two properties that are desirable:
Similar items should be close together - strong clustering
Dissimilar clusters of items should be further apart
Ref: https://suneeta-mall.github.io/2022/06/09/feature_analysis_tsne_vs_umap.html
t-SNE visualizing proximate images
Reverse Image Search
//
t-SNE visualizing proximate images
Reverse Image Search
In a real production RIS app, which operation will we perform most often? It will be the operation of finding the nearest neighbors of the image uploaded by the user. This can be optimized!
Optimizations:
Standard features - 1024 (MobileNet) or 2048 (ResNet): we can establish a benchmark KNN with the standard features that the popular architectures produce. The feature space is big.
PCA - what are the values for the metrics we care about after performing PCA (100 features)?
Brute force, BallTree, KDTree - how fast is the KNN with these variations?
Use approximate NN libraries (see next slide)
Kernel weights are trained, i.e. found during training. During that process they specialize in detecting certain features.
Different kernels contain different numbers, hence extract different features.
Applying a kernel to an image gives us the feature map (aka the convolution result).
Convolutions
Computer vision and image classification
How is the convolution operation applied?
In a sliding window fashion, giving location invariance (a feature is recognized at the top or the bottom of the picture).
We have a vertical stride and a horizontal stride. They are hyperparameters, so tunable (overlapping vs. non-overlapping strides).
The kernel size is also a hyperparameter (a small size is usually more efficient: 3x3, not 9x9; it can even have even-sized or non-square dimensions, e.g. 9x6).
Multiplication is performed on every channel separately (so 3 times for RGB):
Animation
The kernel will not stride outside the image; zero padding might be needed (we'll talk about it a bit later).
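A minimal numpy sketch of the sliding-window operation (single channel, stride 1, no padding); the kernel is a classic vertical-edge detector:
import numpy as np

image = np.random.rand(6, 6)                  # toy single-channel "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

h, w = image.shape
kh, kw = kernel.shape
feature_map = np.zeros((h - kh + 1, w - kw + 1))
for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):     # slide the kernel over every position
        feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
print(feature_map.shape)                      # (4, 4): smaller than the input without padding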
Convolutions
Computer vision and image classification
//
Convolutions
Computer vision and image classification
//
Convolutions
Computer vision and image classification
Input of the convolution is an image (1st layer), output is also an image-like
thing (feature map).
So we can chain them.
Chaining two or more convolutions gives us layers of convolutions.
One convolution is composed of a stack of feature maps (multiple feature maps).
Convolution layers are sparse and reduce the number of parameters (1 “neuron” connected to multiple “pixels”: many input values connected to 1 output value), i.e. the same kernel is applied across the entire receptive field - parameter sharing.
Convolution layers
Computer vision and image classification
Calculation of how the layer weight count decreases due to “weight sharing” - because the same kernel travels through the entire image and only the kernels have trainable weights, we get a sparse NN (thus solving the parameter explosion problem).
Arithmetic for the learnable synaptic weights (NOT ALL PARAMETERS, JUST TRAINABLE):
Note: Conv layers have bias parameters that would change the calculation, but DNNs also have them, so the relative difference might not be affected.
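A back-of-the-envelope comparison under assumed shapes (28x28x1 input, 32 output units / feature maps, biases excluded):
input_pixels = 28 * 28 * 1

dense_weights = input_pixels * 32      # fully connected: every pixel wired to every unit
conv_weights = 3 * 3 * 1 * 32          # 32 shared 3x3 kernels, regardless of image size
print(dense_weights)                   # 25088 trainable weights
print(conv_weights)                    # 288 trainable weights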
Convolution layers
Computer vision and image classification
Lastly, it is common for the output of the convolution to pass through an activation function! Commonly ReLU is used, but you should certainly try others to see which performs best!
Let’s take a look at an animation that summarizes what I have explained
https://www.youtube.com/watch?v=f0t-OCG79-U
Convolution layers
Computer vision and image classification
The input to the pooling layer is the image-like feature map (array of pixels) / matrix.
It performs reductions / aggregations: max, avg. Ref: https://pytorch.org/docs/stable/nn.html#pooling-layers
We say: “max pooling operation”, also applied in a sliding window fashion.
The resulting matrix is much smaller, with the important features highlighted.
It has a size and a stride. With conv. layers overlap is desirable, with pooling - don't overlap (stride >= pooling dimensions).
Although similar to conv layer there is a big difference: there are no weights or
biases, just the operation.
Pooling layers
Computer vision and image classification
Pooling does not reduce the count of the feature maps, but reduces their size.
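A minimal PyTorch sketch (matching the pooling reference above): a non-overlapping 2x2 max pool halves the spatial size but keeps the number of feature maps.
import torch
import torch.nn as nn

x = torch.rand(1, 8, 32, 32)                      # (batch, feature maps, height, width)
pool = nn.MaxPool2d(kernel_size=2, stride=2)      # stride >= pooling dimensions: no overlap
print(pool(x).shape)                              # torch.Size([1, 8, 16, 16])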
Pooling layers
Computer vision and image classification
The input is an image, the output is a feature map from the Convolution + Pooling layer groups.
Outputs between layers are also feature maps, smaller and smaller due to pooling, but deeper and deeper due to the number of feature maps. The conv. part of this NN performs feature extraction!
The result of the CNN layers is fed to a FCFFNN, which spits out probabilities (for classification).
CNN Architecture
Computer vision and image classification
Source: https://www.youtube.com/watch?v=m8pOnJxOcqY
CNN Architecture
Computer vision and image classification
Famous feature extraction image shows that convolutions are learning to extract
different levels of abstraction at each layer.
An image set rich in details or with a lot of differences between images (a diverse dataset) would need more convolutions to learn more features.
CNN Architecture
Computer vision and image classification
Before creating a CNN, let’s see how convolutions and pooling operations work
We will see feature extraction and subsampling capabilities, some well known
filters.
Questions:
What is a convolution?
What is a kernel in the context of CNNs?
What do the pooling layers do?
Are convolution layers fully connected? → no
Do convolution layers have trainable parameters / weights? → yes
Do convolution layers have biases? → yes (can have)
Summary and questions
Computer vision and image classification
Training a ConvNet
Optimizing a ConvNet
What’s next
Computer vision and image classification
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Pytorch supports 8 CPU and 8 GPU tensor types (TF has 13).
Demo: simple tensor operations and conversions between numpy and torch tensors
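A minimal sketch of that demo: a few tensor operations and numpy conversions (on CPU the conversion shares the underlying memory).
import numpy as np
import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(t * 2, t.sum(), t.T)

arr = t.numpy()                  # torch -> numpy (shares the buffer on CPU)
back = torch.from_numpy(arr)     # numpy -> torch
print(type(arr), type(back))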
PyTorch tensors
Introduction to Deep Learning
CUDA support - the standardized parallel computing platform and API that allows computations to run on the GPU.
PyTorch GPU
Introduction to Deep Learning
GPU executions are async by default
PyTorch GPU
Introduction to Deep Learning
The Autograd library performs automatic differentiation (see: https://en.wikipedia.org/wiki/Automatic_differentiation ).
Differentiation is the process of calculating the derivatives of functions and is used by the gradient descent process (during backpropagation).
Note, the automatic differentiation package in Tensorflow is called tf.GradientTape
Demo: we now know enough about Pytorch to solve a simple linear regression problem.
Autograd is used by PyTorch optimizers - optimizers are just “minimizers of the error function w.r.t. model weights”.
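A minimal sketch of that demo: fit y = 2x + 1 with autograd and an SGD optimizer (the data is synthetic).
import torch

X = torch.rand(100, 1)
y = 2 * X + 1 + 0.05 * torch.randn(100, 1)   # noisy line y = 2x + 1

w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([w, b], lr=0.1)

for _ in range(500):
    loss = torch.mean((X * w + b - y) ** 2)  # MSE - the error function being minimized
    optimizer.zero_grad()
    loss.backward()                          # autograd fills w.grad and b.grad
    optimizer.step()

print(w.item(), b.item())                    # should approach 2 and 1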
PyTorch autograd and optimizers
Introduction to Deep Learning
Just like TF has the Estimator API and/or Keras on top of it, Pytorch has the
torch.nn module
PyTorch nn package
Introduction to Deep Learning
PyTorch differs from Keras in that it does not hide: tensor creation, loading the data and model onto the GPU, and the training loop. Hence we have to add additional code so that it is adapted to both contexts - when a GPU is available and when not.
We perform a check on whether GPU is enabled with code.
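A minimal device-agnostic sketch: check for a GPU once and move both the model and the data to whatever device is available.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 1).to(device)   # toy model for illustration
x = torch.rand(32, 10).to(device)           # the data must live on the same device
print(device, model(x).shape)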
PyTorch device agnostic code
Introduction to Deep Learning
We can use HiddenLayer: https://github.com/waleedka/hiddenlayer
Demo: using hiddenlayer
Usages: forged signature detection would be an interesting and quite easily doable personal project.
Image Similarity Problem
Reverse Image Search
We can see these approaches being implemented in traditional software.
Non-DL approaches to image similarity
Reverse Image Search
Compare patches - rotation or cropping destroys the viability of this method.
Subtraction comparison - we will implement that.
Hashing - duplicates of an image can be found. One use case for this approach would
be the identification of plagiarism in photographs.
Histogram for RGB approach - used in image deduplication software that finds repeating images on your hard drive. However, small changes in hue, color or lighting can make this method struggle.
MSE - mean squared error. We will implement that.
SSIM - structural similarity index measure. We will implement that.
Scale-Invariant Feature Transform (SIFT) - uses feature comparison, less helpful when comparing deformable / changing objects. Available in OpenCV.
Speeded Up Robust Features (SURF) - uses feature comparison, less helpful when comparing deformable / changing objects. Available in OpenCV.
Oriented FAST and Rotated BRIEF (ORB) - uses feature comparison, less helpful when comparing deformable / changing objects. Available in OpenCV.
More:
https://stackoverflow.com/questions/5730631/image-similarity-comparison
https://stackoverflow.com/questions/843972/image-comparison-fast-algorithm
Try it out:
A list of APIs that solve image similarity: https://rapidapi.com/search/image
%2Bsimilarity
Non-DL approaches to image similarity
Reverse Image Search
An ideal way to find similar images would be to use transfer learning. For example,
pass the images through a pretrained convolutional neural network like ResNet-50,
extract the features, and then use a metric to calculate the distance between the
image feature vectors like Euclidean distance.
Going through the convolution and pooling layers in a CNN is basically an act of
reduction, to filter the information contained in the image to its most important
and salient constituents, which in turn form the bottleneck features (headless CNN
produces feature maps (feature vectors, sometime called bottleneck features)).
Training the CNN molds these values in such a way that items belonging to the same
class have small Euclidean distance between them (or simply the square root of the
sum of squares of the difference between corresponding values is smallest) and
items from different classes are separated by larger distances - essentially what
the embedding layer is doing!
Does CNN perform dimensionality reduction?
224 * 224 * 3 = 150528
7 * 7 * 512 = 25088
DL approaches to image similarity
Reverse Image Search
Take a look at the image on the side: the dinosaur is as close to the blonde dog as the white dog is, in terms of Euclidean distance. Cosine distance seems to work better in this case.
Cosine similarity / distance is a metric used to measure how similar the vectors
are irrespective of their size. Mathematically, it measures the cosine of the angle
between two vectors projected in a multi-dimensional space.
The cosine similarity is advantageous because even if two similar documents are far apart by Euclidean distance (due to the size of the documents), chances are they may still be oriented closer together. The smaller the angle, the higher the cosine similarity. Cosine similarity ranges between -1 and 1 (cos 0 = 1, cos 90 = 0, cos 180 = -1).
There is some tension between Euclidean (L2) and cosine distance: they can produce somewhat different results, and it's generally accepted that cosine distance should be favored. However, I would probably advise trying both and thinking about it as a hyperparameter to KNN.
We will generalize the cosine distance calculation to N dimensions to calculate the similarity between images based on CNN-calculated feature vectors. This technique is also used for recommendation systems and in NLP for text similarity that is text-length-insensitive.
More:
https://medium.com/@Intellica.AI/comparison-of-different-word-embeddings-on-text-
similarity-a-use-case-in-nlp-e83e08469c1c
https://cmry.github.io/notes/euclidean-v-cosine#when-to-use-cosine
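A minimal N-dimensional implementation of the formula described above:
import numpy as np

def cosine_similarity(a, b):
    # cos(angle) = dot(a, b) / (||a|| * ||b||), independent of vector length
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.1])
print(cosine_similarity(v1, v2))       # close to 1: almost the same direction
print(1 - cosine_similarity(v1, v2))   # cosine distance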
Cosine vs. Euclidian distance
Reverse Image Search
We will use transfer learning models from keras trained on the ImageNet(1K) dataset
(source data).
We can use Caltech101 or any other dataset or image (target data) to see how
feature extraction works, see:
http://www.vision.caltech.edu/Image_Datasets/Caltech101/
Note: usually the target image is taken from source dataset, meaning that we are
literally uploading the same images to find similar items - like we could do in an
e-commerce setting, but a much more powerful application is one that works with
unseen images!
We will extract the features by calling predict on a headless transfer learning
model:
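A minimal sketch of that extraction step (the image path is hypothetical; pooling='avg' makes the headless ResNet-50 output a single 2048-d vector per image):
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet", include_top=False, pooling="avg")   # headless model

img = image.load_img("some_image.jpg", target_size=(224, 224))           # hypothetical path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = model.predict(x)
print(features.shape)                                                    # (1, 2048)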
Feature extraction
Reverse Image Search
Mini Project: during training of a CNN you could monitor how the bottleneck features for two images that belong to the same class get progressively closer together in the Euclidean feature space, and create a gif illustrating that. “Do cool s**t and put it on the internet”.
2 Level
1 Chapter
Today you will learn
00 Python performance
01 Optimization techniques: cython, pypy
02 Optimization techniques: numba
03 Process of optimization
04 Optimization techniques: others
05 Parallelism and concurrency
06 Multithreading
07 Multiprocessing
08 Async
09 Microoptimizations
Python Crash Course
Python performance
Python is an interpreted, dynamically typed language, which makes it slow(er) for certain kinds of operations compared to other languages (something like that … in reality languages are neither slow nor fast - their runtimes are). To understand the problem better, let's go through this article: https://hackernoon.com/why-is-python-so-slow-e5074b6fe55b
Take note that:
we are talking about execution time, mostly for CPU bound processes (not I/O bound).
python is exploding in domains like ML and finance where speed is very important
developer time is more important than execution time in most cases (100K per year with a 10% productivity increase?). Established companies transitioned to “faster” languages at some point in their evolution (twitter → ruby → java, facebook → php → hack).
There are many measurements or metrics against which we can compare the performance of our programs (+languages +libraries):
total execution time (s, ms),
startup time (s, ms),
responsiveness to user input,
ability to handle “spikes” (sudden changes in load),
memory footprint (MB, GB),
disk I/O max,
disk I/O total,
networking bandwidth usage (MB, GB),
susceptibility to network latency (and jitter),
usage of network sockets (#),
file handle usage (#), and so on.
The most common and important ones are: time and memory usage.
How can we make our python programs faster? … to answer that we probably want to prove that they are slow first.
NOTE: Python performance as a language will be OK in 95% of the cases, the biggest
bang for the buck is to be achieved by optimizing IO.
Python Crash Course
Optimization techniques: cython, pypy
Let’s compare Python with C and Js, ref for the implementation:
https://medium.com/codex/how-slow-is-python-compared-to-c-3795071ce82a
Horrible! What are the solutions? Some of them are (almost) zero-cost macro-optimizations:
Cython: with cython you write pseudo python code, build an extension in a separate
step and then import that extension as any python library and it should be much
faster because it’s in C! You don’t need to rewrite the whole app just the parts
that are slowest or most frequently called.
Pypy: performance almost for free! A different python interpreter, it needs to be
installed - https://www.pypy.org/download.html. Some disadvantages are noted here
(be careful they can be outdated, you need to benchmark if you want to be sure):
https://www.geeksforgeeks.org/why-pypy3-is-preffered-over-python3/
Mojo: (will see in the future)
Use the newest python version.
Python Crash Course
Optimization techniques: numba
Ref: http://numba.pydata.org/ and you can read a short 5min guide:
https://numba.pydata.org/numba-doc/dev/user/5minguide.html
Which code is best suited for numba? Functions with lots of loops, that use numpy
arrays and functions!
Let’s take a look at @jit, @njit, @vectorize
Some advanced use cases: parallelization, running on GPU (CUDA)
https://towardsdatascience.com/supercharging-numpy-with-numba-77ed5b169240 and
https://numba.readthedocs.io/en/stable/cuda/index.html
In serverless / lambda environments it might not be easy to adapt it, but it is
useful to have in the toolbox.
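A minimal numba sketch: a loop-heavy, numpy-friendly function compiled with @njit (the speedup only shows up after the first, compiling call).
import numpy as np
from numba import njit

@njit
def monte_carlo_pi(n_samples):
    inside = 0
    for _ in range(n_samples):               # plain loops like this are what numba accelerates
        x, y = np.random.random(), np.random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(monte_carlo_pi(1_000_000))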
Python Crash Course
There are multiple ideas on how to develop a routine for optimizing code and how to
think about it.
Usually it boils down to something like this, see: https://wiki.c2.com/?
RulesOfOptimization
Observe an actual performance issue - there is no issue until someone says: “look, this is really slow, we need to make it faster”. Remember: “premature optimization is the root of all evil” and “don't fix it if it's not broken”.
Measure the performance: benchmark and profile time and / or memory so you will know that you actually improved something (see the profiling sketch after the references below). There are multiple tools for that: https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html
Make sure there are functional tests for the code you are about to optimize!
Find the most expensive part or the part that repeats the most - bottleneck. Or the
most often executed place.
Optimize the most expensive part first and “do the simplest thing first”. Remember:
you can literally rewrite half of the application 5-6 times when optimizing:
rearrange the code, try 5-6 libraries, apply bit twiddling hacks, try different
runtimes, I/O optimization (disk and networking), multithreading and
multiprocessing and even rewriting parts of code in other languages or … buying
better hardware!
Measure again and see if that is enough (usually a 2-3x improvement is more than enough for most users to notice and be happy; sometimes the optimization has a set goal (a business requirement or a technical requirement where another system depends on the time your system completes a task and the chain fails if your code is not fast enough)).
Ref: https://www.toptal.com/full-stack/code-optimization and
https://llllllllll.github.io/principles-of-performance/how-to-optimize-code.html
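A minimal profiling sketch for the measuring step, using only the standard library (cProfile + pstats); the functions are placeholders.
import cProfile
import pstats

def slow_part():
    return sum(i * i for i in range(10_000))

def program():
    for _ in range(200):
        slow_part()

cProfile.run("program()", "stats.prof")                               # profile and dump stats to a file
pstats.Stats("stats.prof").sort_stats("cumulative").print_stats(5)    # top 5 by cumulative time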
Process of optimization
Python Crash Course
Optimization techniques: others
Recommended talk: https://www.youtube.com/watch?v=YjHsOrOOSuI
… in short use python mechanisms instead of writing algorithms in python.
Python Crash Course
Parallelism and concurrency
Concurrency is the execution of multiple instruction sequences at the same time.
This is possible when the instruction sequences to be executed simultaneously have
a very important characteristic, which is that they are largely independent of each
other. This characteristic is important both in terms of the order of execution and
in the use of shared resources.
with order of execution, this means that the order of execution of these
instruction sequences should have no effect on the eventual outcome. If task 1
finishes after tasks 2 and 3, or if task 2 is initiated first, but finishes last,
the eventual outcome should still be the same.
with shared resources, the different instruction sequences should share as few
resources between each other as possible. The more shared resources that exist
between concurrently executing instructions, the more coordination is necessary
between those instructions in order to make sure that the shared resource stays in
a consistent state. This coordination is typically what makes concurrent
programming complicated. However, we can avoid many of these complications by
choosing the right concurrency patterns and mechanisms depending on what we are
trying to achieve. For example read-only operations are easy to make parallel.
Types:
… even instruction level parallelism on a single core of a CPU (SIMD)
Parallel programming - splitting the task into subtasks and assigning the tasks to
separate cores / processors of the machine to be executed simultaneously.
Multiprocessing: best for CPU-bound / CPU-intensive tasks: string processing (regex), number crunching, search, graphics, etc. Multithreading: best for I/O bound tasks: db reads/writes, web service calls, data download and upload i.e. file I/O.
Async programming (single thread concurrency) - useful for I/O bound tasks: db
reads/writes, web service calls, data download and upload i.e. file I/O.
Distributed computing / programming (parallelism over multiple machines): DDoS
(primitive version of distributed tasks), Hadoop / Apache Spark computations
(distributed computing engines).
… and so on (multiple datacenters).
Python Crash Course
Parallelism and concurrency
Multithreading
Python Crash Course
Multiprocessing
Multiprocessing
A process (a running instance of a program with threads, properties and memory) does not share memory space (like threads inside a process do) by default (shared memory exists, but it's not the default)! That means that if we clone a python process we could potentially execute on multiple processor cores at the same time!
Other advantages … and disadvantages.
The multiprocessing lib in python is very similar to the threading library, but it's easy to terminate processes from the parent process, while threads are not so easily terminatable (they can disrespect termination signaling). However, abruptly terminating processes is only advised if they do not have access to shared resources, because a killed process will not get to run its finally blocks and other exit handlers, potentially leaving the resource in an inconsistent state.
Let’s see a simple multiprocessing demo.
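A minimal sketch of such a demo: four worker processes crunching numbers in parallel (the __main__ guard is required with the spawn start method).
import time
from multiprocessing import Process

def cpu_task(n):
    total = 0
    for i in range(n):
        total += i * i

if __name__ == "__main__":
    t0 = time.time()
    procs = [Process(target=cpu_task, args=(5_000_000,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(f"4 processes took {time.time() - t0:.2f}s")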
Python Crash Course
Multiprocessing
Multiprocessing
We can see that the multiprocessing version is slower. Why?
Colab does not allow for multiprocessing (as of 2022).
Spawning processes and threads takes time and CPU cycles! In fact, a multithreaded or multiprocessed app will always take more CPU cycles than the same app that does not use these mechanisms, simply due to process and thread management overhead (start, join and so on, even without any synchronization mechanisms).
Sharing state:
shared memory: multiprocessing.Value
manager processes
Synchronization:
same concepts as in the threading lib
Python Crash Course
Multiprocessing
Main classes:
Process: represents individual process and provides methods for starting,
terminating, and managing it.
Pool: provides a way to create a pool of worker processes. It allows you to easily
parallelize the execution of a function across multiple input values. Manages the
creation and management of the worker processes and provides methods like map(),
apply(), and apply_async() for executing functions in parallel.
Queue: thread-safe First-In-First-Out queue implementation for communication
between processes (IPC). Used to pass data between the parent process and its child
processes.
Lock: a way to synchronize access to shared resources across processes. Mutual
exclusion - only 1 process can access resource at a time.
RLock: a reentrant lock, which means that it can be acquired multiple times by the
same process without causing a deadlock. This is in contrast to a regular Lock,
which would deadlock if a process tries to acquire it more than once.
Event: provides a simple way to communicate between processes using a flag. It
allows one process to signal an event, and other processes can wait for that event
to occur before continuing their execution.
Condition: provides a more advanced synchronization mechanism compared to Lock and
Event. It allows multiple processes to wait for a condition to be satisfied before
proceeding. It is often used in producer-consumer scenarios.
Semaphore: is a synchronization primitive that allows you to limit the number of
concurrent access to a shared resource. It maintains a counter that represents the
number of available resources, and processes can acquire or release the semaphore
based on this counter.
Barrier: synchronization point for a fixed number of processes. Allows processes to
wait until all participating processes have reached the barrier before continuing.
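A minimal sketch tying a few of the classes above together: a Pool of workers mapping a function over inputs in parallel.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))   # work is split across the 4 worker processes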
Python Crash Course
concurrent.futures
Main classes:
ThreadPoolExecutor: represents an executor that uses a pool of worker threads for
executing tasks concurrently. It allows you to submit functions or callable objects
for asynchronous execution and manages the creation and recycling of threads.
ProcessPoolExecutor: represents an executor that uses a pool of worker processes
for executing tasks concurrently. It is suitable for CPU-bound tasks that benefit
from parallel processing.
Executor: abstract class serves as the base class for ThreadPoolExecutor and
ProcessPoolExecutor. It defines a common interface for submitting tasks for
execution and managing their execution.
Future: represents the result of an asynchronous computation. You can obtain a
future object by submitting a task to an executor. Futures provide methods for
checking the status of the computation, cancelling it, and retrieving the result
when it becomes available.
as_completed(): takes an iterable of futures and returns an iterator that yields
completed futures as they become available. It allows you to process the results of
multiple futures as they finish, without waiting for all of them to complete.
wait(): takes iterable of futures and waits for them to complete. Allows to wait
for a group of futures to finish and optionally specify a timeout.
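A minimal sketch using the classes above for an I/O-bound job (the URLs are placeholders):
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

urls = ["https://example.com", "https://www.python.org"]   # placeholder URLs

def fetch(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return url, len(resp.read())

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(fetch, u) for u in urls]
    for fut in as_completed(futures):        # yields each future as soon as it finishes
        print(fut.result())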
Python Crash Course
Async
Asynchronous programming is based on single-threaded asynchrony for multiplexing I/O bound workloads, i.e. a single thread switching between tasks, typically for I/O bound workloads.
Historically it came to prominence with event-driven architectures: nginx, nodejs - they use event loops.
While the event loop is internal in nodejs, in python if we decide to use async we manage the event loop ourselves - we need to engage it explicitly.
A great chess analogy: https://realpython.com/async-io-python/#async-io-explained …
and let’s run some simple demos from this tutorial.
With python, use the asyncio package to perform explicit asynchronous programming.
We “decorate” our functions with the async keyword, making them into coroutines - special functions that are wrapped and that have the ability to be executed asynchronously.
Let’s see a simple demo and get familiar with the syntax - some great examples in
the tutorial: https://docs.python.org/3/library/asyncio-task.html
Why is asynchronous programming useful? A single thread switching between tasks when they are blocked is much more memory efficient than multiple threads that are each separately blocked.
NOTE: just like with multiprocessing, there are limitations when using async within Google Colab: https://stackoverflow.com/questions/55409641/asyncio-run-cannot-be-called-from-a-running-event-loop
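A minimal asyncio sketch: three coroutines "waiting" concurrently on a single thread, so the total runtime is roughly the longest delay, not the sum (per the Colab caveat above, in a notebook you may need await main() instead of asyncio.run).
import asyncio

async def task(name, delay):
    await asyncio.sleep(delay)    # stands in for a blocking I/O call, without blocking the thread
    print(f"{name} done after {delay}s")

async def main():
    await asyncio.gather(task("A", 2), task("B", 1), task("C", 1.5))

asyncio.run(main())               # total time ~2s, not 4.5s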
Python Crash Course
Async
Main classes:
EventLoop: represents an event loop, which is responsible for executing coroutines
and managing their execution. The event loop provides methods for running
coroutines, scheduling callbacks, and managing timeouts.
Task: represents a coroutine wrapped in a future. It allows you to schedule and
manage the execution of a coroutine in the event loop. Tasks can be used to monitor
the progress and retrieve the result of a coroutine.
Future: represents the eventual result of an asynchronous operation. Futures
provide a way to interact with coroutines and receive their results. They can be
awaited, cancelled, and used to add callbacks for handling the completion of an
operation.
Semaphore: is a synchronization primitive that limits the number of coroutines that
can access a shared resource concurrently. It uses counters to control the
concurrency and allows coroutines to acquire or release the semaphore.
Queue: an asynchronous queue implementation that allows communication between
coroutines. Coroutines can put items into the queue and get items from it
asynchronously.
Lock: provides a way to synchronize access to shared resources in an asynchronous
context. It allows only one coroutine to acquire the lock at a time, ensuring
mutual exclusion.
Condition: provides an asynchronous equivalent to the Condition class in the
threading package. It allows coroutines to wait for a condition to be satisfied
before proceeding. Coroutines can wait, notify, or notify_all on a condition
object.
TimerHandle: represents a timer handle that can be used to schedule a callback to
be executed after a certain time interval. It allows scheduling timed events in the
event loop.
import dis

def f1():
    while True:
        pass

def f2():
    while 1:
        pass

print(dis.dis(f1))
print("---------")
print(dis.dis(f2))
TBD
// disassembling python
// try Pypy
// try Mojo
// do not parallelize until you optimize via elimination. If sequential operations
are expensive and take a lot of time, parallelizing them can overwhelm the
bottleneck system completely.
Python Crash Course
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Additional information
--- Content from 1ZXHvn6uwORVQh6P4Ok2CGorMvyyhNzhy.pptx ---
Artificial Intelligence
Sequential Data Analysis
2021
Lecturer
Mindaugas Bernatavičius
Note: forecasting / extrapolation is just incidental for FFT. For the next few models it will be the main purpose. Keep that in mind and don't think that FFT and ARMA are closely related - they are not.
FFT, applications
Sequential Data Analysis
Autoregressive (AR) models operate under the premise that past values have an effect on current values. AR models are commonly used in analyzing nature, economics, and other time-varying processes. As long as the assumption holds, we can build an AR model that attempts to predict the value of a dependent variable today, given the values it had on previous days.
The Moving Average (MA) model assumes the current value of the dependent variable depends on the previous days' error terms.
The ARMA model is simply the combination of the AR and MA models.
AR + MA mechanisms are at the core of a whole family of linear autoregressive models, which include ARMA, ARIMA, SARIMA, SARIMAX.
Important: all of these models work well only for short forecasting horizons/periods unless the pattern is simple to forecast. There seems to be no rule of thumb as to how many values (10%, 1%) you can forecast, but at any rate you are safest when you forecast just a single value - a single value forecast should be the most accurate.
Here is a theoretical introduction: https://www.youtube.com/watch?v=kaXKnjCvEUQ
AR, MA, AR+MA
Sequential Data Analysis
Autoregressive Moving Average (ARMA) method models the next step in the sequence as
a linear function of observations and residual errors at prior time steps (lags,
which are the parameters p and q).
The notation for the model involves specifying the order for the AR(p) and MA(q)
models as parameters to an ARMA function, i.e. ARMA(p, q).
An ARIMA model can be used to develop AR or MA models. This is a combination of AR
+ MA models with the added parameter to adjust for non-stationarity of the mean
function (i.e., the trend) via differencing. ARIMA(p, 0, q) is the same as ARMA(p,
q)
Types of ARMA Models:
ARIMA: Non-seasonal Autoregressive Integrated Moving Averages
SARIMA: Seasonal ARIMA used for seasonal data, see: https://www.youtube.com/watch?
v=IK67f3IItfw
SARIMAX: Seasonal ARIMA with exogenous variables.
We can use the pyramid auto arima package: https://pypi.org/project/pmdarima/ - although the library is called pmdarima, it does evaluate SARIMA models if seasonality is set to true: auto_arima(..., seasonal = True, …)
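A hedged sketch of how auto_arima might be called (y is assumed to be a monthly series; all parameter values are illustrative):
import pmdarima as pm

# y is assumed to be a 1D array / Series of monthly observations
model = pm.auto_arima(
    y,
    seasonal=True,   # also search SARIMA models
    m=12,            # seasonal period (12 for monthly data)
    stepwise=True,   # faster stepwise search instead of a full grid
    trace=True,      # print the candidate models as they are evaluated
)
print(model.summary())
forecast = model.predict(n_periods=12)   # forecast one season ahead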
AR, MA, AR+MA
Sequential Data Analysis
Parameter selection process for autoregressive models:
ACF and PACF plots are used to determine the order of the MA and AR terms respectively.
d - (differencing in ARIMA) check for stationarity with the augmented Dickey-Fuller test, choose the parameter by trying different values (possibly in a loop).
p - (for AR) use PACF, the model order should be the lag after which the lags are no longer above the significance line
q - (for MA) use ACF, the model order should be the lag after which the lags are no longer above the significance line
the residuals of the model need to have constant mean, no autocorrelation, and be normally distributed
If you violate these 3 principles in the residual data then you need to tune your model or choose a better one, because there is useful information still there.
Or you can use pmdarima that can select for the best model.
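A hedged sketch of the manual selection workflow with statsmodels (y is assumed to be the series being modeled; the (1, 1, 1) order is only an example):
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

# d: check stationarity; difference and re-test until the p-value is small
adf_stat, p_value, *_ = adfuller(y)
print("ADF p-value:", p_value)        # p > 0.05 suggests differencing is needed

# p and q: inspect where the PACF (for AR) and ACF (for MA) cut off
plot_pacf(y)
plot_acf(y)
plt.show()

# fit the chosen orders and inspect the residuals
result = ARIMA(y, order=(1, 1, 1)).fit()   # (p, d, q) chosen from the plots
result.plot_diagnostics()
plt.show()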
Good resources:
https://www.baeldung.com/cs/acf-pacf-plots-arma-modeling
https://people.duke.edu/~rnau/411home.htm
AR, MA, AR+MA
Sequential Data Analysis
Anomaly vs. Outlier: Very important when searching for literature and material
(some books and articles are called 'outlier detection', others 'anomaly
detection')! These terms are not strictly defined nor differentiated and some argue that there is no difference, especially in everyday talk, but in academic literature and when searching for information online there is some difference. Read
more: https://datascience.stackexchange.com/questions/24760/what-is-the-difference-
between-outlier-detection-and-anomaly-detection
They can be overcome using other simple techniques, like detecting changes in
rolling standard deviation.
TODO: crest factor based, RMS, statistical moments - other feature engineering
techniques: https://www.analyticsvidhya.com/blog/2022/01/moments-a-must-known-
statistical-concept-for-data-science/
Deviation bands
Sequential Data Analysis
We can simplify anomaly detection with STL.
If you analyze the deviation of the residual and introduce some threshold for it, you'll get an anomaly detection algorithm. The non-obvious part here is that you should use the median absolute deviation (with a 3 or 4 sigma threshold; the three sigma rule states that 99.73% of values in a normal distribution lie within three standard deviations of the mean) to get a more robust detection of anomalies.
The leading implementation of this approach is Twitter's Anomaly Detection library (available only in R, abandoned project, the algorithm is described here: https://arxiv.org/pdf/1704.07706.pdf ). It uses the Generalized Extreme Studentized Deviate (ESD) test to check if a residual point is an outlier.
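A hedged sketch of STL decomposition plus a MAD-based threshold on the residual (y is assumed to be a pandas Series with a regular index; the weekly period is an assumption):
import numpy as np
from statsmodels.tsa.seasonal import STL

stl = STL(y, period=7)            # seasonal period is an assumption (weekly here)
res = stl.fit()
resid = res.resid

# robust threshold: median absolute deviation instead of the standard deviation
mad = np.median(np.abs(resid - np.median(resid)))
threshold = 3 * 1.4826 * mad      # 1.4826 scales MAD to sigma for normal data
anomalies = y[np.abs(resid - np.median(resid)) > threshold]
print(anomalies)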
STL for anomaly detection
Sequential Data Analysis
Luminol is a lightweight python library for time series data analysis created by LinkedIn.
The two major functionalities it supports are anomaly detection and correlation between two time series.
It can be used to investigate possible causes of an anomaly.
Ref: https://github.com/linkedin/luminol
Luminol library
Sequential Data Analysis
Prophet, or “Facebook Prophet,” is an open-source library for univariate (one
variable) time series forecasting developed by Facebook. Prophet implements what
they refer to as an additive time series forecasting model, and the implementation
supports trends, seasonality, and holidays (significant events).
Ref: https://facebook.github.io/prophet/docs/quick_start.html
Previously import fbprophet, now import prophet.
Less tuning needed - it can be compared to auto arima in that sense.
Implementations: https://towardsdatascience.com/anomaly-detection-time-series-
4c661f6f165f and https://www.youtube.com/watch?v=0wfOOl5XtcU
Library link: https://github.com/facebook/prophet
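A minimal hedged sketch of the Prophet quick-start API (df is assumed to be a dataframe with columns ds (dates) and y (values); the 30-period horizon is illustrative):
from prophet import Prophet

m = Prophet()                 # additive model with trend + seasonality defaults
m.fit(df)                     # df must have 'ds' (datestamp) and 'y' columns

future = m.make_future_dataframe(periods=30)   # extend 30 periods past the data
forecast = m.predict(future)                   # yhat, yhat_lower, yhat_upper, ...
m.plot(forecast)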
Prophet library
Sequential Data Analysis
The envelope of an oscillating signal is a smooth curve outlining its extremes. Analysis of (wave) envelopes emphasizes the extremes of the time signal. This is contrary to a lot of techniques that would dismiss extremes in favor of the more regular pattern.
Very important in mechanical bearing analysis, ref:
https://www.bksv.com/media/doc/bo0187.pdf
https://en.wikipedia.org/wiki/Envelope_(waves)
https://hal.science/hal-01714334/document
Envelope analysis
Sequential Data Analysis
Fit an OLS regression for time series and compare it to ARMA family models and others. Try to prove that OLS is not as useful as autoregressive models (...or disprove it).
You can explore whether ANOVA methods are used for time series analysis.
You can also investigate Markov models.
Explore the Luminol and Prophet libraries further - are there better libraries? Which has more features?
Investigate different forecasting models for robustness against anomalies / outliers (fit ARIMA on an ideal sinusoid, introduce a 1-point anomaly or a step anomaly, see what happens to the prediction, … etc.)
Further explorations
Sequential Data Analysis
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Does the network have any weights and biases, like a traditional FCFFNN? Yes.
RNN Structure and Representation
Sequential Data Analysis
Let’s take a look at these introductory videos:
https://www.youtube.com/watch?v=LHXXI4-IEns
https://www.youtube.com/watch?v=yZv_yRgOvMg (theory, contains raw RNN for NLP)
RNN Structure and Representation
Sequential Data Analysis
For training a simple FCFFNN we used the backpropagation algorithm, which was also used for CNNs. When we add "memory" or "temporal dynamic behavior" things get complicated. So while the problem was structural and susceptible to being solved by FCFFNNs and CNNs we used backpropagation, but with RNNs we use BPTT or "backpropagation through time" (see next slide).
Teacher Forcing - when "ground truth" (y) is re-fed into the model during training rather than the model's own predictions. It is quite counter-intuitive and not well explained / used in the literature. A list of books is in the Wikipedia entry for this
topic: https://en.wikipedia.org/wiki/Teacher_forcing . You can find this video to
be somewhat informative: https://youtu.be/C08mT2VSHGg?t=275 . Something similar to
teacher forcing is commonly used for seq-2-seq models / encoder-decoder
architecture.
TBPTT - truncated backpropagation through time. Truncated Backpropagation Through
Time, or TBPTT, is a modified version of the BPTT training algorithm for recurrent
neural networks where the sequence is processed one timestep at a time and
periodically (k1 timesteps) the BPTT update is performed back for a fixed number of
timesteps (k2 timesteps). https://machinelearningmastery.com/gentle-introduction-
backpropagation-time/
Training
Sequential Data Analysis
Backpropagation through time - the staple technique for training feedforward neural
networks is to back propagate error and update the network weights. Backpropagation
breaks down in a recurrent neural network, because of the recurrent or loop
connections. This was addressed with a modification of the Backpropagation
technique called Backpropagation Through Time or BPTT.
[Need to clarify] Instead of performing backpropagation on the recurrent network as
stated, the structure of the network is unrolled, where copies of the neurons that
have recurrent connections are created. For example a single neuron with a
connection to itself (A->A) could be represented as two neurons with the same
weight values (A->B). This allows the cyclic graph of a recurrent neural network to
be turned into an acyclic graph like a classic feed-forward neural network, and
Backpropagation can be applied.
This unrolling is the depth of the RNN. Because sequences passed through the RNN can be very long, it is unusual to see many layers in RNNs (3-5 is already quite big) - they are shallow.
Ref: TODO
Training
Sequential Data Analysis
We will formulate our problem like this – given a sequence of 50 numbers belonging
to a sine wave, predict the 51st number in the series (forecasting problem).
When working with RNNs we have to understand another concept: the data needs to be fed in a specific format.
X = (seq_count x timesteps_in_each_seq x number_of_features_at_each_timestep) and Y = (seq_count x predicted_feature_count_for_each_timestep). E.g.: (100, 50, 1), (100, 1) - we are predicting a single number for each sequence, of which we have a hundred.
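A hedged sketch of how such a dataset could be assembled with numpy (the wave length and window size are illustrative; shapes follow the (samples, timesteps, features) convention above):
import numpy as np

timesteps = 50
wave = np.sin(np.linspace(0, 30 * np.pi, 1000))   # a long sine wave to slice up

X, Y = [], []
for start in range(len(wave) - timesteps):
    X.append(wave[start:start + timesteps])        # 50 consecutive values
    Y.append(wave[start + timesteps])              # the 51st value to predict

X = np.array(X)[..., np.newaxis]   # (samples, timesteps, features) = (950, 50, 1)
Y = np.array(Y).reshape(-1, 1)     # (samples, 1)
print(X.shape, Y.shape)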
We can create an RNN from scratch:
https://www.analyticsvidhya.com/blog/2019/01/fundamentals-deep-learning-recurrent-
neural-networks-scratch-python/
More object oriented approach: https://github.com/pangolulu/rnn-from-scratch
Simple RNN
Sequential Data Analysis
We have to have input as 3rd order tensors - why?
We will see later that RNNs and LSTMs require 3D input for the X values. This is often very confusing for beginners and people who return to the field after a break (make sure you remember this, because even if you are an employed data scientist you might not work with RNNs and forget this information, which is important in case you want to understand RNNs. If you don't understand them you can still train them, but less effectively). What are those 3 dimensions:
Samples / how many sequences. One sequence is one sample. A batch is comprised of one or more samples (batch size not specified).
Time Steps / how many values in each sample. One time step is one point of observation in the sample.
Features / column count. One feature is one observation at a time step. This is essentially how many values you have at each time step (unidimensional vs. multidimensional data).
Simple RNN
Sequential Data Analysis
If you have two columns - you have two features (multivariate time series) and you want to feed it into the network as two samples of size 5 that both have 2 features, e.g. pressure and temperature.
Simple RNN
Sequential Data Analysis
The following things hold true working with RNNs:
The RNN input layer must be 3D.
The meaning of the 3 input dimensions are: no. samples, time steps, and features.
The RNN input layer is defined by the input_shape argument on the first hidden
layer (in keras, for example).
The input_shape argument takes a tuple of two values that define:
the number of time steps per sequence (none) and
number of features ("columns in a table", in keras) … we'll see that later, ref:
https://stackoverflow.com/a/61155200/1964707
The number of samples is assumed to be 1 or more.
The reshape() function on NumPy arrays can be used to reshape your 1D or 2D data to
be 3D.
The reshape() function takes a tuple as an argument that defines the new shape.
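A hedged sketch of reshaping a 2D (timesteps, features) table - e.g. the pressure/temperature example above - into the 3D form the RNN layer expects:
import numpy as np

# 10 timesteps of 2 features (pressure, temperature) - a plain 2D table
data = np.random.rand(10, 2)

# split into 2 samples of 5 timesteps, keeping the 2 features per timestep
X = data.reshape(2, 5, 2)          # (samples, timesteps, features)
print(X.shape)                      # -> (2, 5, 2)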
Simple RNN
Sequential Data Analysis
There are numerous resources for RNNs in Keras on the internet; the essential theory:
Add one of three popular types of recurrent layers SimpleRNN, GRU, LSTM (the last two will be covered later) into a sequential container (at least for most cases). See: https://keras.io/api/layers/recurrent_layers/
You can stack them together if the model is not powerful enough (to obtain a deep RNN).
If they are stacked, set return_sequences=True in all intermediate layers (the last RNN layer is an exception to this rule and there is more theory involved here). If you don't, the output will be 2D and the next layer will complain.
Keras has complex RNN layers with multiple regularizers, multiple initializers
(these are due to multiple weight matrices associated with a single cell/neuron),
dropout for the layer and the hidden state, even more than one activation function
that can be different. We will cover some of the options in the future, but please
spend some time to research other options for your specific tasks/problems.
RNNs in Keras
Sequential Data Analysis
Simplest RNN
from tensorflow import keras
model = keras.models.Sequential()
model.add(keras.layers.SimpleRNN(1, input_shape=[None, 1]))
That’s really the simplest RNN you can build (contains a single layer, with a
single recurrent unit).
We do not need to specify the length of the input sequences, since a recurrent
neural network can process any number of time steps (this is why we set the first
input dimension to None).
By default, the SimpleRNN layer uses the hyperbolic tangent activation function.
The initial state h(init) is set to 0, and it is passed to a single recurrent neuron, along with the value of the first time step, x(0). The neuron computes a weighted sum of these values and applies the hyperbolic tangent activation function to the result, and this gives the first output, y(0). In a simple RNN, this output is also the new state h(0). This new state is passed to the same recurrent neuron along with the next input value, x(1), and the process is repeated until the last time step. Then the layer just outputs the last value, y. All of this is performed simultaneously for every time series (in a batch).
For each neuron, an FCFFNN model has one parameter per input and per time step, plus a bias term (a total of 51 parameters). In contrast, for each recurrent neuron in a simple RNN, there is just one parameter per input and per hidden state dimension (in a simple RNN, that's just the number of recurrent neurons in the layer), plus a bias term. In this simple RNN, that's a total of just 3 parameters. This does not necessarily mean that the RNN model takes less memory, and it certainly does not mean that it is faster to train or to run inference with!
RNNs in Keras
Sequential Data Analysis
Deep RNNs - layering
Given a standard feed-forward multilayer Perceptron network, a recurrent neural network can be thought of as the addition of loops to the architecture. For example, in a given layer, each neuron may pass its signal laterally (sideways) in addition to forward to the next layer. The output of the network may feed back as an input to the network with the next input vector, and so on.
If your model "hits a wall" in the training process and the error is not decreasing and remains stable, that is probably due to the lack of internal state space / power of the network - you can make it multilayered.
model = keras.models.Sequential()
model.add(keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]))
model.add(keras.layers.SimpleRNN(20, return_sequences=True))
model.add(keras.layers.SimpleRNN(1))
Make sure to set return_sequences=True for all recurrent layers (except the last
one, if you only care about the last output). If you don’t, they will output a 2D
array (containing only the output of the last time step) instead of a 3D array
(containing outputs for all time steps), and the next recurrent layer will complain
that you are not feeding it sequences in the expected 3D format. This parameter essentially says that the previous layer should pass all of its outputs / predictions to the next layer.
RNNs in Keras
Sequential Data Analysis
Batch size in the case of RNNs indicates how many sequences are passed to the NN in a single iteration.
Explaining Batch Size
Sequential Data Analysis
Consider the following taxonomy of sequence problems that require a mapping of an
input to an output (from Andrej Karpathy).
One-to-One: image classification.
One-to-Many: sequence output, for image captioning, classical example - music
generation by genre.
Many-to-One: sequence input, for sentiment classification, next ts stock
forecasting.
Many-to-Many: sequence in and out, for machine translation (phrase in one language → phrase in another) (encod.-decod.)
Synced Many-to-Many: synced sequences in and out, for video classification.
Implementation of each: https://stackoverflow.com/questions/43034960/many-to-one-
and-many-to-many-lstm-examples-in-keras
RNN taxonomy
Sequential Data Analysis
We can discuss another type of taxonomy/classification of RNNs, not based on the ratio of input to output but based on the internal mechanism of the neuron, also called a "cell".
Schematically (see side picture).
Why do these variations exist? They solve the problem of the simple RNN: short term memory - the inability to effectively use information that is farther down the chain; simple RNNs have a hard time learning from long sequences.
Let’s check this nice summary: https://www.youtube.com/watch?v=8HyCNIVRbSU
LSTMs and GRUs
Sequential Data Analysis
Some additional things to remember:
LSTMs and GRUs are faster to converge than DenseNNs and SimpleRNNs.
LSTMs are more complicated than GRUs.
Historically GRUs came later than LSTMs and were invented to simplify LSTMs while retaining most of the power they have.
There is no consensus as to which one to use where: as usual you can/need to try both to see which one works better. Additionally, if we try all of them we can learn whether our data has long time/lag dependencies.
There are some general considerations: LSTMs have historically been proven to be more powerful and flexible. However, due to their complexity they are slower and not so practical when building complicated big recurrent networks. So there is a tradeoff: if your data requires a bigger network - GRUs could be a good choice; if your data requires more powerful cells - LSTMs are usually advised. However, some research has argued that GRUs are more powerful in most cases (citation needed).
However: https://www.researchgate.net/publication.
LSTMs and GRUs
Sequential Data Analysis
Handling variable length sequences
We have said that RNNs can accept variable length sequences.
It is possible to do that with ragged tensors: See: https://stackoverflow.com/questions/62031683/ragged-tensors-as-input-for-lstm
Most of the time variable length sequences are not passed to RNNs directly, they are padded. However, when using frameworks the padding is not that simple, we use framework specific mechanisms for that - see this tutorial for a full discussion: https://www.tensorflow.org/guide/keras/masking_and_padding#mask_propagation_in_the_functional_api_and_sequential_api - this involves padding: the process of adding fake/ignorable data - and masking: telling the NN model to ignore padded data via a mask flag.
And: https://stats.stackexchange.com/a/452205/162267
Note - for time series data there is not much need for padding, the data can be naturally assembled into several hundred sequences of the same length using numpy either way.
Also: https://datascience.stackexchange.com/questions/26366/training-an-rnn-with-
examples-of-different-lengths-in-keras/27879#27879
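A hedged sketch of padding + masking in Keras (the sequence values and layer sizes are made up for illustration):
import numpy as np
from tensorflow import keras

# three sequences of different lengths
sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# pad with zeros up to the longest sequence -> shape (3, 4)
padded = keras.preprocessing.sequence.pad_sequences(sequences, padding="post", value=0)

# masking tells downstream layers to skip the padded timesteps
model = keras.models.Sequential([
    keras.layers.Masking(mask_value=0.0, input_shape=(None, 1)),
    keras.layers.LSTM(16),
    keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam")

X = padded[..., np.newaxis].astype("float32")   # (samples, timesteps, features)
print(model.predict(X).shape)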
Sequential Data Analysis
Keras previously had CuDNNGRU and CuDNNLSTM layers for training RNN architectures
on a GPU.
Now keras LSTM and GRU layers implicitly default to GPU implementations under
certain conditions.
What are those conditions: https://stackoverflow.com/a/60468424/1964707
TODO: Demo
TODO: What happens with Pytorch
Training on GPU
Sequential Data Analysis
Do RNNs have biases and how are they connected?
https://stats.stackexchange.com/questions/169329/bias-inputs-in-an-rnn/169555 +
use_bias=True by default in keras
What are Bidirectional RNNs (BRNNs)? Bidirectional RNNs can take past and future sequence values into account. Very useful for analyzing sentences in natural language processing. They cannot be used for real time tasks, only when you have the whole data, e.g.: "I think Teddy Roosevelt was a good president!" (Teddy depends on president). We will learn about those in the NLP part. With stock prices BRNNs would not be useful for forecasting but might be useful for data imputation.
Are RNNs the only ones capable of handling sequences? No. Transformers and other
architectures can also do that, see excerpt.
Is data fed into an RNN one time-step at a time? (It is very important to use the term timestep; LLMs start talking about batching if you do not specify something like "one value at a time", so be precise.) Yes, within the same batch.
Python Crash Course
Today you will learn
Why Pandas? Installation
Overview
Data Structures
Data input formats
Indexing and filtering
Grouping operations
Data science pipeline and data science pyramid
Visualization and matplotlib
Pandas performance
Why Pandas? Installation
Pandas is a fast, powerful, flexible and easy to use open source data analysis and
manipulation tool, built on top of the Python programming language + other deps.
Ref: https://pandas.pydata.org/
Pandas is the de facto tool for tabular data in data analysis tasks.
The main benefits: speed, the size of the community giving support, intuitive data formats used to represent the data.
Installation is simple with pip, and pandas is already installed in Google Colab.
Built on top of numpy, so we have that as a dependency.
Additionally we will see more dependencies later on that various pandas methods use behind the scenes, ref. Please use these integration tools if possible because they are often more performant than doing things in python.
https://pandas.pydata.org/docs/getting_started/install.html#optional-dependencies
Python Crash Course
Overview
Main points:
“goal of becoming the most powerful and flexible open source data
analysis/manipulation tool available in any language”
Suitability for heterogeneous tabular data.
Easy indexing, slicing and munging (cleaning and transforming)
… and much more:
Ref: https://pandas.pydata.org/docs/getting_started/overview.html
Performing operations @ the database level vs. @ pandas vs. @ python. Example: load data from the DB and enrich it with data from a file - perform as many operations as possible in the DB, then in pandas (for the data loaded from the file).
Since we are working with a Dataframe that has two dimensions, what should we do
with a highly nested JSON? Pandas has a tool for flattening hierarchical data like
json: pd.json_normalize()
Demo: https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-
flattening-json-13eae1dfb7dd
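A minimal hedged example of flattening nested JSON with pd.json_normalize (the records are made up):
import pandas as pd

records = [
    {"id": 1, "user": {"name": "Ona", "address": {"city": "Vilnius"}}},
    {"id": 2, "user": {"name": "Jonas", "address": {"city": "Kaunas"}}},
]

# nested keys become dotted column names: user.name, user.address.city
df = pd.json_normalize(records)
print(df.columns.tolist())   # ['id', 'user.name', 'user.address.city']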
Examples:
for value, group in df.groupby('a') → iterates over a dataframe groupby object (<class 'pandas.core.groupby.generic.DataFrameGroupBy'>).
This yields one dataframe per distinct value of the column we grouped by.
df.groupby('a')[col].head() → shorthand form
df.groupby('a').agg(func) → apply aggregation
df.groupby('a').transform(func) → apply transformation
df.groupby('a').filter(func) → apply filter
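A small hedged sketch exercising these groupby operations on a toy dataframe:
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y", "y"], "val": [1, 2, 3, 4]})

for value, group in df.groupby("a"):             # one sub-dataframe per group key
    print(value, len(group))

print(df.groupby("a")["val"].agg("mean"))         # aggregation: one row per group
print(df.groupby("a")["val"].transform("mean"))   # transform: same length as df
print(df.groupby("a").filter(lambda g: g["val"].sum() > 5))  # keep groups whose sum > 5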
Python Crash Course
Today you will learn
Data Visualization
Histograms
Line charts
Scatter plots
Bar graphs
Other graphs and charts
Aggregating and sampling
Lying with graphs
Bokeh and Seaborn
Data Visualization
Data visualization is a very important process in the data science pipeline and a skill in your career.
Often one can get a much better insight visually.
People working with data often want to present their findings in meetings visually to be more convincing.
We already know how to visualize data using pandas, but this time we will talk more about visualization techniques and when to use which, and answer questions like: when should we use a pie chart, a numeric histogram and so on.
Additionally we will include a few tools that are commonly used for data
visualization in the python world, like: bokeh and seaborn.
Heatmap - used to show the comparative value and intensity of a variable that is dependent on two independent variables. Very powerful visual technique for finding outliers / intensity areas / specialization / high correlation and so on. Examples:
Farmers vs. products → size of the yield
Book titles vs. bookstores → sales
Students vs. subjects → grades
Histogram - how many items in our dataset fall into a specific category/range (TTFB / response time buckets)
Line chart - evolution over time, often for comparison (AAPL stock price)
Scatter plot - correlation between two variables (weight vs. height) - least interpretation, most raw visualization
Bar chart - useful for comparing different categories against the same metric (profit vs. cost), sometimes time evolution
Pie chart - how much of a proportion of the whole does a specific category occupy
Heatmap - intensity of a variable that depends on 2 independent variables
Boxplot - a variable's max/min/25th percentile/75th percentile/median and outliers - all in one!
Funnily enough, the MATLAB version of Plotly has a 3D pie chart while the Python version does not: https://plotly.com/matlab/3d-pie-plots/
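A hedged matplotlib sketch of a few of these chart types on made-up data:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].hist(rng.normal(size=500), bins=30)          # histogram: value buckets
axes[0].set_title("Histogram")

axes[1].plot(np.cumsum(rng.normal(size=100)))        # line chart: evolution over time
axes[1].set_title("Line chart")

heights = rng.normal(175, 10, 200)
axes[2].scatter(heights, heights * 0.4 + rng.normal(0, 5, 200))  # scatter: correlation
axes[2].set_title("Scatter plot")

plt.tight_layout()
plt.show()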
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Remember: with NNs we are trying to achieve a continually improving learning curve - as the network gets exposed to more data it should learn something. If the learning curve is flat or declining we have a problem.
Improvements to the model
Recommender Systems
Fast.ai framework takes collaborative filtering seriously
… they have a dedicated part in their course (https://youtu.be/cX30jxMNBUw?t=5114 -
precise point)
… and in their library documentation for collaborative filtering
(https://docs.fast.ai/collab.html)
See the link to get the from scratch implementation and some more explanations:
https://colab.research.google.com/github/fastai/fastbook/blob/master/
08_collab.ipynb#scrollTo=AF5XmS6Fwtya
Let’s implement a recommender using goodbooks dataset (this is the second dataset
for recomm.). This dataset can be used for content-based filtering using tags.
Fast.ai and book recommendations
Recommender Systems
collab_learner() is dual - it can be dot product based or neural network based.
Interpretation of the bias term - the fast.ai documentation states that a single number is learned per movie - the bias parameter represents the "intrinsic value of the movie". How did they arrive at this conclusion? The correlation between the average rating across all people vs. the bias for each movie is strong (we should verify that).
We can research: what does the user bias represent? How - just check various parameters that it correlates to. It might be that the user bias will represent "the intrinsic value of the user" :D … that would obviously be in the context of this dataset, maybe users that have the most reviews would be thought of as valuable. Check the correlation between user review count and the bias (we can do that with a model trained in Keras or Fast.ai).
We learned an important lesson about model interpretability - check what various
parameters are correlated with. By that we can understand what the neural network
(or other models) represent with that parameter.
Fast.ai and book recommendations
Recommender Systems
Apply the knowledge you gained here to a new dataset.
Research: take any neural network we trained and check the bias of the last layer. Is it correlated with anything? For example maybe we are trying to guess the flat price by size and #rooms and the bias is high when the room count is big and low in the reverse case (just one possible example).
Most recommenders (at least in the literature) do not take into account how recently the evaluation of a movie (and thus the adjustment to the user's preferences) was made. Movies watched by the user recently probably should have a higher impact on the current taste of the person than those made 15 years ago (IMDB accounts are >15 years old). How would we implement that? We would save a timestamp of when the evaluation was made and then add some logic to dampen the evaluations as they become old: eval - (eval * 0.1 * N_years) … of course a real implementation would be different. Recommendation systems should not be time-dependent. Also, keep in mind that the algorithms do not say that the item recommended is better in any substantive / objective manner.
Further explorations
Recommender Systems
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Recommender Systems
Detailed course plan
Slides, tasks and so on
Additional information
Tabular Data Analysis with Pandas
Today you will learn
What is EDA
Goals of EDA
Datasets and Demo
Common questions
What is EDA
EDA - exploratory data analysis. It is nothing more than common sense (+ stats and visualization techniques) applied to unknown data - if you see a dataset (a bunch of images, videos, songs, texts, excel or csv tables) what questions will you ask about it? What immediate things would you like to know?
Images: how many? RGB, CMYK, BGR? Expected categories? Incorrect categories?
Videos: how many? Size? Format? Expected categories? Exceptions to the common rules?
Tabular data: dimensions, column data types, columns as data features, missing values? NaN?
Time series data: missing values…
Text data: …
Although the data science process is typically depicted linearly - it is obviously not linear (side images).
EDA + Transforms / Cleaning + Feature engineering > 50% of data science process / data project time.
https://ocw.mit.edu/courses/6-s897-machine-learning-for-healthcare-spring-2019/
video_galleries/lecture-videos/
Unlike NLTK, which is widely used for teaching and research, spaCy focuses on
providing software for production usage.
spaCy also supports deep learning workflows that allow connecting statistical
models trained by popular machine learning libraries like TensorFlow, PyTorch or
MXNet through its own machine learning library Thinc.
Using Thinc as its backend, spaCy features convolutional neural network models for
part-of-speech tagging, dependency parsing, text categorization and named entity
recognition (NER).
Prebuilt statistical neural network models for 23 languages (English, Portuguese,
Spanish, Russian, Chinese … )
There is also a multi-language NER model.
Additional support for tokenization for >65 languages.
Ref: https://spacy.io/
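A minimal hedged example of a spaCy pipeline (assumes the small English model has been downloaded with `python -m spacy download en_core_web_sm`):
import spacy

nlp = spacy.load("en_core_web_sm")           # small English pipeline
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

for token in doc[:5]:
    print(token.text, token.pos_, token.dep_)  # part-of-speech and dependency tags

for ent in doc.ents:
    print(ent.text, ent.label_)                # named entities, e.g. Apple -> ORG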
Intro
Natural Language Processing
How much of a tradeoff are we talking about?
Efficiency vs. accuracy
Natural Language Processing
//
TRF vs. Other model types
Natural Language Processing
//
Further explorations
Natural Language Processing
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Automatic reshaping - we can reshape w/o specifying one of the dimensions (any 1
dimension can be specified as unknown dimension: -1).
Python Crash Course
Vectorization
Numpy can convert a function to work with arrays automatically even if it is declared to accept a single parameter (np.vectorize - see the sketch below)
it's a convenience tool, it does not make the code run faster (?).
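A hedged sketch of both points - the -1 "unknown dimension" trick and np.vectorize as a convenience wrapper:
import numpy as np

a = np.arange(12)
print(a.reshape(3, -1).shape)    # -1 lets numpy infer the missing dimension -> (3, 4)

def double_if_even(x):
    # written for a single scalar argument
    return x * 2 if x % 2 == 0 else x

vectorized = np.vectorize(double_if_even)   # convenience only, still a Python loop inside
print(vectorized(a))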
Many tricks implemented in numpy can become research projects for the curious
minds, I would encourage anyone who is interested in how numpy speeds things up to
research that. G. Hotz implements fast matrix multiply, comparing it to numpy using
FLOPS: https://www.youtube.com/watch?v=VgSQ1GOC86s
See: https://stackoverflow.com/a/3379505/1964707
Python Crash Course
Numpy financial
Provides financial functions to help financial calculations.
Needs to be installed and imported separately: import numpy_financial as npf
Refs:
Repo: https://github.com/numpy/numpy-financial
Docs: https://numpy.org/numpy-financial/
Function reference: https://numpy.org/doc/1.17/reference/routines.financial.html
Python Crash Course
Miscellaneous operations
np.argmax(), np.argmin() - get the position of the biggest or smallest values
np.argsort(arr, kind='mergesort') - sort indexes by values
arr[np.argsort(arr, kind='mergesort')] - display values in sorted order
np.linspace(0, 5, 5) - return evenly spaced numbers over a specified interval
np.geomspace() - geometric progression
np.random.randint(10, 50, size=(2, 3)) - generate a random matrix
np.random.randint? - display the docstring
saving numpy objects for future work (see the sketch below):
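A hedged sketch of saving and reloading numpy objects, assuming local file paths:
import numpy as np

arr = np.random.randint(10, 50, size=(2, 3))

np.save("my_array.npy", arr)            # single array -> binary .npy file
restored = np.load("my_array.npy")
print(np.array_equal(arr, restored))    # True

# several arrays in one archive
np.savez("bundle.npz", features=arr, labels=np.arange(3))
bundle = np.load("bundle.npz")
print(bundle["features"].shape, bundle["labels"])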
Python Crash Course
Digression on statistical distributions - recommended: uniform, Pareto, Gaussian/normal.
Numpy performance
Numpy is fast. Much faster than regular python when it comes to array processing,
some estimates range from 2 to 1000 times faster.
Some tests performed and described: https://towardsdatascience.com/how-fast-numpy-
really-is-e9111df44347
There are numerous reasons for that:
Dense, homogenous arrays with the added benefit of locality of reference.
Reimplementation of common operations in C (do not reduce the answer to "Numpy is implemented in C", as Python lists are also backed by C - this is a common mistake to make)
SIMD, vector processing instruction usage with 3rd party BLAS/LAPACK linear algebra
processing libraries:
https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
Some parallelism is also involved (BLAS has some multithreading)
Take note, if you try to optimize a regular program by adding numpy it might not
work as numpy performance benefits begin to show up only for large arrays:
https://stackoverflow.com/questions/52603487/speed-comparison-numpy-vs-python-
standard
Refs:
https://stackoverflow.com/questions/8385602/why-are-numpy-arrays-so-fast
https://stackoverflow.com/questions/41365723/why-is-my-python-numpy-code-faster-
than-c
Python Crash Course
Practical Task P2
Two choices / two tasks:
There is a "dog breeding by picture shredding" (googlable) meme on the internet. Some say that this is proof that we live in a simulation (see videos). Refs:
We live in a matrix video:
https://www.tiktok.com/@aleksandr.ne.bloger/video/7008549854318251265?lang=en
https://www.youtube.com/watch?v=f1fXCRtSUWU
https://digg.com/video/shredder-multiplying-photos
https://laughingsquid.com/a-clever-collage-shredding-trick/
Your task is to try and do the same, but with numpy. "Prove that we live in a simulation" by cutting, rotating the cut images and displaying the intermediate results. Include the final result (picture) in the GitHub readme or Colab notebook.
Please provide the results as a GitHub link to the code or a Google Colab link to the notebook.
Combine the knowledge you obtained from part 1 with what we are learning in part 2:
Scrape some data from the internet (does not need to be complex).
Initialize a numpy array from/with that data.
Perform some calculations over it (mean / average, sum, other simple descriptive statistics, and some array filtering must be included).
Please provide a complete working solution that has some documentation on how to run it and what it does (readme / Colab).
Please provide the solution as either a GitHub repo or a Colab notebook with all the data necessary to just launch it.
** You can augment your previous PP1 if you want.
Python Crash Course
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Python Crash Course
Detailed course plan
Slides, tasks and so on
Additional information
import cv2

cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)

# grab the first frame so that rval and frame are defined before the loop
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False

while rval:
    cv2.imshow("preview", frame)
    rval, frame = vc.read()
    key = cv2.waitKey(20)
    if key == 27:  # exit on ESC
        break

cv2.destroyWindow("preview")
vc.release()
Connecting to a webcam
Advanced Computer Vision
Launch on GPU
Train with custom dataset and hyperparameters
Real time YOLO based object detection - use your webcam.
Further explorations
Advanced Computer Vision
For this part
use any of the tools / approaches (YOLO, RetinaNet, SRCNN) and use it on your own data.
* end-to-end project with labeling and single object detection.
** real time object detection with a webcam
Write a short paragraph on what you learned while implementing a solution for this
specific task (not part 13 of the course, just the task) (5 sentences / ideas
minimum).
Please provide a link to the collab notebook (double check the share options of the
notebook) when finished for review and evaluation.
Practical Project 13
Advanced Computer Vision
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Machine Learning
Today you will learn
Machine Learning Definition
ML Algorithm Classification
ML Pipeline
Scikit-learn
Bias-variance tradeoff
Regression Intro
ML as a service
Data Science Process Frameworks
ML Certification
MWE
Machine Learning Definition
We have a few main goals in the first lecture:
Define Machine Learning and distinguish / separate it from other, related fields.
Walk through the complete ML pipeline process.
Explain what a regression problem is.
Explain the different learning algorithms that can solve a regression problem.
The most confusing part for students is the plethora of ML models that exist:
when to use which
how do they differ
why are there so many algorithms that can be used for the same problem.
This is not the lecture that will answer this question completely, but it will be a start! And the important thing: pay attention and ask yourself "did I understand when to apply a particular model" - this is the most important thing in this part.
Source: https://en.wikipedia.org/wiki/Machine_learning
We can note, that ML training is an iterative process (what isn’t these days) -
this means the model is never finished. You achieve an MVP, release it and improve
it. Source: https://towardsdatascience.com/not-yet-another-article-on-machine-
learning-e67f8812ba86
Files (with different formats), SOAP / REST / GQL endpoints, SQL databases, non-relational / no-sql databases, sensors in proprietary formats - all structured data. Unstructured data like audio, images and videos are also commonly used.
Even if we take textual data we can see how many different data sources we can have: tweets, facebook posts, emails, blogposts, notes in notepad can all be data sources of a textual type.
Python Crash Course
ML Pipeline
3. Data Preparation: Since the data can be raw and unstructured, it is rarely in the correct form to be processed. It usually involves filling missing values, removing duplicate records, normalising and correcting other flaws in the data (like different representations of the same values in a column), dropping outliers, etc. This is where feature extraction, construction and selection take place too. It is important to remember: most ML models work only with data expressed / encoded as numbers - so non numeric data needs to be converted. This is often the most time consuming step in the ML pipeline, some estimates say that this is where 40%-70% of the time is spent [citation needed]
Feature engineering:
Apply domain knowledge to transform features or create new ones (e.g.: hemoglobin / lipid quotient)
Simple examples: turn date of birth into age as it will most likely be simpler for the model to understand, convert weekdays / month names to numbers. If you have patient data, maybe you would create a BMI column from height and weight.
Feature scaling / normalization (see the sketch below):
Most machine learning models work best with data that is of similar scale.
If you have thousands of meters and tens of grams it might be a good idea to express distance as 1.2 km instead of 1200 m.
For example the K-means clustering algorithm relies on Euclidean distance and is affected by the scale of the different attributes of your datapoints.
4 ways of data scaling: https://en.wikipedia.org/wiki/Feature_scaling important: normalization and standardization.
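A hedged scikit-learn sketch of the two most common options, normalization (min-max) and standardization; the data values are made up:
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# one column in meters, one in grams - wildly different scales
X = np.array([[1200.0, 15.0],
              [800.0,  40.0],
              [1500.0, 22.0]])

print(MinMaxScaler().fit_transform(X))    # normalization: each column squeezed into [0, 1]
print(StandardScaler().fit_transform(X))  # standardization: zero mean, unit variance per column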
Questions:
What kind of data preparation could be done on a video?
What kind of data preparation could be done on an audio recording? (cutting off the
silence at the beginning and end - can we detect it? Can we detect it
deterministically). Do we need meta information on audio file, can we remove it?
how much data should we use from all the data we have for each set? 70/30, 80/20, 90/10
should this number change if we have a lot of data? a small amount of data? The less data, the more we use for training. This is because ML/DL models are inherently quite hungry for data, so no complex problem can be solved w/o some minimal threshold of data that would be able to satisfy the complex model modeling the complex problem domain (data augmentation).
do we just split the data randomly into 2 big portions? No, we randomize the samples (see the sketch after this list)! Sensitivity to initial samples: imagine you were training Tarzan to recognize cars and showed red cars at the beginning - he might associate red-ness with car-ness, and the next samples will have to negate this false connection between the concepts red and car! That's why we randomize data. We can test whether the samples have similar statistical properties: if you dedicate all expensive flats to the test dataset and the train dataset contains only cheap flats, the model might work well for cheap flat prediction, but fail for expensive ones, and you will waste time searching for a better model while the only problem was with your skill of data splitting - both test and train (and validation) sets should have the same (or just very similar) statistical properties as the entire dataset.
what if we have series/sequential data - how do we avoid selection bias? Should we simply have a cut-off point, or randomize the datapoints so that test and train data would have values from any period assigned randomly (hint: this depends on whether there is a time related correlation / trend)? EKG classification for heart disease vs. stock price prediction.
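A hedged sketch of a shuffled split with scikit-learn (the 80/20 ratio and random_state are illustrative):
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# shuffle=True (the default) randomizes sample order before the 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)
print(X_train.shape, X_test.shape)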
Python Crash Course
ML Pipeline
Python Crash Course
4. Data Segregation (cont.)
ML Pipeline
Python Crash Course
4. Data Segregation (cont.)
Selection bias and why we need to shuffle and randomize the samples:
In the standard formulation of machine learning problems, the learning algorithm receives training and test samples drawn according to the same distribution (i.e.: for example 60/40 ECGs of sick/well patients in the training set and 60/40 in the test set, ideally). This assumption often does not hold in practice. The training sample available is biased in some way, which may be due to a variety of practical reasons: the cost of data labeling or acquisition, data selection when feeding to the model (if the data initially fed is only from sick patients, the model can become overly sensitive to sickness indicators and see them where there are none - this can happen if a large portion of the initial samples is from a single category).
If you feed your data in a series and don't randomize it, it might have selection bias, as it is known that parts of the dataset tend to cluster around the same values.
The ability to identify biased datasets is a research topic as well as a topic for EDA. Check your dataframes / arrays / tensors for asymmetric distributions.
More biases: https://developers.google.com/machine-learning/crash-course/fairness/
types-of-bias
We assume sample independence when we shuffle data - we predict only based on
features of a single sample. This does not concern sequential / time series.
Scikit-learn
We will use scikit-learn library, which is arguably the most important machine-
learning (non-DL) specific library in the world.
This library is open source and built on top of numpy, scipy and other powerful libraries: https://github.com/scikit-learn/scikit-learn
It contains most of the ML non-DL algorithms and some DL capabilities.
Also contains all the peripheral algorithms for scaling the data (scaler objects),
segregating (train_test_split, cross_validate) data, visualization (ROC, confusion
matrix, f1 score) and so on.
You can essentially learn most of what is non-DL machine learning by just learning
this one library.
6. Candidate Model Evaluation / Tuning: Assess the performance of the model using the test and validation subsets of the data to understand how accurate the prediction is. This is an iterative process and various algorithms might be tested until you have a model that sufficiently answers your question. Different metrics can be evaluated independently of accuracy, for example - time to train, inference time, memory used / CPU / GPU / electricity consumed (remember, learning algorithms are also algorithms). There are several ways to improve the model: hyperparameter optimization / tuning and ensemble learning (adding another ML algorithm to the mix).
Fundamentally: "Increasing the bias will decrease the variance. Increasing the variance will decrease the bias." - if you make the model more powerful (like using neural networks for regression problems and adding more and more neurons) or use an algorithm that is more powerful due to its design, it will increase variance. In other words: the larger the number of parameters, the larger the search space for optimization to get lost in.
How do you increase the bias or variance? Two ways: by choosing models and by tuning them (random forest - more trees - higher variance).
Ref: https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-
trade-off-in-machine-learning/
Python Crash Course
Regression regularized
Many types of algorithms to solve regression problems - ridge, lasso, support
vector regression and so on https://www.jigsawacademy.com/blogs/data...
Some of them are algorithms that add regularization parameters to classical
regression models.
Why do we need it? In general regularization is performed to increase the
performance of the model mostly by constraining it - so avoiding overfitting /
reducing variance.
There are 3 notable examples that use regularization for the linear regression problem:
Ridge regression - adds an additional term to the error function for linear regression. Most suitable when a data set contains a higher number of predictor variables than the number of observations or there is collinearity between the features (multicollinearity). See: https://www.youtube.com/watch?v=Q81RR3yKn30 and https://stats.stackexchange.com/questions... and https://satishgunjal.com/univariate_lr_scikit/
Lasso regression - also adds a regularization term, but a different one. It tends
to eliminate the weights of the least important features (i.e., set them to zero),
so use it for multivariate problems with many useless features (could we test this
using a synthetic MWE?)
Elastic net regression - combines the techniques from lasso and ridge. Elastic Net
is a middle ground between Ridge Regression and Lasso Regression. The
regularization term is a simple mix of both Ridge and Lasso’s regularization terms,
and you can control the mix ratio r.
Note: these regularization parameters can be added to univariate (although not needed), multivariate, multioutput and other types of regression we will see later on. You can have Polynomial Ridge Regression and Polynomial Multioutput Elastic Net Regression, just like you can have a red bus and a red supercar.
Exercise: think about what generated dataset would prove the effectiveness of ridge and lasso regression (compared to regular regression and against each other) - generate that dataset and compare it with unregularized linear regression (a starting sketch follows below).
Exercise: how would you go about disproving the statement that you need more than 50 samples?
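A hedged starting sketch for the first exercise - a synthetic dataset with many useless features, where lasso's zeroing-out behaviour should be visible (all parameter values are illustrative):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# 20 features but only 5 actually informative ones, plus noise
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=1.0)):
    model.fit(X_train, y_train)
    zeroed = np.sum(np.isclose(model.coef_, 0.0))
    print(type(model).__name__, "R2:", round(model.score(X_test, y_test), 3), "zeroed coefs:", zeroed)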
Python Crash Course
Regression regularized
Exercise: implement the full ML flow (problem → analysis → model → …) using:
student grades dataset
a linear regression model with OLS learning
and this ref: https://stackabuse.com/linear-regression-in-python-with-scikit-learn/
( https://archive.ph/ei77a )
if the data link is dead use this:
https://github.com/MindaugasBernatavicius/DeepLearningCourse/blob/master/
04_ML_Intro/student_scores.csv
See: https://youtu.be/8aMzR8iaB9s?t=299
Python Crash Course
Data Science Process Frameworks
We talked about the data science pipeline before. There is a formalized, somewhat established (although not hugely recognized in the industry) version of it in the data-mining field: CRISP-DM (CRoss Industry Standard Process for Data Mining): https://www.datascience-pm.com/crisp-dm-2/
Many have recognized that both the DS / ML pipeline and CRISP-DM processes do not take into account the different people / roles in the organization wrt/ the chronological process of doing industrial data science - for example data storage vs. modeling - does a single role just do everything?
Microsoft offered their solution called TDSP - Team Data Science Process. It
offers:
A data science lifecycle definition
A standardized project structure
Recommended infrastructure and resources
Recommended tools and utilities
Tasks and responsibilities for each project role
Ref: https://www.datascience-pm.com/tdsp/ and
https://docs.microsoft.com/en-us/azure/architecture/data-science-process/overview
This is for project management / for project managers.
Such a 3D tensor can then be processed by an RNN layer or a 1D convolution layer (or just a Dense layer after Flatten)
When you instantiate an Embedding layer, its weights (its internal dictionary of
vectors) are initially random, just like with any other layer. During training,
these word vectors will be gradually adjusted via backpropagation, structuring the
space into something that the downstream model can exploit. Once fully trained,
your embedding space will show a lot of structure -- a kind of structure
specialized for the specific problem you were training your model for (problem
specificity).
For some nice visuals see this: https://medium.com/@kadircanercetin/intuitive-
understanding-of-word-embeddings-with-keras-6435fe92a57b
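A hedged sketch of an Embedding layer feeding an LSTM classifier (vocabulary size and layer dimensions are illustrative):
from tensorflow import keras

vocab_size, embed_dim = 10_000, 8

model = keras.models.Sequential([
    # maps each integer word index (0..9999) to a learned dense vector of length 8
    keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
    keras.layers.LSTM(32),          # consumes the (batch, timesteps, embed_dim) tensor
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])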
Word Embeddings
Natural Language Processing
Assuming all values in the initial matrix are unique, so the dict size is 10
Sometimes, you have so little training data available that you could never use your data alone to learn an appropriate task-specific embedding of your vocabulary. What to do then?
Instead of learning word embeddings jointly with the problem you want to solve, you could load embedding vectors from a pre-computed embedding space known to be highly structured and to exhibit useful properties -- one that captures generic aspects
highly structured and to exhibit useful properties -- that captures generic aspects
of language structure. The rationale behind using pre-trained word embeddings in
natural language processing is very much the same as for using pre-trained convnets
in image classification (same as transfer learning): we don't have enough data
available to learn truly powerful features on our own, but we expect the features
that we need to be fairly generic, i.e. common visual features or semantic
features. In this case it makes sense to reuse features learned on a different
problem.
Important reading material:
https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469 (I recommend
implementing this as a homework)
Using pre-trained word embeddings
Natural Language Processing
Such word embeddings are generally computed using word prediction based on context
window (“shallow window” - word2vec) and/or word cooccurrence statistics
(observations about what words co-occur in sentences or documents - GloVe), using a
variety of techniques, some involving neural networks, others not.
The idea of a dense, low-dimensional embedding space for words, computed in an
unsupervised way, was initially explored by Bengio et al. in the early 2000s, but
it only started really taking off in research and industry applications after the
release of one of the most famous and successful word embedding scheme: the
Word2Vec algorithm, developed by Mikolov at Google in 2013. Word2Vec dimensions
capture specific semantic properties, e.g. gender.
Additional explanations:
(Using word embeddings, Andrew Ng) https://www.youtube.com/watch?v=ARIqkgvYUbY -
(Learning word embeddings, Andrew Ng) https://www.youtube.com/watch?v=yXV_Torwzyc -
intro to concept of context and training of embeddings with different contexts
(last 4 words vs. surrounding 4 words, last 1 word) and with different “problems”
(classification / prediction).
(Word2Vec, Andrew Ng) https://www.youtube.com/watch?v=3eoX_waysy4 - skipgram
selection based word2vec DNN (embeddings + softmax layer) using nearby 1 word.
Paper: https://arxiv.org/abs/1301.3781 and code:
https://code.google.com/archive/p/word2vec/
Word2Vec
Natural Language Processing
The whole playlist: https://www.youtube.com/playlist?list=PLhWB2ZsrULv-
wEM8JDKA1zk8_2Lc88I-s
Word2Vec
Natural Language Processing
There are various pre-computed databases of word embeddings that we can download
and start using in a Keras Embedding layer. Word2Vec is one of them. Another
popular one is called "GloVe", developed by Stanford researchers in 2014. It stands
for "Global Vectors for Word Representation", and it is an embedding technique
based word co-occurrence statistics. Its developers have made available pre-
computed embeddings for millions of English tokens, obtained from Wikipedia data /
Common Crawl data.
Additional explanations:
Video: https://www.youtube.com/watch?v=InCWrgrUJT8
(GloVe, Andrew Ng): https://www.youtube.com/watch?v=EHXqgQNu-Iw
Original paper: https://nlp.stanford.edu/projects/glove/
GloVe
Natural Language Processing
We have a few choices
Whether to train our embeddings or use pre-trained ones
If we choose pre-trained ones - which ones (word2vec / glove / other)?
Word2Vec vs. GloVe
The choice is mostly empirically derived - try and see.
There are some tasks that word2vec is better at and some where GloVe is better.
Sometimes you can guess based on the data they were trained on.
Importantly, all embedding models are evaluated against word analogy datasets, however that is not necessarily predictive of the performance you will achieve on your NLP task [needs confirmation].
In general - whichever is bigger will probably require a bigger neural network in order to not underfit. But neural networks are often not on the verge of underfitting, quite the opposite.
Performance reasons (?)
Choosing between embeddings
Natural Language Processing
The bigger decision is whether to train your own embeddings or use pre-trained ones. Two considerations are in order:
Do you have a lot of data? Remember we said before - if you have enough data you can train your own embeddings and they will outperform generic ones. Just like with CNNs and transfer learning - the more data you have, the less you need pretrained models in the transfer learning sense (you can judge whether you have insufficient data by the network underfitting).
Does your task use similar words to the ones that the embedding was trained on? A very dissimilar reusable embedding will perform much worse than a very similar one. Can you make them trainable? Yes. You can find that out from the papers. Although the datasets are so extensive that these embeddings are more like embeddings for the whole language.
Do LLMs use pretrained embeddings? No, this would limit their ability to learn task specific representations, which are learned in an end-to-end training fashion. Also, using pre-trained embeddings is for brokies, not big companies like Google, MS or OpenAI.
Summary: so after all, the decision on what to use is primarily based on how much data you have - if it's a small dataset, just use pretrained ones, otherwise try all possibilities. If one pretrained embedding does not work better than a custom one, chances are that others will not work better either.
Lithuanian pretrained embeddings?
Choosing between embeddings
Natural Language Processing
Let’s see how to use custom embeddings, prebuilt embeddings and see what accuracy
we can achieve with the IMDB dataset
You can find some usage examples here:
https://keras.io/examples/nlp/pretrained_word_embeddings/
https://www.tensorflow.org/tutorials/keras/text_classification
https://www.tensorflow.org/tutorials/keras/text_classification_with_hub
https://keras.io/examples/nlp/bidirectional_lstm_imdb/
TODO: fast.ai
TODO: pytorch
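As a minimal sketch of the "train your own embeddings" path on IMDB with Keras, here is one possible setup; the vocabulary size, sequence length, layer sizes and epoch count are illustrative choices, not tuned values:

from tensorflow import keras
from tensorflow.keras import layers

max_features, maxlen = 10000, 200     # illustrative vocabulary size / sequence length
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=max_features)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)

model = keras.Sequential([
    layers.Embedding(max_features, 64),     # embeddings learned from scratch
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_split=0.2, epochs=3, batch_size=128)
print(model.evaluate(x_test, y_test))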
Classification with Keras
Natural Language Processing
Another technique that we will introduce in this section is called "bidirectional RNNs". A bidirectional RNN is a common RNN variant that can offer higher performance than a regular RNN on certain tasks. It is frequently used in natural language processing (translation of full text, classification/sentiment analysis).
RNNs are notably order-dependent, or time-dependent: they process the timesteps of
their input sequences in order, and shuffling or reversing the timesteps can
completely change the representations that the RNN will extract from the sequence.
This is precisely the reason why they perform well on problems where order is
meaningful, such as our temperature forecasting problem. A bidirectional RNN
exploits the order-sensitivity of RNNs: it simply consists of two regular RNNs, such as the GRU or LSTM layers that you are already familiar with, each processing the input sequence in one direction (chronologically and antichronologically), then merging their representations. By processing a sequence both ways, a bidirectional RNN is able to catch patterns that may have been overlooked by a one-direction RNN. However, it is not always helpful, so you need to test it for your problems!
Remarkably, the fact that the RNN layers in this section have so far processed
sequences in chronological order (older timesteps first) may have been an arbitrary
decision. At least, it's a decision we made no attempt at questioning so far. Could it be that our RNNs would have performed well enough if they were processing input sequences in antichronological order, for instance (newer timesteps first)? Let's see what happens if we reverse the sequences to be in reverse chronological order.
This would be a good task - how much can we learn processing in reverse order.
Bidirectional RNNs
Natural Language Processing
To instantiate a bidirectional RNN (BRNN) in Keras, one would use the Bidirectional
layer, which takes as first argument a recurrent layer instance. Bidirectional will
create a second, separate instance of this recurrent layer, and will use one
instance for processing the input sequences in chronological order and the other
instance for processing the input sequences in reversed order.
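A minimal sketch of what that looks like in code; the vocabulary size, sequence length and layer sizes are illustrative:

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative shapes: a 10k-token vocabulary and sequences padded to 200 timesteps.
inputs = keras.Input(shape=(200,), dtype="int32")
x = layers.Embedding(10000, 128)(inputs)
# Bidirectional wraps a recurrent layer: it creates a second copy, runs one copy
# forward and one backward over the sequence, then merges (concatenates) the outputs.
x = layers.Bidirectional(layers.LSTM(64))(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.summary()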
2 Level
1 Chapter
Today you will learn
Installation and tool preparation
Python history and versions
01
02
03
04
05
Python as a programming language
Other Interpreters
Python Crash Course
Python interpreter
00
Computer literacy test / discussion
Computer literacy test / discussion
Demo
Create a new notebook
Привет мир / Hello world
Saving your work and launching again
Using shell commands inside collab (!apt install iputils-ping)
Adding screenshots / heading (hierarchy) for personal notes
Python advantages:
Python (Very HLL) vs. Java / C# / C++ (HLL)
Does not use c-like syntax - quite unique, whitespace sensitive / significant
whitespace
Easy to learn the basics and commonly praised for developer productivity.
Runtime speed vs. writetime speed.
Interpreted language, but: Link1 and https://i.imgur.com/PJME67T.png
Dynamically typed
GC’ed - no explicit allocation / deallocation (of course we can’t accumulate data
ad infinitum, so leaks are still possible)
Multiparadigm: procedural, object oriented, with functional mechanisms.
Suitable for scripts / automation, data processing, GUI programming, simple games,
web applications, AI, security.
Popularity: google trends / tiobe / github stars / polls (you can see yourself at
dirbkit.lt (cvbankas) / linkedIn). List of companies for ML/DS.
Python has gone the “batteries included” path - a large standard library (time, os,
sys) and a giant pypi (python package index) package system
Python as a programming language
Python Crash Course
Disadvantages:
Not widely used in mobile, desktop realms.
Interpreted languages are often slower than compiled ones (remember: languages are
not slow or fast - runtimes are).
Unique syntax - means more difficult to switch to other languages.
Python as a programming language
Python Crash Course
Reports:
Python as a programming language
Python Crash Course
Reports:
Python history and versions
Main things:
Guido van Rossum, BDFL
PEP (python enhancement proposal), PEP8 and PEP20 (PEP 20, we can see it with import this). Root page: https://peps.python.org/
PEPs are “community driven” - they evolve by community effort (Microsoft and others have contributed)
Pypi index: https://pypi.org/
2nd version, 2.7 not supported from 2020.
https://en.wikipedia.org/wiki/History_of_Python
Which version will we use?
Which to use:
CPython
but if you need speed - Pypy (of course you need to benchmark whether your
application will see speed improvements).
And if you have a lot of Java / C# code, or you know a good library in those ecosystems and want to call it from Python code, then consider the other interpreters.
Python Crash Course
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
DDL
DQL
01
02
03
DML
Python Crash Course
00
SQL
05
Joins
06
Python database connectivity
07
08
04
Grouping and aggregation
Relationships and data modeling
BONUS: mongodb, neo4j, apache spark
09
Recommended resources
SQL
Structured Query Language is a very popular language for querying information from a relational database - an RDBMS.
RDBMS is an acronym for relational database management system. It refers to common
software products like: MS SQL server, Mysql, Oracle 18c, PostgreSQL, etc.
An RDBMS is also synonymous with the term “database server”.
An RDBMS is a highly optimized software product for the purpose of writing,
storing, retrieving data in addition to controlling access to it and many other
operations (vs. flat files).
Each installation of RDBMS can contain many databases which can be defined as a set
of logically related tables, which in turn are composed of rows and columns.
Each column represents an attribute and each row a particular entity of some specific type (like: Person, Employee, Item, Invoice, etc). That way in each cell we store the value of a particular attribute for that entity (Name of a Person).
R in RDBMS comes from the relational model (1968-1969) where data is described as
sets of tuples arranged in a table like structure where the tables themselves can
have relations.
NoSQL trend refers to databases that use not-only SQL - non relational databases,
commonly schema-less, like: MongoDB, Redis and such.
4 main categories of NoSQL databases in total (see side pictures) + time series.
Ref: https://db-engines.com/en/ranking and
https://db-engines.com/en/ranking_trend . Check this video for more:
https://www.youtube.com/watch?v=W2Z7fbCLSTw
Importance of SQL for data roles: https://www.google.com/search?... and
https://www.reddit.com/r/SQL/... and
https://www.youtube.com/@thedatajanitor9537/videos
Python Crash Course
SQL
We have 4 groups of SQL statements:
DDL – Data Definition Language (schema descr.)
DQL – Data Query Language (the important thing)
DML – Data Manipulation Language
DCL – Data Control Language
We are going to use MySQL to learn SQL. This is the most popular open source
database in the world
We need to install MySQL: https://dev.mysql.com/downloads/installer/
Also we will install MySQL Workbench to be able to write and execute queries
interactively against MySQL: https://dev.mysql.com/downloads/workbench/
If you don’t want to install anything just use: http://sqlfiddle.com/
Create a new database → table → record (and reverse these actions).
Define the columns, datatypes: https://www.w3schools.com/sql/sql_datatypes.asp . We
will need:
VARCHAR(?)
INT/BIGINT (signed/unsigned)
FLOAT/DOUBLE
DATE/DATETIME
Question: we want to save a phone number - which datatype should we use? How about:
isdn, barcode, mac address, social security number, etc.
We can potentially import more data from CSV. Use online generator to generate it:
https://extendsclass.com/csv-generator.html . Exercise.
Python Crash Course
SQL
Python Crash Course
SQL
Python Crash Course
DDL
CREATE – to create a database and its objects (tables, indexes, views, stored procedures, functions and triggers)
ALTER – alters the structure of the existing database
DROP – delete objects from the database
There are more, but these are the important ones.
TRUNCATE – removes all records from a table, including all space allocated for the records, but not the table itself
COMMENT – add comments to the data dictionary
RENAME – rename an object
Python Crash Course
DDL
Explanation:
MySQL datatypes: https://www.mysqltutorial.org/mysql-data-types.aspx important
ones: INT, DOUBLE, VARCHAR, DATETIME
NOT NULL - Each row must contain a value for that column, null values are not
allowed (on insert).
DEFAULT value - Set a default value that is added when no other value is passed.
UNSIGNED - Used for number types, limits the stored data to positive numbers and
zero.
AUTO_INCREMENT - MySQL automatically increases the value of the field by 1 each time a new record is added (on insert).
PRIMARY KEY - Used to uniquely identify the rows in a table. The column with the PRIMARY KEY setting is often an ID number, and is often used with AUTO_INCREMENT. Primary key = unique + not null.
DEFAULT CURRENT_TIMESTAMP → current time will be inserted if the caller does not
provide any value.
Python Crash Course
CREATE DATABASE Test; CREATE DATABASE `Test Database`;
There are more, but these are the most important ones:
MERGE - UPSERT, REPLACE, CALL, LOCK / UNLOCK table.
Exercise:
Create table Employee (id, name, surname, salary).
Add some data to the table using INSERT statement.
Update all rows where the salary is less than X to add 10%.
Create two rows with the same information (just the id is different) - try to delete one of them.
DML
Python Crash Course
INSERT INTO Guests
(`id`, `firstname`, `lastname`, `email`)
VALUES
(1, "Mindaugas", "Bern", "[email protected]"),
(2, "Jonas", "Kur", "[email protected]"),
(3, "Petras", "Per", "[email protected]"),
(4, "Juozas", "Ten", "[email protected]");
SELECT used for retrieving records from one or more database tables.
Select “A” → simplest select (for checking connectivity)
Select * → select all columns
Select coll_1, coll_2 → only certain columns
Select coll_2, coll_1 → change the order of columns at display time
Select count(*) | count(id) → combined with function count() it can tell you how
many items are in the table
Select x AS y from … → alias, useful for renaming columns in the result set
Limit and Offset → select only a portion of data (from, how many)
Where → can be combined with multiple operators which then can be
chained using AND / OR operators
Python Crash Course
SELECT * FROM test.guests;
SELECT count(id) AS `#Guests` FROM test.guests;
SELECT * FROM test.guests Limit 2, 4;
SELECT * FROM test.guests Limit 4 OFFSET 2;
SELECT
    MIN(salary),
    MAX(salary) / MIN(salary), -- how many times the best earner out-earns the worst earner
    MAX(salary),
    AVG(salary),
    SUM(salary) / COUNT(salary),
    STD(salary),
    department_name
FROM test.employee
GROUP BY department_name;
Exercise 2:
Construct the postcode from first two letters of the city name + “-” + current post
code, i.e.: city-post_code_number. E.g.: NE-AX8485 (New York-AX8485)
Display all postcodes for a country separated by comma and space.
Python Crash Course
CREATE TABLE cities (
id INT PRIMARY KEY AUTO_INCREMENT,
country_name VARCHAR(255) NOT NULL,
country_code VARCHAR(10) NOT NULL,
city VARCHAR(255) NOT NULL,
post_code_number INT NOT NULL,
number_of_residents_in_city INT NOT NULL
);
INSERT INTO Phones (`number`, `c_id`) VALUES ('+3703', 1), ('+3704', 2);
Joins
The JOIN clause helps us connect two or more tables into a resulting set based on some matching condition (column(s)).
Usually used with SELECT statement (but it can be used with others, like UPDATE to
perform cross-table updates).
Most common and most important join types (there are more):
(Inner) Join
Left (outer) join
Right (outer) join
Full (outer) Join
P.S. Table1 is the table specified in the FROM clause, and Table2 in the JOIN
clause.
Python Crash Course
SELECT C.id, C.name, P.number FROM Clients AS C
JOIN Phones AS P ON C.id = P.c_id;
Joins
More Join types
If we were to define joins as a logical construct of connecting two or more tables, this infographic would be valid, but usually we treat the concept of joins as a syntactical concept, not only a logical one.
Python Crash Course
Joins
Exercise:
Create a data model to present a company that provides online training for
programmers, like CodeAcademy.
There should at least be Students and Courses represented (bonus points for additional domain models represented).
Think about the column types, names, relations, PKs, FKs, etc.
Include students that do not attend any courses.
Write a query that would display full student information including which course(s)
the student attend(s).
* Phone numbers, emails, contacts, teachers can be included into the modeled domain
for additional karma points.
Python Crash Course
Joins
Let’s illustrate all the join types using simple tables:
https://gist.github.com/MindaugasBernatavicius/ac6f3b4583f7d83a64a3e39ea9f15f86
DEMO: Simple joins (inner, left, right, full). Multi joins.
Question: why do we need right and left joins if we can just swap the tables? Mathematically they are equivalent
… however if you have a complex join with more than 2 tables and you want to add an additional one, you will not be able to change which table is the Table1 in the join
Note - we can provide some formulations on tasks that can be handled / solved using
joins (problem solving or example based approach vs. conceptual introduction):
Get all clients that have phone numbers and provide their names and phone numbers.
Get all phones that do not have clients assigned (right join with exclusion)
Get all … TBD
… etc.
Python Crash Course
SELECT
GROUP_CONCAT(authors.id SEPARATOR ', ') as a_id,
GROUP_CONCAT(authors.name SEPARATOR ', ') as authors,
COUNT(authors.id) as '# authors',
books.id as b_id, books.title
FROM authors
JOIN books_authors ON books_authors.author_id = authors.id
JOIN books ON books_authors.book_id = books.id
GROUP BY books.title;
SELECT
C.id, C.name,
GROUP_CONCAT(P.number) as 'phone numbers',
COUNT(P.number) as '# numbers'
FROM Clients AS C
INNER JOIN Phones AS P ON C.id = P.c_id
GROUP BY C.name
ORDER BY '# numbers' DESC;
from mysql.connector import connect, Error

try:
    with connect(host="localhost", user="root", password="mysql") as connection:
        users_query = "SELECT Customer.name, Address.City, COUNT(Orders.id) " \
                      "FROM joinsexample.Customer " \
                      "INNER JOIN joinsexample.Address ON Customer.id = Address.Customer_id " \
                      "INNER JOIN joinsexample.Orders ON Customer.id = Orders.Customer_id;"
        with connection.cursor() as cursor:
            cursor.execute("SET sql_mode=(SELECT REPLACE(@@sql_mode, 'ONLY_FULL_GROUP_BY', ''));")
            cursor.execute(users_query)
            result = cursor.fetchall()
            for row in result:
                print(row)
except Error as e:
    print(e)
import pandas as pd
from mysql.connector import connect, Error

try:
    with connect(host="localhost", user="root", password="mysql") as connection:
        df = pd.read_sql("SELECT * FROM dvwa.users", con=connection)
        print(df)
except Error as e:
    print(e)
Python database connectivity
SQLAlchemy is an ORM in the Python ecosystem.
An ORM is an “Object relational mapper” and it essentially allows us to query and
manipulate data w/o writing any or very little SQL.
It translates or maps between the OOP world of objects with properties and the SQL world of tables, rows and columns.
So in short it translates data stored in a database to python objects and vice
versa + allows us to perform CRUD operations.
Ref: https://www.tutorialspoint.com/sqlalchemy/index.htm and
https://towardsdatascience.com/sqlalchemy-python-tutorial-79a577141a91
pip install mysqlclient sqlalchemy
Python Crash Course
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine("mysql+mysqldb://root:mysql@localhost/test")  # adjust to your database
Base = declarative_base()

class Customer(Base):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    name = Column(String)

Session = sessionmaker(bind=engine)
result = Session().query(Customer).all()
# print(Session().query(Customer))
for user in result:
    print(user.id, user.name)
Student questions:
How to understand an already fully developed database if you are joining the
project.
There are many tips
NLP essentially tries to answer questions like: how do we deal with written (text) and spoken (audio) natural language in an algorithmic, programmable way, allowing next-level automation based on semantic meaning (like computer vision) to exist and develop.
If you ever find yourself thinking about a particular sphere of machine learning, think in a simplified way - it's just automation. What could we automate if computers were able to understand human languages?
Definition of NLP
Natural Language Processing
Common tasks - an incomplete list of NLP applications:
Giving commands to robots / computers (Cortana, Siri, Alexa, Google Assistant (Google Duplex)): https://www.youtube.com/watch?v=D5VN56jQMWM (that was 5 years ago - did anything come of it? They are still working on it.)
Transcription (speech to text)
Automatic reading - (text to speech, ebooks/audiobooks)
Sentiment analysis - classifying statements - positive/negative, constructive/destructive, ToS compliant/ToS violating (or anything). Facebook content moderation.
Topic modeling (classification) - topic assignment given a text, automatic tagging.
Automatic text summarisation (automatic tagging).
Text generation (recent revolutions are often here, like GPT-3). Code generation.
Autocomplete (gmail currently has the best autocomplete; you may need to connect to gmail through a US proxy).
Translation
The primary use cases are sometimes not impressive for people new to NLP, but a sentiment analysis model can be used for stock trading decisions - if you see a hashtag trending on twitter or some news in the media about your company, a negative news message can be an indicator to exit your current position before many people do and the price of the stock goes bust, saving you potentially thousands to millions.
Text generation in the form of code autocomplete became huge with the reveal of GPT-3, OpenAI Codex, Github Copilot.
Definition of NLP
Natural Language Processing
Definition of NLP
Natural Language Processing
Starting from the earliest:
Turing test - using language to interact with others is distinctively human.
Chomskian revolution in linguistics / rule based models (1. Arguments against
behaviorists & 2. Theory of universal grammar)
Machine learning based models - linguistics is not necessary anymore, we don’t need
a ruleset, algorithms learn the rules!
Starting from about 2014-2015 (bidirectional) LSTMs begin to dominate NLP, 2018 -
Transformer architecture.
See: https://www.dataversity.net/a-brief-history-of-natural-language-processing-
nlp/# and https://en.wikipedia.org/wiki/...
Ref: https://buggyprogrammer.com/what-is-natural-language-processing/
History of NLP
Natural Language Processing
One of the more impressive and creative products:
Github Copilot (X)
Microsoft Copilot
ChatGPT based products
http://sentdex.com/political-analysis/us-politicians/
http://sentdex.com/how-sentdex-works/
https://zyro.com/ai/content-generator Ai text generator
etc.
Pre-prepared:
End of this article: https://towardsdatascience.com/ultimate-beginners-guide-to-collecting-text-for-natural-language-processing-nlp-with-python...
NLTK corpora: https://theflyingmantis.medium.com/exploring-natural-language-toolkit-nltk-e3009de61576
Useful list: https://www.nltk.org/book/ch02.html
Kaggle (text mostly, audio is more about animal sounds, not speech)
APIs:
A list of APIs is in the resource above
APIs of particular interest: twitter, facebook, reddit and so on. “Simple” (~understandable) task - create a chrome plugin that, near each comment, would show if it's a positive or negative comment.
Scraping
We had an extended discussion on scraping which would help you for this part!
Obtaining data
Natural Language Processing
Transcription
We can use SpeechRecognition package in python to obtain text from audio files.
This is essentially a facade for multiple speech recognition APIs, like Google, Microsoft Bing and so on:
recognize_bing(): Microsoft Bing Speech
recognize_google(): Google Web Speech API
recognize_google_cloud(): Google Cloud Speech - requires installation of the
google-cloud-speech package
recognize_houndify(): Houndify by SoundHound
recognize_ibm(): IBM Speech to Text
recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
recognize_wit(): Wit.ai
https://pypi.org/project/SpeechRecognition/
The following file formats are supported:
WAV: must be in PCM/LPCM format
AIFF
AIFF-C
FLAC: must be native FLAC format; OGG-FLAC is not supported
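A small sketch of the typical usage, assuming a PCM WAV file called sample.wav (a placeholder path) and the free Google Web Speech backend; most other recognize_* backends require API keys:

import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:   # placeholder path to a PCM WAV file
    audio = r.record(source)                 # read the whole file into an AudioData object

try:
    print(r.recognize_google(audio, language="en-US"))   # free Google Web Speech backend
except sr.UnknownValueError:
    print("Speech was unintelligible")
except sr.RequestError as e:
    print(f"API request failed: {e}")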
Obtaining data
Natural Language Processing
Once we have obtained the words / text we can do the preprocessing - here are some common preprocessing actions we need to do (a short NLTK sketch follows after the list below):
Preprocessing and mining
Natural Language Processing
For many tasks in natural language processing we can use the NLTK library - the
toolbox for NLP! NLTK is a leading platform for building Python programs to work
with human language data. It provides easy-to-use interfaces to over 50 corpora and
lexical resources such as WordNet, along with a suite of text processing libraries
for classification, tokenization, stemming, tagging, parsing, and semantic
reasoning, wrappers for industrial-strength NLP libraries. See the following
resources if you want to learn more: http://www.nltk.org/nltk_data/
The following are the popular tasks for NLP:
Sentence tokenization
Word tokenization
Stopword removal (commonly occurring words that do not add much meaning)
Building n-grams (sequences of n words (bigrams, trigrams and so on)). Helps us construct “bag of n-grams” models
Stemming - only leave the stem of the word
Lemmatization - grouping together the inflected forms of a word so they can be analysed as a single item
Tagging parts of speech (POS tagging): https://pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/ . Why is it useful? We could reduce the dimensionality of our data by just leaving nouns and verbs, for example. Remember - as long as you are getting adequate performance for your problem, you can do whatever you want with the data. Knowing which words are verbs and nouns and filtering them as such would need to be achieved through PoS tagging first. So it's a preprocessing and dimensionality reduction technique.
Displaying parse tree
Additional tasks that you might need to perform, but with raw python
Data cleaning - html/json/xml entity removal would be an example
Punctuation removal
Lowercasing, uppercasing, etc.
Data obfuscation (like email, ssn, pwd obfuscation, email is usually replaced with
placeholder <email>). Replacing the data fields/words that are particular to some
individual with predefined tokens/variables gives better generalization, because
the text becomes less specific.
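A compact sketch of several of the preprocessing steps above with NLTK; the exact nltk.download() resource names can vary slightly between NLTK versions:

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.util import ngrams

for resource in ("punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"):
    nltk.download(resource)

text = "The quick brown foxes were jumping over the lazy dogs. They were fast."
sentences = nltk.sent_tokenize(text)                              # sentence tokenization
words = nltk.word_tokenize(text)                                  # word tokenization
stop = set(stopwords.words("english"))
filtered = [w for w in words if w.isalpha() and w.lower() not in stop]   # stopword removal
bigrams = list(ngrams(filtered, 2))                               # n-grams
stems = [PorterStemmer().stem(w) for w in filtered]               # stemming
lemmas = [WordNetLemmatizer().lemmatize(w) for w in filtered]     # lemmatization
pos_tags = nltk.pos_tag(words)                                    # PoS tagging
print(sentences, filtered, bigrams, stems, lemmas, pos_tags, sep="\n")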
Preprocessing and mining
Natural Language Processing
Useful when you want a summary - let's say you have an ecommerce website and you want to get the basic gist of what the customer is saying from a long text. There are many ways to do it, but without a complex model we can exploit the idea that a few important sentences in the text can disclose the meaning of the entire piece of text - our goal is to find those sentences.
Interesting applications as example:
summarizing facebook / twitter posts - see how much meaning they retain.
summarizing CNN/Delfi articles and comparing the summaries to the ones that the author provided. Maybe the one we generated will be less click-bait-y.
can a summary be only a paragraph? Maybe a table with nouns | verbs | adjectives - we could use PoS tagging for that purpose.
scientific research paper summarization (and automated “Abstract” synthesis)
There are many ways to perform auto summarization, however there is a deterministic classic algorithm - just find the most important words and sentences.
How do we decide which words are important? Word importance == word frequency
without counting stop words - they are most frequent, but least representative of
the topic being discussed.
Which sentences are most important? The more important words a sentence has, the more important the sentence is! The sum of word importance.
After determining the most important sentences we can preserve their relative
order.
After you have the most important words and sentences choose how long the summary
needs to be and create it.
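A minimal sketch of this classic frequency-based extractive algorithm (word importance = frequency without stop words, sentence importance = sum of word importance, keep the top sentences in their original order):

from collections import Counter
import nltk
from nltk.corpus import stopwords

def summarize(text, n_sentences=2):
    # Word importance = frequency, ignoring stop words and punctuation.
    stop = set(stopwords.words("english"))
    words = [w.lower() for w in nltk.word_tokenize(text)
             if w.isalpha() and w.lower() not in stop]
    freq = Counter(words)
    # Sentence importance = sum of the importance of its words.
    sentences = nltk.sent_tokenize(text)
    scores = {i: sum(freq.get(w.lower(), 0) for w in nltk.word_tokenize(s))
              for i, s in enumerate(sentences)}
    # Keep the n best sentences, preserving their original order.
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_sentences])
    return " ".join(sentences[i] for i in top)

print(summarize("Some long customer review text goes here. It has many sentences. Some matter more."))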
Auto summarization
Natural Language Processing
Neural networks represent a differentiable function mapping inputs to outputs,
hence they work only with numbers.
We need to convert words into numbers (remember the tabular data analysis part
where we converted categorical data to one-hot encoding). So we need some
representation for the words we are going to feed into our networks.
General steps:
obtain the text
tokenize the text into words
represent the words as vectors
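A tiny sketch of those three steps with Keras; the toy corpus, vocabulary size and embedding dimension are placeholders:

import tensorflow as tf
from tensorflow.keras import layers

texts = ["the movie was great", "the movie was terrible"]   # toy corpus (step 1: obtain the text)

# Step 2: tokenize and map each word to an integer id.
vectorizer = layers.TextVectorization(max_tokens=1000, output_sequence_length=8)
vectorizer.adapt(texts)
ids = vectorizer(tf.constant(texts))

# Step 3: represent each word id as a dense, trainable vector.
embedding = layers.Embedding(input_dim=1000, output_dim=16)
vectors = embedding(ids)
print(ids.shape, vectors.shape)    # (2, 8) -> (2, 8, 16)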
2 Level
1 Chapter
Today you will learn
Iterators / Iterables
Decorators
01
02
03
Closures
Python Crash Course
00
Inner classes
05
Callables
06
Abstract Class
07
Interfaces
04
Digression on topic combinations
Inner classes
Python Crash Course
You may have wondered, coming from other languages, whether it is possible to create nested or inner classes in Python? The answer is yes, it's possible, however (see below, ref: https://stackoverflow.com/question.... )
Inner classes are used as a design choice. When one class is inextricably tied to
another and it does not make sense to have it as a standalone class, we can nest it
inside.
A common example is the Iterator pattern, where a ContainerIterator class is nested inside a Container class (names chosen arbitrarily), or Person and FullName classes, etc.
If you have a class representing a collection and you want to iterate it in many
ways you can nest iterator classes - a common pattern in languages like Java / C#.
Iterators / Iterables
Python Crash Course
Definition:
An iterator is an object that contains a countable number of values, almost like a
“container class”.
An iterator is an object that can be iterated upon, meaning that you can traverse
through all the values.
Technically, in Python, an iterator is an object which implements the iterator protocol, which consists of the methods __iter__() and __next__().
Iterator vs Iterable:
Lists, tuples, dictionaries, and sets are all iterable objects. Strings are iterable objects, and can return an iterator as well.
They are iterable containers which you can get an iterator from. All these objects have an __iter__() method which is used to get an iterator.
The differences can be summarized:
Iterators / Iterables
Python Crash Course
Iterating over an iterator:
You can do it with next()
You can also do it with for loop. The for loop actually creates an iterator object
and executes the next() method for each loop.
Creating iterators:
To create an object/class that acts as an iterator you have to implement the methods __iter__() and __next__() in your object.
__iter__() - returns the iterator object itself.
__next__() method also allows you to do operations, and must return the next item
in the sequence.
Stopping iterations:
An iterator would continue forever if you had enough next() statements, or if it was used in a for loop.
used in a for loop.
To prevent the iteration to go on forever, we can use the StopIteration built-in
exception.
In the __next__() method, we can add a terminating condition to raise an error if
the iteration is done a specified number of times.
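A small sketch that puts the pieces together - __iter__(), __next__() and StopIteration as the terminating condition (CountUpTo is just an illustrative name):

class CountUpTo:
    """Iterator that yields 1, 2, ..., limit and then stops."""
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self               # an iterator returns itself

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration   # the terminating condition
        self.current += 1
        return self.current

it = CountUpTo(3)
print(next(it))      # 1
for value in it:     # the for loop keeps calling __next__(): 2, 3
    print(value)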
Iterators / Iterables
Python Crash Course
Container class:
Exercise:
We have a Flight class.
It contains two lists (+ other scalar fields).
First list contains objects of Passenger class
The next one contains objects of CrewMember class
The Flight class can accept the two lists as parameters to __init__(self,
passengers, crew)
Create an iterator to iterate over all of the people on the plane.
When doing the exercise, think about how you can simplify it by creating a
simplified version of the problem first. After solving the simplified version
first, solve the final version.
Please try not to perform memory operations in __next__ or __iter__ methods like
concatenating lists, etc.
Closures
Python Crash Course
Closures are functions that are nested inside other functions and have access to
the variables of the outer function.
All of this behavior is enabled by Python's functions being first-class citizens:
https://www.geeksforgeeks.org/decorators-in-python/
We have already seen how treating functions as first-class citizens helps us in creating things like the map, filter, reduce functions - since we can treat a function as a variable, we can pass it to another function and then call it when needed:
There are 3 ways to make functions flexible (more reusable, more DRY) - see the closure sketch after this list:
parameters,
functional strategy pattern - part of the logic is injected as a variable, see: https://stackoverflow.com/a/30465042/1964707
decorators
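A minimal closure sketch - the inner function keeps access to the enclosing function's variable even after the outer function has returned:

def make_multiplier(factor):
    # 'factor' lives in the enclosing scope and is captured by the inner function
    def multiply(x):
        return x * factor
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)
print(double(5), triple(5))    # 10 15
print(double.__closure__)      # the cell objects holding the captured 'factor'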
Decorators
Python Crash Course
Decorators are usually used as metadata on other functions - @ sign + the name of
the decorator.
Decorators
Python Crash Course
We sometimes need to pass data to the decorator - when the function decorated
accepts arguments.
Note: decorators compatible with arguments are the default way of creating them!
Decorators
Python Crash Course
Practical examples: logging and timing
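A small logging-decorator sketch; the wrapper takes *args/**kwargs so it also works for decorated functions that accept arguments:

def logged(func):
    # *args/**kwargs let the wrapper forward any arguments to the real function
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__} with {args} {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result!r}")
        return result
    return wrapper

@logged
def add(a, b):
    return a + b

add(2, 3)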
Decorators
Python Crash Course
Multiple decorators - the way we implemented the decorators, we can't stack them all, or they become order dependent.
That is because the function's name is being reassigned while each decorator is applied.
To make it work we can use a helper decorator: functools wraps
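A minimal sketch of stacking two decorators while preserving the decorated function's identity with functools.wraps:

import functools
import time

def timed(func):
    @functools.wraps(func)                 # keeps func.__name__ / __doc__ intact
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

def logged(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
@timed
def slow_add(a, b):
    time.sleep(0.1)
    return a + b

print(slow_add(1, 2))
print(slow_add.__name__)   # 'slow_add' - without @wraps it would be 'wrapper'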
Decorators
Python Crash Course
Real world examples of decorators: https://realpython.com/primer-on-python-
decorators/#a-few-real-world-examples
Since decorators are mostly just functions, they can also accept arguments
themselves: https://stackoverflow.com/questions/5929107/decorators-with-parameters
Decorators
Python Crash Course
Classes can be used as decorators using the dunder __call__ method (aka:
Callables).
In the case of callables used as decorators, class names are often written in lowercase letters.
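A small sketch of a class used as a decorator via the dunder __call__ method (with the lowercase naming convention mentioned above); countcalls is a made-up example name:

class countcalls:
    """Class used as a decorator: the instance wraps the function and
    __call__ makes the instance itself callable."""
    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        print(f"{self.func.__name__} called {self.calls} time(s)")
        return self.func(*args, **kwargs)

@countcalls
def greet(name):
    return f"Hello, {name}!"

print(greet("Ada"))
print(greet("Linus"))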
Decorators
Python Crash Course
Useful decorators
There are many useful decorators, see: https://towardsdatascience.com/10-fabulous-
python-decorators-ab674a732871
We will learn about @lru_cache, @cache, @jit later in the course when we talk about
performance optimization.
We learned about @property, @classmethod, @<x>.setter/deleter and so on in the
previous lecture.
Most common ones: @property, @<x>.setter/deleter, @staticmethod, @classmethod,
@lru_cache, @dataclass
Now let’s take a look at one of the most useful and time saving decorators -
@dataclass - eliminates the need to define all the class syntax explicitly,
simplifies the writing of classes.
Decorators
Python Crash Course
Dataclasses are oriented towards classes that represent data (InventoryItem,
Person, Employee, Teacher).
Features of dataclasses (see screenshot)
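A minimal @dataclass sketch - the field list replaces the hand-written __init__, __repr__ and __eq__ (InventoryItem is just an illustrative name):

from dataclasses import dataclass, field

@dataclass
class InventoryItem:
    # __init__, __repr__ and __eq__ are generated automatically from these fields
    name: str
    unit_price: float
    quantity: int = 0
    tags: list[str] = field(default_factory=list)

    def total_cost(self) -> float:
        return self.unit_price * self.quantity

item = InventoryItem("widget", 2.5, quantity=4)
print(item)                # InventoryItem(name='widget', unit_price=2.5, quantity=4, tags=[])
print(item.total_cost())   # 10.0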
Decorators
Python Crash Course
Useful resources on decorators:
https://realpython.com/primer-on-python-decorators/ (contains useful examples and real world / framework usage)
https://towardsdatascience.com/using-class-decorators-in-python-2807ef52d273
(simple example)
https://www.geeksforgeeks.org/class-as-decorator-in-python/ (just some examples)
https://github.com/lord63/awesome-python-decorator (more examples)
A simple and understandable cache implementation for functions:
https://stackoverflow.com/a/115349/1964707
Decorators
Python Crash Course
A final theoretical point on Decorators.
Some say they are just “Pythonic way to implement the decorator design pattern”.
However opinions differ as can be seen in the wikipedia article on Decorator
Pattern:
Callables
Python Crash Course
The Call Operator. When we use or invoke a function in Python, we typically say
that we call the particular function. The “calling” is achieved by placing a pair
of parentheses following the function name, and some people refer to the
parentheses as the call operator - ()
Without the parentheses, the Python interpreter will just show the function as an object itself — the function doesn't get called. Let's see the nuance of using the function with and without the call operator. We can also test whether we have a callable using the callable() function:
Callables
Python Crash Course
In short:
In Python there are many things that are callable. Functions, methods, classes,
objects to name a few.
One common usecase for callables is decorator classes.
Read more about how to use them: https://stackoverflow.com/questions/3369640/when-
is-using-call-a-good-idea
Callables
Python Crash Course
For completeness, let me mention that not only objects can be decorators (when the classes they are derived from implement dunder __call__), but we can also decorate classes with decorators and extend their functionality just like we extended the functionality of functions. This however is a rare use case, commonly not advised because it is replaceable with inheritance:
https://stackoverflow.com/questions/681953/how-to-decorate-a-class
https://stackoverflow.com/questions/9906144/decorate-a-class-in-python-by-defining-
the-decorator-as-a-class
Abstract class
Python Crash Course
Abstract classes in OOP languages allow us to create a class that we can use for inheritance purposes, but cannot instantiate.
This is commonly described as “an abstract class is a template for other classes” - this definition, however, fails to capture the difference between an abstract class and an interface in other OOP languages, but for beginner programmers it might be sufficient.
In classical OOP we have classes, interfaces and abstract classes. Classes have concrete methods and data, interfaces - just abstract methods, abstract classes - at least one abstract method, and they can have concrete data and methods.
Python does not provide abstract classes (or interfaces) as a native language
construct, but we can use a package for that.
ABC - abstract base class
It is hard to find a unique case where an abstract class would be the most appropriate solution. The closest thing seems to be when you (1) need to inherit certain fields, (2) have some common interface that you need to keep in the abstract class, (3) and you don't have a reason to provide a default implementation, i.e. you must tell the concrete classes to provide their own implementation (no default).
A weaker argument for a design that includes abstract base classes is when a class is needed only for inheritance.
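A minimal ABC sketch matching the description above - concrete data in the abstract class, one abstract method with no default implementation (Shape/Circle are illustrative names):

from abc import ABC, abstractmethod

class Shape(ABC):                      # cannot be instantiated directly
    def __init__(self, name):          # concrete data shared by subclasses
        self.name = name

    @abstractmethod
    def area(self):                    # no default implementation on purpose
        ...

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2

# Shape("x") would raise TypeError; Circle works because it implements area()
print(Circle(2).area())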
Interface
Python Crash Course
An interface is a type that only declares abstract methods in classical OOP.
When you define a class you can declare that it implements a certain interface, and the concrete class then must implement the abstract methods that the interface declares - this provides a guarantee that objects of the concrete class can then be passed to all functions that accept instances compliant with that interface (and also that those objects can be added to collections that can then be iterated over and a method called on them, or passed to some method). In that case we have polymorphism enabled by interfaces, which is much more abstract.
Why is this better than simple inheritance-based polymorphism? Because interfaces allow us to be very abstract and not provide any implementation, just a requirement that the class must conform to a certain behavior (provide certain methods).
Another thing that Python does not offer natively.
It is common to simulate Interface functionality by simply having an ABC with only
abstract methods - this way we can simulate interfaces in Python!
… also, there is a python package for even more realistic interfaces:
https://stackoverflow.com/questions/2124190/how-do-i-implement-interfaces-in-python
, more concretely: https://stackoverflow.com/a/52960955/1964707
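A small sketch of the "interface as an ABC with only abstract methods" idea; Serializer and JsonSerializer are illustrative names:

from abc import ABC, abstractmethod
import json

class Serializer(ABC):                 # only abstract methods -> behaves like an interface
    @abstractmethod
    def serialize(self, obj) -> str:
        ...

class JsonSerializer(Serializer):
    def serialize(self, obj) -> str:
        return json.dumps(obj)

def save(obj, serializer: Serializer):
    # any class conforming to the "interface" can be passed in - polymorphism
    print(serializer.serialize(obj))

save({"a": 1}, JsonSerializer())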
Course plan
You can get familiar with it using this link
Additional information
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
The benefit is gaining a higher quality image where one never existed or has been lost; this could be beneficial in many areas or even life saving in medical applications (CAT, Roentgen (x-ray), MRI scans can be enhanced), or CCTV quality enhancement for suspect identification. Another use case is compression for transfer between computer networks. Imagine if you only had to send a 256x256 pixel image where a 1024x1024 pixel image is needed; correspondingly maybe we save storage (1TB -> 500GB). REMEMBER: we do not talk about compression / decompression algorithms here.
Introduction
Advanced Computer Vision
Ref: https://en.wikipedia.org/wiki/Data_processing_inequality
Data processing inequality
Advanced Computer Vision
We can differentiate the following historical methods for SR:
Classical methods: nearest neighbour interpolation, bilinear, bicubic.
ML: SVR based: http://citeseerx.ist.psu.edu/viewdoc/download?
doi=10.1.1.367.2012&rep=rep1&type=pdf
(SR)CNN: does not dream up details (source?): https://arxiv.org/pdf/1501.00092.pdf
(SR)GAN: does dream up details, gives nice looking images:
https://arxiv.org/pdf/1609.04802.pdf
SOTA models: https://paperswithcode.com/sota/image-super-resolution-on-set14-4x-
upscaling
Another interesting distinction is single-frame vs. multi-frame super-resolution algorithms (used in mobile phones, for example).
We see a tradeoff between precision (CNN) and details that can be dreamed up (GAN)
Historical summary
Advanced Computer Vision
We can perform simple upsampling if we want our image to be larger without losing too much quality. This can be done with OpenCV and is one of the most popular ways to do it with code. See: https://chadrick-kwag.net/cv2-resize-interpolation-methods/
Interpolation methods:
INTER_NEAREST – a nearest-neighbor interpolation
INTER_LINEAR – a bilinear interpolation (used by default)
INTER_AREA – resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC – a bicubic interpolation over 4×4 pixel neighborhood
INTER_LANCZOS4 – a Lanczos interpolation over 8×8 pixel neighborhood
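A small OpenCV sketch comparing two of the interpolation methods above; input.jpg is a placeholder path:

import cv2

img = cv2.imread("input.jpg")     # placeholder input image path
h, w = img.shape[:2]

# Upscale 4x with two different interpolation methods and compare them visually.
nearest = cv2.resize(img, (w * 4, h * 4), interpolation=cv2.INTER_NEAREST)
cubic = cv2.resize(img, (w * 4, h * 4), interpolation=cv2.INTER_CUBIC)

cv2.imwrite("upscaled_nearest.jpg", nearest)
cv2.imwrite("upscaled_cubic.jpg", cubic)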
Upscaling
Advanced Computer Vision
BSDS500
DIV2K (high res: https://www.tensorflow.org/datasets/catalog/div2k )
Datasets
Advanced Computer Vision
Ref: https://keras.io/examples/vision/super_resolution_sub_pixel/
Interesting parts:
Why is BICUBIC used for the Cr and Cb channels, while model.predict() is used for the Y channel?
Why is the model trained on a single channel not on all 3? Performance?
Would it make the model worse if we used RGB rather than YUV?
SRCNN / ESPCNN
Advanced Computer Vision
Architecture: https://pyimagesearch.com/2022/06/13/enhanced-super-resolution-
generative-adversarial-networks-esrgan/ (note how generator generates SR images and
discriminator tries to distinguish them).
Ref: https://www.tensorflow.org/hub/tutorials/image_enhancing
SRGAN
Advanced Computer Vision
Ref: https://www.mathworks.com/help/images/image-quality-metrics.html
Measuring image quality objectively
Advanced Computer Vision
Create Nearest-neighbor interpolation SR from scratch.
There are many more approaches to image superresolution, explore them.
One of them is this tool: https://github.com/fperazzi/proSR
How would you go about creating a web tool that would allow users to increase the size of an uploaded image? Would an SRCNN approach be feasible (a CNN does not have arbitrary output dimensionality, maybe we would offer just some subset of standard scales: 640x480, 1024x768, 1280x1024)? This discloses an interesting advantage that classical upscaling techniques have. Maybe we would use a mixed approach: use an SRGAN for initial upscaling that is bigger than the user requested, then downscale it with classical techniques to get the resolution the user wants, see: https://github.com/thekevinscott/UpscalerJS
Further explorations
Advanced Computer Vision
Try some of the available models or online SR tools on your own images.
… try digitizing and improving images from old photographs / video tapes.
… check if you can recover all the details from images saved in google photos (after downscaling) (hint: you would probably need a way to measure the quality objectively in case you can't see improvements subjectively).
Further explorations
Advanced Computer Vision
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
Using a debugger
Generators
01
02
03
Lambda
Python Crash Course
00
Functions
Basic file operations
04
Functions
Python Crash Course
Functions are named blocks of code that can be reused when needed (DRY).
Functions can accept parameters and can return a result with the return keyword.
By how they return a result, functions are often classified into:
Pure functions - functions that have no side effects and only return a value based on the parameters passed in.
Functions with side effects - functions that may or may not return a value, but necessarily perform an action "outside of memory" (e.g. print text to the screen or to a file / database, make a network request) or modify a global variable, etc.
Pure functions are easy to test, parallelize and understand, and they are also valued in the functional programming ecosystem.
Practical guideline: as many pure functions as possible, while all impure functions are separated and we try to concentrate all side effects in a few functions or even in the global script scope (e.g. to compute the average length of today's Delfi headlines: get_raw_html(url) → extract_titles(raw_html) → get_avg_titles([titles]) → print(avg)) - see the sketch at the end of this slide.
By how they accept parameters, functions can be classified into:
Functions with parameters.
Functions without parameters.
Functions with default parameters (important concepts: positional binding, named arguments).
By who writes them, we can distinguish:
Built-in functions - those that already exist in the Python language once we install the interpreter (print(), len(), sum(), .append()) or come with libraries.
User-defined - the functions we create ourselves.
More: https://www.w3schools.com/python/python_functions.asp
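A tiny sketch of the pure vs. side-effect distinction, in the spirit of the headline-length example above (the function names are made up):

# Pure: the result depends only on the arguments, no side effects.
def average_title_length(titles):
    return sum(len(t) for t in titles) / len(titles)

# Impure: printing is an action "outside of memory" - a side effect.
def report_average_title_length(titles):
    print(f"Average title length: {average_title_length(titles):.1f}")

report_average_title_length(["Headline one", "Another headline today"])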
Functions
Python Crash Course
It is very important to understand the difference between a function declaration and a function call. A function that is only declared but never called will never run.
Built-in functions we practically always only call, while user-defined functions we both declare and call.
A function can have parameters, specified in parentheses.
Schema of a function declaration:
These functions are not evil - every useful program has more than one of them. However, when building an application, it pays to follow the rule: as many functions without side effects as possible, and keep all side effects in a small number of functions (concentration). E.g. the Delfi headlines example on the previous slide.
Functions
Python Crash Course
Return:
An interesting detail of Python is that a function always returns something - even if there is no return keyword, None is returned.
Also, unlike many languages, Python allows multiple return values.
The multiple return mechanism relies on tuple unpacking: a, b, *c = (1, 2, 3, 4)
return terminates the execution of the function - regardless of whether the return statement is used inside a loop, inside a conditional, etc.
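A small sketch of multiple return and the tuple unpacking it relies on:

def min_max(numbers):
    # "multiple return" is really returning one tuple...
    return min(numbers), max(numbers)

lo, hi = min_max([4, 1, 7, 3])   # ...which the caller unpacks
print(lo, hi)                    # 1 7

a, b, *rest = (1, 2, 3, 4)       # the same unpacking mechanism
print(a, b, rest)                # 1 2 [3, 4]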
Functions
Python Crash Course
Functions can call other functions and in that way delegate part of the work.
This is called functional decomposition of a program - the initial form of a program can be just one giant source code file. In it we can spot certain pieces of code that repeat - such pieces can be replaced with a function call. The called function will contain the same logic that was extracted from the file used in our thought experiment.
And then we can call yet other functions from within a function, modularizing the code even further.
def test(a):
    a += 1            # rebinds the local name only - the caller's int is not changed
    print(a)

a = 1
print(a)    # 1
test(a)     # 2
print(a)    # still 1 - integers are immutable

def test2(list):
    list.append('a')  # mutates the very same list object the caller passed in
    print(list)

list = ['a']
print(list)     # ['a']
test2(list)     # ['a', 'a']
print(list)     # ['a', 'a'] - the change is visible to the caller
Using the debugger
Python Crash Course
After learning how functions, loops and conditionals (if, else) work, we can effectively learn how to use a debugger in the IDE.
Concepts:
Debugger - a tool to see what is happening in the program line-by-line.
Alternative: print()
Breakpoint - a point in the execution of the program that, when hit, will stop the execution and engage the debugger.
Execution flow - how the program is executed (with all the loops, jumps to functions and all that).
DEMO: grouping exercise with dicts (jumping inside loops and conditionals): step over
DEMO: refactor the same example with functions: step into, resume with multiple breakpoints.
DEMO: how to change variables during the execution of the debugging session.
NOTE: Built-in functions written in C are not debuggable (you can't step into them with the pydev debugger). There is also advanced debugging when the CPython interpreter is launched with breakpoints to see the internals of how the interpreter works - mixed-mode debugging:
https://learn.microsoft.com/en-us/visualstudio/python/debugging-mixed-mode-c-cpp-
python-in-visual-studio?view=vs-2022
Lambda
Python Crash Course
Anonymous functions in Python are called lambda functions, or simply lambdas.
They have no def or return keywords; they can have default parameters, though that is rarely used.
E.g.: sum = lambda x, y : x + y
Often used when you need to pass a callable object as a function argument, but there is no point in creating a named function because the logic will be used only once.
The already familiar sort() function. E.g.: let's sort a list that contains other lists or tuples by the second value (see the sketch at the end of this slide):
Also used when sorting lists of objects by an attribute (we will touch on this once we are acquainted with object-oriented programming).
Ref: https://www.w3schools.com/python/python_lambda.asp
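A small sketch of the sorting example above, with a lambda as the key callable:

pairs = [("apple", 3), ("pear", 1), ("plum", 2)]

# Sort by the second value of each tuple using a lambda as the key callable.
pairs.sort(key=lambda item: item[1])
print(pairs)    # [('pear', 1), ('plum', 2), ('apple', 3)]

# sorted() does the same but returns a new list instead of sorting in place.
print(sorted(pairs, key=lambda item: item[0]))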
Lambda
Python Crash Course
Like most modern programming languages, Python is multi-paradigm and lets us write procedural code (plain statements), object-oriented code (we have classes and objects), as well as functional code. Functional programming principles are present, such as: the ability to pass functions into other functions as variables (functions as variables, functions as first-class citizens), a preference for immutable data structures and for recursion instead of iteration, and the use of declarative (regex, SQL) code. The map, filter and reduce functions are an example of the last principle:
Map - we apply an operation to each element of a collection, ref: https://www.w3schools.com/python/ref_func_map.asp . Map can accept multiple iterables, and then the function passed to map must accept the same number of arguments. The map function is lazily evaluated - meaning we will not get the result until we call next() or list() on the result it returns. It returns as many items as it receives: [1...n] → [1..n]. The map function can be used for data generation, but it is better to do that with comprehensions. A better name for it would be transform.
Filter - we apply a condition to each member of the sequence and return only the members that satisfy the condition. From many items filter returns as many, fewer, or none: [1...n] → [1..n] | [1..a] where a < n | [ ]. Ref: https://www.w3schools.com/python/ref_f...
Lambda
Python Crash Course
Combining map with reduce:
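A minimal sketch combining map, filter and reduce (reduce lives in functools); the numbers are arbitrary:

from functools import reduce

numbers = [1, 2, 3, 4, 5]

squares = map(lambda x: x ** 2, numbers)          # lazy: nothing computed yet
evens = filter(lambda x: x % 2 == 0, squares)     # still lazy
total = reduce(lambda acc, x: acc + x, evens, 0)  # forces evaluation: 4 + 16
print(total)    # 20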
Type hints
Python Crash Course
Python 3.5 (2015) added the type hinting mechanism.
It helps to understand code more easily, types appear in function previews in the IDE, and it can also be used with a static type checker (a tool) such as mypy.
DEMO:
pip install mypy
mypy type_hints.py
result:
Note: type hints do not affect how the program runs at runtime (at least as long as the CPython interpreter ignores them), so you will be able to run the program above without problems.
Hints allow us to detect problems earlier than at runtime:
“Compile” time error (error before launch)
Runtime error (error after launch)
No error but real bug still present
Types. Since Python 3.9 you no longer need the typing module for collection types.
Ref: https://docs.python.org/3/library/typing.html
As of version 3.12, type hints are not mandatory and can't be made mandatory, ref: https://stackoverflow.com/a/63838550/1964707
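A tiny type-hints sketch; running mypy on it would flag the last call, while CPython executes it without complaint - which is exactly the point about hints being ignored at runtime (the function and values are made up):

def hourly_to_monthly(rate: float, hours_per_month: int = 160) -> float:
    return rate * hours_per_month

salaries: list[float] = [hourly_to_monthly(30.0), hourly_to_monthly(45.5)]
print(salaries)

# mypy flags the next call as a type error; CPython still runs it and silently
# produces "3030...30" (string repetition) instead of a number.
print(hourly_to_monthly("30"))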
Type hints
Python Crash Course
PyCharm type check severity configuration:
Note: PyCharm option names and visual appearance may differ - the screenshot is only an approximate illustration.
Generators
Generators are functions that allow generating the items of an iterable one after another.
They use lazy evaluation - until the calling code requests a value, it is not generated.
For this reason, the memory to hold the value is not allocated up front, but only when it is needed.
Infinite sequences can be modeled without using much memory, so generators are used for optimization.
The yield keyword is used.
All the values a generator would return can be extracted simply by passing the generator to the list()/tuple()/etc. constructor (... keep in mind this removes the performance advantages of using generators, and if the generator is infinite, we will exhaust all memory). Terminating a generator / generator termination / consuming a generator.
Student question: can we know in advance how many values a generator will generate? We can - from the semantics of the code.
Student question: can a generator be reset? Theoretically yes, but it is not advisable. It is better to simply create a new generator (you can keep a copy of it if you know you will need one), because it is a cheap object, ref: https://www.quora.com/If-a-python-generator-is-exhausted-is-it-possible-to-get-to-the-first-value-again
A case that can help in understanding generators: we have a function that generates numbers or names/words etc., and we use the values it returns in a loop that may run from 0 to thousands of times. A generator could help performance here - it would generate only as many values as the number of times it was called.
We note that the map/filter functions return lazy iterator objects for the optimizations they offer. Because map/filter/reduce touches on various topics - functions, lazy evaluation, generators (although map returns an iterator), functional programming (functions as variables), the functional strategy pattern, lambda expressions, method chaining (which Python does not allow with map, filter, reduce) (7 topics at least) - with one simple example, we can branch out to all of these topics in a job interview. There are a few of these cases in programming where multiple topics meet each other in a very simple example.
Chained generators create a pipeline that never exceeds a certain amount of memory
(never requires a huge allocation of memory) - Raspberry pi.
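A small generator sketch with yield, lazy consumption and a chained generator-expression pipeline:

def countdown(n):
    # each yield pauses the function; the next value is produced only on demand
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))    # 3
print(next(gen))    # 2
print(list(gen))    # [1] - consuming the rest exhausts the generator

# Chained lazy pipeline: memory usage stays constant even for huge ranges.
squares = (x * x for x in countdown(1_000_000))
print(next(x for x in squares if x % 2 == 0))   # 1000000000000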
Python Crash Course
Generator Expressions
Another way to instantiate a generator.
Syntax: (num ** 2 for num in range(10))
Python Crash Course
Basic file operations
Python Crash Course
To learn the basics and fundamental concepts of Python, a console and console programs are often enough (the most you can build right now: an arithmetic learning game, a language learning game).
However, useful programs often require a graphical interface, file processing, communication with a database or with an external system (service).
The simplest way to open a file is the open() function.
You need to pass it an option (also called a flag) which says whether you want to read (r), write (w), read+write (r+) or append data to the end of the file (append, a). We can also create a new file with the 'x' option, but note that a new file will also be created, if one does not exist, when passing the 'w' option.
Ref: https://www.w3schools.com/python/python_file_handling.asp and https://mkyong.com/python/python-difference-between-r-w-and-a-in-open/
If we want to work with binary files, the flags are wb and rb.
We can also use the so-called context manager expression, with with - this is the recommended way, because it automatically closes the file once the with block is exited or when an error occurs. We will talk about context managers in a dedicated lecture. A short sketch follows below.
Perhaps some students already know that I/O operations are expensive compared to in-memory operations - when a program reads a file, the interpreter calls into the operating system with a so-called syscall, and the operating system, using I/O drivers, returns the information from disk. We can see this in the next slide.
Examples of I/O operations: writing to files, accessing a network resource, user input (input()).
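A minimal sketch of the w / a / r flags together with the with context manager; example.txt is a placeholder file name:

# Write then read a text file; 'with' closes the file even if an error occurs.
with open("example.txt", "w", encoding="utf-8") as f:
    f.write("first line\n")

with open("example.txt", "a", encoding="utf-8") as f:   # append mode
    f.write("second line\n")

with open("example.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.rstrip())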
id,name,hourly_rate,age
1,John,30,50
2,Peter,70
id,name,hourly_rate,age
1,John,30,50
2,Peter,,70
When a field is completely missing and you do not validate the CSV or check for that, the hourly_rate and age fields can easily get mixed up.
Imagine if someone gave you the code that parses this file (id,name,hourly_rate,age) and this bug is hidden. Would it be easy to find?
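A small sketch of how silently this hides, using csv.DictReader on the malformed row above (the CSV text is inlined via StringIO just for the demo):

import csv
from io import StringIO

broken = "id,name,hourly_rate,age\n1,John,30,50\n2,Peter,70\n"

for row in csv.DictReader(StringIO(broken)):
    # In the malformed row, 70 lands in hourly_rate and age becomes None -
    # the values silently shift instead of raising an error.
    print(row["name"], row["hourly_rate"], row["age"])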
Course plan
You can get familiar with it using this link
Additional information
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
JSON
01
02
REST APIs
Python Crash Course
00
Serialization
03
GraphQL APIs
04
Pickle
05
XML
06
SOAP
Serialization
Python Crash Course
Having worked with files, exceptions and context managers we now turn to
serialization.
Serialization is the process of converting structured data (e.g.: Python objects or
lists) to a format that allows sharing/transmission or storage of the data in a
form that allows recovery of its original structure. Object in memory → string
(json, xml) / binary (pickle)
This means that we have an object in the program's memory → we serialize it, i.e. turn it into some storable / transmittable representation → store it or transmit it → then, when we want to recover the data back, we need to deserialize it. So deserialization is the reverse process.
In some cases, the secondary intention of data serialization is to minimize the
data’s size which then reduces disk space or bandwidth requirements (compression -
json vs. xml).
Common serialization forms:
Serializing to Json
Serializing to XML
Serializing to binary file format / proprietary format.
… less popular options: yaml / yml, serializing to a string
Python has several built-in modules for serializing objects: marshal, json, pickle, the xml package
Serialization
Python Crash Course
Marshal
The oldest of the three serialization modules. It's primarily used to read and write compiled bytecode from Python modules. If you've ever seen .pyc files pop up in your working directory when importing modules (in order to create a __pycache__ directory with .pyc files you need to create a module), that's marshal working behind the scenes.
The biggest takeaway is not to use marshal. It's mainly used by the interpreter itself and can have breaking changes that would mess up your code. Also remember the terms marshaling / unmarshaling.
Json
json is the newest of the three serialization modules. It produces standard JSON
output.
That means that it’s human-readable and it works very well with other languages
that have ways of parsing JSON files.
An issue with json is that it only works with certain data types.
You should use this if you want to serialize data into a human readable format or
when the client calling your code supports json. It is very popular with REST APIs
and in general when interoperability between python and other languages is
necessary.
Pickle
Pickle serializes data in a binary format, which means that it’s not human-
readable.
A benefit of pickle is that it works out of the box with many Python data types, including custom ones that you define, and it's very fast.
You should use this if you want to serialize Python objects for storage and don't need them to be human-readable.
It is quite popular in the world of Python, but you are a bit tied to Python, although there are libraries that can read pickled Python objects in other languages: http://formats.kaitai.io/python_pickle/csharp.html
Used in ML/DL to pickle models for distribution.
Other options:
There are plenty of other serialization options: protocol buffers, serialization to a string with repr, and so on.
Ref: https://docs.python-guide.org/scenarios/serialization/ - for more extended
list.
Serialization
Python Crash Course
Serialization vs. simple writing to a file can be distinguished by:
noting that serialization saves the entire object state (usually - so we think in terms of objects being saved, not lines written to a file), and
the intention of the former is to recreate the object from the serialized state (unpickle back into some Python object) - deserialization.
Summary of which type to use when:
JSON
Python Crash Course
Definition, RFC: https://www.rfc-editor.org/rfc/rfc7159.txt
JSON uses k:v pairs and supports primitive types (numbers, strings, booleans, null) as well as complex types like arrays [ ] and objects / dicts { }. They can also be nested.
At the root of a JSON document there must be a single element - you can't leave two objects dangling.
Ints/floats can't be keys: { 0: 0, 1: 1 } is not valid JSON, keys must be strings.
Another peculiarity: comments are not supported.
Comparing json to other formats like XML - less verbose (see next slide).
How is Json used in the real world?
Imagine our client exposes an API (a REST API) where we can get information about
the events their company organizes.
This data is retrieved as JSON, but we will want to modify / transform / sort / filter it using Python, and this is where we will need to deserialize it into some workable form (Python dicts/lists or even application objects).
We use the json module to work with JSON in Python. Two methods serialize:
dump() → serialize to a file
dumps() → serialize to a string
(and the corresponding load() / loads() deserialize from a file / string).
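A short sketch of both methods plus their load()/loads() counterparts (the file name is arbitrary):

import json

events = [{"id": 1, "name": "Hackathon"}, {"id": 2, "name": "Meetup"}]

with open("events.json", "w", encoding="utf-8") as f:
    json.dump(events, f)                    # dump()  -> serialize to a file

as_text = json.dumps(events, indent=2)      # dumps() -> serialize to a string

with open("events.json", encoding="utf-8") as f:
    loaded = json.load(f)                   # load()  -> deserialize from a file

assert loaded == json.loads(as_text) == events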
JSON
Python Crash Course
Comparing json to other formats:
Note: JSON and YAML are ~40% “slimmer” than XML, but they still duplicate field names (unlike SQL / relational tables / CSV). Field-name duplication allows certain fields to be skipped - an object is self-sufficient even if a field is missing compared to other objects of the same / similar kind. In SQL the representation is more compact, but also more rigid (and relations are represented differently). NoSQL document stores exploit this property of the JSON representation, allowing structureless / schemaless storage of data.
JSON
Python Crash Course
Comments are absent in JSON - probably its biggest disadvantage. It is also sensitive to dangling (trailing) commas, but whitespace-insensitive. It is probably the best way to represent “hierarchical data”.
Sometimes config files are written in json, where comments are extremely important.
XML - disadvantage is its verbosity.
YML - whitespace sensitivity (we claimed this to be an advantage when talking about
Python (for avoiding “style debates”), but this prevents YML from being used
effectively for configuration files (unless proper tooling is used)).
JSON
Python Crash Course
JSON is not what is known as a framed format (has a single root): this means it is
not possible to call dump more than once in order to serialize multiple objects
into the same file, nor later call load more than once to deserialize the objects,
as would be possible with pickle (or csv). Same applies to XML.
So updating { departments: [ .. ] } is more complicated - you cannot just append a last line, because that would produce invalid JSON. That means it is much more comfortable to deserialize the JSON into (Python) objects and then append the data using Python APIs for those objects, instead of treating the JSON as a string.
So, technically, JSON serializes just one object per file. However, you can make
that one object be a list, or dict, which in turn can contain as many items as you
wish.
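A sketch of the read-modify-write approach described above (departments.json is a hypothetical file with a { "departments": [...] } root):

import json

with open("departments.json", encoding="utf-8") as f:
    doc = json.load(f)                                  # the whole document as a Python dict

doc["departments"].append({"id": 7, "name": "R&D"})     # modify via normal Python APIs

with open("departments.json", "w", encoding="utf-8") as f:
    json.dump(doc, f, indent=2)                         # re-serialize the single root object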
JSON
Python Crash Course
We would probably want to deserialize json into application objects / models right
away to have the ability to use the associated methods.
This can be accomplished several ways:
https://stackoverflow.com/questions/15476983/deserialize-a-json-string-to-an-
object-in-python
A popular one is to use object_hook with the loads() function. Ref:
https://realpython.com/python-json/#encoding-and-decoding-custom-python-objects
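A minimal sketch of the object_hook approach (the Event class and field names are made up for illustration):

import json

class Event:
    def __init__(self, id, name):
        self.id, self.name = id, name

def as_event(d):
    # called for every decoded JSON object; turn dicts that look like events into Event objects
    if "id" in d and "name" in d:
        return Event(d["id"], d["name"])
    return d

events = json.loads('[{"id": 1, "name": "Hackathon"}, {"id": 2, "name": "Meetup"}]',
                    object_hook=as_event)
print(events[0].name)    # "Hackathon" - a real object with attributes, not a dict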
If you ever see errors when deserializing JSON, you can validate that JSON using online validators (or offline ones). These tools can also add quotes and format the JSON in case it is invalid (try it with: { id: 5 } ).
JSON
Python Crash Course
You can obtain json (for testing and development) in various ways:
Gen: https://www.json-generator.com/ and https://extendsclass.com/json-
generator.html
Repos: https://github.com/jdorfman/awesome-json-datasets … and more
JSON
Python Crash Course
A short intro to data modeling with JSON
How do we represent data most effectively in popular data structures like lists, dictionaries and so on?
(Data structures as an algorithms topic is not the same field. TBD)
Let's model hospital admissions
You can represent relationships with JSON
Your representation can be normalized (single source of truth) or denormalized
How would you model Code Academy data: Courses, Lecturers, Students?
How would you model a flight company's data: Flight, Plane, Customer?
JSON
Python Crash Course
Hospital admissions with this data: [{ jonas, jonaitis, male }, …]
would make it easy to calculate:
how many people are currently sick
how many males vs. females are sick
but would not support:
how many people are repeated admissions (have we seen this person already)
what is the average between repeated admissions
are there any correlations between repeated admission and inherent properties of
the patient
what is the average duration of successful and negative health outcomes
Hospital admissions with this data: [{ jonas, jonaitis, male, 01-01, 01-15},
{ jonas, jonaitis, male, 03-22, 03-27}, …]
would make it easy to calculate:
all of the above
how many people are currently sick, fairly well (though not as conveniently as deleting admitted patients from the list when they leave).
average stay time for all patients (globally)
Hospital admissions with this data: [{ jonas, jonaitis, male, admissions: [ [01-01,
01-15], [03-22, 03-27] }, …]
would make it easy to calculate person-specific information (average stay time for jonas),
average stay time for all people (which would tell us whether a particular patient is an outlier or not),
but would not support: efficient global calculations on admissions (it would be a bit harder to compute the global average stay time), how many admissions there were today (or within some particular date interval).
JSON
Python Crash Course
Combinations - the main idea is to have the same information, but arrange it into
different forms. That is the primary concern of data modeling:
[{ jonas, jonaitis, male, 2022-01-01, 2022-01-15}, { jonas, jonaitis, male, 2022-
03-22, 2022-03-27}, …]
[{ jonas, jonaitis, male, admissions: [ [2022-01-01, 2022-01-15], [2022-03-22,
2022-03-27] }, …]
{ 2022-01-07: [{ jonas, jonaitis, male}, { petras, petraitis, male }, …], 2022-01-
08: [{ anelė, aneliutė, female }, …], …}
{ male: [{}, {}, …], female: [{}, {}, …] }
Thus far we only asked about the supported operations and their difficulty (the algorithmic side). Which of these representations would be the most memory efficient? Compare:
[{ jonas, jonaitis, male, 2022-01-01, 2022-01-15}, { jonas, jonaitis, male, 2022-
03-22, 2022-03-27}, …]
[{ jonas, jonaitis, male, admissions: [ [2022-01-01, 2022-01-15], [2022-03-22,
2022-03-27] }, …]
Relational vs. hierarchical: We said that one structure better supports global operations, and another supports operations on an individual patient. Can we efficiently support both, and not lose too much memory? Yes, we can store the information in a relational fashion:
[
{ 1, jonas, jonaitis, male },
{ 2, anelė, aneliutė, female },
{ 3, petras, petraitis, male }
]
[
{ 2022-01-01, 2022-01-15, id: 1 },
{ 2022-03-22, 2022-03-27, id: 2 },
{ 2022-01-01, 2022-01-15, id: 1 }
]
Pydantic
Python Crash Course
Dataclasses with more power (including json serialization/deserialization).
Ref: https://www.youtube.com/watch?v=XIdQ6gO3Anc
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    surname: str

class Employee(Person):
    badge_id: int
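A quick usage sketch (this simple case behaves the same in pydantic v1 and v2):

emp = Employee(name="Jonas", surname="Jonaitis", badge_id="7")   # "7" is coerced to an int
print(emp.badge_id + 1)                                          # 8 - typed attributes, not a raw dict

try:
    Employee(name="Jonas", surname="Jonaitis", badge_id="not-a-number")
except Exception as err:                                         # pydantic raises a ValidationError
    print(err)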
REST APIs
Python Crash Course
To work with WEB APIs we need to first understand the client-server paradigm and
the HTTP protocol:
Server: software that listens for / accepts requests & sends back response
(overloaded term, also means physical server and it is often drawn as a physical
server: apache, nginx, h2o, Tomcat/Jetty, Kestrel, Flask / Django, email / ftp /
database servers).
Client: software that sends a request / initiates the interaction and receives and
handles the response (also not a physical box, but software like: browser, chrome,
ff, edge, python libraries (urllib, requests)).
Client-server paradigm is often juxtaposed to P2P communication (Torrent, webRTC).
HTTP versions: 0.9, 1.0, 1.1, … (a long gap) … 2, 3
HTTP request / response - body / headers (try any website with curl and --trace-ascii - )
HTTP req. methods / verbs (GET / POST + DELETE, PUT, PATCH, OPTIONS, etc.)
URL structure: http(s)://app1.de.myapp.com/some/path?a=1&b=2#some-fragment
REST APIs
Python Crash Course
REST API
A type of web api used to get and manipulate data on the server by clients
REST stands for Representational State Transfer; it is an (almost) standard style of API.
It uses features like:
URL tunneling (to represent resources): /api/books ; /api/books/1 ;
/api/books/1/authors ; /v2/posts - hierarchical like json
HTTP verbs (to indicate actions on the resources):
GET (read),
POST (create),
DELETE (delete),
PUT (update full),
PATCH (partial update), see: https://www.baeldung.com/spring-rest-json-patch#json-
patch
Response Codes indicate the result of the operation: (200, 201, 204, etc.) - see
next slides.
JSON as a message format (usually, but not always)
Hypermedia to describe itself and aid discovery (HATEOAS).
An example: https://blog.mindaugas.cf/wp-json/wp/v2/posts ;
https://www.makalius.lt/wp-json/wp/v2/posts
More on that: https://towardsdatascience.com/api-guide-for-data-scientists-
e373f997ed61
REST APIs
Python Crash Course
REST Api table summarizing responses and requests.
Note: you need to know how to send queries to REST APIs both how to get data and
how to send data to them.
REST APIs
Python Crash Course
REST response codes:
REST APIs
Python Crash Course
What HTTP method and what URL would you use to:
Obtain information about all employees?
Obtain information about the manager of employee with id 5?
Obtain information about the manager with id 5, will it be the same information as
in the previous request?
Create a new piece of information about an author “Jack Back”?
If you find this difficult, just use Postman (or Insomnia, or the PyCharm HTTP Client plugin)
REST APIs
Python Crash Course
It would be nice to learn how to authenticate to REST APIs, as many APIs require authenticated access. For this we can use hai-server:
npm install -g hai-server
hai-server --watch db.json --auth auth.json
Auth data on the bottom right.
curl http://localhost:3000/auth/login -X POST -d '{"email":"x","password":"x"}' -s -H 'Content-Type: application/json'
curl "http://localhost:3000/auth/login" -X POST --data
"{ \"email\": \"[email protected]\", \"password\":\"mindas\" }" -H "content-type:
application/json"
curl http://localhost:3000/comments -s -H 'Authorization: Bearer XXX'
------------ db.json -----------
{
"products": [
{ "id": 1, "title": "Shoes", "count": 150, "price": 555.9 },
{ "id": 2, "title": "Dress", "count": 300, "price": 99.99 },
{ "id": 3, "title": "Pants", "count": 99, "price": 66.99 },
{ "id": 4, "title": "Pants", "count": 185, "price": 88.99 }
],
"comments": [
{ "id": 1, "text": "Labas pasauli!", "productId": 3 },
{ "id": 2, "text": "Sudie pasauli!", "productId": 4 }
]
}
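The same flow with the requests library - a sketch that assumes hai-server is running locally with the credentials shown above and that the login response returns the token under a "token" key (check the actual response body):

import requests

resp = requests.post("http://localhost:3000/auth/login",
                     json={"email": "[email protected]", "password": "mindas"})
token = resp.json()["token"]        # assumption: adjust to the real response shape

comments = requests.get("http://localhost:3000/comments",
                        headers={"Authorization": f"Bearer {token}"})
print(comments.json())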
REST APIs
Python Crash Course
HTML is made up of HTML elements, which are made up of tags, which can contain content and attributes. Blogs that have an API sometimes return HTML from their API, and if they do, the HTML might be “escaped” HTML. Escaping HTML is needed when we want to display HTML metacharacters (like “<” and “>”) as visible text in our HTML page: for example, to display the literal text <p>, it has to be written with entities as &lt;p&gt;. Cleaning this up is called “unescaping” (a word you can google tools for) and can be done like so: https://stackoverflow.com/questions/... . See more on HTML escaping:
How to query REST APIs with Python is described everywhere. We will use the requests library, but there are many others (note: the internet is a dumpster of old URLs, broken APIs and stale examples; these examples might become outdated too at some point):
Ref: https://www.geeksforgeeks.org/g...
Ref: https://www.nylas.com/blog/use...
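A small sketch combining both ideas - fetching JSON from one of the WordPress REST endpoints mentioned above and unescaping the HTML-escaped titles (the endpoint may of course be offline by the time you try):

import html
import requests

resp = requests.get("https://www.makalius.lt/wp-json/wp/v2/posts", timeout=10)
resp.raise_for_status()

for post in resp.json():
    title = post["title"]["rendered"]   # WordPress returns rendered (possibly escaped) HTML here
    print(html.unescape(title))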
<!DOCTYPE html>
<html>
<head>
<title>My awesome page!</title>
</head>
<body>
<h1>This is my poem</h1>
<p>Roses are red, violets are blue, I’m not dumb, not sure about
you.</p>
</body>
</html>
REST APIs
Python Crash Course
Here is a list of APIs you can use for your data science or general programming
projects (and of course you can find much more on the internet):
Google: https://www.creativebloq.com/features/google-apis
Twitter: https://developer.twitter.com/en/docs/tweets/post-and-engage/overview
SpaceX: https://api.spacexdata.com/v5/launches
More: https://www.springboard.com/library/data-science/top-apis-for-data-
scientists/
Still more: https://public-apis.io/
And then some: https://github.com/public-apis/public-apis
… enough already: https://rapidapi.com/collection/list-of-free-apis
REST APIs
Python Crash Course
Google Translate API:
REST APIs
Python Crash Course
You will need to add billing details, but you will also get $300 of free credit.
If by any chance you get billed - contact support and they will refund you.
After entering debit / cc data you get this dialog:
REST APIs
Python Crash Course
After enabling the access to GCP you will need to set up the translation API,
install the package with PIP, export credentials as environment variables and so
on, see: https://cloud.google.com/translate/docs/setup
Then use this as a quickstart guide:
https://cloud.google.com/translate/docs/basic/quickstart#translate_translate_text-
python
REST APIs
Python Crash Course
After installation
REST APIs
Python Crash Course
Create the project folder and files (if not done before)
set GOOGLE_APPLICATION_CREDENTIALS=translate-key.json (use set on Windows instead of export)
from google.cloud import translate

# the client and parent were implied on the slide; a typical setup looks like this
client = translate.TranslationServiceClient()
project_id = "your-gcp-project-id"   # replace with your own project id
parent = f"projects/{project_id}/locations/global"
text = "Hello, world!"

response = client.translate_text(
    request={
        "parent": parent,
        "contents": [text],
        "mime_type": "text/plain",
        "source_language_code": "en-US",
        "target_language_code": "es",
    }
)

for translation in response.translations:
    print(translation.translated_text)
REST APIs
Python Crash Course
Facebook API
provides social graph information
you need a facebook account
with that facebook account you create a facebook developer account
def x()
…
…
GraphQL APIs
Python Crash Course
GraphQL API
GraphQL is a query language and a type of API
Created by facebook (internally in 2012 before being publicly released in 2015)
It is not at all compatible with RESTful APIs - they are completely different in how they accomplish data representation.
GraphQL APIs use the GraphQL query language to define queries and mutations.
Commonalities: JSON, stateless (no session auth), client-server paradigm, HTTP (only a limited set of HTTP features is used).
GraphQL APIs
Python Crash Course
GraphQL APIs
Python Crash Course
Communication with the API
GraphQL APIs
Python Crash Course
Common misconceptions addressed
GraphQL APIs
Python Crash Course
Tools for learning GraphQL API:
Playground: https://graphqlzero.almansi.me/api
Fake: https://github.com/marmelab/json-graphql-server
npm install -g json-graphql-server
json-graphql-server db.json (not db.js)
GraphQL APIs
Python Crash Course
Sending the query with curl (remember to include the content type header):
curl 'http://localhost:3000' -H 'Content-Type: application/json' --data-raw
'{"query":"query {Product(id: 3) { id title }}"}' -s (not for win cmd)
GraphQL APIs
Python Crash Course
Authentication is handled very similarly to REST if handled not by the GQL API
mutations but by standard authentication middleware. It can be also handled with
the GQL API, like described here: https://www.apollographql.com/blog/… . In this
case we just need to send a mutation to login, obtain the token and send the token
with all subsequent requests that must be authenticated.
GraphQL APIs
Python Crash Course
There are multiple libraries in the Python world for both creating and querying GQL
APIs: https://graphql.org/code/#python
Attention: you need a client for querying, not a server - the link opens in the section about Python servers for GraphQL.
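A sketch of the same Product query sent from Python with requests (against the local json-graphql-server started above):

import requests

query = "query { Product(id: 3) { id title } }"

resp = requests.post("http://localhost:3000",
                     json={"query": query},      # GraphQL requests are POSTs with a JSON body
                     headers={"Content-Type": "application/json"})
print(resp.json())                               # e.g. {"data": {"Product": {...}}}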
Pickle
Python Crash Course
Pickle
What we have seen with the json module – saving and loading Python objects to and from files using some predetermined file format – is a useful tool.
In particular, it is quite common to share the weights of pretrained models in ML and DL as serialized files that contain objects like lists, tensors and other things. You might know that the result of training a machine learning or deep learning model is not classified images, numbers identifying the bounding boxes of object detection algorithms, or the sentiment expressed in a comment. It's actually vectors of weights – multidimensional matrices of numbers. If you have the same neural network topology (think structure) you can import those weights to perform the tasks this network was trained to do.
In fact you can save the entire model to disk using built-in library tools (model.save() in Keras, for example). They don't necessarily use pickle, but the idea is all the same.
This reuse of pretrained models is called transfer learning and it's very popular and useful (we will cover it extensively later).
We now turn our attention to another very popular serialization package in Python - pickle.
Pickle
Python Crash Course
Pickle
A module for serializing data / code in Python. Uses binary serialization.
Serializing and deserializing via these modules is also known as pickling and
unpickling.
Important note on versions: in Python 2 we had pickle and cPickle, where cPickle was a more performant version. In Python 3 pickle is optimized and very performant, so there is usually no need to use any custom pickles / serializers.
Loading objects and data by unpickling is usually much faster than loading them as JSON or getting the data from a database if we are talking about a huge number of objects (10s-100s of times faster).
It also can be easily compressed for faster transfers over the network.
Ref: https://realpython.com/python-pickle-module/
File extensions: .pickle and .pkl, but others are sometimes also used: https://stackoverflow.com/questions/40433474/preferred-or-most-common-file-extension-for-a-python-pickle
One important note: there is a way to save Python objects and even code as strings – load them and even execute Python code from a text file. This can be achieved via a call to the eval() function. The eval() function is very powerful – in fact too powerful (eval is evil). If an “attacker” found a way to feed some Python code to an eval() call used anywhere in our program, he might do almost anything the permissions of the user running the Python process would allow (for example deleting your entire file system, or sending requests with some system files to another site, perhaps one controlled by the attacker, etc.).
Unpickling is different from eval, because you are not sharing code as a string to be executed; rather, pickling is essentially sharing “crystallized RAM contents” - a snapshot of an object in RAM at a particular time. (Still, never unpickle data from untrusted sources: pickle can also be made to execute arbitrary code during loading.)
Pickle
Python Crash Course
Serialization, like deep copying, implies a recursive walk over a directed graph of
references. pickle preserves the graph’s shape: when the same object is encountered
more than once, the object is serialized only the first time, and other occurrences
of the same object serialize references to that single value. pickle also correctly
serializes graphs with reference cycles. However, this means that if a mutable
object o is serialized more than once to the same Pickler instance p, any changes
to o after the first serialization of o to p are not saved.
In short: You should not try to change the objects while they are in the process of
serialization.
Pickle serializes classes and functions by name, not by value. Pickle can therefore
deserialize a class or function only by importing it from the same module where the
class or function was found when pickle serialized it. In particular, pickle can
normally serialize and deserialize classes and functions only if they are top-level
names for their module (i.e., attributes of their module).
For example, consider the following:
def adder(augend):
    def inner(addend, augend=augend):
        return addend + augend
    return inner

plus5 = adder(5)
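Trying it out confirms the rule above; the third-party dill package (discussed on a later slide) can handle what pickle cannot:

import pickle
import dill                      # third-party: pip install dill

try:
    pickle.dumps(plus5)          # inner() is not a top-level name of its module...
except Exception as err:
    print("pickle failed:", err) # ...so pickle refuses to serialize the closure

blob = dill.dumps(plus5)         # dill serializes closures by value
print(dill.loads(blob)(10))      # 15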
Pickle
Python Crash Course
Pickles API:
pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)
pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)
pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict",
buffers=None)
pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict",
buffers=None)
In Python 3, protocols range from 0 to 5 inclusive; the default is 4 (from 3.8), which is usually a reasonable choice, but you may explicitly specify protocol 2 (to ensure that your saved pickles can be loaded by Python 2 programs), or protocol 5 (added in 3.8), incompatible with earlier versions but with performance advantages for very large objects (DL models?). It is always recommended to pickle with at least version 2. Protocol 0 is ASCII-based and should only be used if compatibility with ancient Python versions is required.
Choosing the version: https://stackoverflow.com/questions/23582489/python-pickle-
protocol-choice
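A minimal dump/load sketch (the file name and dict are made up):

import pickle

model_state = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

with open("model.pkl", "wb") as f:                               # note the binary mode
    pickle.dump(model_state, f, protocol=pickle.HIGHEST_PROTOCOL)

with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored == model_state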
Pickle
Python Crash Course
Dill
The dill module extends the capabilities of pickle. According to the official
documentation, it lets you serialize less common types like functions with yields
(generators), nested functions (closures), lambdas, and many others.
With dill you can even serialize the entire session of the interpreter and then
load it - like saving your work with all the initialized objects.
Before you use dill instead of pickle, keep in mind that dill is not included in
the standard library of the Python interpreter and is typically slower than pickle.
Compression
Although the pickle data format is a compact binary representation of an object
structure, you can still optimize your pickled string by compressing it with bzip2
or gzip.
When using compression, bear in mind that smaller files come at the cost of a
slower process.
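A sketch of pickling straight through gzip (the file name is made up):

import gzip
import pickle

data = {"measurements": list(range(100_000))}

with gzip.open("data.pkl.gz", "wb") as f:    # gzip.open gives a file-like object,
    pickle.dump(data, f)                     # so pickle can write straight into it

with gzip.open("data.pkl.gz", "rb") as f:
    restored = pickle.load(f)

assert restored == data                      # smaller file, but slower to write and read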
Pickle
Python Crash Course
Pickle __getstate__, __setstate__:
Some objects, like database connections, are not picklable even with dill.
You solve the problem by re-initializing such objects, using the serialized data, during deserialization.
For this case you can use __getstate__, __setstate__ methods - magic methods used
by pickle
__getstate__ - use __getstate__() to define what should be included in the pickling
process. This method allows you to specify what you want to pickle. If you don’t
override __getstate__(), then the default instance’s __dict__ will be used.
__setstate__ - do some additional initializations while unpickling.
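A sketch of the idea with an sqlite3 connection (connections are not picklable, so we drop the connection in __getstate__ and re-open it in __setstate__):

import pickle
import sqlite3

class Repository:
    def __init__(self, path):
        self.path = path
        self.conn = sqlite3.connect(path)       # the unpicklable part

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["conn"]                       # exclude the live connection from pickling
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.conn = sqlite3.connect(self.path)  # re-initialize the connection on unpickling

repo = Repository(":memory:")
clone = pickle.loads(pickle.dumps(repo))
print(clone.conn)                               # a fresh, working connection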
XML
Python Crash Course
XML stands for eXtensible Markup Language.
XML was designed to store AND transport data (there are resources claiming
otherwise) … but originally intended for printable documents.
XML was designed to be both human and machine readable.
It is very similar to HTML - it has elements, tags, attributes.
Elements can be nested, giving HTML the tree-like hierarchical structure - same
applies to XML. Because of this notions like parent, child elements, root and leaf
nodes are used to describe the document.
Unlike HTML we can use any tags we like - we can define them.
Scalar data can be represented in attributes, alleviating the overhead of opening and closing tags.
Applications of XML: RSS feeds that blogs and news sites expose to announce their updated content; also used by SOAP services, Microsoft Office documents, and draw.io to represent drawings. Also used as a file “database” and for config files of various tools.
In Python we have several modules for working with XML data:
minidom - mimics the JS DOM API
ElementTree
etc: https://docs.python.org/3/library/xml.html#
Let’s turn to code examples and illustrate these
XML has a big ecosystem of tools: https://webreference.com/xml/basics/xml-
technologies/
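A minimal ElementTree sketch (the XML snippet is made up):

import xml.etree.ElementTree as ET

doc = """
<catalog>
    <book id="1"><title>Dune</title></book>
    <book id="2"><title>Neuromancer</title></book>
</catalog>
"""

root = ET.fromstring(doc)                  # parse from a string (ET.parse() reads a file)
for book in root.findall("book"):
    print(book.get("id"), book.find("title").text)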
XML
Python Crash Course
XML has namespaces that often can be confusing
A namespace is just a way to separate identifiers in a programming language or a document so they do not clash.
XML uses them as well.
Let's inspect an RSS feed (RSS is just a data format; it is neither push nor pull, so it can be either, see: https://stackoverflow.com/a/47703199/1964707 ).
XML
Python Crash Course
Querying XML with XPATH:
XPATH is a query syntax for XML and XHTML documents.
It is often used for locating elements in an XML document.
Syntax: https://www.w3schools.com/xml/xpath_syntax.asp
Examples:
//*[@id="post-2151"]/div/div/p
$x('//channel/item/title/text()').forEach((i) => { console.log(i.data) }); //
https://www.huffingtonpost.co.uk/feeds/index.xml
Cheatsheet: https://devhints.io/xpath
Describes how to do it in python:
https://web.archive.org/web/20230130213714/https://dzone.com/articles/processing-
xml-python
XPATH is declarative (like SQL and Regex)
We will see more of XPATH when we talk about web scraping.
XML
Python Crash Course
Online testers:
http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm
https://www.freeformatter.com/xpath-tester.html
http://www.xpathtester.com/xpath (also offers xslt and xquery testing capabilities)
<items>
<item>
<id>1</id>
</item>
<item>
<id>2</id>
</item>
</items>
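The same <items> document queried from Python (ElementTree supports a limited XPath subset):

import xml.etree.ElementTree as ET

xml_text = "<items><item><id>1</id></item><item><id>2</id></item></items>"
root = ET.fromstring(xml_text)

for node in root.findall("./item/id"):    # XPath-like: every <id> under an <item>
    print(node.text)                      # 1, then 2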
SOAP
Python Crash Course
Web services standard, Simple Object Access Protocol:
In reality it is not so simple, and it is much more complex than REST or GraphQL.
It is an exclusively XML-driven protocol for transmitting data, usually over HTTP (but SMTP or JMS can be used as well).
It uses a schema described in a WSDL file, which describes the available services and their expected parameters:
https://www.dataaccess.com/webservicesserver/numberconversion.wso?WSDL
Read a comparison between SOAP and REST: https://smartbear.com/blog/soap-vs-rest-
whats-the-difference/
SOAP
Python Crash Course
Summary comparison of REST and SOAP (some items can be questioned)
SOAP
Python Crash Course
SOAP request and response structure - we can see how verbose SOAP was
SOAP
Python Crash Course
Learning SOAP with mocked endpoints:
https://www.mockable.io/a/#/space/demo4933883/soap/new?inwizzard=true → does not
fully imitate crud actions easily
https://getsandbox.com/ → does not seem to work with SOAP imports from URL (does
not create the appropriate endpoints after the import is done).
https://www.soapui.org/downloads/soapui/ → SOAPUI (not ReadyAPI) - once a paid tool, it is now free and capable of generating SOAP endpoints from a WSDL that you can then enhance. SOAPUI has additional capabilities, like exporting the mocked SOAP endpoints to a Java WAR file that you can then deploy to Tomcat or Jetty servlet containers, see: https://stackoverflow.com/a/12750792/1964707
https://doughellmann.com/posts/evaluating-tools-for-developing-with-soap-in-python/
→ creating SOAP API with Python is not that popular. There are some tools, but it’s
not a common usecase so tutorials are old, scarce and not the best quality.
SOAP
Python Crash Course
Creating a mocked service with SOAPUI for learning:
Unfortunately you need to know Groovy to be able to script the responses, and you don't get the full CRUD services which we got from json-server, see: https://www.soapui.org/docs/soap-mocking/creating-dynamic-mockservices/
Simple usecase demo, ref:
https://www.dataaccess.com/webservicesserver/numberconversion.wso?WSDL
Exercise:
find a SOAP service from the list we have seen previously
call it with python and print the result
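A sketch for the exercise using the third-party zeep package (pip install zeep); it assumes the number-conversion WSDL referenced above is still online and exposes a NumberToWords operation, as that demo service does:

from zeep import Client

wsdl = "https://www.dataaccess.com/webservicesserver/numberconversion.wso?WSDL"
client = Client(wsdl)                             # zeep reads the WSDL and builds the client

print(client.service.NumberToWords(ubiNum=123))   # e.g. "one hundred and twenty three"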
Homework
Python Crash Course
Watch this Postman tutorial - https://www.youtube.com/watch?v=VywxIQ2ZXw4 , unit 1
Complete the exercises in the collab notebooks, that you were not able to do in
class
Launch json-server
Launch graphql-server
Course plan
You can get familiar with it using this link
Additional information
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Troubleshooting
If the app does not return appropriate pictures, inspect extracted CNN features in
a t-SNE plot
… also, add the head, freeze the weights, and check how well the network is classifying
… if it does not predict well - train it more, or even change the architecture if
it’s not sufficient.
Try other KNN algorithms / libraries or PCA n_components parameter.
Flask App with RIS
Reverse Image Search
We can suggest finding examples of images for which the reverse search does not return good results, then further tuning the network until it does, and then recreating the web app with the improved model.
Reimplement the system with your own custom images (CALTECH-256, Imagenet, or some
images downloaded from google, stanford cars dataset:
https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset/data ).
Could we use TF lite for this app? Could we use some smaller KNN library? Any other
means of making the dependencies smaller / fewer?
Further explorations
Reverse Image Search
Take all the code available for this part in the notebooks (all 3 of them) and
create a single notebook that has all the main parts of the code - code in one
place (CiOP):
downloads the data (caltech101 or similar),
initialization of the model,
extracting the weights from the images using the CNN,
using KNN for similarity search and
the flask web app code.
… you can skip the optimizations for KNN / PCA and so on.
You don’t need to improve the model or the webapp, just make it comfortable to
create the web app from the code in the notebook - the only requirement is that the
code should work.
Write a short paragraph on what you learned while implementing a solution for this
specific task (not part 8 of the course, just the task) (5 sentences / ideas
minimum).
Please provide a link to the collab notebook (double check the share options of the
notebook) when finished for review and evaluation.
Practical project 8
Reverse Image Search
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Process:
Start with values (often random) for the network parameters (weights and biases).
Take a set of examples of input data and pass them through the network to obtain
their prediction - forward pass (dot + sigmoid).
Compare predictions obtained with the values of expected labels and calculate the
loss with them using the cost / loss function (y_hat - y)
Perform backpropagation in order to propagate this loss to each and every one of
the parameters that make up the model of the neural network.
Use this propagated information to update the parameters of the neural network with
the gradient descent in a way that the total loss is reduced.
Continue for a specific number of iterations, until we consider that we have a good model.
Let’s take a step back - what do we notice about the training / learning?
It's performed in a loop, like a game loop (some frameworks hide it, like Keras and Fast.ai; some don't, like TF and PyTorch).
Can we optimize the starting weights?
What else?
Demo: Let’s implement the learning for our perceptron. Ref, same as before:
https://www.youtube.com/watch?v=kft1AJ9WVDk
Additionally there is an object oriented version as well:
https://www.youtube.com/watch?v=Py4xvZx-A1E
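A minimal NumPy sketch of the loop above (learning the OR function; the linked video builds something similar):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])     # toy dataset: the OR function
y = np.array([[0], [1], [1], [1]])

rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 1))                  # 1. start with random parameters
bias = 0.0
lr = 0.5                                           # learning rate (a hyperparameter)

for _ in range(10_000):                            # 6. repeat for a fixed number of iterations
    y_hat = sigmoid(X @ weights + bias)            # 2. forward pass (dot + sigmoid)
    error = y_hat - y                              # 3. loss signal (y_hat - y)
    grad = error * y_hat * (1 - y_hat)             # 4. backpropagate through the sigmoid
    weights -= lr * (X.T @ grad)                   # 5. gradient descent update
    bias -= lr * grad.sum()

print(np.round(sigmoid(X @ weights + bias), 3))    # close to [0, 1, 1, 1]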
Learning
Introduction to Deep Learning
A hyperparameter is an external parameter set by the operator of the neural network
as opposed to learnable parameters called “weights”.
Examples: number of iterations of training, number of hidden layers, or activation
function type.
Different values of hyperparameters can have a major impact on the performance of
the network.
Hyperparameters determine how the neural network is structured, how it trains, and
how its different elements function.
Related question: do you think neural networks are classified as parametric models or not? Ref: https://stats.stackexchange.com/questions/322049/are-deep-learning-models-parametric-or-non-parametric
The manual or automated adjustment of hyperparameters is called “tuning” the neural network.
Some hyperparameters related to neural network structure:
Number of hidden layers
Number of neurons in hidden layers
Activation function
Weights initialization (not weights, but their initializers:
https://keras.io/api/layers/initializers/ )
Whether to use bias or not
Some hyperparameters related to the training algorithm
Learning rate
Epoch, iteration and batch counts/size
Optimizer algorithm
Momentum (dampens the zig-zag pattern when searching for error minimum)
Loss / error function
Dropout amount
Hyperparameters, tunning
Introduction to Deep Learning
Momentum and different optimizers
Hyperparameters, tunning
Introduction to Deep Learning
Tuning is the process of optimizing the network's non-learnable parameters (essentially structural properties) so that it learns more efficiently.
Optimizing hyperparameters is an art (or brute force ;^) ): there are several ways, ranging from manual trial and error to sophisticated algorithmic methods.
Following are common methods used to tune hyperparameters:
Manual hyperparameter tuning - an experienced operator can guess parameter values
that will achieve very high accuracy. Requires trial and error.
Grid search - systematically testing multiple values of each hyperparameter, retraining the model for each combination.
Random search - a research study showed that using random hyperparameter values is
actually more effective than manual search or grid search.
Bayesian optimization - trains the model with different hyperparameter values over
and over again, and tries to observe the shape of the function generated. It then
extends this function to predict the best possible values. This method provides
higher accuracy than random search.
There are libraries and tools that help with hyperparameter tuning (hyperparameter optimization frameworks, see: https://towardsdatascience.com/10-hyperparameter-optimization-frameworks-8bc87bc8b7e3 ). Automated hyperparameter tuning with Keras: https://medium.com/analytics-vidhya/automated-hyperparameter-tuning-with-keras-tuner-and-tensorflow-2-0-31ec83f08a62 . Also AutoKeras.
Demo: implement tanh function for our perceptron and change this hyperparameter.
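A small grid-search sketch with scikit-learn (standard sklearn APIs; the tiny grid, which includes tanh from the demo, is only for illustration):

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

param_grid = {
    "hidden_layer_sizes": [(32,), (64, 32)],   # structural hyperparameters
    "activation": ["tanh", "relu"],            # activation function as a hyperparameter
    "learning_rate_init": [0.001, 0.01],       # training hyperparameters
}

search = GridSearchCV(MLPClassifier(max_iter=500), param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))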
Hyperparameters, tunning
Introduction to Deep Learning
Forward Propagation
The use of the NN w/ its current parameters to compute a prediction for each
example in our training dataset. This involves simple math (summation, activation).
We use the known correct answer that a human provided to determine if the network
made a correct prediction or not. An incorrect prediction, which we refer to as a
prediction error, will be used to teach the network to change the weights of its
connections to avoid making prediction errors in the future.
Backpropagation
Backward propagation of error, or more succinctly, back propagation. In this step,
we use the prediction error that we computed in the last step to properly update
the weights of the connections between each neuron to help the network make better
future predictions. This is where all the complex calculus happens. We use a
technique called gradient descent to help us decide whether to increase or decrease
each individual connection's weights, then we also use something called a training
rate to determine how much to increase or decrease the weights during each training
step. [...] Essentially, we need to increase the strength of the connections that
assisted in predicting correct answers, and decrease the strength of the
connections that led to incorrect predictions. We repeat this process for each
training sample in the training dataset, and then we repeat the whole process many
times until the weights of the network become stable. When we're finished, we have
a network that's tuned to make accurate predictions based on all the training data
that it's seen.
Backpropagation intuition
The backpropagation algorithm is sometimes not explained in detail in courses.
While it is not strictly necessary in order to use DL, an intuitive explanation is no doubt helpful.
Good video on the topic: https://www.youtube.com/watch?v=s8pDf2Pt9sc
Note: bias term is only applicable to non-input layers:
https://stackoverflow.com/questions/7241537/should-an-input-layer-include-a-bias-
neuron
Note: is bias a trainable parameter? Yes:
https://stackoverflow.com/a/54347129/1964707
Note: XOR problem is not solvable using a simple perceptron, this is related to
first AI winter, read more about it: https://towardsdatascience.com/history-of-the-
first-ai-winter-6f8c2186f80b#:~:text=Fall%20of%20Connectionism
Propagation
Introduction to Deep Learning
So finally what are deep NNs?
Deep NNs (abrev. DNNs) are neural networks with > 1 hidden layer.
Shallow NN aka Perceptron = 1 layer.
Adding more hidden layers allows the network to model progressively more complex
functions. The ability to model more complex functions is what gives deep neural
networks their power.
Abstractness increases with each additional layer. This is what we refer to as
learning hierarchical representations of underlying data. There's a hierarchy of
components starting from the low-level details to the high-level abstractions. A
deep neural network learns to model this compositional hierarchy in order to make
predictions.
MLP - a simple fully connected feed-forward neural network (FCFFNN or FCFFANN) is sometimes called a multilayer perceptron.
Deep Neural Networks
Introduction to Deep Learning
We are not going to code a deep neural network from scratch, although there are
plenty of resources if you are interested. The main idea would be to pass from one
perceptron to another the output. Here is a good example:
https://developer.ibm.com/articles/neural-networks-from-scratch/
Scikit implements a Multilayer Perceptron for both Regression and Classification
tasks: sklearn.neural_network
Demo: Scikit MLP
We will use scikit MLP to familiarize ourselves with other activation functions and
optimizers. Unfortunately scikit does not support changing loss functions as
hyperparameters, so this will have to wait a bit till we reach serious Deep
Learning frameworks like Keras and Pytorch.
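A minimal sketch of the demo with sklearn.neural_network (the digits dataset is just a stand-in):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32),   # two hidden layers -> a (small) deep NN
                    activation="relu",             # try "tanh" or "logistic" as well
                    solver="adam",                 # the optimizer is also a hyperparameter
                    max_iter=500,
                    random_state=0)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))                   # accuracy on the held-out set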
Deep Neural Networks
Introduction to Deep Learning
Leaky ReLU and PReLU are preferred. Of these, only plain ReLU is supported by scikit-learn.
ReLU, Leaky ReLU, PReLU, GELU
Introduction to Deep Learning
Converts a set of values into a probability distribution, preserving the relative sizes of the input values.
If we take an input of [1, 2, 3, 4, 1, 2, 3], the softmax of that is [0.024,
0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The output has most of its weight where
the '4' was in the original input.
This is what the function is normally used for: to highlight the largest values and suppress values that are significantly below the maximum. Why not just use the values themselves? Prior to applying softmax, some vector components could be negative or greater than one, and they might not sum to 1; but after applying softmax, each component will be in the interval (0, 1] and the components will add up to 1, so they can be interpreted as probabilities.
Softmax is often used as the activation for the last layer of a classification network because the result can be interpreted as a probability distribution (the loss typically paired with it, sometimes called the softmax loss, is the cross-entropy loss: https://www.quora.com/Is-the-softmax-loss-the-same-as-the-cross-entropy-loss ).
To gain more intuition, it’s good to play around with calculators:
https://redcrabmath.com/Calculator/Softmax
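The example above can be reproduced with a few lines of NumPy:

import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())        # subtracting the max improves numerical stability
    return e / e.sum()

print(np.round(softmax([1, 2, 3, 4, 1, 2, 3]), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175] - matches the numbers quoted above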
Softmax
Introduction to Deep Learning
RBF is associated w/ RBF networks commonly used in signal processing applications,
but can be used for classification and other things.
They are not part of your course, so if you want to explore something on your own - a perfect target! Implement it as a custom Keras activation function, write about it on LinkedIn and promote yourself.
More:
https://en.wikipedia.org/wiki/Radial_basis_function_network
https://www.researchgate.net/publication/
280445892_Introduction_of_the_Radial_Basis_Function_RBF_Networks
https://ieeexplore.ieee.org/abstract/document/651633
Radial Basis Function - RBF
Introduction to Deep Learning
Aka: SiLU.
Not an often used function, but standard implementations exist in most frameworks.
Swish
Introduction to Deep Learning
Why do we need activation functions?
They introduce non-linearity to our NN. We especially need non-linear functions,
like ReLU
NN’s can be described as approximators of functions and to approximate non-linear
functions, we need non-linear activations (...or some other mechanism to introduce
non-linearity).
If we did not have non-linearities, the only problems we could solve would be linear ones (essentially linear regression).
Activation functions are usually not complex, because we want the forward pass to be fast.
NNs learn “slow”, but we want to work fast once they have learned (<~1ms) - in the
inference phase.
Why do we need so many of them?
They have different properties. Some cause fast convergence / training. Some have
vanishing gradients.
Different situations.
What are the most common activation functions?
Sigmoid, tanh, ReLU (and variants of ReLU).
Which one to use and when?
Leaky ReLU - in the hidden layers (try others as a tuning exercise).
Softmax for classification, nothing (linear) for regression in the last layer.
Activation functions: summary
Introduction to Deep Learning
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
2 Level
1 Chapter
Today you will learn
Python Crash Course
00 Exceptions
01 Assertions
02 Context managers
Exceptions
Python Crash Course
All programs have a regular execution path. But in some rare cases circumstances
that deviate can arise, causing errors.
We call those errors “Exceptions” - objects representing errors - in Python and in
many other modern (?) programming languages.
Prior to exceptions (for example in the C programming language) programmers relied on flags and function return codes to indicate that an error happened; in modern languages errors are mostly handled with exceptions. This is because, compared to exceptions, error codes have disadvantages: they always need to be checked for explicitly and as such can easily just be ignored.
High-abstraction-level languages usually use exception objects; low-level languages use return codes and flags (C, Rust/Zig (?)).
Examples where exceptions are commonly used / encountered:
Program is instructed to read a file but the file does not exist
Program is instructed to read a file but the directory does not exist
Program is instructed to read a file but the user under which the program is
running does not have sufficient access privileges
Program is instructed to access an external service but it can not reach that
service
User enters invalid data, like 0 to the division operation (error class: user input
errors)
We can see from this list that the I/O boundary / user input is a very common source of exceptions! Others: IndexError, KeyError.
Exceptions
Python Crash Course
We will talk about exception handling and the process of using exceptions.
To learn how to raise exceptions ourselves, we first need to learn how to handle them in other people's code. Knowing how to handle exceptions is more important than knowing how to throw them, because 3rd party code will throw them even in trivial applications.
Exceptions
Python Crash Course
Exception messages and their propagation through the function call stack are contained in the traceback:
From this we can formulate an important rule: if the exception is not handled, it will terminate the execution of the program at the location of the exception's first occurrence.
Exceptions
Python Crash Course
Exception handling is done with try except block/statements (similar to try-catch
statements in other languages).
The try statement surrounds the code that can throw the exception.
The except - code that is written with the purpose of executing if / when the
exception happens. It is the exception handler.
Both these statements create a block of execution.
There are no limitations on how many try blocks you can potentially have in a
single function.
You can have many except blocks in one function for a single try block (multi-
except).
The except block can target multiple exception types if we want to handle those
exceptions in a similar way.
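A small sketch of try with several except blocks (the file name is arbitrary):

def read_config(path):
    try:
        with open(path) as f:                           # exception-prone code goes in try
            return f.read()
    except FileNotFoundError:
        print(f"{path} does not exist, using defaults")
        return ""
    except (PermissionError, IsADirectoryError) as e:   # one handler for several types
        print(f"cannot read {path}: {e!r}")
        return ""

read_config("settings.ini")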
Exceptions
Python Crash Course
Not all errors / exceptions should be handled - these errors should almost always be handled at development time (unless you are developing tooling like IDEs, interpreters, formatters, validators, linters, programs that accept code as input, or similar):
Additionally you can use pass keyword when the except block does not have much to
do.
You can use multiple returns if you need to return different values between the
successful and the exceptional path.
You can interrogate the exception object in the except block and print the information from it, using an f-string: f"{e!r}" prints the repr of the exception object.
Exceptions
Python Crash Course
Exceptions can be raised and re-raised.
Re-raise the exception when you want to do something within the function and still pass the original exception to the caller. This can often be done to indicate to the caller that it might retry (with a different parameter or after some time). It splits the handling code into two places, which might not be ideal from a design perspective.
def avg(lst):
    """Return the arithmetic mean of lst."""
    try:
        return sum(lst) / len(lst)
    except ZeroDivisionError:
        raise  # re-raise: let the caller decide how to recover from an empty list
Errors should never pass silently, unless explicitly silenced - this explains why exceptions in Python are ubiquitous. Exceptions are very intrusive, and this is a benefit - the developer is left with no choice but to handle the exception if he/she wants the app to continue working.
Exceptions
Python Crash Course
Cleaning up is performed with the standard finally block.
The finally block is always run (except if the computer explodes 💣🤯 … or loses power).
Common usecases
Closing files
Closing connections to external services
Closing database connections
A note on try-else: https://stackoverflow.com/questions/855759/what-is-the-
intended-use-of-the-optional-else-clause-of-the-try-statement-in
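A small sketch of the cleanup pattern with a database connection (the table name is made up, so the except branch will most likely run - but finally closes the connection either way):

import sqlite3

conn = sqlite3.connect("app.db")
try:
    conn.execute("INSERT INTO logs (msg) VALUES (?)", ("started",))
    conn.commit()
except sqlite3.Error as e:
    print(f"database error: {e!r}")
finally:
    conn.close()          # runs on success AND on error - the connection never leaks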
Exceptions
Python Crash Course
Python exception hierarchy - in python exceptions are arranged into a hierarchy
using inheritance.
This facilitates catching exceptions by their base classes (polymorphically).
For example IndexError and KeyError in a sense happen in similar circumstances -
are they related?
The hierarchy changes depending on the version.
All non-system-exiting exceptions inherit from Exception.
Ref: https://docs.python.org/3/library/exceptions.html#exception-hierarchy
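They are: both inherit from LookupError, so one handler can catch either, polymorphically:

for container, key in [([1, 2, 3], 10), ({"a": 1}, "b")]:
    try:
        container[key]                  # raises IndexError, then KeyError
    except LookupError as e:            # the common base class catches both
        print(type(e).__name__, "caught via LookupError")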
Exceptions
Python Crash Course
It is recommended that exception payloads be of string type.
Most built-in exceptions that we can throw accept a single argument in their constructor.
PEP 352 is clear:
For more information about the cause of the exception, the exception object's attributes can be used (as with UnicodeError):
Exceptions
Python Crash Course
Exceptions in python are chained - a thrown exception is associated with another
one.
You might have seen this if you ever saw the message in the screenshot.
Exceptions
Python Crash Course
Tracebacks are objects that can be interacted with using the traceback modules in
Python.
Traceback modules has many methods that are useful: print_tb, format_tb, etc.
Never store tracebacks in a collection for later use, as they are associated with the function call, which in turn keeps the stack variables alive - that would cause memory problems fast. Avoid accessing traceback objects beyond the scope of the current exception; if you need the information, save it to a database or file right away (which is essentially what logging does) so that the object can be destroyed.
Exceptions
Python Crash Course
Exceptions and with block (context manager) patterns:
https://stackoverflow.com/questions/713794/catching-an-exception-while-using-a-
python-with-statement
Exceptions
Python Crash Course
The Python interpreter binary is not cross-platform – it is platform-specific (Linux has one, Windows has another).
When working with platform-specific / cross-platform code, it is not uncommon to detect exceptions that happen during import.
Exceptions
Python Crash Course
Lastly, a question for the audience. We have a program where a mistake is present in the code - there is a defect in it. However, it is not handleable: no one knows how to reproduce it or fix it. Which of the following scenarios is the best and which is the worst:
The error causes an error at startup - the program does not even start.
The error causes an error at runtime - the program starts and executes for
potentially long time before terminating.
The error does not cause the program to terminate at all - the program executes
even though the bug / mistake is there.
… and why?
… heisenbug.
… mathematically provable languages.
https://en.wikipedia.org/wiki/Formal_verification
https://stackoverflow.com/questions/4065001/are-there-any-provable-real-world-
languages-scala
https://en.wikipedia.org/wiki/Coq
Exceptions
Python Crash Course
Summary:
Raising exceptions interrupt normal program flow and transfer the execution to the
handling code or propagate through the call stack until either some code handles
the exception or program terminates.
try (for exception-prone code; you will need to exercise some judgement to choose what to include in the try block), except (for handling the exceptions; don't put complex code in the except block to avoid exceptions there), finally (for code that must execute regardless - cleanup; finally is even executed if the except block has a sys.exit() call)
Python uses exceptions pervasively, even for logic decisions/conditionals - as a control structure.
Avoid catching programmer errors (IndentationError, SyntaxError, etc.)
raise w/o argument reraises current exception
raise can throw a new exception type that is more readable and understandable
depending on the context (domain specific)
Prefer built-in exception types if possible (TypeError)
!r - repr in f-string
return codes are ignorable so prefer exceptions (over return code and error flags)
use EAFP in general (as we did for platform specific code)
always catch specific exceptions and never leave the except block empty unless you
explicitly know what you are doing.
Assertions
Python Crash Course
A short note on assertions:
Associated more with LBYL
They are used to validate that the code written inside the function / method is correct - the invariants of the function.
They are NOT (questionable) meant to validate that the arguments passed by the caller are valid for the logic the function encapsulates.
AssertionError is just another exception and we can pass messages to it: assert [boolean condition], [message when it fails]
Often used when writing tests. Sometimes used to check initial conditions when the program starts (e.g. not enough RAM).
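A tiny sketch of the syntax (the invariant here is deliberately simple):

def split_buffer(buffer_size):
    chunks = [buffer_size // 4] * 4 + [buffer_size % 4]
    # an invariant of the function's own logic, not validation of the caller's input
    assert sum(chunks) == buffer_size, f"chunks lost bytes: {chunks}"
    return chunks

split_buffer(10)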
Context manager
Python Crash Course
Context manager - object designed to be used in with statements ensuring that
resources are properly managed & automatically handled.
An extended discussion is provided here: https://realpython.com/python-with-
statement/
Two methods: one is executed on the beginning of the with block and the other at
the end even if there is an exception. Conceptually you can think of these methods
as setup and teardown / enter, exit.
We have used file context manager with this mechanism:
Context manager
Python Crash Course
Context manager protocol consists of only two methods: __enter__() and __exit__()
Order of operations. Note, the return value of the __enter__ method is bound to the
name “x”, not the value of the expression. The with block can be exited in two
ways: exceptionally or via normal termination in both cases __exit__ is executed.
Context manager
Python Crash Course
The enter method:
Because the behavior of the exit method depends on whether an exception was or was
not thrown it is common to check the exception type inside the exit method.
By default __exit__ propagates exceptions from the with block to the enclosing context. To control this we use the return value of the exit method: if it is falsy, the exception will be propagated further (all functions return None by default, which is why propagation is the default behaviour). It is recommended not to re-raise exceptions from __exit__ - simply return a falsy value. Only raise an exception when something bad happens inside the exit method itself (?)
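A minimal class-based sketch illustrating the protocol and the meaning of __exit__'s return value:

class ManagedFile:
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self.file = open(self.path, "w")
        return self.file                    # bound by "with ManagedFile(...) as f"

    def __exit__(self, exc_type, exc_value, tb):
        self.file.close()                   # teardown runs on success and on exception
        if exc_type is OSError:
            print("suppressing an OSError from the with block")
            return True                     # truthy -> the exception is NOT propagated
        return False                        # falsy  -> the exception propagates further

with ManagedFile("demo.txt") as f:
    f.write("hello")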
Context manager
Python Crash Course
Context manager decorator and the contextlib
For some applications creating context managers as classes is too much - this is
where this decorator is useful.
The @contextmanager decorator is used for creating new context managers (as
functions).
It is used on a generator function that uses the yield statement. It does not use two separate methods, just the two phases of the underlying generator function (enter - the body before yield; exit - the code after yield, usually in a finally block).
Note that using multiple context managers is equivalent to nesting the with
statements.
Because of this equivalence we can infer how exception handling will work when
using multiple context managers:
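The same idea as a generator-based sketch, also showing that several context managers on one with line behave like nested with statements:

from contextlib import contextmanager

@contextmanager
def managed_file(path):
    f = open(path, "w")
    try:
        yield f              # everything before yield plays the role of __enter__
    finally:
        f.close()            # the finally part plays the role of __exit__

with managed_file("a.txt") as a, managed_file("b.txt") as b:   # == nested with blocks
    a.write("A")
    b.write("B")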
Context manager
Python Crash Course
A realistic example of context manager usage would be resource handling: File
Handling, Database Connetions, Network Connections, Locks (acquisition and release)
in a multithreaded program and so on.
Let’s imagine we are writing a library to manage database connections and
transactions. We want our transaction management to be usable as a context manager
so that the users of our library would not forget to commit the transaction when
everything is OK, and rollback when there is an exception inside the with block.
Transactions are a group of database statements that either all happen or none
happen at all (atomicity). A DBMS will not commit an open transaction unless you,
the application developer, tell it to, so transaction management is a good
candidate to implement in a database connectivity library in Python.
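A minimal sketch of such a transaction manager, using sqlite3 purely for illustration (any DB-API 2.0 connection would work the same way):
import sqlite3
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    try:
        yield conn
        conn.commit()      # commit when the with block finishes normally
    except Exception:
        conn.rollback()    # roll back on any exception, then re-raise it
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
with transaction(conn):
    conn.execute("INSERT INTO t VALUES (1)")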
Another example:
https://gist.github.com/MindaugasBernatavicius/93f20ad9c2d44dab1d7ccc3a62d510bd
Course plan
You can get familiar with it using this link
Additional information
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Tensorflow intro
Introduction to Deep Learning
How is tensorflow used?
Most of the time we will just work with the Keras high-level API and the Data API.
But sometimes you need extra control - to write custom loss functions, custom
metrics, layers, models, initializers, regularizers, weight constraints, and more.
You may even need to fully control the training loop itself, for example to apply
special transformations or constraints to the gradients (beyond just clipping them)
or to use multiple optimizers for different parts of the network. We will cover all
these cases in this chapter, and we will also look at how you can boost your custom
models and training algorithms using TensorFlow's automatic graph generation
feature.
Tensorflow intro
Introduction to Deep Learning
Installation considerations:
Consider: OS (on Windows, TensorFlow supports only Python 3)?
Docker, direct, virtual?
CPU / GPU? CUDA is required, so you need an NVIDIA card:
https://www.tensorflow.org/install/gpu
OpenCL and AMD GPUs: the current status is unclear, but people have run TF on AMD
GPUs.
Google Colab
Recommendation:
If you have Windows and an NVIDIA GPU - go native.
If you have Linux with an NVIDIA GPU - maybe go native.
Anything else: use CPU, native.
If you are always online - use Google Colab (we'll see how to do it later).
AWS / GCP / Azure
Tensorflow Installation
Introduction to Deep Learning
Installation w/ Anaconda:
conda install -c conda-forge tensorflow
problems when installing tensorflow-gpu (YMMV)
Installation w/ Pip:
GPU: https://www.codingforentrepreneurs.com/blog/install-tensorflow-gpu-windows-
cuda-cudnn https://stackoverflow.com/questions/51306862/how-to-use-tensorflow-gpu
pip3 install tensorflow-gpu==2.2.0 (or whichever version is newest)
Much simpler to install and use the GPU (than with Anaconda)
Obviously, if you don't use Anaconda you will need to install Jupyter with pip as
well
Tensorflow Installation
Introduction to Deep Learning
Verify the installation:
import tensorflow as tf

with tf.compat.v1.Session() as sess:
    # verify that the math works
    a = tf.constant(50)
    b = tf.constant(51)
    print("a + b = {0}".format(sess.run(a + b)))
Tensorflow Installation
Introduction to Deep Learning
If you see warnings that your CPU supports some instructions that TF was not
compiled with, ignore them for now; but if you want, you can try to compile TF on
your machine with the full instruction set. This will improve the performance of
your TF programs:
Tensorflow Installation
Introduction to Deep Learning
Tensors
We saw how to evaluate scalars - now let’s evaluate tensors
Evaluating multiple tensors.
Demo: Tensors
Tensorflow datatypes
Introduction to Deep Learning
Tensors
Essentially we are using TF like NumPy, since a tensor is similar to an ndarray.
They have a shape, a datatype and indexes
What are other operations on tensors
Stacking
Slicing
Reshaping
More: https://www.tensorflow.org/api_docs/python/
PS: the same operations are available as in NumPy, but they sometimes behave
differently. For example transpose: with NumPy's T attribute, t.T is just a
transposed view on the same data, while in TensorFlow a new tensor is created with
its own copy of the transposed data
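A small illustration of these operations:
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
print(t.shape, t.dtype)        # (2, 3) float32
print(t[:, 1:])                # slicing
print(tf.reshape(t, [3, 2]))   # reshaping
print(tf.stack([t, t]))        # stacking -> shape (2, 2, 3)
print(tf.transpose(t))         # copies the data, unlike NumPy's .T view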
Tensorflow datatypes
Introduction to Deep Learning
Variables
Variables are a bit more involved and currently not necessary for us.
They are used when we want to do something low level in TF (custom ML).
A Variable is a tensor that gets initialized and whose value gets changed as the
program runs.
Ref: https://www.tensorflow.org/api_docs/python/tf/Variable
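A small illustration (API calls as documented in the tf.Variable reference above):
import tensorflow as tf

v = tf.Variable([[1., 2.], [3., 4.]])
v.assign(2 * v)                  # replace the variable's value in place
v[0, 0].assign(42.)              # update a single cell
v.assign_add(tf.ones((2, 2)))    # += style update
print(v.numpy())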
Tensorflow datatypes
Introduction to Deep Learning
Placeholders
Used to feed values into the computation graph
Not used much in TF2 eager mode, only if you need lazy (graph) mode
Tensorflow datatypes
Introduction to Deep Learning
Other data structures
Sparse tensors (tf.SparseTensor) - efficiently represent tensors containing mostly
zeros. The tf.sparse package contains operations for sparse tensors.
Tensor arrays (tf.TensorArray) - are lists of tensors. They have a fixed size by
default but can optionally be made dynamic. All tensors they contain must have the
same shape and data type.
Ragged tensors (tf.RaggedTensor) - represent static lists of lists of tensors,
where the inner lists can have different lengths. The tf.ragged package contains
operations for ragged tensors.
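A small illustration of the three structures:
import tensorflow as tf

s = tf.SparseTensor(indices=[[0, 1], [2, 3]], values=[1., 2.], dense_shape=[3, 4])
print(tf.sparse.to_dense(s))                       # mostly-zero dense matrix

r = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])   # rows of different lengths
print(r.row_lengths())

ta = tf.TensorArray(dtype=tf.float32, size=2)      # list of same-shape tensors
ta = ta.write(0, tf.constant([1., 2.]))
ta = ta.write(1, tf.constant([3., 4.]))
print(ta.stack())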
Tensorflow datatypes
Introduction to Deep Learning
Tensorflow datatypes
Introduction to Deep Learning
Debugging can be tricky in the lazy evaluation paradigm. You will not see the
result / intermediate results until you execute the program.
That is why eager execution is useful and one of the reasons it became the default
in v2.
However eager is not a panacea and you still need to know how to debug lazy TF if
you work with it
Here are the guidelines from Google
Tensorflow debugging
Introduction to Deep Learning
Common problems
Shape problems. Solution: reshape
Datatype mismatch: casting.
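A small illustration of both fixes:
import tensorflow as tf

x = tf.constant([[1, 2, 3, 4]])           # int32, shape (1, 4)
x = tf.reshape(x, [2, 2])                 # fix a shape mismatch
x = tf.cast(x, tf.float32)                # fix a dtype mismatch
y = tf.constant([[0.5, 0.5], [0.5, 0.5]])
print(x + y)                              # now shapes and dtypes agree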
Tensorflow debugging
Introduction to Deep Learning
Type conversions
Tensorflow debugging
Introduction to Deep Learning
Tensorflow debugging
Introduction to Deep Learning
Tensorflow estimators
Introduction to Deep Learning
Estimator API (deprecated, so don't use it for future projects)
Most common TF estimators:
LinearClassifier - logistic regression
BoostedTreesClassifier
LinearRegressor
DNNClassifier - classification models based on dense, feed-forward neural networks.
Tensorflow estimators
Introduction to Deep Learning
Tensorflow estimators
Introduction to Deep Learning
Checkpoints are saved training models: checkpoints capture the exact value of all
parameters (tf.Variable objects) used by a model.
To restart training completely you need to delete the folder or the checkpoint files.
Estimators use existing checkpoints from the start if present.
Checkpoints will be generated automatically in a tmp folder:
Tensorflow checkpoints
Introduction to Deep Learning
A computation engine that is based on the concept of computation graphs
We can use TensorFlow in eager or lazy mode
V1 and V2 are different:
V1 has a session object that is rarely used and is considered legacy in V2;
V1 was lazy by default; in V2 we need to use the functional API (tf.function) to
have a lazy graph built;
V2 had the slogan "functions, not sessions";
Most of the time, at least at the beginning, we will not use raw TensorFlow - we
will use Keras, to which we now turn.
Tensorflow summary
Introduction to Deep Learning
You have a friend who is trying to learn DL with TensorFlow. He wrote the following
code to solve a simple linear regression problem: task . It does not work - for the
data [1000, 2000, 7000], instead of predicting [1250, 2250, 7250] the predictions
are off by more than 200 or so.
TASK: help your friend:
Generate a list of suggestions of what can be tried (enumerate those suggestions in
the notebook)
Use these suggestions and try them yourself.
GOAL: train the network to predict the values almost exactly (1) in ~500 training
steps (2).
HINT: when generating the list of suggestions, think about two categories of
suggestions: hyperparameter tuning (learning rate, optimizers, others) and data
adjustments.
SPOILER (don't look if you want to tackle the problem on your own):
https://archive.ph/5tSH4
Tensorflow summary
Introduction to Deep Learning
What is Keras?
In short: it's a bunch of methods that call the framework underneath to perform an
ML task. Keras does not execute the neural network - it calls its backend.
There is also an R version, but the most popular implementation is in Python.
Schematically Keras sits on top of other frameworks; the most popular combo is TF + Keras.
Open source: https://github.com/keras-team/keras
Installation
The most popular option is to use Keras with Tensorflow as the backend. We will use
that.
However, let's note that standalone Keras also makes sense if we wanted to try
Theano and other backends. This way we could compare the frameworks.
Note: installing standalone Keras for use with TensorFlow is not recommended. Just
install TensorFlow and Keras will be available, see:
https://stackoverflow.com/a/68397927/1964707
We have two ways to create models with Keras: sequential and functional.
The sequential model is used for simpler model creation. Straightforward (a simple
list of layers), but limited to single-input, single-output stacks of layers (as
the name gives away). Essentially think of an MLP for now (although you can create
more advanced NNs).
This is how it would be defined:
Another: https://stats.stackexchange.com/a/1097/162267
Explanation:
We can give names to layers and model
Params are split into trainable and non-trainable (non-trainable params will be
seen when we discuss transfer learning); use_bias=False will remove some params.
Shape: (None, 1) means we have a single-featured dataset of unbounded length
Objective: you should know how a network architecture looks by just reading the
code
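A minimal sketch of a sequential definition (layer sizes and names here are illustrative, not from the slides):
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential(name="simple_mlp", layers=[
    keras.layers.Dense(8, activation="relu", input_shape=(1,), name="hidden"),
    keras.layers.Dense(1, name="output", use_bias=False),
])
model.summary()   # shows the (None, 1) input shape and trainable parameter counts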
Keras model visualization
Introduction to Deep Learning
Callbacks - an object that can perform actions at various stages of training (e.g.
at the start or end of an epoch, before or after a single batch, etc - training
lifecycle). Example: early stopping.
Callbacks enable checkpoints - you can save a model in some state and use it to
continue training on new data or to resume training. It's useful when the weight count is huge.
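A minimal sketch (the file name and monitored metric are illustrative; model, X_train and y_train are assumed to exist from earlier examples):
from tensorflow import keras

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                  restore_best_weights=True),
    keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                    save_best_only=True),
]
# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=callbacks)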
** Another option is to take the model from the article above (both the model and
the data are provided) and improve the sensitivity and specificity. Additionally,
you can research how helpful data science techniques were for providing solutions
to various problems over the pandemic (the status as of mid-2020 was that advanced
DS/ML/DL techniques were not very helpful - hopefully later on there were some
changes).
Practical project 7
Computer vision and image classification
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
import mlflow
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)
# Set a tag that we can use to remind ourselves what this run was for
mlflow.set_tag("Training Info", "Basic LR model for iris data")
# Infer the model signature
signature = mlflow.models.infer_signature(X_train, lr.predict(X_train))

best_runs = mlflow.search_runs(filter_string="metrics.accuracy >= 1",
                               search_all_experiments=True)
print(f"Runs:\n{best_runs}")
run_id = best_runs.loc[best_runs['metrics.accuracy'].idxmin()]['run_id']
print(f"Best run id: {run_id}")
loaded_model = mlflow.pyfunc.load_model(f"runs:/{run_id}/iris_model")
predictions = loaded_model.predict(X_test)
print(f"Prediction: {predictions[:4]}")
Quickstart: https://docs.neptune.ai/usage/quickstart/#__tabbed_1_1
Mostly used as a cloud service, free to try and learn.
Self hosting is more complicated than with MLFlow (seems to require kubernetes).
Neptune.AI
E2E ML Platforms
Managed Apache Spark + More
Hosted data lakehouse
Main competitor: Snowflake
Focused on notebooks
Good presentation: https://www.youtube.com/watch?v=02DBOfYrYT0
Ref: https://www.youtube.com/watch?v=QNdiGZFaUFs
Databricks
E2E ML Platforms
//
Databricks
E2E ML Platforms
//
Databricks
E2E ML Platforms
Demo: creating a free account for learning - Community Edition. When registering,
note that they remove "." and "+":
https://community.databricks.com/s/feed/0D53f00001jxRjLCAU so [email protected] →
[email protected]
Databricks
E2E ML Platforms
This article compares a lot of e2e solutions: https://www.netguru.com/blog/machine-
learning-tools-comparison , namely:
Weights & Biases
MLFlow
Neptune.ai
Databricks
Sagemaker
Backup: https://web.archive.org/web/20231204194946/https://www.netguru.com/blog/
machine-learning-tools-comparison
Others
E2E ML Platforms
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
E2E ML Platforms
Detailed course plan
Slides, tasks and so on
Additional information
DEMO
Visualizing heatmaps of class activation
CNN interpretability with CAMs
Apply CAM for MNIST (possibly on a custom CNN)
** Research, find and launch other CAM implementations (e.g. faster-grad-cam and
others)
** Create a demonstration where you purposefully inject co-occurrences (cat and dog
in one picture) and thus forcefully introduce bias. Then train a classification
model and prove that CAMs help troubleshoot misclassifications!
Do Vision Transformers provide the capability of extracting CAMs? Is it easier?
Further explorations
CNN interpretability with CAMs
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
We can also choose to apply batch normalization. Many advanced CNNs use it.
Stride and batch-norm
Computer vision and image classification
Let’s tune the following hyperparameters and see how the network performs:
Different Activation functions: Sigmoid, Relu, Tanh, Elu.
Different Pooling Layers: MaxPool2d, AvgPool2d, LPPool2d, more:
https://pytorch.org/docs/stable/nn.html#pooling-layers
Convolution kernel size: can’t be changed arbitrarily. Need to calculate the
correct size.
Add additional convolution layers.
CNN hyperparameter tuning
Computer vision and image classification
Let's create and tune a CNN with Keras
Flowers photo classification:
https://www.tensorflow.org/tutorials/images/classification
Fashion mnist classification
German traffic sign classification
Keras CNN
Computer vision and image classification
Summary:
We created and tuned a CNN
Hyperparameter tuning is similar to that of a FCFFNN; we just have to take into
account the convolution / pool dimensions
Gains in model accuracy can be obtained by clever pre-processing techniques
(sharpening, contrast increase and so on). We often reduce RGB images to grayscale,
unless the objects in the pictures are only differentiable by color (I probably
would not do that for parrot classification).
We also compared the CNN with a FCFFNN/DNN
Questions:
What is the default stride of convolution kernel in Pytorch?
What is the default stride of the pooling filter in Pytorch?
How can you pass a parameter indicating how many kernels need to be applied to a
CNN in Pytorch?
Which is usually better: many smaller kernels or a few large kernels?
Summary and questions
Computer vision and image classification
Student Questions:
If we see a chaotic learning curve (error decrease visualization), can we use model
checkpointing (checkpoint callback) as a viable solution to save the best model? In
a sense we can, however we want a repeatable, stable learning process to be able to
reach very similar results each time. We can use model checkpointing to reach the
maximum after we know a good hyperparameter set, but not when the model starts
wobbling after 2-3 epochs and we grab checkpointing as a band-aid. A bit more
reading about this to get you started investigating:
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-
model-performance/
We saw great results from classifying the MNIST dataset easily; in fact, I have not
seen such great results without much tuning. It seems the results improved. An
interesting question can be asked - can a given model improve (reach better
validation accuracy) because framework developers changed something? What are the
things that framework developers could change that would make a difference?
Default parameters for: LR, bias (on/off), weight initializers (they are random by
default), etc. (a good question to explore).
Summary and questions
Computer vision and image classification
Imagine getting a data science task - compare the complexity of image datasets:
prove that a given image dataset is "more complex" than another dataset. How would
you do it? If a given dataset is easier to learn for a NN with architecture X, does
that necessarily mean that it is less complex? What more deterministic ways could
you come up with?
Further explorations
Computer vision and image classification
Transfer learning
What’s next
Computer vision and image classification
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
More advanced / bonus project: implement Behavior Sequence Transformer and compare
with the model we had (for comparison, provide your own argument/opinion on which
one is more precise and, more interestingly, why).
Practical Project 11
Recommender Systems
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Recommender Systems
Detailed course plan
Slides, tasks and so on
Additional information
2 Level
1 Chapter
Today you will learn
Python Crash Course
Testing
Unit Testing
Unittest
Pytest
Mocking
Job Scheduling: cron
Job Scheduling: windows task scheduler
Job Scheduling: jenkins
Testing
Testing - the activity people perform to ensure that the metrics and behavior of
software meet expectations.
Some argue that automated tests are not part of testing (notably James Bach). The
argument is that testing is a critical activity that is not repetitious but
investigatory / exploratory by definition.
Expectations need to be known - either explicitly defined by the client
(validation) or derived via common sense / technical-expert knowledge
(verification).
The difference can be understood when thinking about the creation of an automobile
or a plane:
Unit: valve springs, timing belt are tested individually (mechanical engineers)
Integration: engine is assembled and tested in a testbed (automotive engineers)
System: the car is driven in various conditions (rain, temperature, altitude,
speeds) (test drivers)
Since code is testing other code, why don't we test the tests? Unit tests should be
as simple as mathematical axioms. Namely, unit tests should be independent from each
other:
No data should be created (or left behind after the test)
No data should be used that is not created by the test
No other side effects / no global, persistent side effects
Benefits:
Helps you design better - especially if you do test-after, test-first or test-
driven development. To do unit testing you need units!
Helps you refactor with confidence: when the tests pass after refactoring you know
you have not made a mess
Helps you be confident in your code: if the tests pass you know you did at least
OK
Helps you debug code faster as the error is pinpointed right away
Helps you be more productive in the long and medium run (not so much in the short run):
https://medium.com/swlh/why-invest-in-unit-testing-8f1bdc2d688e
Provides "executable documentation" - executable documentation is usually in sync
with the code, while non-executable documentation can become inconsistent
Protects against regressions
Drawbacks
Requires you to write more code and when code is rewritten the test code also needs
to change
Need to know and learn more concepts and more tooling
Unit Testing
Python Crash Course
Unittest
A built-in testing module in Python. No installation needed.
Available since 2001, based on JUnit (xUnit architecture)
Has all testing framework tools needed:
assertions
test runner
reporting capabilities
mocking capabilities
We are going to implement and test a PhoneBook class; requirements:
ability to add a name with an associated phone number
ability to retrieve a phone number by name
validity: the phone book can't contain numbers with the same beginning
We will use TDD
Launching
Command line: python -m unittest test.test_phonebook.PhoneBookTest (don't forget
__init__.py if you are launching from a different folder)
Pycharm test runner
Python Crash Course
import unittest

class PhoneBook:
    def __init__(self):
        self.dict = {}

class PhoneBookTest(unittest.TestCase):
    def test_find_by_name(self):
        # given / arrange
        phonebook = PhoneBook()
        phonebook.add("Jonas", "+370-111-1111")
        # when / act
        number = phonebook.find("Jonas")
        # then / assert
        self.assertEqual("+370-111-1111", number)
        # teardown

    def test_fail1(self):
        self.assertEqual(1, 2)

    def test_fail2(self):
        self.assertEqual(1, 2)
Unittest
Negative testing (negative path) - expecting that when incorrect input is given an
appropriate error / exception will be thrown (the opposite of the happy path).
Skipping tests
Extracting a test fixture.
A test fixture is just code that supports your test methods.
For example setUp(self) and tearDown(self)
These methods will usually be called by the test framework
tearDown() will be called if an exception is thrown in the test
the test will not be called if an error is thrown in the setUp() method.
Python Crash Course
def test_when_desired_name_not_in_phonebook(self):
    # given / arrange
    phonebook = PhoneBook()
    phonebook.add("Jonas", "+370-111-1111")
    # teardown

class PhoneBookTest(unittest.TestCase):
    def setUp(self) -> None:
        self.phonebook = PhoneBook()

    def test_find_by_name(self):
        self.phonebook.add("Jonas", "+370-111-1111")
        number = self.phonebook.find("Jonas")
        self.assertEqual("+370-111-1111", number)

    def test_when_desired_name_not_in_phonebook(self):
        self.phonebook.add("Jonas", "+370-111-1111")
        with self.assertRaises(KeyError):
            self.phonebook.find("Petras")

    # @unittest.skip("WIP")
    def test_empty_phonebook_is_consistent(self):
        self.assertTrue(self.phonebook.is_consistent())

    @unittest.skip("Not used")
    def test_fail1(self):
        self.assertEqual(1, 2)

    @unittest.skip("Not used")
    def test_fail2(self):
        self.assertEqual(1, 2)
Unittest
Bad unit test design
class PhoneBook:
    def __init__(self):
        self.dict = {}

class PhoneBookTest(unittest.TestCase):
    def test_find_by_name(self):
        # given
        phonebook = PhoneBook()
        phonebook.add("Jonas", "+370-111-1111")
        # when
        number = phonebook.find("Jonas")
        # then
        self.assertEqual("+370-111-1111", number)
        # teardown
Unittest
Exercise:
Create a class Calculator in the src/ directory (use the methods defined below)
Write tests for the calculator in the test/ directory with the unittest library,
for the methods subtract() and divide()
Think about how many tests you need for each method and why (BVA, EQP - black box
techniques; if you developed the function under test (FUT / AUT / SUT) yourself you
might not need them).
Python Crash Course
Pytest
Pytest is an alternative to unittest
The unittest module is not "pythonic" as it was designed to be similar to the xUnit
family of tools. A more pythonic alternative was created when the reasons to stay
close to the xUnit patterns were no longer too relevant. So it is not part of the
xUnit family.
Not included in the standard Python distribution, so it needs to be installed with pip.
Launching: pytest test\test_phonebook.py or using a Pycharm configuration (needs to
be created manually)
With pytest you need to import fewer modules. For example we can use the default
assert Python keyword for assertions.
More:
https://realpython.com/pytest-python-testing/
https://docs.pytest.org/en/6.2.x/
Python Crash Course
class PhoneBook:
    def __init__(self):
        self.dict = {}

    def get_all_names(self):
        return self.dict.keys()

def test_find_by_name():
    # given
    phonebook = PhoneBook()
    phonebook.add("Jonas", "+370-111-1111")
    # when
    number = phonebook.find("Jonas")
    # then
    assert "+370-111-1111" == number

import pytest
from src.phonebook import PhoneBook

def test_find_by_name():
    phonebook = PhoneBook()
    phonebook.add("Jonas", "+370-111-1111")
    number = phonebook.find("Jonas")
    assert "+370-111-1111" == number

def test_phonebook_contains_all_names():
    phonebook = PhoneBook()
    phonebook.add("Jonas", "+370-111-1111")
    phonebook.add("Petras", "+370-111-1112")
    assert "Jonas" in phonebook.get_all_names() \
        and "Petras" in phonebook.get_all_names()

def test_missing_name_raises_keyerror():
    phonebook = PhoneBook()
    phonebook.add("Petras", "+370-111-1111")
    with pytest.raises(KeyError):
        phonebook.find("Jonas")   # "Jonas" was never added, so find() must raise
Pytest
Python Crash Course
Pytest
Test fixtures are implemented more concisely, as functions passed to tests.
All available at: pytest --fixtures
We can supply a new object before each test
We can also cleanup after (delete file)
And inject a temporary directory - tmpdir. This is another test fixture that the
fixture we defined is using.
Using markers:
@pytest.mark.skip
Custom markers can be defined:
https://docs.pytest.org/en/6.2.x/example/markers.html
Html reports: pip install pytest-html && pytest --html=report.html or python -m
pytest test/ --html=report.html
To print to stdout use: pytest -s
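A minimal sketch, assuming the src/phonebook.py layout from the earlier slides (the print is only there to show that the built-in tmpdir fixture gets injected):
import pytest
from src.phonebook import PhoneBook

@pytest.fixture
def phonebook(tmpdir):
    print("tests can write temporary files into:", tmpdir)
    pb = PhoneBook()
    yield pb            # the test runs here
    # cleanup code after yield runs when the test finishes (even on failure)

def test_find_by_name(phonebook):
    phonebook.add("Jonas", "+370-111-1111")
    assert phonebook.find("Jonas") == "+370-111-1111"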
Python Crash Course
Pytest
Test doubles:
xUnit category.
Mock, spy, fake and others, see:
https://martinfowler.com/articles/mocksArentStubs.html#TheDifferenceBetweenMocksAnd
Stubs
Sometimes called mocks generically, but mock means a specific test double.
They are useful when you want to make your test independent of the infrastructure
by providing a mock object instead of a real one representing the infrastructure
(repository). Or provide other features when testing (like tracking how many times
it was called).
Python Crash Course
Pytest
Mocking:
Data obtained from data.csv with the following data:
Jonas
Petras
Jonas
Antanas
Mindaugas
Python Crash Course
import csv
from unittest.mock import MagicMock

class PhoneBookRepository:
    def __init__(self):
        self.file = '../src/data.csv'

    def get_all(self):
        print("File will be opened!")
        with open(self.file, newline='\n') as csvfile:
            reader = csv.reader(csvfile, delimiter=',')
            data = list(reader)
            return data

class PhoneBookService:
    def __init__(self, repo: PhoneBookRepository):
        self.repo = repo

    def get_most_popular_name(self):
        # 0 .. initial implementation
        # return self.repo.get_all()
        # 1 .. final implementation
        flat_list = [item for sublist in self.repo.get_all() for item in sublist]
        from statistics import mode
        return mode(flat_list)

# pbs = PhoneBookService(PhoneBookRepository())
# print(pbs.get_most_popular_name())

def test_given_single_name_in_repo_service_returns_that_name_with_mocking():
    # arrange
    pbr = PhoneBookRepository()
    pbr.get_all = MagicMock(return_value=[["Jonas"]])
    pbs = PhoneBookService(pbr)
    # act
    res = pbs.get_most_popular_name()
    # assert
    assert res == "Jonas"
    assert pbr.get_all.call_count == 1

def test_given_a_common_name_in_repo_service_returns_that_name_with_mocking():
    # arrange
    pbr = PhoneBookRepository()
    pbr.get_all = MagicMock(return_value=[["Jonas"], ["Krambalkis"], ["Petras"],
                                          ["Antanas"], ["Krambalkis"]])
    pbs = PhoneBookService(pbr)
    # act
    res = pbs.get_most_popular_name()
    # assert
    assert res == "Krambalkis"
    assert pbr.get_all.call_count == 1

def test_given_multiple_names_with_same_count_in_repo_service_returns_first_name_with_mocking():
    # arrange
    pbr = PhoneBookRepository()
    pbr.get_all = MagicMock(return_value=[
        ["Jonas"], ["Krambalkis"], ["Petras"], ["Antanas"], ["Petras"],
        ["Krambalkis"]])
    pbs = PhoneBookService(pbr)
    # act
    # pbs.get_most_popular_name()  # this would provoke a 2nd call to get_all()
    res = pbs.get_most_popular_name()
    # assert
    assert res == "Krambalkis"
    assert pbr.get_all.call_count == 1
Doctest
Another library helpful for testing your docstring documentation inside the code.
It uses special syntax in the doctests to recognize where a test starts and run it.
See: https://docs.python.org/3/library/doctest.html
Python Crash Course
def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    Factorials of floats are OK, but the float must be an exact integer:
    >>> factorial(30.1)
    Traceback (most recent call last):
        ...
    ValueError: n must be exact integer
    >>> factorial(30.0)
    265252859812191058636308480000000
    """
    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()
import random

rand_num = random.randint(0, 1)
if rand_num == 0:
    print("It's 0!")
else:
    raise Exception("It's 1!")
@ECHO OFF
"C:\U\.\AppData\Local\Programs\Python\Python39\python.exe" "C:\U\.\Desktop\
Projects\CAAI\CAGitDemo\test\cmd_script.py"
REM PAUSE
EXIT
Command: C:\Users\..\Desktop\Projects\CAAI\CAGitDemo\test\command.bat
Arguments: 1>>C:\Users\Mindaugas\Desktop\Projects\CAAI\CAGitDemo\test\out.log
2>>C:\Users\Mindaugas\Desktop\Projects\CAAI\CAGitDemo\test\err.log
We will use the same script as with Linux!
Run the task in cmd to check if it works before scheduling
You can enable the history of the task to debug better (see next slide)
Refs:
https://www.windowscentral.com/how-create-automated-task...
https://datatofish.com/python-script-windows-scheduler/
Python Crash Course
Job Scheduling: windows task scheduler
Log with debugging info
Python Crash Course
Job Scheduling: jenkins
Web app for performing all kinds of tasks: running tests, tasks, reports and so on.
Can also be used for periodically occurring tasks.
Python Crash Course
Job Scheduling: jenkins
Automating periodic tasks with Jenkins
Creating a new freestyle project
Execute Windows Batch Command:
import random
import time

rand_num = random.randint(0, 1)
time.sleep(5)
if rand_num == 0:
    print("It's 0!")
else:
    raise Exception("It's 1!")
Here, distance is a norm function such as the L2 norm, content is a function that
takes an image and computes a representation of its content, and style is a
function that takes an image and computes a representation of its style. Minimizing
this loss causes style(generated_image) to be close to style(reference_image), and
content(generated_image) to be close to content(original_image), thus achieving
style transfer as we defined it.
A fundamental observation made by Gatys et al. was that deep convolutional neural
networks offer a way to mathematically define the style and content functions.
Let’s see how.
Introduction
Generative Deep Learning
As you already know, activations from earlier layers in CNN contain local
information (think: “small features”) about the image, whereas activations from
higher layers contain increasingly global, abstract information.
Formulated in a different way, the activations of the different layers of a convnet
provide a decomposition of the contents of an image over different spatial scales.
Therefore, you’d expect the content of an image, which is more global and abstract,
to be captured by the representations of the upper layers in a convnet. A good
candidate for content loss is thus the L2 norm between the activations of an upper
layer in a pretrained convnet, computed over the target image, and the activations
of the same layer computed over the generated image.
This guarantees that, as seen from the upper layer, the generated image will look
similar to the original target image. Assuming that what the upper layers of a
convnet see is really the content of their input images, this works as a way to
preserve image content.
So essentially we pass the content image and the generated image and compare the
activations at some high layer in the CNN architecture.
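A minimal sketch of that idea (the tensor shapes are illustrative):
import tensorflow as tf

def content_loss(target_activations, generated_activations):
    # L2 distance between an upper layer's activations for the content image
    # and the same layer's activations for the generated image
    return tf.reduce_sum(tf.square(generated_activations - target_activations))

a = tf.random.normal((1, 14, 14, 512))   # stand-in for the content image activations
b = tf.random.normal((1, 14, 14, 512))   # stand-in for the generated image activations
print(content_loss(a, b).numpy())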
Content Loss
Generative Deep Learning
The content loss only uses a single upper layer, but the style loss as defined by
Gatys et al. uses multiple lower layers of a convnet: you try to capture the
appearance of the style-reference image at all spatial scales extracted by the
convnet, not just a single scale.
For the style loss, Gatys et al. use the Gram matrix of a layer’s activations: the
inner product of the feature maps of a given layer. This inner product can be
understood as representing a map of the correlations between the layer’s features.
These feature correlations capture the statistics of the patterns of a particular
spatial scale, which empirically correspond to the appearance of the textures found
at this scale.
Hence, the style loss aims to preserve similar internal correlations within the
activations of different layers, across the style-reference image and the generated
image. In turn, this guarantees that the textures found at different spatial scales
look similar across the style-reference image and the generated image.
Style Loss
Generative Deep Learning
Feature maps alone are not enough to capture style: in "Starry Night" the circles
mostly appear in yellow - these features are highly correlated.
Correlations between feature maps allow the computation of a more abstract notion
of style.
These correlations are captured by using Gram matrices.
Explanation with references to papers: https://www.youtube.com/watch?v=Elxnzxk-AUk
- however note that this video does not explain the underlying reasons and does
not give an intuition on why Gram matrices capture the high-level property of
feature-map correlation. For this we would need to delve into the maths or some
simplified examples (MWEs).
MWE example: extract 4 feature maps from a conv2d(4, kernel_size=(3, 3)) layer -
these are "decorrelated" / separate, so we need to combine them somehow, then compute
the Gram matrix and apply the computed matrix to the style image to see what you
get.
Ref: https://www.tensorflow.org/tutorials/generative/style_transfer#calculate_style
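A minimal sketch of the Gram matrix computation, following the TensorFlow style-transfer tutorial linked above (the random feature maps are only for illustration):
import tensorflow as tf

def gram_matrix(feature_maps):
    # feature_maps: (batch, height, width, channels) activations of one layer
    result = tf.linalg.einsum("bijc,bijd->bcd", feature_maps, feature_maps)
    num_locations = tf.cast(tf.shape(feature_maps)[1] * tf.shape(feature_maps)[2],
                            tf.float32)
    return result / num_locations     # (batch, channels, channels) correlations

maps = tf.random.normal((1, 32, 32, 4))   # e.g. 4 feature maps from a conv layer
print(gram_matrix(maps).shape)            # (1, 4, 4)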
Gram matrix
Generative Deep Learning
In short, you can use a pretrained convnet to define a loss that will do the
following:
Preserve content by maintaining similar high-level layer activations between target
content image and generated image. The convnet should “see” both the target image
and the generated image as containing the same things.
Preserve style by maintaining similar correlations within activations for both low-
level layers and high-level layers. Feature correlations capture textures: the
generated image and the style-reference image should share the same textures at
different spatial scales.
2 Level
1 Chapter
Today you will learn
Python Crash Course
Variables
Text processing
Operators
Comments
User input
Environment variables
Comments
Python Crash Course
Comments:
help when debugging and experimenting with code
help you remember what you wanted to do after some time has passed
help others understand the code (documentation)
are not executed - the interpreter ignores them
Python comments start with #
Multiline comments can also be used.
Ctrl+/ - key combination (hotkeys, comment shortcut)
Ref: https://www.w3schools.com/python/python_comments.asp
Variables
Python Crash Course
What are variables:
A variable is a piece of memory that has a name, a value and a type.
A metaphor to understand them: a box that contains an item. To use that item we
call the variable by its name.
In most languages values are referred to by name, but there are exceptions where we
access a value by its position in a collection (more about that in later lectures),
i.e. with the help of an index.
Ref: https://www.w3schools.com/python/python_datatypes.asp
x = "Jonas"; x[0] vs. "Jonas"[0] - interchangeability between a literal value and a
variable name.
We can find out the data type of a variable with the type() function.
Scalar variables: integer, float, complex, null object, bool. More about numbers:
https://www.w3schools.com/python/python_numbers.asp
Today we will also talk about strings, and in the next lecture about collections
(Set, Tuple, etc.). Of course, you can also create "user defined types" with
object-oriented programming mechanisms. We will talk about those shortly as well.
Variables
Python Crash Course
Integer:
- ~unlimited size (2 ** 128)
- many of the basic numeric notations can be represented: 10, 0b10, 0x10, 0o10
- we can convert numbers with the int() function (bin() if we want a binary
representation): int(3.8) - why is 3.8 converted to 3? (note that int("3.8") raises
a ValueError)
- we can also convert numbers from other number systems: int("1000", 2)
- the reverse conversion is possible with special functions: bin(8)
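A small illustration of the points above:
x = "Jonas"
print(x[0], "Jonas"[0])        # variable name and literal are interchangeable
print(type(3.8), type(True))   # type() tells us the data type

print(int(3.8))                # 3  - the fractional part is truncated
print(int("1000", 2))          # 8  - parse "1000" as a base-2 number
print(bin(8), hex(16), oct(8)) # back to binary / hex / octal notation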
Further explorations
Sequential Data Analysis
For this part create a stock market (or crypto, commodities (gas, oil prices) or
other time series) forecasting model for the stocks we have not forecasted in the
class. Can be historical, old data or it can be modern. Obtain data from any source
you like.
You should use at least one of: RNNs, LSTMs, GRUs (but comparative approach is
recommended).
Write a short paragraph on what you learned while implementing a solution for this
specific task (not part 9 of the course, just the task) (5 sentences / ideas
minimum).
Please provide a link to the Colab notebook / GitHub repo (double check the share
options of the notebook) when finished, for review and evaluation.
2 Level
1 Chapter
Today you will learn
Python Crash Course
Regex
Scraping
BeautifulSoup
Scrapy
Scraping client side rendered pages
Requests-Html (optional)
Selenium
Complex cases
Regex
Regular Expression (RegEx / RegExp) is a sequence of characters that defines a
search pattern. It can be used for finding matching strings, extracting the matched
part, splitting the string on the matched part, permitting the string to be
processed further only if a match was found, and for data validation.
Elementary examples: check if a person's name does not have any digits in it, if
some id matches a pattern AA-DDDD, extract all emails from a large corpus of text,
validate that incoming data is a MAC address.
To learn regex is to understand the metacharacter syntax and semantics in
isolation, then chain them into more complex patterns.
Regex patterns can be simple: \d{5} … or complex: (?i)(select(?!_)|(?<!u)top|(?<!
%20)from(?!s)|(?<!&)limit|(?<!gmt)offset|(?!s)
Regex patterns are strings. A valid string is a valid regex of itself (disregarding
incorrect metacharacter usage). Why? Because if you search for a string literal - a
word - in some text, you should be able to find the word by that word itself.
Regex processing is performed by the regex engine (inside a programming language) -
so schematically:
Python Crash Course
Regex
Python Crash Course
There are multiple implementations of regex engines: JS, Golang's RE2 and the
standard feature-rich PCRE and PCRE2 implementations. All of them perform the same
basic operations and support the same basic features, but the differences come in
advanced feature support and performance; JS did not support the lookbehind
mechanism for the longest time: https://stackoverflow.com/a/3950684/1964707
Python has the re module and the regex module, of which the latter is preferable:
https://stackoverflow.com/a/7066413/1964707 . Bindings for a very performant
alternative engine called RE2 are also available.
Regex
Python Crash Course
For learning, experimentation and prototyping I would recommend the tool:
https://regex101.com
Let’s see a trivial example - check if there are two spaces in the text.
Regex
Python Crash Course
Task: your friend / colleague comes in with some text output from an old
unmaintained app, like:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem
Ipsum has been the industry's standard dummy text ever since the 1500s, when an
unknown printer took a galley of type and scrambled it to make a type specimen
book... It has survived not only five centuries, but also the leap into electronic
typesetting, remaining essentially unchanged. It was popularised in the 1960s with
the release of Letraset sheets containing Lorem Ipsum passages, and recently with
desktop publishing software Aldus PageMaker.
Array
(
[0] => Lorem Ipsum is simply dummy text of the printing and typesetting
industry
[1] => Lorem Ipsum has been the industry's ever the 1500s, when an unknown
printer took a galley of type and scrambled it to make a type specimen book
[2] => It has survived not only five centuries, but also the leap into
electronic typesetting, remaining essentially unchanged
[3] => It was popularised in the 1960s with the of Letraset sheets Lorem Ipsum
passages, and recently with desktop publishing software Aldus PageMaker
)
Let’s try:
Array → ?
Array|\( → ?
… anything else?
Array\n|\(\s|\)| +\[\d\]\s=>\s+
Regex
Python Crash Course
Usage in script and console / terminal usage w/ grep:
ping 1.1.1.1
ping 1.1.1.1 | grep bytes=
ping 1.1.1.1 | grep -o bytes=
ping 1.1.1.1 | grep -oP 'bytes=\d{1,}'
ping 1.1.1.1 | grep -oP '(?<=time=)\d+ms'
Getting our IP from the console:
curl http://ipecho.net/
curl http://ipecho.net/ | grep -o '<h1>.*</h1>'
curl http://ipecho.net/ -s | grep -oP '(?<=<h1>Your IP is ).*(?=</h1>)'
while true; do curl http://ipecho.net/ -s | grep -oP '(?:\d{1,}\.){3}\d{1,}' ; done
1. Antanas, 55
2. Petras, 99
3. Jonas, 66
(\d{4}).+\1
(?:GET|POST) (\d{4})(?:-\d{2}){2} \w+\1\w+
Regex
Python Crash Course
Named references allow adding names to references (using named captured groups):
for symmetric patterns ^(?P<first>\w)(?P<second>\w)(?P=second)(?P=first)$
for repeating patterns (?P<first>.)(?P<second>.)(?P=first)(?P=second), but we would
prefer: (\w\w){2,}
You can refer to a named reference in the same pattern.
Ref: https://www.regular-expressions.info/refext.html
Example:
aa Tarzan "TarzanTarzan" "Tarzan" aaa "Jonas"
(?<=\")(?P<x>Tarzan)(?P=x)(?=\")
Regex
Python Crash Course
Greedy vs. lazy quantifiers
We have a comment section on a news site (delfi.lt?). We want to detect people
trying to post HTML tags in our comment section: <p>Labas</p>; however, we only want
to clean the tags - we will allow the text between the tags.
In this context it is not the regex that is greedy (or lazy) but the quantifiers!
Specifically: +, *
By default these quantifiers are greedy - they match as much as possible while the
pattern still allows a match. Try: <.*>
To make them lazy: <.*?> <.+?> …. or <.{1,}?>
Text: aa Tarzan "Tarzan" Tarzan <h1>"Tarzan"</h1> <p>aaa Tarzan</p> "Tarzan"
Usually we want the global flag (find all) to be on when using lazy quantifiers -
to find all matches.
Regex
Python Crash Course
Lookaheads and lookbehinds (lookarounds) - only match pattern if before and/or
after we have another pattern (subpatterns)
(?=) → look-ahead (add after the pattern): {pattern}(?={pattern_ahead})
(?<=) → look-behind (add before the pattern): (?<={pattern_behind}){pattern}
(?!) → negative look-ahead - match the pattern only if the string does not match
the subpattern specified in the lookahead
(?<!) → negative look-behind - match the pattern only if the string does not
match the subpattern specified in the lookbehind
Question: what is the difference between just specifying quotes " and looking
around for quotes? Answer: the lookaround mechanism does not include the
subpatterns it contains in the match.
Example1: Text: aa Tarzan "Tarzan" Tarzan "Tarzan" aaa Pattern: (?<!\")Tarzan(?!\")
Example2: For comparison (with a backreference): (?<!")Tarzan|Tarzan(?!") vs. (?
<!")(Tarzan)|\1(?!") - this is not equivalent to the prev.
Example3:
Regex
Python Crash Course
(cont.)
Example 4: lookbehinds do not support varying-length patterns (at least in Python's
re module):
Pattern: (?:(?<=<.>)|(?<=<..>)).*?(?=</.{1,}>) - 308 steps, (... or this one: (?
<=>)[a-zA-Z\"]+? ?[a-zA-Z\"]+?(?=<) - 163 steps. Could we compare the generality of
these two patterns?)
Text: aa Tarzan "Tarzan" Tarzan <h1>"Tarzan"</h1> <p>aaa Tarzan</p> "Tarzan"
Goal: match only content between tags, make it as general as possible
(?<=<[a-z]?>).*?(?=</.{1,}>)
(?<=<[a-z]{1,2}>).*?(?=</.{1,}>)
(?<=<\w{1,}>).*?(?=</.{1,}>)
(?<=<\w+>).*?(?=</.{1,}>)
(?<=<(\w\w|w)>).*?(?=</.{1,}>)
(?<=<[a-z]?>) = (?<=<[a-z]>)|(?<=<>) (equivalent)
Regex
Python Crash Course
Short note on performance:
Main principle 1: use regex only when you must - when the pattern is complex.
Main principle 2: be as precise as possible.
Avoid regex for simple tasks; not using regex is often the best way to optimize
regex.
The tips to improve the performance of the regex itself are in the brown box,
see: https://www.youtube.com/watch?v=EkluES9Rvak
The hardest thing for a regex engine is to establish that a string does not match
the pattern fully, when the beginning is matching and the engine keeps going back
to see if it can match when starting from the next character. In the general case
this is called backtracking and it can be catastrophic for performance:
https://www.regular-expressions.info/catastrophic.html … ReDoS
One good way to optimize a regex is to count the steps it takes:
Regex
Python Crash Course
Understanding complex regexes:
You get a complex regex - how can you understand it in order to fix it or enhance it?
We can fix 3 things in a regex: FPs, FNs and performance.
Steps to take (divide and conquer):
Split the alternatives (|) and other separable parts
Isolate the parts (you need to know what cannot be semantically separated - atomic
syntax units)
Create / synthesize / find data for TPs - the more data the better
Create / synthesize / find data for FPs/FNs - where the current regex does not work
Correct the part / piece that should match but does not, or vice versa
Reassemble the regex from the pieces
Example 1 (JS): (?<!{[^}]*)Tarzan(?=\s+{[^}])
Example 2 (): ^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
Could you create matches for the example regex above?
And non-matches that only differ by one symbol (but not length)?
Why is the last dot ( . ) included?
Can we make this better (will it work for .co.uk ?)
Regex
Python Crash Course
Unicode and regex
Ref: https://www.regular-expressions.info/unicode.html
Pattern: [д-я]+ , text: дсвфдвфбф
Pattern: [弛她]+ , text: 弛她弛她弛她弛她弛她弛她弛她
Regex
Python Crash Course
Exercises:
https://alf.nu/RegexGolf
https://regexone.com/
https://www.hackerrank.com/domains/regex
Reference materials:
https://www.regular-expressions.info - probably no. 1 resource.
https://regexone.com/references/python
https://developers.google.com/edu/python/regular-expressions
https://www.w3schools.com/python/python_regex.asp
Regex
Python Crash Course
Interesting example:
https://alf.nu/RegexGolf
Split the regex and understand the parts
Combine them in various ways
Understand the problem (we match all the cases with an even number of x's and then
do a negative lookahead)
To match an even number of x's: ^(..+)\1$ and then just add the lookahead
This regex: ^(.+)\1$ - matches any string of x's of even length (the first half is
captured and the backreference requires an identical second half):
Regex
Python Crash Course
Interesting example 2:
Match only lines which contain the string “as” at the word boundary exactly 3 times
Example string:
Mindaugas Jonas Antanas Stasė
Mindaugas Antanas Stasė
Jonas Antanas Stasė Mindaugas
Regex: .+(?:(as\b).+?\1\b.+?\1\b)(?:.+)?
Regex
Python Crash Course
Regex and python:
Ref: https://realpython.com/regex-python/ → re.DEBUG
Ref: https://realpython.com/regex-python-part-2/
Ref: https://www.w3schools.com/python/python_regex.asp
Intro to 3rd party regex module:
https://learnbyexample.github.io/py_regular_expressions/regex-module.html
A note on re.compile() - you usually don't need to do it; there are no real
performance benefits:
https://realpython.com/regex-python-part-2/#why-bother-compiling-a-regex
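A small illustration of these mechanisms with the built-in re module (the patterns and text reuse the Tarzan examples above):
import re

text = 'aa Tarzan "Tarzan" Tarzan "Tarzan" aaa'

# find "Tarzan" only when it is NOT wrapped in quotes (negative lookarounds)
print(re.findall(r'(?<!")Tarzan(?!")', text))      # ['Tarzan', 'Tarzan']

# lazy quantifier: strip HTML-like tags but keep the text between them
print(re.sub(r"<.*?>", "", "<p>Labas</p>"))        # 'Labas'

# named groups + backreferences for a symmetric 4-letter pattern
print(re.match(r"^(?P<a>\w)(?P<b>\w)(?P=b)(?P=a)$", "abba") is not None)  # True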
Scraping
Python Crash Course
Web Scraping is the activity of gathering data from a web page without using an API,
almost always done via a robotic process / code that is called a web scraper / bot
/ scraping bot. It can be thought of as automated browsing (crawling) + data
gathering / copying.
Unlike screen scraping, which only copies pixels displayed on screen, web scraping
extracts the underlying HTML code and, with it, data stored in a database (or we can
process the data using OCR technologies). The scraper can then replicate
entire website content elsewhere.
Why do it?
There is money in web scraping
Every webpage is just data (almost … games are an exception, some tools might not
be considered data) - analysis
There is no available web API
Use cases:
HiQ Labs and talent/HR analytics - who might be at risk of leaving the company?
competitive automated pricing and cheapest product search (autoplius / autogidas /
ebay / aliexpress / amazon / aruodas.lt)
pricing history (prove that companies raise prices before holiday discounts)
lead generation - search for businesses / people that need your service (skelbiu.lt,
skelbimai.lt), looking for jobs and sending CVs automatically
keyword research - scrape competitor websites to compare keywords for your website,
or Google to see improvements after SEO optimization
… science, product research, job descriptions, stock analysis, news analysis,
facebook/twitter user tracking/profile building ("vatnik detector").
search engines are fundamentally web scrapers (or at least crawlers; crawling is
just visiting each page of a website to gather info about links).
aggregations/combinations of previously mentioned services…
Example:
Extract data from all the car selling sites in Lithuania for a particular model,
sort the listings by price, send a newsletter each morning.
Students often implement a job-search stats website (or a script).
Python Crash Course
Scraping
Python Crash Course
There are many libraries available (Python is probably the leading language in
"tool making / automation" programming):
Requests + Beautiful Soup → Beautiful Soup is like the ElementTree library
for XML, but it's for HTML
Scrapy → for serious projects.
Selenium / Puppeteer / Playwright → CSR (client-side rendered) apps.
Scraping
Python Crash Course
Scraping is a multi-part process:
→ making requests (... until you get the response: auth, redirects …)
→ parsing and processing/gathering the information
→ saving it / using it
→ selector tuning
… using it
Selectors
The biggest part of the work when scraping is tuning selectors!
CSS selectors: tag, #id, .class, other attributes:
https://www.w3schools.com/cssref/css_selectors.php
XPath
Determining the selector can be made easier by using the browser:
Changing the style with the browser for improved targeting
Getting the selectors with Chrome
Scraping
Python Crash Course
CSS selector vs Xpath selector
Selectors need to be balanced: precise but not too specific in order to not be
affected by changes in the UI structure or design.
Compare: /html/body/div[1]/div/div/main/div/article/a/div[2]/div[2]/span/span/
span/span vs. //*/div[2]/span/span/span/span
Scraping
Python Crash Course
Sometimes, to bypass some protections, you need to disable JavaScript (the Google
Slides website itself serves as a good demo).
Scraping
Python Crash Course
Autoplius sometimes opens the page in another tab to disallow inspecting the
request details (presumably). To work around that we remove the target="_blank"
attribute in the HTML - the page will then open in the same tab.
Scraping
Python Crash Course
The flow of scraping is usually like this: you open the page > get familiar with
it > start scripting the scraper > .... so exploration is key.
Sometimes, when scraping tools are involved, you should save the returned HTML to a
file and test the selectors against it, in order to not inadvertently start blasting
the site with requests from a bot that is under development.
And certain websites can detect that a robot is browsing and return a different
version of the site, like Google. If you start tuning your selectors in the browser,
those selectors can be completely useless!
You can check that by issuing a curl request:
curl https://www.google.com/ -o goo.html
curl https://uzt.lt/ -D - (does not work in Google Colab, but does from a regular
PC)
Scraping
Python Crash Course
The legality and ethics of scraping:
Laws are vague, but you must seek legal advice if you want to do large-scale
scraping as a commercial activity.
The legality depends on your exact web scraping application - why you scrape and
what you do with the data after scraping.
Note that if you interpret web scraping as automated browsing - which in many cases
it is - you could hire someone to just go through the webpages and save the data in
a database. It would accomplish the same thing. So there is nothing unethical in web
scraping inherently. However, if you think it would be unethical for some person to
do something you are doing in an automated way, then it might be unethical - but
still, not because it's automated (e.g. stealing songs and publishing them as your
own).
If you are scraping a public website for non-commercial use and you are not
claiming that the content is yours, then you should be fine. If you are breaking
into private networks to scrape, then you are in legal trouble, but not because of
scraping.
Do not (D)DoS the website with your scraper.
See: https://prowebscraper.com/blog/is-web-scraping-legal/
Scraping
Python Crash Course
See (hiq vs. linkedin): https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn ,
newest: https://techcrunch.com/2022/04/18/web-scraping-legal-court/
Nord Pool case: https://www.nordpoolgroup.com/en/About-us/terms-and-conditions-for-
useofwebsite/
Video about the legality of web scraping: https://www.youtube.com/watch?
v=8GhFmQPZAlo
import requests
from bs4 import BeautifulSoup

URL = "https://www.google.com/search?q=python+jobs"
resp = requests.get(URL)
soup = BeautifulSoup(resp.content, 'html.parser')
print(soup)
with open('goog.html', 'w', encoding='utf-8') as f:
    print(soup, file=f)
BeautifulSoup
Python Crash Course
Demo: attribute selector.
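A small, self-contained sketch of descendant and attribute CSS selectors with BeautifulSoup (the HTML snippet is made up for illustration):
from bs4 import BeautifulSoup

html = """
<div id="jobs">
  <article class="job"><h2><a href="/j/1">Python dev</a></h2></article>
  <article class="job"><h2><a href="/j/2">ML engineer</a></h2></article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.select("#jobs article h2 a"))             # descendant CSS selector
print(soup.select('a[href="/j/2"]'))                  # attribute selector
print([a.get_text() for a in soup.select(".job a")])  # class selector + text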
Exercise:
Find a website - any website.
Use descendant CSS selector to select the element of your choosing
Use attribute selector to select the element of your choosing
Use Xpath to select the element of your choosing
Scrapy
Python Crash Course
Scrapy is
a framework** for crawling / scraping websites and extracting information
it is used for complex scraping applications
it has a CLI for creating and managing Scrapy projects
it also has a template to build HTML spiders
also a shell for troubleshooting and refining selectors
supports CSS and XPath selectors (unlike BS4, which has no XPath support)
has middleware code for cookies, caching, redirects, rate limiting.
** Note that a framework usually implies that it's bigger and more complete than a
library. However this is not necessarily the case. What is the case is that with a
framework you write your code "inside" it according to the conventions of that
framework and then the framework calls your code ("inversion of control"). With
libraries it's the reverse - you call the functionality you need from the library.
That is the main difference.
Scrapy
Python Crash Course
Scrapy workflow:
… general prep steps: check if an API is available, robots.txt, terms of use and so on.
Install - globally or in a venv
Create project - frameworks usually generate some project structure according to the conventions of the framework
Analyze the website of interest - find the site and choose the information that is interesting to you (product name and qty, price)
Generate the spider skeleton - your spider should inherit from scrapy.Spider (check if https is used)
Do a test crawl or runspider - you have to be inside the project directory, where scrapy.cfg is, to launch these commands
Refine the selectors - using the scrapy shell or the browser
Implement the parse() method - to get the necessary data you want (see the minimal sketch after this list).
Run, inspect, adjust the selectors until the result is satisfying (one of the most costly steps in a scraping project is selector tuning, along with the page-visiting algorithm for complex cases, e.g. visiting each item page when information is missing on the listing page).
Save the results if needed to any format you like (multiple feed exports are supported: https://docs.scrapy.org/en/latest/topics/feed-exports.html).
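To make the workflow concrete, here is a minimal sketch of a spider. It assumes the public practice site quotes.toscrape.com (the selectors below match that site); your real spider will use the selectors you refined in the scrapy shell:

# spiders/quotes_spider.py - a minimal sketch
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"                                  # used by: scrapy crawl quotes
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # one item per quote block on the listing page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # follow pagination until it runs out
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

Run it with e.g. scrapy crawl quotes -o quotes.json from the directory containing scrapy.cfg.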
Scrapy
Python Crash Course
Scrapy architecture:
See: https://docs.scrapy.org/en/latest/topics/architecture.html?
highlight=architecture
Engine → invokes and processes everything, the coordinator
Spiders → define how the content is parsed and extracted
Item pipelines → post-process extracted items (cleaning, validation, storage)
Scrapy
Python Crash Course
Installation
pip install scrapy → (can be system level install)
Launching scrapy:
scrapy startproject <project name> → generates the project
write the spider that will be in the spiders directory
scrapy crawl <site_name> -o <file_name>.[csv/json/xml/pickle]
we’ll see more in the code
Other commands:
scrapy → help
scrapy version → version
scrapy bench → bench local webserver (2400p/min is normal)
scrapy settings → settings, returns nothing at first
scrapy view <url> → download and view page locally (only .html is
downloaded)
scrapy fetch --nolog <url> > file.html → download page locally
… you can download the full page (almost full page) via “save as” in the browser.
Scrapy
Python Crash Course
The difference between runspider and crawl commands?
Scrapy
Python Crash Course
Saving to excel?
Scrapy
Python Crash Course
Scrapy
Python Crash Course
Note - when scraping items from products/inventory/articles pages (pages that display multiples of the same thing), ensure you are mapping all the pieces of information for one item correctly. If you select them independently you might end up with 100 titles but 97 prices - and the question “where did the mismatch come from” can take a lot of time to figure out.
Scrapy
Python Crash Course
Advanced usage:
Disobeying robots.txt - robots.txt is a file that can be found at the root of the domain, e.g. youtube.com/robots.txt. This file contains instructions for bots on how to parse/scrape/crawl the website and index its contents. It is mainly used by friendly robots like googlebot, bingbot, yahoobot, etc. Although it is advisable to follow its instructions, there are certain exceptions - it is sometimes impossible to obtain the information needed without disobeying. Scrapy obeys the robots.txt file by default; we can switch that off in the settings (see the settings sketch at the end of this list).
Extensions: Throttling with AutoThrottle :
https://docs.scrapy.org/en/latest/topics/autothrottle.html
Items - classes representing something that is extracted:
project_root/project/items.py.
Item Pipelines - used for dropping empty values, cleaning, validating, storing:
https://docs.scrapy.org/en/latest/topics/item-pipeline.html
Bypassing the european cookie consent pages:
https://stackoverflow.com/questions/32623285/how-to-send-cookie-with-scrapy-
crawlspider-requests
Client side rendered (CSR) pages - two choices splash and playwright integration:
https://docs.scrapy.org/en/latest/topics/dynamic-content.html#pre-rendering-
javascript
You need a driver for the browsers you are going to use, they can be found in many
places:
https://www.selenium.dev/downloads/
https://github.com/mozilla/geckodriver/releases (gecko driver is the firefox one)
https://chromedriver.chromium.org/downloads (old; with ChromeDriver you have to use a version that matches your browser version)
https://googlechromelabs.github.io/chrome-for-testing/known-good-versions-with-downloads.json (new link for Chrome drivers)
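As a quick illustration of the settings mentioned above (robots.txt handling and AutoThrottle), a project's settings.py might contain something like the following. The setting names are real Scrapy settings; the values are illustrative only, not a recommendation to ignore robots.txt:

# settings.py (fragment) - illustrative values only
ROBOTSTXT_OBEY = False          # Scrapy obeys robots.txt by default; switch off consciously

# AutoThrottle: adapt the request rate to the server's response times
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5    # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 60     # maximum delay under heavy latency
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

USER_AGENT = "my-research-bot (contact: [email protected])"   # identify yourself politely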
Complex cases
Python Crash Course
There are some things that are quite hard to scrape; common obstacles / advanced cases:
Choose the right tool: requests + bs4, scrapy, or selenium / requests-html when you
have a CSR. Target dictates!
If you don’t hit the elements you see in the browser with the scraper, try using incognito mode - this will let you see the page closer to what your scraping tool sees.
First load problem (cookie acceptance dialogs). Change the window size so the elements become visible.
Inject the cookies if needed (Scrapy and Selenium do that automatically).
Optimization with Selenium - inject the cookie acceptance policy to skip the acceptance dialog.
JPEG, Video scraping is a thing: https://www.bloomberg.com/graphics/2019-tesla-
model-3-survey/, https://stackoverflow.com/a/37821542/1964707
User agent spoofing - a simple trick that can get through a lot of simple protections (see the sketch after this list)
User agent rotation - a more advanced trick that can get through more advanced protections
Handling form data (forms sending GET vs. POST, sending a CSRF token with POST requests, Selenium handles CSRF)
Authentication, re-authentication - you need to know how to login to reach some
data.
Captchas - most of the time impossible to bypass, even with machine learning (reCAPTCHA). But the flow can be partially automated: just let the client receive a notification that they need to pass the captcha. Or try automated services: https://github.com/madmaze/pytesseract , https://www.deathbycaptcha.com/ , https://anti-captcha.com/ . Windows API for mouse movement. Robot arm:
https://www.ebay.com/sch/i.html?
_from=R40&_trksid=p4432023.m570.l1313&_nkw=robot+arm&_sacat=0
Rotating IPs with proxies, with scrapy it’s effortless, more work with other
frameworks, see: https://free-proxy-list.net/ , see: https://www.zyte.com/smart-
proxy-manager/ . See: https://stackoverflow.com/a/59410739/1964707
Robots.txt honoring (how to find and interpret).
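A minimal sketch of user-agent spoofing and rotation with requests (the user-agent strings below are just example values):

import random
import requests

# a small pool of example desktop user-agent strings (values are illustrative)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def fetch(url):
    # spoofing: send a browser-like User-Agent; rotation: pick a different one per request
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers, timeout=10)

resp = fetch("https://example.com/")
print(resp.status_code)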
Complex cases
Python Crash Course
Continued …
Prefer APIs if they exist, even if they are not free
Cache the page / reuse the same variable to avoid superfluous http requests
Crawl / scrape during low-load hours or throttle the requests (web apps have a nocturnal/diurnal load cycle)
Rotate crawling patterns (especially if you get blocked, countermeasure)
Copyright - do not republish or redistribute the content
Constantly changing HTML (which we don’t control) - can be solved using selectors
as configuration. Generalize your selectors: //#x/ul/li[5]/p vs. li[5]/p ← the
second one is less likely to change, the chain is shorter.
Pagination and infinite scroll (pinterest) are also cases you might encounter
Honeypot traps: sometimes display: none is used to lure the bot to click some
invisible link and then block the ip (selenium handles this automatically because
visibility matters).
Scraper poisoning (scraped data poisoning) w/ display: none - fake data, invisible to the user, is inserted.
Rotating selectors - randomly generated classes / id’s. Use tag selectors,
hierarchies of them.
Headless mode for server environments and massive scraping.
Screenshots.
Screensizes.
Questions
Python Crash Course
We scraped data from multiple e-shops for the same items, and the names are different in each shop. How do we compare names/strings in that case? Compare:
https://www.benu.lt/sezono-svarbiausi-produktai/bioderma-apsauginis-kremas-nuo-saules-raustanciai-odai-naturalaus-atspalvio-photoderm-ar-spf50-30-ml
https://camelia.lt/odos-ir-plauku-kosmetika/151078-apsauginis-kuno-kremas-bioderma-photoderm-ar-spf50-nuo-paraudimo-30-ml.html
Get structured information: brand, item name, amount in the bottle.
Simple algorithms: diff between sentences based on the number of matching words or the number of matching characters.
Stemming from NLP could be tried: https://towardsdatascience.com/text-cleaning-
methods-for-natural-language-processing-f2fc1796e8c7
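A minimal sketch of the “simple algorithm” comparison using the standard library's difflib; the two name strings are example values loosely adapted from the product pages above, and stemming / NLP cleaning would come on top of this:

from difflib import SequenceMatcher

name_a = "BIODERMA apsauginis kremas nuo saules Photoderm AR SPF50 30 ml"
name_b = "Apsauginis kuno kremas BIODERMA Photoderm AR SPF50 nuo paraudimo 30 ml"

def normalize(s):
    # lowercase and keep only alphanumeric tokens
    return "".join(c.lower() if c.isalnum() else " " for c in s).split()

tokens_a, tokens_b = set(normalize(name_a)), set(normalize(name_b))

# word-overlap score (Jaccard) and character-level similarity ratio
jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
ratio = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
print(round(jaccard, 2), round(ratio, 2))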
Questions
Python Crash Course
Possible to run in linux server w/o a windowing system? YES:
https://stackoverflow.com/questions/68283578/how-can-i-run-selenium-on-linux
Course plan
You can get familiar with it using this link
Additional information
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Python Crash Course
Detailed course plan
You will find the tasks, past slides and so on.
2 Level
1 Chapter
Today you will learn: joins, wide and long data format, multilevel indexes, time series, cleaning data, correlation, exercises, Practical Task P3
Python Crash Course
Joins
Just like in SQL, we have the ability to join two distinct dataframes.
Some call this mechanism “SQL style joins”, as it has some parallels with the joins used in the relational model.
SQL discussion about Joins can be found here:
https://docs.google.com/presentation/d/1yaHEZojmi3CcDDAZIItjSC1oekxTMflV/
More info on merge: https://stackoverflow.com/questions/53645882/pandas-merging-101
Methods:
df1.join(df2) → join on index
df1.merge(df2) → join on any column in multiple ways (more versatile than join)
indicator=True,
how='left',
validate='m:m'
etc.
pd.concat([df1, df2]) → concatenate top-to-bottom, like “union”
Additional examples:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
https://dfrieds.com/data-analysis/melt-unpivot-python-pandas.html
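A minimal sketch of the three approaches on two toy dataframes:

import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "price": [10, 20, 30]})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "qty": [1, 2, 4]})

# merge: join on any column, keep all left rows, mark the source of each row
merged = df1.merge(df2, on="key", how="left", indicator=True, validate="m:m")

# join: index-based join (set the key as the index first)
joined = df1.set_index("key").join(df2.set_index("key"))

# concat: stack top-to-bottom, like a SQL UNION ALL
stacked = pd.concat([df1, df2], ignore_index=True)

print(merged, joined, stacked, sep="\n\n")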
2 Level
1 Chapter
Today you will learn: collections, bytes, list, dict, tuple, set, protocols, control structures (loops, conditions), comprehensions
Python Crash Course
Collections
Python Crash Course
Collections are sets of elements accessible under a single name - a variable representing not one but many values. For now we will not talk about the Python collections module; that is a separate topic. For now we define the word “collections” loosely, as an antonym of scalar variables.
We can distinguish the following collections:
Strings
Bytes
List
Dict
Tuple
Set
… since we have already talked about Strings, we will start with bytes.
Bytes
Python Crash Course
Similar to the String type - if a string is a sequence of Unicode characters, bytes are sequences of bytes.
We can store binary data.
We can store text encoded with a fixed-width encoding, e.g. ASCII.
From time to time we will encounter library methods that return bytes as a result - for example file and network data.
Literal form: b’aaa’
Many string operations are supported - indexing, slicing, iterating.
If we have a byte array and want to convert it into a string, we need to know which encoding was used to encode the bytes.
List
Python Crash Course
A very commonly used, mutable data structure representing a “set” of elements.
Unlike strings, lists are mutable, so append() and other methods operate on the same list.
Indexed from 0 and ordered: items stay in the order you put them in.
We can store heterogeneous types as well as duplicates.
Replace operation - simply assign another value to a list position (lst[5] = “Jonas”)
There is also the del operator and the remove() function.
We can unpack values from a list into named references (variables): a, b = [1, 2] or even name, num1, num2 = sys.argv
Notable methods: sort() vs. sorted() (more on the next slide), reverse() vs. reversed(), append(), insert(pos, val), extend([list2]), pop(), clear() and so on … help(list).
To access values “from the end” we use negative indexes, cf. list[len(list) - 1] == list[-1], but note that list[0] == list[-0], so beware of OBOBs (off-by-one bugs)!
List repetition and concatenation with * and +
More: https://www.w3schools.com/python/python_lists.asp
Implementation:
CPython implements lists as an array of pointers with overallocation, O(1) access
More: https://stackoverflow.com/questions/3917574/how-is-pythons-list-implemented .
Do you know whether this is a cache-friendly data structure?
List copying (see the sketch below):
the difference between copying by reference and by value
list2 = list1[:]
list2 = list1.copy()
list2 = list(list1)  # can be used not only with lists
copies are shallow, so copied lists still see the same objects; if you need a “deep copy” you need to use the copy module.
[0] * 9 also copies the reference, not the values, so after changing an object in a nested list we will see the change in all elements: s = [ [1] ] * 5 ; s[2].append(6)
NOTE: copy.deepcopy() - we can inspect its source code and it will be Python, but the code of the copy() method will already be C.
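A small sketch showing why shallow copies share nested objects while copy.deepcopy() does not:

import copy

inner = [1]
shallow_src = [inner, inner]

shallow = shallow_src.copy()        # same as shallow_src[:] or list(shallow_src)
deep = copy.deepcopy(shallow_src)   # recursively copies the nested objects too

inner.append(2)
print(shallow)   # [[1, 2], [1, 2]] - the shallow copy still sees the mutated inner list
print(deep)      # [[1], [1]]       - the deep copy is unaffected

s = [[1]] * 5    # five references to the SAME inner list
s[2].append(6)
print(s)         # [[1, 6], [1, 6], [1, 6], [1, 6], [1, 6]]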
List
Python Crash Course
Sorting:
The sort() method changes the original list collection (in-place sort).
It uses the timsort algorithm (we will talk about this more in the lecture on algorithms).
Two arguments: key and reverse.
For key we can pass a callable object, and the interpreter will then apply that callable to each item in order to compare all members with each other.
E.g.: list_of_words.sort(key=len)
The sorted() function returns a sorted copy of the list.
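A small sketch of sort() vs. sorted() with the key argument:

words = ["banana", "fig", "cherry"]

by_length = sorted(words, key=len)   # returns a new list: ['fig', 'banana', 'cherry']
words.sort(key=len, reverse=True)    # sorts in place:     ['banana', 'cherry', 'fig']

print(by_length, words)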
Dict
Python Crash Course
A fundamental data structure in Python:
key:value pairs - { k1 : v1 , k2 : v2 }
similar to a map (Java/C#), an associative array (PHP), or a JSON object (JS) in other languages.
Mutable, but we cannot use duplicate key values - keys must be unique.
Since 3.7, added items keep their insertion order (ordered), which is not a big deal, since “addressing” is not by index but by key.
A magic property: constant-time access O(1) - implementation: hashtable. Do you know what is special about the hashtable structure? As for what happens with collisions, Python uses an open addressing collision resolution strategy.
Methods: keys(), values(), items(), pop(), update() (why we need .update() - https://stackoverflow.com/a/70773868/1964707 )
Copying is shallow by default as well.
Keys can only be immutable data types: str, int, float, bytes, tuple, datetime.datetime
More: https://www.w3schools.com/python/python_dictionaries.asp
Tuple
Python Crash Course
An immutable sequence of objects:
The literal form can be initialized with parentheses or without them altogether. There is also the tuple() constructor.
Characteristics: ordered, unchangeable / immutable, allows duplicates
We can use the * and + operators to grow a tuple by repetition and to join two tuples (by the way, we can do this with strings too).
Again, nesting is possible and there are plenty of API methods for various operations.
Tuple unpacking - we assign tuple values to named references: a, b, c = ("apple", "banana", "cherry")
Ref: https://www.w3schools.com/python/python_tuples.asp
Usage:
Dict (multi)key for fast dict access
Fast access / iteration (but not if you need modification)
Unpacking: a = (a, b) # a = a, b # a, b = b, a
Heterogeneous collections: [("Max", "Weber", 55), ("Min", "Power", 44)]
Implementation details:
https://rushter.com/blog/python-lists-and-tuples/
Let’s compare the properties of Tuple and List (on the side). There is also a convention to use tuples for heterogeneous collections, ref: https://stackoverflow.com/a/1708610/1964707
Set
Python Crash Course
An unordered collection of unique elements.
Initialized with { } (if there are values) or with the set() constructor; note, however: python -c "print(type({}))" .
Elements must be immutable, the set itself is mutable.
A common use: efficiently dropping duplicates / keeping only the unique values from a list/tuple.
We can iterate, and we can add to a set with .add() / .update().
… remove with .remove() (raises an error when the item is missing) and .discard() (does not raise)
… what we cannot do is address a member with index notation.
Ref: https://www.w3schools.com/python/python_sets.asp
Set algebra - Venn diagrams:
.union() - commutative operation. Result: set
.intersection() - is it commutative? Result: set
.symmetric_difference() - commutative operation. Result: set
.issubset() - is it commutative? Result: bool
.issuperset() - is it commutative? Result: bool
.isdisjoint() - is it commutative? Result: bool
higher level: union - symmetric difference = intersection
practical example: 2 groups of buyers, 1 - used a coupon, 2 - bought >$1000 (see the sketch below)
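A minimal sketch of the buyer-groups example using set algebra (the names are made up):

coupon_users = {"alice", "bob", "carol"}       # group 1 - used a coupon
big_spenders = {"bob", "dave"}                 # group 2 - bought > $1000

print(coupon_users | big_spenders)             # union: anyone in either group
print(coupon_users & big_spenders)             # intersection: used a coupon AND spent > $1000
print(coupon_users ^ big_spenders)             # symmetric difference: in exactly one group
print(coupon_users.issuperset({"alice"}))      # True
print(coupon_users.isdisjoint(big_spenders))   # False - they share 'bob'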
Protocols and a summary
Python Crash Course
Protocol: https://mypy.readthedocs.io/en/stable/protocols.html
For an object to support a certain protocol, it does not have to implement an interface or inherit from an abstract or concrete class. It simply has to support a certain operation:
Complexity can arise because all of these structures can be nested. However, recalling the Zen of Python (PEP 20), let’s understand that such code is not Pythonic: “flat is better than nested” - better not to nest too “deeply”. Avoid code like this:
The if statement
Python Crash Course
The if statement allows conditional code execution, where one piece of code can be left completely untouched by the interpreter or, depending on the conditions, one piece of code is executed instead of another:
Schematically: if <expr>: - this is the condition head.
Then comes indentation; 4 spaces are recommended.
Everything that is indented is the if body.
If the condition <expr> evaluates to False or falsy, the code in the if body will NOT be executed.
If the condition <expr> evaluates to True or truthy, the code in the if body will be executed.
Using it in command-line one-liners is tricky, although easier on Linux than on Windows: https://stackoverflow.com/questions/2043453/executing-multi-line-statements-in-the-one-line-command-line
Else - used when “one or the other” piece of code must be executed.
Elif - used when a 3rd, 4th, … n-th option appears; it is written between if and else.
Always be mindful to cover all the ranges of values in your conditionals ( … not really OBOBs, but similar)
… if age < 15
… elif age > 15 || elif age >= 15
Multiple chained ifs (flat) are better than nested ifs.
Eliminating nested ifs using data structures: https://colab.research.google.com/drive/19unAvFJZu1qPrPHrYO7xLP1DWKgs4qA-?usp=sharing
05 Image representation in CV
06 Questions and what’s next
In this section we have 3 main goals:
- learn how images are represented and preprocessed in ML
- learn about a new type of neural network called the Convolutional Neural Network (including how to use it)
- learn a new technique called transfer learning!
In future lectures you will see that we will use CNNs for Inverse Image Search (part 8), and we will revisit some of the same topics in Advanced Computer Vision (object detection and segmentation, part 13).
After this part you will know how to:
Prepare images for training ConvNet
Train a FCFFNN for image classification and compare it to ConvNet
Understand why FCFFNN is not sufficient (this is a common interview question)
Train and optimize a ConvNet (in all 3 popular frameworks: TFK, Torch, Fast.ai)
Use transfer learning to improve your models
Structure of Part 7
Computer vision and image classification
People ingest ~85% of information about the world through their eyes.
What if we could make computers that understood visual information and acted upon
it?
That is precisely what computer vision is about - making computers see things and act on that information.
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do (and maybe even can’t do - going above and beyond).
Computer vision tasks include: processing, analyzing and understanding digital
images, extraction of high-dimensional data from the real world in order to produce
numerical or symbolic information and even actions / decisions based on the
information.
Computer vision data can take many forms, such as video sequences, views from
multiple cameras, multi-dimensional data from a 3D scanner or medical scanning
device (CT scan, CAT scan, x-ray scan, MRI scan). Spectrum: greyscale 28x28 images
to huge 3D images and videos.
The technological discipline of computer vision seeks to apply its theories and
models to the construction of computer vision systems.
What is computer vision
Computer vision and image classification
Sub-domains of computer vision include scene reconstruction, event detection, video
tracking, object recognition, 3D pose estimation, motion estimation, visual
servoing (servo motors), 3D scene modeling, and image restoration, image
classification.
Computer vision is not a single thing! There are many interesting problems that are
being solved with a variety of different techniques.
What is computer vision
Computer vision and image classification
Applying filters and other image processing techniques. They can be used by
computer vision, but only as an intermediary step in understanding the image.
Certainly not design, working with photoshop or photography (cameras can use
mechanisms derived from computer vision).
Don’t confuse image processing / retouching / digital art creation / graphic design
and computer vision.
What computer vision is NOT
Computer vision and image classification
Barcode scanner → Amazon go: https://towardsdatascience.com/how-the-amazon-go-
store-works-a-deep-dive-3fde9d9939e9
Parking meter → Pay-by-Plate: https://viso.ai/computer-vision/automatic-
number-plate-recognition-anpr/
Medical diagnostics → cancer recognition / prediction.
Applications
Computer vision and image classification
https://hackernoon.com/a-brief-history-of-computer-vision-and-convolutional-neural-
networks-8fe8aacc79f3
Most important milestones:
Cats experiments (~1950)
1957 images to numbers
3D to 2D scanning (~1960)
1982 established that vision is hierarchical
CNN (1985-1989, LeCun applies backprop to Fukushima’s CNN)
AlexNet (2012) - not the first in any respect, but probably the most impactful historically: ILSVRC.
Remember. The target in AI is:
Human level performance
Super human performance
Deep learning for computer vision
Computer vision and image classification
A CNN that beat its competition.
Designed by Alex Krizhevsky and published with Ilya Sutskever and Krizhevsky's
doctoral advisor Geoffrey Hinton.
AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge on
September 30, 2012. The network achieved a top-5 error of 15.3% (top-1 and top-5
error rates explained: https://stats.stackexchange.com/a/156515/162267 ), more than
10.8 percentage points lower than that of the runner up.
Many consider this the starting point of the modern computer vision boom.
Deep learning for computer vision
Computer vision and image classification
The high level process is the same (supervised learning):
Take a set of images
Label them / preprocess them
Create CNN
Train it using backprop
Tune and train again
Additional concepts we will need are:
Image representation in CV
Image pre-processing techniques
Deep learning for computer vision
Computer vision and image classification
Constructivist vision - we see in 2D, but construct a 3D model of the world.
Our brain constructs, derives 3D from 2D based on:
Size of objects, remembered or inferred from similar objects
Their ordering in space (closer / further)
most importantly stereoscopy (2 x 2D images produce stereoscopy).
Do we need two cameras for depth in CV? Not really. Link
The brain constructs a lot of what we see - colors in the peripheral vision, no blind spot.
Colors. There are several color systems: RGB, CMYK
We see RGB: https://www.youtube.com/watch?v=l8_fZPHasdo
How do we see?
Computer vision and image classification
continued…
How do we see?
Computer vision and image classification
Remember, all the qualitative data in DL needs to be reduced to quantitative data.
Images are comprised of pixels.
Each pixel is comprised of RGB values (if it’s a colored, multichannel image) or just one value if it is a gray-scale image (single-channel image).
We represent the set of all the pixels comprising the image as tensors, in both the grayscale and RGB cases. However, in the RGB case the 3rd dimension has 3 elements.
So an image of size 6px by 6px will be represented as a 3D tensor.
Image representation in CV
Computer vision and image classification
Because an image can be represented as a 3D tensor, a bunch of images is a 4D tensor.
We usually feed 4D tensors to NNs when working with image data.
Tensors need to be of the same size, so all images need to be of the same size and have the same number of color channels (for a NN with a single input layer).
What does this tensor of image data mean:
Image representation in CV
Computer vision and image classification
The answer is: it depends, but often it’s like in the picture below.
Note that the number of images is the first number in all frameworks. The meaning of the other numbers depends on the framework you are using!
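A minimal NumPy sketch of these shapes, using the channels-last convention (as in TF/Keras) and showing the channels-first layout PyTorch expects:

import numpy as np

gray_image = np.zeros((6, 6, 1), dtype=np.uint8)   # 6x6 grayscale image, 1 channel
rgb_image = np.zeros((6, 6, 3), dtype=np.uint8)    # 6x6 color image, 3 channels (R, G, B)

batch = np.stack([rgb_image] * 32)                 # a batch of 32 images -> 4D tensor
print(gray_image.shape, rgb_image.shape, batch.shape)   # (6, 6, 1) (6, 6, 3) (32, 6, 6, 3)

# PyTorch-style channels-first layout for the same batch: (N, C, H, W)
batch_chw = np.transpose(batch, (0, 3, 1, 2))
print(batch_chw.shape)                             # (32, 3, 6, 6)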
Image representation in CV
Computer vision and image classification
We could learn to classify images and perform even more demanding tasks with already preprocessed images from well known datasets (MNIST (28x28x1, 70K, symmetric, handwritten digits, 10 classes), Fashion MNIST (28x28x1, 70K, symmetric, clothes, 10 classes), CIFAR-10/100 (32x32x3, 60K, symmetric, common items, 10/100 classes), ImageNet, Traffic Signs, etc.). But these techniques will be necessary when doing work on new images.
Image preprocessing can improve your models. Sometimes it even enables them to work!
We will discuss 6 common techniques. There are many more!
Image preprocessing techniques
Computer vision and image classification
Uniform aspect ratio: the ratio of the width to the height of an image or screen.
Since it’s a ratio, that means it’s division: width / height = aspect ratio.
Uniform image size: CNN uses feature maps to extract features from images.
You might need to rescale the images according to the feature maps that you are
going to use (downscaling, sometimes upscaling).
Image preprocessing techniques
Computer vision and image classification
Mean image subtraction. To make models more robust, so that they work with noisy images, we can use the mean image technique. Calculate the mean and then subtract it from all images. An additional benefit of this technique is making uneven backgrounds in the images more even - reducing the chance that our model will take lighting and background color as a very important feature.
Calculate the mean image, inspect it.
We can subtract the mean image from all our images (2 techniques):
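A minimal NumPy sketch of mean subtraction over a batch of images, assuming the two techniques are the full mean image and the per-channel mean:

import numpy as np

# pretend dataset: 100 RGB images of 32x32, values in [0, 255]
images = np.random.randint(0, 256, size=(100, 32, 32, 3)).astype(np.float32)

mean_image = images.mean(axis=0)                 # per-pixel, per-channel mean -> shape (32, 32, 3)
centered = images - mean_image                   # technique 1: subtract the full mean image

per_channel_mean = images.mean(axis=(0, 1, 2))   # technique 2: one mean per channel -> shape (3,)
centered_pc = images - per_channel_mean

print(mean_image.shape, round(float(centered.mean()), 3), per_channel_mean.shape)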
Data augmentation - when you train your network you can add more data to the training set by perturbing, flipping (one of the safest, except when asymmetries apply), adding noise, rotating, shifting (geometric translation & crop) and applying other techniques.
This will make your model more robust and give you more data. So the definition of augmentation is: a way of generating new data from existing data by applying some transformations to the data. Be careful not to create data that cannot exist!
Data augmentation is achieved by applying transformations to images. We need to distinguish that transformations can be applied without augmenting the dataset (transformations “in place”). Be careful not to assume augmentation when using frameworks. Usually augmentation needs to be specifically programmed, examples in PyTorch and Keras:
https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_illustrations.html#resize and
https://keras.io/api/layers/preprocessing_layers/image_augmentation/random_brightness/
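A minimal torchvision sketch of a few common augmentations (the transform names are real torchvision APIs; the parameter values and the dataset path are illustrative assumptions):

from torchvision import transforms

# applied on the fly to each training image - the dataset on disk is not changed
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),            # safe unless asymmetry matters
    transforms.RandomRotation(degrees=10),             # small rotations
    transforms.ColorJitter(brightness=0.2),            # lighting perturbation
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# typical usage (path is hypothetical):
# dataset = torchvision.datasets.ImageFolder("data/train", transform=train_transforms)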
Image preprocessing techniques
Computer vision and image classification
You can find a list of real augmentation techniques performed in this article:
https://machinelearningmastery.com/best-practices-for-preparing-and-augmenting-
image-data-for-convolutional-neural-networks/
Demos:
Resizing and rescaling
Cropping and denoising
Augmentation
Normalization
ZCA whitening and decorrelation
Image transformations with Pytorch
Normalization for entire dataset per channel
Image preprocessing techniques
Computer vision and image classification
A note on ZCA whitening
Image preprocessing techniques
Computer vision and image classification
Decorrelation
Image preprocessing techniques
Computer vision and image classification
Pytorch image transformations
A good illustrative reference:
https://pytorch.org/vision/main/auto_examples/plot_transforms.html#sphx-glr-auto-
examples-plot-transforms-py
Keras transformations
Keras chose not to include many preprocessing functions; it relies on the Python ecosystem and packages like Pillow. See:
https://keras.io/api/layers/preprocessing_layers/image_preprocessing/ and
https://keras.io/api/layers/preprocessing_layers/image_augmentation/
Fast.ai transformations
Ref: https://fastai1.fast.ai/vision.transform.html (the old documentation is still better) and the new one is here: https://docs.fast.ai/vision.augment.html
Image preprocessing techniques
Computer vision and image classification
Swirling, pixel attacks and making networks more robust.
Ref: https://arxiv.org/pdf/1710.08864.pdf
Ref: https://ipsjcva.springeropen.com/articles/10.1186/s41074-019-0053-3
Image preprocessing techniques
Computer vision and image classification
Next:
We will learn image classification with FCFFNNs
Understand their inadequacies and compare them to Convolutional Neural Networks
that solve the problems with FCFFNNs.
Questions:
Can you pass two different sized images to a CNN? Yes, but they will take different
paths until they are reduced to the same feature vector at the fully connected
layer, ref: https://www.quora.com/How-do-you-create-a-CNN-in-TensorFlow-that-takes-
2-different-sized-images-as-input-Do-these-two-images-pass-through-different-paths-
in-the-network-and-come-together-in-a-fully-connected-layer
Questions and what’s next
Computer vision and image classification
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Random Forests
Introduction to Machine Learning
Boosting (hypothesis boosting) - an ensemble method that combines several weak learners into a strong learner by training predictors sequentially, each trying to correct its predecessor.
There are many boosting methods available, but by far the most popular are AdaBoost
(Adaptive Boosting) and Gradient Boosting.
Stacking
Introduction to Machine Learning
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Recommender Systems
Detailed course plan
Slides, tasks and so on
Additional information
However, projection is not always the optimal solution - take the swiss roll, for example. This is because the swiss roll is a manifold - a 2D shape that is twisted in 3D space.
Image ref: https://www.cnblogs.com/yangxiaoling/p/10658376.html
This is a geometric interpretation of the data.
Introduction to Machine Learning
Many dimensionality reduction algorithms work by modeling the manifold on which the
training instances lie - called Manifold Learning.
Relies on the manifold assumption - most real-world high-dimensional datasets lie
close to a much lower-dimensional manifold.
This assumption is very often empirically observed - for example with MNIST dataset
you have way fewer degrees of freedom than if you were just to generate any random
greyscale image - all handwritten digits are centered, connected, have shapes that
are similar or close - this gives the ability to squeeze the manifold to lower
dimensions w/o losing the important information.
Dimensionality Reduction
Introduction to Machine Learning
The most popular dimensionality reduction technique, by far.
PCA identifies the axis with the largest amount of variance in the training set and uses it to transform the data (preserving maximum variance).
Selecting the axis that preserves the maximum amount of variance will likely lose less information than other projections. Another way to phrase it: it is the axis that minimizes the mean squared distance between the original dataset and its projection onto that axis.
The axis that accounts for the largest variance is called principal component 1 - PC1. PC2 is always perpendicular to PC1, and so on until PCn.
See: https://www.youtube.com/watch?v=HMOI_lkzW08
You can calculate the PCA by using SVD - singular value decomposition. NumPy’s
svd() function can obtain all the principal components of the training set. We can
then extract the two unit vectors that define the first two PCs. PCA assumes that
the dataset is centered around the origin, so subtracting the mean should be done.
Code example:
import numpy as np

X_centered = X - X.mean(axis=0)        # PCA assumes data centered around the origin
U, s, Vt = np.linalg.svd(X_centered)   # singular value decomposition
c1 = Vt.T[:, 0]                        # first principal component (unit vector)
c2 = Vt.T[:, 1]                        # second principal component
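In practice you would usually let Scikit-Learn do this; a minimal equivalent sketch, reusing X from above:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)              # keep the first two principal components
X2D = pca.fit_transform(X)             # centering is done internally

print(pca.components_)                 # the PC axes (unit vectors)
print(pca.explained_variance_ratio_)   # fraction of variance carried by each PC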
Clustering
Introduction to Machine Learning
Applications of clustering:
When exploring datasets - EDA. Run a clustering algorithm when analyzing what you
have and then exploring clusters separately.
Social network analysis - cluster analysis to identify communities, suggest missing
connections between people.
Biology - find groups of genes with similar expression patterns (you have
expression patterns as data, find clusters).
Recommendation systems - identify products or media that might appeal to a user by
clustering the items (same cluster - similar items.) Obviously this would be just
one way to do it out of many.
Marketing - find segments of consumers by their purchasing or browsing behavior,
analyze how they respond to marketing campaigns and targeted discounting campaigns.
Even dimensionality reduction - for some applications we can replace the original data of dimension N with the cluster affinity metrics (how well the instance fits its cluster), which is K-dimensional and usually K << N. Example: only use the cluster centers when classifying new instances instead of the entire dataset!
Anomaly detection - a low affinity score (the distance from the center is high) can indicate an outlier. Same for outlier elimination.
Semi-supervised learning - if we have few labels we can propagate same labels to
all instances of the cluster.
Search engines - cluster and return similar items by cluster (similar news articles
in something like google news).
To segment an image by clustering pixels according to their color then replacing
the color with the mean color of the cluster. You can read more about it here:
https://www.kdnuggets.com/2019/08/introduction-image-segmentation-k-means-
clustering.html
Clustering
Introduction to Machine Learning
K-Means algorithm - simple, quite powerful and pretty fast. Proposed by Stuart Lloyd at Bell Labs in 1957, published outside of the company in 1982. In 1965, Edward W. Forgy had published virtually the same algorithm, so K-Means is sometimes referred to as Lloyd–Forgy / Lloyd, also naive k-means.
How does it work?
Pick a number K (problematic, as we will discuss shortly)
K centroids are initially placed randomly (initialization is problematic and will be discussed more later)
The random placement is chosen by taking k instances as the centers.
Calculate and update the centroids to minimize the sum of distances from each datapoint.
Repeat the process until the centroids stop moving (the sum of distances is minimal).
The algorithm is guaranteed to converge (stop oscillating), but it is not guaranteed to converge optimally.
See: https://www.youtube.com/watch?v=4b5d3muPQmA and https://www.youtube.com/watch?
v=yR7k19YBqiw
Advantages:
Easy to understand
Easy to interpret and verify the results (if dimensionality is low)
Although possible to encounter slow convergence, usually one of the fastest
clustering algorithms (if you encounter slow convergence tune the initialization of
centroids).
Disadvantages:
Sensitive to the initial conditions (initial centroids, even data permutations
gives different results). Solutions are not stable.
Needs tuning for K
Sensitive to clusters with varying sizes, different densities, or nonspherical shapes
K-Means Clustering
Introduction to Machine Learning
Scikit and KMC
We can use make_blobs() function to generate simple cluster data for
experimentation.
Demo: Generating data
Scikit offers a k-means clustering algorithm - sklearn.cluster.KMeans; you specify the number of clusters k the algorithm must find. In some cases it is obvious from looking at the data what K should be, but in many cases it is not; we will talk about how to choose it soon.
Demo: Simple KMC
If you see that instances are incorrectly assigned, you can try soft clustering - which gives each instance a score per cluster; the score can be the distance between the instance and the centroid (the transform() method measures the Euclidean distance from each instance to every centroid). Soft clustering can be used as a dimensionality reduction technique, although that would be an advanced topic.
This is an alternative to hard clustering - when the algorithm assigns a single
cluster - we discussed it above.
Demo: Hard Clustering vs Soft Clustering
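A minimal sketch of the demo pipeline (blob generation, hard assignment, soft distance scores); the parameter values are illustrative:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# generate toy data: 500 points around 5 centers
X, y_true = make_blobs(n_samples=500, centers=5, cluster_std=0.8, random_state=42)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)        # hard clustering: one cluster id per instance

print(kmeans.cluster_centers_)        # the learned centroids
print(kmeans.inertia_)                # sum of squared distances to the closest centroid

distances = kmeans.transform(X[:3])   # soft scores: distance to every centroid
print(distances.round(2))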
K-Means Clustering
Introduction to Machine Learning
Centroid initialization methods
K-Means++ - default in scikit. This is an improvement to k-means with a smarter
initialization step that tends to select centroids that are distant from one
another, and this improvement makes the K-Means algorithm much less likely to
converge to a suboptimal solution (init='k-means++').
Run KMC with random initialization multiple times, keep best solution (use
init="random" parameter). Which solution is best? kmeans.inertia_ - is the
performance metric used by scikit. The algorithm is run n_init (10 by default,
centroid seeds) times, best score is kept. kmeans.score(X) returns negative inertia
because of the “greater is better” rule in scikit when using scoring methods, like
score.
Manual initialization - you can pass approximate initialization for the centroids
by visual inspection:
kmeans = KMeans(n_clusters=5, init=np.array([[-3, 3], [-3, 2], [-3, 1], [-1, 2],
[0, 2]]) , n_init=1)
Demo: centroid initialization. Let’s observe how random initialization can produce
suboptimal solutions.
Choosing K:
Choosing K is not so simple, as inertia is not a good performance metric when trying to choose k, because it keeps getting lower as we increase k (the more clusters there are, the closer each instance will be to its closest centroid, lowering the inertia). It could be used, but it will not result in the optimal solution.
More precise approach (more computationally expensive) is to use the silhouette
score (sklearn.metrics.silhouette_score()), which is the mean silhouette
coefficient over all the instances. An instance’s silhouette coefficient is equal
to (b – a) / max(a, b), where a is the mean distance to the other instances in the
same cluster (i.e., the mean intra-cluster distance) and b is the mean nearest-
cluster distance (i.e., the mean distance to the instances of the next closest
cluster, defined as the one that minimizes b, excluding the instance’s own
cluster). Silhouette coefficient can vary between –1 and +1:
close to +1 means that the instance is well inside its own cluster and far from
other clusters,
close to 0 means that it is close to a cluster boundary,
close to –1 means that the instance may have been assigned to the wrong cluster.
Demo: Choosing K
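A minimal sketch of comparing k values with the silhouette score, reusing X from the previous sketch:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    score = silhouette_score(X, model.labels_)   # mean silhouette coefficient, in (-1, 1)
    print(f"k={k}  inertia={model.inertia_:.1f}  silhouette={score:.3f}")
# pick the k with the highest silhouette score (inertia alone always favours a larger k)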
K-Means Clustering
Introduction to Machine Learning
PCA can be performed before doing the clustering - this helps visualize the clusters on highly dimensional data, since PCA allows us to reduce the dimensionality. Note that feature scaling (normalization or standardization) is usually applied before clustering, as well as before PCA. We do not want our clusters to be determined just because the values are on a different scale; we want them to be determined by geometric distance and closeness to each other, in a situation where the scale of the features does not matter (one of many reasons is that the absolute values are not important - the differences between values are: it does not matter that I earn 10K€ in the year 2025 as much as it would have mattered in 2018. This is an important, general point in data analysis - the values do not matter, the comparison does).
Tasks - these are the tasks related to the actual business problem being solved (as opposed to other tasks like generating and visualizing TS data, which are related to the data, not to the business problem):
Regression - predicting a continuous variable in a time series.
Classification - given a set of time series with class labels, can we train a model to accurately predict the class of a new time series? Example: ECG cardiograms classified by different issues of the heart.
Predicting the next word/letter.
Forecasting - financial asset prices in a temporal space; usually refers to predicting multiple values, not one continuous variable. Forecasting can be described as a prediction of the future based on the past.
Seasonality detection - detecting cyclical variations in time series data that can then be used later for interpretation of the data.
Action modeling in sports - predict the next action in a sporting event like soccer, football, tennis etc.
Speech synthesis - generating speech or text (can be thought of as prediction).
Music composition - https://konstilackner.github.io/LSTM-RNN-Melody-Composer-Website/
Next image/frame generation - https://arxiv.org/pdf/1502.04623.pdf
Sentiment analysis - usually a classification task (this should be considered sequential data only if we are not using BOW models)
Anomaly detection - detecting data points that violate the general pattern that the data follows.
Tasks with time series data
Sequential Data Analysis
These are the patterns TS data can exhibit:
Uptrend / horizontal / steady / downtrend, positive/negative secular/upwards trend. Example.
Regular variation - seasonality (weekly sales). Example.
Irregular variation - cyclical, peaks and troughs are not regular. Example.
Combination: positive secular trend with seasonality. Example.
Random - no identifiable variation (trends sometimes are not counted). Example.
Question - what is the difference between seasonal and cyclical: a seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week). A cyclic pattern exists when data exhibit rises and falls that are not of a fixed period. An example you can remember - short-term and long-term credit cycles: we “know” that these cycles exist, but we don’t know when each cycle will end/begin.
Time series data patterns
Sequential Data Analysis
Explanatory video: https://www.youtube.com/watch?v=rPrJ7sSbTqM
Explanatory SO: https://stats.stackexchange.com/a/234601/162267
VU slides:
http://web.vu.lt/mif/a.buteikis/wp-content/uploads/2019/02/Lecture_03.pdf
Time series data patterns
Sequential Data Analysis
Let’s practice to identify patterns
Time series data patterns
Sequential Data Analysis
Let’s practice to identify trends
Time series data patterns
Sequential Data Analysis
Let’s practice to identify trends
Time series data patterns
Sequential Data Analysis
When TS Analysis is not used
Values are constant.
Values are generated using functions (except when you don't know that).
In short - we are interested in real world data when performing analysis.
Time series data patterns
Sequential Data Analysis
Some raw datasets:
The UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php
The UEA and UCR Time Series Classification Repository:
http://www.timeseriesclassification.com/
Governmental:
https://www.ncdc.noaa.gov/cdo-web/datasets - publishes a variety of time series
data relating to temperatures and precipitation at granularities as fine as every
15 minutes for all weather stations across the country.
https://www.cdc.gov/flu/weekly/overview.htm - publishes weekly flu case counts
during the flu season.
https://fred.stlouisfed.org/ - economic time series data
https://www.comp-engine.org/# - “self organizing database of time-series data” has
more than 25,000 time series databases, almost 140 million individual data points:
“website allows you to upload time-series data and interactively visualize similar
data that have been measured by others”
https://cran.r-project.org/web/packages/Mcomp/index.html (R only)
Note: It can be difficult to learn model tuning on these data sets because they present complicated problems. However, they are good for learning cleaning, visualization, etc. For example, many economists spend their entire careers trying to predict the unemployment rate in advance of its official publication, with only limited success.
For some easier datasets we have these:
Python library and a company: https://www.quandl.com/, now nasdaq (see:
https://github.com/Nasdaq/data-link-python) - this enables faster experimentation
with the concepts.
Things like Google Trends and bitcoin / ethereum exchange rate monitoring tools are also ways we interact with time series data. APIs are the usual choices. There are APIs that give data in time intervals (for the entire time interval) and there are APIs that give only current data (you will need to collect that data yourself over a time period).
Table rows - not hard to generate; images - very hard to generate manually; time series - not hard (for simple cases).
Obtaining sequential data
Sequential Data Analysis
Time series data can be stored in many ways, just to name a few:
File without a format at all.
File with a custom format (.lisp.7z)
File formatted to standard formats: csv (text), xls (binary), xlsx (binary).
APIs (SOAP, REST, GQL)
Different general databases (MySQL, PostgreSQL)
Specific databases for TS
Dedicated TSDB solutions: InfluxDB
Traditional relational databases like PostgreSQL can be used as time series databases with plugins, like: https://www.timescale.com/
Other NoSQL options: Cassandra (more often used for its multimaster clustering capabilities), Mongo
Storing TS data @scale is a complex topic (as anything @scale); if you need to do it I would recommend starting with PostgreSQL + Timescale, researching Influx (it was worse than Timescale in 2016, when we researched it at a company I worked in), and other NoSQL solutions only if you really, really need them.
Recommended reading: https://stackoverflow.com/questions/27002903/efficiently-
storing-time-series-data-mysql-or-flat-files-many-tables-or-files and
https://medium.com/@neslinesli93/how-to-efficiently-store-and-query-time-series-
data-90313ff0ec20
Storing time series data
Sequential Data Analysis
Generating data is an important skill to have:
https://towardsdatascience.com/synthetic-data-generation-a-must-have-skill-for-new-
data-scientists-915896c0c1ae
When you are training a model or testing a database you need simple data with controlled noise injected into it (or any other phenomenon carefully crafted), to simulate different patterns and so on.
Ways of generating the data:
By hand / hardcoded - a simple list or dictionary
Numpy / pandas. For example: pd.date_range(start='1/1/2018', end='01/08/2018', freq='H')
Utilities like TimeSeriesMaker: https://github.com/mbonvini/TimeSeriesMaker or
TimeSynth: https://github.com/TimeSynth/TimeSynth
… and more that can be found on the internet. Sometimes from research papers.
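A minimal sketch of hand-generating a series with a trend, weekly seasonality and controlled noise (all values are arbitrary):

import numpy as np
import pandas as pd

idx = pd.date_range(start="2018-01-01", periods=365, freq="D")
t = np.arange(len(idx))

trend = 0.05 * t                                    # positive secular trend
seasonality = 2.0 * np.sin(2 * np.pi * t / 7)       # weekly cycle
noise = np.random.normal(scale=0.5, size=len(idx))  # controlled noise

series = pd.Series(trend + seasonality + noise, index=idx)
print(series.head())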
Generating time series data
Sequential Data Analysis
Visualization plays an important role in time series analysis and forecasting.
Plots of the raw sample data can provide valuable diagnostics to identify temporal
structures like trends, cycles, and seasonality that can influence the choice of
model. We will see 6 different plots:
Line Plots.
Histograms and Density Plots.
Box and Whisker Plots.
Heat Maps.
Lag Scatter Plots.
Autocorrelation Plots.
The three that are mostly associated with time series data are in bold above.
The focus is on univariate time series, but the techniques are just as applicable
to multivariate time series, when you have more than one observation at each time
step.
Visualizing time series data
Sequential Data Analysis
Box and Whisker Plots - the plot draws a box around the 25th and 75th percentiles of the data, which captures the middle 50% of observations.
IQR - interquartile range.
In the context of stock pricing, similar-looking charts mean a different thing: the candlestick chart.
Visualizing time series data
Sequential Data Analysis
Box and whisker plots do indicate skewness.
Visualizing time series data
Sequential Data Analysis
Lag Scatter Plots - a datapoint at time t has a particular value and it depends on the point at t-1. The points prior to a particular point are called lags.
We can plot t and t+1 on the x and y axes for each graph point - a lag plot.
Pandas has lag plotting functionality: lag_plot(series, lag=n)
When trying to understand lag plots, we can imagine a sliding window moving over the time series, of width equal to the lag size; let’s call it w(t, t+1). If, going from w(1, 2) to w(2, 3), both values (y at t and y at t+1) are increasing, then we have a positive correlation, and the lag plot will have an increasing slope:
… if the values are decreasing going from w(1, 2) to w(2, 3), then we have a negative correlation and the lag plot will have a negative slope (see above).
If the time interval is 1 year, then the biggest correlation will be at lag=1 or lag=365, the largest negative correlation at around lag=180, and the correlation closest to 0 at around lag=90 (not tested).
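A minimal sketch of a lag plot with pandas, reusing the synthetic weekly series from the generation sketch above:

import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

lag_plot(series, lag=1)   # scatter of y(t) vs y(t+1); an upward slope means positive autocorrelation
plt.show()

lag_plot(series, lag=3)   # for a weekly cycle the correlation weakens near half a cycle
plt.show()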
Visualizing time series data
Sequential Data Analysis
Autocorrelation Plots - two variables are said to be positively correlated if, when one increases, the other increases as well, and negatively correlated if, when one increases, the other decreases. The correlation ranges between -1 and 1.
Correlation values, called correlation coefficients, can be calculated for each observation and different lag values. Once calculated, a plot can be created to help better understand how this relationship changes over the lag. This type of plot is called an autocorrelation plot or autocorrelation function plot (ACP or ACF plot), and Pandas provides this capability built in via the autocorrelation_plot() function.
Importantly - autocorrelation plots (and the PACF plots which we will see later) answer the question: how much data do I need to collect until I have captured all of the behaviors of the system (well, at least the predictable behaviors)? Think of a sprinkler with a rotation speed of 1 rotation per minute. If it has constant water flow and pressure, after 1 minute there will be no additional information we can capture.
Partial autocorrelation plots - a summary of the relationship between an observation in a time series and observations at prior time steps, with the relationships of intervening observations removed. The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. PACF only describes the direct relationship between an observation and its lag.
Another reason why the PACF and ACF are important - we will be able to choose AR, MA or ARMA models based on the PACF and ACF distributions.
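A minimal sketch of the ACF/PACF plots, reusing the synthetic series from above (autocorrelation_plot is pandas; plot_acf/plot_pacf come from statsmodels):

import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

autocorrelation_plot(series)   # pandas: correlation coefficient for every lag
plt.show()

plot_acf(series, lags=30)      # statsmodels ACF: includes indirect (propagated) correlations
plot_pacf(series, lags=30)     # statsmodels PACF: only the direct effect of each lag
plt.show()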
Visualizing time series data
Sequential Data Analysis
As with any data analysis task, cleaning and properly processing data is often the
most important step of a data pipeline. Fancy techniques can’t fix messy data
(GIGO). Most data analysts will need to find, align, scrub, and smooth their own
data either to learn time series analysis or to do meaningful work in their
organizations. As you prepare data, you’ll need to do a variety of tasks, from
joining disparate columns to resampling irregular or missing data to aligning time
series with different time axes. We will start with trend detection.
Some techniques:
Smoothing (Hodrick–Prescott) and filtering
Downsampling (common task in time series preprocessing)
Enrichment (adding context e.g. adding GPS coordinates from Wi-Fi positioning
system)
Rolling window statistics
Missing / corrupt data handling: deletion (can’t delete one!), imputation,
interpolation
Missing / corrupt data handling with forecasting
Stationarity: https://www.youtube.com/watch?v=oY-j2Wof51c - stationary time series
is one that has fairly stable statistical properties over time, particularly with
respect to mean and variance. Synonymous with “stability”.
STL - Seasonal and Trend decomposition using Loess (next lecture)
Preprocessing and EDA for time series data
Sequential Data Analysis
Explore additional time series / sequential data preprocessing techniques, like
smoothing techniques.
Additional time series visualization techniques, like time series heatmaps,
calendar heatmaps, see:
http://www.columbia.edu/~sg3637/blog/Time_Series_Heatmaps.html
Further explorations
Sequential Data Analysis
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Function classes:
(Binary) Step Function
Linear Function
Non linear
Sigmoid
Hyperbolic Tangent
Rectified Linear Unit - ReLU
Others...
Ref: https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-
activation-functions-right/
Activation functions
Introduction to Deep Learning
Binary step function aka Heaviside step function
A threshold-based activation function. If the input value is above or below a certain threshold, the neuron is activated and sends exactly the same signal to the next layer.
Problems: it does not allow multi-value outputs, so it cannot support classifying the inputs into one of several categories, or making continuous predictions for regression. This makes a single neuron less powerful than it potentially might be (if y can only be 1 or 0, a NN of 6 output neurons has ~2^6 = 64 states; if y is continuous then there are many more, so the power to represent different states of the world is increased). Second: it is not differentiable at 0.
Linear
y = mx + b
It takes the inputs, multiplied by the weights for each neuron, and creates an output signal proportional to the input. In one sense, a linear function is better than a step function because it allows multiple outputs, not just yes and no.
Problems: backpropagation (gradient descent) cannot be used effectively to train the model - the derivative of the function is a constant and has no relation to the input X, so it is not possible to understand which weights in the input neurons can provide a better prediction. Also, all layers of the neural network collapse into one - with linear activation functions, no matter how many layers the neural network has, the last layer will be a linear function of the first layer (because a linear combination of linear functions is still a linear function). So a linear activation function turns the neural network into just one layer. A neural network with linear activation functions is simply a linear regression model (no matter the size, assuming all activations are the same).
BSF, Linear
Introduction to Deep Learning
Modern neural network models primarily use non-linear activation functions. They allow the model to create complex mappings between the network’s inputs and outputs, which are essential for learning and modeling complex data such as images, video, audio, and data sets which are non-linear or have high dimensionality.
They allow backpropagation because they have a derivative function which is related to the inputs.
They allow “stacking” of multiple layers of neurons to create a deep neural network. Multiple hidden layers of neurons are needed to learn complex data sets with high levels of accuracy.
The representational power of the models is huge compared to linear or BSF functions. Way more internal states.
Common functions falling into this category
Sigmoid
Hyperbolic Tangent (tanh)
Rectified Linear Unit - ReLU (next time)
Leaky ReLU (next time)
Parametric ReLU and others (next time)
Softmax (next time)
Radial Basis Function - RBF (next time)
Swish (next time)
Non Linear Functions, Sigmoid
Introduction to Deep Learning
One of the most popular non-linear activation functions.
We will examine the vanishing gradient and the non-zero-centered output as problems later. For now, note that both are problems for the algorithm used to train the neural network - backpropagation with gradient descent. The vanishing gradient means that large inputs are not translated into proportionally different outputs (the gradient shrinks toward zero at the extremes), and the lack of zero-centering (the outputs are always positive, so their mean is guaranteed to be non-zero) can cause gradient descent to "zig-zag", as described here:
https://stats.stackexchange.com/q/237169/162267
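A small NumPy sketch (ours) of the sigmoid and its derivative, showing how the gradient shrinks toward zero for large inputs - the vanishing gradient mentioned above:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # derivative of the sigmoid

for x in (0.0, 2.0, 5.0, 10.0):
    print(x, round(sigmoid(x), 5), round(sigmoid_grad(x), 5))
# The output stays in (0, 1) and the gradient collapses toward 0 as x grows.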
Non Linear Functions, Sigmoid, TanH
Introduction to Deep Learning
Abbreviated tanh; stands for hyperbolic tangent.
Very popular.
A rescaled and shifted variation of the sigmoid function, sometimes called the "shifted sigmoid"; its output is zero-centered and ranges over (-1, 1).
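The "shifted sigmoid" remark can be made precise: tanh(x) = 2*sigmoid(2x) - 1. A quick numerical check (our sketch):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True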
Non Linear Functions, Sigmoid, TanH
Introduction to Deep Learning
We will create a simple perceptron with a sigmoid activation function to obtain a result in the range (0, 1).
The demo is mostly derived from here: https://www.youtube.com/watch?v=kft1AJ9WVDk but we will go into more depth.
Creating Perceptron from numpy-scratch
Introduction to Deep Learning
Note: parentheses "( )" mean the endpoints are excluded and brackets "[ ]" mean they are included. The sigmoid function asymptotically approaches 0 and 1 but never actually reaches those values, which is why we write (0, 1) rather than [0, 1].
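A minimal single-neuron sketch in NumPy (loosely in the spirit of the referenced video; the weights and input below are arbitrary assumptions): a weighted sum of the inputs plus a bias, passed through the sigmoid, giving an output strictly inside (0, 1).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
weights = rng.normal(size=3)                  # one weight per input feature
bias = rng.normal()

x = np.array([0.5, -1.2, 3.0])                # a hypothetical input vector
output = sigmoid(np.dot(weights, x) + bias)   # weighted sum + bias, then sigmoid
print(output)                                 # always strictly between 0 and 1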
In scikit they have a Perceptron class. It is essentially a linear classifier.
See: https://scikit-learn.org/stable/modules/linear_model.html#perceptron
Also:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.h
tml
We can see that the scikit-learn Perceptron is capable of classifying the digits dataset - so it is more powerful than one might expect. It appears this is not a plain perceptron, as it should not theoretically be that powerful - how can we verify that? Simply compare its results to LogisticRegression, as sketched below.
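One way to make that comparison (a sketch using the standard scikit-learn API; exact scores will vary with the split and parameters):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

perc = Perceptron(random_state=0).fit(X_train, y_train)
logreg = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Both are linear classifiers, so their accuracies should be in the same ballpark.
print("Perceptron:        ", perc.score(X_test, y_test))
print("LogisticRegression:", logreg.score(X_test, y_test))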
After creating a single neuron - a perceptron - we will next turn to training it. We will discuss loss functions, more activation functions, and so on next time.
Creating Perceptron with scikit
Introduction to Deep Learning
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Granted, the artistic productions we’ve seen from AI so far have been fairly low
quality. AI isn’t anywhere close to rivaling human screenwriters, painters, and
composers. But replacing humans was always beside the point: artificial
intelligence isn’t about replacing our own intelligence with something else, it’s
about bringing into our lives and work more intelligence—intelligence of a
different kind. In many fields, but especially in creative ones, AI will be used by
humans as a tool to augment their own capabilities: more augmented intelligence
than artificial intelligence.
Intro and definitions
Generative Deep Learning
A large part of artistic creation consists of simple pattern recognition and
technical skill. And that’s precisely the part of the process that many find less
attractive or even dispensable. That’s where AI comes in. Our perceptual
modalities, our language, and our artwork all have statistical structure. Learning
this structure is what deep-learning algorithms excel at. Machine-learning models
can learn the statistical latent space of images, music, and stories, and they can
then sample from this space, creating new artworks with characteristics similar to
those the model has seen in its training data.
Naturally, such sampling is hardly an act of artistic creation in itself. It’s a
mere mathematical operation: the algorithm has no grounding in human life, human
emotions, or our experience of the world; instead, it learns from an experience
that has little in common with ours. It’s only our interpretation, as human
spectators, that will give meaning to what the model generates. But in the hands of
a skilled artist, algorithmic generation can be steered to become meaningful—and
beautiful. Latent space sampling can become a brush that empowers the artist,
expands the space of what we can imagine. What’s more, it can make artistic
creation more accessible by eliminating the need for technical skill and practice—
setting up a new medium of pure expression, factoring art apart from craft (we can
think about what it means). Iannis Xenakis, a visionary pioneer of electronic and
algorithmic music, beautifully expressed this same idea in the 1960s, in the
context of the application of automation technology to music composition:
It is not easy to imagine how deep learning is used in music generation (we can read about it), but abstractly, it is possible that in the space of all possible melodies, sounds, etc. there are unexplored combinations that deep learning can help us find (much as electronic music unlocked a large space of new sounds and melodies, giving rise to new styles). Dummy example: give a NN pleasant and unpleasant melodies and ask it to generate something pleasant - then tune it until something original is found.
Intro and definitions
Generative Deep Learning
In summary:
AI will probably become an additional form of art rather than replace art/artists that use other means of expression. Cultural forms will coexist, like they usually do.
Creating art with the help of computers can be done via brute force, but with
Generative AI we will try to do something better: give some input that will act as
a constraint for the generative model.
Intro and definitions
Generative Deep Learning
https://impakter.com/art-made-by-ai-wins-fine-arts-competition/
Deep Dream https://www.youtube.com/watch?v=3hnWf_wdgzs and
https://www.youtube.com/watch?v=oyxSerkkP4o and https://www.youtube.com/watch?
v=T5jaCr4RAJc
Prisma (based on DeepArt: https://en.wikipedia.org/wiki/DeepArt):
https://www.youtube.com/watch?v=xky0NoxronY and https://www.youtube.com/watch?
v=U5OCXVvXlh0 - https://play.google.com/store/apps/details?
id=com.neuralprisma&hl=en&gl=US
Sunspring: https://www.youtube.com/watch?v=LY7x2Ihqjmc
AWS Deep Composer: https://aws.amazon.com/blogs/machine-learning/collaborating-
with-ai-to-create-bach-like-compositions-in-aws-deepcomposer/
OpenAi Musenet: https://openai.com/blog/musenet/
Newest thing: text-to-music (https://suno.com/). Generation is not the only interesting task in music - we could also create a "music producer" model that predicts which songs will become popular/viral (what Timbaland does).
Examples
Generative Deep Learning
Even though the field of generative deep learning is not old, we already have
several different tasks or kinds of tasks we can perform:
Text Generation - usually performed with/using RNNs/Transformers (SSMs/Mamba).
Deep Dreaming - performed with/using CNN.
Neural Style Transfer - performed with/using CNN.
Image/video generation - GAN/Transformers/Diffusion models (text-to-image, text-to-
video).
SORA
Common task categories
Generative Deep Learning
Generative AI can be used in the following situations:
Generating speech with chatbots.
Generating music, text, pictures - original art pieces (NFTs), international competitions.
Improvement of traditional ML/DL models through generation of new realistic data (e.g. images for cancer detection).
Business Usecases
Generative Deep Learning
Multiple resources online:
LSTM based:
https://keras.io/examples/generative/lstm_character_level_text_generation/
Transformer: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/
master/chapter12_part01_text-generation.ipynb
Miniature GPT:
https://keras.io/examples/generative/text_generation_with_miniature_gpt/
FNet, which is supposed to improve on attention and make the training process faster (a new topic in DL): https://keras.io/examples/nlp/text_generation_fnet/
Important things:
Generative deep learning requires good data cleaning (is this still true?)! In classification, we did not care that there are tokens like <br>, <hr>, \n, "  " (two spaces) and so on. But with text generation, the model should not be led to think that these tokens are part of human language.
The last dense layer will output a softmax probability distribution over the characters, so the dense layer will have len(chars) units:
Dense(len(chars), activation="softmax")
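For concreteness, a minimal sketch of such a character-level model in Keras (the corpus, window length, and layer sizes here are placeholder assumptions, not the lecture's exact notebook):

from tensorflow import keras
from tensorflow.keras import layers

chars = sorted(set("a placeholder corpus ..."))   # unique characters in the training text
maxlen = 60                                       # length of the input character window

model = keras.Sequential([
    layers.LSTM(128, input_shape=(maxlen, len(chars))),  # one-hot character window in
    layers.Dense(len(chars), activation="softmax"),      # distribution over the next character
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
model.summary()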
Text Generation
Generative Deep Learning
When generating text, the way you choose the next character is important.
Naive approach - greedy sampling: always choose the most likely next character. But
such an approach results in repetitive, predictable strings that don’t look like
coherent language.
A more interesting approach makes slightly more surprising choices: it introduces
randomness in the sampling process, by sampling from the probability distribution
for the next character. This is called stochastic sampling (recall that
stochasticity is what we call randomness in this field). In such a setup, if e has
a probability 0.3 of being the next character, according to the model, you’ll
choose it 30% of the time.
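In code, stochastic sampling is simply drawing an index according to the softmax probabilities instead of taking the argmax (a sketch with a made-up distribution):

import numpy as np

probs = np.array([0.3, 0.2, 0.5])                        # hypothetical softmax output over 3 characters
greedy_index = int(np.argmax(probs))                     # greedy sampling: always index 2
sampled_index = np.random.choice(len(probs), p=probs)    # stochastic: index 0 about 30% of the time
print(greedy_index, sampled_index)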
Sampling probabilistically from the softmax output of the model is neat: it allows even unlikely characters to be sampled some of the time, generating more interesting-looking sentences and sometimes showing creativity by coming up with new, realistic-sounding words that didn't occur in the training data.
Issue with this strategy: it doesn’t offer a way to control the amount of
randomness in the sampling process. Why would you want more or less randomness?
Consider an extreme case: pure random sampling, where you draw the next character from a uniform probability distribution, and every character is equally likely.
This scheme has maximum randomness; in other words, this probability distribution
has maximum entropy. Naturally, it won’t produce anything interesting. At the other
extreme, greedy sampling doesn’t produce anything interesting, either, and has no
randomness: the corresponding probability distribution has minimum entropy.
Sampling from the “real” probability distribution—the distribution that is output
by the model’s softmax function—constitutes an intermediate point between these two
extremes. But there are many other intermediate points of higher or lower entropy
that you may want to explore. Less entropy will give the generated sequences a more
predictable structure (and thus they will potentially be more realistic looking),
whereas more entropy will result in more surprising and creative sequences. When
sampling from generative models, it’s always good to explore different amounts of
randomness in the generation process.
Sampling strategy
Generative Deep Learning
In order to control the amount of stochasticity in the sampling process, we’ll
introduce a parameter called the softmax temperature that characterizes the entropy
of the probability distribution used for sampling: it characterizes how surprising
or predictable the choice of the next character will be.
Given a temperature value, a new probability distribution is computed from the
original one (the softmax output of the model) by reweighting it in the following
way (explanation from F. Chollet, Deep Learning with Python):
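A sketch of that reweighting, following the formulation in the book (the function name and example values are ours): divide the log-probabilities by the temperature, exponentiate, and renormalize. Temperatures below 1 sharpen the distribution (less entropy); temperatures above 1 flatten it (more entropy).

import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # original_distribution: 1D array of probabilities summing to 1 (a softmax output)
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)   # renormalize so it sums to 1 again

probs = np.array([0.5, 0.3, 0.2])
print(reweight_distribution(probs, temperature=0.2))   # much more peaked
print(reweight_distribution(probs, temperature=1.0))   # unchanged
print(reweight_distribution(probs, temperature=2.0))   # closer to uniform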
NOTE: the assumption is that the reweighted distribution still produces most likely
words and most likely words are the words that make sense, not random words.
Sampling strategy
Generative Deep Learning
Compare LSTM and Transformer based generative models.
Implement the remaining tutorials on the same dataset and analyze which approach
has better characteristics, like: best results, easiest to use, easiest to
understand and so on.
Are there any sampling strategies/temperature used for models like GPT-2/GPT-3 when
generating text?
Try to think about why resampling is needed, i.e. why the neural network produces repetitive sequences on its own. Error propagation? Is this the nature of representing something complex in a limited space (number of weights)?
Further explorations
Generative Deep Learning
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/
Step by step
Advanced Computer Vision
The model is a simple TF model with its head replaced by a regression head that outputs 3 numbers (a minimal sketch follows below).
If this works, we can think about the case when the object is missing from the picture entirely. We will need a custom loss function if we want to dedicate a single number in the output to indicate a missing object.
We can extend this approach to more classes and detect dogs, other objects, etc.
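A minimal sketch of such a model (ours, not the lecture's exact notebook; the backbone choice, input size, and normalized-output assumption are all ours): a pretrained backbone with the classification head removed and a small dense head that regresses the 3 numbers.

from tensorflow import keras
from tensorflow.keras import layers

# Pretrained feature extractor with its classification head removed.
backbone = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg", weights="imagenet")
backbone.trainable = False                        # train only the new head at first

inputs = keras.Input(shape=(224, 224, 3))
features = backbone(inputs, training=False)
outputs = layers.Dense(3, activation="sigmoid")(features)  # 3 numbers, assumed normalized to [0, 1]
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")       # plain MSE on the box numbers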
Model for bounding box regression
Advanced Computer Vision
But now we have a problem: the dataset does not have bounding boxes. So, we need to
add them ourselves. This is often one of the hardest and most costly parts of a
Machine Learning project: getting the labels. It’s a good idea to spend time
looking for the right tools. To annotate images with bounding boxes, you may want
to use an open source image labeling tool like VGG Image Annotator, LabelImg,
OpenLabeler, or ImgLab, or perhaps a commercial tool like LabelBox or Supervisely.
You may also want to consider crowdsourcing platforms such as Amazon Mechanical
Turk if you have a very large number of images to annotate. However, it is quite a
lot of work to set up a crowdsourcing platform, prepare the form to be sent to the
workers, supervise them, and ensure that the quality of the bounding boxes they
produce is good, so make sure it is worth the effort. If there are just a few
thousand images to label, and you don’t plan to do this frequently, it may be
preferable to do it yourself. Adriana Kovashka et al. wrote a very practical paper
about crowdsourcing in computer vision ( https://arxiv.org/abs/1611.02145 ). I
recommend you check it out, even if you do not plan to use crowdsourcing.
Problems with real data
Advanced Computer Vision
Some of the most popular projects:
VGG Image Annotator (VIA): https://www.robots.ox.ac.uk/~vgg/software/via/
CVAT: https://blog.roboflow.com/cvat/
VoTT: https://blog.roboflow.com/vott/
LabelImg: https://blog.roboflow.com/labelimg/
LabelMe: https://blog.roboflow.com/labelme/
OpenLabeler
ImageLab
LabelBox
Supervisely
Coco annotator: https://www.youtube.com/watch?v=OMJRcjnMMok
Amazon Mechanical Turk: https://www.mturk.com/get-started
Best ones - we usually care about overviews that tell us which tools have the best features:
https://www.folio3.ai/blog/labelling-images-annotation-tool/
DEMO: VGG Image Annotator (VIA)
Problems with real data
Advanced Computer Vision
DEMO: VGG Image Annotator (VIA)
Problems with real data
Advanced Computer Vision
Here are the datasets with bounding boxes:
https://public.roboflow.com/object-detection
https://lionbridge.ai/datasets/20-best-bounding-box-image-and-video-datasets-for-
machine-learning/
https://cocodataset.org/#home COCO dataset (COCO2017 118k images)
Datasets for object detection / localization
Advanced Computer Vision
The MSE often works fairly well as a cost function to train the model, but it is
not a great metric to evaluate how well the model can predict bounding boxes. The
most common metric for this is the Intersection over Union (IoU): the area of
overlap between the predicted bounding box and the target bounding box, divided by
the area of their union.
In tf.keras, it is implemented by the tf.keras.metrics.MeanIoU class.
IoU is usually an evaluation metric; a good explanation:
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
It can also be used as a loss function: https://arxiv.org/abs/2101.08158
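For intuition, IoU for two axis-aligned boxes can be computed in a few lines (a sketch; boxes are assumed to be (x_min, y_min, x_max, y_max) tuples):

def iou(box_a, box_b):
    # Intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # Areas of both boxes; union = sum of areas minus intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.143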
IoU metric and loss
Advanced Computer Vision
A very simple approach is available with OpenCV / CVlib.
It offers common object detection (a model trained on the COCO dataset), face detection, and gender detection: https://github.com/arunponnusamy/cvlib . The COCO-trained model recognizes the following categories:
https://github.com/arunponnusamy/object-detection-opencv/blob/master/yolov3.txt
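A minimal usage sketch (the image path is a placeholder; check the cvlib README for the current function signatures):

import cv2
import cvlib as cv
from cvlib.object_detection import draw_bbox

img = cv2.imread("street.jpg")                                 # hypothetical input image
bboxes, labels, confidences = cv.detect_common_objects(img)    # YOLO model trained on COCO
out = draw_bbox(img, bboxes, labels, confidences)              # draw the detections
cv2.imwrite("street_detected.jpg", out)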
Simple approach OpenCV / CVlib
Advanced Computer Vision
Object detection is a very important problem in computer vision. Here the model is tasked with localizing the objects present in an image and, at the same time, classifying them into different categories. Object detection models can be broadly classified into "single-stage" (YOLO, SSD) and "two-stage" (R-CNN and family) detectors (see: https://www.jeremyjordan.me/object-detection-one-stage/ ). Two-stage detectors are often more accurate, at the cost of being slower (although it is common to say that single-stage is the way to go).
Here in this example, we will implement RetinaNet, a popular single-stage detector,
which is accurate and runs fast. RetinaNet uses a feature pyramid network to
efficiently detect objects at multiple scales and introduces a new loss, the Focal
loss function, to alleviate the problem of the extreme foreground-background class
imbalance for single-stage object detection (see: https://medium.com/analytics-
vidhya/how-focal-loss-fixes-the-class-imbalance-problem-in-object-detection-
3d2e1c4da8d7 )
Ref: https://keras.io/examples/vision/retinanet/
Things to remember about this model:
Pyramid CNN w/ ResNet backbone.
RetinaNetBoxLoss + RetinaNetClassificationLoss
IOU loss is also used
Multi-class Object Detection w/ RetinaNET
Advanced Computer Vision
There are many ways to solve object detection and this single notebook does not do
justice to all of them.
Sometimes you need something simple - simple to run, train and explain to clients
or yourself. For this - you might want to consider bounding box regression using a
multihead CNN for classification+localization. But sometimes you need SOTA models
for multiclass object detection in real time - for this consider YOLOv7 (or newer).
Overviews:
A great overview paper can be found here: https://arxiv.org/pdf/1807.05511.pdf
Wiki contains a list of groups of methods:
https://en.wikipedia.org/wiki/Object_detection
Summary
Advanced Computer Vision
If you ask why there is no bounding box regression here, remember that bounding box regression addresses the single-object classification+localization problem (it is not an architecture!). Also, it is not real object detection - the task of classifying and localizing multiple objects in an image is what is called object detection.
Check whether CVlib/OpenCV supports multi-object detection. Create a MWE.
Compare the accuracy (and speed) of the CVlib implementation vs. RetinaNet with Keras, using the same images (if RetinaNet uses ImageNet weights and CVlib was trained on COCO, check which categories overlap in both datasets and use those).
Further explorations
Advanced Computer Vision
Course plan
You can get familiar with it using this link
https://www.codeacademy.lt/programavimo-kursai/dirbtinio-intelekto-studijos/