Cheat Sheets for AI: Neural Networks, Machine Learning, Deep Learning & Big Data
BecomingHuman.AI
Table of Contents
Neural Networks Basic Cheat Sheet
BecomingHuman.AI

Network types: Perceptron (P), Feed Forward (FF), Radial Basis Function Network (RBF), Deep Feed Forward (DFF), Recurrent Neural Network (RNN), Long / Short Term Memory (LSTM), Gated Recurrent Unit (GRU), Auto Encoder (AE), Variational AE (VAE), Sparse AE (SAE), Denoising AE (DAE), Markov Chain (MC), Hopfield Network (HN), Boltzmann Machine (BM), Restricted BM (RBM), Deep Residual Network (DRN), Support Vector Machine (SVM), Neural Turing Machine (NTM).

Cell legend: Input Cell, Hidden Cell, Output Cell, Recurrent Cell, Memory Cell, Kernel, Convolutional or Pool.

www.asimovinstitute.org/neural-network-zoo/
Neural Networks Graphs
BecomingHuman.AI

[Diagram residue: this page shows four worked network graphs - a Deep Feed Forward Example, a Deep Recurrent Example (with connections from the previous iteration), a Deep LSTM Example, and a Deep GRU Example - each built from input, sum, bias, sigmoid, tanh, relu, multiply, and invert nodes.]
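The Deep Feed Forward diagram reduces to repeated sum, bias, relu steps followed by a final sum, bias, sigmoid step. Below is a minimal NumPy sketch of that forward pass; the layer sizes and random weights are illustrative assumptions, not values from the diagram.

# Sketch of the "Deep Feed Forward Example": hidden node = sum -> bias -> relu,
# output node = sum -> bias -> sigmoid. Sizes and weights are illustrative.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # input vector (3 features, assumed)

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # first hidden layer weights and biases
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # second hidden layer weights and biases
w_out, b_out = rng.normal(size=4), 0.0          # output unit weights and bias

h1 = relu(W1 @ x + b1)                          # sum, bias, relu
h2 = relu(W2 @ h1 + b2)                         # sum, bias, relu
y = sigmoid(w_out @ h2 + b_out)                 # sum, bias, sigmoid
print(y)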
Machine
Learning
Machine Learning Overview

CLASSIFICATION
Neural Net: neural_network.MLPClassifier() - captures complex relationships; prone to overfitting. "Basically magic."

FEATURE REDUCTION
T-Distributed Stochastic Neighbor Embedding (t-SNE): manifold.TSNE() - visualize high-dimensional data; converts similarities to joint probabilities.

(Other panels on the original overview cover underfitting / overfitting and precision.)
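As a concrete illustration of the manifold.TSNE() call named above, here is a minimal sketch; the iris data, perplexity value, and random seed are illustrative assumptions rather than part of the original sheet.

# Minimal t-SNE sketch (illustrative dataset and parameters).
from sklearn import datasets, manifold

X, y = datasets.load_iris(return_X_y=True)
tsne = manifold.TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)       # converts pairwise similarities to joint probabilities
print(X_2d.shape)                  # (150, 2) -- ready to scatter-plot, colored by y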
Scikit-Learn Cheat Sheet

Scikit-Learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

A Basic Example
>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.cross_validation import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)

Preprocessing The Data

Standardization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)

Normalization
>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)

Binarization
>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)

Imputing Missing Values
>>> from sklearn.preprocessing import Imputer
>>> imp = Imputer(missing_values=0, strategy='mean', axis=0)
>>> imp.fit_transform(X_train)

Training And Test Data
>>> from sklearn.cross_validation import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

Unsupervised Learning Estimators

Principal Component Analysis (PCA)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)

K Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

Prediction

Supervised Estimators
>>> y_pred = svc.predict(np.random.random((2,5)))    Predict labels
>>> y_pred = lr.predict(X_test)                      Predict labels
>>> y_pred = knn.predict_proba(X_test)               Estimate probability of a label

Unsupervised Estimators
>>> y_pred = k_means.predict(X_test)                 Predict labels in clustering algorithms

Regression Metrics

Mean Absolute Error
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)

Mean Squared Error
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_pred)

R² Score
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)

Clustering Metrics

Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)

Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)

V-measure
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)

Tune Your Model

Cross-Validation
>>> from sklearn.cross_validation import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))

Grid Search
>>> from sklearn.grid_search import GridSearchCV
>>> params = {"n_neighbors": np.arange(1,3),
...           "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn, param_grid=params)
>>> grid.fit(X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)

https://www.datacamp.com/community/blog/scikit-learn-cheat-sheet
Scikit-learn Algorithm Cheat Sheet
BecomingHuman.AI

[Flowchart: choosing the right estimator.]

START: >50 samples? If not, get more data.

Predicting a category, with labeled data (classification):
  <100K samples: Linear SVC; if not working and it is text data, Naive Bayes; otherwise KNeighbors Classifier, then SVC or Ensemble Classifiers.
  >=100K samples: SGD Classifier; if not working, kernel approximation.

Predicting a category, without labeled data (clustering):
  Number of categories known, <10K samples: KMeans; if not working, Spectral Clustering or GMM. >=10K samples: MiniBatch KMeans.
  Number of categories not known, <10K samples: MeanShift or VBGMM. >=10K samples: tough luck.

Predicting a quantity (regression):
  <100K samples, few features should be important: Lasso or ElasticNet; otherwise RidgeRegression or SVR(kernel='linear'); if not working, SVR(kernel='rbf') or EnsembleRegressors.
  >=100K samples: SGD Regressor.

Just looking (dimensionality reduction):
  Randomized PCA; if not working, <10K samples: Isomap or Spectral Embedding, then LLE; >=10K samples: kernel approximation.

Not predicting structure: tough luck.
Predicting values (regression):
  Linear regression: fast training, linear model
  Bayesian linear regression: linear model, small data sets
  Decision forest regression: accuracy, fast training
  Boosted decision tree regression: accuracy, fast training
  Fast forest quantile regression: predicting a distribution
  Poisson regression: predicting event counts

Two-class classification:
  Two-class logistic regression: fast training, linear model
  Two-class Bayes point machine: fast training, linear model
  Two-class decision forest: accuracy, fast training
  Two-class boosted decision tree: accuracy, fast training
  Two-class locally deep SVM: >100 features
  Two-class neural network: accuracy, long training times
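To make one branch of the flowchart concrete, here is a minimal sketch of the classification path for labeled data with fewer than 100K samples: try Linear SVC first and fall back to KNeighbors Classifier if it is "not working". The digits dataset and the use of the current sklearn.model_selection import (rather than the older sklearn.cross_validation shown on the sheet) are illustrative assumptions.

# Sketch of one flowchart path: LinearSVC, then KNeighborsClassifier as fallback.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = datasets.load_digits(return_X_y=True)          # illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for clf in (LinearSVC(max_iter=5000), KNeighborsClassifier(n_neighbors=5)):
    clf.fit(X_train, y_train)                          # fit each candidate estimator
    print(type(clf).__name__, accuracy_score(y_test, clf.predict(X_test)))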
Part 3
Data Science
with Python
TensorFlow Cheat Sheet
BecomingHuman.AI
Info

TensorFlow
TensorFlow™ is an open source software library created by Google for numerical computation and large scale computation. TensorFlow bundles together machine learning and deep learning models and frameworks and makes them useful by way of a common metaphor.

Keras
Keras is an open source neural networks library, written in Python, built for fast experimentation via deep neural networks and modular design. It is capable of running on top of TensorFlow, Theano, Microsoft Cognitive Toolkit, or PlaidML.

Skflow
Scikit Flow is a high level interface based on TensorFlow which can be used like sklearn. You can build your own model on your own data quickly without rewriting extra code. It provides a set of high level model classes that you can use to easily integrate with your existing Scikit-learn pipeline code.

Python helpers
type(object)          Get object type
help(object)          Get help for object (list of available methods, attributes, signatures and so on)
dir(object)           Get list of object attributes (fields, functions)
str(object)           Transform an object to string
object?               Shows documentation about the object
globals()             Return the dictionary containing the current scope's global variables
locals()              Update and return a dictionary containing the current scope's local variables
id(object)            Return the identity of an object; guaranteed to be unique among simultaneously existing objects
import _builtin_; dir(_builtin_)    Other built-in functions

Important functions
reduce_sum, reduce_prod, reduce_min, reduce_max, reduce_mean, reduce_all, reduce_any, accumulate_n

Activation functions (tf.nn?)
relu, relu6, elu, softplus, softsign, dropout, bias_add, sigmoid, tanh, sigmoid_cross_entropy_with_logits, softmax, log_softmax, softmax_cross_entropy_with_logits, sparse_softmax_cross_entropy_with_logits, weighted_cross_entropy_with_logits, etc.

Estimator fit parameters
X: matrix or tensor of shape [n_samples, n_features...]. Can be an iterator that returns arrays of features. The training input samples for fitting the model.
y: vector or matrix [n_samples] or [n_samples, n_outputs]. Can be an iterator that returns arrays of targets. The training target values (class labels in classification, real numbers in regression).
monitor: Monitor object to print training progress and invoke early stopping.
logdir: the directory to save the log file that can be used for optional visualization.

predict(X, axis=1, batch_size=None)
Args:
  X: array-like matrix, [n_samples, n_features...] or iterator.
  axis: which axis to argmax for classification. By default axis 1 (next after batch) is used. Use 2 for sequence predictions.
  batch_size: if the test set is too big, use batch size to split it into mini batches. By default the batch_size member variable is used.
Returns:
  y: array of shape [n_samples]. The predicted classes or predicted value.
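Keras is described above but no Keras code survived extraction; the sketch below shows the kind of Sequential workflow the description refers to, using the standalone keras package. The layer sizes, optimizer, and random data are illustrative assumptions, not code from the original sheet.

# Minimal Keras Sequential sketch (illustrative shapes and hyperparameters).
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X_train = np.random.random((100, 8))                  # toy features
y_train = np.random.randint(2, size=(100, 1))         # toy binary labels

model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),   # hidden layer: relu
    Dense(1, activation='sigmoid'),                   # output layer: sigmoid
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)
preds = model.predict(X_train, batch_size=32)         # cf. model.predict(..., batch_size=32) later in this document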
BecomingHuman.AI

Variables and Data Types

Variable Assignment
>>> x=5
>>> x
5

Calculations With Variables
>>> x+2            Sum of two variables
7
>>> x-2            Subtraction of two variables
3
>>> x*2            Multiplication of two variables
10
>>> x**2           Exponentiation of a variable
25
>>> x%2            Remainder of a variable
1
>>> x/float(2)     Division of a variable
2.5

Types and Type Conversion
str()              '5', '3.45', 'True'    Variables to strings

Strings
>>> my_string = 'thisStringIsAwesome'

String Operations
>>> my_string * 2
'thisStringIsAwesomethisStringIsAwesome'
>>> my_string + 'Innit'
'thisStringIsAwesomeInnit'
>>> 'm' in my_string
True
>>> my_string[3]
>>> my_string[4:9]

String Methods
>>> my_string.upper()              String to uppercase
>>> my_string.lower()              String to lowercase
>>> my_string.count('w')           Count String elements
>>> my_string.replace('e', 'i')    Replace String elements
>>> my_string.strip()              Strip whitespaces

Lists (also see NumPy Arrays)
>>> a = 'is'
>>> b = 'nice'
>>> my_list = ['my', 'list', a, b]
>>> my_list2 = [[4,5,6,7], [3,4,5,6]]

Selecting List Elements (index starts at 0)
Subset
>>> my_list[1]        Select item at index 1
>>> my_list[-3]       Select 3rd last item
Slice
>>> my_list[1:3]      Select items at index 1 and 2
>>> my_list[1:]       Select items after index 0
>>> my_list[:3]       Select items before index 3
>>> my_list[:]        Copy my_list
Subset lists of lists
>>> my_list2[1][0]    my_list[list][itemOfList]
>>> my_list2[1][:2]

List Operations
>>> my_list + my_list
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list * 2
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list2 > 4
True

NumPy Arrays (also see Lists)
>>> my_list = [1, 2, 3, 4]
>>> my_array = np.array(my_list)
>>> my_2darray = np.array([[1,2,3],[4,5,6]])

Selecting NumPy Array Elements (index starts at 0)
Subset
>>> my_array[1]       Select item at index 1
2
Slice
>>> my_array[0:2]     Select items at index 0 and 1
array([1, 2])
Subset 2D NumPy arrays
>>> my_2darray[:,0]   my_2darray[rows, columns]
array([1, 4])

NumPy Array Operations
>>> my_array > 3
array([False, False, False, True], dtype=bool)
>>> my_array * 2
array([2, 4, 6, 8])
>>> my_array + np.array([5, 6, 7, 8])
array([6, 8, 10, 12])

Libraries
Import libraries
>>> import numpy
>>> import numpy as np
Selective import
>>> from math import pi

Install Python
PySpark is the Spark Python API that exposes the Spark programming model to Python.

Initializing Spark
In the PySpark shell, a special interpreter-aware SparkContext is already created in the variable called sc.
$ ./bin/spark-shell --master local[2]
$ ./bin/pyspark --master local[4] --py-files code.py
Set which master the context connects to with the --master argument, and add Python .zip, .egg or .py files to the runtime path by passing a comma-separated list to --py-files.

Loading Data

Parallelized Collections
>>> rdd = sc.parallelize([('a',7),('a',2),('b',2)])
>>> rdd2 = sc.parallelize([('a',2),('d',1),('b',1)])
>>> rdd3 = sc.parallelize(range(100))
>>> rdd4 = sc.parallelize([("a",["x","y","z"]), ("b",["p","r"])])

External Data
Read either one text file from HDFS, a local file system or any Hadoop-supported file system URI with textFile(), or read in a directory of text files with wholeTextFiles().
>>> textFile = sc.textFile("/my/directory/*.txt")
>>> textFile2 = sc.wholeTextFiles("/my/directory/")

Retrieving RDD Information
>>> rdd.countByKey()                       Count RDD instances by key
defaultdict(<type 'int'>,{'a':2,'b':1})
>>> rdd.countByValue()                     Count RDD instances by value
defaultdict(<type 'int'>,{('b',2):1,('a',2):1,('a',7):1})
>>> rdd.collectAsMap()                     Return (key,value) pairs as a dictionary
{'a': 2,'b': 2}
>>> rdd3.sum()                             Sum of RDD elements
4950
>>> sc.parallelize([]).isEmpty()           Check whether RDD is empty
True

Summary
>>> rdd3.max()                             Maximum value of RDD elements
99
>>> rdd3.min()                             Minimum value of RDD elements
0
>>> rdd3.mean()                            Mean value of RDD elements
49.5
>>> rdd3.stdev()                           Standard deviation of RDD elements
28.866070047722118
>>> rdd3.variance()                        Compute variance of RDD elements
833.25
>>> rdd3.histogram(3)                      Compute histogram by bins
([0,33,66,99],[33,33,34])
>>> rdd3.stats()                           Summary statistics (count, mean, stdev, max & min)

Grouping
>>> rdd3.groupBy(lambda x: x % 2)          Return RDD of grouped values
        .mapValues(list)
        .collect()
>>> rdd.groupByKey()                       Group rdd by key
       .mapValues(list)
       .collect()
[('a',[7,2]),('b',[2])]

Aggregating
>>> seqOp = (lambda x,y: (x[0]+y,x[1]+1))
>>> combOp = (lambda x,y:(x[0]+y[0],x[1]+y[1]))
>>> rdd3.aggregate((0,0),seqOp,combOp)         Aggregate RDD elements of each partition and then the results
(4950,100)
>>> rdd.aggregateByKey((0,0),seqop,combop)     Aggregate values of each RDD key
       .collect()
[('a',(9,2)), ('b',(2,1))]
>>> rdd3.fold(0,add)                           Aggregate the elements of each partition, and then the results
4950
>>> rdd.foldByKey(0, add)                      Merge the values for each key
       .collect()
[('a',9),('b',2)]
>>> rdd3.keyBy(lambda x: x+x)                  Create tuples of RDD elements by applying a function
        .collect()

Sort
>>> rdd2.sortBy(lambda x: x[1])            Sort RDD by given function
        .collect()
[('d',1),('b',1),('a',2)]
>>> rdd2.sortByKey()                       Sort (key, value) RDD by key
        .collect()
[('a',2),('b',1),('d',1)]

Iterating
>>> def g(x): print(x)
>>> rdd.foreach(g)                         Apply a function to all RDD elements
('a', 7)
('b', 2)
('a', 2)

Execution
$ ./bin/spark-submit examples/src/main/python/pi.py
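The snippets above assume the interactive shell where sc already exists. Below is a minimal standalone sketch that creates its own SparkContext and runs a few of the listed operations; the app name, local master, and print statements are illustrative assumptions.

# Standalone PySpark sketch of a few RDD operations listed above.
from pyspark import SparkContext
from operator import add

sc = SparkContext(master="local[2]", appName="cheatsheet-sketch")

rdd = sc.parallelize([('a', 7), ('a', 2), ('b', 2)])
rdd3 = sc.parallelize(range(100))

print(rdd.countByKey())                              # defaultdict(int, {'a': 2, 'b': 1})
print(rdd3.sum())                                    # 4950
print(rdd.groupByKey().mapValues(list).collect())    # e.g. [('a', [7, 2]), ('b', [2])]
print(rdd.foldByKey(0, add).collect())               # e.g. [('a', 9), ('b', 2)]

sc.stop()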
Prediction (Keras model)
>>> model3.predict(x_test4, batch_size=32)
>>> model3.predict_classes(x_test4, batch_size=32)
Pandas Cheat Sheet

The Pandas library provides easy-to-use data structures and data analysis tools for the Python programming language. Use the following import convention:
>>> import pandas as pd

Getting
>>> s['b']        Get one element
-5
>>> df[1:]        Get subset of a DataFrame
   Country   Capital    Population
1  India     New Delhi  1303171035
2  Brazil    Brasília   207847528

Selecting, Boolean Indexing & Setting

By Position
>>> df.ix[1,'Capital']    Select rows and columns
'New Delhi'

Advanced Indexing
>>> df3.loc[:,df3.notnull().all()]            Select cols without NaN
>>> df[(df.Country.isin(df2.Type))]           Find same elements (indexing with isin)
>>> df3.filter(items=["a","b"])               Filter on values

Duplicate Data
>>> s3.unique()                               Return unique values
>>> df2.duplicated('Type')                    Check duplicates
>>> df2.drop_duplicates('Type', keep='last')  Drop duplicates
>>> df.index.duplicated()                     Check duplicates in the index

Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)           Apply function
>>> df.applymap(f)        Apply function element-wise
>>> df.rank()             Assign ranks to entries

Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of fill methods.

Dates
>>> df2['Date'] = pd.to_datetime(df2['Date'])
>>> df2['Date'] = pd.date_range('2000-1-1', periods=6, freq='M')
>>> dates = [datetime(2012,5,1), datetime(2012,5,2)]
>>> index = pd.DatetimeIndex(dates)
>>> index = pd.date_range(datetime(2012,2,1), end, freq='BM')

Melt (gather columns into rows; stacked vs. unstacked)
>>> pd.melt(df2,
            id_vars=["Date"],
            value_vars=["Type", "Value"],
            value_name="Observations")

Grouping Data

Aggregation
>>> df2.groupby(by=['Date','Type']).mean()
>>> df4.groupby(level=0).sum()
>>> df4.groupby(level=0).agg({'a': lambda x: sum(x)/len(x), 'b': np.sum})

Transformation
>>> customSum = lambda x: (x + x%2)
>>> df4.groupby(level=0).transform(customSum)

Missing Data
>>> df.dropna()               Drop NaN values
>>> df3.fillna(df3.mean())    Fill NaN values with a predetermined value
>>> df2.replace("a", "f")     Replace values with others

Visualization
>>> import matplotlib.pyplot as plt
>>> s.plot()
>>> plt.show()
>>> df2.plot()
>>> plt.show()
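To make the groupby aggregation and transformation calls above concrete, here is a small self-contained sketch; the example DataFrame is an illustrative assumption, not data from the sheet.

# Self-contained groupby sketch (illustrative DataFrame).
import numpy as np
import pandas as pd

df4 = pd.DataFrame(
    {"a": [11.432, 13.031, 1.303, 99.906],
     "b": [1, 2, 3, 4]},
    index=["x", "x", "y", "y"])

print(df4.groupby(level=0).sum())                    # sum within each index group
print(df4.groupby(level=0).agg(                      # a different function per column
    {"a": lambda s: sum(s) / len(s), "b": np.sum}))
customSum = lambda x: (x + x % 2)
print(df4.groupby(level=0).transform(customSum))     # result has the same shape as df4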
Data Wrangling with pandas Cheat Sheet

Tidy Data
Tidy data complements pandas's vectorized operations. pandas will automatically preserve observations as you manipulate variables. No other format works as intuitively with pandas.
In a tidy data set: each variable is saved in its own column, and each observation is saved in its own row.

Creating DataFrames
df = pd.DataFrame(
    {"a" : [4, 5, 6],
     "b" : [7, 8, 9],
     "c" : [10, 11, 12]},
    index = [1, 2, 3])
Specify values for each column.

df = pd.DataFrame(
    [[4, 7, 10],
     [5, 8, 11],
     [6, 9, 12]],
    index=[1, 2, 3],
    columns=['a', 'b', 'c'])
Specify values for each row.

df = pd.DataFrame(
    {"a" : [4, 5, 6],
     "b" : [7, 8, 9],
     "c" : [10, 11, 12]},
    index = pd.MultiIndex.from_tuples(
        [('d',1),('d',2),('e',2)],
        names=['n','v']))
Create DataFrame with a MultiIndex.

Reshaping Data (change the layout of a data set)
pd.melt(df)                              Gather columns into rows.
df.pivot(columns='var', values='val')    Spread rows into columns.
pd.concat([df1,df2])                     Append rows of DataFrames.
pd.concat([df1,df2], axis=1)             Append columns of DataFrames.
df.sort_values('mpg')                    Order rows by values of a column (low to high).
df.sort_values('mpg', ascending=False)   Order rows by values of a column (high to low).
df.rename(columns = {'y':'year'})        Rename the columns of a DataFrame.
df.sort_index()                          Sort the index of a DataFrame.
df.reset_index()                         Reset index of DataFrame to row numbers, moving index to columns.
df.drop(columns=['Length','Height'])     Drop columns from DataFrame.

Method chaining
Most pandas methods return a DataFrame so that another pandas method can be applied to the result. This improves readability of code.
df = (pd.melt(df)
        .rename(columns={'variable' : 'var',
                         'value' : 'val'})
        .query('val >= 200'))

Summarise Data
df['w'].value_counts()    Count number of rows with each unique value of variable.
len(df)                   # of rows in DataFrame.
df['w'].nunique()         # of distinct values in a column.
df.describe()             Basic descriptive statistics for each column.

Handling Missing Data
df.dropna()               Drop rows with any column having NA/null data.
df.fillna(value)          Replace all NA/null data with value.

Make New Columns
df.assign(Area=lambda df: df.Length*df.Height)    Compute and append one or more new columns.
df['Volume'] = df.Length*df.Height*df.Depth       Add single column.
pd.qcut(df.col, n, labels=False)                  Bin column into n buckets.

Summary Functions
pandas provides a large set of summary functions that operate on different kinds of pandas objects (DataFrame columns, Series, GroupBy, Expanding and Rolling (see below)) and produce single values for each of the groups. When applied to a DataFrame, the result is returned as a pandas Series for each column. Examples:
sum()                     Sum values of each object.
count()                   Count non-NA/null values of each object.
median()                  Median value of each object.
quantile([0.25,0.75])     Quantiles of each object.
apply(function)           Apply function to each object.
min()                     Minimum value in each object.
max()                     Maximum value in each object.
mean()                    Mean value of each object.
var()                     Variance of each object.
std()                     Standard deviation of each object.

Vector Functions
pandas provides a large set of vector functions that operate on all columns of a DataFrame or a single selected column (a pandas Series). These functions produce vectors of values for each of the columns, or a single Series for the individual Series. Examples:
max(axis=1)                 Element-wise max.
min(axis=1)                 Element-wise min.
clip(lower=-10,upper=10)    Trim values at input thresholds.
abs()                       Absolute value.

Subset Observations (Rows)
df.iloc[10:20]              Select rows by position.
df.head(n)                  Select first n rows.
df.tail(n)                  Select last n rows.
df.nlargest(n, 'value')     Select and order top n entries.
df.nsmallest(n, 'value')    Select and order bottom n entries.

Subset Variables (Columns)
df.filter(regex='regex')          Select columns whose name matches regular expression regex.
df.loc[:,'x2':'x4']               Select all columns between x2 and x4 (inclusive).
df.iloc[:,[1,2,5]]                Select columns in positions 1, 2 and 5 (first column is 0).
df.loc[df['a'] > 10, ['a','c']]   Select rows meeting logical condition, and only the specific columns.

Regex examples
'\.'               Matches strings containing a period '.'
'Length$'          Matches strings ending with word 'Length'
'^Sepal'           Matches strings beginning with the word 'Sepal'
'^x[1-5]$'         Matches strings beginning with 'x' and ending with 1,2,3,4,5
'^(?!Species$).*'  Matches strings except the string 'Species'

Logic in Python (and pandas)
<     Less than
>     Greater than
==    Equal to
<=    Less than or equal to
>=    Greater than or equal to
!=    Not equal to
df.column.isin(values)            Group membership
pd.isnull(obj)                    Is NaN
pd.notnull(obj)                   Is not NaN
&, |, ~, ^, df.any(), df.all()    Logical and, or, not, xor, any, all

Windows
df.expanding()    Return an Expanding object allowing summary functions to be applied cumulatively.
df.rolling(n)     Return a Rolling object allowing summary functions to be applied to windows of length n.

Combine Data Sets

Standard Joins
pd.merge(adf, bdf, how='left', on='x1')     Join matching rows from bdf to adf.
pd.merge(adf, bdf, how='right', on='x1')    Join matching rows from adf to bdf.
pd.merge(adf, bdf, how='inner', on='x1')    Join data. Retain only rows in both sets.
pd.merge(adf, bdf, how='outer', on='x1')    Join data. Retain all values, all rows.

Filtering Joins
adf[adf.x1.isin(bdf.x1)]     All rows in adf that have a match in bdf.
adf[~adf.x1.isin(bdf.x1)]    All rows in adf that do not have a match in bdf.

Set Operations
pd.merge(ydf, zdf)                   Rows that appear in both ydf and zdf (Intersection).
pd.merge(ydf, zdf, how='outer')      Rows that appear in either or both ydf and zdf (Union).
pd.merge(ydf, zdf, how='outer', indicator=True)
   .query('_merge == "left_only"')
   .drop(columns=['_merge'])         Rows that appear in ydf but not zdf (Setdiff).

Group Data
df.groupby(by="col")       Return a GroupBy object, grouped by values in column named "col".
df.groupby(level="ind")    Return a GroupBy object, grouped by values in index level named "ind".
All of the summary functions listed above can be applied to a group. Additional GroupBy functions:
size()           Size of each group.
agg(function)    Aggregate group using function.
The examples below can also be applied to groups. In this case, the function is applied on a per-group basis, and the returned vectors are of the length of the original DataFrame.
shift(1)                 Copy with values shifted by 1.
shift(-1)                Copy with values lagged by 1.
rank(method='dense')     Ranks with no gaps.
rank(method='min')       Ranks. Ties get min rank.
rank(method='first')     Ranks. Ties go to first value.
rank(pct=True)           Ranks rescaled to interval [0, 1].
cummin()                 Cumulative min.
cummax()                 Cumulative max.
cumprod()                Cumulative product.
cumsum()                 Cumulative sum.

Plotting
df.plot.hist()                  Histogram for each column.
df.plot.scatter(x='w',y='h')    Scatter chart using pairs of points.

https://github.com/rstudio/cheatshe… r/LICENSE
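The method-chaining pattern above (melt, then rename, then query) is easier to see on a concrete frame; the DataFrame values and the threshold below are illustrative assumptions.

# Runnable sketch of the melt -> rename -> query chain.
import pandas as pd

df = pd.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [100, 250, 300]},
                  index=[1, 2, 3])

tidy = (pd.melt(df)                                  # gather columns into rows
          .rename(columns={"variable": "var",       # shorter column names
                           "value": "val"})
          .query("val >= 200"))                      # keep only the large values
print(tidy)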
Data Wrangling with dplyr and tidyr Cheat Sheet
BecomingHuman.AI

Syntax: helpful conventions for wrangling
dplyr::tbl_df(iris)
Converts data to tbl class. tbl's are easier to examine than data frames. R displays only the data that fits onscreen.

Tidy data complements R's vectorized operations. R will automatically preserve observations as you manipulate variables. No other format works as intuitively with R. In a tidy data set: each variable is saved in its own column, and each observation is saved in its own row.

Summarise Data
dplyr::summarise(iris, avg = mean(Sepal.Length))    Summarise data into single row of values.
dplyr::summarise_each(iris, funs(mean))             Apply summary function to each column.
dplyr::count(iris, Species, wt = Sepal.Length)      Count number of rows with each unique value of variable (with or without weights).

Make New Variables
dplyr::mutate(iris, sepal = Sepal.Length + Sepal.Width)       Compute and append one or more new columns.
dplyr::mutate_each(iris, funs(min_rank))                      Apply window function to each column.
dplyr::transmute(iris, sepal = Sepal.Length + Sepal.Width)    Compute one or more new columns. Drop original columns.

Reshaping Data (change the layout of a data set)
select(iris, -Species)    Select all columns except Species.

Group Data
dplyr::group_by(iris, Species)                        Group data into rows with the same value of Species.
dplyr::ungroup(iris)                                  Remove grouping information from data frame.
iris %>% group_by(Species) %>% summarise(…)           Compute separate summary row for each group.
iris %>% group_by(Species) %>% mutate(…)              Compute new variables by group.

Combine Data Sets
dplyr::inner_join(a, b, by = "x1")    Join data. Retain only rows in both sets.
dplyr::full_join(a, b, by = "x1")     Join data. Retain all values, all rows.

Filtering Joins
dplyr::semi_join(a, b, by = "x1")     All rows in a that have a match in b.
dplyr::anti_join(a, b, by = "x1")     All rows in a that do not have a match in b.

Set Operations
dplyr::setdiff(y, z)                  Rows that appear in y but not z.

Binding
dplyr::bind_rows(y, z)                Append z to y as new rows.
dplyr::bind_cols(y, z)                Append z to y as new columns. Caution: matches rows by position.
SciPy Linear Algebra Cheat Sheet
BecomingHuman.AI

The SciPy library is one of the core packages for scientific computing that provides mathematical algorithms and convenience functions built on the NumPy extension of Python.

Interacting With NumPy (also see NumPy)
Linear Algebra (also see NumPy)

https://www.datacamp.com/community/blog/python-scipy-cheat-sheet
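The code from the original SciPy sheet did not survive extraction; the sketch below shows a few standard scipy.linalg calls in its spirit. Which functions the original sheet actually listed is an assumption, and the matrix values are illustrative.

# Minimal scipy.linalg sketch (illustrative matrix and vector).
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

print(linalg.inv(A))          # matrix inverse
print(linalg.det(A))          # determinant
print(linalg.solve(A, b))     # solve A @ x = b
w, v = linalg.eig(A)          # eigenvalues and eigenvectors
print(w)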
Matplotlib Cheat Sheet
BecomingHuman.AI

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

Anatomy & Workflow
Plot anatomy: Figure, Axes/Subplot, X-axis, Y-axis.
Workflow: prepare the data, create the plot, customize the plot, then show or save it.

Prepare The Data (also see Lists & NumPy)

1D data
>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> y = np.cos(x)
>>> z = np.sin(x)

2D Data or Images
>>> data = 2 * np.random.random((10, 10))
>>> data2 = 3 * np.random.random((10, 10))
>>> Y, X = np.mgrid[-3:3:100j, -3:3:100j]
>>> U = -1 - X**2 + Y
>>> V = 1 + X - Y**2
>>> from matplotlib.cbook import get_sample_data
>>> img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))

Create Plot
All plotting is done with respect to an Axes. In most cases, a subplot will fit your needs. A subplot is an axes on a grid system.
>>> fig, ax = plt.subplots()
>>> fig.add_axes()
>>> ax1 = fig.add_subplot(221)  # row-col-num
>>> ax3 = fig.add_subplot(212)
>>> fig3, axes = plt.subplots(nrows=2, ncols=2)
>>> fig4, axes2 = plt.subplots(ncols=3)

Customize Plot

Colors, Color Bars & Color Maps
>>> plt.plot(x, x, x, x**2, x, x**3)
>>> ax.plot(x, y, alpha = 0.4)
>>> ax.plot(x, y, c='k')
>>> im = ax.imshow(img, cmap='seismic')
>>> fig.colorbar(im, orientation='horizontal')

Markers
>>> ax.scatter(x,y,marker=".")
>>> ax.plot(x,y,marker="o")

Linestyles
>>> plt.plot(x,y,linewidth=4.0)
>>> plt.plot(x,y,ls='solid')
>>> plt.plot(x,y,ls='--')
>>> plt.plot(x,y,'--',x**2,y**2,'-.')

Annotations
>>> ax.annotate("Sine", xy=(8, 0),
                xycoords='data',
                xytext=(10.5, 0),
                textcoords='data',
                arrowprops=dict(arrowstyle="->", connectionstyle="arc3"),)

Mathtext
>>> plt.title(r'$sigma_i=15$', fontsize=20)

Limits, Legends & Layouts

Limits & Autoscaling
>>> ax.margins(x=0.0,y=0.1)                     Add padding to a plot
>>> ax.axis('equal')                            Set the aspect ratio of the plot to 1
>>> ax.set(xlim=[0,10.5], ylim=[-1.5,1.5])      Set limits for x- and y-axis
>>> ax.set_xlim(0,10.5)                         Set limits for x-axis

Legends
>>> ax.set(title='An Example Axes',             Set a title and x- and y-axis labels
           ylabel='Y-Axis',
           xlabel='X-Axis')
>>> ax.legend(loc='best')                       No overlapping plot elements

Ticks
>>> ax.xaxis.set(ticks=range(1,5),              Manually set x-ticks
                 ticklabels=[3,100,-12,"foo"])
>>> ax.tick_params(axis='y', direction='inout', length=10)    Make y-ticks longer and go in and out

Axis Spines
>>> ax1.spines['top'].set_visible(False)                  Make the top axis line for a plot invisible
>>> ax1.spines['bottom'].set_position(('outward',10))     Move the bottom axis line outward

https://www.datacamp.com/community/blog/python-matplotlib-cheat-sheet
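Putting the workflow steps above together, here is a minimal end-to-end sketch: prepare data, create the plot, customize it, then show it. The titles, limits, and legend label are illustrative choices.

# End-to-end matplotlib workflow sketch (illustrative styling choices).
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)          # step 1: prepare data
y = np.cos(x)

fig, ax = plt.subplots()             # step 2: create plot (figure + axes)
ax.plot(x, y, c='k')                 # step 3: plot the data
ax.set(title='An Example Axes',      # step 4: customize
       xlabel='X-Axis', ylabel='Y-Axis',
       xlim=[0, 10.5], ylim=[-1.5, 1.5])
ax.legend(['cos(x)'], loc='best')
plt.show()                           # step 5: show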
Data Visualisation with ggplot2 Cheat Sheet

Basics
ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components: a data set, a set of geoms (visual marks that represent data points), and a coordinate system.

Build a graph with qplot() or ggplot():
qplot(x = cty, y = hwy, color = cyl, data = mpg, geom = "point")
Creates a complete plot with given data, geom, and mappings. Supplies many useful defaults.

ggplot(data = mpg, aes(x = cty, y = hwy))
Begins a plot that you finish by adding layers to. No defaults, but provides more control than qplot().

Add layers and elements with +:
ggplot(mpg, aes(hwy, cty)) +
  geom_point(aes(color = cyl)) +       # layer = geom + default stat + layer-specific mappings
  geom_smooth(method = "lm") +
  coord_cartesian() +                  # additional elements
  scale_color_gradient() +
  theme_bw()

Add a new layer to a plot with a geom_*() or stat_*() function. Each provides a geom, a set of aesthetic mappings, and a default stat and position adjustment.

last_plot()                                  Returns the last plot.
ggsave("plot.png", width = 5, height = 5)    Saves last plot as 5' x 5' file named "plot.png" in working directory. Matches file type to file extension.

Geoms  Use a geom to represent data points, and use the geom's aesthetic properties to represent variables. Each function returns a layer.

One Variable, Continuous
a + geom_density(kernel = "gaussian")    x, y, alpha, color, fill, linetype, size, weight
a + geom_dotplot()                       x, y, alpha, color, fill
a + geom_freqpoly()                      x, y, alpha, color, linetype, size
a + geom_histogram(binwidth = 5)         x, y, alpha, color, fill, linetype, size, weight
b + geom_area(aes(y = ..density..), stat = "bin")
b + geom_density(aes(y = ..count..))
b + geom_freqpoly(aes(y = ..density..))
b + geom_histogram(aes(y = ..density..))

Two Variables, Continuous X & Continuous Y
f + geom_jitter()                        x, y, alpha, color, fill, shape, size
f + geom_point()                         x, y, alpha, color, fill, shape, size
f + geom_quantile()                      x, y, alpha, color, linetype, size, weight
f + geom_rug(sides = "bl")               x, y, alpha, color, linetype, size
f + geom_smooth(model = lm)              x, y, ymax, ymin, alpha, color, fill, linetype, size

Continuous Bivariate Distribution
i + geom_density2d()                     x, y, alpha, colour, linetype, size
i + geom_hex()                           x, y, alpha, colour, fill, size

Continuous Function
j <- ggplot(economics, aes(date, unemploy))
j + geom_area()                          x, y, alpha, color, fill, linetype, size

Discrete X, Continuous Y
g + geom_boxplot()                       lower, middle, upper, x, ymax, ymin, alpha, color, fill, linetype, shape, size, weight
g + geom_dotplot(binaxis = "y", stackdir = "center")    x, y, alpha, color, fill
g + geom_violin(scale = "area")          x, y, alpha, color, fill, linetype, size, weight

Discrete X, Discrete Y
h <- ggplot(diamonds, aes(cut, color))
h + geom_jitter()                        x, y, alpha, color, fill, shape, size

Visualizing error
k + geom_errorbar()                      x, ymax, ymin, alpha, color, linetype, size, width (also geom_errorbarh())
k + geom_linerange()                     x, ymin, ymax, alpha, color, linetype, size
k + geom_pointrange()                    x, y, ymin, ymax, alpha, color, fill, linetype, shape, size

Graphical primitives
c + geom_polygon(aes(group = group))     x, y, alpha, color, fill, linetype, size
d <- ggplot(economics, aes(date, unemploy))
d + geom_path(lineend="butt", linejoin="round", linemitre=1)    x, y, alpha, color, linetype, size
d + geom_ribbon(aes(ymin=unemploy - 900, ymax=unemploy + 900))  x, ymax, ymin, alpha, color, fill, linetype, size
e + geom_segment(aes(xend = long + delta_long, yend = lat + delta_lat))    x, xend, y, yend, alpha, color, linetype, size
e + geom_rect(aes(xmin = long, ymin = lat, xmax = long + delta_long, ymax = lat + delta_lat))    xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size

Three Variables
seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2))
m <- ggplot(seals, aes(long, lat))
m + geom_contour(aes(z = z))             x, y, z, alpha, colour, linetype, size, weight
m + geom_raster(aes(fill = z), hjust=0.5, vjust=0.5, interpolate=FALSE)    x, y, alpha, fill
m + geom_tile(aes(fill = z))             x, y, alpha, color, fill, linetype, size

Maps
data <- data.frame(murder = USArrests$Murder, state = tolower(rownames(USArrests)))
map <- map_data("state")
l <- ggplot(data, aes(fill = murder))
l + geom_map(aes(map_id = state), map = map) + expand_limits(x = map$long, y = map$lat)

Stats  An alternative way to build a layer. Some plots visualize a transformation of the original data set. Use a stat to choose a common transformation to visualize, e.g. a + geom_bar(stat = "bin"). Each stat creates additional variables to map aesthetics to. These variables use a common ..name.. syntax. stat functions and geom functions both combine a stat with a geom to make a layer, i.e. stat_bin(geom="bar") does the same as geom_bar(stat="bin").

i + stat_density2d(aes(fill = ..level..), geom = "polygon", n = 100)
f + stat_bin2d(bins = 30, drop = TRUE)                     x, y, fill | ..count.., ..density..
f + stat_binhex(bins = 30)                                 x, y, fill | ..count.., ..density..
f + stat_density2d(contour = TRUE, n = 100)                x, y, color, size | ..level..
m + stat_contour(aes(z = z))                               x, y, z, order | ..level..
m + stat_spoke(aes(radius = z, angle = z))                 angle, radius, x, xend, y, yend | ..x.., ..xend.., ..y.., ..yend..
m + stat_summary_hex(aes(z = z), bins = 30, fun = mean)    x, y, z, fill | ..value..
g + stat_boxplot(coef = 1.5)                               x, y | ..lower.., ..middle.., ..upper.., ..outliers..
g + stat_ydensity(adjust = 1, kernel = "gaussian", scale = "area")    x, y | ..density.., ..scaled.., ..count.., ..n.., ..violinwidth.., ..width..
f + stat_ecdf(n = 40)                                      x, y | ..x.., ..y..
f + stat_quantile(quantiles = c(0.25, 0.5, 0.75), formula = y ~ log(x), method = "rq")    x, y | ..quantile.., ..x.., ..y..
f + stat_smooth(method = "auto", formula = y ~ x, se = TRUE, n = 80, fullrange = FALSE, level = 0.95)    x, y | ..se.., ..x.., ..y.., ..ymin.., ..ymax..
ggplot() + stat_function(aes(x = -3:3), fun = dnorm, n = 101, args = list(sd=0.5))    x | ..y..
ggplot() + stat_qq(aes(sample=1:100), distribution = qt, dparams = list(df=5))        sample, x, y | ..x.., ..y..
f + stat_identity()
f + stat_sum()                                             x, y, size | ..size..
f + stat_summary(fun.data = "mean_cl_boot")
f + stat_unique()

Scales  Scales control how a plot maps data values to the visual values of an aesthetic. To change the mapping, add a custom scale. Scale names follow the pattern scale_<aesthetic>_<prepackaged scale to use>().
n <- b + geom_bar(aes(fill = fl))
n + scale_fill_manual(
  values = c("skyblue", "royalblue", "blue", "navy"),    # range of values to include in mapping
  limits = c("d", "e", "p", "r"),                        # breaks to use in legend/axis
  breaks = c("d", "e", "p", "r"),
  name = "fuel",                                         # title to use in legend/axis
  labels = c("D", "E", "P", "R"))                        # labels to use in legend/axis

General Purpose scales (use with any aesthetic; x shown here)
scale_x_date(labels = date_format("%m/%d"), breaks = date_breaks("2 weeks"))    Treat x values as dates. See ?strptime for label formats.
scale_x_datetime()    Treat x values as date times. Use same arguments as scale_x_date().
scale_x_log10()       Plot x on log10 scale.
scale_x_reverse()     Reverse direction of x axis.
scale_x_sqrt()        Plot x on square root scale.

Color and fill scales
n + scale_fill_brewer(palette = "Blues")    For palette choices: library(RColorBrewer); display.brewer.all()
n + scale_fill_grey(start = 0.2, end = 0.8, na.value = "red")
o <- a + geom_dotplot(aes(fill = ..x..))
o + scale_fill_gradient(low = "red", high = "yellow")
o + scale_fill_gradient2(low = "red", high = "blue", mid = "white", midpoint = 25)
o + scale_fill_gradientn(colours = terrain.colors(6))    Also: rainbow(), heat.colors(), topo.colors(), cm.colors(), RColorBrewer::brewer.pal()

Shape scales
p <- f + geom_point(aes(shape = fl))
p + scale_shape(solid = FALSE)
p + scale_shape_manual(values = c(3:7))    Shape values are shown in the chart on the original sheet.

Size scales
q <- f + geom_point(aes(size = cyl))
q + scale_size_area(max = 6)               Value mapped to area of circle (not radius).

Coordinate Systems
r <- b + geom_bar()
r + coord_cartesian(xlim = c(0, 5))        xlim, ylim. The default cartesian coordinate system.
r + coord_fixed(ratio = 1/2)               ratio, xlim, ylim. Cartesian coordinates with fixed aspect ratio between x and y units.
r + coord_flip()                           xlim, ylim. Flipped Cartesian coordinates.
r + coord_polar(theta = "x", direction=1)  theta, start, direction. Polar coordinates.
r + coord_trans(ytrans = "sqrt")           xtrans, ytrans, limx, limy. Transformed cartesian coordinates. Set extras and strains to the name of a window function.
z + coord_map(projection = "ortho", orientation=c(41, -74, 0))    projection, orientation, xlim, ylim. Map projections from the mapproj package (mercator (default), azequalarea, lagrange, etc.)

Position Adjustments  Position adjustments determine how to arrange geoms that would otherwise occupy the same space.
s <- ggplot(mpg, aes(fl, fill = drv))
s + geom_bar(position = "dodge")           Arrange elements side by side.
s + geom_bar(position = "fill")            Stack elements on top of one another, normalize height.
s + geom_bar(position = "stack")           Stack elements on top of one another.
f + geom_point(position = "jitter")        Add random noise to X and Y position of each element to avoid overplotting.
Each position adjustment can be recast as a function with manual width and height arguments:
s + geom_bar(position = position_dodge(width = 1))

Faceting  Facets divide a plot into subplots based on the values of one or more discrete variables.
t <- ggplot(mpg, aes(cty, hwy)) + geom_point()
t + facet_grid(. ~ fl)        Facet into columns based on fl.
t + facet_grid(year ~ .)      Facet into rows based on year.
t + facet_grid(year ~ fl)     Facet into both rows and columns.
t + facet_wrap(~ fl)          Wrap facets into a rectangular layout.
Set scales to let axis limits vary across facets:
t + facet_grid(y ~ x, scales = "free")    x and y axis limits adjust to individual facets; "free_x" - x axis limits adjust; "free_y" - y axis limits adjust.
Set labeller to adjust facet labels:
t + facet_grid(. ~ fl, labeller = label_both)

Labels
t + ggtitle("New Plot Title")       Add a main title above the plot.
t + xlab("New X label")             Change the label on the X axis.
t + ylab("New Y label")             Change the label on the Y axis.
t + labs(title = "New title", x = "New x", y = "New y")    All of the above.

Legends
t + theme(legend.position = "bottom")    Place legend at "bottom", "top", "left", or "right".
t + guides(color = "none")               Set legend type for each aesthetic: colorbar, legend, or none (no legend).
t + scale_fill_discrete(name = "Title", labels = c("A", "B", "C"))    Set legend title and labels with a scale function.

Themes
r + theme_bw()           White background with grid lines.
r + theme_classic()      White background, no gridlines.
r + theme_grey()         Grey background (default theme).
r + theme_minimal()      Minimal theme.
ggthemes - Package with additional ggplot2 themes.

Zooming
Without clipping (preferred):
t + coord_cartesian(xlim = c(0, 100), ylim = c(10, 20))
With clipping (removes unseen data points):
t + xlim(0, 100) + ylim(10, 20)
t + scale_x_continuous(limits = c(0, 100)) + scale_y_continuous(limits = c(0, 100))
Use scale functions to update legend labels.

Creative Commons: Data Visualisation with ggplot2 by RStudio is licensed under CC BY SA 4.0.
Big-O Cheat Sheet
BecomingHuman.AI

Big-O Complexity Chart (operations vs. elements)
Horrible: O(n!), O(2^n), O(n^2)
Bad: O(n log n)
Fair: O(n)
Good / Excellent: O(log n), O(1)
Common Data Structure Operations
Columns: Average (Access, Search, Insertion, Deletion) | Worst (Access, Search, Insertion, Deletion) | Space (Worst)

Array:               Θ(1) Θ(n) Θ(n) Θ(n) | O(1) O(n) O(n) O(n) | O(n)
Stack:               Θ(n) Θ(n) Θ(1) Θ(1) | O(n) O(n) O(1) O(1) | O(n)
Queue:               Θ(n) Θ(n) Θ(1) Θ(1) | O(n) O(n) O(1) O(1) | O(n)
Singly-Linked List:  Θ(n) Θ(n) Θ(1) Θ(1) | O(n) O(n) O(1) O(1) | O(n)
Doubly-Linked List:  Θ(n) Θ(n) Θ(1) Θ(1) | O(n) O(n) O(1) O(1) | O(n)
Skip List:           Θ(log n) Θ(log n) Θ(log n) Θ(log n) | O(n) O(n) O(n) O(n) | O(n log n)
Hash Table:          N/A Θ(1) Θ(1) Θ(1) | N/A O(n) O(n) O(n) | O(n)
Binary Search Tree:  Θ(log n) Θ(log n) Θ(log n) Θ(log n) | O(n) O(n) O(n) O(n) | O(n)
Cartesian Tree:      N/A Θ(log n) Θ(log n) Θ(log n) | N/A O(n) O(n) O(n) | O(n)
B-Tree:              Θ(log n) Θ(log n) Θ(log n) Θ(log n) | O(log n) O(log n) O(log n) O(log n) | O(n)
Red-Black Tree:      Θ(log n) Θ(log n) Θ(log n) Θ(log n) | O(log n) O(log n) O(log n) O(log n) | O(n)
Splay Tree:          N/A Θ(log n) Θ(log n) Θ(log n) | N/A O(log n) O(log n) O(log n) | O(n)
AVL Tree:            Θ(log n) Θ(log n) Θ(log n) Θ(log n) | O(log n) O(log n) O(log n) O(log n) | O(n)
KD Tree:             Θ(log n) Θ(log n) Θ(log n) Θ(log n) | O(n) O(n) O(n) O(n) | O(n)

Array Sorting Algorithms
Columns: Best, Average, Worst (time) | Space (Worst)

Quicksort:       Ω(n log n)   Θ(n log n)      O(n^2)          | O(log n)
Mergesort:       Ω(n log n)   Θ(n log n)      O(n log n)      | O(n)
Timsort:         Ω(n)         Θ(n log n)      O(n log n)      | O(n)
Heapsort:        Ω(n log n)   Θ(n log n)      O(n log n)      | O(1)
Bubble Sort:     Ω(n)         Θ(n^2)          O(n^2)          | O(1)
Insertion Sort:  Ω(n)         Θ(n^2)          O(n^2)          | O(1)
Selection Sort:  Ω(n^2)       Θ(n^2)          O(n^2)          | O(1)
Tree Sort:       Ω(n log n)   Θ(n log n)      O(n^2)          | O(n)
Shell Sort:      Ω(n log n)   Θ(n (log n)^2)  O(n (log n)^2)  | O(1)
Bucket Sort:     Ω(n+k)       Θ(n+k)          O(n^2)          | O(n)
Radix Sort:      Ω(nk)        Θ(nk)           O(nk)           | O(n+k)
Counting Sort:   Ω(n+k)       Θ(n+k)          O(n+k)          | O(k)
Cubesort:        Ω(n)         Θ(n log n)      O(n log n)      | O(n)
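To connect a few rows of the tables above to code: searching an unsorted array is Θ(n), binary search over a sorted array is Θ(log n), and an average-case hash-table lookup is Θ(1). The sketch below illustrates this in Python; the data size and target value are illustrative choices.

# Sketch of three lookup strategies with different Big-O behavior.
import bisect

data = list(range(1_000_000))          # sorted array
lookup = {v: True for v in data}       # hash table built from the same values

target = 987_654
print(target in data)                              # linear scan over a list: Θ(n)
i = bisect.bisect_left(data, target)               # binary search on a sorted array: Θ(log n)
print(i < len(data) and data[i] == target)
print(target in lookup)                            # hash-table lookup: Θ(1) on average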