Statistics using Python
Porting Code from Matlab to Python - 2017
10 October 2017 | Rajalekshmi Deepu
Member of the Helmholtz Association
Statistics in Matlab and Python
Ø Matlab:
- Proprietary software.
- Needs the “Statistics” toolbox (extra cost).
Ø Python:
- Open source.
- Extended with a fantastic ecosystem of data-centric packages such as numpy, scipy, matplotlib, scikit-learn, pandas, …
Numpy and Statistics(Descriptive)
Ø Provides built-in statistical functions such as mean,
median, standard deviation and variance.
Matlab:
>> load wages.dat
% Mean
>> Mean_value = mean(wages)
% Median
>> med_value = median(wages)
% Standard deviation (normalized by N-1)
>> std_value = std(wages)
% Variance (normalized by N-1)
>> var_value = var(wages)

Python (using Numpy):
>>> import numpy as np
>>> X = [16.92, 96.10, 11.82, 44.32, 55.66, 10.75]
>>> mean = np.mean(X)
>>> median = np.median(X)
>>> # np.std and np.var normalize by N by default; ddof=1 matches Matlab
>>> sd = np.std(X, ddof=1)
>>> variance = np.var(X, ddof=1)
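One porting detail is worth spelling out: Matlab's std and var normalize by N-1, whereas np.std and np.var normalize by N unless ddof=1 is passed. The following is a minimal sketch of that difference, using the six sample values from this slide; the variable names are illustrative only.

```python
import numpy as np

# Sample values from the slide (standing in for the wages data).
X = np.array([16.92, 96.10, 11.82, 44.32, 55.66, 10.75])

mean_value = np.mean(X)
med_value = np.median(X)

# NumPy default: population statistics (divide by N, i.e. ddof=0).
std_population = np.std(X)
var_population = np.var(X)

# Matlab-compatible: sample statistics (divide by N - 1, i.e. ddof=1).
std_sample = np.std(X, ddof=1)
var_sample = np.var(X, ddof=1)

print("mean:", mean_value, "median:", med_value)
print("std  (ddof=0 / ddof=1):", std_population, std_sample)
print("var  (ddof=0 / ddof=1):", var_population, var_sample)
```

For small samples the two conventions differ noticeably, so results ported from Matlab should be compared against the ddof=1 values.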
SciPy (Scientific Python)
Ø Scientific computing package for Python.
>>> import scipy
>>> help(scipy)
Ø Built on top of Numpy and uses Numpy arrays and data types.
Ø The SciPy package is organized into several sub-packages.
Ø In the SciPy releases current in 2017, the top-level namespace also re-exports
the functions of the Numpy package and several commonly used functions from sub-packages.
e.g.: scipy.var and numpy.var
both refer to the function var in module numpy.core.fromnumeric
scipy.array and numpy.array
both refer to the built-in function array in module numpy.core.multiarray
Ø Later SciPy releases deprecate these aliases, so call the Numpy functions
(numpy.var, numpy.array) directly.
SciPy and Statistics (Inferential)
Ø SciPy offers an extended collection of statistical
tools, such as distributions (continuous and discrete)
and statistical functions.
Ø A few sub-packages and objects for statistics are:
scipy.cluster --- Vector quantization / K-means
scipy.stats --- Statistical functions
scipy.stats.t --- Student’s t distribution (t-tests: scipy.stats.ttest_1samp, scipy.stats.ttest_ind)
Remember: Sub-packages require an explicit import
e.g.: >>> import scipy.cluster
>>> from scipy import stats
scipy.stats
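As a rough illustration of what scipy.stats provides, the sketch below touches a continuous distribution, a discrete distribution, the Student's t distribution, and a two-sample t-test. The arrays a and b are synthetic data invented for this example, not data from the talk.

```python
import numpy as np
from scipy import stats

# Synthetic samples (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=200)
b = rng.normal(loc=10.5, scale=2.0, size=200)

# Continuous distribution: cumulative probability of the standard normal.
p_below = stats.norm.cdf(1.96)

# Student's t distribution: two-sided 95% critical value for 10 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=10)

# Discrete distribution: probability of exactly 3 successes in 10 fair trials.
pmf_3 = stats.binom.pmf(3, n=10, p=0.5)

# Inferential statistics: two-sample t-test on the synthetic samples.
result = stats.ttest_ind(a, b)
t_stat, p_value = result.statistic, result.pvalue

# Descriptive summary (count, min/max, mean, variance, skewness, kurtosis).
summary = stats.describe(a)

print(p_below, t_crit, pmf_3)
print(t_stat, p_value)
print(summary)
```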
Scipy and Matlab
Ø scipy.io.matlab - Utilities for dealing with MATLAB
files.
Ø Included functions:
• scipy.io.loadmat - Load MATLAB file. Returns dictionary
with variable names as keys, and loaded matrices as values.
• scipy.io.savemat - Save a dictionary of names and arrays
into a MATLAB-style .mat file.
• scipy.io.whosmat - List variables inside a MATLAB file.
Jupyter Notebook: Load_List_Save_MAT_files
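The three functions listed above can be exercised in a short round trip. This is a sketch only: the file name wages_demo.mat and the variable names are made up for the example, and the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np
from scipy import io as sio

# Hypothetical data and file name, used only for this demonstration.
wages = np.array([16.92, 96.10, 11.82, 44.32, 55.66, 10.75])
path = os.path.join(tempfile.mkdtemp(), "wages_demo.mat")

# Save a dictionary of names and arrays into a MATLAB-style .mat file.
sio.savemat(path, {"wages": wages, "scale": 2.0})

# List the variables stored in the file: (name, shape, class) tuples.
listing = sio.whosmat(path)

# Load the file; the result is a dict keyed by variable name
# (plus a few metadata keys such as '__header__').
data = sio.loadmat(path)
loaded = data["wages"]

print(listing)
print(loaded.shape)  # 1-D arrays come back as 2-D, as in MATLAB
```

Because MATLAB has no 1-D arrays, a vector saved from Numpy is read back as a 2-D array; loaded.ravel() or np.squeeze(loaded) restores the 1-D form.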
Matplotlib
Ø matplotlib.mlab
Numerical python functions written for compatibility
with MATLAB commands with the same names.
MATLAB compatible functions:
cohere --- Coherence (normalized cross spectral density)
csd --- Cross spectral density using Welch's average periodogram
detrend --- Remove the mean or best-fit line from an array
find --- Return the indices where some condition is true; numpy.nonzero is similar but more general
griddata --- Interpolate irregularly distributed data to a regular grid
prctile --- Find the percentiles of a sequence
prepca --- Principal Component Analysis
psd --- Power spectral density using Welch's average periodogram
rk4 --- A 4th-order Runge-Kutta integrator for 1D or ND systems
specgram --- Spectrogram (spectrum over segments of time)
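Several of these operations also have counterparts in Numpy and SciPy, which may be preferable when matplotlib.mlab is not available. The sketch below pairs four of them (find, prctile, detrend, psd) with numpy.nonzero, numpy.percentile, scipy.signal.detrend and scipy.signal.welch. The test signal and the sampling rate fs are assumptions made for the example.

```python
import numpy as np
from scipy import signal

# Hypothetical test signal: 5 Hz sine plus a linear trend, sampled at 100 Hz.
fs = 100.0
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * t

# find -> numpy.nonzero: indices where a condition is true.
idx = np.nonzero(x > 4.0)[0]

# prctile -> numpy.percentile: percentiles of a sequence.
q25, q50, q75 = np.percentile(x, [25, 50, 75])

# detrend -> scipy.signal.detrend: remove the best-fit line.
x_detrended = signal.detrend(x, type="linear")

# psd -> scipy.signal.welch: power spectral density (Welch's method).
freqs, pxx = signal.welch(x_detrended, fs=fs, nperseg=256)
peak_freq = freqs[np.argmax(pxx)]

print(idx[:5], (q25, q50, q75), peak_freq)
```

scipy.signal also offers csd, coherence and spectrogram, which cover the roles of csd, cohere and specgram in the list above.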
Principal Component Analysis (PCA)
Ø Way of identifying patterns and expressing the data
to highlight their similarities and differences.
Ø Powerful tool for analyzing high dimensional data.
Ø Enables data compression without much loss of
information by reducing the number of dimensions.
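The idea of compressing data by dropping dimensions can be sketched with plain Numpy: center the data, take a singular value decomposition, and keep only the leading components. The data set below is synthetic (three features, of which the third is almost a sum of the first two), and the names coeff and score are borrowed from Matlab's pca output for readability.

```python
import numpy as np

# Synthetic data: 200 samples, 3 features, but only ~2 independent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
noise = 0.01 * rng.normal(size=200)
X = np.column_stack([latent[:, 0], latent[:, 1], latent[:, 0] + latent[:, 1] + noise])

# 1. Center each feature.
mu = X.mean(axis=0)
Xc = X - mu

# 2. SVD of the centered data; rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coeff = Vt.T            # principal component coefficients (one column each)
score = Xc @ coeff      # data expressed in the principal component basis

# 3. Fraction of the total variance captured by each component.
explained = S**2 / np.sum(S**2)

# 4. Compress to k = 2 dimensions and reconstruct.
k = 2
X_approx = score[:, :k] @ coeff[:, :k].T + mu
max_err = np.max(np.abs(X - X_approx))

print("explained variance ratio:", explained)
print("max reconstruction error with 2 components:", max_err)
```

Here two components carry almost all of the variance, so the third dimension can be dropped with very little loss of information.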
Matlab code for PCA (An example)
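% Load the NIfTI image (load_untouch_nii, make_nii, save_nii: third-party NIfTI functions)
rd = load_untouch_nii('edtd.nii');
rd = double(rd.img);
% Image dimensions
sz = size(rd);
nrows = sz(1);
ncols = sz(2);
nslcs = sz(4);
% 2-D matrix: one row per pixel, one column per entry of the 4th dimension
s = reshape(rd,nrows*ncols,nslcs);
% Principal component coefficients and scores
[coeff,score] = pca(s);
% Back to the image layout, then save as NIfTI
s = reshape(score,nrows,ncols,nslcs);
n = make_nii(s);
save_nii(n,'results/pca.nii')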
Ref: https://de.mathworks.com/help/stats/pca.html
https://stackoverflow.com/questions/35651133/matlab-and-python-produces-different-results-for-pca
PCA using Python (matplotlib.mlab)
Ø Hint:
- Use matplotlib.mlab.PCA (available in the Matplotlib releases current in 2017; removed in Matplotlib 3.1)
- Imported as given below:
from matplotlib.mlab import PCA
- Dataset: edtd.nii
- Ref:
http://matplotlib.org/api/mlab_api.html#matplotlib.mlab.PCA
http://nipy.org/nibabel/nibabel_images.html
Jupyter Notebook
Scikit-learn or sklearn
Ø Library for machine learning in Python.
Ø sklearn.cluster.KMeans --- K-means clustering.
Ø The ‘sklearn.decomposition’ module includes matrix
decomposition algorithms such as PCA, NMF and ICA.
e.g. classes:
- sklearn.decomposition.NMF - Non-negative matrix factorization
- sklearn.decomposition.PCA - Principal Component Analysis
Ø Most of the algorithms of this module can be
regarded as dimensionality reduction techniques.
PCA using Python (sklearn)
Ø Hint:
- Use sklearn.decomposition.PCA
- Imported as given below:
from sklearn.decomposition import PCA
- Dataset: edtd.nii
- Ref:
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
http://nipy.org/nibabel/nibabel_images.html
Jupyter Notebook Optional
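One possible shape of such a port is sketched below; it is not the notebook solution. A random 4-D array stands in for the image data so the example runs without edtd.nii, and its shape is an assumption. The reshape calls use order="F" because Matlab's reshape is column-major, while Numpy defaults to row-major.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the image data (hypothetical shape: nrows, ncols, 1, nslcs).
# With nibabel installed, the real file could be read with something like:
#   rd = nibabel.load("edtd.nii").get_fdata()
rng = np.random.default_rng(0)
rd = rng.normal(size=(8, 6, 1, 5))

nrows, ncols, _, nslcs = rd.shape

# Matlab: s = reshape(rd, nrows*ncols, nslcs); order="F" keeps Matlab's layout.
s = rd.reshape((nrows * ncols, nslcs), order="F")

# Matlab: [coeff, score] = pca(s); both implementations center the data first.
pca = PCA()                      # keep all components
score = pca.fit_transform(s)     # counterpart of Matlab's score
coeff = pca.components_.T        # counterpart of Matlab's coeff

# Matlab: s = reshape(score, nrows, ncols, nslcs)
out = score.reshape((nrows, ncols, nslcs), order="F")

print(score.shape, coeff.shape, out.shape)
print(pca.explained_variance_ratio_)
```

The signs of individual components may differ between Matlab and sklearn, since a principal direction is only defined up to sign; comparing absolute values or explained variance is a more robust check.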
Other Python modules for Statistics
Ø Seaborn: Statistical data visualization
http://seaborn.pydata.org
Ø Statsmodels: Library for statistical and econometric
analysis in Python.
http://statsmodels.sourceforge.net/
Jupyter Notebook : seaborn_savefig
References
The Python Language Reference: http://docs.python.org/2/reference/index.html
The Python Standard Library: http://docs.python.org/2/library/
https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html
http://matplotlib.org/api/mlab_api.html#module-matplotlib.mlab
http://conference.scipy.org/proceedings/scipy2010/pdfs/seabold.pdf
http://seaborn.pydata.org
https://www.datacamp.com/community/data-science-cheatsheets
PEP 20 -- The Zen of Python: https://www.python.org/dev/peps/pep-0020/
https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html
https://www.tiobe.com/tiobe-index/