KaveIO / Phik

Licence: other
Phi_K correlation analyzer library

Projects that are alternatives of or similar to Phik

Juliatutorial
Julia Tutorial for Finance and Econometrics Students
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Documents
Slides produced by Engineers and Data Scientists of Blue Yonder
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Machine Learning Asset Management
Machine Learning in Asset Management (by @firmai)
Stars: ✭ 1,060 (+1978.43%)
Mutual labels:  jupyter-notebook
Scona
Code to analyse structural covariance brain networks using python.
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Sgn 41007
Supplementary materials for course "Pattern Recognition and Machine Learning" at
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Russiansuperglue
Russian SuperGLUE benchmark
Stars: ✭ 51 (+0%)
Mutual labels:  jupyter-notebook
Numerical Linear Algebra
Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course
Stars: ✭ 8,263 (+16101.96%)
Mutual labels:  jupyter-notebook
Training toolbox caffe
Training Toolbox for Caffe
Stars: ✭ 51 (+0%)
Mutual labels:  jupyter-notebook
Bigartm Book
Topic modeling with BigARTM: an interactive book
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Advbox
Advbox is a toolbox to generate adversarial examples that fool neural networks in PaddlePaddle, PyTorch, Caffe2, MxNet, Keras, and TensorFlow, and Advbox can benchmark the robustness of machine learning models. Advbox gives a command line tool to generate adversarial examples with Zero-Coding.
Stars: ✭ 1,055 (+1968.63%)
Mutual labels:  jupyter-notebook
Sketchback
Keras implementation of sketch inversion using deep convolution neural networks (synthesising photo-realistic images from pencil sketches)
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Doubletdetection
Doublet detection in single-cell RNA-seq data.
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Stock Trading
Example code for the book "Automated Stock System Using Python and React"
Stars: ✭ 51 (+0%)
Mutual labels:  jupyter-notebook
How to generate images with tensorflow live
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Gym Continuousdoubleauction
A custom MARL (multi-agent reinforcement learning) environment where multiple agents trade against one another (self-play) in a zero-sum continuous double auction. Ray [RLlib] is used for training.
Stars: ✭ 50 (-1.96%)
Mutual labels:  jupyter-notebook
Community
Kubernetes community content
Stars: ✭ 9,133 (+17807.84%)
Mutual labels:  jupyter-notebook
Pytorch Transfomer
My implementation of the transformer architecture from the Attention is All you need paper applied to time series.
Stars: ✭ 51 (+0%)
Mutual labels:  jupyter-notebook
Workshops
DSSG Workshops
Stars: ✭ 51 (+0%)
Mutual labels:  jupyter-notebook
Machine Learning Decal Spring 2019
A 2-unit decal run by [email protected]'s education team
Stars: ✭ 51 (+0%)
Mutual labels:  jupyter-notebook
Bnlearn
Python package for learning the graphical structure of Bayesian networks, parameter learning, inference and sampling methods.
Stars: ✭ 51 (+0%)
Mutual labels:  jupyter-notebook

==================================
Phi_K Correlation Analyzer Library
==================================

Phi_K is a practical correlation constant that works consistently between categorical, ordinal and interval variables. It is based on several refinements to Pearson's hypothesis test of independence of two variables.
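
For reference, the unrefined starting point is Pearson's χ² test of independence on a contingency table. The minimal NumPy sketch below computes the χ² statistic and its degrees of freedom. ``pearson_chi2`` is a hypothetical helper for illustration, not part of the phik API.

.. code-block:: python

  import numpy as np

  def pearson_chi2(table):
      """Pearson's chi^2 statistic and degrees of freedom of a 2D contingency table.

      Hypothetical helper for illustration; not part of the phik API.
      """
      observed = np.asarray(table, dtype=float)
      n = observed.sum()
      # expected counts under independence: outer product of the marginals divided by n
      expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
      chi2 = float(((observed - expected) ** 2 / expected).sum())
      dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
      return chi2, dof

  print(pearson_chi2([[10, 0], [0, 10]]))  # strongly dependent: (20.0, 1)
  print(pearson_chi2([[2, 4], [3, 6]]))    # proportional rows, independent: (0.0, 1)

A large χ² only signals dependence. It grows with the sample size and depends on the binning, so it is not yet a correlation coefficient on a [0, 1] scale. That is the gap Phi_K addresses.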

The combined features of Phi_K give it an advantage over existing coefficients. First, it works consistently between categorical, ordinal and interval variables. Second, it captures non-linear dependency. Third, it reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. These are useful features when studying the correlation matrix of variables with mixed types.
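
The core idea can be illustrated with a bivariate normal distribution. Phi_K is the correlation ρ of a bivariate normal that, binned the same way and with the same number of records, yields the observed χ². The sketch below shows that idea only and is not the phik implementation.

It rests on these assumptions:

- Both variables are binned into equal-population (quantile) bins.
- The bivariate normal is integrated on a brute-force grid.
- ρ is found by bisection.
- No correction for the statistical-noise pedestal is applied, so independent samples come out slightly above zero.

All function names are hypothetical.

.. code-block:: python

  import numpy as np
  from statistics import NormalDist

  # Hypothetical helpers illustrating the idea behind Phi_K; not the phik API.

  def mean_square_contingency(table):
      """phi^2 = chi^2 / N of a contingency table given as counts or probabilities."""
      p = np.asarray(table, dtype=float)
      p = p / p.sum()
      expected = np.outer(p.sum(axis=1), p.sum(axis=0))
      return float(((p - expected) ** 2 / expected).sum())

  def binned_normal_phi2(rho, n_bins=10, grid=801, lim=5.0):
      """phi^2 of a standard bivariate normal with correlation rho, binned into
      n_bins x n_bins equal-probability cells (brute-force grid integration)."""
      edges = [NormalDist().inv_cdf(k / n_bins) for k in range(1, n_bins)]
      t = np.linspace(-lim, lim, grid)
      x, y = np.meshgrid(t, t, indexing="ij")
      q = 1.0 - rho ** 2
      # unnormalized density; mean_square_contingency normalizes the binned table
      density = np.exp(-(x ** 2 - 2 * rho * x * y + y ** 2) / (2 * q))
      # member[b, i] is 1 if grid coordinate t[i] falls into bin b
      member = (np.digitize(t, edges)[None, :] == np.arange(n_bins)[:, None]).astype(float)
      return mean_square_contingency(member @ density @ member.T)

  def phik_sketch(a, b, n_bins=10, iterations=40):
      """rho of the bivariate normal whose binned phi^2 (= chi^2 / N) matches the data's."""
      n = len(a)
      # quantile binning on ranks: equal-population bins whatever the marginal shapes
      bin_a = np.argsort(np.argsort(a)) * n_bins // n
      bin_b = np.argsort(np.argsort(b)) * n_bins // n
      observed = np.zeros((n_bins, n_bins))
      np.add.at(observed, (bin_a, bin_b), 1)
      target = mean_square_contingency(observed)
      lo, hi = 0.0, 0.99  # cap below 1, where the grid can no longer resolve the density
      if target >= binned_normal_phi2(hi, n_bins):
          return hi
      for _ in range(iterations):  # bisection: binned phi^2 grows monotonically with rho
          mid = 0.5 * (lo + hi)
          if binned_normal_phi2(mid, n_bins) < target:
              lo = mid
          else:
              hi = mid
      return 0.5 * (lo + hi)

  rng = np.random.default_rng(42)
  x, y = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=20_000).T
  print(phik_sketch(x, y))               # should be close to the Pearson coefficient 0.6
  print(phik_sketch(np.exp(x), y ** 3))  # same value: only the ranks matter

Quantile binning depends only on ranks. As a result, any strictly increasing transformation of either variable leaves this sketch's result unchanged. For bivariate normal data, the value lands close to the Pearson coefficient.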

The presented algorithms are easy to use and available through this public Python library, the correlation analyzer package. Emphasis is placed on the proper evaluation of the statistical significance of correlations and on the interpretation of variable relationships in a contingency table, in particular for low-statistics samples.
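
Asymptotic χ² p-values become unreliable when expected cell counts are small. One generic remedy is a permutation test; it is shown here as an illustration and is not necessarily how phik evaluates significance. The procedure is:

- Shuffle one variable to break any dependency.
- Recompute χ².
- Count how often the shuffled value reaches the observed one.

The p-value is then expressed as a one-sided Z, the convention used by the significance matrix in the quick run. ``chi2_stat`` and ``permutation_significance`` are hypothetical helpers.

.. code-block:: python

  import numpy as np
  from statistics import NormalDist

  # Hypothetical helpers for illustration; not the phik API.

  def chi2_stat(x, y):
      """Pearson chi^2 of the contingency table of two categorical arrays."""
      _, xi = np.unique(x, return_inverse=True)
      _, yi = np.unique(y, return_inverse=True)
      table = np.zeros((xi.max() + 1, yi.max() + 1))
      np.add.at(table, (xi.ravel(), yi.ravel()), 1)
      expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
      return float(((table - expected) ** 2 / expected).sum())

  def permutation_significance(x, y, n_perm=999, seed=0):
      """Permutation p-value of chi^2, also expressed as a one-sided Z."""
      rng = np.random.default_rng(seed)
      y = np.asarray(y)
      observed = chi2_stat(x, y)
      # shuffling y breaks any dependency while keeping both marginals fixed
      exceed = sum(chi2_stat(x, rng.permutation(y)) >= observed - 1e-9 for _ in range(n_perm))
      p_value = (exceed + 1) / (n_perm + 1)
      # one-sided Z: the standard-normal quantile with upper-tail probability p_value
      z = NormalDist().inv_cdf(1 - p_value) if p_value < 1 else float("-inf")
      return p_value, z

  x = np.repeat(["a", "b", "c"], 10)
  print(permutation_significance(x, x.copy()))  # perfectly dependent toy data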

For example, the Phi_K correlation analyzer package has been used to study surveys, insurance claims, correlograms, etc. For details on the methodology behind the calculations, please see our publication.

Documentation
=============

The entire Phi_K documentation, including tutorials, can be found at `read-the-docs <https://phik.readthedocs.io>`_. See the tutorials for detailed examples of how to run the code with pandas. We also have one example of how to calculate the Phi_K correlation matrix for a Spark DataFrame.

Check it out
============

The Phi_K library requires Python 3 and is pip friendly. To get started, simply do:

.. code-block:: bash

  $ pip install phik

or check out the code from our GitHub repository:

.. code-block:: bash

  $ git clone https://github.com/KaveIO/PhiK.git
  $ pip install -e PhiK/

where in this example the code is installed in editable mode (option ``-e``).

You can now use the package in Python with:

.. code-block:: python

  import phik

Congratulations, you are now ready to use the PhiK correlation analyzer library!

Speedups
========

Phi_K can use the Numba JIT library for faster computation of certain operations. You can either install Numba separately or use the numba extra specifier while installing:

.. code-block:: bash

  $ pip install phik[numba]
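
To check which configuration you ended up with, a standard-library sketch can report the installed phik version and whether Numba is importable. ``speedup_status`` is a hypothetical helper. It only tells you whether Numba can be imported, not which code paths phik actually accelerates.

.. code-block:: python

  from importlib.metadata import PackageNotFoundError, version
  from importlib.util import find_spec

  def speedup_status():
      """Installed phik version (None if missing) and whether Numba is importable.

      Hypothetical helper for illustration; not part of the phik API.
      """
      try:
          phik_version = version("phik")
      except PackageNotFoundError:
          phik_version = None
      return {"phik": phik_version, "numba_available": find_spec("numba") is not None}

  print(speedup_status())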

Quick run
=========

As a quick example, you can do:

.. code-block:: python

  import pandas as pd
  import phik
  from phik import resources, report

  # open fake car insurance data
  df = pd.read_csv( resources.fixture('fake_insurance_data.csv.gz') )
  df.head()

  # Pearson's correlation matrix between numeric variables (pandas functionality);
  # numeric_only=True because recent pandas versions raise on non-numeric columns
  df.corr(numeric_only=True)

  # get the phi_k correlation matrix between all variables
  df.phik_matrix()

  # get global correlations based on phi_k correlation matrix
  df.global_phik()

  # get the significance matrix (expressed as one-sided Z)
  # of the hypothesis test of each variable-pair dependency
  df.significance_matrix()

  # contingency table of two columns
  cols = ['mileage', 'car_size']
  df[cols].hist2d()

  # normalized residuals of contingency test applied to cols
  df[cols].outlier_significance_matrix()

  # show the normalized residuals of each variable-pair
  df.outlier_significance_matrices()

  # generate a phik correlation report and save as test.pdf
  report.correlation_report(df, pdf_file_name='test.pdf')
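
The global correlation returned by ``global_phik()`` summarizes, per variable, how strongly it is correlated with all other variables combined. A common definition for a matrix :math:`V` is :math:`g_k = \sqrt{1 - [V_{kk} (V^{-1})_{kk}]^{-1}}`. The NumPy sketch below assumes that ``global_phik`` applies this definition to the Phi_K matrix; that is an assumption about its internals. ``global_correlations`` is a hypothetical helper.

.. code-block:: python

  import numpy as np

  def global_correlations(corr):
      """g_k = sqrt(1 - 1 / (V^-1)_kk) for a correlation matrix V with unit diagonal.

      Hypothetical helper for illustration; not the phik implementation.
      """
      inv_diag = np.diag(np.linalg.inv(np.asarray(corr, dtype=float)))
      # clip: rounding or a non-positive-definite input can push the radicand below 0
      return np.sqrt(np.clip(1.0 - 1.0 / inv_diag, 0.0, None))

  print(global_correlations([[1.0, 0.5], [0.5, 1.0]]))

For two variables the formula reduces to the absolute value of their correlation. The clip guards against small negative values when a matrix such as a Phi_K matrix is not exactly positive definite.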

For all available examples, please see the `tutorials <https://phik.readthedocs.io/en/latest/tutorials.html>`_ at read-the-docs.
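
The quick run describes ``outlier_significance_matrix`` as the normalized residuals of the contingency test. A textbook version of such residuals is the adjusted standardized residual, which is approximately standard normal per cell under independence. The sketch below computes it to illustrate the general idea only; phik's own outlier significances may be computed differently. ``adjusted_residuals`` is a hypothetical helper.

.. code-block:: python

  import numpy as np

  def adjusted_residuals(table):
      """Adjusted standardized residuals (O - E) / SE of a contingency table.

      Under independence each cell is approximately standard normal.
      Hypothetical helper for illustration; needs at least two rows and two columns.
      """
      observed = np.asarray(table, dtype=float)
      n = observed.sum()
      row = observed.sum(axis=1, keepdims=True) / n  # row proportions, shape (r, 1)
      col = observed.sum(axis=0, keepdims=True) / n  # column proportions, shape (1, c)
      expected = n * row * col
      return (observed - expected) / np.sqrt(expected * (1 - row) * (1 - col))

  print(adjusted_residuals([[10, 0], [0, 10]]))  # every cell is ±sqrt(20)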

Contact and support
===================

Please note that KPMG provides support only on a best-effort basis.
