abidlabs / Contrastive

Licence: mit
Contrastive PCA

contrastive

A Python library for performing unsupervised learning (e.g. PCA) in contrastive settings, where one is interested in patterns (e.g. clusters or clines) that exist in one dataset but not in the other.

Applications include discovering subgroups in biological and medical data. Here are basic installation and usage instructions, written for Python 3 (in which the library has been developed and tested, although it should work in Python 2 as well).

For more details, see the accompanying paper: `Exploring Patterns Enriched in a Dataset with Contrastive Principal Component Analysis <https://www.nature.com/articles/s41467-018-04608-8.pdf/>`_, Nature Communications (2018). Please use the citation below.

.. code-block::

    @article{abid2018exploring,
      title={Exploring patterns enriched in a dataset with contrastive principal component analysis},
      author={Abid, Abubakar and Zhang, Martin J and Bagaria, Vivek K and Zou, James},
      journal={Nature communications},
      volume={9},
      number={1},
      pages={2134},
      year={2018},
    }

This repository also includes experiments to reproduce most of the figures in the paper. Please see the Python notebooks in the :code:`experiments` folder.

Installation

.. code-block::

    $ pip3 install contrastive

Basic Usage

The basic functions enabled by this library are shown below. Generally speaking, we have two datasets: :code:`foreground_data`, the dataset in which we want to discover patterns and directions, and :code:`background_data`, a dataset that does not contain the patterns or directions we are interested in. In some cases both datasets contain the signal of interest, but it is enriched in the foreground relative to the background. These analyses have a contrast parameter, alpha, which can be treated as a hyperparameter.

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA()
    projected_data = mdl.fit_transform(foreground_data, background_data)

    # 'projected_data' is a list of 2-dimensional projections of the
    # foreground data, one for each of several automatically chosen
    # values of alpha (by default, 4 values of alpha are chosen).

Note that :code:`foreground_data` and :code:`background_data` should be 2D numpy arrays with the same second dimension (which represents the number of features). In other words, :code:`foreground_data.shape[1]==background_data.shape[1]` should return :code:`True`.
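
A quick way to catch a shape mismatch before fitting is a small check like the one below. This is only a sketch: :code:`check_shapes` is a hypothetical helper, not part of the library, and the arrays are random placeholders.

```python
import numpy as np


def check_shapes(foreground_data, background_data):
    """Hypothetical helper (not part of contrastive): validate the inputs."""
    fg = np.asarray(foreground_data)
    bg = np.asarray(background_data)
    if fg.ndim != 2 or bg.ndim != 2:
        raise ValueError("both datasets must be 2D arrays (samples x features)")
    if fg.shape[1] != bg.shape[1]:
        raise ValueError(
            f"feature mismatch: foreground has {fg.shape[1]} features, "
            f"background has {bg.shape[1]}"
        )
    return fg, bg


# Placeholder data: the number of rows may differ, the number of columns may not.
rng = np.random.default_rng(0)
foreground_data = rng.normal(size=(400, 30))
background_data = rng.normal(size=(250, 30))
fg, bg = check_shapes(foreground_data, background_data)
print(fg.shape, bg.shape)
```

The two datasets do not need the same number of samples; only the feature dimension has to agree.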

Built-in plotting: to quickly see the results of contrastive PCA, set the :code:`plot` parameter to :code:`True`:

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA()
    projected_data = mdl.fit_transform(foreground_data, background_data, plot=True)

.. image:: images/plot_true.png

Interactive GUI: if you are running these analyses inside a Jupyter notebook, you can launch an interactive GUI as shown here:

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA()
    projected_data = mdl.fit_transform(foreground_data, background_data, gui=True)

.. image:: images/gui_true.png

Using the slider, you can see how your data points move as you change the value of the contrast parameter. These animations can reveal groups in the data and other insights:

.. image:: images/animation.gif

Quick Test

To ensure that the library is working, here is a quick script that will allow you to test the code on synthetic data. Simply run the following commands:

.. code-block:: python

    import numpy as np
    from contrastive import CPCA

    N = 400
    D = 30
    gap = 3

    # In B, all the data points are from the same distribution, which has
    # different variances in three subspaces.
    B = np.zeros((N, D))
    B[:, 0:10] = np.random.normal(0, 10, (N, 10))
    B[:, 10:20] = np.random.normal(0, 3, (N, 10))
    B[:, 20:30] = np.random.normal(0, 1, (N, 10))

    # In A there are four clusters.
    A = np.zeros((N, D))
    A[:, 0:10] = np.random.normal(0, 10, (N, 10))
    # group 1
    A[0:100, 10:20] = np.random.normal(0, 1, (100, 10))
    A[0:100, 20:30] = np.random.normal(0, 1, (100, 10))
    # group 2
    A[100:200, 10:20] = np.random.normal(0, 1, (100, 10))
    A[100:200, 20:30] = np.random.normal(gap, 1, (100, 10))
    # group 3
    A[200:300, 10:20] = np.random.normal(2 * gap, 1, (100, 10))
    A[200:300, 20:30] = np.random.normal(0, 1, (100, 10))
    # group 4
    A[300:400, 10:20] = np.random.normal(2 * gap, 1, (100, 10))
    A[300:400, 20:30] = np.random.normal(gap, 1, (100, 10))
    A_labels = [0] * 100 + [1] * 100 + [2] * 100 + [3] * 100

    cpca = CPCA(standardize=False)
    cpca.fit_transform(A, B, plot=True, active_labels=A_labels)

You should see a series of plots that looks something like this:

.. image:: images/plot_example.png
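
For intuition about why the four groups separate, the accompanying paper describes contrastive PCA as taking the top eigenvectors of the foreground covariance minus alpha times the background covariance. The NumPy sketch below applies that idea to data generated like A and B above, for a single hand-picked alpha of 2. It is not the library's implementation: :code:`contrastive_directions` and :code:`block_energy` are hypothetical helpers, and the seed and the choice of alpha are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, gap = 400, 30, 3

# Background: one distribution with a different variance in each feature block.
B = np.hstack([rng.normal(0, s, (N, 10)) for s in (10, 3, 1)])

# Foreground: the same high-variance block, plus four groups that differ
# only in the means of features 10-19 and 20-29.
A = np.zeros((N, D))
A[:, 0:10] = rng.normal(0, 10, (N, 10))
means = [(0, 0), (0, gap), (2 * gap, 0), (2 * gap, gap)]
for g, (m1, m2) in enumerate(means):
    rows = slice(100 * g, 100 * (g + 1))
    A[rows, 10:20] = rng.normal(m1, 1, (100, 10))
    A[rows, 20:30] = rng.normal(m2, 1, (100, 10))


def contrastive_directions(fg, bg, alpha, n_components=2):
    """Top eigenvectors of cov(fg) - alpha * cov(bg) (hypothetical helper)."""
    contrast = np.cov(fg, rowvar=False) - alpha * np.cov(bg, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(contrast)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top]


def block_energy(directions):
    """Fraction of squared loadings that falls on features 10-29."""
    return float((directions[10:] ** 2).sum() / (directions ** 2).sum())


pca_dirs = contrastive_directions(A, B, alpha=0.0)  # alpha = 0 is ordinary PCA
cpca_dirs = contrastive_directions(A, B, alpha=2.0)

print("energy on group features, alpha=0:", round(block_energy(pca_dirs), 3))
print("energy on group features, alpha=2:", round(block_energy(cpca_dirs), 3))

# Project the centered foreground data onto the contrastive directions.
projected = (A - A.mean(axis=0)) @ cpca_dirs
```

With these settings, the ordinary PCA directions should load mostly on the high-variance features 0-9, which carry no group structure, while the contrastive directions should load mostly on the features that define the groups.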

Optional Parameters

Labels for foreground data (plot/gui mode): In the examples above, the data points are colored according to labels known ahead of time. You can supply these labels using the :code:`active_labels` parameter, as shown here:

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA()
    # labels = [0, 1, 0, 1, 1 ... 1, 0]
    projected_data = mdl.fit_transform(foreground_data, background_data, plot=True, active_labels=labels)
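
If the group sizes are known and the rows are ordered by group, one way to build such a list is sketched below. The group sizes are made up for illustration, and the sketch assumes one label per row of the foreground data, as in the quick test above.

```python
import numpy as np

# Hypothetical example: 3 groups of known sizes, stored in row order.
group_sizes = [120, 80, 200]
labels = np.repeat(np.arange(len(group_sizes)), group_sizes).tolist()

# Placeholder foreground data with one row per label.
foreground_data = np.zeros((sum(group_sizes), 30))

# One label per foreground sample.
assert len(labels) == foreground_data.shape[0]
print(len(labels), sorted(set(labels)))
```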

Additional number of components: Sometimes, you'd like to project your data on more than the top 2 contrastive principal components (cPCs). Specify the number of cPCs when you instantiate your model using the :code:`n_components` parameter:

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA(n_components=3)  # the top 3 components will be returned
    projected_data = mdl.fit_transform(foreground_data, background_data)

However, note that the data can be plotted or visualized through the GUI only when :code:`n_components=2`.

How values of alpha are chosen: So far, the data have been plotted for values of alpha chosen automatically with default parameters. However, the values of alpha can be customized. To keep the automatic selection but change the range or number of alphas considered, use the :code:`n_alphas` and :code:`max_log_alpha` parameters. The former sets the number of alphas that are analyzed, and the latter sets the base-10 logarithm of the largest alpha considered. (The minimum value of alpha, besides alpha = 0, is always alpha = 0.1.) Finally, you can change the number of values of alpha that are returned using the :code:`n_alphas_to_return` parameter.

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA()
    # Search through 10 logarithmically spaced values of alpha from 0.1 to 100
    # and return the PCs for only 1 of them.
    projected_data = mdl.fit_transform(foreground_data, background_data, n_alphas=10, max_log_alpha=2, n_alphas_to_return=1)
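
To make the search range concrete, the candidate values described above (alpha = 0 plus logarithmically spaced values from 0.1 up to 10 raised to :code:`max_log_alpha`) can be written out with NumPy. This sketch only mirrors the description in this README; how the library builds its candidates internally and then narrows them down is not shown here, and :code:`candidate_alphas` is a hypothetical name.

```python
import numpy as np


def candidate_alphas(n_alphas=10, max_log_alpha=2):
    """Hypothetical helper: alpha = 0 plus n_alphas log-spaced values."""
    # np.logspace takes exponents: 10**-1 = 0.1 up to 10**max_log_alpha.
    spaced = np.logspace(-1, max_log_alpha, n_alphas)
    return np.concatenate(([0.0], spaced))


alphas = candidate_alphas(n_alphas=10, max_log_alpha=2)
print(np.round(alphas, 3))
```

Raising :code:`max_log_alpha` by 1 extends the largest candidate by a factor of 10, while :code:`n_alphas` controls how finely the range is sampled.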

You can also set alpha to a specific value manually using the :code:`alpha_selection` and :code:`alpha_value` parameters as follows:

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA()
    projected_data = mdl.fit_transform(foreground_data, background_data, alpha_selection='manual', alpha_value=2.0)

Or you can plot or return the data for all values of alpha in the given range. In this case, you can still set the :code:`n_alphas` and :code:`max_log_alpha` parameters:

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA()
    # Search through 10 logarithmically spaced values of alpha from 0.1 to 100
    # and return the PCs for all of them.
    projected_data = mdl.fit_transform(foreground_data, background_data, n_alphas=10, max_log_alpha=2, alpha_selection='all')

Whether to standardize your data: By default, before performing contrastive PCA, the data are standardized so that each column or dimension has unit variance. You can turn this off by doing the following:

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA(standardize=False)
    projected_data = mdl.fit_transform(foreground_data, background_data)
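
For reference, scaling each column to unit variance can be sketched in NumPy as follows. This illustrates the kind of preprocessing described above, under the assumption that columns are also mean-centered. It is not taken from the library, and :code:`standardize_columns` is a hypothetical name.

```python
import numpy as np


def standardize_columns(data):
    """Hypothetical helper: center each column and scale it to unit variance."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    std = data.std(axis=0)
    # Leave constant columns at zero instead of dividing by zero.
    safe_std = np.where(std == 0, 1.0, std)
    return centered / safe_std


rng = np.random.default_rng(0)
# Three features on very different scales.
raw = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(500, 3))
scaled = standardize_columns(raw)
print(scaled.std(axis=0))
```

Standardizing matters most when features are measured on very different scales. Turning it off, as in the quick test above, keeps the original variances of each feature.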

Custom colors (plot/gui mode): As a stylistic touch, you can also customize which colors are used to label the points when the data are plotted by using the :code:`colors` argument. Here's an example:

.. code-block:: python

    from contrastive import CPCA

    mdl = CPCA(standardize=False)
    projected_data = mdl.fit_transform(foreground_data, background_data, gui=True, colors=['r', 'b', 'k', 'c'])

This will produce something along the lines of:

.. image:: images/gui_colors.png
