
ucla-vision / information-dropout

Licence: other
Implementation of Information Dropout

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to information-dropout

reprieve
A library for evaluating representations.
Stars: ✭ 68 (+88.89%)
Mutual labels:  representation-learning
FLIP
A collection of tasks to probe the effectiveness of protein sequence representations in modeling aspects of protein design
Stars: ✭ 35 (-2.78%)
Mutual labels:  representation-learning
rl singing voice
Unsupervised Representation Learning for Singing Voice Separation
Stars: ✭ 18 (-50%)
Mutual labels:  representation-learning
awesome-graph-self-supervised-learning
Awesome Graph Self-Supervised Learning
Stars: ✭ 805 (+2136.11%)
Mutual labels:  representation-learning
pair2vec
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Stars: ✭ 62 (+72.22%)
Mutual labels:  representation-learning
TailCalibX
Pytorch implementation of Feature Generation for Long-Tail Classification by Rahul Vigneswaran, Marc T Law, Vineeth N Balasubramaniam and Makarand Tapaswi
Stars: ✭ 32 (-11.11%)
Mutual labels:  representation-learning
REGAL
Representation learning-based graph alignment based on implicit matrix factorization and structural embeddings
Stars: ✭ 78 (+116.67%)
Mutual labels:  representation-learning
State-Representation-Learning-An-Overview
Simplified version of "State Representation Learning for Control: An Overview" bibliography
Stars: ✭ 32 (-11.11%)
Mutual labels:  representation-learning
graphml-tutorials
Tutorials for Machine Learning on Graphs
Stars: ✭ 125 (+247.22%)
Mutual labels:  representation-learning
HSIC-bottleneck
The HSIC Bottleneck: Deep Learning without Back-Propagation
Stars: ✭ 56 (+55.56%)
Mutual labels:  information-theory
TCE
This repository contains the code implementation used in the paper Temporally Coherent Embeddings for Self-Supervised Video Representation Learning (TCE).
Stars: ✭ 51 (+41.67%)
Mutual labels:  representation-learning
amr
Official adversarial mixup resynthesis repository
Stars: ✭ 31 (-13.89%)
Mutual labels:  representation-learning
meta-embeddings
Meta-embeddings are a probabilistic generalization of embeddings in machine learning.
Stars: ✭ 22 (-38.89%)
Mutual labels:  representation-learning
GLOM-TensorFlow
An attempt at the implementation of GLOM, Geoffrey Hinton's paper for emergent part-whole hierarchies from data
Stars: ✭ 32 (-11.11%)
Mutual labels:  representation-learning
gnn-lspe
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations), ICLR 2022
Stars: ✭ 165 (+358.33%)
Mutual labels:  representation-learning
M-NMF
An implementation of "Community Preserving Network Embedding" (AAAI 2017)
Stars: ✭ 119 (+230.56%)
Mutual labels:  representation-learning
Pose2vec
A Repository for maintaining various human skeleton preprocessing steps in numpy and tensorflow along with tensorflow model to learn pose embeddings.
Stars: ✭ 25 (-30.56%)
Mutual labels:  representation-learning
MSF
Official code for "Mean Shift for Self-Supervised Learning"
Stars: ✭ 42 (+16.67%)
Mutual labels:  representation-learning
FEATHER
The reference implementation of FEATHER from the CIKM '20 paper "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models".
Stars: ✭ 34 (-5.56%)
Mutual labels:  representation-learning
ShapeFormer
Official repository for the ShapeFormer Project
Stars: ✭ 97 (+169.44%)
Mutual labels:  representation-learning

Information Dropout implementation

A TensorFlow implementation of Information Dropout [https://arxiv.org/abs/1611.01353].

Information Dropout is a form of stochastic regularization that adds noise to the activations of a layer in order to improve disentanglement and invariance to nuisances in the learned representation. The related paper Emergence of Invariance and Disentangling in Deep Representations also establishes strong theoretical and practical connections between the objective of Information Dropout (i.e., minimality, invariance, and disentanglement of the activations), the minimality, compression, and generalization performance of the network weights, and the geometry of the loss function.
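
Schematically, training minimizes the usual cross-entropy loss plus a KL penalty on the noisy representation, weighted by a parameter beta (this is the same cost that appears in the code at the bottom of this page; the notation below is only a sketch of the paper's objective):

L = (1/N) * sum_i [ E_{z ~ p_alpha(z|x_i)} [ -log q(y_i|z) ] + beta * KL( p_alpha(z|x_i) || p(z) ) ]

where alpha are the learned noise parameters of the layers and beta trades off accuracy against minimality of the representation.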

Installation

This implementation makes use of TensorFlow (tested with v1.0.1) and the Python package sacred.
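
One way to install these dependencies (the pinned TensorFlow version below is an assumption based on the version the code was tested with; adapt it to your environment):

pip install tensorflow==1.0.1 sacred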

To run the experiments, you will need a preprocessed copy of the CIFAR-10 and Cluttered MNIST datasets. You can download and uncompress them in the datasets directory using:

wget http://vision.ucla.edu/~alex/files/cifar10.tar.gz
wget http://vision.ucla.edu/~alex/files/cluttered.tar.gz
tar -xzf cifar10.tar.gz
tar -xzf cluttered.tar.gz

The CIFAR-10 dataset was preprocessed with ZCA using the included process_cifar.py script, while the Cluttered MNIST dataset was generated using the official code and converted to numpy format.
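
For reference, ZCA whitening amounts to the following generic numpy sketch (the textbook recipe, not the repository's process_cifar.py script):

import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whitens a data matrix X of shape (num_samples, num_features)."""
    X = X - X.mean(axis=0)                              # center each feature
    cov = X.T.dot(X) / X.shape[0]                       # feature covariance
    U, S, _ = np.linalg.svd(cov)                        # eigendecomposition of the symmetric covariance
    W = U.dot(np.diag(1. / np.sqrt(S + eps))).dot(U.T)  # ZCA whitening matrix
    return X.dot(W)                                     # whitened data, same shape as X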

Running the experiments

CIFAR-10

To train a CNN on CIFAR-10 you can use commands in the following format:

./cifar.py train with dropout=information filter_percentage=0.25 beta=3.0
./cifar.py train with dropout=binary filter_percentage=0.25

The first command trains with Information Dropout, using parameter beta=3.0 and a smaller network with only 25% of the filters. The second command trains with binary dropout instead. All trained models are saved in the models directory under a unique name for the given configuration. To load and test a trained configuration, run

./cifar.py test with [...]

You can print the complete list of options using

./cifar.py print_config

At the moment, computing the total correlation of a layer is only supported for softplus activations with a log-normal prior. To train with softplus activations and compute the total correlation of the trained representation, run

./cifar.py train with softplus filter_percentage=0.25 beta=1.0
./cifar.py correlation with softplus filter_percentage=0.25 beta=1.0
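
As a rough sketch of why the log-normal assumption helps (this is the standard definition of total correlation, not necessarily the exact estimator used in the code): the total correlation measures how far the activations are from being mutually independent, and for a log-normal representation it reduces to a determinant,

TC(z) = sum_i H(z_i) - H(z) = -(1/2) * log det R

where R is the correlation matrix of log z, so it can be estimated from second-order statistics of the log-activations.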

Cluttered MNIST

Training on the Cluttered MNIST dataset uses a similar syntax:

./cluttered.py train with beta=0.5

To plot the information heatmap of each layer, which shows that Information Dropout learns to ignore nuisances and focus on information important for the task, use the following command. The results will be saved in the plots subdirectory.

./cluttered.py plot with beta=0.5

Note: due to a slight change in the training algorithm, use beta <= 0.5 when training with this version of the code.

A minimal implementation

For illustration purposes, we include here a commented pseudo-implementation of a convolutional Information Dropout layer using ReLU activations.

import tensorflow as tf
from tensorflow.contrib.layers import conv2d

def sample_lognormal(mean, sigma=None, sigma0=1.):
    """
    Samples from a log-normal distribution using the reparametrization
    trick so that we can backpropagate the gradients through the sampling.
    By setting sigma0=0 we make the operation deterministic (useful at testing time)
    """
    e = tf.random_normal(tf.shape(mean), mean = 0., stddev = 1.)
    return tf.exp(mean + sigma * sigma0 * e)

def information_dropout(inputs, stride = 2, max_alpha = 0.7, sigma0 = 1.):
    """
    An example layer that performs convolutional pooling
    and information dropout at the same time.
    """
    num_outputs = inputs.get_shape()[-1]
    # Creates a convolutional layer to compute the noiseless output
    network = conv2d(inputs,
        num_outputs=num_outputs,
        kernel_size=3,
        activation_fn=tf.nn.relu,
        stride=stride)
    # Computes the noise parameter alpha for the new layer based on the input
    with tf.variable_scope(None, 'information_dropout'):
        alpha = conv2d(inputs,
            num_outputs=num_outputs,
            kernel_size=3,
            stride=stride,
            activation_fn=tf.sigmoid,
            scope='alpha')
        # Rescale alpha in the allowed range and add a small value for numerical stability
        alpha = 0.001 + max_alpha * alpha
        # Similarly to variational dropout we renormalize so that
        # the KL term is zero for alpha == max_alpha
        kl = - tf.log(alpha/(max_alpha + 0.001))
        tf.add_to_collection('kl_terms', kl)
    e = sample_lognormal(mean=tf.zeros_like(network), sigma=alpha, sigma0=sigma0)
    # Noisy output of Information Dropout
    return network * e

### BUILD THE NETWORK
# ...
# Computes the KL divergence term in the cost function
kl_terms = [ tf.reduce_sum(kl)/batch_size for kl in tf.get_collection('kl_terms') ]
# Normalizes by the number of training samples to make
# the parameter beta comparable to the beta in variational dropout
Lz = tf.add_n(kl_terms)/N_train
# Lx is the cross entropy loss of the network
Lx = cross_entropy_loss
# The final cost
cost = Lx + beta * Lz
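
Putting the pieces together, a hypothetical TF 1.x usage sketch might look as follows (the placeholders, layer sizes, and optimizer are illustrative assumptions, not the repository's cifar.py):

import tensorflow as tf
from tensorflow.contrib.layers import conv2d

images = tf.placeholder(tf.float32, [None, 32, 32, 3])
labels = tf.placeholder(tf.int64, [None])
batch_size = tf.cast(tf.shape(images)[0], tf.float32)
N_train, beta = 50000., 3.0

# Stack plain convolutions with the Information Dropout layer defined above
net = conv2d(images, num_outputs=32, kernel_size=3, activation_fn=tf.nn.relu)
net = information_dropout(net, stride=2)
net = conv2d(net, num_outputs=64, kernel_size=3, activation_fn=tf.nn.relu)
net = information_dropout(net, stride=2)
logits = tf.reduce_mean(conv2d(net, num_outputs=10, kernel_size=1, activation_fn=None), [1, 2])

# Cross-entropy plus the collected KL terms, as in the cost above
Lx = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
kl_terms = [tf.reduce_sum(kl) / batch_size for kl in tf.get_collection('kl_terms')]
Lz = tf.add_n(kl_terms) / N_train
cost = Lx + beta * Lz
train_op = tf.train.AdamOptimizer(1e-3).minimize(cost)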