Machine Learning For Astronomy: Rob Fergus

This document provides an overview of machine learning techniques for astronomy problems, specifically discussing generative and discriminative modeling approaches. It gives examples of applying these techniques to problems like exoplanet detection, galaxy morphology classification, and analyzing astronomical image data cubes. Generative models like PCA and probabilistic PCA are discussed, as well as discriminative models like support vector machines. The document then focuses on applying these methods to direct detection of exoplanets from image data, comparing a generative model called S4 Detect to a discriminative version called DS4 Detect.

Machine Learning for Astronomy

Rob Fergus
Dept. of Computer Science, Courant Institute, New York University
Overview
• High-level view of machine learning
– Discuss generative & discriminative modeling of data
– Not an exhaustive survey
– Try to illustrate important ML concepts
• Give examples of these models applied to problems in astronomy
• In particular, exoplanet detection algorithms

Generative vs Discriminative Modeling
• Key distinction in machine learning
• E.g. a toy classification dataset with labels (red = class 1, blue = class 2)
Generative vs Discriminative Modeling
• Given a new point x, we want to compute the posterior over its class C_k
• Bayes' rule relates posterior, likelihood and prior:
  p(C_k | x) = p(x | C_k) p(C_k) / p(x)
• Alternatively: discriminative approaches compute the posterior p(C_k | x) directly, while generative approaches compute the likelihood p(x | C_k) and prior p(C_k) and then apply Bayes' rule
Generative Modeling
• Top-down interpretation of data
– i.e. adjust model parameters to fit observed data
• E.g. Gaussian model: estimate the mean and covariance (μ, Σ) that maximize the likelihood of the data
Generative Modeling
• Given a new point x, we can compute its likelihood under each class model, p(x | C_k)
• Combine with the prior p(C_k) to give the posterior p(C_k | x)
• The likelihood ratio between the classes defines the decision surface
Discriminative Modeling
• Model posterior directly (no model of data density)
• Fit decision surface directly
• Bottom-up model: input=x, output=class prediction
Principal Components Analysis (PCA)
• Example of a generative model (objective: compression)
• Observed data points: x_i in R^D
• Hidden manifold coordinates: z_i in R^d, with d < D
• Hidden linear mapping W, so that x_i ≈ W z_i
• Find the global optimum via eigendecomposition of the sample covariance matrix (see the sketch below)
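A minimal numpy sketch of PCA exactly as described above: eigendecomposition of the sample covariance matrix, with W the linear mapping and the rows of Z the hidden low-dimensional coordinates. The toy data and the choice d = 2 are purely illustrative.

import numpy as np

def pca(X, d):
    """PCA via eigendecomposition of the sample covariance matrix.

    X : (N, D) array of observed data points.
    d : number of principal components to keep.
    Returns (W, Z, mu): basis, low-dimensional coordinates, data mean.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                              # centre the data
    C = Xc.T @ Xc / X.shape[0]               # sample covariance matrix (D, D)
    evals, evecs = np.linalg.eigh(C)         # eigendecomposition (ascending order)
    W = evecs[:, ::-1][:, :d]                # top-d eigenvectors as columns
    Z = Xc @ W                               # hidden coordinates z_i
    return W, Z, mu

# Example: compress 5-D points to 2-D and reconstruct
X = np.random.randn(500, 5) @ np.random.randn(5, 5)
W, Z, mu = pca(X, d=2)
X_hat = Z @ W.T + mu                         # reconstruction from the 2-D coordinates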
Principal Components Analysis (PCA)
Probabilistic Principal Components Analysis (PPCA)
• Data is a linear function of low-dimensional latent coordinates, plus Gaussian noise.
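For reference, the standard PPCA formulation (Tipping & Bishop) writes this as

z_i \sim \mathcal{N}(0, I_d), \qquad x_i \mid z_i \sim \mathcal{N}(W z_i + \mu, \; \sigma^2 I_D)

so each observation is a linear function W z_i + μ of its latent coordinates plus isotropic Gaussian noise, and the maximum-likelihood W spans the same principal subspace found by PCA.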
Support Vector Machines (SVMs)
[Cortes; Vapnik; Schölkopf; others]
• Classic discriminative approach
• Formal notion of a margin m, to aid generalization
• "Kernel trick" to give non-linear decision surfaces
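A minimal scikit-learn sketch of a kernel SVM on a toy two-class dataset like the one pictured earlier; the data, kernel choice and hyperparameters are illustrative, not tied to anything in this talk.

import numpy as np
from sklearn.svm import SVC

# Toy two-class dataset (class 0 vs class 1)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, size=(100, 2)),
               rng.normal(+1, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# RBF kernel gives a non-linear decision surface; C trades off margin width vs errors
clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X, y)

print(clf.predict([[0.5, 0.5]]))        # hard class prediction
print(clf.predict_proba([[0.5, 0.5]]))  # estimate of p(class | x)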
Comparison

Generative Models
+ Labels not essential
+ Unsupervised or supervised
• Models the whole density
+ Interpretable result
- Can be hard to specify model structure

Discriminative Models
- Need labels
- Supervised only
• Model only fits the decision surface
+ Fast to evaluate
+ Can be very powerful
Detour

Deep Neural Networks for Natural Image Classification
Deep Learning
• Big gains in performance in the last few years on:
– Vision
– Audition
– Natural language processing
• Three ingredients:
1. Discriminative neural network models (supervised training)
2. Big labeled datasets
3. Lots of computation
Computer Vision
• Image Recognition
– Input: Pixels
– Output: Class Label
[Figure: ground-truth labels vs. model predictions on example images]
[Krizhevsky et al. NIPS 2012]
Convolutional Neural Networks
• LeCun et al. 1989
• Neural network with specialized
connectivity structure
Convolutional Neural Network
• Krizhevsky et al. [NIPS 2012]
- 8-layer convolutional network model [LeCun et al. '89]
- Trained on 1.2 million ImageNet images (with labels)
- GPU implementation (50x speedup over CPU)
• 7 hidden layers, 650,000 neurons, 60,000,000 parameters
• Trained on 2 GPUs for a week
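For concreteness, a minimal PyTorch sketch of a small convolutional classifier trained discriminatively with labels; this toy network is far smaller than the 8-layer Krizhevsky architecture, and all layer sizes are illustrative.

import torch
import torch.nn as nn

# Tiny convolutional classifier: conv/pool feature extractor + linear classifier
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),           # 10 output classes
)

x = torch.randn(4, 3, 32, 32)             # batch of 4 RGB 32x32 images
logits = model(x)                          # (4, 10) class scores
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
loss.backward()                            # supervised (discriminative) training step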
Big Image Datasets

ImageNet [Deng et al. CVPR 2009]
• Stanford Vision group
• ~14 million labeled images, 20k classes
• Images gathered from the Internet
• Human labels via Amazon Mechanical Turk

• Microsoft + academic collaboration
• 2 million objects in natural settings
• Human labels via Amazon Mechanical Turk
Powerful Hardware
• Deep neural nets are highly amenable to implementation on Graphics Processing Units (GPUs)
– Mainly matrix multiply and 2D convolution operations
• Latest-generation NVIDIA GPUs (Pascal) deliver ~10 TFLOPS per card
– Faster than the fastest supercomputer in the world in 2000
ImageNet Performance over Time
[Figure: top-5 classification error (%) by year, 2010-2015, with a human-level reference; convolutional neural nets enter in 2012 and drive the error down]
[Russakovsky et al. IJCV 2015]


Examples
• From Clarifai.com
[Three slides of example image-recognition results]
Industry Deployment
• Widely used at Facebook, Google, Microsoft
• Face recognition, image search, photo organization, …
• Very fast at test time (~100 images/sec/GPU)
[Taigman et al., DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR '14]
Success of DeepNets
• ConvNets work great for other types of data:
– Medical imaging
– Speech spectrograms
– Particle physics traces
• Other types of deep neural nets (Recurrent Nets) work well for natural language
• But need lots and lots of labeled data!

End of Detour
Galaxy Morphology Classification
• https://www.galaxyzoo.org/
• Crowd-sourced labels for different galaxy shapes

Figure 1. The Galaxy Zoo 2 decision tree. Reproduced from Figure 1 in Willett et al. (2013).
Galaxy Morphology Classification
[Rotation-invariant convolutional neural networks for galaxy morphology prediction, Dieleman, Willett, Dambre, Mon. Not. R. Astron. Soc., March 2015]
• Train a ConvNet on Galaxy Zoo data/labels
– Won the Kaggle competition
• Closely matches human performance
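Galaxy morphology does not depend on how the image is rotated, and the paper builds that invariance into the network itself by combining rotated and flipped viewpoints. As a minimal illustration of the same idea, the sketch below simply averages a classifier's predictions over rotated copies of a galaxy cutout; the `model` object (anything exposing predict_proba on flattened images) and the angle set are hypothetical, not the paper's architecture.

import numpy as np
from scipy.ndimage import rotate

def rotation_averaged_predict(model, image, angles=(0, 90, 180, 270)):
    """Average class probabilities over rotated copies of a galaxy image.

    model : any classifier exposing predict_proba on flattened images (hypothetical).
    image : 2-D numpy array (a single-band galaxy cutout).
    """
    probs = []
    for a in angles:
        rot = rotate(image, angle=a, reshape=False, order=1)   # rotate about the centre
        probs.append(model.predict_proba(rot.reshape(1, -1))[0])
    return np.mean(probs, axis=0)   # rotation-averaged prediction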
Direct Detection of Exoplanets using the S4 Algorithm
[Spatio-Spectral Speckle Suppression]

Rob Fergus (1), David W. Hogg (2), Rebecca Oppenheimer (3), Doug Brenner (3), Laurent Pueyo (4)

(1) Dept. of Computer Science, Courant Institute, New York University
(2) Center for Cosmology & Particle Physics, Dept. of Physics, New York University
(3) Dept. of Astrophysics, American Museum of Natural History
(4) Space Telescope Science Institute
P1640 Data Cubes
• Each exposure gives 32 wavelength bands (near-IR, 950-1770 nm)
• Speckles are diffraction artifacts
• They move radially with wavelength
• The planet is stationary
Use Polar Representation
• Speckles become diagonal structures in the (radius, wavelength) plane
• The planet is vertical
– Key to separating the two
• Assume: independence across angle and exposure
[Figure: patches shown in radius-wavelength coordinates; speckles slant diagonally while a companion stays at fixed radius]
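A minimal sketch of resampling an image onto polar (angle, radius) coordinates about the star. The interpolation scheme, grid sizes and function name are illustrative choices, not necessarily what S4 itself uses.

import numpy as np
from scipy.ndimage import map_coordinates

def to_polar(image, center, n_radii=100, n_angles=360, r_max=None):
    """Resample a 2-D image onto a polar (angle, radius) grid around `center`."""
    cy, cx = center
    if r_max is None:
        r_max = min(cy, cx, image.shape[0] - cy, image.shape[1] - cx)
    radii = np.linspace(0, r_max, n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    r, th = np.meshgrid(radii, angles)            # (n_angles, n_radii)
    ys = cy + r * np.sin(th)
    xs = cx + r * np.cos(th)
    # Bilinear interpolation at the (y, x) sample positions
    return map_coordinates(image, [ys, xs], order=1)

# Applied per wavelength slice, speckles line up diagonally in (radius, wavelength)
# while a true companion stays at a fixed radius.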
Three versions of S4

1. S4 Detect [generative, PCA-based detection model]
2. DS4 Detect [discriminative, SVM-based detection model]
• [Munandet, Schölkopf, Oppenheimer, Nilsson, Veicht]
3. S4 Spectra [generative, spectra estimation model]

• All use the same representation
• Just a different ML approach
• Lots of related algorithms (KLIP, LOCI, etc.)

[Fergus et al., Astrophysical Journal, under review]
Leave-Out Strategy for Detection (S4 Detect & DS4)
• Separate slices within an annulus into train/test
• Train a new model for each location
1. S4 Detect

[Fergus et al., Astrophysical Journal, under review]


S4 Detect PCA Model
• Trained for each location
S4 Detect Summary
• Build PCA basis on training set
• Fit PCA model to test patches
• Companion should appear in residual
• Correlate the residual with a (fixed) companion model (sketched below)
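A minimal numpy sketch of the four steps above for a single test location, assuming the patches have already been extracted in the polar radius-wavelength representation and flattened. The function name, array shapes and number of components are illustrative choices, not the paper's settings.

import numpy as np

def s4_detect_score(train_patches, test_patch, planet_template, n_components=20):
    """PCA speckle model + residual correlation, for one test location.

    train_patches   : (N, P) matrix, each row a flattened speckle-only patch.
    test_patch      : (P,) patch at the location being tested.
    planet_template : (P,) fixed companion model (white spectrum), flattened.
    """
    # 1. Build a PCA basis for the speckles from the training patches
    mean = train_patches.mean(axis=0)
    U, S, Vt = np.linalg.svd(train_patches - mean, full_matrices=False)
    basis = Vt[:n_components]                    # (n_components, P)

    # 2. Fit the PCA model to the test patch (project onto the basis)
    coeffs = basis @ (test_patch - mean)
    reconstruction = mean + basis.T @ coeffs

    # 3. A companion should appear in the residual
    residual = test_patch - reconstruction

    # 4. Correlate the residual with the fixed companion model
    t = planet_template - planet_template.mean()
    return float(residual @ t) / (np.linalg.norm(t) + 1e-12)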
2. DS4 Detect

[Fergus et al., Astrophysical Journal, under review]


DS4 Detect Summary
• Generate a training set
– Discriminative models need labeled examples
– Negative examples: taken directly from the data (image patches with no planet)
– Positive examples: add an artificial companion with realistic brightness and spectra
• Train a Support Vector Machine (SVM)
• Use the SVM on test patches to estimate p(companion | patch) (sketched below)
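A minimal scikit-learn sketch of that recipe. The fake-companion injection is reduced here to adding a scaled template (in practice the injected companions would vary in brightness and spectrum), and all names and shapes are illustrative.

import numpy as np
from sklearn.svm import SVC

def train_ds4(speckle_patches, planet_template, brightness=0.1):
    """Train an SVM to score patches for the presence of a companion.

    speckle_patches : (N, P) background-only patches (negative examples).
    planet_template : (P,) companion model used to inject positives.
    """
    negatives = speckle_patches
    positives = speckle_patches + brightness * planet_template   # injected fake companions
    X = np.vstack([negatives, positives])
    y = np.concatenate([np.zeros(len(negatives)), np.ones(len(positives))])
    clf = SVC(kernel="rbf", gamma="scale", probability=True)
    clf.fit(X, y)
    return clf

# At test time: clf.predict_proba(test_patches)[:, 1] estimates p(companion | patch)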
S4 Detect vs DS4 Comparison

S4
• Data: background (speckle) data
• Algorithm: Principal Component Analysis (unsupervised learning)
• Detection: correlation between residual and template

DS4
• Data: background data + artificially generated data
• Algorithm: Support Vector Machine (supervised learning)
• Detection: prediction value of the model
S4 Detect vs DS4 Detect
Relative brightness of companion vs speckle flux
3. S4 Spectra

True Generative Model for Spectra
• S4 Detect: the spectrum of the planet is fixed (white)
• Now the spectrum is unknown
– Treat it as a latent variable
• Observed data = PCA speckle model + fixed (spatial) planet model with latent spectrum
• Gaussian noise assumption
[Figure: planet model shown in radius-wavelength coordinates]
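Under the Gaussian-noise assumption, estimating the speckle coefficients and the latent spectrum together reduces to a linear least-squares problem. A minimal sketch, assuming the speckle basis and a per-wavelength spatial planet shape are already available; the function name and array shapes are illustrative.

import numpy as np

def s4_spectra_fit(y, speckle_basis, planet_shape):
    """Jointly estimate speckle coefficients and the planet spectrum by least squares.

    y             : (L, R) observed patch (wavelength x radius).
    speckle_basis : (K, L, R) PCA speckle basis images.
    planet_shape  : (L, R) fixed spatial planet model, one slice per wavelength.
    Returns (speckle_coeffs, spectrum).
    """
    L, R = y.shape
    K = speckle_basis.shape[0]

    # Speckle columns: each basis image, flattened
    A_speckle = speckle_basis.reshape(K, L * R).T            # (L*R, K)

    # Planet columns: one column per wavelength, nonzero only in that wavelength's rows
    A_planet = np.zeros((L * R, L))
    for lam in range(L):
        col = np.zeros((L, R))
        col[lam] = planet_shape[lam]
        A_planet[:, lam] = col.ravel()

    A = np.hstack([A_speckle, A_planet])                     # joint design matrix
    coeffs, *_ = np.linalg.lstsq(A, y.ravel(), rcond=None)   # Gaussian noise -> least squares
    return coeffs[:K], coeffs[K:]                            # speckle weights, spectrum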


S4 Spectra Algorithm
[Figure: the reconstruction is the sum of an estimated speckle model (speckle basis x basis weights) and an estimated planet model (planet shape x estimated spectrum), compared against the observed data in radius-wavelength coordinates]

Spectra of Fake Insertions
• Insert the T4.5 standard 2MASS J0559-1404, at the same strength as the real companions, into the HR8799 data
Spectra of HR8799 system
[R Oppenheimer et al., The Astrophysical Journal, April 2013.]
Finding Planets in Kepler 2.0 data
Foreman-Mackey, Montet, Hogg, et al. (arXiv:1502.04715)
• Generative model of K2 data
• Simultaneous fit of:
– Planet: physics & geometry
– Star: Gaussian Process
– CCD noise: Poisson distribution
– Spacecraft: data-driven linear model
• 36 planet candidates, 18 confirmed planets
[Figure: raw K2 light curve (raw: 301 ppm) vs. time, BJD - 2456808]
Finding Planets in Kepler 2.0 data
Foreman-Mackey, Montet, Hogg, et al. (arXiv:1502.04715)

• Raw data + synthetic planet transit
• PCA fit with few components → systematics remain
• PCA fit with lots of components → systematics removed, but the transit signal is attenuated
• Simultaneous fit → systematics removed, transit signal preserved (see the sketch below)

• 36 planet candidates, 18 confirmed planets
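A minimal numpy sketch of why the simultaneous fit preserves the transit: the systematics basis and a transit template are fit together in one least-squares problem, instead of removing systematics first and fitting the transit afterwards. The basis, template and variable names are illustrative stand-ins, not the actual K2 pipeline.

import numpy as np

def simultaneous_fit(flux, systematics_basis, transit_template):
    """Fit systematics and a transit depth in a single least-squares problem.

    flux              : (T,) observed light curve.
    systematics_basis : (T, K) data-driven spacecraft basis (e.g. trends from other stars).
    transit_template  : (T,) transit shape from the physical model.
    Returns (systematics_weights, transit_depth).
    """
    A = np.column_stack([systematics_basis, transit_template])
    coeffs, *_ = np.linalg.lstsq(A, flux, rcond=None)
    return coeffs[:-1], coeffs[-1]

# Fitting only the systematics first (and subtracting them) lets the basis absorb
# part of the transit, attenuating the signal; the joint fit avoids this.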


Comparison

Generative Models
+ Labels not essential
+ Unsupervised or supervised
• Models the whole density
+ Interpretable result
- Can be hard to specify model structure

Discriminative Models
- Need labels
- Supervised only
• Model only fits the decision surface
+ Fast to evaluate
+ Can be very powerful
Final Thoughts
• Generative models are feasible for many astronomy problems
– Well-understood signal formation process
• Discriminative models are very powerful for other tasks, where the input features must be learned too
• Use machine learning to help design the coronagraph itself
– To maximize the discriminability of planet vs speckles
Depth from Defocus using a Coded Aperture
[Levin, Fergus, Durand, Freeman, SIGGRAPH 2007]
• Use a generative model of natural images to design the shape of the aperture mask
– Maximize discriminability between different defocus blurs
[Figure panels: single input image (shallow depth of field); modified Canon lens; PSF; inferred depth map]
"Unified" Generative Model of Astronomical Images
• Unified Bayesian model
• Propagate uncertainty from the pixels
• Physics-informed priors
Hogg & Fergus, NSF #1124794 "CDI: A Unified Probabilistic Model of Astronomical Imaging"
Detection of Planets

HR 8799 Input | S4 Output map


Algorithm Overview
• Exploit radial motion of speckles (vs wavelength)
– Build model in polar domain
– Speckle motion is now 1D
Joint Radius-Wavelength Model
• Speckles are diagonal structures in the (radius, wavelength) plane
• The planet is vertical
– Key to separating the two
• Assume: independence across angle and exposure
S4 Graphical Model
[Figure: plate diagram. Per-exposure speckle coefficients z_i (with coefficient prior ϕ) and the speckle basis W generate the speckle image x_i; the planet position μ_i, planet spectrum s, and planet shape g generate the planet image p_i; together these generate each observed pixel y_ji of exposure i.]
• Assuming Gaussian distributions yields the overall (least-squares) cost.
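The cost itself is not reproduced in this extraction. Under the Gaussian assumptions and with the variables listed above, it would take a quadratic form along these lines; this is a sketch of the structure only, not necessarily the exact objective of the S4 paper:

C(\{z_i\}, s) \;=\; \sum_i \sum_j \Big( y_{ji} - [\,W z_i\,]_j - [\,g(\mu_i)\, s\,]_j \Big)^2 \;+\; \lambda \sum_i \|z_i\|^2

where the first bracketed term is the speckle image x_i, the second is the planet image p_i, and the final term stands in for the coefficient prior ϕ.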
Approach
• Build a statistical model of the speckles
– A physical model of the optics is too complex
• Few exposures of a given star (5-10)
– Little data from which to build a model
• Need to exploit the problem structure to yield more samples of speckles
Spectral Estimation Error
• Function of radius & companion brightness
Spectra of HR8799 system
Comparison with Existing Spectrum of HR8799b
Astronomy & Computer Vision
• Both fields are concerned with images
– Astronomy images are simpler than natural scenes
– Some hope that generative models could work
• Much work in vision on learning statistical models of natural scenes
– Use as statistical priors for ill-posed or low-S/N problems
– Lots of ways to apply these to astronomy images
Single Image Blind Deconvolution
R. Fergus, B. Singh, A. Hertzmann, S.T. Roweis & W.T. Freeman, SIGGRAPH 2006
• Uses a prior on image gradients to regularize the problem
[Figure panels: original vs. output; close-up of original, naïve sharpening, and our algorithm]
Online Blind Deconvolution
• Remove blur due to atmospheric turbulence
• Alternative to "lucky imaging" (keep the best few %)
Hirsch, Harmeling, Sra & Schölkopf, Astronomy & Astrophysics 2011
Plan
• Generative vs Discriminative modeling [12 mins]
– PCA & PPCA
– SVMs
– Deep Nets
• Examples of G & D modeling [10 mins]
– Galaxy Zoo
– Kepler DFM
• Examples of G & D modeling for direct imaging of exoplanets [20 mins]
– S4 Detect
– S4 Discriminative
– S4 Spectra
Project 1640
• Hale Telescope @ Palomar, CA
• Integral Field Spectrometer, Coronagraph, Adaptive Optics

Hinkley et al. 2011c (PASP, 123, 74) [Slide: R. Oppenheimer]


Integral Field Spectrometer
[Figure panels: monochromatic 1330 nm light source; broadband white-light source]
[Slide: R. Oppenheimer] Hinkley et al. 2011c (PASP, 123, 74)


Data Matrix
• # samples = (#angles - held-out zone) × #exposures   (~30-300 × ~10)
• # dimensions = annulus width × #wavelengths × patch width in angle   (~20 × ~30 × ~3)
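With the example values above, that gives on the order of 300 × 10 = 3000 speckle samples, each of dimension 20 × 30 × 3 = 1800 (illustrative numbers only; the actual counts depend on the annulus and instrument settings).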
Residual Error of PCA Model
Planet Model
• Use a model of the planet
• Obtained from instrument calibration (spatially invariant)
• Spectrum fixed: assume white
[Figure: planet model shown in (x, y) and in (radius, wavelength) coordinates]
Correlation with Planet Model
• Correlation between the planet model & the residual error
[Figure: residual (with planet) vs. residual (no planet), in radius-wavelength coordinates]
Data Cubes
• Each exposure gives 32 wavelength bands (near-IR, 950-1770 nm)
• Speckles are diffraction artifacts
• They move radially with wavelength
• The planet is stationary
Leave-Out Strategy
• Separate slices within an annulus into train/test
• Build the speckle model on the train slices
– Lots of them: ~ #exposures × #angles
– Use patches with small extent in angle
• Use the model to reconstruct the test slices
Evaluation
• 10 exposures of the star HR8799 from June 2012
• Compare to leading astronomy algorithms:
– LOCI (Locally Optimized Combination of Images)
Lafrenière et al., The Astrophysical Journal, 660:770-780, May 2007
• Models speckles as a linear combination of speckles from other wavelengths/exposures
– KLIP: Detection and Characterization of Exoplanets and Disks using Projections on Karhunen-Loève Eigenimages, Remi Soummer et al., arXiv:1207.4197, July 2012
• PCA-based but does not exploit the radius-wavelength structure
PCA Residuals for HR8799
Spectra of HR8799 Planets
