
Statistics, Data Mining and Machine Learning in Astronomy:

A Practical Python Guide for the Analysis of Survey Data

Željko Ivezić, Andrew J. Connolly, Jacob T. VanderPlas

University of Washington

and Alex Gray

Georgia Institute of Technology
Contents

Contents 5
Preface 7

I Introduction 9

1 About the Book and Supporting Material 11

1.1 What do data mining, machine learning and knowledge discovery mean? 11
1.2 What is this book about? 13
1.3 An incomplete survey of the relevant literature 16
1.4 Introduction to the Python language and the Git code management tool 20
1.5 Description of surveys and data sets used in examples 21
1.6 Plotting and visualizing the data in this book 38
1.7 How to efficiently use this book 48
References 49

2 Fast Computation on Massive Data Sets 51

2.1 Data types and data management systems 51
2.2 Analysis of algorithmic efficiency 52
2.3 Seven types of computational problems 54
2.4 Seven strategies for speeding things up 55
2.5 Case Studies: Speedup Strategies in Practice 58
References 72

II Statistical Frameworks and Exploratory Data Analysis 75

3 Probability and Statistical Distributions 77

3.1 Brief overview of probability and random variables 77
3.2 Descriptive statistics 85
3.3 Common univariate distribution functions 91
3.4 The central limit theorem 108
3.5 Bivariate and multivariate distribution functions 111
3.6 Correlation coefficients 120
3.7 Random number generation for arbitrary distributions 124
References 126

4 Classical Statistical Inference 129


4.1 Classical versus Bayesian statistical inference 129
4.2 Maximum likelihood estimation (MLE) 130
4.3 The goodness-of-fit and model selection 137
4.4 ML applied to Gaussian mixtures: the Expectation Maximization algorithm 140
4.5 Confidence estimates: the bootstrap and jackknife 146
4.6 Hypothesis testing 150
4.7 Comparison of distributions 156
4.8 Non-parametric modeling and histograms 168
4.9 Selection effects and luminosity function estimation 171
4.10 Summary 177
References 177

5 Bayesian Statistical Inference 179

5.1 Introduction to the Bayesian Method 179
5.2 Bayesian priors 184
5.3 Bayesian parameter uncertainty quantification 188
5.4 Bayesian model selection 189
5.5 Non-uniform priors: Eddington, Malmquist and Lutz-Kelker biases 194
5.6 Simple examples of Bayesian analysis: parameter estimation 198
5.7 Simple examples of Bayesian analysis: model selection 229
5.8 Numerical methods for complex problems (MCMC) 236
5.9 Summary of pros and cons for classical and Bayesian methods 246
References 250

III Data Mining and Machine Learning 253

6 Searching for Structure in Point Data 255

6.1 Non-parametric density estimation 256
6.2 Nearest neighbor density estimation 263
6.3 Parametric density estimation 265
6.4 Finding clusters in data 274
6.5 Correlation functions 283
6.6 Which density estimation and clustering algorithms should I use? 287
References 291

7 Dimensionality and its Reduction 293

7.1 The curse of dimensionality 293
7.2 The data sets used in this chapter 295
7.3 Principal Component Analysis 296
7.4 Non-negative Matrix Factorization 308
7.5 Manifold Learning 311
7.6 Independent Component Analysis and Projection Pursuit 318
7.7 Which dimensionality reduction technique should I use? 320
References 322


8 Regression and Model Fitting 325

8.1 Formulation of the regression problem 325
8.2 Regression for linear models 329
8.3 Regularization and penalizing the likelihood 335
8.4 Principal component regression 341
8.5 Kernel regression 342
8.6 Locally linear regression 343
8.7 Non-Linear Regression 344
8.8 Uncertainties in the data 346
8.9 Regression that is robust to outliers 348
8.10 Gaussian Process Regression 353
8.11 Overfitting, underfitting, and cross-validation 357
8.12 Which regression method should I use? 366
References 368

9 Classification 371

9.1 Data Sets used in this Chapter 371
9.2 Assigning Categories: Classification 372
9.3 Generative Classification 374
9.4 K-Nearest-Neighbor Classifier 384
9.5 Discriminative Classification 386
9.6 Support Vector Machines 387
9.7 Decision Trees 391
9.8 Evaluating Classifiers: ROC Curves 400
9.9 Which classifier should I use? 403
References 406

10 Time Series Analysis 407

10.1 Main concepts for time series analysis 408
10.2 Modeling toolkit for time series analysis 409
10.3 Analysis of periodic time series 430
10.4 Temporally localized signals 456
10.5 Analysis of stochastic processes 462
10.6 Which method should I use for time series analysis? 469
References 470

IV Appendices 473

A An Introduction to Scientific Computing with Python 475

A.1 A Brief History of Python 475
A.2 The Scipy Universe 476
A.3 Getting Started with Python 478
A.4 IPython: basics of interactive computing 489
A.5 Introduction to Numpy 491
A.6 Visualization with Matplotlib 496


A.7 Overview of Useful NumPy/SciPy Modules 499
A.8 Efficient coding with Python and NumPy 505
A.9 Wrapping existing code in Python 508
A.10 Other Resources 509

B AstroML: Machine Learning for Astronomy 513

B.1 Introduction 513
B.2 Dependencies 513
B.3 Tools Included in AstroML v0.1 514

C Astronomical flux measurements and magnitudes 517

C.1 The definition of the specific flux 517
C.2 Wavelength window function for astronomical measurements 517
C.3 The astronomical magnitude systems 518

D SQL query for downloading SDSS data 521

E Approximating the Fourier Transform with the FFT 523

References 526

Preface

Astronomy and astrophysics are witnessing dramatic increases in data volume as detectors, telescopes
and computers become ever more powerful. During the last decade, sky surveys across the
electromagnetic spectrum have collected hundreds of terabytes of astronomical data for hundreds
of millions of sources. Over the next decade, the data volume will enter the petabyte domain and
provide accurate measurements for billions of sources. Astronomy and physics students are not
traditionally trained to handle such voluminous and complex data sets. Furthermore, standard
analysis methods employed in astronomy often lag far behind rapid progress in statistics and computer
science. The main purpose of this book is to help minimize the time it takes a student to
become an effective researcher.
This book provides the interface between astronomical data analysis problems and modern statistical
methods. It is aimed at physical and data-centric scientists who have an understanding of the
science drivers for analyzing large data sets but may not be aware of developments in statistical
techniques over the last decade. The book targets researchers who want to use existing methods for
analysis of large data sets, rather than those interested in the development of new methods. Theoretical
discussions are limited to the minimum required to understand the algorithms. Nevertheless,
extensive and detailed references to relevant specialist literature are interspersed throughout the
book.
We present an example-driven compendium of modern statistical and data mining methods, together
with carefully chosen examples, based on real modern data sets and current astronomical
applications, that illustrate each method introduced in the book. The book is loosely organized
by practical analysis problems, and offers a comparative analysis of different techniques, including
discussions of the advantages and shortcomings of each method, and their scaling with the sample
size. The exposition of the material is supported by appropriate publicly available Python code
(available from the book website, rather than fully printed here) and data to enable a reader to
reproduce all the figures and examples, evaluate the techniques, and adapt them to their own field
of interest. To some extent, this book is an analog of the well-known Numerical Recipes book, but
aimed at the analysis of massive astronomical data sets, with more emphasis on modern tools for
data mining and machine learning, and with freely available code.
From the start, we desired to create a book which, in the spirit of reproducible research, would
allow readers to easily replicate the analysis behind every example and figure. We believe this
feature will make the book uniquely valuable as a practical guide. We chose to implement this
using Python, a powerful and flexible programming language that is quickly becoming a standard
in astronomy (a number of next-generation large astronomical surveys and projects use Python,
e.g., JVLA, ALMA, LSST). The Python code base associated with this book, called AstroML,
is maintained as a live web repository (GitHub), and is intended to be a growing collection of
well-documented and well-tested tools for astronomical research. Any astronomical researcher who
is currently developing software for analysis of massive survey data is invited and encouraged to
contribute their own tools to the code.
The target audience for this text includes senior undergraduate and graduate students in physics
and astronomy, as well as researchers using large data sets in a scientific context. Familiarity with
calculus and other basic mathematical techniques is assumed, but no extensive prior knowledge of
statistics is required (e.g., we assume that readers have heard of the Gaussian distribution before,
but not necessarily of the Lorentzian distribution). Though the examples in this book are aimed
at researchers in the fields of astronomy and astrophysics, the organization of the book allows for
easy mapping of relevant algorithms to problems from other fields. After the introductory
Chapter 1, data organization and some aspects of fast computation are discussed in Chapter 2,
statistical foundations are reviewed in Chapters 3–5 (statistical distributions, maximum likelihood
and other classical statistics, and Bayesian methodology), exploratory data analysis is described
in Chapters 6 and 7 (Searching for Structure in Point Data; Dimensionality and its Reduction),
and data-based prediction methods are described in Chapters 8–10 (Regression and Model Fitting;
Classification; Time Series Analysis).
Finally, we are indebted to a number of colleagues whose careful reading and resulting comments
significantly improved this book. A summer study group consisting of Bob Abel, Yusra AlSayyad,
Lauren Anderson, Vaishali Bhardwaj, James Davenport, Alexander Fry, Bryce Kalmbach, and
David Westman identified many rough edges in the manuscript and tested the AstroML code.
We thank Alan Weinstein for help and advice with LIGO data, and Carl Carter-Schwendler for
motivational and expert discussions about Bayesian statistics. In addition, Tim Axelrod, Andy
Becker, Joshua Bloom, Tamás Budavári, David Hogg, Robert Lupton, Chelsea MacLeod, Lovro
Palaversa, Fernando Perez, Maria Süveges, Przemek Woźniak, and two anonymous reviewers provided
extensive expert comments. Any remaining errors are entirely our own.
We dedicate this book to Cristin, Ian, Nancy, Pamela, Tom, and Vedrana for their support, encouragement,
and understanding during the periods of intensive work and absent-mindedness along the
way to this finished text.

Authors, Seattle and Atlanta, 2012
