Building Statistical Models
in Python
Paul N Adams
Stuart J Miller
BIRMINGHAM—MUMBAI
Building Statistical Models in Python
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, without the prior written permission of the publisher, except in the case
of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held
liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and
products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot
guarantee the accuracy of this information.
To my daughter, Lydie, for demonstrating how work and dedication regenerate inspiration and
creativity. To my wife, Helene, for her love and support.
– Paul Adams
I would like to thank my wife Anita and daughter Ananya for giving me the time and space to review
this book.
Karthik Dulam is a Principal Data Scientist at EDB. He is passionate about all things data, with a
particular focus on data engineering, statistical modeling, and machine learning. He has a diverse
background delivering machine learning solutions for the healthcare, IT, automotive, telecom, tax,
and advisory industries. He actively engages with students as a guest speaker at esteemed universities,
delivering insightful talks on machine learning use cases.
I would like to thank my wife, Sruthi Anem, for her unwavering support and patience. I also want to
thank my family, friends, and colleagues, who have played an instrumental role in shaping the person I
am today. Their support, encouragement, and belief in me have been a constant source of
inspiration.
Table of Contents
Preface xiii

2
Distributions of Data 19
Technical requirements 19
Understanding data types 20
Nominal data 20
Ordinal data 21
Interval data 21
Ratio data 22
Visualizing data types 22
Measuring and describing distributions 26
Measuring central tendency 26
Measuring variability 33
Measuring shape 38
The normal distribution and central limit theorem 42
The Central Limit Theorem 45
Bootstrapping 45
Confidence intervals 46
Standard error 51
Correlation coefficients (Pearson’s correlation) 51
Permutations 52
Permutations and combinations 52
Permutation testing 55
Transformations 57
Summary 59
References 59

3
Hypothesis Testing 61
The goal of hypothesis testing 61
Overview of a hypothesis test for the mean 62
Scope of inference 62
Hypothesis test steps 63
Type I and Type II errors 63
Type I errors 63
Type II errors 64
Basics of the z-test – the z-score, z-statistic, critical values, and p-values 65
The z-score and z-statistic 65
A z-test for means 72
z-test for proportions 78
Power analysis for a two-population pooled z-test 82
Summary 85

4
Parametric Tests 87
Assumptions of parametric tests 87
Normally distributed population data 88
Equal population variance 99
T-test – a parametric hypothesis test 102
T-test for means 103
Two-sample t-test – pooled t-test 108
Two-sample t-test – Welch’s t-test 111
Paired t-test 112
Tests with more than two groups and ANOVA 114
Multiple tests for significance 114
ANOVA 117
Pearson’s correlation coefficient 118
Power analysis examples 123
Summary 124
References 124

5
Non-Parametric Tests 125
When parametric test assumptions are violated 125
Permutation tests 126
The test statistic procedure 128
Normal approximation 129
Rank-Sum example 129

7
Multiple Linear Regression 173
Multiple linear regression 173
Adding categorical variables 175
Evaluating model fit 176
Interpreting the results 181
Feature selection 184
Statistical methods for feature selection 184
Performance-based methods for feature selection 186
Recursive feature elimination 187
Ridge regression 189
LASSO regression 192
Elastic Net 194
Dimension reduction 196
PCA – a hands-on introduction 196
PCR – a hands-on salary prediction study 199
Summary 202

9
Discriminant Analysis 225
Bayes’ theorem 225
Probability 225
Conditional probability 227
Discussing Bayes’ Theorem 228
Linear Discriminant Analysis 229
Supervised dimension reduction 236
Quadratic Discriminant Analysis 238
Summary 244

11
ARIMA Models 271
Technical requirements 271
Models for stationary time series 272
Autoregressive (AR) models 272
Moving average (MA) models 283
Autoregressive moving average (ARMA) models 287
Models for non-stationary time series 295
ARIMA models 296
Seasonal ARIMA models 304
More on model evaluation 311
Summary 318
References 319

12
Multivariate Time Series 321
Multivariate time series 321
Time-series cross-correlation 322
ARIMAX 326
Preprocessing the exogenous variables 328
Fitting the model 329
Assessing model performance 333
VAR modeling 335
Step 1 – visual inspection 338
Step 2 – selecting the order of AR(p) 339
Step 3 – assessing cross-correlation 340
Step 4 – building the VAR(p,q) model 344
Step 5 – testing the forecast 346
Step 6 – building the forecast 347
Summary 349
References 349

14
Survival Models 361
Technical requirements 361
Kaplan-Meier model 362
Model definition 362
Model example 364
Exponential model 368
Model example 370
Cox Proportional Hazards regression model 372
Step 1 374
Step 2 375
Step 3 379
Step 4 380
Step 5 383
Summary 384

Index 385
• An introduction to statistics
• Regression models
• Classification models
• Time series models
• Survival analysis
Understanding the tools covered in these sections will give the reader a firm foundation from which
further independent growth in the statistics domain can be achieved more easily.
• Industry professionals with limited statistical or programming knowledge who would like to
learn to use data for testing hypotheses they have in their business domain
• Data analysts and scientists who wish to broaden their statistical knowledge and find a set of
tools and their implementations for performing various data-oriented tasks
The ground-up approach of this book is meant to give a wide audience an entry point into the subject.
It should neither discourage novice practitioners nor keep advanced practitioners from benefiting from
the material presented.