Using Models To Explore

Uploaded by

anubavroshan

Using Models to Explore

Your Data
• A model is something we construct to help us understand the
real world.
• A common example is the use of an animal which mimics a
human disease to help us understand, and hopefully prevent
and/or treat, the disease.
• The same concept applies to a set of data: presumably you
are using the data to understand the real world.
• In the world of politics a pollster has a dataset on a sample of
likely voters and the pollster’s job is to use this sample to
predict the election outcome.
• A data analyst uses the polling data to construct a model to
predict what will happen on election day.
• The process of building a model involves imposing a specific
structure on the data and creating a summary of the data.
• In the polling data example, you may have thousands of
observations, so the model is a mathematical equation that
reflects the shape or pattern of the data, and the equation
allows you to summarize the thousands of observations with,
for example, one number, which might be the percentage of
voters who will vote for your candidate.
• A statistical model serves two key purposes in a data analysis:
  • to provide a quantitative summary of your data, and
  • to impose a specific structure on the population from
    which the data were sampled.
• Imagine you wanted to conduct a survey of 20
people to ask them how much they’d be willing to
spend on a product you’re developing.
• What is the goal of this survey? Probably, if you’re
spending time and money developing a new product,
you believe that there is a large population of people
out there who are willing to buy this product.
• However, it’s far too costly and complicated to ask
everyone in that population what they’d be willing to
pay.
• So you take a sample from that population to get a
sense of what the population would pay.
• One of us (Roger) recently published a book titled R
Programming for Data Science.
• Before the book was published, interested readers could
submit their name and email address to the book’s web site
to be notified about the book’s publication.
• In addition, there was an option to specify how much they’d
be willing to pay for the book.
• Below is a random sample of 20 responses from people who
volunteered this information.
• 25 20 15 5 30 7 5 10 12 40 30 30 10 25 10 20 10 10 25 5
• “What do the data say?” One thing you could do is simply
hand over the data—all 20 numbers.
• The first key element of a statistical model is data reduction.
• The basic idea is you want to take the original set of numbers
consisting of your dataset and transform them into a smaller set
of numbers.
• The process of data reduction typically ends up with a statistic.
• A statistic is any summary of the data. The sample mean, or
average, is a statistic.
• So is the median, the standard deviation, the maximum, the
minimum, and the range.
• Some statistics are more or less useful than others but they are
all summaries of the data.
• The simplest data reduction you can produce is the mean, or the
simple arithmetic average, of the data, which in this case is
$17.20.
• Going from 20 numbers to 1 number is about as much
reduction as is possible.
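The data reduction described above can be sketched in R. This is a minimal illustration, not the authors' code: the vector name `x` follows the `pnorm` call used later in these slides, and `stats` is a name chosen here for convenience.

```r
# Willingness-to-pay responses (in dollars) from the sample of 20 above.
x <- c(25, 20, 15, 5, 30, 7, 5, 10, 12, 40,
       30, 30, 10, 25, 10, 20, 10, 10, 25, 5)

# Each statistic reduces the 20 numbers to a single number.
stats <- c(
  mean   = mean(x),    # arithmetic average
  median = median(x),  # middle value of the sorted data
  sd     = sd(x),      # sample standard deviation
  min    = min(x),
  max    = max(x)
)

print(round(stats, 2))
```

The mean and standard deviation printed here are the two numbers the Normal model needs later on.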
1. Models as Expectations:
• A simple summary statistic, such as the mean of a set of
numbers, is not enough to formulate a model.
• A statistical model must also impose some structure on the
data.
• A statistical model provides a description of how the world
works and how the data were generated.
• The model is essentially an expectation of the relationships
between various factors in the real world and in your dataset.
Applying the Normal model:
• Perhaps the most popular statistical model in the world is the
Normal model.
• This model says that the randomness in a set of data can be
explained by the Normal distribution, or a bell-shaped curve.
• The Normal distribution is fully specified by two parameters—
the mean and the standard deviation.
• To apply the Normal model to this dataset, we just need to calculate the
mean and standard deviation.
• In this case, the mean is $17.20 and the standard deviation is $10.39.
• Given those parameters, our expectation under the Normal model is that
the distribution of prices that people are willing to pay looks like a
bell-shaped curve centered at $17.20.
• According to the model, about 68% of the population would be willing to
pay somewhere between $6.81 and $27.59 for this new product. Whether
that is useful information or not depends on the specifics of the situation.
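The 68% interval quoted above is the mean plus or minus one standard deviation. A minimal sketch of that calculation follows; the names `mu`, `sigma`, `interval`, `coverage` and `grid` are chosen here for illustration, and the plotting range of -20 to 55 dollars is an arbitrary assumption.

```r
# The 20 willingness-to-pay responses (in dollars).
x <- c(25, 20, 15, 5, 30, 7, 5, 10, 12, 40,
       30, 30, 10, 25, 10, 20, 10, 10, 25, 5)

# The Normal model is fully specified by these two parameters.
mu    <- mean(x)
sigma <- sd(x)

# One standard deviation either side of the mean.
interval <- c(lower = mu - sigma, upper = mu + sigma)
print(round(interval, 2))

# Probability the Normal model assigns to that interval (about 68%).
coverage <- pnorm(mu + sigma, mean = mu, sd = sigma) -
  pnorm(mu - sigma, mean = mu, sd = sigma)
print(coverage)

# Draw the expectation: the Normal density with these parameters.
grid <- seq(-20, 55, length.out = 200)
plot(grid, dnorm(grid, mean = mu, sd = sigma), type = "l",
     xlab = "Price (dollars)", ylab = "Density",
     main = "Expected distribution under the Normal model")
```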
• You can use the statistical model to answer more complex questions if you
want.
• For example, suppose you wanted to know “What proportion of
the population would be willing to pay more than $30 for this
book?”
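x <- c(25, 20, 15, 5, 30, 7, 5, 10, 12, 40, 30, 30, 10, 25, 10, 20, 10, 10, 25, 5)  # the 20 responses
pnorm(30, mean = mean(x), sd = sd(x), lower.tail = FALSE)  # upper tail: P(price > 30)
# [1] 0.1089893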
• So about 11% of the population would be willing to pay more than
$30 for the product.
• Again, whether this is useful to you depends on your specific goals.
• Notice that we used the data to draw the picture (to calculate the mean
and standard deviation of the Normal distribution), but ultimately the
data do not appear directly in the plot.
• In this case we are using the Normal distribution to tell us what
the population looks like, not what the data look like.
• The Normal distribution is our expectation for what the data
should look like.
2. Comparing Model Expectations to Reality:
• How do we know if our expectations match with reality?
Drawing a fake picture
• To begin with, we can make some pictures, like a histogram of the data.
• The fake picture is a histogram of 20 data points simulated from a Normal
distribution, with the theoretical Normal curve overlaid on top.
• If the population followed roughly a Normal distribution, and
the data were a random sample from that population, then
the distribution estimated by the histogram should look like
the theoretical model provided by the Normal distribution.
• When that is the case, the Normal distribution is a good statistical
model for the data.
• Note that the Normal distribution allows for negative values, but we don’t
really expect that people will say that they’d be willing to pay
negative dollars for a book.
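A fake picture of this kind can be sketched by simulating 20 values from the Normal model and overlaying the theoretical curve. The seed, the variable names (`fake`, `grid`) and the axis limits are arbitrary choices made here; the mean and standard deviation are the values quoted earlier in these slides.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Parameters quoted in the slides: mean $17.20, standard deviation $10.39.
mu    <- 17.2
sigma <- 10.39

# 20 simulated "responses" drawn from the Normal model.
fake <- rnorm(20, mean = mu, sd = sigma)

# Histogram on the density scale so it is comparable to the curve.
hist(fake, freq = FALSE, xlim = c(-20, 55),
     xlab = "Price (dollars)", main = "Fake picture: simulated Normal data")

# Overlay the theoretical Normal density.
grid <- seq(-20, 55, length.out = 200)
lines(grid, dnorm(grid, mean = mu, sd = sigma))
```

Rerunning with different seeds gives a feel for how much a sample of only 20 points can deviate from the curve even when the model is exactly right.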
The real picture:
• The real picture is a histogram of the data from the sample of 20
respondents.
• At first glance, it looks like the histogram and the
Normal distribution don’t match very well.
• The histogram has a large spike around $10, a feature
that is not present in the Normal curve.
• Also, the Normal distribution allows for negative values
on the left-hand side of the plot, but there are no data
points in that region of the plot.
• So the Normal model isn’t really a very good
representation of the population, given the
data that we sampled from the population.
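The comparison above can be sketched by drawing the real histogram with the fitted Normal curve on top, and by quantifying the two mismatches. The bin width of $5 and the names `p_negative` and `n_at_10` are assumptions made for this illustration.

```r
# The 20 willingness-to-pay responses (in dollars).
x <- c(25, 20, 15, 5, 30, 7, 5, 10, 12, 40,
       30, 30, 10, 25, 10, 20, 10, 10, 25, 5)
mu    <- mean(x)
sigma <- sd(x)

# Real data on the density scale, in $5-wide bins (bin width is a choice).
hist(x, freq = FALSE, breaks = seq(0, 40, by = 5), xlim = c(-20, 55),
     xlab = "Price (dollars)", main = "Real picture: survey data")

# Overlay the Normal curve fitted to the data.
grid <- seq(-20, 55, length.out = 200)
lines(grid, dnorm(grid, mean = mu, sd = sigma))

# Mismatch 1: the model puts some probability on negative prices.
p_negative <- pnorm(0, mean = mu, sd = sigma)
print(p_negative)

# Mismatch 2: many respondents answered exactly $10.
n_at_10 <- sum(x == 10)
print(n_at_10)
```

The fitted Normal model assigns roughly 5% of the population to negative prices, while every observed response is positive.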
3. Reacting to Data: Refining Our Expectations:
• The model and the data don’t match very well, as
was indicated by the histogram above.
• So what do we do? Well, we can either
1. Get a different model; or
2. Get different data
Or we could do both.
• We will choose a different statistical model to represent
the population, the Gamma distribution.
• This distribution has the feature that it only allows
positive values, so it eliminates the problem we had
with negative values with the Normal distribution.
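One way to try the Gamma model on the same data is sketched below. The slides do not say how the Gamma parameters were estimated, so this sketch assumes a simple method-of-moments fit (shape = mean²/variance, rate = mean/variance); the names `shape`, `rate` and `p_over_30` are chosen here.

```r
# The 20 willingness-to-pay responses (in dollars).
x <- c(25, 20, 15, 5, 30, 7, 5, 10, 12, 40,
       30, 30, 10, 25, 10, 20, 10, 10, 25, 5)

# Method-of-moments estimates (an assumption, not the authors' stated
# method): match the Gamma mean and variance to the sample values.
m <- mean(x)
v <- var(x)
shape <- m^2 / v
rate  <- m / v

# The Gamma distribution puts no probability on negative prices.
print(pgamma(0, shape = shape, rate = rate))

# Same question as before, now under the Gamma model:
# what proportion would pay more than $30?
p_over_30 <- pgamma(30, shape = shape, rate = rate, lower.tail = FALSE)
print(p_over_30)

# Compare the new expectation to the data.
hist(x, freq = FALSE, breaks = seq(0, 40, by = 5), xlim = c(0, 55),
     xlab = "Price (dollars)", main = "Survey data with Gamma curve")
grid <- seq(0.01, 55, length.out = 200)
lines(grid, dgamma(grid, shape = shape, rate = rate))
```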
• With the new model, we go back to the top of the iteration and
do the following:
1. Develop expectations: Draw a fake picture. What
do we expect to see before looking at the data?
2. Compare our expectations to the data
3. Refine our expectations, given what the data show
4. Examining Linear Relationships:
• It is common to want to understand linear relationships between
variables of interest. The most common statistical
technique to help with this task is linear
regression.
• We can apply the same three steps to the application
of linear regression:
1. developing expectations,
2. comparing our expectations to data,
3. refining our expectations.
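The slides give no dataset for this section, so the sketch below uses entirely hypothetical simulated data to show the "develop expectations" step for linear regression: simulate data in which a straight-line relationship holds, fit it with `lm`, and see what the fit and residuals look like when the model is right. The variable names, the true intercept and slope, and the noise level are all made up for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: a response that depends linearly on a predictor,
# plus Normal noise. True intercept 2 and slope 3 are made-up values.
n <- 100
predictor <- runif(n, min = 0, max = 10)
response  <- 2 + 3 * predictor + rnorm(n, sd = 1)

# Fit the linear regression model.
fit <- lm(response ~ predictor)
print(coef(fit))

# Fake picture: scatterplot with the fitted line.
plot(predictor, response, main = "Fake picture for linear regression")
abline(fit)

# When the linear model is right, residuals scatter around zero
# with no visible pattern.
res <- resid(fit)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Real data can then be plotted the same way and compared against these pictures; curvature or a funnel shape in the residual plot would be a reason to refine the model.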
5. When Do We Stop?
• In some cases, a single iteration may be sufficient, but in
most real-life cases, you’ll need to iterate at least a few
times.
• You might be able to iterate over and over again, because
every answer will usually raise more questions and
require further digging into the data.
• When exactly do you stop the process then? Statistical
theory suggests a number of different approaches to
determining when a statistical model is “good enough”
and fits the data well.
• There are also a few high-level criteria that can help determine
when you might consider stopping the data analysis iteration.
Summary:
• First, set your expectations for how a model
should characterize a dataset before you actually
apply a model to data.
• Then you can check to see how your model
conforms to your expectation.
• Often, there will be features of the dataset that do
not conform to your model and you will have to
either refine your model or examine the data
collection process.
