
What Is Statistics?: Definition of Statistics Statistics

Statistics is the science of collecting, organizing, analyzing, and interpreting data to make decisions, with two main branches: descriptive and inferential statistics. Key concepts include scatterplots for displaying relationships, correlation for measuring linear associations, and the normal distribution, which has important properties and applications in various fields. Additionally, concepts like expected value, covariance, and sampling methods are essential for understanding and applying statistical principles.

Uploaded by

Pinjala Anoop


What is Statistics?

Definition of Statistics
– Statistics is the science of collecting, organizing, analyzing,
and interpreting data in order to make a decision.

• Branches of Statistics
– The study of statistics has two major branches:
descriptive (exploratory) statistics and inferential statistics.
• Descriptive statistics is the branch of statistics that
involves the organization, summarization, and display of
data.
• Inferential statistics is the branch of statistics that
involves using a sample to draw conclusions about a
population. A basic tool in the study of inferential statistics
is probability.
Scatterplots and Correlation
• Displaying relationships: Scatterplots

• Interpreting scatterplots

• Adding categorical variables to scatterplots

• Measuring linear association: correlation r

• Facts about correlation

• A response variable measures an outcome of a
study.
• An explanatory variable explains, influences, or
causes changes in a response variable.
• Explanatory and response variables are also called independent and dependent variables, respectively.
• WARNING: The relationship between two
variables can be strongly influenced by other
variables that are lurking in the background.
• Note: A relationship between explanatory and response
variables does not necessarily mean cause and effect.
• Example: sales of personal computers and athletic shoes.
Example - 1
Definitions
• Sample space: the set of all possible outcomes,
denoted S.
• Event: an outcome or a set of outcomes of a
random phenomenon. An event is a subset of the
sample space.
• The probability of an event is the proportion of times
the event would occur in a very long series of repetitions.
• Probability model: a mathematical description
of a random phenomenon consisting of two
parts: S and a way of assigning probabilities to
events.
Probability distributions
• Probability distribution of a
random variable X: it tells what values
X can take and how to assign probabilities to
those values.
– Discrete random variable: a list of
the possible values of X and their
probabilities.
– Continuous random variable: a
density curve.
Measuring linear association: correlation r
(The Pearson Product-Moment Correlation Coefficient or Correlation Coefficient)

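• The correlation r measures the strength and
direction of the linear association between two
quantitative variables, usually labeled X and Y:
$$ r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) $$
where x̄, ȳ are the sample means and s_x, s_y the sample standard deviations.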
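The formula can be translated directly into code. The sketch below standardizes each x and y, multiplies the pairs, and divides the sum by n − 1. The function name `pearson_r` and the hours/score data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations use the n - 1 divisor, matching the formula.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
print(round(r, 4))
```

Negating every y value flips the sign of r and leaves its size unchanged. This is consistent with r measuring direction as well as strength.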
Facts about correlation
• What kind of variables do we use?
– 1. No distinction between explanatory and response variables.
– 2. Both variables should be quantitative
• Numerical properties
– 1. −1 ≤ r ≤ 1
– 2. r > 0: positive association between the variables
– 3. r < 0: negative association between the variables
– 4. r = 1 or r = −1 indicates a perfect linear relationship
– 5. The closer |r| is to 1, the stronger the linear relationship:
-1 <---- negative relationship ---- 0 ---- positive relationship ----> 1
(the relationship gets stronger toward either end)
– 6. r is affected by a few outliers (it is not resistant).
– 7. r does not describe curved relationships.
– 8. It is not easy to guess the value of r from the appearance of a
scatterplot.
Some necessary elements of
Probability theory and Statistics

The NORMAL DISTRIBUTION

The normal (or Gaussian) distribution is a very
commonly occurring distribution in probability theory and has wide applications in the
fields of:
- Pattern Recognition;
- Machine Learning;
- Artificial Neural Networks and Soft Computing;
- Digital Signal (image, sound, video, etc.) Processing;
- Vibrations, Graphics, etc.
It is also called the BELL curve.

The formula for the normal distribution is:

$$ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] $$

The parameter μ is called the mean or expectation (or
median or mode) of the distribution.
The parameter σ is the standard deviation,
and the variance is thus σ².
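The density formula can be written as a short function and cross-checked against `statistics.NormalDist` (Python 3.8+). The helper name `normal_pdf` and the values μ = 2, σ = 1.5 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density p(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)**2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 2.0, 1.5
for x in (0.5, 2.0, 4.0):
    # Cross-check against the standard library's NormalDist.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The peak value, reached at x = μ, is 1/(σ√(2π)). A larger σ therefore gives a lower, wider bell.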
[Figure: normal density curves p(x) plotted against x. Source: https://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg (2013)]
The normal distribution p(x), with any mean μ and
any positive standard deviation σ, has the following properties:
• It is symmetric around the mean μ of the distribution.
• It is unimodal: its first derivative is positive for x < μ,
negative for x > μ, and zero only at x = μ.
• It has two inflection points (where the second
derivative of p is zero and changes sign), located one
standard deviation away from the mean, at x = μ − σ and x =
μ + σ.
• It is log-concave.
• It is infinitely differentiable, indeed supersmooth of
order 2.
Also, the standard normal distribution
p (with μ = 0 and σ = 1) has the following properties:
• Its first derivative is p′(x) = −x p(x).
• Its second derivative is p″(x) = (x² − 1) p(x).
• More generally, its n-th derivative is
$$ p^{(n)}(x) = (-1)^n \mathrm{He}_n(x)\, p(x), $$
where He_n is the (probabilists') Hermite polynomial of order n.
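The two derivative identities can be checked numerically with central finite differences. This is a sanity check only, not a proof. The step size h = 1e-4 and the test points are arbitrary choices.

```python
import math

def phi(x):
    """Standard normal density (mu = 0, sigma = 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def central_diff(f, x, h=1e-4):
    """First derivative by central differences."""
    return (f(x + h) - f(x - h)) / (2 * h)

def central_diff2(f, x, h=1e-4):
    """Second derivative by central differences."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

for x in (-1.5, 0.0, 0.7, 2.0):
    print(x,
          central_diff(phi, x), -x * phi(x),            # p'(x)  vs  -x p(x)
          central_diff2(phi, x), (x * x - 1) * phi(x))   # p''(x) vs (x^2 - 1) p(x)
```

The second check also confirms the inflection points: p″(x) = 0 exactly where x² = 1, that is, at x = ±1 (one σ from the mean).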

The 68-95-99.7% Rule:
All normal density curves satisfy the following property,
which is often referred to as the Empirical Rule:
- 68% of the observations fall within 1 standard deviation of the mean,
that is, between (μ − σ) and (μ + σ);
- 95% of the observations fall within 2 standard deviations of the mean,
that is, between (μ − 2σ) and (μ + 2σ);
- 99.7% of the observations fall within 3 standard deviations of the mean,
that is, between (μ − 3σ) and (μ + 3σ).
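The rounded percentages of the Empirical Rule can be recovered from the standard normal CDF. The sketch uses `statistics.NormalDist` (Python 3.8+). The probability within k standard deviations does not depend on μ and σ, so one standard normal is enough.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)
for k in (1, 2, 3):
    # P(mu - k*sigma < X < mu + k*sigma) is the same for every normal distribution,
    # so it can be computed once on the standard normal.
    prob = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {prob:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```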
A normal distribution:
1. is symmetrical (its two halves are mirror images);
2. is asymptotic (its tails never touch the
underlying x-axis; the curve extends to −∞
and +∞, so in practice it must be truncated);
3. has fixed and known areas under the curve
(these areas are marked off by units
along the x-axis called z-scores; when truncated,
the normal curve is usually taken to end at +3.00
z on the right and −3.00 z on the left).
Expected Value of Random Variables
The expected value of a random variable is the weighted average of all
possible values of the variable, where each value is weighted by the probability
that the random variable takes that value.
[Figure: a PDF represented by the line f(x) = 0.03125x.]
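Normal Density:
$$ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] $$
Bivariate Normal Density:
$$ p(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho_{xy}^2}} \exp\left\{-\frac{1}{2(1-\rho_{xy}^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - \frac{2\rho_{xy}(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\} $$
μ: mean; σ: standard deviation (S.D.); ρ_xy: correlation coefficient.
Visualize ρ_xy as setting the orientation (tilt) of a tilted, asymmetric Gaussian filter.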
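The bivariate density can be coded term by term from the formula above. The sketch checks two things. With ρ_xy = 0 the density factorizes into two univariate normals. With ρ_xy = 0.8 the mass tilts along the line y = x. The helper name and all parameter values are illustrative.

```python
import math
from statistics import NormalDist

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal density with correlation coefficient rho (|rho| < 1)."""
    if not -1 < rho < 1:
        raise ValueError("rho must satisfy -1 < rho < 1")
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    # Quadratic form inside the exponent, already divided by (1 - rho^2).
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    norm = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho * rho)
    return math.exp(-0.5 * q) / norm

# With rho = 0 the density factorizes into two univariate normal densities.
p_joint = bivariate_normal_pdf(0.5, -1.0, 0.0, 0.0, 1.0, 2.0, 0.0)
p_product = NormalDist(0.0, 1.0).pdf(0.5) * NormalDist(0.0, 2.0).pdf(-1.0)

# With rho = 0.8 the contours tilt along y = x: (1, 1) is denser than (1, -1).
print(p_joint, p_product,
      bivariate_normal_pdf(1, 1, 0, 0, 1, 1, 0.8),
      bivariate_normal_pdf(1, -1, 0, 0, 1, 1, 0.8))
```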
For x as a discrete random variable,
the expected value of x is:
$$ E(x) = \sum_{i=1}^{n} x_i P(x_i) = \mu_x $$
where P(x_i) is the probability that x = x_i.
E(x) is also called the first moment of the distribution.
The kth moment is defined as:
$$ E(x^k) = \sum_{i=1}^{n} x_i^k P(x_i) $$
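The moment formula maps directly to a sum over a probability table. The sketch uses a hypothetical three-point distribution and exact `fractions.Fraction` arithmetic, so the results can be compared exactly.

```python
from fractions import Fraction

# Hypothetical discrete distribution: value -> probability (probabilities sum to 1).
pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 2: Fraction(3, 10)}

def moment(pmf, k):
    """k-th moment E(x^k) = sum of x_i^k * P(x_i)."""
    return sum(x ** k * p for x, p in pmf.items())

assert sum(pmf.values()) == 1
mean = moment(pmf, 1)      # first moment, E(x)
second = moment(pmf, 2)    # second moment, E(x^2)
print(mean, second)        # 11/10 17/10
```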
Covariance of x and y is defined as:
$$ \sigma_{xy} = E[(x-\mu_x)(y-\mu_y)] $$
Covariance indicates how much x and y vary together. Its value
depends on how much each variable tends to deviate from its mean, and also
on the degree of association between x and y.

Correlation between x and y:
$$ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = E\left[\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)\right] $$
Property of the correlation coefficient: −1 ≤ ρ_xy ≤ 1.
For z = ax + by:
$$ \sigma_z^2 = E[(z-\mu_z)^2] = a^2\sigma_x^2 + 2ab\,\sigma_{xy} + b^2\sigma_y^2 $$
If σ_xy = 0:
$$ \sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 $$
[Figure: Several sets of (x, y) points, with the correlation coefficient of x and y for each set. The correlation reflects the strength and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom).]


The correlation coefficient can also be viewed as the cosine of the angle
between the two vectors of samples drawn from the two random variables,
i.e. between the two observed vectors in N-dimensional space (for N observations
of each variable): http://www.hawaii.edu/powerkills/UC.HTM
This method only works with centered data, i.e., data that have been
shifted by the sample mean so as to have an average of zero.
https://people.math.harvard.edu/~knill/teaching/math19b_2011/handouts/lecture12.pdf
Source: Wikipedia
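The cosine view of correlation, and the need to center the data first, can be checked in a few lines. The helper names `cosine` and `centered` and the data are illustrative.

```python
import math

def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def centered(v):
    """Shift v by its mean so that it averages to zero."""
    m = sum(v) / len(v)
    return [a - m for a in v]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [52.0, 55.0, 61.0, 64.0, 70.0]

r_cos = cosine(centered(x), centered(y))   # correlation as cos(angle) of centered vectors
r_raw = cosine(x, y)                       # uncentered vectors: NOT the correlation
print(r_cos, r_raw)
```

The centered version matches the Pearson r of the same data. The uncentered cosine gives a different number.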
Other PDFs:
• Poisson:
$$ P(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad \lambda > 0 $$
• Binomial
• Cauchy
• Laplace (double exponential density):
$$ P(x) = \frac{1}{2b} e^{-|x-a|/b} $$

Read about:
• Central Limit Theorem
• Uniform Distribution
• Geometric Distribution
• Quantile-Quantile (QQ) Plot
• Probability-Probability (P-P) Plot
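The Poisson and Laplace formulas can be coded as follows. The sketch checks that the Poisson probabilities sum to about 1 with mean λ, and that the Laplace density integrates to about 1. The choices λ = 3 and a = 0, b = 1, the truncation points, and the step size are arbitrary.

```python
import math

def poisson_pmf(x, lam):
    """P(x) = lam**x * exp(-lam) / x!  for x = 0, 1, 2, ... and lam > 0."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return lam ** x * math.exp(-lam) / math.factorial(x)

def laplace_pdf(x, a, b):
    """Double exponential density P(x) = exp(-|x - a| / b) / (2b), b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    return math.exp(-abs(x - a) / b) / (2 * b)

lam = 3.0
probs = [poisson_pmf(k, lam) for k in range(60)]   # terms beyond k = 59 are negligible for lam = 3
total_poisson = sum(probs)
mean_poisson = sum(k * p for k, p in enumerate(probs))

# Midpoint-rule integral of the Laplace density over [-40, 40] (a = 0, b = 1).
step = 0.01
total_laplace = sum(laplace_pdf(-40 + (i + 0.5) * step, 0.0, 1.0) * step for i in range(8000))
print(total_poisson, mean_poisson, total_laplace)
```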

Example: The mean and standard
deviation of a random variable
X are 5 and 4, respectively.
Find E(X²).

Solution: E(X²) = σ² + μ² = 16 + 25 = 41.
PROB. & STAT. - Revisited/Contd.

Sample mean is defined as:
$$ \tilde{x} = \sum_{i=1}^{n} x_i P(x_i) = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \text{where } P(x_i) = 1/n. $$
Sample variance is:
$$ \sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \tilde{x})^2 $$
Higher-order moments may also be computed: E(x_i − x̃)³, E(x_i − x̃)⁴.

Covariance of a bivariate distribution:
$$ \sigma_{xy} = E[(x-\mu_x)(y-\mu_y)] = \frac{1}{n}\sum_{i=1}^{n} (x_i - \tilde{x})(y_i - \tilde{y}) $$
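The 1/n formulas above give each observation probability 1/n. The sketch computes them on hypothetical data and compares the variance with `statistics.pvariance`, which also divides by n.

```python
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]
n = len(x)

x_mean = sum(x) / n                     # each P(x_i) = 1/n
y_mean = sum(y) / n
var_x = sum((xi - x_mean) ** 2 for xi in x) / n   # 1/n divisor, as in the formula above
cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n
third = sum((xi - x_mean) ** 3 for xi in x) / n    # higher-order central moments
fourth = sum((xi - x_mean) ** 4 for xi in x) / n

print(x_mean, var_x, statistics.pvariance(x), cov_xy, third, fourth)
```

By contrast, `statistics.variance` divides by n − 1. That is the estimator s² that appears later in these notes.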
Second, third, … moments of the distribution p(x) are the expected values of
x², x³, …
The kth central moment is defined as:
$$ E[(x-\mu_x)^k] = \sum_{i=1}^{n} (x_i-\mu_x)^k P(x_i) $$
Thus, the second central moment (also called the variance) of a random variable x is
defined as:
$$ \sigma_x^2 = E[\{x - E(x)\}^2] = E[(x-\mu_x)^2] $$
The S.D. of x is σ_x.

Expanding the square and using E(x) = μ_x:
$$ \sigma_x^2 = E[(x-\mu_x)^2] = E(x^2) - 2\mu_x E(x) + \mu_x^2 = E(x^2) - 2\mu_x^2 + \mu_x^2 = E(x^2) - \mu_x^2 $$
Thus
$$ E(x^2) = \sigma_x^2 + \mu_x^2 $$
Also, note that if z is a new variable, z = ax + by, then E(z) = E(ax + by) = aE(x) + bE(y).
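The identity E(x²) = σ² + μ² can be checked exactly with `fractions.Fraction` on a hypothetical three-point distribution. The same sketch then applies it to the worked example with mean 5 and standard deviation 4.

```python
from fractions import Fraction

# Hypothetical discrete distribution: value -> probability.
pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 2: Fraction(3, 10)}

mu = sum(x * p for x, p in pmf.items())
second_moment = sum(x * x * p for x, p in pmf.items())
variance = sum((x - mu) ** 2 * p for x, p in pmf.items())   # definition: E[(x - mu)^2]

assert variance == second_moment - mu ** 2    # E[(x - mu)^2] = E(x^2) - mu^2

# Worked example: mean 5 and standard deviation 4 give E(X^2) = sigma^2 + mu^2.
mean_X, sd_X = 5, 4
print(variance, sd_X ** 2 + mean_X ** 2)   # 49/100 41
```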
MAXIMUM LIKELIHOOD ESTIMATE (MLE)
The ML estimate (MLE) of a parameter is that value which, when substituted
into the probability distribution (or density), produces that distribution for which
the probability of obtaining the entire observed set of samples is maximized.

Problem: Find the maximum likelihood estimate for μ in a normal distribution.

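Normal Density:
$$ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] $$
Assuming all random samples to be independent:
$$ p(x_1, \ldots, x_n) = p(x_1)\cdots p(x_n) = \prod_{i=1}^{n} p(x_i) = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i-\mu)^2\right] $$
Taking the LOG of the above and its derivative w.r.t. μ:
$$ \frac{\partial}{\partial\mu} \ln p(x_1,\ldots,x_n) = \frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(x_i-\mu) = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n} x_i - n\mu\right] $$
Setting this term = 0, we get:
$$ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \tilde{x} $$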
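The derivation can be checked numerically. The sketch evaluates the normal log-likelihood on a fine grid of μ values and confirms that its maximum sits at the sample mean. The data and the known σ = 1 are assumptions made only for illustration.

```python
import math

def normal_log_likelihood(mu, data, sigma):
    """log of prod p(x_i) for independent normal samples with known sigma."""
    n = len(data)
    return (-n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

data = [4.8, 5.6, 5.1, 4.4, 6.1]   # hypothetical observations
sigma = 1.0                         # assumed known

mu_hat = sum(data) / len(data)      # closed-form MLE from the derivation
# Crude check: search a fine grid for the mu with the largest log-likelihood.
grid = [3.0 + i * 0.001 for i in range(4001)]   # 3.0 .. 7.0
best_on_grid = max(grid, key=lambda m: normal_log_likelihood(m, data, sigma))
print(mu_hat, best_on_grid)
```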
Also read about the MAP (maximum a posteriori) estimate, which is an example of Bayesian estimation.
Sampling Distributions

http://grid.cs.gsu.edu/~skarmakar/math1070_slides.html
What are the main types of sampling and how is each done?

Simple Random Sampling: A simple random sample (SRS) of size
n is produced by a scheme which ensures that each subgroup of the
population of size n has an equal probability of being chosen as the
sample.
Stratified Random Sampling: Divide the population into "strata".
There can be any number of these. Then choose a simple random
sample from each stratum and combine those into the overall sample.
That is a stratified random sample. (Example: Church A has 600
women and 400 men as members. One way to get a stratified
random sample of size 30 is to take an SRS of 18 women from the
600 women and another SRS of 12 men from the 400 men.)
Multi-Stage Sampling: Sometimes the population is too large and
scattered for it to be practical to make a list of the entire population
from which to draw an SRS. For instance, when a polling
organization samples US voters, it does not take an SRS. Since voter
lists are compiled by counties, it might first take a sample of the
counties and then sample within the selected counties. This
illustrates two stages.
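The church example can be sketched with `random.sample`, which draws an SRS without replacement from a list. The member names are hypothetical placeholders. The seed is fixed only so that the run is reproducible.

```python
import random

rng = random.Random(42)   # fixed seed only so the example is reproducible

# Hypothetical membership lists for the church example: 600 women, 400 men.
women = [f"W{i}" for i in range(600)]
men = [f"M{i}" for i in range(400)]

total_size = 30
population = len(women) + len(men)
# Proportional allocation: each stratum gets its share of the sample.
n_women = total_size * len(women) // population   # 18
n_men = total_size - n_women                       # 12

# An SRS inside each stratum, then combine.
sample = rng.sample(women, n_women) + rng.sample(men, n_men)
print(n_women, n_men, len(sample))
```

The 18/12 split mirrors the 60%/40% composition of the church. This is proportional allocation, one common way to size the strata.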
(Source: Wikipedia)
In statistics, a simple random sample is a subset of
individuals (a sample) chosen from a larger set (a population). Each
individual is chosen randomly and entirely by chance, such that
each individual has the same probability of being chosen at any
stage during the sampling process, and each subset of k individuals
has the same probability of being chosen for the sample as any
other subset of k individuals. This process and technique is known
as simple random sampling, and should not be confused with
systematic random sampling. A simple random sample is an
unbiased surveying technique.

Systematic sampling (Sys-S) is a statistical method involving
the selection of elements from an ordered sampling frame. The most
common form of systematic sampling is an equi-probability method. In
this approach, progression through the list is treated circularly, with a
return to the top once the end of the list is passed. The sampling starts
by selecting an element from the list at random, and then every k-th
element in the frame is selected, where k is the sampling interval
(sometimes known as the skip), calculated as k = N/n,
where n is the sample size and N is the population size.
Systematic sampling (Sys-S) Example: Suppose a supermarket
wants to study the buying habits of its customers. Using systematic
sampling, it can choose every 10th or 15th customer entering the
supermarket and conduct the study on this sample.

This is random sampling with a system. From the sampling
frame, a starting point is chosen at random, and choices thereafter are at
regular intervals. For example, suppose you want to sample 8 houses
from a street of 120 houses. 120/8 = 15, so every 15th house is chosen
after a random starting point between 1 and 15. If the random starting
point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and
116.
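The house example can be written as a small function. The name `systematic_sample` is hypothetical. This sketch draws the random start from 1..k with k = N // n, so every selected index stays inside the list. The circular wrap-around described above is therefore not needed here.

```python
import random

def systematic_sample(frame, n, start=None, rng=random):
    """Pick every k-th element of `frame` after a random start, with k = N // n.

    `start` is a 1-based position in 1..k; if omitted, it is chosen at random.
    """
    N = len(frame)
    k = N // n
    if start is None:
        start = rng.randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return [frame[start - 1 + i * k] for i in range(n)]

houses = list(range(1, 121))           # houses numbered 1..120
print(systematic_sample(houses, 8, start=11))
# [11, 26, 41, 56, 71, 86, 101, 116]
```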
Sampling With Replacement and Sampling Without Replacement

Consider a population of potato sacks, each of which has
either 12, 13, 14, 15, 16, 17, or 18 potatoes, and all the values are
equally likely. Suppose that, in this population, there is exactly one
sack with each number. So the whole population has seven sacks.

Sampling with replacement:
If I sample two with replacement, then I first pick one (say
14). I had a 1/7 probability of choosing that one. Then I replace it.
Then I pick another. Every one of them still has a 1/7 probability of
being chosen. So there are exactly 7 × 7 = 49 different possibilities here.

Sampling without replacement:
If I sample two without replacement, then I first pick one (say
14). I had a 1/7 probability of choosing that one. Then I pick another.
At this point, there are only six possibilities: 12, 13, 15, 16, 17, and
18. So there are only 7 × 6 = 42 different possibilities here (again assuming
that we distinguish between the first and the second).
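Both counts can be confirmed by listing the ordered pairs with `itertools`. `product` allows repeats, matching sampling with replacement. `permutations` does not, matching sampling without replacement.

```python
from itertools import permutations, product

sacks = [12, 13, 14, 15, 16, 17, 18]

with_replacement = list(product(sacks, repeat=2))     # ordered pairs, repeats allowed
without_replacement = list(permutations(sacks, 2))   # ordered pairs, no repeats

print(len(with_replacement), len(without_replacement))   # 49 42
```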
Sampling distribution

• The sampling distribution of a statistic (not a
parameter) is the distribution of values taken by
the statistic (not the parameter) in all possible
samples of the same size from the same
population.
Sampling Distribution
Introduction
• In real life calculating parameters of
populations is prohibitive because
populations are very large.
• Rather than investigating the whole
population, we take a sample, calculate a
statistic related to the parameter of interest,
and make an inference.
• The sampling distribution of the statistic is
the tool that tells us how close the statistic is likely
to be to the parameter.
Sample Statistics as Estimators
of Population Parameters
• A sample statistic is a numerical measure of a
summary characteristic of a sample.
• A population parameter is a numerical measure of
a summary characteristic of a population.
• An estimator of a population parameter is a sample
statistic used to estimate or predict the population
parameter.
• An estimate of a parameter is a particular numerical
value of a sample statistic obtained through
sampling.
• A point estimate is a single value used as an
estimate of a population parameter.
Estimators

• The sample mean, X̄, is the most common
estimator of the population mean, μ.
• The sample variance, s², is the most common
estimator of the population variance, σ².
• The sample standard deviation, s, is the most
common estimator of the population standard
deviation, σ.
• The sample proportion, p̂, is the most common
estimator of the population proportion, p.
Sampling Distribution of X̄
• The sampling distribution of X̄ is the
probability distribution of all possible values
the random variable X̄ may assume when a
sample of size n is taken from a specified
population.
Sampling Distribution of the Mean
• An example
– A die is thrown infinitely many times. Let X
represent the number of spots showing on
any throw.
– The probability distribution of X is:
x      1    2    3    4    5    6
p(x)  1/6  1/6  1/6  1/6  1/6  1/6
E(X) = 1(1/6) + 2(1/6) + 3(1/6) + … = 3.5
V(X) = (1 − 3.5)²(1/6) + (2 − 3.5)²(1/6) + … = 2.92
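The die's mean and variance can be computed exactly with fractions. The exact variance is 35/12, which rounds to the 2.92 shown above.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                               # fair die: each face equally likely

E_X = sum(x * p for x in faces)                  # 7/2
V_X = sum((x - E_X) ** 2 * p for x in faces)     # 35/12
print(E_X, V_X, round(float(V_X), 2))            # 7/2 35/12 2.92
```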
Throwing a die twice: sampling
distribution of the sample mean
• Suppose we want to estimate μ
from the mean x̄ of a sample of
size n = 2.
• What is the distribution of x̄?
Throwing a die twice: sample mean

#   Sample  Mean  |  #   Sample  Mean  |  #   Sample  Mean
1   1,1     1     |  13  3,1     2     |  25  5,1     3
2   1,2     1.5   |  14  3,2     2.5   |  26  5,2     3.5
3   1,3     2     |  15  3,3     3     |  27  5,3     4
4   1,4     2.5   |  16  3,4     3.5   |  28  5,4     4.5
5   1,5     3     |  17  3,5     4     |  29  5,5     5
6   1,6     3.5   |  18  3,6     4.5   |  30  5,6     5.5
7   2,1     1.5   |  19  4,1     2.5   |  31  6,1     3.5
8   2,2     2     |  20  4,2     3     |  32  6,2     4
9   2,3     2.5   |  21  4,3     3.5   |  33  6,3     4.5
10  2,4     3     |  22  4,4     4     |  34  6,4     5
11  2,5     3.5   |  23  4,5     4.5   |  35  6,5     5.5
12  2,6     4     |  24  4,6     5     |  36  6,6     6
The distribution of x̄ when n = 2
[Figure: histogram of the sample mean x̄ over the 36 equally likely outcomes; x̄ takes the values 1.0, 1.5, 2.0, …, 6.0 with probabilities rising from 1/36 to 6/36 (at x̄ = 3.5) and falling back to 1/36.]
E(x̄) = 1.0(1/36) + 1.5(2/36) + … = 3.5
V(x̄) = (1.0 − 3.5)²(1/36) + (1.5 − 3.5)²(2/36) + … = 1.46
Note: $\mu_{\bar{x}} = \mu_x$ and $\sigma_{\bar{x}}^2 = \sigma_x^2 / 2$.
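The n = 2 sampling distribution can be rebuilt by enumerating all 36 ordered outcomes. The sketch confirms E(x̄) = 3.5 and V(x̄) = 35/24 ≈ 1.46. This equals half of the single-throw variance 35/12.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two throws.
means = [Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2)]
dist = {m: Fraction(c, 36) for m, c in sorted(Counter(means).items())}

E_mean = sum(m * p for m, p in dist.items())
V_mean = sum((m - E_mean) ** 2 * p for m, p in dist.items())
print(E_mean, V_mean, round(float(V_mean), 2))   # 7/2 35/24 1.46
```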
Sampling Distribution of the Mean
n = 5: $\mu_{\bar{x}} = 3.5$, $\sigma_{\bar{x}}^2 = .5833\ (= \sigma_x^2/5)$
n = 10: $\mu_{\bar{x}} = 3.5$, $\sigma_{\bar{x}}^2 = .2917\ (= \sigma_x^2/10)$
n = 25: $\mu_{\bar{x}} = 3.5$, $\sigma_{\bar{x}}^2 = .1167\ (= \sigma_x^2/25)$

Notice that $\sigma_{\bar{x}}^2$ is smaller than $\sigma_x^2$.
The larger the sample size, the
smaller $\sigma_{\bar{x}}^2$. Therefore, x̄ tends
to fall closer to μ as the sample
size increases.
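A short simulation shows the variance of x̄ shrinking like σ²/n for the die. The seed and the 10,000 repetitions per sample size are arbitrary. Simulated values will be close to, but not exactly equal to, 0.5833, 0.2917, and 0.1167.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed for reproducibility
pop_var = 35 / 12               # variance of one fair-die throw

def simulated_mean_variance(n, reps=10_000):
    """Average and variance of the sample mean across `reps` simulated samples of size n."""
    means = [statistics.fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.pvariance(means)

results = {}
for n in (5, 10, 25):
    avg, var = simulated_mean_variance(n)
    results[n] = (avg, var)
    print(n, round(avg, 3), round(var, 4), round(pop_var / n, 4))
```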
Relationships between Population Parameters and
the Sampling Distribution of the Sample Mean

The expected value of the sample mean is equal to the population mean:
$$ E(\bar{X}) = \mu_{\bar{X}} = \mu_X $$
The variance of the sample mean is equal to the population variance divided by
the sample size:
$$ V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n} $$
The standard deviation of the sample mean, known as the standard error of
the mean, is equal to the population standard deviation divided by the square
root of the sample size:
$$ \text{s.e.} = SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} $$
Law of Large Numbers
[Figure: how sample means approach the population mean (μ = 25) as the sample size grows.]
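The figure's idea can be reproduced with a running mean. The population shape is not given, so this sketch assumes a Uniform(0, 50) population, chosen only because its mean is 25. Any population with mean 25 and finite variance would behave similarly.

```python
import random

rng = random.Random(1)   # fixed seed for reproducibility
mu = 25.0                # population mean from the figure

# Assumption: a Uniform(0, 50) population, chosen only because its mean is 25.
total = 0.0
running_means = {}
for i in range(1, 100_001):
    total += rng.uniform(0, 50)
    if i in (10, 100, 1_000, 10_000, 100_000):
        running_means[i] = total / i

for n, m in running_means.items():
    print(n, round(m, 3), round(abs(m - mu), 3))
```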
Example
- what would happen in many samples?
Recall Some Features of the Sampling Distribution
• For a large enough sample size, it will approximate a normal curve even if the
population you started with does NOT look normal.
• The sampling distribution serves as a bridge between
the sample and the population.
Mean of a sample mean x̄
Standard Deviation of a sample mean x̄
Third Property: Sample Size and the
Standard Deviation
• The larger the sample size, the smaller the
standard deviation of the mean x̄.
• As n increases, the standard deviation of the
mean decreases.
Sampling distribution of a sample mean x̄
• Definition: For a random variable x and a given sample
size n, the distribution of the variable x̄, that is, the
distribution of all possible sample means, is called the
sampling distribution of the sample mean.
Sampling distribution of the sample mean
• Case 1. Population follows a Normal
distribution
– Draw an SRS of size n from the population.
– Repeat sampling.
– Population follows a Normal distribution with
mean µ and standard deviation σ.
– The sampling distribution of x̄ is then normal:
N(µ, σ/√n).
Example
(If the population follows a Normal
distribution, then so does the sample mean.)
The central limit theorem

This theorem tells us:
1. Small samples: the shape of the sampling distribution is
less normal.
2. Large samples: the shape of the sampling distribution is
more normal.
Sampling distribution of the sample mean
• Case 2. Population follows any distribution
(CLT: Central limit theorem)
– Draw an SRS of size n from any population.
– Repeat sampling.
– Population follows a distribution with mean µ
and standard deviation σ.
– When n is large (a common rule of thumb is n ≥ 30), the sampling
distribution of x̄ is approximately Normal: N(µ, σ/√n).
The Central Limit Theorem
When sampling from a population with mean μ and finite standard
deviation σ, the sampling distribution of the sample mean will
tend to be a normal distribution with mean μ and standard deviation σ/√n as
the sample size becomes large (n > 30).
For "large enough" n (here the second argument is the variance):
$$ \bar{X} \sim N\left(\mu, \sigma^2/n\right) $$
[Figure: sampling distributions of X̄ for n = 5, n = 20, and large n.]
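The theorem can be illustrated with a strongly skewed population. This sketch assumes an Exponential(rate = 1) population, which has μ = σ = 1, and uses n = 30 with 5,000 repetitions. If the normal approximation is good, about 95% of sample means should land within 1.96 standard errors of μ.

```python
import math
import random
import statistics

rng = random.Random(2024)   # fixed seed for reproducibility
mu, sigma = 1.0, 1.0        # Exponential(rate=1) population: strongly right-skewed
n, reps = 30, 5_000

means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
se = sigma / math.sqrt(n)

# If the CLT approximation N(mu, sigma/sqrt(n)) is good, about 95% of sample
# means should fall within 1.96 standard errors of mu.
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3), round(se, 3), inside)
```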
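[Figure: The Central Limit Theorem Applies to Sampling Distributions from Any Population. Columns: Normal, Uniform, Skewed, and General populations; rows: the population, and the sampling distribution of X̄ for n = 2 and n = 30, each centered at μ_X.]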
Student’s t Distribution
If the population standard deviation, σ, is unknown, replace σ with
the sample standard deviation, s. If the population is normal, the
resulting statistic
$$ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} $$
has a t distribution with (n − 1) degrees of freedom.
• The t is a family of bell-shaped and
symmetric distributions, one for each
number of degrees of freedom.
• The expected value of t is 0 (for df > 1).
• The variance of t, df/(df − 2) for df > 2, is greater than 1 but
approaches 1 as the number of degrees of
freedom increases.
• The t distribution approaches the standard
normal as the number of degrees of
freedom increases.
• When the sample size is small (n < 30), we use the
t distribution.
[Figure: standard normal density compared with t densities for df = 10 and df = 20, all centered at 0.]
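The t statistic takes only a few lines to compute. The data and the hypothesized mean μ₀ = 4 are hypothetical. `statistics.stdev` gives s with the n − 1 divisor, matching the definition above.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample from a normal population
mu_0 = 4                          # hypothesized population mean

n = len(data)
x_bar = statistics.fmean(data)
s = statistics.stdev(data)        # sample standard deviation (n - 1 divisor)
t = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1
print(round(t, 4), df)   # 1.3229 7
```

Turning this t value into a probability needs the t distribution with 7 degrees of freedom, for example from a t table. The Python standard library does not provide one.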
Sampling Distributions

Finite Population Correction Factor
If the sample size is more than 5% of the
population size and the sampling is done
without replacement, then a correction needs
to be made to the standard error of the
mean:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}} $$

Sampling Distribution of x̄
Standard Deviation of x̄:
Finite population:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} $$
Infinite population:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
• A finite population is treated as being
infinite if n/N < .05.
• √((N − n)/(N − 1)) is the finite population correction factor.
• σ_x̄ is referred to as the standard error of the
mean.
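The correction can be written as an optional factor on σ/√n. The values σ = 10, n = 100, N = 1000 are hypothetical. They give n/N = 10%, above the 5% threshold, so the correction applies.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population correction when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

sigma, n, N = 10.0, 100, 1000       # hypothetical values; n/N = 10% > 5%
print(standard_error(sigma, n), round(standard_error(sigma, n, N), 4))   # 1.0 0.9492
```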
The Sampling Distribution of the Sample
Proportion, p̂
The sample proportion is the proportion of
successes in n binomial trials. It is the
number of successes, X, divided by the
number of trials, n:
$$ \hat{p} = \frac{X}{n} $$
As the sample size, n, increases, the sampling
distribution of p̂ approaches a normal
distribution with mean p and standard
deviation
$$ \sqrt{\frac{p(1-p)}{n}} $$
[Figure: binomial probabilities P(X) for n = 2, n = 10, and n = 15 with p = 0.3; the last panel is also labeled in terms of p̂ = 0/15, 1/15, …, 15/15.]
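The exact distribution of p̂ for the three panels (n = 2, 10, 15 with p = 0.3) can be built from the binomial formula using `math.comb` (Python 3.8+). Its mean and standard deviation can then be compared with p and √(p(1 − p)/n). The helper name `phat_distribution` is hypothetical.

```python
import math

def phat_distribution(n, p):
    """Exact distribution of p_hat = X/n when X ~ Binomial(n, p)."""
    return {x / n: math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

p = 0.3
for n in (2, 10, 15):
    dist = phat_distribution(n, p)
    mean = sum(v * w for v, w in dist.items())
    sd = math.sqrt(sum((v - mean) ** 2 * w for v, w in dist.items()))
    print(n, round(mean, 4), round(sd, 4), round(math.sqrt(p * (1 - p) / n), 4))
```

The mean and standard deviation match the formulas exactly for every n. Only the shape of the distribution becomes more normal as n grows.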

26 pages
Chapter 11
No ratings yet
Chapter 11
35 pages
GARCH-M Model
No ratings yet
GARCH-M Model
11 pages
Box Plot
No ratings yet
Box Plot
1 page

What Is Statistics?: Definition of Statistics Statistics

Uploaded by

What Is Statistics?: Definition of Statistics Statistics

Uploaded by

– Probability distribution of a discrete random variable: list each possible value of the variable together with its probability.

• The correlation r measures the strength and direction of the linear relationship between two quantitative variables.

– 6. Effected by a few outliers not resistant.

Probability theory and Statistics

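The normal (or Gaussian) distribution is a very common continuous probability distribution.

The formula for the normal distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The parameter μ is called the mean or expectation of the distribution (and is also its median and mode).

The parameter σ is the standard deviation; the variance is σ².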
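The density formula translates directly into code. This is a minimal sketch using only the standard library; `normal_pdf` is a hypothetical helper name, not a library function.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardized distance from the mean
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(0.0))              # peak of N(0, 1): 1/sqrt(2*pi), about 0.3989
print(normal_pdf(12.0, 10.0, 2.0))  # one standard deviation above the mean of N(10, 2^2)
```

Because the exponent depends only on (x − μ)², the density is symmetric about μ, which is the symmetry property listed below.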

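For the standard normal density $p(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ (μ = 0, σ = 1):

• Its first derivative is $p'(x) = -x\,p(x)$.

• Its second derivative is $p''(x) = (x^2 - 1)\,p(x)$.

• More generally, its n-th derivative is

$p^{(n)}(x) = (-1)^n He_n(x)\,p(x)$,

where $He_n$ is the (probabilists') Hermite polynomial of order n.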
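The derivative formula can be checked numerically. The sketch below assumes the probabilists' Hermite polynomials, generated by the standard recurrence He₀ = 1, He₁ = x, He_{k+1}(x) = x·He_k(x) − k·He_{k−1}(x), and compares the formula against finite differences. The helper names are illustrative.

```python
import math

def hermite_he(n, x):
    """Probabilists' Hermite polynomial He_n(x) via He_{k+1} = x*He_k - k*He_{k-1}."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, x  # He_0, He_1
    for k in range(1, n):
        prev, curr = curr, x * curr - k * prev
    return curr

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def pdf_derivative(n, x):
    """n-th derivative of the standard normal density: (-1)^n He_n(x) p(x)."""
    return (-1) ** n * hermite_he(n, x) * std_normal_pdf(x)

# Compare with central finite differences at an arbitrary point
x, h = 0.7, 1e-5
fd1 = (std_normal_pdf(x + h) - std_normal_pdf(x - h)) / (2 * h)
print(fd1, pdf_derivative(1, x))      # both approximate -x p(x)
print(pdf_derivative(2, 0.0))         # (0^2 - 1) p(0) = -p(0)
```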

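- 68% of the observations fall within one standard deviation of the mean (μ ± σ);

- 95% of the observations fall within two standard deviations of the mean (μ ± 2σ);

- 99.7% of the observations fall within three standard deviations of the mean (μ ± 3σ).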
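The 68-95-99.7 percentages are rounded values of P(|Z| < k) for a standard normal Z. Since P(|Z| < k) = 2Φ(k) − 1 = erf(k/√2), a short sketch with `math.erf` reproduces them:

```python
import math

def prob_within_k_sd(k):
    """P(mu - k*sigma < X < mu + k*sigma) for normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```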

1. It is symmetric about the mean μ (the two halves are mirror images);

• Central Limit Theorem

2b • Quantile-Quantile (QQ) Plot

• Probability-Probability (P-P) Plot
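Both plots compare a sample with a reference distribution point by point. This sketch computes the plotted coordinates (not the plot itself) for a normal reference. It assumes the plotting positions (i − 0.5)/n, one common convention among several, and for the P-P plot it fits the normal using the sample mean and standard deviation. The function names and data are illustrative.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """(theoretical N(0, 1) quantile, ordered sample value) pairs for a Q-Q plot."""
    data = sorted(sample)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]

def normal_pp_points(sample):
    """(empirical probability, fitted normal CDF) pairs for a P-P plot."""
    data = sorted(sample)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    return [((i - 0.5) / n, fitted.cdf(x)) for i, x in enumerate(data, start=1)]

sample = [2.1, 3.4, 1.9, 2.8, 3.0]
for q, x in normal_qq_points(sample):
    print(f"{q:+.3f}  {x}")
```

If the sample is roughly normal, the Q-Q points lie close to a straight line and the P-P points lie close to the diagonal.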

Covariance of a bivariate distribution:

$\sigma_x^2 = E[\{x - E(x)\}^2] = E[(x - \mu_x)^2]$
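The variance above is the special case y = x of the covariance σ_xy = E[(x − μ_x)(y − μ_y)], the standard definition. A minimal sketch on paired data, offering both the population form (divide by n) and the sample form (divide by n − 1); the helper names and data are illustrative.

```python
def covariance(xs, ys, sample=False):
    """Covariance E[(x - mu_x)(y - mu_y)] of paired data.

    sample=False divides by n (population form); sample=True divides by n - 1.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

def variance(xs, sample=False):
    """Variance as the covariance of a variable with itself."""
    return covariance(xs, xs, sample)

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(covariance(xs, ys))  # 2.5
print(variance(xs))        # 1.25
```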

Problem: Find the maximum likelihood estimate for μ in a normal distribution.

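Taking the derivative of the log-likelihood with respect to μ and setting it equal to 0, we get $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$, the sample mean.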
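The full derivation, written out under the usual assumptions (x₁, …, xₙ independent draws from N(μ, σ²), with σ treated as fixed since it does not affect where the maximum in μ lies):

```latex
L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
         \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)

\ln L(\mu) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2

\frac{d}{d\mu}\ln L(\mu) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
\;\Longrightarrow\; \sum_{i=1}^{n} x_i = n\mu
\;\Longrightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}

\frac{d^2}{d\mu^2}\ln L(\mu) = -\frac{n}{\sigma^2} < 0
\quad\text{(so } \hat{\mu} \text{ is a maximum)}
```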

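Simple Random Sampling: A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance of being the sample actually selected.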
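In Python, `random.Random.sample` draws without replacement uniformly at random, so every subset of size n is equally likely, which matches the SRS definition. A minimal sketch with a hypothetical population of 100 labeled units; the optional seed only makes runs reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: every subset of n units is equally likely."""
    if not 0 <= n <= len(population):
        raise ValueError("n must be between 0 and the population size")
    return random.Random(seed).sample(list(population), n)

population = list(range(1, 101))  # hypothetical population of 100 labeled units
print(simple_random_sample(population, 5, seed=7))
```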

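Systematic sampling (Sys-S) is a statistical method involving the selection of elements from an ordered sampling frame.

This is random sampling with a system. From the sampling frame, a starting point is chosen at random, and choices thereafter are made at regular intervals.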
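A sketch of the procedure just described, assuming the interval is k = N // n (population size divided by sample size, rounded down) and a random start in [0, k). `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample of size n from an ordered sampling frame.

    Interval k = N // n; a random start in [0, k); then every k-th element.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= len(frame)")
    k = N // n
    start = random.Random(seed).randrange(k)
    # Largest index is start + (n - 1) * k <= n * k - 1 <= N - 1, so it stays in range
    return [frame[start + i * k] for i in range(n)]

frame = list(range(100))  # hypothetical ordered frame of 100 units
print(systematic_sample(frame, 10, seed=3))
```

One design caveat: when N is not a multiple of n, the last few units of the frame can never be selected under this scheme. Variants such as circular systematic sampling exist to address that, but they are not shown here.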

Consider a population of potato sacks, each of which has

Sampling with replacement:

Sampling without replacement:

If I sample two without replacement, then I first pick one (say
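The difference between the two schemes can be sketched with the standard library: `random.choices` samples with replacement and `random.sample` without. Counting ordered samples of size n from N units gives Nⁿ with replacement and N!/(N − n)! without, because the second pick comes only from the units that remain. The unit labels are hypothetical and are not the potato-sack data from the example.

```python
import math
import random

def count_ordered_samples(N, n, replace):
    """Number of ordered samples of size n from N units."""
    return N ** n if replace else math.perm(N, n)

rng = random.Random(42)
units = ["A", "B", "C", "D", "E"]  # hypothetical labels for five population units
print(rng.choices(units, k=2))  # with replacement: the same unit may appear twice
print(rng.sample(units, k=2))   # without replacement: two distinct units
print(count_ordered_samples(5, 2, replace=True))   # 25 = 5 * 5
print(count_ordered_samples(5, 2, replace=False))  # 20 = 5 * 4
```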

• The sampling distribution of a statistic (not

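• An estimator of a population parameter is a sample statistic used to estimate that parameter.

• The sample mean, $\bar{X}$, is the most common estimator of the population mean, μ.

• The sampling distribution of $\bar{X}$ is the probability distribution of all possible values of $\bar{X}$ for samples of a given size n.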

• Suppose we want to estimate μ

[Figure: three panels, each with horizontal axis "Sample Mean"]

Notice that $\sigma_{\bar{x}}^2 = \sigma_x^2/n$ is smaller than $\sigma_x^2$.
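A quick Monte Carlo check of the relation Var(X̄) = σ²/n. The sketch assumes a Uniform(0, 1) population (σ² = 1/12), chosen only because its variance is known exactly; the sample sizes and repetition count are arbitrary.

```python
import random
import statistics

def sample_mean_variance(n, reps=20_000, seed=0):
    """Monte Carlo estimate of Var(X-bar) for samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.pvariance(means)

pop_var = 1 / 12  # variance of Uniform(0, 1)
for n in (1, 5, 25):
    # simulated Var(X-bar) next to the theoretical value sigma^2 / n
    print(n, round(sample_mean_variance(n), 5), round(pop_var / n, 5))
```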

• It will approximate a normal curve even if the population itself is not normally distributed, provided the sample size is large enough.

• Sampling distribution serves as a bridge between

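• The larger the sample size, the smaller the spread of the sampling distribution.

• As n increases, the standard deviation of the sampling distribution of $\bar{X}$ (the standard error, $\sigma/\sqrt{n}$) decreases.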

• Definition: For a random variable x and a given sample

• Case 1. The population follows a normal distribution

This theorem tells us:

• Case 2. The population follows any distribution with mean μ and finite standard deviation σ: the sampling distribution of the sample mean will approach a normal distribution as the sample size becomes large.

[Figure: sampling distribution of the sample mean, n = 20]

For “large enough” n: $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$, where the second parameter is the standard deviation.
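A small simulation illustrating Case 2. The sketch assumes an Exponential(rate = 1) population, which is strongly right-skewed with μ = 1 and σ = 1, and a sample size of n = 30 picked for illustration. If the normal approximation holds, the sample means should center on μ, have spread close to σ/√n, and about 95% of them should fall within μ ± 1.96·σ/√n.

```python
import math
import random
import statistics

def clt_demo(n=30, reps=10_000, seed=1):
    """Means of `reps` samples of size n from an Exponential(rate=1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 30
mu, sigma = 1.0, 1.0          # mean and standard deviation of Exponential(1)
se = sigma / math.sqrt(n)     # theoretical standard deviation of X-bar
means = clt_demo(n)
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.pstdev(means), 3), "vs", round(se, 3))
share = sum(abs(m - mu) < 1.96 * se for m in means) / len(means)
print("share within mu ± 1.96 se:", round(share, 3))
```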

Finite Population Correction Factor

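If the sample size n is more than 5% of the population size N, multiply the standard error by the finite population correction factor $\sqrt{(N - n)/(N - 1)}$.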
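A minimal sketch of the standard error of the mean with and without the finite population correction. The 5% threshold follows the rule of thumb above; the function names and the numbers in the example (σ = 10, n = 100, N = 1000) are illustrative.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean, sigma / sqrt(n).

    If a finite population size N is given, the result is multiplied by the
    finite population correction factor sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        if N < 2 or not 1 <= n <= N:
            raise ValueError("need N >= 2 and 1 <= n <= N")
        se *= math.sqrt((N - n) / (N - 1))
    return se

def needs_fpc(n, N, threshold=0.05):
    """Rule of thumb: apply the correction when n is more than 5% of N."""
    return n / N > threshold

print(needs_fpc(100, 1000))                # True: the sample is 10% of the population
print(standard_error_mean(10, 100))        # 1.0 without correction
print(standard_error_mean(10, 100, 1000))  # about 0.949 with correction
```

When n = N the factor is 0: a census has no sampling error in the mean.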

The sample proportion is the percentage of successes in n binomial trials. It is the number of successes, X, divided by the number of trials, n:

Sample proportion: $\hat{p} = X/n$

[Figure: sampling distribution of the sample proportion, p = 0.2]

As the sample size, n, increases, the sampling distribution of $\hat{p}$ approaches a normal distribution with mean p and standard deviation $\sqrt{p(1-p)/n}$.
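A simulation sketch of the sampling distribution of p̂ = X/n, where X counts successes in n independent Bernoulli(p) trials. The values p = 0.2 and n = 100 are illustrative. The simulated mean of p̂ should be close to p, and its standard deviation close to √(p(1 − p)/n).

```python
import math
import random
import statistics

def proportion_se(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def simulate_p_hat(p, n, reps=5_000, seed=0):
    """Simulate p-hat = X / n, with X the number of successes in n Bernoulli(p) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.2, 100  # illustrative values
p_hats = simulate_p_hat(p, n)
print(round(statistics.fmean(p_hats), 3), round(statistics.pstdev(p_hats), 3))
print(round(proportion_se(p, n), 3))  # 0.04
```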
