
Eco 2

1) Measures of central tendency like the mean, median and mode describe the center of a data set. The mean is best for numerical data while the median and mode are better for categorical data. 2) Percentiles and quartiles indicate the position of a value relative to the entire data set. Quartiles divide a data set into four equal parts. 3) Measures of variability like the range, variance and standard deviation describe how spread out the values are in a data set. 4) The normal distribution is a continuous, bell-shaped distribution important in statistics. It is characterized by the mean and standard deviation.

Basic Statistics

Measures of Central Tendency


• They show the “center” of a distribution of data values.
– Measures include the mean, median, and mode.
• The best measure depends on the data type.
• Categorical data are best described by the mode (or by the median when the categories are ordered), not the mean.
• Numerical data are usually best described by the mean.
• The mode is used less frequently as it may not represent the true center of numerical data.
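As a rough illustration of the three measures, the sketch below uses Python's standard `statistics` module. The income figures and sector labels are hypothetical values chosen only for the example; they are not from the slides.

```python
import statistics

# Hypothetical numerical data (e.g., incomes in thousands); 120 is an outlier.
incomes = [20, 22, 22, 25, 27, 30, 120]

mean_income = statistics.mean(incomes)      # arithmetic average, pulled up by the outlier
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

# Hypothetical categorical (unordered) data: only the mode is meaningful here.
sectors = ["agriculture", "services", "services", "industry", "services"]
mode_sector = statistics.mode(sectors)

print(mean_income, median_income, mode_income, mode_sector)
```

In this made-up sample the single large value moves the mean well above the median, which is one reason the choice of measure depends on the data.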
Measures of Relative Position
– Percentiles and Quartiles
• Percentiles and quartiles are measures that indicate the location, or position, of a value relative to the entire set of data.
• The Pth percentile is a value such that approximately P% of the observations are at or below that number. The 50th percentile is the median.
• Quartiles are descriptive measures that separate large data sets into four quarters.
– The first quartile, Q1 (25th percentile)
– The second quartile, Q2 (50th percentile), which is the median
– The third quartile, Q3 (75th percentile)
– The fourth quartile, Q4 (100th percentile), which is the maximum value
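A minimal sketch of the quartile definitions above, assuming Python's standard `statistics.quantiles` with the "inclusive" interpolation method. The data and the helper name `percent_at_or_below` are hypothetical, and other software may use a different interpolation rule and return slightly different Q1 and Q3 values.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations

# With n=4, quantiles() returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

def percent_at_or_below(values, x):
    """Share of observations (in %) that are at or below x."""
    return 100 * sum(v <= x for v in values) / len(values)

print(q1, q2, q3)
print(percent_at_or_below(data, q2))  # roughly 50% for the median
```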
Measures of Variability
• Range: The difference between the largest and smallest
observations.
• Variance: Represents the average squared
deviation (distance) from the mean.

• Standard deviation: The square root of the variance. It gives an indication of how closely or widely the individual values are spread around their mean value.
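The three measures of variability can be sketched as follows. The data are hypothetical, and the example assumes the "average squared deviation" reading of variance (dividing by n); the sample version that divides by n − 1 is shown alongside for comparison.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

# Range: largest minus smallest observation.
data_range = max(data) - min(data)

mean = statistics.mean(data)

# Population variance: average squared deviation from the mean.
pop_variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance.
pop_sd = pop_variance ** 0.5

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(data_range, pop_variance, pop_sd, sample_variance)
```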
Measures of Relationships Between Variables
• Covariance
– A measure of the linear relationship between two variables.
– A positive value indicates a direct (increasing) linear relationship.
– A negative value indicates a decreasing linear relationship.
cov(X, Y) = Σ (Xi − μx)(Yi − μy) / (n − 1), where μx and μy are the sample means of X and Y
– The value of the covariance changes if we change the unit of measurement of a variable (from meters to centimeters, etc.).
– Covariance does not provide a measure of the strength of the relationship between two variables.
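To make the unit dependence concrete, here is a small sketch of the sample covariance (dividing by n − 1). The height and weight figures and the function names are hypothetical, chosen only for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]      # hypothetical heights in meters
weights_kg = [50, 58, 65, 72, 80]          # hypothetical weights in kilograms
heights_cm = [h * 100 for h in heights_m]  # same heights in centimeters

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)

# Same relationship, but the covariance is 100 times larger in centimeters.
print(cov_m, cov_cm)
```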
• Correlation Coefficient
– It is computed by dividing the covariance of the two variables by the product of the standard deviations of the two variables.
r = cov(x, y) / (sd_x × sd_y)
– It lies between −1 and +1, with −1 indicating perfect negative linear association and +1 indicating perfect positive linear association.
– It is generally a more useful measure because it provides both the direction and the strength of a relationship.
– The covariance and the corresponding correlation coefficient have the same sign (both are positive or both are negative).
• When r = 0, there is no linear relationship between x and y, but this does not rule out a non-linear relationship (say, y = x²).
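The properties of r listed above can be checked with a short sketch. The function names and data are hypothetical; the example uses sample standard deviations (n − 1), which cancel in the ratio.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """r = cov(x, y) / (sd_x * sd_y)."""
    sd_x = covariance(xs, xs) ** 0.5
    sd_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sd_x * sd_y)

x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]  # perfect positive linear relationship
y_square = [v ** 2 for v in x]     # exact but non-linear relationship

r_linear = correlation(x, y_linear)
r_square = correlation(x, y_square)
r_rescaled = correlation([100 * v for v in x], y_linear)  # change of units

print(r_linear, r_square, r_rescaled)
```

Here r is essentially 1 for the linear case, 0 for y = x² even though y is fully determined by x, and unchanged when x is rescaled.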
Normal Distribution
• The normal distribution (ND) is a bell-shaped continuous distribution widely used in statistical inference.
• It is the most important probability distribution in statistics and an important tool in the analysis of economic data.
• It is characterized by two parameters: the mean (μ) and the standard deviation (σ). These represent location and spread.
• It is symmetric around the mean: the two halves of the curve are mirror images.
• The notation when Y is normally distributed with mean μ and standard deviation σ is:

Y ~ N(μ, σ)
• Random variables that are approximately normal have the following properties:
– Approximately half (50%) fall above the mean and half fall below it
– Approximately 68% fall within 1 standard deviation of the mean
– Approximately 95% fall within 2 standard deviations of the mean
– Approximately 99.7% (virtually all) fall within 3 standard deviations of the mean
Normal Distribution

P(Y ≥ μ) = 0.50
P(μ − σ ≤ Y ≤ μ + σ) ≈ 0.68
P(μ − 2σ ≤ Y ≤ μ + 2σ) ≈ 0.95
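These probabilities can be reproduced numerically. The sketch below assumes Python's standard `statistics.NormalDist`; the mean of 50 and standard deviation of 10 are hypothetical, and any other values give the same probabilities.

```python
from statistics import NormalDist

# Hypothetical parameters; any mean and positive standard deviation work.
mu, sigma = 50.0, 10.0
dist = NormalDist(mu, sigma)

def prob_within(k):
    """P(mu - k*sigma <= Y <= mu + k*sigma) for the distribution above."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

above_mean = 1 - dist.cdf(mu)
within_1, within_2, within_3 = prob_within(1), prob_within(2), prob_within(3)

print(above_mean, within_1, within_2, within_3)
```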


Why are normal distributions so important?

• Many variables are commonly assumed to be normally distributed in the population.
• If a variable is approximately normally distributed, we can make inferences about values of that variable.
Standard Normal (Z) Distribution
• Problem of the ND: There is an unlimited number of possible normal distributions (−∞ < μ < ∞, σ > 0).
• Solution: Standardize the random variable to have mean 0 and standard deviation 1.

Y ~ N(μ, σ)  ⇒  Z = (Y − μ)/σ ~ N(0, 1)

• Z indicates how many standard deviations away
from the mean the point Y lies.
• Because all z-score distributions have the same mean
and standard deviation, individual scores from
different distributions can be directly compared.
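As an illustration of comparing scores from different distributions, the sketch below standardizes two hypothetical exam results. The scores, means, and standard deviations are invented for the example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical results from two differently scaled exams.
z_econ = z_score(70, mean=60, sd=5)
z_stats = z_score(80, mean=75, sd=10)

# The raw score is higher in statistics, but the economics score is
# further above its own mean in standard-deviation units.
print(z_econ, z_stats)
```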
Standard Normal (Z) Distribution
• Standard Normal Distribution Characteristics:
– P(Z ≥ 0) = P(X ≥ μ) = 0.5000
– P(−1 ≤ Z ≤ 1) = P(μ − σ ≤ X ≤ μ + σ) = 0.6826
– P(−2 ≤ Z ≤ 2) = P(μ − 2σ ≤ X ≤ μ + 2σ) = 0.9544
– P(Z ≥ zα) = P(Z ≤ −zα) = α (using the Z-table)
Finding Probabilities of Specific Ranges
• Step 1 - Identify the normal distribution of interest (its mean (μ) and standard deviation (σ)).
• Step 2 - Identify the range of values (XL, XU) whose probability you wish to determine, where often the upper or lower bound is ∞ or −∞.
• Step 3 - Transform XL and XU into Z-values:
ZL = (XL − μ)/σ,  ZU = (XU − μ)/σ
• Step 4 - Obtain P(ZL ≤ Z ≤ ZU) from the Z-table.
Table: P(0 < Z < z)

z     .00    .01    .02    .03    .04    .05    .06
0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1404
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123
…     …      …      …      …      …      …      …
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770
• Example:
X is a normal random variable with μ=120,
and σ=15. Find the probability P(X≤135)

Z=(135-120)/15=1
P(Z≤1) = 0.5 + 0.3413 = 0.8413
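The four-step procedure and the worked example above can be sketched in code. This is only an illustration: it replaces the Z-table lookup with the standard normal CDF from Python's `statistics.NormalDist`, and the function name `prob_between` is hypothetical.

```python
import math
from statistics import NormalDist

def prob_between(mu, sigma, lower=-math.inf, upper=math.inf):
    """P(lower <= X <= upper) for X ~ N(mu, sigma), via standardization."""
    std_normal = NormalDist(0, 1)

    def cdf(z):
        # Handle infinite bounds explicitly.
        if z == math.inf:
            return 1.0
        if z == -math.inf:
            return 0.0
        return std_normal.cdf(z)

    # Step 3: transform the bounds into Z-values.
    z_lower = (lower - mu) / sigma
    z_upper = (upper - mu) / sigma
    # Step 4: probability between the two Z-values.
    return cdf(z_upper) - cdf(z_lower)

# Steps 1 and 2 for the example: mu = 120, sigma = 15, range (-inf, 135].
p = prob_between(120, 15, upper=135)
print(round(p, 4))
```

The result agrees with the table-based answer of 0.8413 to four decimal places.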
Examples
• P(0<Z<1) = 0.3413
• P(1<Z<2)=P(0<Z<2)–P(0<Z<1) =0.4772–0.3413 =0.1359
• P(Z≥1) = 0.5–P(0<Z<1) = 0.5–0.3413 =0.1587
• P(Z ≥ -1) = 0.3413+0.50 = 0.8413
• P(-2<Z<1) = 0.4772+0.3413 = 0.8185
• P(Z ≤ 1.87) =0.5+P(0<Z ≤ 1.87)=0.5+0.4693=0.9693
• P(Z<-1.87)= P(Z>1.87)= 0.5–0.4693= 0.0307
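The table manipulations in these examples can be checked with a short sketch. It assumes Python's `statistics.NormalDist` in place of the printed table; `table_area` is a hypothetical helper that returns the tabulated quantity P(0 < Z < z). Small differences in the fourth decimal place come from rounding in the table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(z):
    """P(0 < Z < z) for z >= 0, the quantity tabulated in the Z-table."""
    return Z.cdf(z) - 0.5

p_1_to_2 = table_area(2) - table_area(1)       # P(1 < Z < 2)
p_ge_1 = 0.5 - table_area(1)                   # P(Z >= 1)
p_ge_minus_1 = table_area(1) + 0.5             # P(Z >= -1), by symmetry
p_minus2_to_1 = table_area(2) + table_area(1)  # P(-2 < Z < 1)
p_lt_minus_187 = 0.5 - table_area(1.87)        # P(Z < -1.87) = P(Z > 1.87)

print(p_1_to_2, p_ge_1, p_ge_minus_1, p_minus2_to_1, p_lt_minus_187)
```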
Chi Square distributions
• The Chi Square distribution is the distribution of the sum of squared independent standard normal variables.
• The degrees of freedom of the distribution are equal to the number of standard normal observations being summed.
• Chi Square with one degree of freedom, written as χ²(1), is simply the distribution of a single standard normal observation squared.
• The area of a Chi Square distribution with one degree of freedom below 4 is the same as the area of a standard normal distribution between the z values −2 and 2 (about 0.9544), since 4 is 2².
• Chi Square distributions are positively
skewed, with the degree of skew decreasing
with increasing degrees of freedom.
• As the degrees of freedom increases, the
Chi Square distribution approaches a
normal distribution.
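Two of the points above can be illustrated numerically. The sketch assumes Python's `statistics.NormalDist` and uses the standard result that a Chi Square distribution with k degrees of freedom has skewness √(8/k); the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def chi2_one_df_cdf(x):
    """P(chi-square(1) < x) = P(-sqrt(x) < Z < sqrt(x)) for x >= 0."""
    root = x ** 0.5
    return Z.cdf(root) - Z.cdf(-root)

def chi2_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return (8 / df) ** 0.5

area_below_4 = chi2_one_df_cdf(4)
skews = [chi2_skewness(df) for df in (1, 4, 16, 64)]

print(area_below_4)
print(skews)  # positive, and shrinking as the degrees of freedom grow
```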
Skewness and Kurtosis
Skewness
It is a measure of the asymmetry of a distribution.
Sk > 0: positively skewed [right skewness]
– Typically mode < median < mean
Sk < 0: negatively skewed [left skewness]
– Typically mode > median > mean
Sk = 0: symmetric distribution (for example, the normal distribution)
– Mode = median = mean
Kurtosis
• It measures the peakedness of a distribution.
• Kurtosis provides a measure of the weight in
the tails of a probability density function.
• For a normal distribution, the kurtosis is 3.
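A minimal sketch of how skewness and kurtosis can be computed from data, assuming the moment-based definitions (third and fourth central moments scaled by powers of the variance). The slides do not state which formula they use, so this choice, the function names, and the sample data are assumptions.

```python
def central_moment(values, k):
    """k-th central moment: average of (value - mean) ** k."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    """Third central moment divided by variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth central moment divided by variance ** 2 (normal: 3)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]               # hypothetical symmetric sample
right_skewed = [1, 1, 2, 2, 3, 10]        # long right tail
left_skewed = [-v for v in right_skewed]  # mirror image: long left tail

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
print(kurtosis(symmetric))
```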
