Tian Statistics Lesson 5 and 6 Probability Distributions
and Statistics
Topic 1: Introducing Distributions and Statistical Tests -
Concepts of distribution, frequency, quintiles, outliers (cont.)
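/* Bird measurements: name (13 columns), type (S or R), length, wingspan. */
/* The trailing @@ reads several birds from each data line.               */
Data wings;
Input name $13. type $ Length Wingspan @@;
Datalines;
Robin        S 28 41 Bald Eagle   R 102 244 Barn Owl     R 50 110
Osprey       R 66 180 Cardinal     S 23 31 Goldfinch    S 11 19
Golden Eagle R 100 234 Crow         S 53 100 Magpie       S 60 60
Elf Owl      R 15 27 Condor       R 140 300 Rt Robin     S 24 70
;
Run;
/* Descriptive labels for the one-letter bird type codes. */
Proc format;
Value $birdtype
'S'='Songbirds'
'R'='Raptors';
Run;
/* Scatter plot of length against wingspan. */
Proc sgplot data=wings;
Scatter x=wingspan y=length;
Title 'Comparison of Wingspan vs. Length';
Run;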
(1) A basic linear relationship between X and Y for most of the data, and
(2) A single outlier (at X=375). Outlier detection is important for effective modeling.
Independent and dependent variables
In general, the independent variable is set or manipulated by the
researcher, and its effect on the dependent variable is
measured.
particular value.
Example: When you flip a coin two times, there are 4 possible outcomes: HH, HT,
TH and TT. What is the probability distribution for the number of Tails?
x      0     1     2
P(x)   .25   .5    .25

P(x) = .25 if x = 0 or 2
P(x) = .5  if x = 1
Q: What is the probability that the two coin flips result in one or fewer tails?
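As a quick check, the distribution of the number of tails and the question above can be worked out by listing the four outcomes. The sketch below is only an illustration in Python (it is not part of the SAS material in this lesson); the names outcomes, pmf and at_most_one are chosen here for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 4 equally likely outcomes of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Tally how many outcomes give 0, 1 or 2 tails.
tail_counts = Counter(outcome.count("T") for outcome in outcomes)

# Turn the tallies into exact probabilities.
pmf = {x: Fraction(count, len(outcomes)) for x, count in sorted(tail_counts.items())}
for x, p in pmf.items():
    print(f"P({x}) = {float(p)}")

# "One or fewer tails" means x = 0 or x = 1.
at_most_one = pmf[0] + pmf[1]
print(f"P(x <= 1) = {float(at_most_one)}")
```

Fractions keep the arithmetic exact, so the three probabilities add to exactly 1.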
Topic 2: Different kinds of distribution, shape and attributes
There are two main types of random variables: discrete and continuous.
Probability Distributions
For a probability distribution, an important aspect of the distribution of a data set is its shape.
Technically, a distribution is bimodal or multimodal only if the peaks are the same
Symmetry – the distribution can be divided into two pieces that are mirror images of
etc. Bimodal and multimodal distributions are sometimes symmetric, but that are
sections.
If a random variable X can take values xi, then the following must be
true:
1. 0<=p(xi)<=1 for all xi
2. Sum(p(xi)) = 1, summed over all xi
x -4 0 1 2
P(x) .2 .3 .4 .1
2. Determine which of the following are not valid probability distributions, and why
A: x 1 2 3 4
p(x) .2 .2 .3 .4
B: x 0 2 4 5
p(x) -.1 .2 .3 .4
C: x -2 -1 1 2
p(x) .1 .1 .1 .7
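The two conditions above can be checked mechanically. The following is a minimal Python sketch, not a prescribed method; the helper name check_distribution is made up for this illustration, and it simply tests each p(x) against [0, 1] and the total against 1 (with a small tolerance for floating-point rounding).

```python
import math

def check_distribution(probs):
    """Return a list of reasons the probabilities fail; empty if valid."""
    problems = []
    if any(p < 0 or p > 1 for p in probs):
        problems.append("some p(x) lies outside [0, 1]")
    if not math.isclose(sum(probs), 1.0):
        problems.append(f"probabilities sum to {sum(probs):.2f}, not 1")
    return problems

# The three candidate tables A, B and C from the exercise.
tables = {
    "A": [.2, .2, .3, .4],
    "B": [-.1, .2, .3, .4],
    "C": [.1, .1, .1, .7],
}
for name, probs in tables.items():
    problems = check_distribution(probs)
    print(name, "valid" if not problems else "; ".join(problems))
```

A table can fail for more than one reason, which is why the helper collects every violation instead of stopping at the first.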
μx = ∑x * P(x)
member from a finite population. Then the mean of the random variable
value of those observations will be approximately equal to μx. The larger the
For all xi
μ = sum(xi)/N = sum(xi)*(1/N)
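To make the formula μx = ∑x * P(x) concrete, here is a small Python sketch. It reuses the number-of-tails distribution from the coin-flip example; the list named values is a hypothetical population invented for the illustration, used only to show that equal weights 1/N give the ordinary average.

```python
def expected_value(pmf):
    """Mean of a discrete random variable: the sum of x * P(x)."""
    return sum(x * p for x, p in pmf.items())

# Number of tails in two fair coin flips.
tails = {0: 0.25, 1: 0.5, 2: 0.25}
print(expected_value(tails))  # 1.0

# Hypothetical finite population (made up for this example).
values = [2, 4, 6, 8]
equal_weights = {x: 1 / len(values) for x in values}

# With every P(x) = 1/N the weighted sum is the plain average.
print(expected_value(equal_weights), sum(values) / len(values))
```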
Example: A fair six-sided die is tossed. You win $2 if the result is a “1”, you win
The interpretation is that if you play many times, the average outcome is losing 17
cents per play. Thus, over time you should expect to lose money.
Laws of expected value
1 E(c) = c, where c is a constant
2 E(cX) = cE(X)
3 E(X+Y) = E(X) + E(Y)
  E(X-Y) = E(X) - E(Y)
4 E(XY) = E(X)E(Y), if X and Y are independent
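These laws can be verified exactly on a small case. The sketch below assumes two independent fair dice X and Y (an example chosen here, not taken from the slides) and computes each expectation over all 36 equally likely pairs; the constant c = 3 is arbitrary.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
PAIR_PROB = Fraction(1, 36)  # two independent fair dice: 36 equally likely pairs

def expect(g):
    """E[g(X, Y)] computed over the joint distribution of the two dice."""
    return sum(g(x, y) * PAIR_PROB for x, y in product(FACES, FACES))

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
c = 3  # any constant works here

checks = {
    "E(c) = c": expect(lambda x, y: c) == c,
    "E(cX) = cE(X)": expect(lambda x, y: c * x) == c * e_x,
    "E(X+Y) = E(X)+E(Y)": expect(lambda x, y: x + y) == e_x + e_y,
    "E(X-Y) = E(X)-E(Y)": expect(lambda x, y: x - y) == e_x - e_y,
    "E(XY) = E(X)E(Y)": expect(lambda x, y: x * y) == e_x * e_y,
}
for law, holds in checks.items():
    print(law, holds)
```

The product law holds here only because the dice are independent; for dependent variables it can fail.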
Variance is defined as σ^2 = Var(X) = E((X-μ)^2)
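A short Python sketch of this definition, applied to the number of tails in two fair coin flips. The function name variance is chosen for the illustration; it first computes μ and then averages the squared deviations weighted by P(x).

```python
import math

def variance(pmf):
    """Var(X) = E((X - mu)^2) for a discrete distribution {x: P(x)}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Number of tails in two fair coin flips.
tails = {0: 0.25, 1: 0.5, 2: 0.25}
var = variance(tails)
sd = math.sqrt(var)  # the standard deviation is the square root of the variance
print(var, round(sd, 4))
```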
Example: Going back to the previous example for expectation involving the
dice game, we would calculate the standard deviation for this discrete
Laws of variance
that counts how often a particular event occurs in a fixed number of tries or
for which there are only two possible outcomes such as a coin flip. If one
n = number of trials
X = number of successes
Example: three coin flips (tree diagram: each flip branches into H or T)
P(0) = (3!/(0!*3!)) * p^0 * (1-p)^3 = (1-p)^3
P(1) = (3!/(1!*2!)) * p * (1-p)^2 = 3*p*(1-p)^2
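The pattern in these two lines generalizes to any number of successes x out of n trials. The Python sketch below is a minimal illustration; the function name binomial_pmf is chosen here, and p = 0.5 (a fair coin) is an assumption made only to get concrete numbers.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = n!/(x!(n-x)!) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.5  # three trials; p = 0.5 is assumed for illustration

for x in range(n + 1):
    print(f"P({x}) = {binomial_pmf(x, n, p)}")

# The closed forms for x = 0 and x = 1 agree with the general formula.
assert binomial_pmf(0, n, p) == (1 - p) ** 3
assert binomial_pmf(1, n, p) == 3 * p * (1 - p) ** 2
```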
flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers
and produce five offspring. Find the probability that there will be no red flowered
The number of red flowered plants has a binomial distribution with n = 5, p = .25
P(X=0) = (5!/(0!(5-0)!)) × .25^0 × (1-.25)^5 = 1 × .75^5 = .237
There is a 23.7% chance that none of the five plants will be red flowered.
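One way to sanity-check a result like this is to simulate the experiment many times. The Python sketch below is an illustration under the stated model (five independent offspring, each red with probability .25); the function name, the number of trials and the seed are arbitrary choices made here.

```python
import random

def simulate_no_red(trials, n=5, p=0.25, seed=0):
    """Estimate P(no red-flowered plants among n offspring) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    none_red = 0
    for _ in range(trials):
        # Each offspring is red with probability p, independently of the others.
        reds = sum(rng.random() < p for _ in range(n))
        if reds == 0:
            none_red += 1
    return none_red / trials

exact = 0.75 ** 5
estimate = simulate_no_red(100_000)
print(round(exact, 4), round(estimate, 3))
```

The estimate should land close to the exact value and gets closer as the number of trials grows.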
n = number of trials
Example: A roulette wheel has 38 slots, 18 are red, 18 are black, and 2 are green.
You play five games and always bet on red. How many times are you expected to win?
μ = np = 5(18/38) = 2.3684
σ = sqrt(np(1-p)) = sqrt(5(18/38)(20/38)) = 1.1165
Out of 5 games, you can expect to win about 2.37 times on average (with a standard deviation of 1.1165).
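The mean and standard deviation quoted above follow from the binomial formulas μ = np and σ = sqrt(np(1-p)). A minimal Python check of the arithmetic:

```python
import math

n = 5        # games played
p = 18 / 38  # probability that the ball lands on red

mean = n * p
sd = math.sqrt(n * p * (1 - p))
print(round(mean, 4), round(sd, 4))  # 2.3684 1.1165
```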
distribution.
When dealing with a continuous random variable X, we attempt to find a function f(x) such that
P(a<X<b) = ∫_a^b f(x) dx
1 f(x) is nonnegative
Example: A survey finds the following probability distribution for the age of
a rented car.
What is the probability that a rented car is between 0 and 4 years old?
Solution: The histogram of this distribution is shown on the left of the figure below.
The curve on the right is the graph of some function f, which we call a probability
density function. The domain of f is [0, +∞) . The probability that a rented car is
Practice:
f(x)=-.5x+1 0<=x<=2
(-.5*(x^2/2)+x)
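The second expression in this practice item is an antiderivative of f, so probabilities are differences of its values. The Python sketch below is one way to check that f is a valid density on [0, 2]; the names F, prob and midpoint_integral are chosen here, and the interval (0, 1) is just a sample query.

```python
def f(x):
    """Density from the practice problem: -.5x + 1 on [0, 2], 0 elsewhere."""
    return -0.5 * x + 1 if 0 <= x <= 2 else 0.0

def F(x):
    """Antiderivative of f on [0, 2]: -.5*(x^2/2) + x."""
    return -0.5 * (x**2 / 2) + x

def prob(a, b):
    """P(a < X < b) for 0 <= a <= b <= 2, as a difference of antiderivatives."""
    return F(b) - F(a)

def midpoint_integral(g, a, b, steps=10_000):
    """Numerical integral of g over [a, b] by the midpoint rule."""
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width

print(prob(0, 2))  # 1.0  (total area under the density)
print(prob(0, 1))  # 0.75
print(round(midpoint_integral(f, 0, 2), 6))
```

The numerical integral is an independent check that the total area is 1, without relying on the antiderivative.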
Uniform distribution:
f(x)=1/(b-a) a<=x<=b
(Figure: rectangle of height 1/(b-a) over the interval from a to b; total area A = 1)
E(X) = (1/2)(a+b)    V(X) = (b-a)^2/12
Example:
P(120<=x<=150)=(150-120)(1/80)=.375
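The calculation above is width times height for a rectangle. In the Python sketch below the endpoints a = 100 and b = 180 are an assumption made for illustration: the surviving text only implies that b - a = 80, so any interval of width 80 containing 120 and 150 gives the same probability. The function name uniform_prob is also chosen here.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b]; the query is clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

# ASSUMED endpoints: only the width b - a = 80 comes from the example.
a, b = 100, 180

print(uniform_prob(120, 150, a, b))  # 0.375

# Mean and variance from the formulas for the uniform distribution
# (these two values do depend on the assumed endpoints).
mean = (a + b) / 2
var = (b - a) ** 2 / 12
print(mean, round(var, 2))
```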
μ and σ.
50% of values less than the mean and 50% greater than the mean
following formula:
z = (x - μ)/σ
Example:
Solution: The graph on the previous page shows that more than half of the curve
is shaded in. Standardizing,
P(x<73) = P((x-65)/5 < (73-65)/5) = P(Z < 1.6), where Z = (x-65)/5 ~ N(0,1).
Next, we can use the z table to determine the proportion of the curve that is less
than a z score of 1.6 by looking up 1.6. The cumulative probability for z = 1.6 is
0.9452, so P(x<73) = .9452. There is a 94.52% chance of randomly selecting a vehicle that is going 73
mph or less.
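The table lookup can be reproduced in software. The Python sketch below uses the standard library's statistics.NormalDist, with the mean 65 and standard deviation 5 read off the standardization in the solution above; it is an illustration, not a replacement for the z table method.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)  # parameters taken from the solution above

# Standardize 73 mph, then look up the standard normal cumulative probability.
z = (73 - speeds.mean) / speeds.stdev
print(z)                              # 1.6
print(round(NormalDist().cdf(z), 4))  # 0.9452

# The same probability without standardizing by hand.
print(round(speeds.cdf(73), 4))       # 0.9452
```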
Practice:
Determine the following probabilities:
A. P(Z >= 1.47)
B. P(0 <= Z <= 1.85)
C. P(1.65 <= Z <= 2.36)
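Upper-tail and interval probabilities all reduce to cumulative probabilities: a tail is 1 minus the cumulative value, and an interval is the difference of two cumulative values. The Python sketch below shows the pattern with statistics.NormalDist; it treats the variable in part C as standard normal, which is an assumption about the intended question.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# A: upper tail = 1 - cumulative probability.
upper_tail = 1 - Z.cdf(1.47)

# B and C: interval = difference of two cumulative probabilities.
from_zero = Z.cdf(1.85) - Z.cdf(0)
between = Z.cdf(2.36) - Z.cdf(1.65)

for label, value in [("A", upper_tail), ("B", from_zero), ("C", between)]:
    print(label, round(value, 4))
```

Results may differ from hand calculations in the last digit because the z table rounds each cumulative probability to four places.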
exactly 2 fours?
Exercise 2:
The data set of N=90 ordered observations as shown below is examined for outliers:
Exercise 3:
deviation of 15. What IQ score separates the bottom 30% from the
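Questions of this kind run the normal calculation in reverse: given a cumulative probability, find the score. The Python sketch below uses NormalDist.inv_cdf from the standard library. The mean of 100 is an assumption (the conventional IQ mean), since the start of the exercise did not survive; only the standard deviation of 15 and the 30% cutoff come from the text.

```python
from statistics import NormalDist

# ASSUMED mean of 100; the standard deviation of 15 is from the exercise.
iq = NormalDist(mu=100, sigma=15)

# Score below which 30% of the distribution lies.
cutoff = iq.inv_cdf(0.30)
print(round(cutoff, 1))

# Check: the cumulative probability at the cutoff is 0.30 again.
print(round(iq.cdf(cutoff), 2))
```

By hand, the same answer comes from finding the z score with cumulative probability .30 in the z table and converting back with x = μ + zσ.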