
Statistics class notes

The document covers various statistical concepts including measures of central tendency, probability rules, random variables, and probability distributions. It discusses the Empirical Rule, outlier detection methods, and the binomial probability distribution, providing examples and calculations throughout. Additionally, it highlights the use of Excel for statistical analysis and methods for assessing the normality of data.

Uploaded by cardiopiloti
Copyright © All Rights Reserved

Statistics class 2

Example: My drive time from home to USF has a mean of 30 minutes and a standard deviation of 2 minutes.

a. If nothing is known about the shape of the distribution of drive times, what proportion of times are less than 24 minutes?

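Answer: at most about 11% (24 minutes is 3 standard deviations below the mean, and by Chebyshev's rule at least 1 − 1/3² = 8/9 ≈ 89% of drive times fall between 24 and 36 minutes)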

b. Assume a mound-shaped, symmetric distribution. What proportion of drive times falls between 32 and 36 minutes?

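Answer: approximately 16% (32 and 36 minutes are 1 and 3 standard deviations above the mean, so by the Empirical Rule this is about (100% − 68%)/2 = 16%)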
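Both answers above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the drive-time numbers from the example (mean 30, standard deviation 2) and the Empirical Rule percentages used in these notes (68%, 95%, ~100%); the helper names are hypothetical.

```python
# Drive-time example from the notes: mean 30 minutes, standard deviation 2 minutes.
MEAN, SD = 30.0, 2.0

def k_from(x: float, mean: float = MEAN, sd: float = SD) -> float:
    """Number of standard deviations between x and the mean."""
    return abs(x - mean) / sd

def chebyshev_max_outside(k: float) -> float:
    """Chebyshev: at most 1/k^2 of ANY data set lies outside mean +/- k*sd (k > 1)."""
    return 1.0 / k**2

# Empirical Rule share within mean +/- 1, 2, 3 sd (~100% for 3 sd, as in these notes).
EMPIRICAL_WITHIN = {1: 0.68, 2: 0.95, 3: 1.00}

# a. 24 minutes is k = 3 sd below the mean: at most 1/9 of times lie outside 24..36,
#    so at most 1/9 can lie below 24.
part_a = chebyshev_max_outside(k_from(24))

# b. 32 and 36 are +1 sd and +3 sd. By symmetry, the band from +1 sd to +3 sd holds
#    half of the data that is inside the 3-sd interval but outside the 1-sd interval.
part_b = (EMPIRICAL_WITHIN[3] - EMPIRICAL_WITHIN[1]) / 2

print(f"a. at most {part_a:.1%}")  # a. at most 11.1%
print(f"b. about {part_b:.0%}")    # b. about 16%
```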

Modified Empirical Rule – works for skewed data sets

Approximately 60% to 90% of the data fall within x̄ ± s

Approximately 90% to 100% fall within x̄ ± 2s

Approximately 100% fall within x̄ ± 3s

We use these rules to interpret the standard deviation by describing where we expect most of the data to fall:

- For the (modified) Empirical Rule: we expect most to fall within x̄ ± 2s

- For Chebyshev's rule: we expect most to fall within x̄ ± 3s

Measures of relative standing

Ex. ACT vs SAT

1. Percentiles – give the percent of the data in the distribution that fall
below a particular value.

Ex. 75th percentile – upper quartile (75% fall below and 25% fall above)

25th percentile – lower quartile (25% fall below and 75% fall above)

50th percentile – median

2. Z-scores: z = (x − μ)/σ (for a sample, z = (x − x̄)/s)

The z-score tells us the number of standard deviations an observation falls from the mean, and in which direction (negative below the mean, positive above it).

Ex. Drive times: x̄ = 30; s = 2

x = 27: z = (27 − 30)/2 = −3/2 = −1.5

x = 38: z = (38 − 30)/2 = 8/2 = 4
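The z-score formula is a one-line function. A minimal sketch using the drive-time mean and standard deviation from the example above; `z_score` is a hypothetical helper name.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies from the mean (sign gives the direction)."""
    return (x - mean) / sd

# Drive times: sample mean 30 minutes, standard deviation 2 minutes.
for x in (27, 38):
    print(x, z_score(x, 30, 2))
# 27 -1.5
# 38 4.0
```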


Suppose 33 minutes is the 78th percentile: 78% of my drive times are less than 33 minutes and 22% exceed it.

Outliers

- Outside x̄ ± 2s for mound-shaped data

- Outside x̄ ± 3s for all other data

Outliers may indicate a problem with the data:

1. The observation is miscoded (recorded or entered incorrectly).
2. The observation comes from a population different from the one specified.
3. The observation is the result of a rare chance event.

Ex. Poker Chip

Population: 95% red, 5% white

Pick: red

Methods of Detecting Outliers

1. Z-score method: calculate the z-score for an observation and, using the appropriate rule, determine whether it is an outlier.

Chebyshev: outlier if |z| ≥ 3

(Modified) Empirical: outlier if |z| ≥ 2

2. Box plot method: a graphical method that uses the quartiles of the data as the basis for identifying outliers.
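The z-score method above reduces to comparing |z| with a cutoff that depends on which rule applies. A minimal sketch of exactly the two cutoffs stated in these notes (it does not implement the box plot method); `is_outlier` and its `empirical_rule` flag are hypothetical names.

```python
def is_outlier(z: float, empirical_rule: bool) -> bool:
    """Z-score outlier rule as stated in these notes.

    (Modified) Empirical Rule applies: outlier if |z| >= 2.
    Otherwise use Chebyshev:           outlier if |z| >= 3.
    """
    cutoff = 2.0 if empirical_rule else 3.0
    return abs(z) >= cutoff

# The 38-minute drive (z = 4) is an outlier under either rule; 27 minutes (z = -1.5) is not.
print(is_outlier(4.0, empirical_rule=False))   # True
print(is_outlier(-1.5, empirical_rule=True))   # False
```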
Statistics Notes class 3

Probability

Experiment – the process of making an observation.

Sample points – the most basic outcomes of an experiment.

Event – an outcome of the experiment; a collection of sample points.

Probability Rules

1. 0 ≤ P(x) ≤ 1
2. Σ P(x) = 1

Symbols:

Union ("or" probability) – P(A or B) = P(A ∪ B)

Intersection ("and" probability) – P(A and B) = P(A ∩ B)

Conditional Probability: P(A | B); given that event B has occurred, what is the probability that event A occurs?

Mutually Exclusive Events – two events are mutually exclusive if they cannot occur at the same time.

Independent Events – the outcome of one event does not change or affect the probability of observing another event.

Complement – the complement of event A is the event that A does not occur.

Ex. Birthday Problem

What is the probability that at least two of us have the same birthday?
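The notes do not give the class size, so this minimal sketch takes the group size n as a parameter. It uses the complement rule: P(at least two share) = 1 − P(all birthdays different), assuming 365 equally likely birthdays (Feb 29 ignored) and independent people. `p_shared_birthday` is a hypothetical helper name.

```python
def p_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), via the complement rule.

    Assumes `days` equally likely birthdays and independent people.
    """
    p_all_different = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

# With these assumptions, 23 people is the smallest group where a match is more
# likely than not (about 0.507).
for n in (10, 23, 50):
    print(n, round(p_shared_birthday(n), 4))
```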

Probability Tables

Car Inventory (table of 50 cars classified by color and by transmission type)

a. What is the probability that a randomly selected car has a manual transmission? = 30/50
b. What proportion of cars are red or blue? = (11 + 15)/50 = 26/50
c. What proportion of cars are black or have a manual transmission?
d. What proportion are red and have an automatic transmission? = 7/50
e. Given that the car has a manual transmission, what is the probability it is red? = 4/30
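Answers b and e can be reproduced from counts alone. The full table is not in these notes, so this sketch uses only counts implied by the answers above: 50 cars, 30 manual, 11 red, 15 blue, and 4 red manual cars (from e). It applies the standard definition P(A | B) = P(A and B)/P(B), and adds P(red) + P(blue) for b because a car cannot be both colors (mutually exclusive). Exact fractions keep the results comparable to the hand answers.

```python
from fractions import Fraction

# Counts implied by the worked answers (the full table is not reproduced in the notes).
TOTAL, MANUAL, RED, BLUE, RED_AND_MANUAL = 50, 30, 11, 15, 4

# b. Red and blue are mutually exclusive, so P(red or blue) = P(red) + P(blue).
p_red_or_blue = Fraction(RED, TOTAL) + Fraction(BLUE, TOTAL)

# e. Conditional probability: P(red | manual) = P(red and manual) / P(manual).
p_red_given_manual = Fraction(RED_AND_MANUAL, TOTAL) / Fraction(MANUAL, TOTAL)

print(p_red_or_blue)        # 13/25, i.e. 26/50
print(p_red_given_manual)   # 2/15, i.e. 4/30
```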
Statistics Class Notes 3 (Week 5)

Random Variables and Probability Distributions

Claim: 30% of all college students have a tattoo.

We randomly sample five college students and find one with a tattoo.

What can we say about the claim?

Outcomes (T = tattoo, N = no tattoo): NNNTN, NNNNN, TNNNN, NTNTN, … How many outcomes? 32, since each of the 5 students has 2 possibilities: 2^5 = 32.

What if n = 50? 2^50 ≈ 1.126 × 10^15

Random Variable – a variable that assigns a number to every outcome of the experiment. x = the number with a tattoo.

n=5: x=0,1,2,3,4,5

n=50: x=0,1, 2, … ,50
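Listing every outcome is only practical for small n, but doing it once shows both the 2^5 count and how the random variable x groups outcomes. A minimal sketch for the five-student tattoo example using the standard library.

```python
from collections import Counter
from itertools import product

# Every outcome for 5 students: each is T (tattoo) or N (no tattoo).
outcomes = ["".join(o) for o in product("TN", repeat=5)]
print(len(outcomes))  # 32 = 2**5

# x = number with a tattoo: how many of the 32 outcomes give each value of x.
counts = Counter(o.count("T") for o in outcomes)
print(sorted(counts.items()))  # [(0, 1), (1, 5), (2, 10), (3, 10), (4, 5), (5, 1)]

# For n = 50 the list is far too long to write out by hand.
print(2**50)  # 1125899906842624
```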

Two Types:

1. Discrete – the variable assigns a “countable” number of values to the outcomes of the experiment.
Ex. The number of students with a tattoo; Exam 1 grades; the number of phones that ring during class.
2. Continuous – the variable can assign any value in an interval of values.
Ex. My drive time to USF; heights of students.

Probability Distribution – a table, graph, or formula that gives the probability of observing each value of the random variable.

Must follow:

1. 0 ≤ p(x) ≤ 1
2. Σ p(x) = 1
Ex. n = 5 students (assume p = 0.3)
Expected Value of a Discrete Random Variable

The expected number of children for a US female is 1.78

The average SAT score of USF freshmen is 1306

Expected value of x: E(x) = μ = Σ x·p(x)

Ex. n = 5 students

Expected number with a tattoo: 1.5 (= 5 × 0.3)
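The distribution for the n = 5 example is not written out in the notes. This sketch fills it in with the binomial formula covered later in these notes, assuming the claimed p = 0.3, checks the two requirements for a probability distribution, and then computes E(x) = Σ x·p(x).

```python
from math import comb

n, p = 5, 0.3  # five students; assumes the claimed tattoo rate of 30%

# p(x) for x = number with a tattoo (binomial formula).
dist = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# The two requirements: 0 <= p(x) <= 1, and the p(x) sum to 1.
assert all(0 <= px <= 1 for px in dist.values())
assert abs(sum(dist.values()) - 1) < 1e-12

# Expected value: E(x) = sum of x * p(x).
expected = sum(x * px for x, px in dist.items())
for x, px in dist.items():
    print(x, round(px, 5))
print("E(x) =", round(expected, 4))  # E(x) = 1.5
```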

Expected values should be interpreted as long-term averages, not as values we expect to observe on a single attempt.

Binomial Probability Distribution

Criteria:

1. The experiment consists of n identical trials.
2. There are two possible outcomes for each trial: success and failure.
3. The probability of success is p; the probability of failure is 1 − p.
4. The trials are independent.

The binomial random variable is:

x = the number of successes in the n trials (x = 0, 1, 2, …, n)


Formula:

p(x) = [n! / (x!(n − x)!)] · p^x (1 − p)^(n − x)

The left part: the number of outcomes with x successes and (n − x) failures.

The right part: the probability of each outcome that has x successes and (n − x) failures.

Mean: μ = np; Std. dev.: σ = √(np(1 − p))
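The formula, mean, and standard deviation translate directly to code. A minimal sketch using `math.comb` for the "left part"; the function names are hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(x: int, n: int, p: float) -> float:
    """p(x) = [n! / (x!(n - x)!)] * p^x * (1 - p)^(n - x)."""
    # comb(n, x): the "left part", number of outcomes with x successes.
    # p**x * (1 - p)**(n - x): the "right part", probability of each such outcome.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, sqrt(n * p * (1 - p))

# Tattoo example: n = 5, p = 0.3.
mean, sd = binomial_mean_sd(5, 0.3)
print(mean, round(sd, 4))  # 1.5 1.0247
```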

Ex. Suppose you decide to guess on every question on a 20-question true/false exam. We are determining the number of questions you guess correctly.

a. Is this a binomial? Yes:
- n = 20 questions
- Success = guess correctly
- p = P(success) = 0.5, so 1 − p = 0.5
- Guesses on different questions are independent
x = the number of questions guessed correctly
b. Find the probability that you guess exactly half the questions correctly.
P(x = 10) = [20! / (10!(20 − 10)!)] × (0.5)^10 × (1 − 0.5)^(20 − 10) = 0.176197
c. Find the probability that you pass the exam with at least a C (14 or more correct).
P(x ≥ 14) = P(14) + P(15) + … + P(20)
It’s tough!!!
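The sum in part c is tedious by hand but trivial by computer. A minimal sketch of parts b and c, assuming (as the P(x ≥ 14) setup implies) that a C means at least 14 of 20 correct.

```python
from math import comb

n, p = 20, 0.5  # 20 true/false questions, pure guessing

def pmf(x: int) -> float:
    """Binomial probability of exactly x correct guesses."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_half = pmf(10)                                # b. exactly 10 correct
p_pass = sum(pmf(x) for x in range(14, n + 1))  # c. P(14) + P(15) + ... + P(20)
print(round(p_half, 6), round(p_pass, 4))       # 0.176197 0.0577
```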
Cumulative Binomial Probabilities

- Software or tables provide P(x ≤ k) for certain values of n and p.

- Get clever! Re-think binomial problems in terms of “≤” probabilities.

Ex. Binomial: n = 15, p = 0.4

a. P(x ≤ 8) = 0.905
b. P(x ≥ 4) = P(x ≤ 15) − P(x ≤ 3) = 1 − P(x ≤ 3)

Binomial: P(x = 10) = P(x ≤ 10) − P(x ≤ 9)

c. (n = 20, as in the true/false exam) P(x ≥ 14) = P(x ≤ 20) − P(x ≤ 13) = 1 − P(x ≤ 13)


d. P(4 < x ≤ 14) = P(x ≤ 14) − P(x ≤ 4)
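The "get clever" idea is that every question becomes a difference of cumulative (≤) probabilities. A minimal sketch for the n = 15, p = 0.4 example, with a hypothetical `binom_cdf` helper standing in for the cumulative table.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(x <= k) for a binomial(n, p) variable: the value a cumulative table lists."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, k + 1))

n, p = 15, 0.4
a = binom_cdf(8, n, p)                        # a. P(x <= 8), about 0.905
b = 1 - binom_cdf(3, n, p)                    # b. P(x >= 4) = 1 - P(x <= 3)
d = binom_cdf(14, n, p) - binom_cdf(4, n, p)  # d. P(4 < x <= 14)
print(round(a, 3), round(b, 3), round(d, 3))
```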

Excel: use the BINOM.DIST function

BINOM.DIST(k, n, p, TRUE/FALSE); TRUE returns P(x ≤ k), FALSE returns P(x = k)


Ex. Mars Inc. claims that 40% of their M&Ms are red. To test this claim, we
are going to randomly sample 20 M&Ms.

a. Find the probability that at least 7 but fewer than 12 of the M&Ms are red.
Success = red: n = 20, p = 0.4
P(7 ≤ x < 12) = P(x ≤ 11) − P(x ≤ 6)

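Ex. Find the probability that more than 7 are not red.

Make success “not red” (p = 0.6) and draw the dots: P(y > 7) = 1 − P(y ≤ 7). Or, with x = the number of red M&Ms, more than 7 not red means at most 12 red: P(x ≤ 12).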
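Both M&M questions can be answered with a Python stand-in for Excel's BINOM.DIST. This is a minimal sketch: `binom_dist` is a hypothetical helper that mirrors the argument order BINOM.DIST(k, n, p, TRUE/FALSE), not Excel itself. It also checks that the "not red" and "at most 12 red" approaches agree.

```python
from math import comb

def binom_dist(k: int, n: int, p: float, cumulative: bool) -> float:
    """Hypothetical stand-in for Excel's BINOM.DIST(k, n, p, TRUE/FALSE).

    cumulative=True  -> P(x <= k)
    cumulative=False -> P(x = k)
    """
    if cumulative:
        return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.4  # 20 M&Ms; success = red, using the claimed p = 0.4

# a. P(7 <= x < 12) = P(x <= 11) - P(x <= 6)
part_a = binom_dist(11, n, p, True) - binom_dist(6, n, p, True)

# More than 7 NOT red, two ways:
# y = number not red ~ binomial(20, 0.6): P(y > 7) = 1 - P(y <= 7)
not_red_way = 1 - binom_dist(7, n, 1 - p, True)
# x = number red: more than 7 not red means at most 12 red, P(x <= 12)
red_way = binom_dist(12, n, p, True)
print(round(part_a, 3), round(not_red_way, 3), round(red_way, 3))
```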

Statistics class notes 4


Examples

a. P(Z ≤ 1.00) = 0.8413

b. P(Z > 0.32) = 1 − 0.6255 = 0.3745

c. P(−1.50 < Z < 0.61) = P(Z ≤ 0.61) − P(Z ≤ −1.50)

= 0.7291 − 0.0668 = 0.6623

Find the z-score, z0, such that:

d. P(Z ≤ z0) = 0.8944

z0 = +1.25

e. P(Z < z0) = 0.1056

z0 = −1.25

f. P(Z > z0) = 0.9582, so P(Z ≤ z0) = 0.0418; z0 = −1.73
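The table lookups above can be checked in Python. This sketch swaps in the standard library's `statistics.NormalDist` for the z-table: `cdf` gives the "≤" probability (like reading the table forward) and `inv_cdf` finds z0 from a "≤" probability (like reading it backward). Software values carry more digits than a 4-place table, so they are rounded for comparison.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

a = Z.cdf(1.00)                 # P(Z <= 1.00)
b = 1 - Z.cdf(0.32)             # P(Z > 0.32)
c = Z.cdf(0.61) - Z.cdf(-1.50)  # P(-1.50 < Z < 0.61)

d = Z.inv_cdf(0.8944)           # z0 with P(Z <= z0) = 0.8944
e = Z.inv_cdf(0.1056)           # z0 with P(Z < z0) = 0.1056
f = Z.inv_cdf(1 - 0.9582)       # P(Z > z0) = 0.9582 means P(Z <= z0) = 0.0418

print([round(v, 4) for v in (a, b, c)])  # [0.8413, 0.3745, 0.6623]
print([round(v, 2) for v in (d, e, f)])  # [1.25, -1.25, -1.73]
```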

We can use the standard normal distribution to solve all normal curve questions.

Key: z = (x − μ)/σ

Working with normals in Excel

NORM.DIST(x, m, sd, TRUE)

- returns the “≤” probability for x

NORM.INV(prob, m, sd)

- returns the value x whose “≤” probability is prob

Ex. The GPAs of students follow a normal distribution with m = 2.8 and sd = 0.4.

a. What proportion of students have GPAs above 3.0?

Method 1: Convert to a z-score; solve using the z-table

z = (x − m)/sd
z = (3.0 − 2.8)/0.4 = 0.50; P(x > 3.0) = P(Z > 0.50) = 1 − 0.6915 = 0.3085

Method 2: Use Excel (I love Excel)

=1-NORM.DIST(3.0, 2.8, 0.4, TRUE) = 0.3085

b. Find the proportion of students with GPAs between 2.0 and 2.5.

z1 = (2.5 − 2.8)/0.4 = −0.75

z2 = (2.0 − 2.8)/0.4 = −2.00

P(2.0 < x < 2.5) = 0.2266 − 0.0228 = 0.2038

=NORM.DIST(2.5, 2.8, 0.4, TRUE) - NORM.DIST(2.0, 2.8, 0.4, TRUE)
c. Identify the GPA that marks off the lowest 9% of student GPAs.
Method 1: use the “≤” probability and the cumulative normal table to find z0: z0 = −1.34.
Solve for x0 = m + z0·sd = 2.8 + (−1.34)(0.4) = 2.264

Method 2: Use Excel

=NORM.INV(0.09, 2.8, 0.4)
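The three GPA questions can also be checked outside Excel. This minimal sketch uses Python's `statistics.NormalDist` as a stand-in (not Excel itself): `cdf(x)` plays the role of NORM.DIST(x, m, sd, TRUE) and `inv_cdf(prob)` the role of NORM.INV(prob, m, sd). Small differences from the table answers come from rounding z to two decimals.

```python
from statistics import NormalDist

gpa = NormalDist(mu=2.8, sigma=0.4)  # GPA distribution from the example

above_3 = 1 - gpa.cdf(3.0)             # a. P(GPA > 3.0)
between = gpa.cdf(2.5) - gpa.cdf(2.0)  # b. P(2.0 < GPA < 2.5)
cutoff_9 = gpa.inv_cdf(0.09)           # c. GPA marking off the lowest 9%

print(round(above_3, 4), round(between, 4), round(cutoff_9, 3))
```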

Assessing the normality of data

1. Chapter 2 plots – construct a histogram and/or stem-and-leaf plot of the data (don’t be too judgy!!)
2. Empirical analysis – create the x̄ ± s, x̄ ± 2s, and x̄ ± 3s intervals. Compare the percentage of your data in these intervals to the Empirical Rule’s 68%, 95%, and 100% (pay particular attention to the 68%).
3. Calculate the value of IQR/s = (Q3 − Q1)/s. The closer it is to 1.3, the more nearly normal the data.
4. Construct a normal probability plot. The straighter the plot, the more nearly normal the data.
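Checks 2 and 3 are easy to automate for any list of numbers. This is a minimal sketch using the standard `statistics` module (quartiles from `statistics.quantiles`, whose default method may differ slightly from a textbook's quartile rule). The sample is SIMULATED normal data for illustration only, not data from these notes, and `normality_summary` is a hypothetical helper name.

```python
import random
import statistics

def normality_summary(data: list[float]) -> dict[str, float]:
    """Empirical-interval shares (check 2) and IQR/s (check 3) for a data set."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    summary = {}
    # Check 2: share of observations inside mean +/- k*s, to compare with 68%, 95%, 100%.
    for k in (1, 2, 3):
        inside = sum(mean - k * s <= x <= mean + k * s for x in data)
        summary[f"within_{k}s"] = inside / len(data)
    # Check 3: IQR / s, roughly 1.3 for normal data.
    q1, _, q3 = statistics.quantiles(data, n=4)
    summary["iqr_over_s"] = (q3 - q1) / s
    return summary

# Illustration on simulated normal data (5,000 draws, fixed seed).
rng = random.Random(1)
sample = [rng.gauss(2.8, 0.4) for _ in range(5000)]
print(normality_summary(sample))
```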
