Statssss
α = the area under the normal curve in the tails of the distribution outside the area
defined by the confidence interval
Distribution of Sample Mean for 95% Confidence
margin of error of the interval = the distance between the statistic computed to
estimate a parameter and the parameter.
Common z values
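The table of common z values is missing from these notes. As a stand-in, the sketch below recomputes the two-tailed critical value zα/2 for a few confidence levels using Python's standard library. The helper name z_critical and the chosen confidence levels are illustrative assumptions, not a copy of the original table.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value z_(alpha/2) for a given confidence level."""
    alpha = 1 - confidence                      # total area in both tails
    return NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 is left in the upper tail

# Confidence levels picked for illustration (assumption, not from the notes)
for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 95% confidence this gives the familiar z = 1.96 used later in the notes.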
95% confident that the population mean is in an interval: If the company
were to randomly select 100 samples of 85 bills and use the results of each sample
to construct a 95% confidence interval, approximately 95 of the 100 intervals would
contain the population mean.
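The "95 out of 100 intervals" interpretation can be checked with a small simulation. This is only a sketch: the population mean, σ, the number of repetitions and the seed are hypothetical values chosen for illustration, and σ is assumed known so the z-based interval applies. Only the sample size of 85 comes from the notes.

```python
import random
from statistics import NormalDist, mean

def coverage(num_samples=1000, n=85, mu=50.0, sigma=10.0, confidence=0.95, seed=0):
    """Fraction of z-based confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5          # margin of error, sigma assumed known
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / num_samples

print(f"share of intervals containing mu: {coverage():.3f}")
```

The printed share should land close to 0.95, not exactly on it, because each run is itself a random sample of intervals.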
The t Distribution
= a distribution that describes the sample data in small samples when the population
standard deviation is unknown and the population is normally distributed
WEEK 7
9.1. Introduction to Hypothesis Testing
Hypothesis = a tentative explanation of a principle operating in nature
Types of Hypotheses
1. Research hypotheses
2. Statistical hypotheses
3. Substantive hypotheses
1. Research Hypothesis
= a statement of what the researcher believes will be the outcome of an
experiment/study
“Older workers are more loyal to a company”
2. Statistical Hypotheses
= a formal hypothesis structure set up with a null and an alternative
hypothesis to scientifically test research hypotheses
a) null hypothesis H0
The hypothesis that assumes the status quo: that the old theory, method,
or standard is still true (Older workers are not more loyal to a company)
b) alternative hypothesis Ha
The hypothesis that the researcher is interested in proving (Older workers
are more loyal to a company)
Tests:
1. Two-tailed tests
- a statistical test wherein the researcher is interested in testing both sides of
the distribution
- nondirectional: Ha allows for either the > or < possibility; Ha: ≠
o same, different, equal, control, out-of-control
- Are the machines overfilling or underfilling the wheat packages?
- 2 critical points, 2 rejection regions, α split in half
2. One-tailed tests
- a statistical test wherein the researcher is interested in testing one side of the
distribution
- directional: Ha uses either the > or < possibility; Ha: < or Ha: >
o higher, lower, older, younger, more, less, longer, shorter
- 1 critical point, 1 rejection region, α not split in half
3. Substantive hypotheses
substantive result = what occurs when the outcome of a statistical study
produces results that are important to the decision-maker
3. Set α = 0.05
4. Decision rule: reject H0 if z > 1.96 or z < -1.96 (zα/2 = ±1.96 is the z value whose
table area is 0.5 - α/2 = 0.5 - 0.025 = 0.4750)
5. Gather data: sample mean is $78,695
6. Calculate z with n = 112, x̄ = $78,695, σ = $14,530, μ = $74,914:
z = (x̄ - μ) / (σ / √n) = 2.75
7. Observed value (2.75) > critical value of z (1.96), so reject the null hypothesis
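The arithmetic in steps 4 to 7 can be reproduced as follows. The inputs are the values given in the example; the p-value line is an extra check that the notes do not show, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the worked example in the notes
n, x_bar, sigma, mu0 = 112, 78_695, 14_530, 74_914
alpha = 0.05

z_observed = (x_bar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)           # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z_observed)))  # not in the notes, added as a check

reject_h0 = abs(z_observed) > z_crit
print(f"z = {z_observed:.2f}, critical = ±{z_crit:.2f}, reject H0: {reject_h0}")
```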
WEEK 8
10.1. Hypothesis Testing and Confidence Intervals About the Difference in
2 Means Using the z Statistic (σ² Known)
EXAMPLE: Which toothpaste brand is more effective?
Hypothesis Testing
If σ₁² = σ₂² = σ²:
t formula
3. α = 0.05
4. two-tailed test
df = 25
t0.025,25 = ±2.060
5. Gather data
6. t value
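The t formula and the data for this example are missing from the notes. The sketch below uses the standard pooled-variance t statistic for two independent samples with equal population variances, which is an assumption about what the missing formula was. The summary statistics are hypothetical; only df = 25 and the critical value 2.060 come from the notes.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """t statistic and df for two independent samples, equal variances assumed."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / std_error
    return t, df

# Hypothetical summary statistics (NOT from the notes); sizes chosen so that df = 25
t_value, df = pooled_t(mean1=50.0, var1=20.0, n1=15, mean2=55.0, var2=18.0, n2=12)
t_crit = 2.060                     # t(0.025, 25), as given in the notes
print(f"t = {t_value:.3f}, df = {df}, reject H0: {abs(t_value) > t_crit}")
```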
n = number of pairs
d = sample difference in pairs
D = mean population difference
sd = standard deviation of sample difference
d̄ = mean sample difference
1. H0: D = 0, Ha: D ≠ 0
2. Test statistic: t = (d̄ - D) / (sd / √n), df = n - 1
3. Assume α = 0.01
4. α/2 = 0.005, n = 9, df = n - 1 = 8, t0.005,8 = ±3.355
- if t > 3.355 or t < -3.355, reject the null hypothesis
6. t = -0.7
7. -3.355 < -0.7 < 3.355, so fail to reject the null hypothesis
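A paired (related-samples) t test follows directly from the notation above: t = (d̄ - D) / (sd / √n) with df = n - 1. The sketch below is a minimal version. The nine differences are hypothetical, since the original data are not in the notes, so the resulting t will not equal the -0.7 reported in the example. The critical value 3.355 is the one from the notes.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(differences, hypothesized_mean=0.0):
    """t statistic and df for paired data, given the pairwise differences d."""
    n = len(differences)                 # n = number of pairs
    d_bar = mean(differences)            # mean sample difference
    s_d = stdev(differences)             # sample standard deviation (n - 1 divisor)
    t = (d_bar - hypothesized_mean) / (s_d / sqrt(n))
    return t, n - 1

# Hypothetical differences for 9 pairs (NOT the data from the notes)
differences = [1.0, -2.0, 0.0, 3.0, -4.0, 2.0, -1.0, -3.0, 1.0]
t_value, df = paired_t(differences)
t_crit = 3.355                           # t(0.005, 8), as given in the notes
print(f"t = {t_value:.3f}, df = {df}, reject H0: {abs(t_value) > t_crit}")
```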
EXAMPLE: Is the proportion of people driving new cars in Windsor different from
the proportion in Kingston?
- ANOVA tests are one-tailed, with the rejection region in the upper tail
One-way ANOVA partitions the total variance of the data into 2 variances:
1. the variance resulting from the treatment (columns)
2. the error variance, or that portion of the total variance unexplained by the
treatment
EXAMPLE: An analyst decides to analyze the effects of the machine operator on the
valve opening measurements of valves produced in a manufacturing plant. The
independent variable in this design is the machine operator. Suppose further that 4
different operators operate the machines. These four machine operators are the
levels of treatment (classification) of the independent variable. The dependent
variable is the opening measurement of the valve. Is there a significant difference in
the mean valve opening of 24 valves produced by the 4 operators?
ANOVA table
In the machine operator example, dfC = 3 and dfE = 20. F0.05,3,20 = 3.10. The
observed F value is 10.18, which is larger than the critical F value, so the null
hypothesis is rejected.
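The partition of total variation into treatment and error parts can be sketched as below. The valve measurements are hypothetical (4 operators with 6 valves each, chosen only so that dfC = 3 and dfE = 20), so the F value produced here is not the 10.18 from the example. The critical value 3.10 is F0.05,3,20.

```python
def one_way_anova(groups):
    """Partition total variation into treatment (between) and error (within) parts."""
    values = [v for g in groups for v in g]
    n_total = len(values)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    df_treatment = len(groups) - 1        # dfC = number of treatment levels - 1
    df_error = n_total - len(groups)      # dfE = N - number of treatment levels
    f_value = (ss_treatment / df_treatment) / (ss_error / df_error)  # MSC / MSE
    return {"SSC": ss_treatment, "SSE": ss_error, "SST": ss_total,
            "dfC": df_treatment, "dfE": df_error, "F": f_value}

# Hypothetical valve openings for 4 operators (NOT the data from the notes)
operators = [
    [6.33, 6.26, 6.31, 6.29, 6.40, 6.35],
    [6.26, 6.36, 6.23, 6.27, 6.19, 6.50],
    [6.44, 6.38, 6.58, 6.54, 6.56, 6.34],
    [6.29, 6.23, 6.19, 6.21, 6.30, 6.25],
]
result = one_way_anova(operators)
f_crit = 3.10                             # F(0.05, 3, 20)
print(f"F = {result['F']:.2f}, reject H0: {result['F'] > f_crit}")
```

Because the test is upper-tailed, only a large F leads to rejection.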
WEEK 10
12.1 Correlation
Correlation = a measure of the degree of relatedness of two or more variables.
- several measures of correlation are available; ideally you would solve for ρ (the
population coefficient of correlation)
coefficient of correlation, r
= a measure of the linear correlation of two variables; requires at least interval-level data
- used by business analysts, who usually work with sample data
- Pearson product-moment correlation coefficient
- ranges from -1 (perfect negative, or inverse, correlation) through 0 (no linear
correlation) to +1 (perfect positive correlation)
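A minimal sketch of the Pearson product-moment coefficient, written from its definition r = SSxy / √(SSxx · SSyy). The function name and the sample data are hypothetical and only illustrate the calculation.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample coefficient of correlation r for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / sqrt(ss_xx * ss_yy)

# Hypothetical data for illustration
hours = [1, 2, 3, 4, 5, 6]
output = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print(f"r = {pearson_r(hours, output):.4f}")
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.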
Examples of correlations:
12.2 Introduction to Simple Regression Analysis
- probabilistic: includes an error term that allows for various values of output to
occur for a given value of input
Outliers = data points that lie apart from the rest of the points.
- located with residuals
residual plot = a type of graph in which the residuals for a particular regression
model are plotted along with their associated values of x (x, y - ŷ)
homoscedasticity = the condition that occurs when the error variances produced
by a regression model are constant
heteroscedasticity = the condition that occurs when the error variances produced
by a regression model are not constant
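- about 68% of the residuals should lie within 0 ± 1se (se = standard error of the estimate)
- about 95% of the residuals should lie within 0 ± 2se
- you can identify outliers by looking for residuals that lie beyond ±2se or ±3se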
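The residual-based outlier check can be sketched as follows: fit the least-squares line, compute the residuals y - ŷ, compute se = √(SSE / (n - 2)), and flag any residual beyond ±2se. The data are hypothetical (a perfect line with one point shifted upward), and the function names are illustrative.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for simple regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    return mean_y - b1 * mean_x, b1

def flag_outliers(x, y, k=2.0):
    """Residuals, standard error of the estimate, and points beyond +/- k * se."""
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s_e = sqrt(sse / (len(x) - 2))       # n - 2 degrees of freedom
    flagged = [(xi, e) for xi, e in zip(x, residuals) if abs(e) > k * s_e]
    return residuals, s_e, flagged

# Hypothetical data: y = 2x, with the point at x = 5 shifted up by 10
x = list(range(1, 11))
y = [2.0 * xi for xi in x]
y[4] += 10.0
residuals, s_e, flagged = flag_outliers(x, y)
print(f"se = {s_e:.3f}, flagged x values: {[xi for xi, _ in flagged]}")
```

With small samples a single residual cannot exceed 3se by much, if at all, so the 2se cutoff is used here.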
12.7 Hypothesis Tests for the Slope of the Regression Model and for the
Overall Model
Testing the Slope
- a hypothesis test can be conducted on the sample slope of the regression model to
determine whether the population slope is significantly different from zero
- another way to determine how well a regression model fits the data
- the values of the sum of squares (SS), degrees of freedom (df), and mean squares
(MS) are obtained from the ANOVA table
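Both tests can be sketched from the sums of squares. The slope test uses t = (b1 - 0) / sb with sb = se / √SSxx and df = n - 2; the overall test uses F = MSreg / MSerr. In simple regression, with one predictor, F equals t². The data below are hypothetical and the dictionary keys are illustrative names.

```python
from math import sqrt

def slope_and_model_tests(x, y):
    """t statistic for the slope and F statistic for the overall simple regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    y_hat = [b0 + b1 * xi for xi in x]

    ss_reg = sum((yh - mean_y) ** 2 for yh in y_hat)          # SS regression
    ss_err = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # SS error
    df_reg, df_err = 1, n - 2
    ms_reg, ms_err = ss_reg / df_reg, ss_err / df_err

    s_e = sqrt(ms_err)               # standard error of the estimate
    s_b = s_e / sqrt(ss_xx)          # standard error of the slope
    return {"b1": b1, "t": b1 / s_b, "F": ms_reg / ms_err,
            "df_reg": df_reg, "df_err": df_err}

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
tests = slope_and_model_tests(x, y)
print(f"b1 = {tests['b1']:.3f}, t = {tests['t']:.2f}, F = {tests['F']:.2f}")
```

The observed t is compared with tα/2,n-2 and the observed F with Fα,1,n-2 from the tables.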
WEEK 11
12.8 Estimation
- regression analysis can be used as a prediction tool
Confidence Intervals to Estimate the Conditional Mean of y: μy|x
- one type of confidence interval is an estimate of the average value of y, E(y|x), for a
given x
where x0 is a particular value of x
x0 = 73, ŷ = 4.5411
EXAMPLE:
x0 = 73, ŷ = 4.5411
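The interval formula is missing from the notes. The sketch below assumes the standard form ŷ ± tα/2,n-2 · se · √(1/n + (x0 - x̄)² / SSxx). Only x0 = 73 and ŷ = 4.5411 come from the notes; n, se, x̄ and SSxx are placeholder assumptions, and t = 2.228 is the table value t0.025,10 that matches the assumed n = 12.

```python
from math import sqrt

def conditional_mean_interval(y_hat, t_crit, s_e, n, x0, mean_x, ss_xx):
    """Confidence interval for E(y|x0) in simple regression."""
    half_width = t_crit * s_e * sqrt(1 / n + (x0 - mean_x) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width

# x0 and y_hat are from the notes; everything else is a placeholder assumption
lower, upper = conditional_mean_interval(
    y_hat=4.5411, t_crit=2.228, s_e=0.18, n=12, x0=73, mean_x=77.5, ss_xx=1689.0
)
print(f"{lower:.4f} <= E(y|x0 = 73) <= {upper:.4f}")
```

The interval is narrowest when x0 equals x̄ and widens as x0 moves away from it.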
The F Value
At α = 0.05, the null hypothesis is rejected because the probabilities (p-values) are less
than 0.05.
- if the t ratio for any predictor variable is not significant (fail to reject H0), the
analyst might decide to drop that variable from the analysis as a nonsignificant
predictor
- the df for each of these individual tests of regression coefficients is n - k - 1,
where k is the number of predictor variables
- testing the regression coefficients gives the analyst some insight into the fit of the
regression model but also helps in the evaluation of how worthwhile individual
independent variables are in predicting y
WEEK 12
14.2 Indicator (Dummy) Variables
= qualitative variables that represent whether or not a given item or person
possesses a certain characteristic and are usually coded as 0 (negative) or 1
(affirmative).
- if a qualitative variable has c categories, then c - 1 dummy variables must be
created
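The c - 1 rule can be sketched as below: one category serves as the baseline (all dummies equal 0), and each remaining category gets its own 0/1 column. The function name, the choice of the alphabetically first category as baseline, and the region data are hypothetical.

```python
def dummy_code(values, baseline=None):
    """Code a qualitative variable with c categories as c - 1 indicator columns."""
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]         # assumption: first category is the baseline
    kept = [c for c in categories if c != baseline]
    rows = [{c: int(v == c) for c in kept} for v in values]
    return kept, rows

# Hypothetical qualitative variable with c = 3 categories
regions = ["north", "south", "west", "south", "north"]
columns, coded = dummy_code(regions)
print(columns)
for region, row in zip(regions, coded):
    print(region, row)
```

Creating all c dummies instead of c - 1 would make the columns sum to 1 for every observation, which duplicates the regression intercept.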