Regression Analysis
Assignment II
Simple Linear Regression Analysis and Hypothesis Testing
F-distribution chart
Chi-square chart
T-TEST
A t-test compares the means of data gathered from two similar or different
groups.
The t-test is suited to smaller sample sizes.
It is valid, and should be used, only when the means (averages) of two
categories or groups need to be compared.
Assumptions of T-Test
• The data used for the hypothesis test are measured on a continuous or
ordinal scale. The parameters and variables that influence the samples and
the groups are assumed to follow standard considerations.
• The test relies entirely on random sampling. Because no individuality is
maintained in the samples, their reliability is often questioned.
• When plotted, the data should follow a normal distribution and produce a
bell-shaped curve.
• For a clearer bell curve, the sample size needs to be bigger.
• The variance should be such that the standard deviations of the samples are
almost equal.
• There should be no extreme outliers in the differences.
Example – One Sample T-Test
• A claim is made that the average number of days a
person spends on vacation is more than or equal to 5
days (hypothesized population mean) based on a sample
of 16 people whose mean came out to be 9 days.
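The test statistic for this example can be sketched as follows. The sample size, sample mean and hypothesized mean come from the example; the sample standard deviation is not given, so the value of 8 days used here is purely hypothetical, and the function name `one_sample_t` is illustrative only.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    standard_error = sample_sd / math.sqrt(n)
    t_stat = (sample_mean - hypothesized_mean) / standard_error
    return t_stat, n - 1


# From the example: n = 16, sample mean = 9 days, hypothesized mean = 5 days.
# ASSUMPTION: sample_sd = 8 is hypothetical; the example does not report it.
t_stat, df = one_sample_t(sample_mean=9, hypothesized_mean=5, sample_sd=8, n=16)
print(f"t = {t_stat:.2f}, df = {df}")
```

The resulting statistic would then be compared with the critical value of the t-distribution for the chosen significance level and degrees of freedom.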
Example – Chi-Square Test
• A survey was conducted among randomly selected individuals in a
shopping mall to determine whether educational attainment is related to
gender.
• First, organize the data into a
cross-tabulation of the two
qualitative (nominal) variables to
obtain the frequencies for each
category. Statistical software can
do this, which helps especially with
a very large sample.
• Formulate the hypotheses
Null Hypothesis:
– H0: There is no significant association between gender and education
level.
Alternative Hypothesis:
– Ha: There is a significant association between gender and education
level.
• Specify the expected values for each cell of the table (when the null
hypothesis is true)
The expected values specify what the values of each cell of the table
would be if there was no association between the two variables.
• To see if the data give convincing evidence against the null hypothesis,
compare the observed counts from the sample with the expected counts,
assuming H0 is true.
Statistical software such as SPSS or Datatab will compute both the
expected and observed counts for each cell when conducting a chi-square
test.
If these values are entered into the formula for the chi-square test statistic, the
value obtained is 0.504.
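The expected counts and the chi-square statistic described above can be sketched in a few lines. The survey's actual cross-tabulation is not reproduced in the text, so the observed counts below are hypothetical and are not expected to give the reported value of 0.504; the function names are illustrative.

```python
def expected_counts(observed):
    """Expected cell counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# HYPOTHETICAL counts (rows: gender, columns: education level); not the survey data.
observed = [[10, 20], [30, 40]]
expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(chi2, 3), dof)
```

A small statistic relative to the critical value for the given degrees of freedom means the observed counts are close to what H0 predicts.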
By number:
6. Even Likert scale: 4- and 8-point Likert scales
7. Odd Likert scale: 5-, 7- and 9-point scales
Example of LIKERT SCALE
• A bank wants to measure customer satisfaction with its newly introduced
ATM. It administered the following questionnaire to 100 users at the site
where the new ATM is installed, and compared the results with their
satisfaction with the machine it replaced. Customer ratings of the first
machine, on a 5-point Likert scale question, were 35% very poor or poor,
50% average, and 15% good or excellent.
• Using the same tool, the survey found the results shown in the table.
Question                                  1          2      3        4      5
                                          Very poor  Poor   Average  Good   Excellent
How would you rate the service of the     9          12     42       30     7
new ATM machine?
Total %                                   9          12     42       30     7
Conclusion: The finding of the survey shows that out of the 100
customers 9% rated the service of the new ATM as very poor, 12%
as poor, 42% as average, 30% as good and 7% as excellent.
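The comparison with the replaced machine can be made explicit by grouping the five categories the same way the earlier ratings were reported. This is a minimal sketch using only the figures given above; the mean score treats the ordinal 1 to 5 codes as numbers, which is a simplifying assumption.

```python
new_counts = [9, 12, 42, 30, 7]  # Very poor, Poor, Average, Good, Excellent
total = sum(new_counts)          # 100 respondents
percent = [100 * c / total for c in new_counts]

# Group the categories as in the ratings reported for the old machine.
new_grouped = {
    "Very poor/Poor": percent[0] + percent[1],
    "Average": percent[2],
    "Good/Excellent": percent[3] + percent[4],
}
old_grouped = {"Very poor/Poor": 35, "Average": 50, "Good/Excellent": 15}

for category, new_value in new_grouped.items():
    change = new_value - old_grouped[category]
    print(f"{category}: old {old_grouped[category]}%, new {new_value:.0f}%, change {change:+.0f}")

# ASSUMPTION: averaging Likert codes treats the ordinal scale as interval data.
mean_score = sum(score * c for score, c in zip(range(1, 6), new_counts)) / total
print(f"Mean score for the new machine: {mean_score:.2f}")
```

On these figures the share of very poor or poor ratings falls from 35% to 21%, and the share of good or excellent ratings rises from 15% to 37%.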
[Scatter plot: Film Production versus GDP]
• The data show that the 38 countries have produced a total of X films.
• Most of the data points are clustered near the origin, indicating that there are many
countries with low GDP and low film production.
• There are a few scattered points extending out towards higher GDP and higher film
production values.
• This graph implies that there is some positive relationship between the two variables.
E. Calculate the mean, median, range, standard deviation, standardized inter-quartile
deviation, correlation coefficient, and coefficient of variation for the given data on the
dependent and explanatory variables and interpret the results. Does the sign of the
estimated correlation coefficient confirm the pattern of relationship depicted under ‘d’?
Descriptive Statistics: Mean
                     N     Mean
FILM_PRODUCTION      38    268.53
GDP                  38    5618.026
Valid N (listwise)   38
Descriptive Statistics: Range
                     N     Range
FILM_PRODUCTION      38    2589
GDP                  38    28742.0
Valid N (listwise)   38
• The range for FILM_PRODUCTION is 2,589, indicating the difference between the
highest and lowest values in the dataset for FILM_PRODUCTION.
• The range for GDP is 28,742.0, representing the difference between the highest
and lowest values in the GDP dataset.
Descriptive Statistics: Standard deviation
                     N     Std. Deviation
FILM_PRODUCTION      38    543.125
GDP                  38    6049.4112
Valid N (listwise)   38
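The coefficient of variation requested in part E follows directly from the reported means and standard deviations. The sketch below uses only those reported figures; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


# Means and standard deviations as reported in the descriptive statistics.
cv_film = coefficient_of_variation(std_dev=543.125, mean=268.53)
cv_gdp = coefficient_of_variation(std_dev=6049.4112, mean=5618.026)

print(f"CV of FILM_PRODUCTION: {cv_film:.1f}%")
print(f"CV of GDP: {cv_gdp:.1f}%")
```

Both values exceed 100%, meaning the standard deviation is larger than the mean for each variable; film production is roughly twice as dispersed relative to its mean as GDP is.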
Percentiles
                     FILM_PRODUCTION   GDP
N Valid              38                38
N Missing            0                 0
Percentile 25        28.75             1700.000
Percentile 50        58.50             3081.500
Percentile 75        188.75            7808.250
• The IQR provides a measure of the spread of the middle 50% of the data.
A larger IQR indicates a greater spread of values within that middle 50%.
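The IQR for each variable can be computed from the reported 25th and 75th percentiles. The sketch below also computes the quartile coefficient of dispersion, (Q3 - Q1) / (Q3 + Q1), as one possible reading of the "standardized inter-quartile deviation" asked for in part E; that interpretation is an assumption.

```python
def iqr(q1, q3):
    """Interquartile range: spread of the middle 50% of the data."""
    return q3 - q1


def quartile_coefficient(q1, q3):
    """ASSUMED standardization: IQR divided by the sum of the quartiles."""
    return (q3 - q1) / (q3 + q1)


# 25th and 75th percentiles as reported in the percentile table.
film_q1, film_q3 = 28.75, 188.75
gdp_q1, gdp_q3 = 1700.0, 7808.25

film_iqr = iqr(film_q1, film_q3)
gdp_iqr = iqr(gdp_q1, gdp_q3)
film_qcd = quartile_coefficient(film_q1, film_q3)
gdp_qcd = quartile_coefficient(gdp_q1, gdp_q3)

print(f"FILM_PRODUCTION: IQR = {film_iqr}, quartile coefficient = {film_qcd:.3f}")
print(f"GDP: IQR = {gdp_iqr}, quartile coefficient = {gdp_qcd:.3f}")
```

The median of film production (58.50) lies far below its mean (268.53), which together with the wide range points to a strongly right-skewed distribution.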
Correlations
                                        FILM_PRODUCTION   GDP
FILM_PRODUCTION   Pearson Correlation   1                 .055
                  N                     38                38
• The correlation between FILM_PRODUCTION and GDP is 0.055. The p-value for this correlation
is 0.741.
• A correlation of 0.055 suggests a very weak positive relationship between FILM_PRODUCTION
and GDP. Additionally, the p-value of 0.741 indicates that this correlation is not statistically
significant at the conventional significance level of 0.05.
• In summary, based on these results, there is no strong evidence to suggest a significant linear
relationship between FILM_PRODUCTION and GDP in this dataset.
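The significance test for a Pearson correlation uses the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below applies it to the reported r and n; because r is rounded to three decimals, the result is approximate, and the function name is illustrative.

```python
import math


def correlation_t(r, n):
    """t statistic for testing H0: no linear correlation (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.055, 38  # values reported in the correlation output
t_stat = correlation_t(r, n)
df = n - 2
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

A t statistic this close to zero is consistent with the large p-value of 0.741 reported above.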
F. Assuming a linear relationship between the variables ‘x’ and ‘y’, and normal
distribution, estimate the linear regression equation (i.e., the best-fit line) depicting ‘y’
as a function of ‘x’ for the given sample data (do this by making use of one of the
software packages and annex the software output).
Interpretation
• The t-value of 0.33 suggests that there is not a statistically significant relationship
between "GDP" and "FILM_PRODUCTION".
• The p-value of 0.741 indicates that there is insufficient evidence to reject the null
hypothesis at a typical significance level of 0.05.
• In conclusion, there does not appear to be a statistically significant relationship between
GDP and FILM_PRODUCTION based on these t-test results. The estimated slope implies that a
one-unit increase in GDP is associated with a change of 0.004966 in film production.
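The fitted line can be sketched from the figures already reported. The slope is the value quoted above; the intercept is not shown in the text, so it is reconstructed here from the fact that a least-squares line passes through the point of means. It may differ slightly from the software output because of rounding, and the function name is illustrative.

```python
slope = 0.004966      # change in film production per unit of GDP (reported)
mean_gdp = 5618.026   # reported mean of GDP
mean_film = 268.53    # reported mean of FILM_PRODUCTION

# ASSUMPTION: intercept reconstructed from the means, not read from the output.
intercept = mean_film - slope * mean_gdp


def predict_film_production(gdp):
    """Predicted film production for a given GDP under the fitted line."""
    return intercept + slope * gdp


print(f"FILM_PRODUCTION = {intercept:.2f} + {slope} * GDP")
print(f"Prediction at the mean GDP: {predict_film_production(mean_gdp):.2f}")
```

Because the slope is not statistically significant, predictions from this line are barely better than simply using the mean of film production.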
F2) Interpret the values of the estimated parameters, the R² (coefficient of
determination) and the F-test (or goodness of fit test).
• The graph examines the relationship between GDP and film production,
but the data and the low R-squared value suggest that there is not a
strong linear relationship between the two variables in this particular
dataset.
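In a simple linear regression with one explanatory variable, R² equals the squared Pearson correlation and the F statistic equals R² / (1 - R²) multiplied by (n - 2). The sketch below derives both from the reported r = 0.055 and n = 38; the results are approximate because r is rounded, and they are not taken from the annexed software output.

```python
r, n = 0.055, 38  # reported correlation and sample size

r_squared = r ** 2                                # share of variance explained
f_stat = r_squared / (1 - r_squared) * (n - 2)    # F with (1, n - 2) df

print(f"R-squared = {r_squared:.4f} ({100 * r_squared:.2f}% of variance explained)")
print(f"F = {f_stat:.3f} with (1, {n - 2}) degrees of freedom")
```

On these figures GDP explains well under 1% of the variation in film production, and the F statistic is far too small to reject the hypothesis that the slope is zero.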
Thank You !