Regression Analysis

This document outlines concepts related to hypothesis testing, including the t-distribution, F-distribution, chi-squared distribution, t-test, F-test, chi-squared test, and ANOVA test. It provides examples and explanations of how to perform one-sample and two-sample t-tests, how to calculate and interpret the F-distribution and F-test when comparing variances, and how the chi-squared distribution is used in goodness of fit tests and other chi-squared tests. The document also briefly defines the Likert scale.

Uploaded by

Tewabe
Copyright
© © All Rights Reserved

Addis Ababa University Center for African and Asian Studies

Assignment II
Simple Linear Regression Analysis and Hypothesis Testing

Partial Fulfillment for the Course Advanced Research Methods


(ASCM 701)

Submitted to Dr. Kidist Gebreselassie

Submitted by Maria Mamo, Yared Zekiros and Tewabe Tadesse


Outline

• Meaning, applications and examples of Student’s
t-distribution, F-distribution and chi-squared (χ²)
distribution; the t-test, F-test, χ² test (i.e., the χ²
goodness-of-fit test and other χ² tests), ANOVA test
and Likert scale

• Exercise on Hypothesis Testing


T-DISTRIBUTION
Developed in 1908 by William Sealy Gosset
Resembles the normal distribution: symmetric and
bell-shaped when plotted on a graph, but with heavier tails
Used when the population standard deviation is unknown
and must be estimated from the sample
Greatest number of observations close to the mean
and fewer observations in the tails
It is bell-shaped and uni-modal (For example, a uni-
modal distribution could be a set of test scores
where most students scored around the same value,
resulting in a single peak in the distribution).
T-distribution chart
Cont’d
Its shape depends on the sample size n. As the
sample size n becomes larger, the t-distribution
gets closer to the standard normal distribution
Statistical analysis on some studies which can’t be
done using the normal distribution can be done
using the t-distribution

The t-statistic is typically used when n is less than 30 and the
population standard deviation is unknown


Cont’d
Cont’d
Example (one sample t-test)
• Suppose we have a sample of 10 students, and
we want to test if their average test score is
significantly different from the population
mean. The sample mean is 75, the population
mean is 70, and the sample standard
deviation is 8.
Remember:
Cont’d
Step 1: Calculate the standard error of the
mean The standard error of the mean (SE) is
calculated as the sample standard deviation
divided by the square root of the sample size:
SE = sample standard deviation / √(sample
size) = 8 / √(10) ≈ 2.53
Cont’d

Step 2: Calculate the t-statistic The t-statistic is


calculated as the difference between the
sample mean and the population mean,
divided by the standard error of the mean:
t = (sample mean - population mean) / SE =
(75 - 70) / 2.53 ≈ 1.98
Cont’d

Step 3: Determine the degrees of freedom


The degrees of freedom for a t-test with a
sample size of 10 is 10 - 1 = 9.
Step 4: Find the critical t-value If we want to
test at a 95% confidence level (two-tailed
test), we would need to find the critical t-value
for 9 degrees of freedom. Using a t-
distribution table or calculator, the critical t-
value is approximately ±2.262
Cont’d

Step 5: Make a decision Since our calculated


t-value (1.98) is less than the critical t-value
(2.262), we would fail to reject the null
hypothesis at the 95% confidence level.

 This means that we do not have enough


evidence to conclude that the sample mean is
significantly different from the population
mean.
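The five steps above can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_sample_t` is an illustrative choice, and the critical value 2.262 is copied from the t-table lookup in Step 4 rather than computed.

```python
import math

def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """Return (standard error, t-statistic, degrees of freedom)."""
    se = sample_sd / math.sqrt(n)          # Step 1: standard error of the mean
    t = (sample_mean - pop_mean) / se      # Step 2: t-statistic
    return se, t, n - 1                    # Step 3: df = n - 1

se, t_stat, df = one_sample_t(sample_mean=75, pop_mean=70, sample_sd=8, n=10)

# Step 4: two-tailed critical value for alpha = 0.05 and df = 9,
# taken from a t-table (assumed here, not computed).
T_CRITICAL = 2.262

# Step 5: reject H0 only if |t| exceeds the critical value.
reject_null = abs(t_stat) > T_CRITICAL

print(f"SE = {se:.2f}, t = {t_stat:.2f}, df = {df}, reject H0: {reject_null}")
# prints: SE = 2.53, t = 1.98, df = 9, reject H0: False
```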
Cont’d
Example (two sample t-test)
Let's say we have two samples of data:
Sample 1: 5, 7, 9, 11, 13
Sample 2: 6, 8, 10, 12, 14
We want to calculate the t-statistic for these two
samples using the following formula:
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Where: x̄1 and x̄2 are the means of Sample 1 and
Sample 2, respectively; s1 and s2 are the sample standard
deviations of Sample 1 and Sample 2, respectively; n1
and n2 are the sample sizes of Sample 1 and Sample 2,
respectively
Cont’d
First, let's calculate the means and standard
deviations of the two samples:
Mean of Sample 1 : (x̄1) = (5 + 7 + 9 + 11 + 13) / 5 =
9
Mean of Sample 2 : (x̄2) = (6 + 8 + 10 + 12 + 14) / 5
= 10
Standard deviation of Sample 1 : (s1) = √((1/4) *
((5-9)^2 + (7-9)^2 + (9-9)^2 + (11-9)^2 + (13-9)^2)) = √(40/4) = √10 ≈ 3.16
Standard deviation of Sample 2 : (s2) = √((1/4) *
((6-10)^2 + (8-10)^2 + (10-10)^2 + (12-10)^2 + (14-10)^2))
= √(40/4) = √10 ≈ 3.16
Cont’d

Now, we can plug these values into the t-statistic
formula:
t = (9 - 10) / √((3.16^2 / 5) + (3.16^2 / 5))
t = -1 / √((10 / 5) + (10 / 5))
t = -1 / √(2 + 2)
t = -1 / √4
t = -1 / 2
t = -0.5
So, the t-statistic for these two samples is -0.5.
Cont’d

To find the critical t-value, we need to know the
degrees of freedom and the desired confidence level.
For this example, let's assume a 95% confidence level
and calculate the critical t-value for a two-tailed test.
First, we need to calculate the degrees of freedom
(df) using the formula:
df = (n1 + n2) - 2
In this case, the sample size for both samples is 5, so:
df = (5 + 5) - 2
df = 8
Cont’d
Next, we can find the critical t-value using a t-
distribution table or statistical software. For
a 95% confidence level and 8 degrees of
freedom, the critical t-value is approximately
±2.306.
Since our calculated t-value (-0.5) is
within the range of the critical t-values (-2.306,
2.306), we would fail to reject the null
hypothesis at the 95% confidence level.
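The two-sample calculation can be checked with Python's standard library, whose `statistics.variance` uses the n − 1 divisor for the sample variance. This is a minimal sketch; the function name `two_sample_t` is illustrative, and the critical value 2.306 is the table value quoted above rather than a computed one.

```python
import math
import statistics

sample_1 = [5, 7, 9, 11, 13]
sample_2 = [6, 8, 10, 12, 14]

def two_sample_t(a, b):
    """t-statistic from the formula above, with df = n1 + n2 - 2."""
    var_a = statistics.variance(a)   # sample variance (divisor n - 1)
    var_b = statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    df = len(a) + len(b) - 2
    return t, df

t_stat, df = two_sample_t(sample_1, sample_2)

T_CRITICAL = 2.306  # two-tailed, alpha = 0.05, df = 8 (t-table value, assumed)
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
# prints: t = -0.500, df = 8, reject H0: False
```

Both sample variances equal 10 here, so the standard error is √(10/5 + 10/5) = 2.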
F-DISTRIBUTION
Named in honor of R.A. Fisher who studied it in 1924
Ronald Aylmer Fisher, was a prominent British
statistician and geneticist. In 1924, Fisher introduced
the F-distribution as part of his work on statistical
hypothesis testing and analysis of variance.
Used for comparing the variances of two populations
The F-distribution is either zero or positive, so there
are no negative values for F
Used for smaller sample sizes, where the variance in
the data is unknown
Cont’d
It is skewed to the right and its shape depends on the
degrees of freedom
It is a continuous probability distribution
The F-distribution has two parameters: degrees of
freedom for the numerator and degrees of freedom
for the denominator
The F-distribution is used to test the equality of
variances in two populations
It is also used in the F-test to compare the variances
of two samples
Cont’d

• The F-distribution is commonly used in


statistics to test hypotheses about population
variances and to compare the fits of different
statistical models
• In regression analysis, the F-distribution is
used to test the overall significance of the
model; individual regression coefficients are
usually tested with the t-test
Cont’d

Unlike the t-distribution, which is a more conservative form
of the standard normal distribution (lower probability in the
center, higher probability in the tails), the F-distribution is
not symmetric: it starts at zero and has a long right tail.

F-distribution chart
Cont’d

F Distribution can be used for several types of


applications, including:
• Testing hypotheses about the equality of two
population variances
• Testing the validity of a multiple regression equation
• Comparing the fit of different models in regression
analysis
• Constructing confidence intervals and testing
hypotheses about population variances
• One-factor analysis of variance (ANOVA)
Cont’d
Example
Suppose we want to compare the variances of two
samples, Sample 1 and Sample 2, to determine if
they are significantly different. We can use the F-
distribution to perform this comparison.
Let's assume the following data:
Sample 1 variance (s1^2) = 10
Sample 2 variance (s2^2) = 5
Sample 1 degrees of freedom (df1) = 5
Sample 2 degrees of freedom (df2) = 8
Cont’d
Now, we can calculate the F-statistic as the ratio of
the two sample variances:
$F = \dfrac{s_1^2}{s_2^2}$
F = 10 / 5
F = 2
The degrees of freedom do not enter the ratio, because
each sample variance has already been divided by its own
degrees of freedom. They determine which F-distribution,
F(df1, df2) = F(5, 8), the statistic is compared against.
Cont’d
Conclusion
• The calculated F-statistic is 2.
• We compare this value to the critical F-value
based on the degrees of freedom and the
desired significance level.
• If the calculated F-statistic is greater than the
critical F-value, we would reject the null
hypothesis and conclude that the variances of
the two samples are significantly different.
Cont’d
• If the calculated F-statistic is less than the
critical F-value, we would fail to reject the
null hypothesis, indicating that there is no
significant difference in the variances of the
two samples.
In this case, assuming a significance level of 0.05
(commonly used in statistics), we can refer to an F-
distribution table or use statistical software to
find the critical F-value for degrees of freedom 5
and 8 at a 0.05 significance level.
Cont’d
With a significance level of 0.05, the upper-tail
critical F-value for degrees of freedom 5 and 8 is
approximately 3.69, and the calculated F-value
is 2.
Cont’d
Since the calculated F-value (2) is less than
the critical F-value (3.69), we fail to reject the
null hypothesis.
This indicates that there is no significant
difference in the variances of the two
samples at the 0.05 significance level.
Therefore, based on this analysis, we do not
have enough evidence to conclude that the
variances of the two samples are significantly
different.
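The variance comparison can be sketched as follows, assuming the usual variance-ratio form of the F-test, in which the statistic is simply s1² / s2² and the degrees of freedom only select the reference distribution. The function name is illustrative, and the critical value 3.69 is the upper 5% point of F(5, 8) read from a standard F-table, not computed here.

```python
def variance_ratio_f(var_1, var_2):
    """F-statistic for comparing two sample variances: F = s1^2 / s2^2.

    The degrees of freedom are not part of the ratio; they only decide
    which F-distribution the statistic is compared against.
    """
    return var_1 / var_2

f_stat = variance_ratio_f(var_1=10, var_2=5)

# Upper-tail critical value of F(5, 8) at alpha = 0.05, from an F-table (assumed).
F_CRITICAL = 3.69
reject_null = f_stat > F_CRITICAL

print(f"F = {f_stat:.2f}, critical F = {F_CRITICAL}, reject H0: {reject_null}")
# prints: F = 2.00, critical F = 3.69, reject H0: False
```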
χ²-DISTRIBUTION
Takes only non-negative values
Skewed to the right
Specified by giving its degrees of freedom

Chi-square chart
T-TEST
Compares the means of data gathered from two similar or
different groups, or one sample mean with a hypothesized value
T-test is applicable for a smaller sample size
Only valid and should be done when the means of at most
two categories or groups need to be compared; three or
more groups call for ANOVA
Assumptions of T-Test
• The data are measured on a continuous or ordinal scale.
• The data are collected by random sampling from the population of
interest, and the observations are independent of one another.
• When the data are plotted, they should follow an approximately normal
distribution and produce a bell-shaped curve.
• The bigger the sample size, the closer the distribution of the sample mean is to
a bell curve.
• The variances should be homogeneous, i.e., the standard deviations of the
samples are almost equal.
• There should be no extreme outliers in the differences.
Example – One Sample T-Test
• A claim is made that the average number of days a
person spends on vacation is more than 5 days
(hypothesized population mean). The claim is tested on a
sample of 16 people whose mean came out to be 9 days.

– Null Hypothesis (H0): The average number of days a


person spends on vacation is equal to 5 days.
Mathematically, H0: μ=5.
– Alternative Hypothesis (Ha): The average number of
days a person spends on vacation is more than 5
days. Mathematically, Ha: μ>5.
• A sample of 16 persons is taken. The mean number of days spent on vacation by the
persons in the sample is 9 days, with a sample standard deviation of 3 days. The test
uses a 95% confidence level.
• Formula: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
x̄ = 9, μ = 5, s = 3, n = 16
t = (9 - 5) / (3 / √16) = 4 / 0.75 = 5.33

• The critical t-value for a one-tailed test with n - 1 = 16 - 1 = 15
degrees of freedom at the alpha level of 0.05 is 1.753.

• Conclusion: Since the calculated t-value (5.33) is greater than the
critical t-value (1.753), there is sufficient evidence to reject the null
hypothesis at the 0.05 significance level.

• Interpretation: There is a statistically significant difference between the
sample mean and the hypothesized population mean, and the sample provides
enough evidence to support the claim that the average number of days a
person spends on vacation is more than 5 days.
F-TEST
• The F-test is a statistical test that is used to
compare the variances of two or more groups
or samples. It is based on the F-distribution,
which is a probability distribution that arises
when comparing the variability between
groups to the variability within groups
Cont’d
Purposes of F-Test
• Testing equality of variances: When
comparing two or more groups, the F-test can
be used to determine if their variances are
statistically equal. This is useful, for example,
in assessing whether different treatments or
interventions have similar levels of variability.
Cont’d
• Comparing means: In certain situations, such
as in analysis of variance (ANOVA), the F-test
can be utilized to compare means across
multiple groups.
This involves using a "between-groups"
estimate of variance and a "within-groups"
estimate of variance to calculate an F-statistic.
Chi Square-TEST
• Used to determine whether the association between two
qualitative variables is statistically significant.

Example
• A survey was conducted among randomly selected individuals in a
shopping mall to determine if educational attainment is related to
gender.
• First organize the data file into
cross-tabulation of the two
qualitative (nominal) variables to
obtain the frequencies for each
category, which can be done using
statistical software, especially for a
very large sample.
• Formulate the hypotheses
Null Hypothesis:
– H0: There is no significant association between gender and education
level.
Alternative Hypothesis:
– Ha: There is a significant association between gender and education
level.

• Specify the expected values for each cell of the table (when the null
hypothesis is true)
The expected values specify what the values of each cell of the table
would be if there was no association between the two variables.
• To see if the data give convincing evidence against the null hypothesis,
compare the observed counts from the sample with the expected counts,
assuming H0 is true.
Statistical software such as SPSS, Datatab etc…will compute both the
expected and observed counts for each cell when conducting a chi-square
test.

• Compute the chi-square test statistic.
Chi-Square Test – Test Statistic
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O is the observed count and E is the
expected count in each cell of the table.

If these values are entered into the formula for the chi-square test statistic, the
value obtained is 0.504.

• Decide if chi-square is statistically significant
– The final step of the chi-square test of significance is to determine if the value
of the chi-square test statistic is large enough to reject the null hypothesis.
– The significance level is 5% (α = 0.05).
– Statistical software makes this determination much easier.
Result
Chi2      0.504
df        3
p-value   0.918
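Interpretation: A chi-square test was performed between gender and education level.
No expected cell frequencies were less than 5. Because the p-value (0.918) is greater
than 0.05, we fail to reject the null hypothesis: there is no statistically significant
association between gender and education level.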
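The expected counts and the test statistic can be computed by hand from any cross-tabulation. The sketch below is a minimal illustration of the χ² = Σ (O − E)² / E calculation. The survey's cross-tabulation is not reproduced in these slides, so the counts used here are hypothetical and will not give 0.504; only the shape (2 genders × 4 education levels, hence df = 3) is chosen to match the result table.

```python
def chi_square_independence(observed):
    """Chi-square test of independence for a table given as a list of rows.

    Returns (test statistic, degrees of freedom, expected counts).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under H0 (no association): row total * column total / N
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# HYPOTHETICAL counts, not the survey data:
# rows = gender, columns = four education levels.
hypothetical_counts = [
    [20, 30, 25, 25],
    [22, 28, 27, 23],
]

statistic, df, expected = chi_square_independence(hypothetical_counts)
print(f"chi-square = {statistic:.3f}, df = {df}")
print("all expected counts >= 5:", all(e >= 5 for row in expected for e in row))
```

The statistic is then compared with the critical value for the given degrees of freedom, or converted to a p-value by statistical software.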
ANOVA TEST
• Used to analyze whether there are statistically significant
differences among the means of three or more groups.
• Often used to compare means across different levels of a
categorical variable.
• It cannot tell you which specific groups were statistically
significantly different from each other, only that at least two of
the groups were.
• Example: customer satisfaction ratings by level of employee
training (beginner, intermediate and advanced).
– Null hypothesis: the mean rating is the same for all employee categories
– Alternative hypothesis: the mean rating differs for at least one employee
category
Types of ANOVA Test
• One-way ANOVA– testing differences between three or more groups
based on one independent variable.
• Example, comparing the sales performance of different stores in a retail
chain.
• Two-way ANOVA: two independent variables, Example,
impact of both advertising spend and product placement on sales revenue.
• Factorial ANOVA: more than two independent variables.
Example, a business might examine the
combined effects of age, income and education level on consumer
purchasing habits.
• Welch’s F-test ANOVA: Used when the assumption of equal variances is not
met. Example, a company might use it
to compare the job satisfaction levels of employees in different departments,
where each department has a different variance in job satisfaction scores.
Assumptions of ANOVA Test
 Normality: The first assumption is that the groups each fall
into what is called a normal distribution. This means that the
groups should have a bell-curve distribution with few or no
outliers.
 Homogeneity of variance: Also known as homoscedasticity,
this means that the variance within each group is the same
across groups.
 Independence: The final assumption is that each value is
independent from each other. This means, for example, that
unlike a conjoint analysis the same person shouldn’t be
measured multiple times.
Example of ANOVA Test
Let's consider a simple example to demonstrate the calculations for a one-
way ANOVA by hand. Suppose we have three groups of participants, each
following a different workout program, and we want to compare their
average weight loss. The data is as follows:
• Group A: 10, 12, 15, 11, 13 (Sample size = 5)
• Group B: 8, 9, 11, 10, 12 (Sample size = 5)
• Group C: 6, 7, 9, 8, 10 (Sample size = 5)
Step 1: Calculate the mean for each group
• Mean of Group A = (10 + 12 + 15 + 11 + 13) / 5 = 12.2
• Mean of Group B = (8 + 9 + 11 + 10 + 12) / 5 = 10
• Mean of Group C = (6 + 7 + 9 + 8 + 10) / 5 = 8
Step 2: Calculate the overall mean (Grand Mean)
•Grand Mean = (12.2 + 10 + 8) / 3 = 10.07
Step 3: Calculate the Sum of Squares Total (SST)
– SST = (10-10.07)^2 + (12-10.07)^2 + (15-10.07)^2 + (11-10.07)^2 + (13-10.07)^2 +
(8-10.07)^2 + (9-10.07)^2 + (11-10.07)^2 + (10-10.07)^2 + (12-10.07)^2 + (6-10.07)^2 +
(7-10.07)^2 + (9-10.07)^2 + (8-10.07)^2 + (10-10.07)^2 = 78.93
Step 4: Calculate the Sum of Squares Between (SSB)
•SSB = 5 * (12.2 - 10.07)^2 + 5 * (10 - 10.07)^2 + 5 * (8 - 10.07)^2 = 44.13
Step 5: Calculate the Sum of Squares Within (SSW)
– SSW = (10-12.2)^2 + (12-12.2)^2 + (15-12.2)^2 + (11-12.2)^2 + (13-12.2)^2 +
(8-10)^2 + (9-10)^2 + (11-10)^2 + (10-10)^2 + (12-10)^2 + (6-8)^2 + (7-8)^2 + (9-8)^2 +
(8-8)^2 + (10-8)^2 = 14.8 + 10 + 10 = 34.8
– Check: SSB + SSW = 44.13 + 34.8 = 78.93 = SST
Step 6: Calculate the Degrees of Freedom
• Degrees of Freedom (df) between = k - 1 = 3 - 1 = 2
• Degrees of Freedom (df) within = N - k = 15 - 3 = 12
• Degrees of Freedom (df) total = N - 1 = 15 - 1 = 14
Step 7: Calculate the Mean Squares
• Mean Square (MS) between = SSB / df between = 44.13 / 2 = 22.07
• Mean Square (MS) within = SSW / df within = 34.8 / 12 = 2.9
Step 8: Calculate the F-Statistic
•F = MS between / MS within = 22.07 / 2.9 = 7.61
Step 9: Compare to Critical Value
• We compare the calculated F-value to the critical F-value for the
chosen significance level and degrees of freedom.
• In this example, the critical F-value for 2 and 12 degrees of freedom at the
0.05 significance level is approximately 3.89. Since the calculated F-statistic
(7.61) is greater than 3.89, we reject the null hypothesis: the mean weight
loss differs significantly between at least two of the workout programs.
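The hand calculation can be verified with a short script. This is a minimal sketch of a one-way ANOVA using only the standard library; the function name `one_way_anova` and the dictionary keys are illustrative, and the critical value 3.89 for F(2, 12) at α = 0.05 is taken from an F-table rather than computed.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return {"sst": sst, "ssb": ssb, "ssw": ssw,
            "df_between": df_between, "df_within": df_within, "f": f_stat}

group_a = [10, 12, 15, 11, 13]
group_b = [8, 9, 11, 10, 12]
group_c = [6, 7, 9, 8, 10]

result = one_way_anova([group_a, group_b, group_c])

F_CRITICAL = 3.89  # F(2, 12) at alpha = 0.05, from an F-table (assumed)
print(f"SSB = {result['ssb']:.2f}, SSW = {result['ssw']:.2f}, F = {result['f']:.2f}")
# prints: SSB = 44.13, SSW = 34.80, F = 7.61
print("reject H0:", result["f"] > F_CRITICAL)
# prints: reject H0: True
```

The identity SST = SSB + SSW is a useful check on hand calculations: if the three sums of squares do not add up, at least one of them is wrong.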
LIKERT SCALE
• A psychometric response scale
• Used in questionnaires to obtain participant’s preferences or
degree of agreement with a statement or set of statements.
• Indicating level of agreement with a given statement by way
of an ordinal scale
• Used to measure peoples’ attitudes, opinions, or perceptions.
• Used in psychology, sociology, education, marketing research
etc.
• Examples: customer satisfaction, public opinion research,
brand affinity, political beliefs etc.
Types of LIKERT SCALE
By question:
1. Agree to Disagree Likert Scale: Strongly Disagree/Disagree/Neither
agree nor disagree/Agree/Strongly Agree
2. Satisfaction Likert Scale: Very dissatisfied/Somewhat
dissatisfied/Neither dissatisfied or satisfied/Somewhat satisfied/Very
satisfied
3. Likelihood Likert Scale: Very unlikely/Somewhat unlikely/Neither likely
nor unlikely/Somewhat likely/Very likely
4. Good to bad Likert Scale: Very poor/Poor/Average/Good/Excellent
5. Frequency Likert Scale: Never/Rarely/Sometimes/Often/Always

By number:
6. Even Likert Scale: 4 and 8 point scales
7. Odd Likert Scale: 5, 7 and 9 point scales
Example of LIKERT SCALE
• A bank wants to know how satisfied customers are with its newly introduced
ATM machine. It administered the following questionnaire to 100 users of
the new ATM and compared the results with their satisfaction with
the machine it replaced. Customer ratings of the old machine, using a
5-point Likert scale question, were 35% very poor and poor, 50%
average and 15% good and excellent.
• Using the same tool, the survey found the result in the following table.

Question                                 1           2      3         4      5
                                         Very poor   Poor   Average   Good   Excellent
How would you rate the service of the   9           12     42        30     7
new ATM machine?
Total %                                  9           12     42        30     7
Cont.
Conclusion: The finding of the survey shows that out of the 100
customers 9% rated the service of the new ATM as very poor, 12%
as poor, 42% as average, 30% as good and 7% as excellent.

The proportion of customers who rated the new machine as
generally poor is lower (21%) than for the old machine (35%), and
fewer customers rated it as merely average (42% vs 50%).
Correspondingly, 37% rated the new machine as good or
excellent compared to 15% for the old machine.

Interpretation: A larger portion of the customers found the new


machine to perform better than the old one. Therefore, it was a
good decision to replace the old machine with the new one.
Exercise on Hypothesis Testing
•Drawing scatterplots, estimating the best-fit line, testing hypothesis and
making predictions
A. Take a sample data of at least 30 observations for two related variables,
‘Y’ and ‘x’ (Y is a dependent variable and ‘x’ is a potential explanatory
variable). The data could be secondary or primary. Indicate which
variable is dependent and which one is explanatory.
B. State the source of your idea for the potential relationship between ‘x’
and ‘y’ and cite relevant sources.
C. Present the data in tabular form, including the units of measurement for
each variable, and state the type of data (as cross-sectional, time series,
pooled), and the source of data.
Cont’d
• Cross-sectional data for the year 2019 on gross domestic product (GDP)
per capita and film production of 38 countries, published by the United Nations
Educational, Scientific and Cultural Organization (UNESCO), were used.
• UNESCO. The African Film Industry: Trends, Challenges and Opportunities for Growth.
Published in 2021. UNESCO
• N.B. The total number of films produced is 10,204
• GDP AND FILM PRODUCTION OF SELECTED AFRICAN COUNTERIES IN 2019.docx
• The selection of dependent and independent variables is based on the Keynesian
economic theory of consumption and income.
• The theory states that income positively affects consumption. In
other words, greater income in the hands of people leads to increased
consumption, even if the increase in consumption is smaller than the increase in income.
• Therefore, film production, considered as a proxy indicator of film consumption or
general consumption, is the dependent variable, whereas GDP per capita
is taken as a proxy indicator for income and is the explanatory variable.
D. Plot the data onto a scatterplot. What does the pattern of points suggest to you
about the nature of the relationship between the two variables?

[Scatterplot: Film Production (vertical axis) against GDP (horizontal axis)]
• The data shows that the 38 countries have produced a total of X films.
• Most of the data points are clustered near the origin, indicating that there are many
countries with low GDP and low film production.
• There are a few scattered points extending out towards higher GDP and higher film
production values.
• This graph implies that there is some positive relationship between the two variables.
E. Calculate the mean, median, range, standard deviation, standardized inter-quartile
deviation, correlation coefficient, and coefficient of variation for the given data on the
dependent and explanatory variables and interpret the results. Does the sign of the
estimated correlation coefficient confirm the pattern of relationship depicted under ‘d’?

Descriptive Statistics

Mean

N Mean
FILM_PRODUCTION 38 268.53
GDP 38 5618.026
Valid N (listwise) 38

• These mean values provide a central tendency measure for the


FILM_PRODUCTION and GDP variables in the dataset.
• The mean FILM_PRODUCTION value of 268.53 suggests the average level
of film production, while the mean GDP value of 5618.026 represents the
average level of GDP across the observations.
Median
Descriptive Statistics
Median
                 FILM_PRODUCTION   GDP
N     Valid      38                38
      Missing    0                 0
Median           58.50             3081.500

• The median FILM_PRODUCTION value is 58.50, indicating that half of the


observations have a FILM_PRODUCTION value of 58.50 or lower, and the
other half have a value of 58.50 or higher.
• Similarly, the median GDP value is 3,081.500, suggesting that half of the
observations have a GDP value of 3081.500 or lower, and the other half
have a value of 3,081.500 or higher.
Range

Descriptive Statistics
Range

N Range
FILM_PRODUCTION 38 2589
GDP 38 28742.0
Valid N (listwise) 38

• The range for FILM_PRODUCTION is 2,589, indicating the difference between the
highest and lowest values in the dataset for FILM_PRODUCTION.
• The range for GDP is 28,742.0, representing the difference between the highest
and lowest values in the GDP dataset.
Standard deviation
Descriptive Statistics
Standard deviation

N Std. Deviation
FILM_PRODUCTION 38 543.125
GDP 38 6049.4112
Valid N (listwise) 38

•The standard deviation for FILM_PRODUCTION is 543.125, indicating the average


amount of variation or dispersion of the FILM_PRODUCTION values around the mean.
A higher standard deviation suggests that the values are more spread out from the
mean.
•The standard deviation for GDP is 6,049.4112 which suggests the average amount of
variation or dispersion of the GDP values around the mean.
Standard inter-quartile deviation
Descriptive Statistics
Standard inter-quartile deviation

FILM_PRODUCTION GDP
N Valid 38 38
Missing 0 0
Percentiles 25 28.75 1700.000
50 58.50 3081.500
75 188.75 7808.250

•The IQR provides a measure of the spread of the middle 50% of the data.
A larger IQR indicates a greater spread of values within that middle 50%.

•The IQR for FILM_PRODUCTION is the difference between Q3 and Q1:
IQR = Q3 - Q1 = 188.75 - 28.75 = 160

• The IQR for GDP is the difference between Q3 and Q1:
IQR = Q3 - Q1 = 7808.250 - 1700.000 = 6108.250
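Task E also asks for the coefficient of variation, which follows directly from the reported means and standard deviations as CV = standard deviation / mean. The sketch below recomputes the IQR and the CV from the summary values in the SPSS tables above; the dictionary layout and function name are illustrative, and no raw data is used.

```python
# Summary values copied from the descriptive statistics tables above.
summary = {
    "FILM_PRODUCTION": {"mean": 268.53, "sd": 543.125, "q1": 28.75, "q3": 188.75},
    "GDP": {"mean": 5618.026, "sd": 6049.4112, "q1": 1700.0, "q3": 7808.25},
}

def spread_measures(stats):
    """Return the inter-quartile range and the coefficient of variation."""
    iqr = stats["q3"] - stats["q1"]
    cv = stats["sd"] / stats["mean"]   # often reported as a percentage
    return iqr, cv

for name, stats in summary.items():
    iqr, cv = spread_measures(stats)
    print(f"{name}: IQR = {iqr:.2f}, CV = {cv:.1%}")
```

A coefficient of variation above 100%, as for both variables here, means the standard deviation exceeds the mean. Together with medians far below the means, this points to strongly right-skewed data dominated by a few large values.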
Correlations
Correlations

FILM_PRODUCTION GDP
FILM_PRODUCTION Pearson Correlation 1 .055

Sig. (2-tailed) .741

N 38 38

GDP Pearson Correlation .055 1

Sig. (2-tailed) .741

N 38 38

• The correlation between FILM_PRODUCTION and GDP is 0.055. The p-value for this correlation
is 0.741.
• A correlation of 0.055 suggests a very weak positive relationship between FILM_PRODUCTION
and GDP. Additionally, the p-value of 0.741 indicates that this correlation is not statistically
significant at the conventional significance level of 0.05.
• In summary, based on these results, there is no strong evidence to suggest a significant linear
relationship between FILM_PRODUCTION and GDP in this dataset.
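The significance of a Pearson correlation can also be checked by hand with the statistic t = r√(n − 2) / √(1 − r²), which has n − 2 degrees of freedom. This is a minimal sketch; the function name is illustrative, and the two-tailed critical value of about 2.028 for 36 degrees of freedom is a t-table value, not computed.

```python
import math

def correlation_t(r, n):
    """t-statistic and degrees of freedom for testing H0: no linear correlation."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

t_stat, df = correlation_t(r=0.055, n=38)

T_CRITICAL = 2.028  # two-tailed, alpha = 0.05, df = 36 (t-table value, assumed)
significant = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, df = {df}, significant: {significant}")
# prints: t = 0.33, df = 36, significant: False
```

In a simple regression with one explanatory variable, this t-value is the same as the t-value of the slope coefficient.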
F. Assuming a linear relationship between the variables ‘x’ and ‘y’, and normal
distribution, estimate the linear regression equation (i.e., the best-fit line) depicting ‘y’
as a function of ‘x’ for the given sample data (do this by making use of one of the
software packages and annex the software output).

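. regress FILM_PRODUCTION GDP

      Source |       SS           df       MS      Number of obs   =        38
-------------+----------------------------------   F(1, 36)        =      0.11
       Model |  33400.4894         1  33400.4894   Prob > F        =    0.7415
    Residual |    10881029        36  302250.805   R-squared       =    0.0031
-------------+----------------------------------   Adj R-squared   =   -0.0246
       Total |  10914429.5        37   294984.58   Root MSE        =    549.77

------------------------------------------------------------------------------
FILM_PRODU~N | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         GDP |   .0049666   .0149407     0.33   0.741    -.0253344    .0352677
       _cons |   240.6236    122.472     1.96   0.057    -7.761118    489.0084
------------------------------------------------------------------------------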
Linear Regression
F 1. State the equation
$Y = \beta_0 + \beta_1 X_1$
Where Y is the dependent variable (FILM_PRODUCTION)
β0 is the constant
β1 is the coefficient of GDP
X1 is the GDP
From the regression output above, the regression equation is
Y = 240.6236 + 0.0049666 GDP
Interpretation
• The coefficient for "GDP" is 0.0049666, indicating that for every one unit
increase in GDP, there is a predicted increase of approximately 0.0049666
units in FILM_PRODUCTION. However, since the p-value associated with this
coefficient is 0.741 (P>|t|), which is greater than the typical significance
level of 0.05, we fail to reject the null hypothesis that there is no
relationship between GDP and FILM_PRODUCTION.
• The constant term, represented by "_cons", has a coefficient of 240.6236
with a p-value of 0.057. This suggests that when GDP is zero (or very close to
zero), the estimated mean value of FILM_PRODUCTION would be around 240.6236.
• The R-squared value for this model is low at 0.0031, indicating that only
about 0.31% of the variability in FILM_PRODUCTION can be explained by changes
in GDP.
T-Test
• The t-test in this regression output is used to test the null hypothesis that the coefficient
for "GDP" is equal to zero. Here are the relevant values for the t-test:
The coefficient for "GDP" is 0.0049666.
The standard error for this coefficient is 0.0149407.
The t-value, which measures the number of standard errors the coefficient is away
from zero, is calculated as (coefficient / standard error).
In this case, 0.0049666 / 0.0149407 is approximately 0.33.

• The p-value associated with this t-value, labeled as P>|t|, is 0.741.

Interpretation
• The t-value of 0.33 suggests that there is not a statistically significant relationship
between "GDP" and "FILM_PRODUCTION".
• The p-value of 0.741 indicates that there is insufficient evidence to reject the null
hypothesis at a typical significance level of 0.05.
• In conclusion, there does not appear to be a statistically significant relationship between
GDP and FILM_PRODUCTION based on these results from the t-test. The estimated change of
0.0049666 in film production per unit change of GDP is not statistically distinguishable from zero.
F2) Interpret the values of the estimated parameters, the R2 (coefficient of
determination) and the F-test (or goodness of fit test).

• F 2. R2 is 0.0031, which means that only about 0.31% of the variation in the
dependent variable, film production, is explained by the independent variable, GDP.
• Prob > F is 0.7415, which is greater than the assumed significance level of 0.05.
Thus, the F-test fails to reject the null hypothesis that the model has no
explanatory power: the model does not fit the data well.
F3) Test the hypothesis that the variable ‘x’ affects ‘y’ (state the null
hypothesis, test the hypothesis, indicate your decision and discuss the
results) by using the critical value (t-test) and the p-value approaches.

• Test of hypothesis based on P-value approach

Hypothesis: H0 = There is no relationship between film production and GDP
Assumption of test of the hypothesis: P < 0.05 = statistically significant;
P > 0.05 = statistically not significant
Result: P = 0.741, which is greater than 0.05
Decision: The null hypothesis is not rejected because the P value is
greater than 0.05. Thus, there is no evidence for the alternative hypothesis.

Test of Hypothesis based on Critical Value T-Test

• The calculated t value in the above table is 0.33 and the two-tailed critical t value from
the t distribution table is about 2.028 at 36 degrees of freedom (n - 2 = 38 - 2) and the
0.05 significance level.
• The calculated t value is less than the critical t value. Thus, the coefficient is not
statistically significant.
The null hypothesis is not rejected, so the data provide no evidence that GDP affects
film production.
F4) test the hypothesis that the intercept is significantly different from zero (state
the null hypothesis, test the hypothesis, indicate your decision and discuss the
results) by using the critical value (t-test) and the p-value approaches.

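• The y-intercept of the given regression equation above, y = 240.6236 + 0.0049666 X1,
is the value of y when X1 is 0. Therefore, the y-intercept is 240.6236.
• Null hypothesis: the intercept is equal to zero (H0: β0 = 0).
• P-value approach: the p-value of the intercept is 0.057, which is greater than 0.05.
• Critical value approach: the calculated t value of 1.96 is less than the two-tailed
critical t value of about 2.028 at 36 degrees of freedom.
• Decision: the null hypothesis is not rejected. The intercept is not significantly
different from zero at the 0.05 level, although it would be at the 0.10 level.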
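Both critical-value tests can be reproduced from the coefficients and standard errors in the regression output. This is a minimal sketch: the names are illustrative, and the two-tailed critical value of about 2.028 for 36 degrees of freedom (n − 2 = 38 − 2) is assumed from a t-table rather than computed.

```python
T_CRITICAL = 2.028  # two-tailed, alpha = 0.05, df = 36 (t-table value, assumed)

# (coefficient, standard error) pairs copied from the regression output
estimates = {
    "GDP": (0.0049666, 0.0149407),
    "_cons": (240.6236, 122.472),
}

def t_test(coef, se, t_critical=T_CRITICAL):
    """Return the t-ratio, the 95% confidence interval and the decision."""
    t = coef / se
    ci = (coef - t_critical * se, coef + t_critical * se)
    return t, ci, abs(t) > t_critical

results = {name: t_test(coef, se) for name, (coef, se) in estimates.items()}

for name, (t, ci, reject) in results.items():
    print(f"{name}: t = {t:.2f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f}), reject H0: {reject}")
```

Both confidence intervals contain zero, which is the same decision as |t| < 2.028: neither the slope nor the intercept is significantly different from zero at the 0.05 level.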

N Minimum Maximum Mean Std. Deviation

FILM_PRODUCTION 38 10 2599 268.53 543.125

GDP 38 314.0 29056.0 5618.026 6049.4112


F5) Use the regression line to make predictions around the mean, the maximum, and
the minimum values of ‘x’ and interpret your results; and explain any deviations of
estimated values from raw data values.
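The predictions asked for in F5 follow directly from the fitted equation Y = 240.6236 + 0.0049666 GDP. The sketch below evaluates it at the minimum, mean and maximum GDP values from the descriptive table above; the function and variable names are illustrative.

```python
INTERCEPT = 240.6236   # _cons from the regression output
SLOPE = 0.0049666      # GDP coefficient from the regression output

def predict_film_production(gdp):
    """Predicted film production for a given GDP value."""
    return INTERCEPT + SLOPE * gdp

# Minimum, mean and maximum GDP from the descriptive statistics table
gdp_points = {"minimum": 314.0, "mean": 5618.026, "maximum": 29056.0}

predictions = {label: predict_film_production(x) for label, x in gdp_points.items()}

for label, y_hat in predictions.items():
    print(f"GDP at {label} ({gdp_points[label]}): predicted film production = {y_hat:.1f}")
```

The prediction at the mean GDP reproduces the mean film production (about 268.5), as it must for a least-squares line. Across the whole GDP range the predictions move only from about 242 to about 385 films, while the observed values range from 10 to 2,599. This large gap between predicted and raw values is what an R-squared of 0.0031 implies: GDP explains almost none of the variation in film production.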

• The figure is a histogram overlaid with a curve,
showing the distribution of the regression
standardized residuals for the
dependent variable
"FILM_PRODUCTION". This is a
common type of statistical graph
used to visualize the distribution of a
dataset and check for normality in
the residuals of a regression analysis.
• On the x-axis, there are regression
standardized residual values, ranging
from approximately -3 to 4. The y-
axis represents the frequency of
these residuals.
Cont.
• The most notable feature is a tall blue bar at around the 0 mark on the x-
axis, which indicates the highest frequency of residuals is around the
mean of the dataset. This bar reaches a frequency of almost 30. The rest
of the bars are much shorter, indicating lower frequencies for other
residual values. The distribution of the bars suggests that the majority of
data points are close to the mean, with fewer cases having high positive
or negative residuals.
• The curve overlaying the bars is smooth and seems to be a fitted normal
distribution curve, suggesting that the residuals might be approximately
normally distributed, which is an assumption in many regression analyses.
However, the actual residuals appear to be slightly right-skewed, as there
are more bars on the right side of the peak than the left.
Regression line

• The graph is attempting to analyze the correlation between GDP and film
production, but the data and the low R-squared value suggest that there
may not be a strong linear correlation between the two variables based
on this particular dataset.
Thank You !
