Final Summit Homework

Marjorie LeMer, a fashion designer, is evaluating suppliers for cloth based on flaw rates, with BlueTex showing fewer flaws than Southern Halifax. Statistical analysis reveals a z-value of -1.94 and a p-value of 0.0262, leading to a decision to reject the null hypothesis at a 0.05 significance level, but not at 0.01. Ultimately, unless Marjorie is highly confident in the quality difference, she may opt for the cheaper Southern Halifax product.

Uploaded by

dungrongntd

159-1 challenge

Upscale fashion designer Marjorie LeMer must decide from which supplier she should purchase
bolts of cloth. Rumor has it that BlueTex’s product is superior to Southern Halifax’s.

159-2 challenge
Random 10-yard sections from 43 bolts of Halifax’s cloth contain a mean of 1.8 flaws per yard.
Similar sections from 42 bolts of BlueTex’s product contain a mean of 1.6 flaws per yard. The
standard deviations are 0.3 and 0.6, respectively.

159-3 challenge
Marjorie wants you to find out if the rumors that BlueTex makes a better product are statistically
warranted.
159-4 challenge
You conduct a one-sided test. Which of the following is the best alternative hypothesis?
◎There is no difference in the quality of the cloth.
◎There are fewer flaws per yard in Halifax’s cloth than in BlueTex’s.
◎There are more flaws per yard in Halifax’s cloth than in BlueTex’s.

159-5 challenge
At what level are these data significant?
◎Significance level 0.05
◎Significance level 0.01
◎Both at significance level 0.05 and significance level 0.01
◎Neither at significance level 0.05 nor at significance level 0.01

159-6 challenge
You find the z-value for the difference between the two sample means using the appropriate
formula. The z-value is -1.94.
159-7 challenge
The cumulative probability for z=-1.94 is 0.0262. This is the left-tail probability.
Since you are running a one-sided test, 0.0262 is your p-value.

159-8 challenge
0.0262 is greater than 0.01. At this significance level, you would not reject the null hypothesis.
0.0262 is less than 0.05. At this significance level, you would reject the null hypothesis.
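
The z-value and both decisions above can be reproduced with a short sketch. It assumes the usual two-sample z formula for independent samples (the text only says "the appropriate formula") and takes BlueTex minus Halifax as the difference. The function names are illustrative. Because z is not rounded to -1.94 before the lookup, the computed p-value differs slightly from the table value of 0.0262.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative (left-tail) probability via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """z-statistic for the difference (mean_a - mean_b) of two independent samples."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# BlueTex: 42 bolts, mean 1.6 flaws/yard, sd 0.6
# Southern Halifax: 43 bolts, mean 1.8 flaws/yard, sd 0.3
z = two_sample_z(1.6, 0.6, 42, 1.8, 0.3, 43)
p_value = normal_cdf(z)  # one-sided test, so the left tail is the p-value

print(f"z = {z:.2f}")
for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "do not reject"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```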
159-9 challenge
Southern Halifax’s product is slightly cheaper than BlueTex’s. All other factors being equal,
Marjorie would like to buy the less expensive product. Unless she is 99% confident that there is
a difference in quality, she will go with the cheaper cloth.

159-10 challenge
Based on this information and your calculations, Marjorie should:
◎Buy cloth from Southern Halifax Textile
◎Buy cloth from BlueTex

169-1 exercise 1
Per-capita consumption of soft drink beverages is related to per-capita gross domestic product
(GDP). Generally the higher the GDP of a country, the more soda its citizens consume. Soft
drink consumption is measured in number of 8- oz servings.

169-2 exercise 1
Based on data from 12 countries, the relationship can be expressed mathematically as:
(Per Capita Soft Drink Consumption) =130 + 0.018 * Per Capita GDP
169-3 exercise 1
Based on this relationship, you can expect that, on average, for each additional $1,000 of per-
capita GDP a country’s soda consumption increases by:
◎180 servings,
◎148 servings,
◎130 servings,
◎18 servings.
(Per Capita Soft Drink Consumption) =130 + 0.018 * Per Capita GDP

169-4 exercise 1
The regression equation tells us that in our data set, average soda consumption increases by
0.018 servings for every additional $1 of per-capita GDP. So, for an additional $1,000, average
consumption increases by (1000)(0.018 servings/$) = 18 servings.
$1,000 → 18 servings

169-5 exercise 1
The per-capita GDP in the Netherlands is $25,034. What do you predict is the average number of
servings of soda consumed in the Netherlands per year?
Enter predicted average soda consumption (in servings) as an integer (e.g., 5). Round if
necessary.

169-6 exercise 1
The regression equation tells us that average soda consumption = 130 + 0.018*(per-capita GDP).
Therefore, we anticipate the Netherlands’ average soda consumption to be 580.6 servings:
(Per Capita Soft Drink Consumption) = 130 + 0.018(25,034) = 580.6 servings

169-7 exercise 1
Although the regression predicts a soda consumption of around 581 servings per person for the
Netherlands, the actual measured number of servings consumed is much lower: 362. The
discrepancy in the actual and predicted consumption reinforces that per-capita GDP alone is not
a perfect predictor of soda consumption.
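
The arithmetic in this exercise can be checked with a minimal sketch. The coefficients and the Netherlands figures come from the text; the function and variable names are illustrative.

```python
INTERCEPT = 130.0  # servings per person per year when per-capita GDP is zero
SLOPE = 0.018      # additional servings per additional $1 of per-capita GDP

def predict_servings(per_capita_gdp: float) -> float:
    """Predicted per-capita soft drink consumption from the fitted line."""
    return INTERCEPT + SLOPE * per_capita_gdp

increase_per_1000 = SLOPE * 1000                 # effect of an extra $1,000 of GDP
netherlands_predicted = predict_servings(25_034)
netherlands_actual = 362
residual = netherlands_actual - netherlands_predicted  # actual minus predicted

print(round(increase_per_1000), round(netherlands_predicted), round(residual))
```

The large negative residual is the gap between the measured and predicted consumption that the text describes.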
189-1
Exercise 1:
You have been asked to examine the relationship between two important macroeconomic
quantities: the change in business inventories and factory capacity utilization levels.

189-2
If you wish to learn what percent of the variation in changes in inventories is explained by
capacity utilization levels, which variable should you choose as your independent variable?
 Change in business inventories
 Capacity utilization

189-3
Click here to access US economic data from the years 1971-1986. Run the regression with the
change in business inventories as the dependent variable and the capacity utilization as the
independent variable.

189-4
Using the regression outputs, find the slope of the regression line. Enter the slope as a decimal
number with 2 digits to the right of the decimal point (e.g., enter “5” as “5.00”). Round if
necessary.

190-1
Exercise 2:
Greta John is the human resources manager at the software consulting firm Clever Solutions.
Recently, some of the programmers have been restless: the senior programmers feel that length
of service and loyalty have not been rewarded in their compensation. The junior programmers
think that seniority should not be a major basis for pay.

190-2
Greta wants some hard data to inform the debate. As a preliminary step, she plots employees’
salaries against their length of service. Using the data provided, perform a regression analysis
with salary as the dependent variable and length of service as the independent variable.
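
The course runs these regressions in a spreadsheet. As a rough equivalent, a simple least-squares fit can be sketched in plain Python. The data set is not reproduced in this document, so the `years` and `salary` values below are hypothetical placeholders, and the function name is illustrative.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return slope, intercept, r_squared

# HYPOTHETICAL data, for illustration only (not the exercise's data set)
years = [1, 3, 5, 8, 12]
salary = [41_000, 43_500, 44_000, 49_000, 52_500]

slope, intercept, r2 = simple_regression(years, salary)
print(f"salary rises by about ${slope:,.0f} per year of service; R-squared = {r2:.2f}")
```

The slope answers "how many dollars per year of service", and R-squared answers "what percent of the variation is explained".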

190-3
According to the regression, by about how much does salary increase for each additional year of
service?
 About $97/year of service.
 About $504/year of service.
 About $768/year of service.
 About $1,165/year of service.

190-4
Based on the regression analysis, Greta can tell that:
 Approximately 55% of the variation in compensation can be explained by length of
service.
 Approximately 42% of the variation in compensation can be explained by length of
service.
 This is not the correct answer. R-squared measures how much of the variation in the
dependent variable (compensation) is explained by the independent variable (length of
service). You may be confusing the p-value with R-squared.
 Approximately 99.95% of the variation in compensation can be explained by length of
service.

191-1
Exercise 3:
Productivity measures a nation’s average output per labor-hour. It is one of the most closely
watched variables in economics: as workers produce more per hour, employers can pay them
more without increasing the price of the product.

191-2
Since wages can rise without provoking a corresponding rise in consumer prices, the growth of
productivity is essential to a real increase in a nation’s standard of living.

191-3
Peter Agarwal, a student at the Harvard Business School, wants to investigate the relationship
between change in productivity and change in real hourly compensation.

191-4
Peter has data on change in productivity and change in compensation for 8 industrialized nations.
The figures are annual averages over the period from 1979-1990.

191-5
Run a regression with change in compensation as the dependent variable and change in
productivity as the independent variable.

191-6
How much of the variation in the change in compensation can be explained by the change in
productivity? Enter the percentage as a decimal number with 2 digits to the right of the decimal
point (e.g., enter “50%” as 0.50). Round if necessary.
0.62

191-7
Given these data, Peter finds that the relationship can be mathematically expressed as:

191-8
Can Peter claim (with a 95% level of confidence) that the relationship is statistically significant?
 Yes
 No
 The answer can’t be determined from the regression analysis.

191-9
The coefficient for the slope given by Excel is an estimate based on the data in Peter’s sample.
The estimate for the slope of the regression line is about 0.75. If the actual slope of the
relationship is 0, there is no significant linear relationship between the change in productivity
and the change in compensation.

191-10
On the regression output, there are two ways to tell if the slope coefficient is significant at the
0.05 level. First, we can look at the 95% confidence interval provided and see that it ranges
from -0.05 to +1.56. Since the 95% confidence interval contains zero, the coefficient is not
significant at the 0.05 level.

191-11
Alternatively, we can note that the p-value of the slope coefficient, 0.0625, is greater than 0.05.
Peter cannot be 95% confident that the actual slope is not 0.

191-12
Since Peter cannot be confident that the slope is not zero, he cannot be confident that there is a
linear relationship between the two variables.
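
The two checks described in 191-10 and 191-11 can be written as a small sketch. The interval and p-value are the ones quoted above; the function name is illustrative, and the check assumes the interval supplied was computed at the matching confidence level (95% for alpha = 0.05).

```python
def slope_significance(ci_low, ci_high, p_value, alpha=0.05):
    """Return the two equivalent significance checks for a slope coefficient.

    First value: True if the confidence interval excludes zero.
    Second value: True if the p-value is below alpha.
    """
    ci_excludes_zero = not (ci_low <= 0.0 <= ci_high)
    p_below_alpha = p_value < alpha
    return ci_excludes_zero, p_below_alpha

# Figures quoted in the text for Peter's 8-country regression
ci_check, p_check = slope_significance(-0.05, 1.56, 0.0625)
print(ci_check, p_check)
```

Both checks agree here: the interval contains zero and the p-value exceeds 0.05, so the slope is not significant at the 0.05 level.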
191-13
Suppose Peter collects data on 8 more countries. Run the regression for the entire data set with
change in compensation as the dependent variable and change in productivity as the independent
variable.

191-14
The new data set indicates that the variables have a slightly different relationship:

191-15
Can Peter claim (with a 95% level of confidence) that the relationship is statistically significant?
 Yes
 No

191-16
What might explain why the coefficient is significant in the second (combined) data set?
 The total compensation of the countries in the combined data set is larger than the total
compensation of the countries in the first set.
 The total population of the countries in the combined data set is larger than the total
population of countries in the first set.
 The number of countries in the combined data set is larger than in the first set.

198-1 Interpreting the Multiple Regression


In our home price example, we found two regression equations, one for the relationship between
price and house size, and one for the relationship between price and distance. What will the
equation look like for the three-way relationship between price, the dependent variable, and the
two independent variables, house size and distance?
198-2
Interpreting the Multiple Regression
The regression equation in our housing example will have the form below: house size and
distance each have their own coefficients, and they are summed together along with the constant
coefficient a.
Price = a + b1 * (House Size) + b2 * (Distance)

198-3
In general, the linear equation for a regression model with k different variables has the form
below. Since the coefficients we obtain from the data are just estimates, we must distinguish
between the idealized equation that represents the “true” relationship and the regression line that
estimates that relationship. To express that even the “true” equation does not fit perfectly, we
include an error term in the idealized equation.
Idealized equation: y = α + β1*x1 + β2*x2 + ... + βk*xk + ε
Regression equation: ŷ = a + b1*x1 + b2*x2 + ... + bk*xk
198-4
Running the regression gives us coefficients for house size and distance:
252 and -55,006, respectively. We can use this multiple regression equation to predict the price
of other houses not in our data set. To predict a house’s price, we need to know only its size and
its distance to downtown.

198-5
Suppose “Windsor” is a modest mansion of 3,500 square feet, located in the outer suburbs of
Silverhaven, approximately 11 miles from downtown. Based on our regression equation, how
much would we expect Windsor to sell for?
 Around $277,000
 Around $330,000
 Around $700,000
 Around $771,000
198-6
We simply enter Windsor’s square footage and distance to downtown into the equation, and
calculate an expected selling price of $699,938.
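
The Windsor prediction can be reproduced with a short sketch. The two slope coefficients come from 198-4. The intercept is NOT stated anywhere in this document, so the value used below is an assumption backed out from the stated answer of $699,938 (699,938 - 252 * 3,500 + 55,006 * 11 = 423,004).

```python
SIZE_COEF = 252          # dollars per square foot (from 198-4)
DISTANCE_COEF = -55_006  # dollars per mile from downtown (from 198-4)
INTERCEPT = 423_004      # ASSUMPTION: not given in the text; implied by the $699,938 answer

def predict_price(square_feet: float, miles_from_downtown: float) -> float:
    """Predicted selling price from the two-variable regression equation."""
    return INTERCEPT + SIZE_COEF * square_feet + DISTANCE_COEF * miles_from_downtown

windsor = predict_price(3_500, 11)
print(f"${windsor:,.0f}")
```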

198-7
Let’s take a closer look at the coefficients in the housing example, focusing on the distance
coefficient: -55,006. This coefficient is substantially different from the coefficient in the original
simple regression: -39,505. Why is it so different?
198-8
The coefficients in the simple regression and the coefficients in the multiple regression have very
different meanings. In the simple regression equation of price versus distance, we interpret the
coefficient, -39,505, in the following way:
for every additional mile farther from downtown, we expect house price to decrease by an
average of $39,505.

198-9
We describe this average decrease of $39,505 as a gross effect: it is an average computed over
the range of variation of all other factors that influence price.
198-10
In the multiple regression of price versus size and distance, the value of the distance coefficient,
-55,006, is different, because it has a different meaning. Here, the coefficient tells us that, for
every additional mile, we should expect the price to decrease by $55,006, provided the size of
the house stays the same.

198-11
In other words, among houses that are similarly sized, we expect prices to decrease by $55,006
per mile of distance to downtown. We refer to this decrease as the net effect of distance on price.
Alternatively, we refer to it as “the effect of distance on price controlling for house size”.
198-12
Two houses are similar in size, but located in different neighborhoods: “Shangri La” is five miles
farther from downtown than “Xanadu.” If Xanadu is valued at $450,000, how much would we
expect Shangri La to cost?
 Around $175,000
 Around $252,000
 Around $725,000
 The answer cannot be determined from the information provided.

198-13
Since the two houses are the same size, we use the net effect of distance on price,
-$55,006/mile, to predict the expected difference in their selling prices. Shangri La is 5
additional miles from downtown, so its price should be $55,006/mile * 5 miles = $275,030 less
than Xanadu’s, or $450,000 - $275,030 = $174,970.
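
The difference between the net and gross distance effects can be made concrete with a minimal sketch. Both coefficients are taken from the text; the function name is illustrative. The second calculation uses the gross effect from the simple regression, which 198-9 describes as an average over all other factors, so it is the figure that applies when nothing is known about relative house size.

```python
NET_DISTANCE_EFFECT = -55_006    # $/mile with house size held constant (multiple regression)
GROSS_DISTANCE_EFFECT = -39_505  # $/mile averaged over all other factors (simple regression)

def expected_price(reference_price, extra_miles, effect_per_mile):
    """Expected price of a house located extra_miles farther out than a reference house."""
    return reference_price + effect_per_mile * extra_miles

# Same size as Xanadu, 5 miles farther out: use the net effect
same_size = expected_price(450_000, 5, NET_DISTANCE_EFFECT)

# Size unknown, 5 miles farther out: use the gross effect
size_unknown = expected_price(450_000, 5, GROSS_DISTANCE_EFFECT)

print(same_size, size_unknown)
```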
198-14
“Valhalla” is another house located 5 miles farther from downtown than Xanadu. We have no
information about the relative sizes of the two homes. If Xanadu’s selling price is $450,000,
what would we expect Valhalla’s selling price to be?
 Around $175,000
 Around $252,500
 Around $647,500
 The answer cannot be determined from the information provided.

The factors that influence house prices, as mentioned in the provided document, include:

House Size: The size of the house, typically measured in square footage, has a positive
relationship with house price. Larger houses tend to be more expensive.

Distance to Downtown: The distance of the house from the downtown area has a negative
relationship with house price. Houses located farther from downtown generally have lower
prices due to longer commute times.

These two factors are used in the regression analysis to predict house prices and understand their
impact. Additionally, the document hints at other potential variables that could affect house
prices, such as:
School District: The quality of the school district can significantly impact house prices, with
homes in better school districts typically costing more.
Neighborhood: The overall desirability of the neighborhood, including factors like safety,
amenities, and community reputation, can influence house prices.
Economic Conditions: Broader economic conditions, such as interest rates, employment rates,
and economic growth, can impact the housing market and prices.
The regression coefficients are interpreted as follows:

House Size Coefficient: Indicates the expected increase in house price for each additional square
foot, holding other factors constant.
Distance to Downtown Coefficient: Indicates the expected decrease in house price for each
additional mile from downtown, holding other factors constant.
In multiple regression, these coefficients are considered net effects, controlling for the influence
of the other included variables.

206-1
Exercise 1 :
Empire Learning is a developer of educational software. CEO Bill Hartborne is making a bid for
a contract to create an e-learning module for a new client.

206-2
Preparing the bid requires an estimate of the number of labor-hours it will take to create the new
module. Bill believes that the length of a module and the complexity of its animations directly
affect the amount of labor required to complete it.

206-3
Bill has data on the labor-hours Empire used to complete previous courses. He also knows the
number of pages and the animation run-time of each previous course — quantities he thinks are
reasonable proxies for course length and animation complexity, respectively.

206-4
Perform a simple regression analysis for each of the independent variables:
number of pages and run-time of animations.
206-5
Which factor explains more variation in labor hours?
 Number of pages
 Run-time of animations

206-6
In the simple regressions, which of the independent variables contributes significantly to the
number of labor-hours it takes Empire to create an e-learning course?
 Number of pages only
 Run-time of animations only
 Both variables
 Neither variable

206-7
The p-values for the coefficients on animation run-time and number of pages are 0.003 and
0.0002 respectively; well below 0.05, the most commonly used level of significance. Thus, we
conclude that both independent variables contribute significantly in their respective simple
regressions to the number of labor hours Empire takes to create an e-learning course.
206-8
Run the multiple regression of labor-hours versus number of pages and run-time of animations

206-9
According to this multiple regression, which of the independent variables contributes
significantly to the number of labor-hours it takes Empire to create an e-learning course?
 Number of pages only
 Run-time of animations only
 Both variables
 Neither variable

206-10
The p-values for the coefficients on animation run-time and number of pages are 0.014 and
0.0015 respectively; well below 0.05, the most commonly used level of significance. Thus, we
conclude that both independent variables contribute significantly in the multiple regression to the
number of labor hours Empire takes to create an e-learning course.
207-1
Exercise 2:
For this exercise, refer to the regression analyses performed in Exercise 1 of this section

207-2
Bill Hartborne, CEO of Empire Learning, is using regression analysis to predict the number of
labor-hours it will take his team to create a new e-learning course. He is using data on previous
courses Empire created, with the number of pages and the total run-time of animations as
independent variables.

207-3
In the multiple regression of labor-hours versus number of pages and run-time of animations, the
coefficient of 0.84 for the number of pages tells us that:

 For every additional 100 pages of module length, the run-time of animations increases by an
average of 84 seconds.
 For every additional 100 pages of module length, the run-time of animations increases by
84 seconds when we control for labor-hours.
 For every additional 100 pages of module length, the number of labor-hours increases by an
average of 84.
 For every additional 100 pages of module length, the number of labor-hours increases by 84
when we control for animation run-time.

207-4
In the multiple regression equation, the coefficient of the independent variable “number of
pages” is gross relative to:
 The number of labor-hours.
 The run-time of animations.
 The number of illustrations used in the module.
 Nothing. The number of pages is an all around pleasant and sanitary variable.
208-1
Challenge :
For this exercise, refer to the regression analyses performed in Exercise 1 of this section.

208-2
Bill Hartborne, the CEO of Empire Learning, is using regression analysis to predict the number
of labor-hours it will take his team to create a new e-learning course. He is using data on
previous courses Empire created, with the number of pages and the total run-time of animations
as independent variables.

208-3
Bill bills out his talent at $70/hour. Based on the multiple regression, how much should he
charge for the labor content of a course with 400 pages and 170 seconds of animations? Enter the
estimated cost of the labor (in $) as an integer (e.g., enter “$5.00” as “5”). Round if necessary.

208-4
First use the regression equation to predict the number of labor-hours required to complete the
course.

Labor-hours =548 hours + 1.29 Labor hrs/sec * 170 sec + 0.84 Labor hrs/page * 400 pages
= 1,103 hours

208-5
Then multiply that number by Empire Learning’s billing rate of $70/hour to find the total amount
he should charge for the labor content of the course, $77,210.

Labor-hours = 548 hours + 1.29 Labor hrs/sec * 170 sec + 0.84 Labor hrs/page * 400 pages
= 1,103 hours
Labor cost = Labor hrs * Labor rate/hour = 1,103 hours * $70/hour = $77,210
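
The same bid calculation as a short sketch, using the coefficients and billing rate quoted above. The constant and function names are illustrative, and the hours are rounded to a whole number before billing, as in the worked answer.

```python
BASE_HOURS = 548         # intercept of the multiple regression, in labor-hours
HOURS_PER_SECOND = 1.29  # labor-hours per second of animation run-time
HOURS_PER_PAGE = 0.84    # labor-hours per page
BILLING_RATE = 70        # dollars per labor-hour

def labor_hours(pages: int, animation_seconds: int) -> float:
    """Predicted labor-hours from the two-variable regression equation."""
    return BASE_HOURS + HOURS_PER_SECOND * animation_seconds + HOURS_PER_PAGE * pages

hours = round(labor_hours(400, 170))  # rounded to whole hours before billing
cost = hours * BILLING_RATE
print(hours, cost)
```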
208-6
Bill is sure that the client will balk at a labor bill of over $70,000. He knows that animation is
important to the client, so doesn’t want to cut corners there. However, he believes that his lead
writer can cover the content in fewer pages without compromising his renowned clear and
engaging prose.

208-7
To reduce total labor costs to $70,000, how many pages must Bill cut from the plan to meet his
client’s cost limits?
 Around 67 pages.
 Around 103 pages.
 Around 123 pages.
 There aren’t enough pages for Bill to cut to reduce the price below $70,000.

208-8
To reduce the labor bill from $77,210 to $70,000, Bill must reduce labor costs by
$7,210. To achieve this reduction, Bill must cut the contract’s labor-hours by 103 hours
($7,210 / $70 per hour), since he bills out his talent at $70/hour.

208-9
Since the animation run time will not change, we use the net relationship between labor-hours
and number of pages, which tells us that each additional page consumes 0.84 labor-hours. Thus,
Bill must reduce the number of pages by about 123 (103 hours / 0.84 hours per page ≈ 122.6).
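
The page-cut calculation can be sketched as follows. All figures are from the text; rounding up to a whole page is an assumption made here so that the bill lands at or below the target.

```python
import math

CURRENT_BILL = 77_210   # dollars, from the original plan
TARGET_BILL = 70_000    # dollars, the client's limit
BILLING_RATE = 70       # dollars per labor-hour
HOURS_PER_PAGE = 0.84   # net effect of one page, animation run-time held constant

hours_to_cut = (CURRENT_BILL - TARGET_BILL) / BILLING_RATE
# Round up: cutting only 122 pages would leave the bill slightly above the target
pages_to_cut = math.ceil(hours_to_cut / HOURS_PER_PAGE)
print(hours_to_cut, pages_to_cut)
```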
219-1
Exercise 1:
Linda Szewczyk, marketing director of Amalgamated Fruits Vegetables & Legumes (AFV&L),
is researching the nation’s fruit consumption habits. In particular, she would like greater insight
into household consumption of the kiwana, a cross-breed of kiwis and bananas that AFV&L
pioneered.

219-2
Naturally, one important determinant of household consumption is the size of the household —
the number of members. Since AFV&L has positioned the kiwana as a “high end” fruit, Linda
believes that household income may also influence its consumption.

219-3
Run a multiple regression of household kiwana consumption versus household size and income.
Make note of important regression parameters such as R-squared, adjusted R-squared, the
coefficients, and the coefficients significance. The income variable has a coefficient of 00004.
Can a variable with such a small coefficient be statistically significant?
219-4
The independent variable, income, is statistically significant since its p-value is less than 0.05,
the most common level of significance. The small coefficient tells us that for every additional
$10,000 of income, average kiwana consumption increases by 4 lbs. a year.
219-5
To date, AFV&L has focused its marketing campaigns on high-income, highly educated
consumers. Linda would like to deepen her understanding of how the educational level of the
household members might affect their appetite for kiwanas.

219-6
To incorporate education into her kiwana consumption analysis, Linda separated the households
in her data set into three categories based on the highest level of education attained by any
member of the household — no college degree, college degree but no post-graduate degree, and
post-graduate degree. She represents these categories using two dummy variables: “college
only” and “postgraduate.”

219-7
Run a regression on all four independent variables.

219-8
Controlling for household size, income, and post-graduate degree, how many more pounds of
kiwanas are consumed in a household in which the highest educational level is a college degree,
compared to a household in which no one holds a college degree?
 40.8
 51.5
 52.0
 84.0

219-9
The coefficient for the dummy variable “college,” 51.6, tells us the expected difference in
kiwana consumption for “college degrees only” households compared to the excluded
educational category: households in which no one holds a college degree. The coefficient
describes the net relationship between “college degrees only” and household kiwana
consumption, controlling for household size, income, and postgraduate degree.

219-10
Controlling for household size, how many more pounds of kiwanas are consumed in a household
in which the highest educational level is a post-graduate degree, compared to a household in
which the highest educational level is a college degree? Enter the difference in consumption
between the two households as a decimal number with two digits to the right of the decimal
point (e.g., enter “5” as “5.00”). Round if necessary.

219-11
When you control for household size and income, college degree households consume 51.63 lbs
more than non-college households. Post-graduate degree households consume 51.95 lbs more
than non-college households. In other words, post-graduate households consume 0.32 lbs more
than college households.
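
The dummy-variable coding and the comparison between categories can be illustrated with a small sketch. The two coefficients are the ones quoted above. The category labels and function names are illustrative, and household size and income are left out because they are held constant in the comparison.

```python
LEVELS = ("none", "college", "postgraduate")  # "none" is the excluded baseline category

# Dummy coefficients quoted in the text (lbs of kiwanas relative to the baseline)
COEFFICIENTS = {"college_only": 51.63, "postgraduate": 51.95}

def education_dummies(level: str) -> dict:
    """Encode three education categories with two 0/1 dummy variables."""
    if level not in LEVELS:
        raise ValueError(f"unknown education level: {level}")
    return {
        "college_only": int(level == "college"),
        "postgraduate": int(level == "postgraduate"),
    }

def education_effect(level: str) -> float:
    """Expected consumption difference versus a no-college household."""
    dummies = education_dummies(level)
    return sum(COEFFICIENTS[name] * value for name, value in dummies.items())

difference = education_effect("postgraduate") - education_effect("college")
print(round(difference, 2))
```

Comparing two non-baseline categories means subtracting one dummy coefficient from the other, which is the step behind the 0.32 lbs figure.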
219-12
The analysis of household kiwana consumption indicates:
 The presence of multicollinearity in the four-variable regression.
 That educational level and income are highly correlated.
 That household size contributes significantly to household kiwana consumption.
 All of the above.

220-1
Exercise 2:
For this exercise, refer to the regression analyses you ran in exercise 1 of the previous section.
220-2
Bill Hartborne, the CEO of Empire Learning, is using regression analysis to predict the number
of labor-hours it will take his team to create a new e-learning course. He is using data on
previous courses Empire created, with the number of pages and the total animation run-time as
independent variables.

220-3
Bill believes that the number of illustrations used in the course may also have a significant
impact on the number of labor-hours it takes to complete an e-learning course. He wants to add
the number of illustrations to the model as another independent variable.

220-4
Run the simple regression of labor-hours versus number of illustrations

220-5
At which level is the number of illustrations a statistically significant independent variable?
 0.01
 0.05, but not at 0.01
 0.10, but not at 0.05
 None of the above

220-6
Run the multiple regression of labor-hours versus number of pages, illustrations, and animation
run-time.

220-7
Is there evidence of multicollinearity in the data?
 Yes, because R-squared is relatively high and all of the independent variables have
statistically significant coefficients.
 Yes, because R-squared is relatively high and some of the independent variables do not
have statistically significant coefficients.
 Yes, because R-squared is relatively high and the intercept coefficient is not significant.
 No, because R-squared is relatively high and some of the independent variables do not have
statistically significant coefficients.
220-8
A common symptom of multicollinearity is a high adjusted R-squared (in this case 94%)
accompanied by one or more independent variables with low significance. In this case, the
coefficient for the number of illustrations is not significant at the 0.05 level, and the p-value for
the number of pages has risen to 0.0291, up from 0.0015 in the regression without illustrations.
220-9
Which of the following is the likely culprit of multicollinearity?
 A positive correlation between the number of illustrations and the number of pages
 A negative correlation between the number of illustrations and the number of pages
 A positive correlation between the number of illustrations and the run-time of animations
 A negative correlation between the number of illustrations and the run-time of animations
220-10
Multicollinearity occurs when the respective effects of two or more independent variables on the
dependent variable are not distinguishable in the data. This can be the result of correlated
independent variables. The fact that the p-value for the number of pages rises when we add the
illustrations raises our suspicions that the number of illustrations and the number of pages might
be correlated.

220-11
We can compute the correlation between the number of pages and the number of illustrations:
the correlation coefficient, 0.67, is fairly high.
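
A correlation check between two candidate independent variables can be sketched in plain Python. The course data is not included in this document, so the `pages` and `illustrations` values below are hypothetical placeholders and will not reproduce the coefficient quoted above; the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL values, for illustration only (not Empire Learning's data)
pages = [120, 200, 310, 400, 520]
illustrations = [15, 22, 30, 48, 51]

r = pearson_r(pages, illustrations)
print(f"correlation between pages and illustrations: {r:.2f}")
```

A value close to +1 or -1 between two independent variables is a warning sign that their separate effects may be hard to distinguish in a multiple regression.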
220-12
We could also attempt to diagnose the cause of the multicollinearity by running a regression of
labor-hours versus number of pages and number of illustrations - omitting animations. Here, the
significance of illustrations is extremely low, with a p-value of 0.85.

220-13
In the regression of labor-hours versus number of illustrations and run-time of animations -
omitting pages - the respective effects of the independent variables on the dependent variable
can be distinguished. Here, the p-values for both variables are much lower than 0.05. All of the
evidence points to a linear relationship between number of pages and number of illustrations as
the culprit for the multicollinearity.
220-14
Bill wants to use the regression analysis to predict the number of labor-hours it will take to
complete a new e-learning course. Comparing the two regression models - one with all three
independent variables, one without illustrations - which should Bill use?
 With number of illustrations
 Without number of illustrations

Project: Ask ChatGPT to provide data. Select 1 dependent variable and add 3 independent
variables (a lagged variable, a dummy variable, and a quantitative variable such as advance
booking), then compare the resulting models.

Dependent Variable: Hotel Room Occupancy Rate (%)

Independent Variables:

Lagged Variable:

Variable: Previous month's room occupancy rate (%).


Rationale: Lagged variables are often used to capture the persistence or autocorrelation in time-
series data. They can indicate how current occupancy rates are influenced by past performance.
Dummy Variable:

Variable: Seasonal dummy variable (e.g., Summer = 1, Non-Summer = 0).


Rationale: Dummy variables are used to account for categorical factors that may impact the
dependent variable. Seasonality can significantly influence hotel occupancy rates due to vacation
periods and weather conditions.
Quantitative Variable:

Variable: Number of days in advance bookings.


Rationale: Advance booking information quantifies how far in advance guests book rooms. This
can reflect demand patterns and help in predicting future occupancy rates.
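
Building the three independent variables from a monthly series can be sketched as below. The numbers are hypothetical, the choice of June to August as "summer" is an assumption, and the function and field names are illustrative. The first month is dropped because it has no previous month to supply the lagged value.

```python
SUMMER_MONTHS = {6, 7, 8}  # ASSUMPTION: June to August count as summer

def build_rows(months, occupancy, advance_days):
    """Pair each month's occupancy with its lagged value, summer dummy, and advance days."""
    rows = []
    for i in range(1, len(occupancy)):  # start at 1: month 0 has no lagged value
        rows.append({
            "occupancy": occupancy[i],
            "lagged_occupancy": occupancy[i - 1],
            "summer": 1 if months[i] in SUMMER_MONTHS else 0,
            "advance_days": advance_days[i],
        })
    return rows

# HYPOTHETICAL monthly data (calendar month, occupancy %, average days booked in advance)
months = [4, 5, 6, 7]
occupancy = [61.0, 64.5, 78.0, 82.5]
advance_days = [12, 14, 21, 25]

rows = build_rows(months, occupancy, advance_days)
for row in rows:
    print(row)
```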
Comparison Approach:

Data Collection: Gather monthly data for the hotel room occupancy rate (%), lagged occupancy
rates, seasonal dummy variables (e.g., quarterly or monthly), and number of days in advance
bookings over a specific period (e.g., 2 years).

Data Analysis:

Regression Analysis: Perform multiple regression analysis where:


Model 1: Includes only the lagged occupancy rate.
Model 2: Includes the lagged occupancy rate and seasonal dummy variables.
Model 3: Includes the lagged occupancy rate, seasonal dummy variables, and number of days in
advance bookings.
Interpretation: Compare the coefficients and statistical significance of each independent variable
across the models.

Coefficient Significance: Look at the p-values to determine if each variable significantly
contributes to explaining the variation in room occupancy rates.
Model Fit: Compare R-squared values to see how much of the variance in room occupancy rates
is explained by each model.
Example Outcome:

Model 1 (Lagged Variable Only):
  Coefficient of Lagged Occupancy Rate: 0.75
  R-squared: 0.65

Model 2 (Lagged + Seasonal Dummy):
  Coefficient of Lagged Occupancy Rate: 0.70
  Coefficient of Seasonal Dummy (Summer): 3.2
  R-squared: 0.70

Model 3 (Lagged + Seasonal Dummy + Advance Booking):
  Coefficient of Lagged Occupancy Rate: 0.68
  Coefficient of Seasonal Dummy (Summer): 3.0
  Coefficient of Advance Booking (Days): 0.02
  R-squared: 0.72

Conclusion:

Based on the comparison:

Lagged Model shows a strong influence of past occupancy rates on current rates.
Seasonal Dummy Model indicates significant seasonal variations, with higher occupancy rates in
summer.
Advance Booking Model suggests that booking further in advance slightly increases occupancy
rates.
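
Because R-squared never decreases when a variable is added, a fairer comparison of the three models uses adjusted R-squared, which the kiwana exercise also lists as a key regression parameter. The sketch below applies the standard adjustment to the example R-squared values. The sample size of 24 monthly observations is an assumption based on the suggested two-year period, and the names are illustrative.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: penalizes R-squared for each added independent variable."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

N_OBS = 24  # ASSUMPTION: two years of monthly data

# (R-squared, number of independent variables) from the example outcome
models = {
    "Model 1": (0.65, 1),
    "Model 2": (0.70, 2),
    "Model 3": (0.72, 3),
}

adjusted = {
    name: adjusted_r_squared(r2, N_OBS, k) for name, (r2, k) in models.items()
}
for name, value in adjusted.items():
    print(f"{name}: adjusted R-squared = {value:.3f}")
```

Under this assumed sample size the ranking of the models is unchanged, but the gain from adding advance booking shrinks, which is worth weighing alongside the coefficient's p-value.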
