BUSANA 7001 Group Assignement
BUSANA 7001 Group Assignement
for Business
2025 S1
Group Assignment
Instructions
1. The assignment can be done in groups of one to three students. All team mem-
bers are expected to contribute approximately equally to a group assignment.
A group can eliminate an underperforming member who then will need to do
the assignment individually or join another group. Similarly, one can quit
an underperforming group to do the assignment individually or join another
group. All group members will get the same mark for the assignment.
5. Please retain your Python code and make sure that it is user-friendly (use
comments where necessary). Using your submitted code, one should be able
to produce all your results, tables, and gures.
1
the report (in doc, docx, or pdf format) for Tasks 1-4; the report should be
properly formatted and be similar to a business report; font: 12 pt Times
New Roman; maximum number of pages: 10 (no penalty for exceeding
this limit)
Python code (in py format or Jupyter notebook, or Google Colab based
.ipynb document)
Spreadsheet with the user reviews (used in Task 4).
8. Lecturer can refuse to accept assignments, which do not have a signed ac-
knowledgment of the University's policy on plagiarism.
9. Any suspected plagiarism will be severely punished. This includes any student
that submits copied work or any student that allows their work to be copied.
10. You must acknowledge any external material you use in your answers, e.g.,
material from websites, textbooks, academic journals and newspaper articles.
11. All queries (including deadline extensions) for this project should be directed
to the Course Coordinator.
12. The submission deadline for the problem set is 6pm, Friday the 13th of June,
2025.
Agenda
2
TDC1 Total Compensation (Salary + Bonus + Other Annual + Restriced
Stock Grants + LTIP Payouts + All Other + Value of Option Grants).
mb Market-to-book ratio
check for outliers and take necessary actions to deal with them
merging datasets
and so on.
3
1 OLS regressions (7 points)
This task needs to be done using Python. Discuss briey your sample, including the
number of observations, outliers. Provide the descriptive statistics of the sample.
How you choose to do this is entirely at your discretion. However, it is recommended
that you consider using both summary statistic and graphical methods (this task
should include at least one properly formatted table, one pie chart, one histogram,
and one scatter plot) while also noting any peculiarities within the data set. You
should put more emphasis on TDC1. Tables and gures should be included in the
report rather than appendix.
Find the determinants of the total compensation (TDC1). Estimate 3 dierent
OLS regressions (in order to ensure the robustness of results) with year and industry
xed eects and several independent variables. To ensure that regression residuals
`behave well', you may need to scale or transform one or more variables. For ex-
ample, to use a natural logarithm value of the variable instead of its raw value.
Provide a properly formatted table with the regression results in the report (not in
the appendix; however, you may put additional tables in the appendix if needed).
Discuss the determinants of the total compensation: what variables are statistically
signicant; which variables increase and which variables decrease total compensa-
tion; any insights from the coecient estimates of year and industry xed eects
and so on.
Predict the total compensation in 2025 (this value has been deleted in the dataset
`salaries') using the results from the 3 regressions for IBM Corporation (GVKEY =
006066). Are the predictions similar across the 3 models?
This task needs to be done using Python. Generate a time series values for total
compensation (TDC1); that is, annual averages for each year.
Plot the obtained time series. Use ARMA type models to predict its values for
the next 2 periods. Motivate and discuss ARMA orders used in the analysis. Plot
the predicted values (a scatterplot against the actual values; then time series plots
of the actual and predicted values in the same gure). Do actual and predicted time
series tend to move in the same direction over time?
Given the time series predictions for years 2025 and 2026, what would be the
4
predicted total compensation of the CEO of IBM Corporation, obtained in Task
1 (assume that the total compensation of the CEO of IBM Corporation evolves
similarly as an average CEO total compensation)?
This task needs to be done using Python. Using decision trees, identify 5 the most
important determinants of TDC1. Try 3 dierent models. Are the results consistent?
Then repeat the analysis using categorical version of TDC1 dened as follows:
Compare the results to those obtained when TDC1 was used as a continuous
variable.
This task needs to be done using SAS Viya for Learners platform. Consider three
rms: Microsoft (GVKEY = 012141), Apple (GVKEY = 001690), and Amazon
(GVKEY = 064768), and their respective products: Microsoft Surface Go 4, Amazon
Kindle (e.g., Kindle Scribe 2024 32GB with Premium Pen), and Apple iPad Mini.1
From the Google website, download 20 reviews of each product made in the year
2024. Then repeat this task for reviews made in the year 2025. You should collect at
least 120 reviews. Create a spreadsheet with four columns: product, user_id (e.g.,
1, 2, . . . ), review, and year. The user_id needs to be unique. This spreadsheet
needs to be submitted as a separate document together with the report.
Next, import this spreadsheet into SAS Viya for Learners. Create one word
cloud for each product. Are they visually dierent or the same?
Then generate sentiment scores for each product in 2024 and 2025. This might
involve manually copying sentiment scores from each review to the spreadsheet and
calculating the mean sentiment scores there. You should get six values. Create a
table with these values and briey discuss the results.
1 If you wish, you may choose dierent products.
5
Finally, plot the annual growth of CEO pay for each company (between 2024
and 2025) on the same graph as the change in the mean sentiment scores of each
product. Is there any relation between the change in customer satisfaction and CEO
pay growth?
Good luck!