0% found this document useful (0 votes)
14 views6 pages

BUSANA 7001 Group Assignement

The BUSANA 7001 group assignment involves analyzing CEO compensation for IBM Corporation in 2025 using provided datasets. Students must conduct OLS regressions, time series analysis, variable importance assessments, and sentiment analysis on product reviews, utilizing Python and SAS Visual Analytics. The assignment has specific submission guidelines, deadlines, and penalties for late submissions, emphasizing the importance of data preparation and proper reporting.

Uploaded by

amux790
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views6 pages

BUSANA 7001 Group Assignement

The BUSANA 7001 group assignment involves analyzing CEO compensation for IBM Corporation in 2025 using provided datasets. Students must conduct OLS regressions, time series analysis, variable importance assessments, and sentiment analysis on product reviews, utilizing Python and SAS Visual Analytics. The assignment has specific submission guidelines, deadlines, and penalties for late submissions, emphasizing the importance of data preparation and proper reporting.

Uploaded by

amux790
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

BUSANA 7001 - Predictive and Visual Analytics

for Business

2025 S1

Group Assignment

Instructions

1. The assignment can be done in groups of one to three students. All team mem-
bers are expected to contribute approximately equally to a group assignment.
A group can eliminate an underperforming member who then will need to do
the assignment individually or join another group. Similarly, one can quit
an underperforming group to do the assignment individually or join another
group. All group members will get the same mark for the assignment.

2. The maximum score is 25 points.

3. The presentation of your write-up is important.

4. Whenever possible, numerical analysis (including data cleaning, etc.), as well


as all tables and gures, should be done using Python or SAS Visual Analytics.
However, you may use Excel or Word, etc., to make tables for regressions.

5. Please retain your Python code and make sure that it is user-friendly (use
comments where necessary). Using your submitted code, one should be able
to produce all your results, tables, and gures.

6. Please retain a copy of the problem set that is submitted.

7. Only one member of a group submits 4 les:

ˆ `Assignment Cover Sheet', which must be signed (electronic signature is


okay) and dated

1
ˆ the report (in doc, docx, or pdf format) for Tasks 1-4; the report should be
properly formatted and be similar to a business report; font: 12 pt Times
New Roman; maximum number of pages: 10 (no penalty for exceeding
this limit)
ˆ Python code (in py format or Jupyter notebook, or Google Colab based
.ipynb document)
ˆ Spreadsheet with the user reviews (used in Task 4).

8. Lecturer can refuse to accept assignments, which do not have a signed ac-
knowledgment of the University's policy on plagiarism.

9. Any suspected plagiarism will be severely punished. This includes any student
that submits copied work or any student that allows their work to be copied.

10. You must acknowledge any external material you use in your answers, e.g.,
material from websites, textbooks, academic journals and newspaper articles.

11. All queries (including deadline extensions) for this project should be directed
to the Course Coordinator.

12. The submission deadline for the problem set is 6pm, Friday the 13th of June,
2025.

13. The submission must be done through MyUni.

14. Late submission will be penalized 2.5 points per day.

Agenda

Assume that you are a compensation consultant working at a leading consulting


rm. Your client is IBM Corporation  one of the United States' largest IT and
consulting companies. They need your help to determine the compensation of their
CEO in year 2025.
You have been provided with 2 data sets. The dataset `salaries_2025_S1' con-
tains the following variables:

ˆ GVKEY  Company ID Number

ˆ YEAR  Fiscal Year

2
ˆ TDC1  Total Compensation (Salary + Bonus + Other Annual + Restriced
Stock Grants + LTIP Payouts + All Other + Value of Option Grants).

The dataset `companies_2025_S1' contains the following variables:

ˆ GVKEY  Company ID Number

ˆ YEAR  Fiscal Year

ˆ AT  Assets - Total (in $ millions)

ˆ CONM  Company Name

ˆ SALE  Sales/Turnover (Net) (in $ millions)

ˆ debt_at  Financial leverage (debt divided by assets)

ˆ roa  Return on assets (net income divided by assets)

ˆ cash_at  Cash holdings divided by assets

ˆ rd_at  Research and development expenses divided by assets

ˆ capex_at  Capital expenditure (investments) divided by assets

ˆ mb  Market-to-book ratio

ˆ ppe_at  Property, plant, and equipment divided by assets

ˆ sic4  Industry code.

First, you should prepare your data for the analysis:

ˆ remove duplicates (if any)

ˆ check for outliers and take necessary actions to deal with them

ˆ merging datasets

ˆ and so on.

3
1 OLS regressions (7 points)

This task needs to be done using Python. Discuss briey your sample, including the
number of observations, outliers. Provide the descriptive statistics of the sample.
How you choose to do this is entirely at your discretion. However, it is recommended
that you consider using both summary statistic and graphical methods (this task
should include at least one properly formatted table, one pie chart, one histogram,
and one scatter plot) while also noting any peculiarities within the data set. You
should put more emphasis on TDC1. Tables and gures should be included in the
report rather than appendix.
Find the determinants of the total compensation (TDC1). Estimate 3 dierent
OLS regressions (in order to ensure the robustness of results) with year and industry
xed eects and several independent variables. To ensure that regression residuals
`behave well', you may need to scale or transform one or more variables. For ex-
ample, to use a natural logarithm value of the variable instead of its raw value.
Provide a properly formatted table with the regression results in the report (not in
the appendix; however, you may put additional tables in the appendix if needed).
Discuss the determinants of the total compensation: what variables are statistically
signicant; which variables increase and which variables decrease total compensa-
tion; any insights from the coecient estimates of year and industry xed eects
and so on.
Predict the total compensation in 2025 (this value has been deleted in the dataset
`salaries') using the results from the 3 regressions for IBM Corporation (GVKEY =
006066). Are the predictions similar across the 3 models?

2 Time series analysis (7 points)

This task needs to be done using Python. Generate a time series values for total
compensation (TDC1); that is, annual averages for each year.
Plot the obtained time series. Use ARMA type models to predict its values for
the next 2 periods. Motivate and discuss ARMA orders used in the analysis. Plot
the predicted values (a scatterplot against the actual values; then time series plots
of the actual and predicted values in the same gure). Do actual and predicted time
series tend to move in the same direction over time?
Given the time series predictions for years 2025 and 2026, what would be the

4
predicted total compensation of the CEO of IBM Corporation, obtained in Task
1 (assume that the total compensation of the CEO of IBM Corporation evolves
similarly as an average CEO total compensation)?

3 Variable importance (4 points)

This task needs to be done using Python. Using decision trees, identify 5 the most
important determinants of TDC1. Try 3 dierent models. Are the results consistent?
Then repeat the analysis using categorical version of TDC1 dened as follows:

ˆ `high' if TDC1 is in the top 20%

ˆ `low' if TDC1 is in the bottom 50%

ˆ and `moderate' otherwise.

Compare the results to those obtained when TDC1 was used as a continuous
variable.

4 Sentiment Analysis (7 points)

This task needs to be done using SAS Viya for Learners platform. Consider three
rms: Microsoft (GVKEY = 012141), Apple (GVKEY = 001690), and Amazon
(GVKEY = 064768), and their respective products: Microsoft Surface Go 4, Amazon
Kindle (e.g., Kindle Scribe 2024 32GB with Premium Pen), and Apple iPad Mini.1
From the Google website, download 20 reviews of each product made in the year
2024. Then repeat this task for reviews made in the year 2025. You should collect at
least 120 reviews. Create a spreadsheet with four columns: product, user_id (e.g.,
1, 2, . . . ), review, and year. The user_id needs to be unique. This spreadsheet
needs to be submitted as a separate document together with the report.
Next, import this spreadsheet into SAS Viya for Learners. Create one word
cloud for each product. Are they visually dierent or the same?
Then generate sentiment scores for each product in 2024 and 2025. This might
involve manually copying sentiment scores from each review to the spreadsheet and
calculating the mean sentiment scores there. You should get six values. Create a
table with these values and briey discuss the results.
1 If you wish, you may choose dierent products.

5
Finally, plot the annual growth of CEO pay for each company (between 2024
and 2025) on the same graph as the change in the mean sentiment scores of each
product. Is there any relation between the change in customer satisfaction and CEO
pay growth?
Good luck!

You might also like