Account Based Analytics Final Spring 2025

The document outlines the final exam for an Advanced Business Analytics course, detailing the structure, points, and time allotted. It includes various questions related to predicting doctor visits and movie screens using statistical models, as well as the use of time-varying covariates in the Cox model. Students are instructed to submit their work on Gradescope and avoid external resources while providing detailed steps and explanations in their Jupyter files.

Uploaded by

kenna.harde

Final Exam

Advanced Business Analytics

Points: 100 Time: 90 minutes

You should be able to access the exam on Gradescope and submit it there.

Read the questions carefully and provide details of the steps in your Jupyter file. If you get stuck,
make sure you provide the details of where and why you got stuck. Most of the time, with some
changes, you should be able to make it work. All problems have clear solutions.

Do not use the Internet, email, messaging apps, or LLMs for answers. Everything was covered in our
lectures.

1. You want to predict the number of doctor visits for a sample of 3,874 users. The data has the
following attributes, which can help with prediction.

Docvis: number of visits to the doctor
Hospvis: number of days in hospital
Edlevel: educational level (categorical: 1-4)
Age: age (25-64)
Outwork: out of work=1; working=0
Female: female=1; male=0
Married: married=1; not married=0
Kids: have children=1; no children=0
Hhninc: household yearly income
Educ: years of formal education (7-18)
Self: self-employed=1; not self-employed=0
edlevel1: (1/0) not high school graduate
edlevel2: (1/0) high school graduate
edlevel3: (1/0) university/college
edlevel4: (1/0) graduate school

i. Plot the histogram of doctor visits. What do you observe in the plot? [5]
ii. Why would you want to fit a Zero-Inflated Poisson (ZIP) model? Explain clearly what a ZIP
model is and how the model overcomes excessive zeros. [5]
iii. One challenge is finding a variable that can be used for classification (inflation). Can you
explain why this is a challenge? [5]
iv. You decide to model doctor visits as a function of all covariates except Outwork. You believe
that people out of work (less likely to have insurance) are unlikely to go to the doctor.
Therefore, you use that as a classifier.
Split your data into training (80%) and test (20%) sets. Write the zero-inflated model and
estimate it. Show the results. [10]
v. In your results, is the classifier significant? Explain the magnitude of that variable. Explain
the magnitude of the estimate on Educ (years of formal education). [5]
vi. You want to make predictions for the “mean” value of doc visits for the test sample. Make
the prediction. [5]
vii. The mean value already controls for inflated zeros. How will you convert the mean values to
the discrete number of visits? Provide the prediction and plot the histogram comparing
discrete predictions with the actual number of visits. [10]
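
For reference, the zero-inflated Poisson setup behind parts ii to vii can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course solution: `pi` (the probability of a structural zero) and `lam` (the Poisson rate) are hypothetical fixed numbers here, whereas in the exam they would come from the fitted inflation and count equations. The helper names `zip_pmf`, `zip_mean`, and `zip_mode` are also made up for this sketch, and using the mode is only one possible way to turn a fitted distribution into a discrete prediction.

```python
import math


def zip_pmf(k, pi, lam):
    """P(Y = k) for a zero-inflated Poisson.

    With probability pi the observation is a structural zero;
    otherwise it is drawn from Poisson(lam).
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(pi, lam):
    """Mean of the mixture: structural zeros shrink the Poisson mean."""
    return (1.0 - pi) * lam


def zip_mode(pi, lam, k_max=50):
    """Most likely count in 0..k_max (one hypothetical discrete prediction)."""
    return max(range(k_max + 1), key=lambda k: zip_pmf(k, pi, lam))


# Hypothetical parameter values, chosen only for illustration.
pi, lam = 0.3, 2.5
probs = [zip_pmf(k, pi, lam) for k in range(60)]
total = sum(probs)
mean_from_pmf = sum(k * p for k, p in enumerate(probs))

print("P(Y=0):", round(probs[0], 4))
print("sum of pmf:", round(total, 6))
print("mean from pmf:", round(mean_from_pmf, 4), "closed form:", zip_mean(pi, lam))
print("mode:", zip_mode(pi, lam))
```

The extra mass at zero comes from the `pi` term, which is how the mixture accommodates more zeros than a plain Poisson with the same rate would produce.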

2. You have data on movies released in the last few years. The attributes (also described below)
are self-explanatory. Three attributes are numerical, and the rest are categorical. The numerical
attributes are the number of screens, budget, and revenues. When a movie is released in a
theater, a certain number of screens are allocated to the movie.

You want to predict the number of screens for a movie. Convert the categorical data into dummies
and take the log of revenues and budget.

You believe that attributes (Release period, Remake, Franchise, Genre, New Actor, New director,
and log of budget) can predict the number of screens.

The first 1600 rows are used for training, and the rest for testing. When you use dmatrices, correctly
specify the rows for the training and test sets. Notice that the data has no values for the number of
screens or revenues after row 1600. You will be making predictions for those rows.

i. What model will you run? Explain the rationale. Estimate the model and show your
results. [10]
ii. Explain the role of budget. [2]
iii. Predict the number of screens for the test data (all the rows after 1600, where you do
not have a value for the number of screens). Show the predictions for the first 10
observations in the test set. [5]

You believe that the number of screens and other covariates (Release period, Remake, Franchise,
Genre, New Actor, New director) predict log revenues. To get the parameters, you estimate a linear
regression model (using GLM) in the training data.

iv. What is the estimated magnitude of the number of screens? [10]

With the predicted number of screens from (iii), you can predict revenues in your test data.

v. Predict the log revenues in the test set. Show the predictions for the first 10
observations in the test set. [13]
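
The two-step prediction in parts iii to v can be written out as follows. This is only a sketch under an assumption the exam does not state: that the number of screens is modeled as a count with a log link (for example a Poisson GLM). The symbols $x_i$ and $z_i$ (dummy-coded covariates) and the coefficients $\beta$, $\gamma$, $\delta$, $\theta$ are notation introduced here for illustration.

```latex
% Step 1 (assumed count model with log link): screens as a function of covariates and log budget
\log \mathbb{E}[\text{screens}_i \mid x_i] = \beta_0 + x_i^\top \beta + \gamma \log(\text{budget}_i)

% Predicted screens for a test row
\widehat{\text{screens}}_i = \exp\!\big(\hat\beta_0 + x_i^\top \hat\beta + \hat\gamma \log(\text{budget}_i)\big)

% Step 2 (linear regression estimated on the training rows): plug the prediction in
\widehat{\log(\text{revenue}_i)} = \hat\delta_0 + z_i^\top \hat\delta + \hat\theta \, \widehat{\text{screens}}_i
```

Under this assumed log link, $\gamma$ reads as an elasticity: a 1% increase in budget is associated with roughly a $\gamma$% change in the expected number of screens.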

3. Why do you need time-varying covariates for the Cox model? How does the Cox model maintain
proportionality assumptions when using time-varying covariates? [5]

Suppose you are studying an event for 20 periods. Unfortunately, one of the covariates in your analysis
changed value at times 4, 7, and 12. Show the format of the data used by lifelines in Python to
estimate the impact of the covariate on the hazard. [10]
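
The long (start, stop] layout asked about here can be sketched for a single subject. This is a minimal illustration, not the expected answer: the covariate values, the subject id, the column names, and the assumption that the event occurs at period 20 are all hypothetical, and `split_episodes` is a helper invented for this sketch. In lifelines, rows like these would go into a pandas DataFrame and be passed to `CoxTimeVaryingFitter`.

```python
def split_episodes(subject_id, change_times, values, end_time, event):
    """Build one row per interval over which the covariate is constant.

    values[i] is the covariate value on the i-th interval, so
    len(values) must equal len(change_times) + 1.
    """
    cuts = [0] + list(change_times) + [end_time]
    rows = []
    for i in range(len(cuts) - 1):
        is_last = i == len(cuts) - 2
        rows.append({
            "id": subject_id,
            "start": cuts[i],
            "stop": cuts[i + 1],
            "x": values[i],
            # The event flag is set only on the final interval.
            "event": int(event and is_last),
        })
    return rows


# Hypothetical subject: covariate changes at times 4, 7, and 12; event at 20.
rows = split_episodes(1, [4, 7, 12], [0.0, 1.0, 0.5, 2.0], end_time=20, event=True)
for row in rows:
    print(row)

# With lifelines installed, the fit would look roughly like:
#   df = pandas.DataFrame(rows)
#   CoxTimeVaryingFitter().fit(df, id_col="id", event_col="event",
#                              start_col="start", stop_col="stop")
```

Each change in the covariate closes one row and opens the next, so three changes over 20 periods give four rows for the subject.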
