IS5740 W05 Tutorial Note (Regression)

The document describes using linear regression to predict movie box office collections for Netflix. Various regression models are created and compared using a movie dataset containing 506 movies and 18 variables. Models include a full regression, reduced regression removing correlated variables, and backward and forward stepwise regressions. Model performance is evaluated on training and validation data using RMSE and other fit statistics.

Uploaded by

aryaynl

IS5740 Mgt. Support & BI Systems

W05 Linear Regression

Business Problem: As an OTT platform, Netflix intends to introduce promotional programs
for new movies in their early stages. In pursuit of more impactful promotions, Netflix aims
to forecast the box office collections for the initial three months following a movie's
release.
I. Data Source: Movie.xlsx
- Suppose you work at Netflix and have been tasked with predicting a movie's box office
collection for the three months following its release. The insights gained from this
prediction will help your department decide on marketing expenses for promotional
campaigns.
- Your department has historical data on 506 movies, described by 18 variables.
Name                 Role    Level     Description
Marketing expense    Input   Interval  Expense for promotions
Production expense   Input   Interval  Expense for movie production
Multiplex coverage   Input   Interval  Percentage of multiplexes showing a movie
Budget               Input   Interval  Total budget for Production, Meeting, and Casting Fees
Movie_length         Input   Interval  The length of a movie (minutes)
Lead_Actor_Rating    Input   Interval  A lead actor’s rating
Lead_Actress_rating  Input   Interval  A lead actress’s rating
Director_rating      Input   Interval  A director’s rating
Producer_rating      Input   Interval  A producer’s rating
Critic_rating        Input   Interval  Critics score
Trailer_views        Input   Interval  Number of movie trailer views
_3D_available        Input   Binary    Whether 3D was used
Twitter_hashtags     Input   Interval  Number of Twitter hashtags
Genre                Input   Nominal   Genre (i.e., Action, Comedy, Drama, and Thriller)
Avg_age_actors       Input   Interval  Average age of all actors
Time_taken           Input   Interval  The number of days after its release
Num_multiplex        Input   Interval  Number of multiplexes showing a movie
Boxoffice            Target  Interval  Number of tickets sold in Box-Offices
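
The roles and levels in the table can also be written down as a small lookup structure, which makes it easy to double-check the counts before setting them in the tool. The sketch below is only an illustration in Python, outside SAS Enterprise Miner; the dictionary name VARIABLES and the helper names_with are assumptions made for this example, while the variable names, roles, and levels are copied from the table.

```python
# Variable metadata transcribed from the table: name -> (role, level).
VARIABLES = {
    "Marketing expense": ("Input", "Interval"),
    "Production expense": ("Input", "Interval"),
    "Multiplex coverage": ("Input", "Interval"),
    "Budget": ("Input", "Interval"),
    "Movie_length": ("Input", "Interval"),
    "Lead_Actor_Rating": ("Input", "Interval"),
    "Lead_Actress_rating": ("Input", "Interval"),
    "Director_rating": ("Input", "Interval"),
    "Producer_rating": ("Input", "Interval"),
    "Critic_rating": ("Input", "Interval"),
    "Trailer_views": ("Input", "Interval"),
    "_3D_available": ("Input", "Binary"),
    "Twitter_hashtags": ("Input", "Interval"),
    "Genre": ("Input", "Nominal"),
    "Avg_age_actors": ("Input", "Interval"),
    "Time_taken": ("Input", "Interval"),
    "Num_multiplex": ("Input", "Interval"),
    "Boxoffice": ("Target", "Interval"),
}


def names_with(role=None, level=None):
    """Return the variable names that match the given role and/or level."""
    return [
        name
        for name, (r, lv) in VARIABLES.items()
        if (role is None or r == role) and (level is None or lv == level)
    ]


inputs = names_with(role="Input")
targets = names_with(role="Target")
# Variables that are not interval-scaled need class (dummy) coding in a regression.
non_interval = names_with(level="Binary") + names_with(level="Nominal")

print(len(VARIABLES), "variables:", len(inputs), "inputs and target", targets)
print("Non-interval inputs:", non_interval)
```

The check confirms that the table lists 18 variables in total, with one interval target and two inputs (_3D_available and Genre) that are not interval-scaled.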

II. Predict the box-office collection by a regression model

1. Create your SAS E-Miner project
2. Download Movie.xlsx from the Canvas.

3. Create your diagram called “W05_Movie”

4. Select the “File Import” node in the Sample tab and add the node to your diagram
5. Right-click the “File Import” node and rename it “Movie”.

6. Select the “Movie” File Import node and click the button of the “Import File” option. Find
your Movie.xlsx and import it.

7. Right-click the “Movie” node and select “Edit Variables”.

8. Edit the variables of the movie dataset so that each role and level matches the variable
table in Section I.

9. Click on the 'Movie' data node and navigate to the property window. Change the
'Summarize' option to 'Yes'.

10. Let’s conduct data visualization using the GraphExplore node. Don’t forget to set the Size
option to “Max”.

a. Check the histogram of the target variable.

b. Build the scatter plot matrix of all input variables, and then inspect it for inputs that are
strongly related to one another.

11. Select the Data Partition node icon in the Sample tab. Drag the node into the Diagram
Workspace. Connect the Movie data node to the Data Partition node. Select the “Data
Partition” node and, in the property window, set 50% for training, 50% for validation, and
0% for test data. Make sure that the total, training, and validation datasets are all
balanced.

12. Check the result of data partition.
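
For intuition, the 50/50 partition can be sketched outside the tool as a seeded random split. This is a minimal illustration under stated assumptions, not a description of how the Data Partition node works internally: the function name partition, the seed value, and the stand-in Boxoffice values are all hypothetical.

```python
import random


def partition(rows, train_fraction=0.5, seed=12345):
    """Randomly split rows into (train, validation) lists of the requested sizes."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean(values):
    return sum(values) / len(values)


# Hypothetical stand-in for the 506 movies; the Boxoffice values are made up.
rng = random.Random(0)
movies = [{"id": i, "Boxoffice": rng.uniform(10_000, 100_000)} for i in range(506)]

train, valid = partition(movies)
print(len(train), len(valid))

# A rough balance check: the target mean should be similar in every partition.
for name, part in [("total", movies), ("train", train), ("validation", valid)]:
    print(name, round(mean([m["Boxoffice"] for m in part])))
```

With 506 rows and a 50% training share, each partition holds 253 movies. Comparing the target mean (or the full histogram) across the three sets is one simple way to judge whether they are balanced.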

13. We fit the linear regression with the Regression node (the same node that is used for
logistic regression). Select the Model tab. Drag a Regression node into the diagram workspace.
Connect the Data Partition node to the Regression node.
a. Rename the Regression node “Full Regression”.
b. Select the Regression node and examine the Property panel. By default, the
regression type is logistic, so change it to “Linear Regression”.

14. Run the Full Regression node.

15. First, go to the property window. Click the small button on the right side of
“Exported Data”.

16. In the pop-up window, select “VALIDATE” and click the “Browse...” button.

17. You can see the columns ‘Box Office’, ‘Predicted Box Office’, and ‘Residual Box
Office’.

18. Right-click the ‘Full Regression’ node and select ‘Results’.

- Score Rankings Matrix: The data are sorted by the target variable in ascending order. The
Y-axis shows the target variable, and the X-axis shows the percentage of observations used.

- Effects Plot: Displays a bar graph of the absolute values of the coefficients in the
final model. The bars are color coded to indicate the algebraic signs of the coefficients.

a. Maximize the Output window. Check the R-square and the model significance. Also check
which variables have significant effects on the target variable.

b. Restore the Output window to its original size by double-clicking its title bar. Maximize the
Fit Statistics window.

If estimate predictions are the focus, model fit can be assessed by RMSE. There appears to be
some discrepancy between the RMSE values in the training and validation data.
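
The fit statistics discussed above can be reproduced by hand from the actual and predicted columns. The sketch below is an illustration under assumptions: the four-row samples are made up, and the error is averaged over the number of rows, so the values reported by the tool may differ slightly if it uses a degrees-of-freedom divisor.

```python
import math


def average_squared_error(actual, predicted):
    """Mean of the squared residuals, where residual = actual - predicted."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return sum(r * r for r in residuals) / len(residuals)


def rmse(actual, predicted):
    """Root of the average squared error, in the units of the target."""
    return math.sqrt(average_squared_error(actual, predicted))


def r_square(actual, predicted):
    """Share of the target's variation explained: 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


# Hypothetical actual and predicted values; not taken from Movie.xlsx.
train_actual, train_pred = [10, 20, 30, 40], [12, 18, 33, 39]
valid_actual, valid_pred = [15, 25, 35, 45], [11, 30, 31, 50]

for name, actual, pred in [
    ("train", train_actual, train_pred),
    ("validation", valid_actual, valid_pred),
]:
    print(
        name,
        "ASE =", round(average_squared_error(actual, pred), 3),
        "RMSE =", round(rmse(actual, pred), 3),
        "R-square =", round(r_square(actual, pred), 3),
    )
```

A validation RMSE that is clearly larger than the training RMSE, as in this made-up example, suggests that the model fits the training data better than it generalizes.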

III. Model Selection

1. Reduced variables
Look at the scatter plot matrix. The rating variables are highly correlated with one another:
Director_rating, Lead_Actor_rating, Lead_Actress_rating, and Producer_rating.
a. Add another Regression node and connect the Data Partition node to it. In the Property
panel, the regression type is logistic by default, so change it to “Linear Regression”.
Rename the node “Reduced Regression”.

b. Right-click the node and open “Edit Variables”. Set “Use” to “No” for the variables
below, and check the result:
Lead_Actor_rating, Lead_Actress_rating, and Producer_rating

c. Run the Reduced regression node, and check the results.
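
The visual check of the scatter plot matrix can be backed by pairwise correlation coefficients. The sketch below is a hedged illustration: the rating values are made up, the 0.9 cut-off is an arbitrary assumption, and the helper names pearson and correlated_pairs are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.9):
    """List the variable pairs whose absolute correlation reaches the threshold."""
    names = sorted(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, round(r, 3)))
    return pairs


# Hypothetical rating values for six movies; not taken from Movie.xlsx.
ratings = {
    "Director_rating": [7.0, 7.5, 8.0, 8.6, 9.1, 6.4],
    "Lead_Actor_rating": [7.1, 7.4, 8.1, 8.5, 9.2, 6.5],
    "Critic_rating": [8.0, 6.1, 7.3, 6.9, 7.0, 7.8],
}

pairs = correlated_pairs(ratings)
for first, second, r in pairs:
    print(first, "and", second, "are highly correlated: r =", r)
```

When two inputs carry almost the same information, keeping only one of them, as the reduced model does with Director_rating, usually costs little accuracy and makes the coefficients easier to interpret.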

2. Backward
a. Add a “Regression” node from the Model tab. Rename it “Backward Regression”.
b. In the Regression node Properties panel, set Selection Model to Backward.
c. Connect the “Data Partition” node to the “Backward Regression” node.
d. Run the node and check the results.
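
To make the idea of backward selection concrete, the sketch below starts from all inputs and repeatedly drops the one whose removal gives the lowest validation error, stopping when every removal makes the error worse. Note the assumptions: it uses validation average squared error as the criterion, which is a simplification chosen for this example and not necessarily the criterion the Regression node applies; the data are synthetic; and all function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, target, features):
    """Least-squares coefficients [intercept, b1, ...] from the normal equations."""
    X = [[1.0] + [row[f] for f in features] for row in rows]
    y = [row[target] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def ase(rows, target, features, coef):
    """Average squared error of the fitted model on the given rows."""
    errors = [
        row[target] - (coef[0] + sum(c * row[f] for c, f in zip(coef[1:], features)))
        for row in rows
    ]
    return sum(e * e for e in errors) / len(errors)


def backward_eliminate(train, valid, target, features):
    """Drop one input at a time while the validation ASE does not get worse."""
    current = list(features)
    best = ase(valid, target, current, fit_ols(train, target, current))
    while len(current) > 1:
        trials = []
        for f in current:
            reduced = [g for g in current if g != f]
            trials.append((ase(valid, target, reduced, fit_ols(train, target, reduced)), f))
        trial_ase, drop = min(trials)
        if trial_ase > best:
            break  # every removal hurts, so keep the current set
        best = trial_ase
        current.remove(drop)
    return current, best


# Synthetic data: y depends on x1 and x2, while "noise" is unrelated to y.
rng = random.Random(42)


def make_rows(n):
    rows = []
    for _ in range(n):
        x1, x2, noise = rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10)
        y = 3.0 + 2.0 * x1 + 0.5 * x2 + rng.gauss(0, 1)
        rows.append({"x1": x1, "x2": x2, "noise": noise, "y": y})
    return rows


train, valid = make_rows(100), make_rows(100)
all_features = ["x1", "x2", "noise"]
full_ase = ase(valid, "y", all_features, fit_ols(train, "y", all_features))
kept, kept_ase = backward_eliminate(train, valid, "y", all_features)
print("kept:", kept, "validation ASE:", round(kept_ase, 3), "full model:", round(full_ase, 3))
```

Forward selection works in the opposite direction: it starts from an empty model and adds the most helpful input at each step.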

3. Forward
Add a “Regression” node from the Model tab. Rename it “Forward Regression”. Repeat
the steps above, this time setting Selection Model to Forward.

IV. Model Comparisons

1. Add the “Model Comparison” node from the Assess tab, and connect all of the
regression nodes to the Model Comparison node.

2. Run the model comparison node, and check the results

3. In the Output window, see the “Fit Statistics Model Selection based on Valid: Average Squared
Error”.

4. Please check the RMSE of all models.

1) Training dataset

2) Validation dataset
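
The comparison itself amounts to ranking the models by their validation error and picking the smallest. The sketch below shows that logic only; the error values are hypothetical placeholders, not results from this tutorial, and should be replaced with the figures reported by the Model Comparison node.

```python
import math

# Placeholder validation ASE values (hypothetical); substitute the reported figures.
valid_ase = {
    "Full Regression": 120.0,
    "Reduced Regression": 118.5,
    "Backward Regression": 112.3,
    "Forward Regression": 113.9,
}

# Rank the models from lowest to highest validation error.
ranking = sorted(valid_ase.items(), key=lambda item: item[1])
best_model, best_ase = ranking[0]

for name, value in ranking:
    # The square root puts the error back into the units of the target.
    print(f"{name:20s} ASE = {value:8.2f}   root ASE = {math.sqrt(value):6.2f}")
print("Selected:", best_model)
```

Because the square root is monotonic, ranking by average squared error and ranking by its root give the same order on a given dataset.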
