
ShowTime OTT Business Report

The document presents a business report on predictive modeling using linear regression for ShowTime, an OTT service provider, aiming to identify factors influencing first-day content viewership. It highlights the growth of the global OTT market, the impact of COVID-19 on viewership, and provides a detailed analysis of various factors such as visitor numbers, ad impressions, genre, and release timing. The findings suggest actionable insights for improving viewership, including optimal release days and seasons, as well as the effects of major sports events on content consumption.

Uploaded by

murali.dhiviya96


PREDICTIVE MODELLING – LINEAR REGRESSION

SHOWTIME – OTT SERVICES

BUSINESS REPORT

PGPDSBA.O. AUG24.A

DHIVIYA MURALIDHARAN

An over-the-top (OTT) media service is a media service offered directly to viewers via the
internet. The term is most synonymous with subscription-based video-on-demand services that offer
access to film and television content, including existing series acquired from other producers, as well
as original content produced specifically for the service. They are typically accessed via websites on
personal computers, apps on smartphones and tablets, or televisions with integrated Smart TV
platforms.

Presently, OTT services are at a relatively nascent stage and are widely accepted as a trending
technology across the globe. As customers' social behaviour shifts every year from traditional
broadcasting subscriptions towards on-demand OTT video and music subscriptions, OTT streaming is
expected to grow at a very fast pace. The global OTT market size was valued at $121.61 billion in 2019
and is projected to reach $1,039.03 billion by 2027, growing at a CAGR of 29.4% from 2020 to 2027. The
shift from television to OTT services for entertainment is driven by benefits such as on-demand
services, ease of access, and better networks and digital connectivity.

With the outbreak of COVID-19, OTT services are striving to meet the growing entertainment appetite
of viewers, with some platforms already experiencing a 46% increase in consumption and subscriber
count as viewers seek fresh content. With innovations and advanced transformations that will
enable customers to access everything they want in a single space, OTT platforms across the
world are expected to attract an increasing number of subscribers.

Objective
ShowTime is an OTT service provider that offers a wide variety of content (movies, web shows, etc.)
to its users. The company wants to determine the driver variables for first-day content viewership so
that it can take the necessary measures to improve the viewership of content on its platform. Possible
reasons for a decline in viewership include a fall in the number of people coming to the platform,
decreased marketing spend, content timing clashes, weekends and holidays. ShowTime has hired you as
a Data Scientist, shared the data of the current content on its platform, and asked you to analyse
the data and build a linear regression model to determine the driving factors for first-day viewership.

Data Description
The data contains the different factors to analyse for the content. The detailed data dictionary is
given below.

VARIABLE | DESCRIPTION
visitors | Average number of visitors, in millions, to the platform in the past week
ad_impressions | Number of ad impressions, in millions, across all ad campaigns for the content (running and completed)
major_sports_event | Whether there is a major sports event on the day (1 = Yes, 0 = No)
genre | Genre of the content
dayofweek | Day of the release of the content
season | Season of the release of the content
views_trailer | Number of views, in millions, of the content trailer
views_content | Number of first-day views, in millions, of the content
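The dataclass below is one way a single row of this data could be represented in Python. It mirrors the data dictionary. The report does not show its loading code, so the class name `ContentRecord`, the helper `parse_row` and the Python type chosen for each field are assumptions for illustration. The types follow the report's note that the data has 5 numeric columns and 3 string columns.

```python
from dataclasses import dataclass


# Hypothetical record type mirroring the data dictionary (field types are assumptions).
@dataclass
class ContentRecord:
    visitors: float            # average weekly visitors to the platform, millions
    ad_impressions: float      # ad impressions across all campaigns, millions
    major_sports_event: int    # 1 if a major sports event occurs on the day, else 0
    genre: str                 # genre of the content
    dayofweek: str             # day of release
    season: str                # season of release
    views_trailer: float       # trailer views, millions
    views_content: float       # first-day content views, millions (target)


def parse_row(row: dict) -> ContentRecord:
    """Convert one CSV row (all values as strings) into a typed record."""
    return ContentRecord(
        visitors=float(row["visitors"]),
        ad_impressions=float(row["ad_impressions"]),
        major_sports_event=int(row["major_sports_event"]),
        genre=row["genre"],
        dayofweek=row["dayofweek"],
        season=row["season"],
        views_trailer=float(row["views_trailer"]),
        views_content=float(row["views_content"]),
    )
```

With the standard library, each dictionary yielded by `csv.DictReader` over the data file could be passed to `parse_row`.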
Data Overview
• Data Type

• Statistical Summary

• Statistical summary of categorical variable

Observations and Insights

• The data set contains information about the first-day views of the content, along with the day of
week, genre and season of release.
• The data set contains 1000 rows and 8 attributes.
• There are 5 numeric (float and int type) and 3 string (object type) columns in the data.
• The target variable is the first-day views of the content, which is of float type.
• There are four seasons, eight genres and all seven days of the week in the data.
• The mean and standard deviation of the first-day content views are 0.47 and 0.12 million, respectively.
UNIVARIATE ANALYSIS
Visitor Distribution

• The distribution is approximately normal, with very few outliers.
• The mean is around 1.7 million.

Trailer Views Distribution

• There are many outliers.
• The data is heavily right-skewed.
• The mean is around 70 million.

Content Views Distribution

• Outliers are present.
• The data is slightly right-skewed but otherwise close to a normal distribution.
• The mean is around 0.47 million.
Ad Impressions Distribution

• There are very few outliers.
• The data is right-skewed and shows no sign of a normal distribution.
• The mean is around 1,400 million impressions.

Genre Distribution

• The "Others" genre has the most releases, followed by Comedy and Thriller.

Weekday Distribution

• Friday has the most content releases, followed by Wednesday.
• Very little content is released on Monday and Tuesday.
Season Distribution
• Winter and Fall have the most releases.
• The other two seasons have roughly the same share of releases.

Major Event Distribution

• Most content is released when no major sports event is taking place.
BIVARIATE ANALYSIS
Correlation between numerical values
• There is a positive correlation between trailer views and content views: content whose trailer
receives more views tends to receive more first-day views.
• There is a mild correlation between visitors and first-day views of the content.

• Outliers are present in all genres.
• When there is no major sports event, both content releases and views are higher.
• The average views range from 0.4 to 0.5 million.

• Ad impressions show no major change across seasons.
• Ad impression levels are similar whether or not a major sports event takes place.
• There is not much variation in views across seasons or genres.
• Winter had the highest number of visitors, so releasing Action content in winter may help achieve
high views.

• Winter has the most content releases and first-day views.
• Many outliers can be seen in summer.

• Outliers are present on all days of the week except Monday.
• Wednesday and Saturday have the highest mean first-day views.

• There is a positive correlation between visitors and first-day views of the content: content
released when more people visit the platform tends to receive more first-day views.
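The correlations described above are assumed to be Pearson correlation coefficients, which is the usual default for a correlation heatmap. The report does not say which measure it used or show its code. Below is a minimal stdlib sketch of the Pearson calculation; the function name `pearson` is chosen for illustration. On Python 3.10+, `statistics.correlation` computes the same value.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Each row in this data set is a piece of content, not a viewer. A positive coefficient therefore means that titles with higher values of one variable tend to have higher values of the other. It says nothing directly about what individual viewers did.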
KEY QUESTIONS
1. What does the distribution of content views look like?

The distribution is approximately normal but slightly right-skewed, and the mean lies between 0.4 and
0.5 million.

2. What does the distribution of genres look like?

The "Others" genre has the most content releases and accounts for the largest share of the data,
followed by Comedy and Thriller.

3. The day of the week on which content is released generally plays a key role in the
viewership. How, does the viewership vary with the day of release?

Wednesday and Saturday have the highest average first-day viewership. Friday has the lowest
viewership, while the other days have similar average views.
4.How does the viewership vary with the season of release?

Season has little impact on viewership and content release. Summer has the highest viewership,
followed by Winter. The other two seasons also have a decent number of views.

5. What is the correlation between trailer views and content views?

There is a positive relationship between trailer views and first-day views: content with more trailer
views tends to have more first-day views.

DATA PROCESSING
1. Duplicate and Missing Values
There are no duplicate or missing values in the provided data set.
2. Feature Engineering
Since the major_sports_event variable contains the values “0” and “1”, we replaced them with readable
labels: “0” = No and “1” = Yes.
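The relabelling above can be sketched in plain Python. So can the dummy (one-hot) encoding that categorical columns such as this flag, genre, season and day of week need before they can enter a linear regression. The helper names `encode_sports_event` and `one_hot` are hypothetical. In pandas, the equivalent is usually `Series.map` followed by `pd.get_dummies(..., drop_first=True)`, but the report does not show which approach it used.

```python
def encode_sports_event(values):
    """Map the 0/1 major_sports_event flag to 'No'/'Yes' labels."""
    mapping = {0: "No", 1: "Yes"}
    return [mapping[v] for v in values]


def one_hot(values, drop_first=True):
    """Dummy-encode a categorical column.

    Returns (kept_levels, rows). With drop_first=True the alphabetically first
    level is dropped and becomes the baseline, which avoids perfect
    multicollinearity between the dummies and the regression intercept.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows
```

The coefficient of each kept dummy is then read relative to the dropped baseline level.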

3. Outlier Detection and Treatment

There are many outliers in trailer views and content views. They are treated as valid and are not
removed from the data set, since there may be rare cases where a release is a blockbuster and its views
far exceed the average. These outliers are therefore considered informative, left untreated and kept
in the data for further insights.
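The report does not state which rule it used to identify outliers. The sketch below assumes the common 1.5 × IQR rule that boxplot whiskers are based on. Consistent with the report's decision, it only flags values and does not remove anything. The function name `iqr_outliers` is hypothetical. The quartiles use `statistics.quantiles(..., method="inclusive")`, which is one of several quartile conventions, so plotting libraries may draw slightly different whiskers.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Flag only; the report keeps these rows in the data.
    return [v for v in values if v < lower or v > upper]
```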

4. Data Preparation for Modelling

• The target variable is the first-day views of the content (views_content).
• The categorical features are major sports event, genre, season and day of the week.
• The data is split in an 80:20 ratio into train and test sets.
• The train data has 800 rows.
• The test data has 200 rows.
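A reproducible 80:20 split can be sketched with the standard library. The usual library tool is `sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=...)`. The report does not show its call or the random seed, so `seed=1` below is an arbitrary assumption, and `split_train_test` is a hypothetical name.

```python
import random


def split_train_test(rows, test_size=0.2, seed=1):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded generator for repeatability
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]
```

For 1000 rows and `test_size=0.2`, this yields 800 training rows and 200 test rows, matching the counts above.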
Model Building – Linear Regression
The first OLS regression model is built with all the predictors, before checking VIF and p-values.

Interpreting the Regression Results

1. Adjusted R-squared: It reflects the fit of the model.

o Adjusted R-squared values generally range from 0 to 1, where a higher value generally indicates a
better fit, assuming certain conditions are met.

o In our case, the adjusted R-squared is 0.780, which indicates a good fit, so the model is not
underfitting.

2. Constant coefficient: It is the Y-intercept.

o It means that if all the predictor variables are zero, the expected output (i.e., Y) equals the
constant coefficient.

o In our case, the constant coefficient is 0.0568.
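Both quantities can be written down directly. Below is a minimal sketch, assuming n observations and p predictors (intercept excluded). The number of predictors in the report's models is not stated, so the values in any usage are illustrative. The function names are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept excluded).

        adj_R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)

    The penalty grows with p, so adj_R2 <= R2 whenever p >= 1.
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def predict(intercept, coefficients, x):
    """Linear-regression prediction: intercept + sum(coefficient_i * x_i)."""
    return intercept + sum(c * v for c, v in zip(coefficients, x))
```

When every predictor is zero, `predict` returns the intercept alone. This is exactly the interpretation of the constant coefficient (0.0568 in this model).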

Observations
• The training R-squared is 0.79, so the model is not underfitting.

• The train and test RMSE and MAE are comparable, so the model is not overfitting either.

• The MAE suggests that, on the test data, the model predicts first-day views of the content with a
mean absolute error of 0.039 million.

• A MAPE of 8.6% on the test data means that, on average, predictions deviate from the actual
first-day views of the content by 8.6%.
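The three error metrics quoted above follow standard definitions. Here is a stdlib sketch with hypothetical function names; libraries such as scikit-learn provide equivalents, but the report does not show which functions it called.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the units of the target (millions of views here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Because the target is measured in millions, an MAE of 0.039 corresponds to an average absolute error of about 39,000 first-day views.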

In order to make statistical inferences from the model, we have to test whether the linear regression
assumptions are met.

We will be checking the following Linear Regression assumptions:

1. No Multicollinearity – By checking VIF

2. Linearity of variables

3. Independence of error terms

4. Normality of error terms

5. No Heteroscedasticity

1. No Multicollinearity
On checking the variance inflation factor (VIF), no predictor has a value greater than 5, indicating
that there is no significant multicollinearity.
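The VIF of predictor j is 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. In statsmodels, this is typically computed with `statsmodels.stats.outliers_influence.variance_inflation_factor`. The report does not show its code, so the stdlib sketch below, which solves the normal equations by Gaussian elimination, is illustrative. `solve_linear`, `r_squared` and `vif` are hypothetical names.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, predictors):
    """R-squared of an OLS fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve_linear(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF for each column: 1 / (1 - R_j^2), regressing column j on the others."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(col, others)))
    return result
```

A perfectly collinear column makes R_j² equal 1, and this sketch then raises a division-by-zero error. With the threshold used in the report, any VIF above 5 would flag a predictor for attention.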

Dealing with High P-values

• Build a model, check the p-values of the variables, and drop the column with the highest p-value.

• Create a new model without the dropped feature, check the p-values of the variables, and drop the
column with the highest p-value.

Repeat the above two steps until there are no columns with p-value > 0.05.
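The loop described above is backward elimination by p-value. In the sketch below, the model-fitting step is a caller-supplied function, `fit_pvalues`. It is hypothetical: it should fit a model on the given feature names and return a `{feature: p_value}` mapping, with the intercept excluded. This keeps the sketch independent of any particular regression library.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    features: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the highest-p-value feature, refit, and repeat until all p-values <= alpha."""
    selected = list(features)
    while selected:
        pvalues = fit_pvalues(selected)          # refit on the current feature set
        worst = max(selected, key=lambda f: pvalues[f])
        if pvalues[worst] <= alpha:              # every remaining feature is significant
            break
        selected.remove(worst)
    return selected
```

With statsmodels, `fit_pvalues` could wrap something like `sm.OLS(y, sm.add_constant(X[cols])).fit().pvalues`, assuming `X` is a pandas DataFrame of encoded predictors and `y` is the target. The returned Series also contains a `const` entry, which the loop ignores because it only looks up the selected features.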
Observations after dealing with high p-values
• Now no feature has p-value greater than 0.05, so we'll consider the features in x_train1 as
the final set of predictor variables and olsmod2 as the final model to move forward with

• Now adjusted R-squared is 0.78, i.e., our model is able to explain ~78% of the variance

• RMSE and MAE values are comparable for train and test sets, indicating that the model is not
overfitting

2 & 3. Linearity of Variables and Independence of Error Terms

Observations
• The scatter plot shows the distribution of residuals (errors) vs fitted values (predicted values).

• If there is any pattern in this plot, we consider it a sign of non-linearity in the data, meaning the
model doesn't capture non-linear effects, or a sign that the errors are not independent.

• We see no pattern in the plot above. Hence, the assumptions of linearity and independence are
satisfied.
4. Normality of Error Terms

Observations
• The histogram of residuals has a bell shape.

• In the Q-Q plot, the residuals more or less follow a straight line except for the tails.
• Since the Shapiro-Wilk test gives a p-value > 0.05, we cannot reject the hypothesis that the
residuals are normally distributed.
• So, the assumption is satisfied.

5. No Heteroscedasticity

Observations

• The residual vs fitted values plot can be used to check for homoscedasticity. Under
heteroscedasticity, the residuals can form an arrow (funnel) shape or another non-symmetrical shape.
Since the plot does not form such a shape, there is no sign of heteroscedasticity.
• The Goldfeld-Quandt test can also be used. Its null hypothesis is that the residuals are
homoscedastic: if the p-value is > 0.05 we cannot reject homoscedasticity; otherwise the residuals are
heteroscedastic. Since the p-value is > 0.05, we can treat the residuals as homoscedastic, so this
assumption is satisfied.
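The Goldfeld-Quandt test sorts the observations by a regressor and drops a middle band. It then fits the regression separately to the low group and the high group and compares their residual variances with an F statistic. The report does not show its code; in statsmodels, the test is available as `statsmodels.stats.diagnostic.het_goldfeldquandt`. The stdlib sketch below is simplified to a single regressor and returns only the F statistic. A p-value would come from the upper tail of the F distribution, for example `scipy.stats.f.sf(F, df, df)`. The function names and the 20% drop fraction are assumptions.

```python
def _ssr_line(xs, ys):
    """Residual sum of squares of a least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def goldfeld_quandt(xs, ys, drop_frac=0.2):
    """Goldfeld-Quandt F statistic for one regressor (alternative: variance rises with x).

    Sorts by x, discards the middle drop_frac share of observations, fits a line
    to the low-x and high-x groups, and returns (F, df) with
    F = (SSR_high / df) / (SSR_low / df), df = group size - 2 parameters.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    n_side = (n - int(n * drop_frac)) // 2
    low, high = pairs[:n_side], pairs[n - n_side:]
    df = n_side - 2
    ssr_low = _ssr_line([p[0] for p in low], [p[1] for p in low])
    ssr_high = _ssr_line([p[0] for p in high], [p[1] for p in high])
    return (ssr_high / df) / (ssr_low / df), df
```

An F close to 1 (large p-value) is consistent with homoscedastic residuals. An F well above 1 suggests that the error variance grows with the sorting variable.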
PREDICTIONS ON TEST DATA

We can observe that the model returns good prediction results: the actual and predicted values are
comparable.

FINAL MODEL

Observations

• The model is able to explain ~78% of the variation in the data

• The train and test RMSE and MAE are low and comparable, so the model is not suffering from
overfitting.
• The MAPE on the test set suggests that, on average, predictions are within 8.8% of the actual
first-day views of the content.
• Hence, we can conclude that the model olsmodel_final is good for both prediction and inference
purposes.
Actionable Insights & Recommendations

• The model's R-squared is approximately 0.784 and the adjusted R-squared is 0.780, indicating that
the model explains about 78% of the variance in the data. This is quite satisfactory.
• An increase of one unit (1 million) in 'visitors' is associated with a 0.1253 unit (million) increase
in first-day content viewership, with all other variables held constant.
• Releasing content on Saturdays and Wednesdays can boost viewership, provided no major sports events
occur on those days.
• Releasing content during the summer season can enhance viewership.
• To improve content viewership, it is recommended to avoid releasing content on days when major sports
events are happening.
• Among genres, "Others" has the highest count, followed by Comedy, Thriller, Drama, etc.
• A major sports event is associated with a 0.0605 unit decrease in content viewership.
• As the data set has only a few variables, more detailed information or additional variables should be
collected to better predict and improve viewership.
• The summer season is associated with a 0.0442 unit increase in content viewership.
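The coefficients quoted above can be combined to estimate how the predicted first-day views change when several factors change together, holding everything else constant. The three coefficient values below come from the report. The dictionary keys, including the dummy name `season_Summer`, and the helper `change_in_views` are hypothetical. The summer effect is relative to whichever season the model used as its baseline, which the report does not state.

```python
# Coefficients quoted in the report's final model (units: millions of first-day views).
COEFFICIENTS = {
    "visitors": 0.1253,             # per 1 million additional weekly visitors
    "major_sports_event": -0.0605,  # event on release day vs. no event
    "season_Summer": 0.0442,        # summer vs. the (unstated) baseline season
}


def change_in_views(deltas):
    """Predicted change in first-day views (millions) for the given predictor changes,
    holding every other variable constant."""
    return sum(COEFFICIENTS[name] * d for name, d in deltas.items())
```

As a usage example, a release that clashes with a major sports event would need roughly 0.0605 / 0.1253 ≈ 0.48 million more weekly visitors to offset the predicted drop.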

