COM 428 - Jupyter Notebook2 - 101223

Uploaded by

Kimondo King

4/3/23, 10:04 PM COM 428 - Jupyter Notebook

COM 428E Data Mining and Warehousing

GROUP MEMBERS
1. JAMES KIMONDO P107/1111G/18
2. AGNES MUTISYA P107/1180G/19
3. KIPKOECH ENOCK P107/1202G/19
4. KIPNGETICH TALAM P107/1138G/18
5. CHEPNGENO MILLICENT P107/1215G/19
6. KIOKO TIMOTHY P107/1196G/19
7. POLINE SILAS P107/1191G/19
8. JEDIDAH NDERITU P107/1167G/19
9. SIMON MUTIO P107/1149G/19
10. IVY MBOGO P107/1209G/17

Project: The Movie Database Analysis

Table of Contents
Introduction
Problem Statement and Formulate Hypothesis
Data Mining Technique
Data Collection
Data Preprocessing
Exploratory Data Analysis
Estimate the Model
Model Interpretation and Conclusion

Introduction

Dataset Description
This dataset is used to understand which factors are associated with the revenue
generated by movies. It contains information about roughly 10,000 movies
collected from The Movie Database (TMDb), including user ratings and revenue. The
variables include genres, cast, budget, release date, and revenue, among
others. The analysis aims to identify the properties of movies that generate
high revenue, and to explore which genres are most popular from year to year so
that trends in movie preferences can be analyzed.

localhost:8888/notebooks/Downloads/COM 428.ipynb# 1/16


In [33]:  #importation of libraries

import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.simplefilter('ignore')

Problem Statement and Formulate Hypothesis

Problem Statement

The aim of this project is to explore The Movie Database (TMDb) dataset,
identify the most popular genres from year to year, and determine the
properties associated with movies that have high revenues.

Hypotheses

Movies with higher budgets tend to have higher revenues.
Movies with certain genres tend to have higher revenues.
Movies with higher user ratings tend to have higher revenues.
Movies with certain cast members tend to have higher revenues.

Data Mining Technique

Regression analysis is a statistical technique used in data mining to identify the
relationship between a dependent variable and one or more independent variables. It is
used to model that relationship and to make predictions about future data points.
We use a multiple regression model, a type of regression analysis that uses more than
one independent variable to predict the dependent variable. It extends simple linear
regression by letting several predictors contribute to the outcome at the same time.
The goal is to find the combination of predictor variables that best explains the
variation in the outcome variable.
Multiple regression is widely used in data mining because it can handle data sets with
many variables, indicate which variables matter most for predicting the outcome,
control for confounding variables, and assess the statistical significance of the
relationship between each predictor and the outcome.
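
As a rough illustration of the idea, the sketch below fits a multiple regression with two predictors by ordinary least squares. The data are synthetic, and the names x1, x2 and beta are placeholders chosen for this example, not columns of the TMDb dataset.

```python
import numpy as np

# synthetic data: y depends linearly on two predictors plus a little noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.1, size=n)

# design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# ordinary least squares: choose beta to minimise ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared: share of the variance in y explained by the fitted model
residuals = y - X @ beta
r_squared = 1 - residuals.var() / y.var()

print("intercept, slope x1, slope x2:", beta.round(2))
print("R-squared:", round(r_squared, 3))
```

The fitted coefficients should land close to the values used to generate the data (3.0, 2.0 and -1.5), which is the sense in which the model "explains" the outcome.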

Data Collection
The TMDb dataset was obtained from the online platform Kaggle.com.

The data contains:

Total Rows = 10866

Total Columns = 21

In [3]:  TMD=pd.read_csv('tmdb_movies_data.csv')
TMD.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10866 entries, 0 to 10865
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 10866 non-null int64
1 imdb_id 10856 non-null object
2 popularity 10866 non-null float64
3 budget 10866 non-null int64
4 revenue 10866 non-null int64
5 original_title 10866 non-null object
6 cast 10790 non-null object
7 homepage 2936 non-null object
8 director 10822 non-null object
9 tagline 8042 non-null object
10 keywords 9373 non-null object
11 overview 10862 non-null object
12 runtime 10866 non-null int64
13 genres 10843 non-null object
14 production_companies 9836 non-null object
15 release_date 10866 non-null object
16 vote_count 10866 non-null int64
17 vote_average 10866 non-null float64
18 release_year 10866 non-null int64
19 budget_adj 10866 non-null float64
20 revenue_adj 10866 non-null float64
dtypes: float64(4), int64(6), object(11)
memory usage: 1.7+ MB

Data Preprocessing
Observation From The Dataset


The currency of the columns 'budget', 'revenue', 'budget_adj' and 'revenue_adj'
is not given, so for this dataset we assume US dollars.
The dataset contains many movies whose budget or revenue has a value of 0.
The dataset has many missing values.
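
The observations above can be checked with two short counts. This is a minimal sketch on a made-up four-row sample; the name sample and its values are assumptions for illustration, and only the column names come from the dataset.

```python
import pandas as pd

# small made-up sample with the same kinds of gaps as the TMDb data
sample = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "budget": [0, 1_000_000, 0, 5_000_000],
    "revenue": [0, 3_000_000, 2_000_000, 0],
    "genres": ["Drama", None, "Action|Comedy", "Horror"],
})

# missing values (NaN/None) per column
missing_per_column = sample.isna().sum()

# zero entries in the money columns, which most likely mean "not reported"
zero_per_column = (sample[["budget", "revenue"]] == 0).sum()

print(missing_per_column)
print(zero_per_column)
```

Counting zeros separately matters because isna() does not flag them, even though a budget or revenue of 0 is unlikely to be a real value.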

Data Cleaning

Remove duplicate rows from the dataset.

Change the release date into datetime format.
Remove the columns that are not needed during the data mining
process.
Replace the missing values.

1. Removal of Duplicate values

In [4]:  #total duplicates in the dataset

sum(TMD.duplicated())

Out[4]: 1

In [5]:  #dropping of the duplicated

TMD.drop_duplicates(subset=None, keep='first', inplace=True)

2. Release Date Format

In [6]:  #changing from object to datetime format

TMD['release_date'] = pd.to_datetime(TMD['release_date'])
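
One thing worth checking after this conversion is the century of older films. The sketch below assumes, without confirmation from the notebook, that the CSV stores dates as month/day/two-digit-year strings such as '6/9/66'. In that case a two-digit year below 69 is read as 20xx, so the date is rebuilt with the four-digit release_year column. The sample rows are made up.

```python
import pandas as pd

# made-up rows; the 'm/d/yy' date format is an assumption about the CSV
sample = pd.DataFrame({
    "release_date": ["6/9/66", "12/2/10"],
    "release_year": [1966, 2010],
})

# parse month and day; the century of a two-digit year may come out wrong
parsed = pd.to_datetime(sample["release_date"], format="%m/%d/%y")

# rebuild each date with the four-digit year from the release_year column
sample["release_date"] = pd.to_datetime(pd.DataFrame({
    "year": sample["release_year"],
    "month": parsed.dt.month,
    "day": parsed.dt.day,
}))

print(sample)
```

If the file already stores four-digit years, this step is unnecessary and the plain pd.to_datetime call is enough.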

3. Removal of Unrequired Columns

In [7]:  #removing the columns not needed during data mining

TMD.drop(['budget_adj','revenue_adj','overview','imdb_id','homepage','tagl
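
The cell above is cut off at the page edge in this export, so its exact arguments are not visible. Comparing the 21 columns in the first info() listing with the 15 columns listed after cleaning suggests which six columns were removed. The sketch below drops them by name on a made-up one-row frame; the variable names sample, unused and cleaned are illustrative.

```python
import pandas as pd

# one made-up row with a few of the 21 column names from the info() listing
sample = pd.DataFrame({
    "id": [1], "imdb_id": ["tt0000001"], "budget": [100], "revenue": [250],
    "homepage": [None], "tagline": ["A tagline"], "overview": ["A plot"],
    "budget_adj": [110.0], "revenue_adj": [275.0], "original_title": ["A"],
})

# columns inferred to be dropped: present in the first info() output,
# absent from the 15-column info() output after cleaning
unused = ["budget_adj", "revenue_adj", "overview", "imdb_id", "homepage", "tagline"]

# drop by name; the default errors="raise" fails loudly on a misspelt name
cleaned = sample.drop(columns=unused)

print(list(cleaned.columns))
```

Returning a new frame instead of using inplace=True keeps the raw data available for later checks.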

4. Missing values


In [8]:  #replacing with zero

TMDB=TMD.fillna(0)
TMDB.info()

Data columns (total 15 columns):

# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 10865 non-null int64
1 popularity 10865 non-null float64
2 budget 10865 non-null int64
3 revenue 10865 non-null int64
4 original_title 10865 non-null object
5 cast 10865 non-null object
6 director 10865 non-null object
7 keywords 10865 non-null object
8 runtime 10865 non-null int64
9 genres 10865 non-null object
10 production_companies 10865 non-null object
11 release_date 10865 non-null datetime64[ns]
12 vote_count 10865 non-null int64
13 vote_average 10865 non-null float64
14 release_year 10865 non-null int64
dtypes: datetime64[ns](1), float64(2), int64(6), object(6)
memory usage: 1.3+ MB

Exploratory Data Analysis

Descriptive Statistics

In [9]:  TMDB.describe()

Out[9]:
                 id    popularity        budget       revenue       runtime    vote_coun
count  10865.000000  10865.000000  1.086500e+04  1.086500e+04  10865.000000  10865.00000
mean   66066.374413      0.646446  1.462429e+07  3.982690e+07    102.071790    217.39963
std    92134.091971      1.000231  3.091428e+07  1.170083e+08     31.382701    575.64462
min        5.000000      0.000065  0.000000e+00  0.000000e+00      0.000000     10.00000
25%    10596.000000      0.207575  0.000000e+00  0.000000e+00     90.000000     17.00000
50%    20662.000000      0.383831  0.000000e+00  0.000000e+00     99.000000     38.00000
75%    75612.000000      0.713857  1.500000e+07  2.400000e+07    111.000000    146.00000
max   417859.000000     32.985763  4.250000e+08  2.781506e+09    900.000000   9767.00000

Univariate Analysis

- Highest And Lowest Movie Budget


In [10]:  def find_minmax(x):
    # use 'idxmin' to find the index of the movie with the lowest value of column x
    min_index = TMDB[x].idxmin()
    # use 'idxmax' to find the index of the movie with the highest value of column x
    high_index = TMDB[x].idxmax()
    high = pd.DataFrame(TMDB.loc[high_index,:])
    low = pd.DataFrame(TMDB.loc[min_index,:])
    # print the titles of the movies with the highest and lowest value of x
    print("Movie Which Has Highest "+ x + " : ",TMDB['original_title'][hig
    print("Movie Which Has Lowest "+ x + " : ",TMDB['original_title'][min
    return pd.concat([high,low],axis = 1)

# treat a budget of 0 as missing, then report the highest and lowest budget
TMDB['budget'] = TMDB['budget'].replace(0,np.nan)
find_minmax('budget')

Movie Which Has Highest budget : The Warrior's Way

Movie Which Has Lowest budget : Fear Clinic

Out[10]:
                      2244                                       1151
id                    46528                                      287524
popularity            0.25054                                    0.177102
budget                425000000.0                                1.0
revenue               11087569                                   0
original_title        The Warrior's Way                          Fear Clinic
director              Sngmoo Lee                                 Robert Hall
runtime               100                                        95
genres                Adventure|Fantasy|Action|Western|Thriller  Horror
production_companies  Boram Entertainment Inc.                   Dry County Films|Anchor Bay Entertainment|Movi...
release_date          2010-12-02 00:00:00                        2014-10-31 00:00:00
vote_count            74                                         15
vote_average          6.4                                        4.1
release_year          2010                                       2014
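
The same lookup can be written without relying on a global frame. This is a minimal sketch on made-up data; min_max_rows and the movies frame are hypothetical names for this example, not part of the notebook.

```python
import numpy as np
import pandas as pd

# made-up movies; a budget of 0 stands for "not reported"
movies = pd.DataFrame({
    "original_title": ["Alpha", "Beta", "Gamma", "Delta"],
    "budget": [0, 5_000_000, 120_000_000, 40_000],
})

def min_max_rows(df, column):
    """Return the rows holding the highest and lowest value of `column`."""
    # idxmax/idxmin return index labels and skip NaN by default
    high = df.loc[df[column].idxmax()]
    low = df.loc[df[column].idxmin()]
    return pd.concat([high, low], axis=1)

# mask zeros first so that a missing budget is not reported as the lowest one
with_gaps = movies.assign(budget=movies["budget"].replace(0, np.nan))
result = min_max_rows(with_gaps, "budget")

print(result)
```

Masking the zeros is what keeps an unreported budget from being picked as the lowest one. A column whose zeros are left in place will report a value of 0 as its minimum.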

- Highest And Lowest Movie Revenue


In [11]:  find_minmax('revenue')

Movie Which Has Highest revenue : Avatar

Movie Which Has Lowest revenue : Wild Card

Out[11]:
                1386                                               48
id              19995                                              265208
popularity      9.432768                                           2.93234
budget          237000000.0                                        30000000.0
revenue         2781505847                                         0
original_title  Avatar                                             Wild Card
cast            Sam Worthington|Zoe Saldana|Sigourney Weaver|S...  Jason Statham|Michael Angarano|Milo Ventimigli...
director        James Cameron                                      Simon West
keywords        culture clash|future|space war|space colony|so...  gambling|bodyguard|remake
runtime         162                                                92

- Year with the most movie releases

In [19]:  # make group for each year and count the number of movies in each year
year=TMDB.groupby('release_year').count()['id']

#make group of the data according to their release year and count the tota
TMDB.groupby('release_year').count()['id'].plot(xticks = np.arange(1950,20

#set the figure size and labels
sb.set(rc={'figure.figsize':(12,6)})
plt.title("Year Vs Number Of Movies",fontsize = 16)
plt.xlabel('Release year',fontsize = 14)
plt.ylabel('Number Of Movies',fontsize = 14)

Out[19]: Text(0, 0.5, 'Number Of Movies')


2014 has the highest number of movie releases in the dataset.
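
The year with the most releases can also be read off directly instead of from the plot. A minimal sketch, using a made-up series in place of the release_year column:

```python
import pandas as pd

# made-up release years standing in for the release_year column
release_year = pd.Series([2012, 2014, 2014, 2013, 2014, 2013], name="release_year")

# number of movies per year, ordered by year
movies_per_year = release_year.value_counts().sort_index()

# year with the most releases and how many there were
busiest_year = movies_per_year.idxmax()
busiest_count = movies_per_year.max()

print(movies_per_year)
print(busiest_year, busiest_count)
```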

- Genre with the most movie releases

In [13]:  def count_genre(x):
    # convert the column to string data type
    TMDB[x] = TMDB[x].astype(str)

    # concatenate all the rows of the genres
    data_plot = TMDB[x].str.cat(sep='|')
    data = pd.Series(data_plot.split('|'))

    # count each of the genres and return
    info = data.value_counts(ascending=False)
    return info

# call the function for counting the movies of each genre
total_genre_movies = count_genre('genres')

# plot a 'barh' plot using plot function for 'genre vs number of movies'
total_genre_movies.plot(kind='barh', figsize=(15, 7), fontsize=12)

# setup the title and the labels of the plot
plt.title("Genre With Highest Release", fontsize=16)
plt.xlabel('Number Of Movies', fontsize=14)
plt.ylabel("Genres", fontsize=14)

Out[13]: Text(0, 0.5, 'Genres')

Drama is the most common genre among the movies released.
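
An alternative way to count pipe-separated genres is sketched below on made-up data. Because missing values were replaced with 0 earlier and the column is then cast to string, the count_genre function can end up counting '0' as a genre. The version here drops missing entries instead. The variable names are illustrative.

```python
import pandas as pd

# made-up genres column; None marks a movie without genre information
genres = pd.Series(["Drama|Comedy", "Drama", None, "Action|Drama", "Comedy"])

# split each entry on '|', put every genre on its own row, drop missing entries
genre_counts = (
    genres.dropna()
          .str.split("|")
          .explode()
          .value_counts()
)

print(genre_counts)
```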


In [14]:  # pair each genre with its movie count
genre_count = [[genre, count] for genre, count in total_genre_movies.items()]

plt.rc('font', weight='bold')
f, ax = plt.subplots(figsize=(8, 8))
genre_count.sort(key = lambda x:x[1], reverse = True)
labels, sizes = zip(*genre_count)
labels_selected = [n if v > sum(sizes) * 0.01 else '' for n, v in genre_co
ax.pie(sizes, labels=labels_selected,
autopct = lambda x:'{:2.0f}%'.format(x) if x > 1 else '',
shadow=False, startangle=0)
ax.axis('equal')
plt.tight_layout()


From the chart, we can see that Drama is the most common movie genre,
with the highest count of movies. Comedy and Action are also popular
genres with a relatively high count of movies. Other genres such as Horror
and War have a relatively lower count of movies.

Bivariate Analysis

- Movies with higher budgets, higher ratings and higher popularity tend to have higher
revenues.

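In [15]:  # numeric_only=True skips the text columns, which newer pandas versions no longer drop silently
correlation=TMDB.corr(numeric_only=True)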
correlation

Out[15]:
                     id  popularity     budget    revenue    runtime  vote_count  vote_averag
id             1.000000   -0.014351  -0.075766  -0.099235  -0.088368   -0.035555     -0.05839
popularity    -0.014351    1.000000   0.479961   0.663360   0.139032    0.800828       0.2095
budget        -0.075766    0.479961   1.000000   0.700162   0.265575    0.580050       0.0920
revenue       -0.099235    0.663360   0.700162   1.000000   0.162830    0.791174      0.17254
runtime       -0.088368    0.139032   0.265575   0.162830   1.000000    0.163273       0.1568
vote_count    -0.035555    0.800828   0.580050   0.791174   0.163273    1.000000       0.2538
vote_average  -0.058391    0.209517   0.092014   0.172541   0.156813    0.253818      1.00000
release_year   0.511393    0.089806   0.215402   0.057070  -0.117187    0.107962     -0.11757


In [16]:  plt.figure(figsize=(12,8))
sb.heatmap(data=correlation, annot = True, fmt = '.3f',
cmap = 'vlag_r', center = 0)
plt.yticks(rotation = 45)
plt.show()

Positive correlation

popularity and budget (0.48)
popularity and revenue (0.66)
user rating and popularity (0.21)

Negative correlation

runtime and release year (-0.12)
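
To rank the variables by how strongly they move with revenue, the revenue column of the correlation matrix can be sorted. A minimal sketch with made-up numbers; only the column names follow the dataset.

```python
import pandas as pd

# made-up numbers; column names follow the dataset
sample = pd.DataFrame({
    "budget":     [10, 20, 30, 40, 50],
    "popularity": [1.0, 1.5, 1.2, 3.0, 2.8],
    "runtime":    [120, 95, 110, 100, 90],
    "revenue":    [15, 45, 50, 95, 110],
})

# Pearson correlation of every numeric column with revenue, strongest first
revenue_corr = (
    sample.corr(numeric_only=True)["revenue"]
          .drop("revenue")
          .sort_values(ascending=False)
)

print(revenue_corr)
```

A correlation only measures linear association, so a high value here does not show that one variable causes the other.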

- How do revenue and popularity vary with budget and runtime, and how does popularity
depend on profit?


In [17]:  #set the figure size before plotting so that it applies to this figure
sb.set(rc={'figure.figsize':(12,6)})
ax = sb.regplot(x=TMDB['revenue'], y=TMDB['budget'])

#set the title and labels of the figure
ax.set_title("Revenue Vs Budget",fontsize=16)
ax.set_xlabel("Revenue",fontsize=14)
ax.set_ylabel("Budget",fontsize=14)

- Which movie lengths are most liked by audiences, judged by popularity?

In [18]:  TMDB.groupby('runtime')['popularity'].mean().plot(figsize = (13,5),xticks=

plt.title("Runtime Vs Popularity",fontsize = 16)
plt.xlabel('Runtime',fontsize = 14)
plt.ylabel('Average Popularity',fontsize = 14)
sb.set(rc={'figure.figsize':(16,6)})


On average, movies with a runtime between 100 and 200 minutes are more
popular than movies with a runtime outside this range.
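
Grouping by exact runtime gives many groups with very few movies each, so a banded summary can make the pattern easier to check. The sketch below uses made-up values, and the bin edges are hypothetical choices for illustration.

```python
import pandas as pd

# made-up runtimes (minutes) and popularity scores
sample = pd.DataFrame({
    "runtime":    [85, 95, 105, 130, 150, 210, 45],
    "popularity": [0.3, 0.5, 0.9, 1.4, 1.1, 0.4, 0.1],
})

# hypothetical bin edges in minutes; intervals are closed on the right
edges = [0, 60, 100, 200, 900]
labels = ["<=60", "61-100", "101-200", ">200"]
sample["runtime_band"] = pd.cut(sample["runtime"], bins=edges, labels=labels)

# mean popularity and number of movies in each band
by_band = sample.groupby("runtime_band", observed=True)["popularity"].agg(["mean", "count"])

print(by_band)
```

Reporting the count next to the mean shows how many movies each average rests on.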

Estimate the Model

In [31]:  from sklearn.preprocessing import LabelEncoder

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Convert cast and genres columns to string
TMDB_model[['cast','genres']] = TMDB_model[['cast','genres']].astype(str)

# Encode categorical variables
le = LabelEncoder()
TMDB_model['cast'] = le.fit_transform(TMDB_model['cast'])
TMDB_model['genres'] = le.fit_transform(TMDB_model['genres'])

# Prepare data for modeling
X = TMDB_model[['popularity', 'budget', 'cast', 'genres']]
y = TMDB_model['revenue']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, r

# Create and fit the model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate the model
score = model.score(X_test, y_test)
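
Label encoding turns each distinct string into an arbitrary integer, which a linear model then treats as a quantity. One common alternative for pipe-separated categories is one-hot encoding, sketched below on made-up data. pd.factorize is used here only to show what integer codes look like. It is not the notebook's LabelEncoder, and it numbers values by order of appearance.

```python
import pandas as pd

# made-up genres column in the dataset's 'A|B|C' format
genres = pd.Series(["Drama|Comedy", "Action", "Drama", "Action|Drama"])

# integer codes: one arbitrary number per distinct string
codes, uniques = pd.factorize(genres)

# one-hot encoding: one 0/1 column per individual genre
dummies = genres.str.get_dummies(sep="|")

print(codes)
print(dummies)
```

With one-hot columns, each genre gets its own coefficient, so the fitted model does not depend on how the strings happen to be numbered.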


In [32]:  import statsmodels.api as sm

# Add constant to the independent variables
X_train = sm.add_constant(X_train)

# Fit the model
model = sm.OLS(y_train, X_train).fit()

# Print the summary
print(model.summary())


                            OLS Regression Results
==============================================================================
Dep. Variable:                revenue   R-squared:                       0.629
Model:                            OLS   Adj. R-squared:                  0.628
Method:                 Least Squares   F-statistic:                     1749.
Date:                Mon, 03 Apr 2023   Prob (F-statistic):               0.00
Time:                        18:35:54   Log-Likelihood:                -82016.
No. Observations:                4135   AIC:                         1.640e+05
Df Residuals:                    4130   BIC:                         1.641e+05
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -3.986e+07   4.69e+06     -8.496      0.000   -4.91e+07   -3.07e+07
popularity  4.807e+07   1.32e+06     36.331      0.000    4.55e+07    5.07e+07
budget         2.1768      0.047     46.708      0.000       2.085       2.268
cast        2106.5788   1041.907      2.022      0.043      63.881    4149.277
genres      1398.0277   4863.308      0.287      0.774   -8136.675    1.09e+04
==============================================================================
Omnibus:                     3436.760   Durbin-Watson:                   2.037
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           503740.129
Skew:                           3.248   Prob(JB):                         0.00
Kurtosis:                      56.680   Cond. No.                     1.50e+08
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.5e+08. This might indicate that there are
strong multicollinearity or other numerical problems.
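
A large condition number can come from predictors measured on very different scales as well as from multicollinearity. The sketch below illustrates the scale effect with synthetic, independent predictors. It does not use the TMDb data, and the helper names design_matrix and standardize are made up for this example.

```python
import numpy as np

# synthetic predictors on very different scales (not the TMDb data)
rng = np.random.default_rng(1)
n = 500
popularity = rng.uniform(0, 10, size=n)   # single digits
budget = rng.uniform(1e6, 2e8, size=n)    # millions of dollars

def design_matrix(*columns):
    """Stack the columns next to a leading column of ones (the intercept)."""
    return np.column_stack([np.ones(len(columns[0])), *columns])

def standardize(x):
    """Rescale to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()

# condition number before and after putting the predictors on one scale
cond_raw = np.linalg.cond(design_matrix(popularity, budget))
cond_scaled = np.linalg.cond(design_matrix(standardize(popularity), standardize(budget)))

print(f"raw: {cond_raw:.3g}, standardized: {cond_scaled:.3g}")
```

If the condition number stays large after standardizing, correlated predictors become the more likely explanation.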


Model Interpretation and Conclusion

The overall model is statistically significant: the F-statistic of 1749 has a
p-value below 0.05, so at least one predictor is related to revenue. On its
own this does not show that the model fits the data well.
The R-squared value of 0.629 suggests that 62.9% of the variance in
revenue in the training data is explained by the independent variables.
The popularity and budget variables have a statistically significant positive
effect on revenue, indicating that movies with higher popularity and budget
tend to generate higher revenues.
The cast variable has a statistically significant positive coefficient
(p = 0.043). Because cast was label-encoded, its values are arbitrary codes
for the cast strings, so this coefficient cannot be read as an effect of more
popular or higher-paid actors.
The genres variable does not have a statistically significant effect on
revenue, as its p-value (0.774) is greater than 0.05. It was label-encoded
in the same way, so the same caution applies.
The condition number of 1.5e+08 suggests that there may be strong
multicollinearity or other numerical problems in the model.
The skewness value of 3.248 indicates a strong positive skew in
the distribution of the residuals.
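
The adjusted R-squared and the F-statistic in the summary can be cross-checked from the reported R-squared and the degrees of freedom. This is a small worked calculation using the rounded figures from the OLS output, so the results only match the reported values approximately.

```python
# figures copied from the OLS summary
r_squared = 0.629          # R-squared (rounded in the summary)
n_obs = 4135               # No. Observations
k = 4                      # Df Model (number of predictors)
df_resid = n_obs - k - 1   # Df Residuals, 4130 in the summary

# adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# overall F-statistic: explained variance per predictor
# over unexplained variance per residual degree of freedom
f_stat = (r_squared / k) / ((1 - r_squared) / df_resid)

print(round(adj_r_squared, 3), round(f_stat))
```

The F-statistic comes out near 1750, in line with the reported 1749. The small gap is what one would expect from using R-squared rounded to three decimals.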
