PREDICTIVE MODELLING
BUSINESS REPORT
THANUSRI A
14-01-2024
Problem 1
Define the problem and perform exploratory Data Analysis
Problem definition - Check shape, Data types, statistical summary - Univariate analysis -
Multivariate analysis - Use appropriate visualizations to identify the patterns and insights -
Key meaningful observations on individual variables and the relationship between variables
Data Pre-processing
Prepare the data for modelling: - Missing Value Treatment (if needed) - Outlier Detection
(treat, if needed) - Feature Engineering - Encode the data - Train-test split
Model Building - Linear regression
Apply Linear Regression using sklearn - Using statsmodels, perform checks for significant
variables using the appropriate method - Create multiple models and check the performance
of predictions on train and test sets using R-square, RMSE & Adjusted R-square.
Business Insights & Recommendations
Comment on the Linear Regression equation from the final model and the impact of relevant
variables (at least 2) as per the equation - Conclude with the key takeaways (actionable
insights and recommendations) for the business
The top 5 rows of the dataset:
Note that there are many zeroes in a few of the features
The last 5 rows of the dataset:
The number of rows and columns in the dataset:
The shape of the data is (8192, 22)
Dataset summary:
Basic info about the dataset:
There are a total of 8192 rows and 22 columns in the dataset. Out of the 22 columns,
13 are float type, 8 are integer type and 1 is an object type variable.
Dataset null value check:
There are missing values present in ‘rchar’ and ‘wchar’.
Let us treat them using the median value.
There are no duplicate rows present:
There are many 0 values present in a few of the variables.
Let us check the number of zeroes in each feature and drop a feature if more
than 50 percent of its values are 0s.
The following features have more than 50 percent 0s:
'pgout','ppgout','pgfree','pgscan','atch'
So we drop these 5 columns from the dataset.
For the rest of the features, let us replace all the zeroes with median values.
The dataset after removing the required features and computing median values for the rest
of the features looks like:
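The zero-handling and imputation steps above can be sketched as follows. This is a minimal illustration on a made-up frame; the 50% threshold follows the report, but the helper name `clean_zeros_and_nulls` and the example columns are hypothetical, not the report's actual code.

```python
import numpy as np
import pandas as pd

def clean_zeros_and_nulls(df, zero_threshold=0.5):
    """Drop numeric columns where more than `zero_threshold` of values are 0,
    then replace remaining zeros and NaNs with the column median
    (computed over the non-zero values)."""
    df = df.copy()
    num_cols = df.select_dtypes(include="number").columns
    zero_frac = (df[num_cols] == 0).mean()            # fraction of zeros per column
    to_drop = zero_frac[zero_frac > zero_threshold].index.tolist()
    df = df.drop(columns=to_drop)
    for col in df.select_dtypes(include="number").columns:
        median = df.loc[df[col] != 0, col].median()   # median of non-zero values
        df[col] = df[col].replace(0, np.nan).fillna(median)
    return df, to_drop
```

The same idea applies to the report's features: 'pgout', 'ppgout', 'pgfree', 'pgscan' and 'atch' would exceed the threshold and be dropped, while the remaining zeros are imputed.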
Now we create a dataframe that contains only the integer and float type variables
and plot boxplots for these features.
There are outliers present in the data, and these need to be treated.
There are many methods by which outliers can be treated; we choose the IQR method.
In this method, any observation that is less than Q1 − 1.5 × IQR or more than
Q3 + 1.5 × IQR is considered an outlier.
After outlier treatment :
Outliers have been successfully treated from the dataset now.
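A minimal sketch of the IQR treatment is below. The report does not say whether outliers were capped or dropped; this version assumes capping at the IQR bounds, which is a common choice and keeps the row count unchanged.

```python
import pandas as pd

def treat_outliers_iqr(df):
    """Cap every numeric value outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    at the corresponding bound."""
    df = df.copy()
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        df[col] = df[col].clip(lower, upper)   # values outside are pulled to the bounds
    return df
```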
UNIVARIATE ANALYSIS:
We plot histograms for the different features present in the dataset.
Bivariate analysis:
The scatterplots between the dependent variable and the different independent
variables are as follows:
A scatter plot between CPU run in usr mode vs freeswap
A scatter plot between CPU run in usr mode vs freemem
Multivariate analysis:
Scatterplot of ‘lread’ and ‘lwrite’ separated by ‘runqsz’
Scatterplot of ‘sread’ and ‘swrite’ separated by ‘runqsz’
Scatterplot between ‘exec’ and ‘fork’ separated by ‘runqsz’
Scatterplot between ‘vflt’ and ‘pflt’ separated by ‘runqsz’
Checking for correlation:
Correlation between variables:
Pairplot is as follows:
The pair plot shows the relationships between the variables as scatterplots and
the distribution of each variable as a histogram.
As the given dataset contains a large number of columns, the pair plot looks a
little cluttered.
In some plots we can see positive correlation, in some negative correlation,
and in some no correlation.
Now we convert the categorical variable ‘runqsz’ into numerical form by
encoding it with dummy variables:
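The dummy encoding can be sketched with `pandas.get_dummies`. The small frame below is a hypothetical stand-in for the dataset; the category labels are assumed, not quoted from the report's output.

```python
import pandas as pd

# Hypothetical mini-frame; 'runqsz' is the only categorical column, as in the report
df = pd.DataFrame({"runqsz": ["CPU_Bound", "Not_CPU_Bound", "CPU_Bound"],
                   "usr": [90, 85, 88]})

# drop_first=True keeps a single 0/1 indicator column and avoids the dummy trap
encoded = pd.get_dummies(df, columns=["runqsz"], drop_first=True)
```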
The first 5 rows of the dataset now :
Now we make a copy of the dataset as it stands at this point, so that the
original remains available for later steps.
Then we separate the dataset into X (the independent variables) and y (the
dependent variable).
SPLITTING THE DATASET INTO TRAINING AND TESTING DATASET
Let us create the X and y data with the ‘usr’ column as the target variable:
X holds every column except the target variable, and y holds only the target
variable.
We split the independent variables X into two parts, one for training (X_train)
and one for testing (X_test), and likewise split the dependent variable y into
y_train for training and y_test for testing.
We use the statsmodels API (imported as sm) to add an intercept to X, and
sklearn to split the data into train and test sets.
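The split can be sketched as below. The frame, its columns and the 70/30 ratio are assumptions for illustration; the report does not state the exact ratio or random seed used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in frame; 'usr' is the target, as in the report
df = pd.DataFrame({"freemem": range(100),
                   "freeswap": range(100, 200),
                   "usr": range(50, 150)})
X = df.drop(columns=["usr"])   # independent variables
y = df["usr"]                  # dependent (target) variable

# 70/30 split with a fixed seed for reproducibility (ratio assumed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)
```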
X data frame :
Y data frame :
The coefficients are:
The intercept of the model is
R square on training data:
R square on testing data:
RMSE on training data:
RMSE on testing data:
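The sklearn model fit and the R-square/RMSE checks reported above can be sketched as follows, here on synthetic data rather than the report's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: y is a noisy linear function of two features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

model = LinearRegression().fit(X_train, y_train)   # coefficients in model.coef_,
                                                   # intercept in model.intercept_
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
```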
LINEAR REGRESSION USING STATS MODELS :
With the train and test data split, we can proceed to create the linear model.
To create the OLS model, we use the OLS function from the statsmodels API
package and fit it with X_train and y_train.
The model summary is :
The R-square value tells us that the model can explain 76.6% of the variance in
the training set.
The Adjusted R-square is also close to the R-square, at 76.6%.
RMSE on train data:
RMSE on test data:
Scatterplot between the actual y value and predicted y value:
The following table shows the comparison between the actual and predicted
values and their difference, i.e. the residual.
Graph between predicted and residual values
The residual density graph:
The final linear model equation of the data is :
From the above linear equation it can be seen that
there are many negative coefficients present in the equation.
Except for ‘fork’ and ‘freemem’, the coefficients are negative, so an increase
in those variables implies a decrease in ‘usr’.
When ‘fork’ (number of system fork calls per second) increases by one unit, the
‘usr’ value increases by 33%, and when ‘exec’ (number of system exec calls per
second) increases by one unit, ‘usr’ decreases by 38.9%.
Problem 2
Define the problem and perform exploratory Data Analysis
Problem definition - Check shape, Data types, statistical summary - Univariate analysis -
Multivariate analysis - Use appropriate visualizations to identify the patterns and insights -
Key meaningful observations on individual variables and the relationship between variables
Data Pre-processing
Prepare the data for modelling: - Missing value Treatment (if needed) - Outlier
Detection(treat, if needed) - Feature Engineering (if needed) - Encode the data - Train-test
split
Model Building and Compare the Performance of the Models
Build a Logistic Regression model - Build a Linear Discriminant Analysis model - Build a
CART model - Prune the CART model by finding the best hyperparameters using
GridSearch - Check the performance of the models across train and test set using different
metrics - Compare the performance of all the models built and choose the best one with
proper rationale
Business Insights & Recommendations
Comment on the importance of features based on the best model - Conclude with the key
takeaways (actionable insights and recommendations) for the business.
Top five rows of the dataset:
Last five rows of the dataset:
Shape of the data:
Dataset summary:
Basic info about the dataset:
There are 2 features with float datatype, 1 feature with integer datatype, and
7 features with object datatype.
Null value check:
There are null values present in the ‘Wife_age’ and ‘No_of_children_born’
features.
Let us treat the null values by imputing each with the median value of that
particular feature.
There are 80 duplicate rows present in the dataset.
Let us drop all the duplicate rows present in the dataset.
Now the shape of the data is
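These two cleaning steps can be sketched together; the values below are made up for illustration and only the column names come from the report:

```python
import pandas as pd

# Hypothetical rows with the two columns the report says contain nulls
df = pd.DataFrame({"Wife_age": [24.0, None, 24.0, 30.0],
                   "No_of_children_born": [1.0, 2.0, 1.0, None]})

# Median imputation for the columns with missing values
for col in ["Wife_age", "No_of_children_born"]:
    df[col] = df[col].fillna(df[col].median())

# Remove exact duplicate rows, as the report does for its 80 duplicates
df = df.drop_duplicates().reset_index(drop=True)
```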
UNIVARIATE ANALYSIS:
BIVARIATE ANALYSIS:
The above plot shows the relation between different age group of women and contraceptive
method used.
From the below plot we may note that tertiary-educated women use contraceptive
methods the most.
The below plot shows that wives whose husbands have the highest education level
have used contraceptive methods the most.
The above plot shows that the non working women use the most contraceptive methods.
From the above plot, Women with very high standard living index use the most contraceptive
measures.
Correlation between variables:
We have noticed that only three features are in integer form and can be plotted
directly, so we now convert all the object columns to categorical codes.
We encoded the categorical variables Wife_education, Husband_education,
Wife_religion, Standard_of_living_index, Media_exposure and
Contraceptive_method_used in ascending order from worst to best, since LDA does
not take string variables as parameters for model building.
Below is the encoding for the ordinal values:
Wife_education: Uneducated = 1, Primary = 2, Secondary = 3, Tertiary = 4.
Husband_education: Uneducated = 1, Primary = 2, Secondary = 3, Tertiary = 4.
Wife_religion: Scientology = 1 and non-Scientology = 2.
Wife_Working: Yes = 1 and No = 2.
Standard_of_living_index: Very Low = 1, Low = 2, High = 3, Very High = 4.
Media_exposure: Exposed = 1 and Not-Exposed = 2.
Contraceptive_method_used: Yes = 1 and No = 0
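One way to apply such an ordinal encoding is an explicit mapping per column, shown here for one of the columns (the mapping values are exactly those listed above; the mini-frame is hypothetical):

```python
import pandas as pd

# Ordinal codes as listed in the report for Wife_education
wife_edu_map = {"Uneducated": 1, "Primary": 2, "Secondary": 3, "Tertiary": 4}

df = pd.DataFrame({"Wife_education": ["Primary", "Tertiary", "Uneducated"]})
df["Wife_education"] = df["Wife_education"].map(wife_edu_map)
```

An explicit mapping like this preserves the intended worst-to-best order, which generic label encoding (alphabetical) would not.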
The first rows of the dataset:
Info :
Outliers are as follows:
There are outliers present in the data
We treat them using IQR method
Pairplot:
Correlation between variables :
LOGISTIC REGRESSION
Train and Test Split: Let us create the X and y data with the
'Contraceptive_method_used' column as the target variable: X holds every column
except the target variable, and y holds only the target variable.
Before we proceed, we need to import the required libraries. In this encoding,
'Contraceptive_method_used' is 1 for Yes and 0 for No.
We use LabelEncoder from the sklearn library to encode the data if it has not
been encoded previously; the encoding is for creating the dummy variables.
Now the train set and the test set have been split using sklearn, and we use
the logistic regression model to fit the data and create a logistic model.
The proportion of 1s and 0s, i.e. customers with Contraceptive_method_used
Yes/No, is as follows:
Now we fit the logistic regression model using newton-cg as the solver and 1000
as the maximum number of iterations, and obtain the predicted data frame.
In the above data frame, we can see that label 1 has the highest share, 69.43%.
The model accuracy is 67.3%
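The fit described above can be sketched as follows; the newton-cg solver and 1000 iterations come from the report, while the data here is a synthetic binary-classification stand-in:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded contraceptive dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Solver and iteration budget as described in the report
clf = LogisticRegression(solver="newton-cg", max_iter=1000)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)   # test-set accuracy
```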
AUC and ROC curve
Now we can plot the ROC curves of the model and obtain separate curves and AUC
scores for the train dataset and the test dataset.
AUC curve for the train data:
In this curve, if the plot falls below the dotted line, the model is considered
a very poor one. Even though the curve is not perfect, it is acceptable; the
AUC (Area Under the Curve) for the train data model is 71.8%.
AUC curve for the test data:
This curve is similar to the train data curve but varies slightly at the start.
The curve is acceptable, as it is plotted above the dotted line.
The area under the curve is the same as for the train data, 71.8%.
Comparing the train data AUC with the test data AUC, both curves are mostly
similar with only minor variation, as the AUC of both is 71.8%. Let us move on
to the confusion matrix.
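The AUC computation and the points for the ROC plot can be sketched like this (synthetic data again; the positive-class probabilities are what the curve is built from):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + rng.normal(scale=0.8, size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

clf = LogisticRegression(solver="newton-cg", max_iter=1000).fit(X_train, y_train)

# AUC uses the predicted probability of the positive class, not the hard labels
probs_test = clf.predict_proba(X_test)[:, 1]
auc_test = roc_auc_score(y_test, probs_test)
fpr, tpr, thresholds = roc_curve(y_test, probs_test)  # points for the ROC plot
```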
Confusion matrix for train data:
This plot shows the relationship between the true labels and predicted labels as 0s and 1s.
Classification report is as follows
For Contraceptive_method_used (Label 0 ):
Precision (66%) – of all the married women predicted as not using a
contraceptive method, 66% are actually not using one.
Recall (53%) – of all the married women actually not using a contraceptive
method, 53% have been predicted correctly.
For Contraceptive_method_used (Label 1 ):
Precision (68%) – of all the married women predicted as using a contraceptive
method, 68% are actually using one.
Recall (79%) – of all the married women actually using a contraceptive method,
79% have been predicted correctly.
And the accuracy is 67%, which is more than 50%, so the model is good.
Confusion matrix for test data:
This plot shows the relationship between the true labels and predicted labels as 0’s and 1’s
And the classification report is as follows
For Contraceptive_method_used (Label 0 ):
Precision (64%) – of all the married women predicted as not using a
contraceptive method, 64% are actually not using one.
Recall (46%) – of all the married women actually not using a contraceptive
method, 46% have been predicted correctly.
For Contraceptive_method_used (Label 1 ):
Precision (65%) – of all the married women predicted as using a contraceptive
method, 65% are actually using one.
Recall (79%) – of all the married women actually using a contraceptive method,
79% have been predicted correctly.
And the accuracy is 65%, which is more than 50%, so the model is also good, in
line with the training data.
Grid search :
We use GridSearchCV from sklearn to find the best model. The process is the
same as above, and we get:
For train data
Using this method we get similar values here as well, and the accuracy is still 67%.
For test data
Using this method we get similar values here as well, and the accuracy is still 65%.
Overall accuracy of the model – 67% of total predictions are correct.
Accuracy, AUC, precision and recall for the test data are almost in line with
the training data. This indicates no overfitting or underfitting has occurred,
and overall the model is a good model for classification.
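A sketch of the grid search is below. The report does not list the parameters it searched, so the `param_grid` here is hypothetical; only the estimator settings (newton-cg, 1000 iterations) come from the report.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Hypothetical grid over the regularisation strength C
param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l2"]}
grid = GridSearchCV(LogisticRegression(solver="newton-cg", max_iter=1000),
                    param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)
best_acc = grid.best_estimator_.score(X_test, y_test)
```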
LINEAR DISCRIMINANT ANALYSIS
Train and Test Split:
The procedure for splitting the train and test data is the same as for the
logistic regression above.
We need to import LDA (Linear Discriminant Analysis) from the sklearn library,
and the results are as follows:
There is a slight difference between the training and test data reports, but it
is acceptable: the accuracy on the train data is 67% and on the test data 65%.
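The LDA fit follows the same pattern; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
train_acc = lda.score(X_train, y_train)
test_acc = lda.score(X_test, y_test)
```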
CART
In CART we can use the dataset with the outliers retained, as CART is not sensitive to outliers.
Train and Test Split:
The same procedure as for the logistic regression and LDA above: the train and
test data need to be split, and before that the necessary libraries need to be
imported.
In CART, the decision tree is the central object.
Decision tree:
We fit the training data to the decision tree, export the tree description to a
new Word document and save it in the project folder.
Now we can copy and paste the exported code into http://webgraphviz.com/ to view
the decision tree, deleting any existing code there first.
The tree will be a little messy because the data contains a large amount of
information and many classifications, so we reduce the maximum leaves, maximum
depth and minimum sample size of the tree.
Here ‘Gini’, the decision tree splitting criterion, plays an important role. We
create a new Word document with the tree reduced to 30 branches, a leaf size of
10 and a depth of 7, and save the document in the project folder.
Now decision tree is looking better than before
Now let us check the feature importance, where feature importance refers to
techniques that assign a score to input features based on how useful they are
at predicting a target variable.
As we see, ‘Wife_age’ has the highest importance, so we can infer that the use
of a contraceptive method depends largely on the woman's age.
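A sketch of the pruned tree and the feature-importance check is below. The mapping of the report's "branches 30, leaf 10, depth 7" onto `min_samples_split`, `min_samples_leaf` and `max_depth` is an assumption, as is the synthetic data; `export_graphviz` produces the DOT text that can be pasted into http://webgraphviz.com/.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Pruning constraints assumed to mirror the report's "30 / 10 / 7" settings;
# Gini impurity is the default splitting criterion
tree = DecisionTreeClassifier(criterion="gini", max_depth=7,
                              min_samples_split=30, min_samples_leaf=10,
                              random_state=1)
tree.fit(X_train, y_train)

# DOT text for webgraphviz, and the importance score per input feature
dot = export_graphviz(tree, out_file=None, feature_names=["f0", "f1", "f2"])
importances = tree.feature_importances_
```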
AUC plot:
As the AUC curve bends high, the model is good; its AUC value for the train
data is 82.4%.
Here the plot is not quite smooth, but over the whole area it keeps the bend
formation, and its AUC value for the test data is 70.0%.
Confusion matrix for train data:
By examining the confusion matrix of the train data, we get a True Positive
count of 260 and a True Negative count of 474.
For Contraceptive_method_used (Label 0 ):
Precision (77%) – of all the married women predicted as not using a
contraceptive method, 77% are actually not using one.
Recall (62%) – of all the married women actually not using a contraceptive
method, 62% have been predicted correctly.
For Contraceptive_method_used (Label 1 ):
Precision (75%) – of all the married women predicted as using a contraceptive
method, 75% are actually using one.
Recall (86%) – of all the married women actually using a contraceptive method,
86% have been predicted correctly.
And the accuracy is 75%, which is more than 50%, so the model performs well on
the training data.
Confusion matrix for test data:
By examining the confusion matrix of the test data, we get a True Positive
count of 91 and a True Negative count of 182.
For Contraceptive_method_used (Label 0 ):
Precision (67%) – of all the married women predicted as not using a
contraceptive method, 67% are actually not using one.
Recall (47%) – of all the married women actually not using a contraceptive
method, 47% have been predicted correctly.
For Contraceptive_method_used (Label 1 ):
Precision (64%) – of all the married women predicted as using a contraceptive
method, 64% are actually using one.
Recall (81%) – of all the married women actually using a contraceptive method,
81% have been predicted correctly.
And the accuracy is 65%, which is more than 50%, so the model also performs
reasonably, though below the training accuracy.
CONCLUSION
From the above models, in every model the encoded label ‘1’ (contraceptive
method used) is predicted most often, and the accuracy and F1 score of the
models also favour label ‘1’.
We cannot conclude with certainty whether a contraceptive method is used or
not, but the models consistently predict that married women use a contraceptive
method, and the final predictions show the same.