
Machine Learning - Report

Surabhi Kulkarni

PGP-DSBA Online

Table of Contents

Executive Summary
Data Description
Sample of the dataset
Exploratory Data Analysis
Let us check the types of variables in the data frame
Check for missing values in the dataset
Pair Plot
Box Plot
Histogram

List of Figures

Fig. 1 – Pair plot
Fig. 2 – Correlation heatmap
Fig. 3 – Products
Fig. 4 – Outlier
Fig. 5 – Histplot
Executive Summary

Problem 1: You work for an office transport company. You are in
discussions with ABC Consulting company for providing transport for
their employees. For this purpose, you are tasked with understanding
how the employees of ABC Consulting presently prefer to commute
(between home and office). Based on parameters like age, salary and
work experience given in the data set ‘Transport.csv’, you are
required to predict the preferred mode of transport. The project
requires you to build several Machine Learning models and compare
them so that the model can be finalised.

Data Dictionary

Age : Age of the employee in years

Gender : Gender of the employee

Engineer : 1 if Engineer, 0 if not

MBA : 1 if MBA, 0 if not

Work Exp : Work experience in years

Salary : Salary in lakhs per annum

Distance : Distance in km from home to office

license : 1 if the employee has a driving licence, 0 if not

Transport : Mode of transport


The objective is to build various Machine Learning models on this data
set and, based on accuracy metrics, decide which model should be
finalised for predicting the mode of transport chosen by each
employee.

Importing Libraries.

Importing Data.

Checking the type of the dataset.


Checking the shape of the dataset: (444, 9)

Getting the info and data types column-wise.


dtypes: float64(2), int64(5), object(2)
memory usage: 31.3+ KB

Observation-1:

The data set contains 444 rows and 9 columns.

In the given data set there are 5 integer-type features, 2 float-type
features and 2 object-type features.

Converting the 'Gender' and 'Transport' columns from object/string
type to integer.
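A minimal sketch of this step, assuming pandas is used and letting pd.Categorical assign the integer codes (the original category labels are not shown in the report):

import pandas as pd

df = pd.read_csv('Transport.csv')

# pd.Categorical assigns integer codes 0, 1, ... to each unique label;
# which label becomes 0 depends on the (unshown) original values.
df['Gender'] = pd.Categorical(df['Gender']).codes
df['Transport'] = pd.Categorical(df['Transport']).codes

print(df.head(10))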
   Age  Gender  Engineer  MBA  Work Exp  Salary  Distance  license  Transport
0   28       1         0    0         4    14.3       3.2        0          1
1   23       0         1    0         4     8.3       3.3        0          1
2   29       1         1    0         7    13.4       4.1        0          1
3   28       0         1    1         5    13.4       4.5        0          1
4   27       1         1    0         4    13.4       4.6        0          1
5   26       1         1    0         4    12.3       4.8        1          1
6   28       1         1    0         5    14.4       5.1        0          0
7   26       0         1    0         3    10.5       5.1        0          1
8   22       1         1    0         1     7.5       5.1        0          1
9   27       1         1    0         4    13.5       5.2        0          1
Performing EDA

EDA-Step 1: Checking for duplicate records in the data

Number of duplicate rows = 0

EDA-Step 2: Checking for missing values.

Are there any missing values?

We can observe there are 0 missing values in the data set, so no
missing-value treatment is required.

Outliers: box plot
Distplot
Histplot
Pair plot
Correlation heatmap

1.2 Split the data into train and test. Is scaling necessary or not?

Let us break the X and y dataframes into a training set and a test
set. For this we will use the sklearn package's data-splitting
function, which splits at random.

Split X and y into training and test sets in a 70:30 ratio.

Scaling matters for distance- and gradient-based models such as KNN,
since features like Salary and Age are on very different scales;
tree-based models are insensitive to it.
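A minimal sketch of the split and of feature scaling, assuming X holds all predictors and y the 'Transport' target; scaling the data this way is an assumption, not a step confirmed by the report:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# All columns except the target act as predictors (names assumed).
X = df.drop('Transport', axis=1)
y = df['Transport']

# 70:30 split; a fixed random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)

# Fit the scaler on the training split only, then apply it to both
# splits, so no test-set information leaks into the scaling.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)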


Invoke the LinearRegression function and find the best-fit model on
the training data.

LinearRegression()
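A minimal sketch of the fit and of printing the coefficients listed below; variable names such as regression_model are illustrative:

from sklearn.linear_model import LinearRegression

# Ordinary least squares on the training data.
regression_model = LinearRegression()
regression_model.fit(X_train, y_train)

# One coefficient per predictor, in X's column order, plus the intercept.
for col, coef in zip(X_train.columns, regression_model.coef_):
    print(f'The coefficient for {col} is {coef}')
print(f'The intercept for our model is {regression_model.intercept_}')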

The coefficient for Age is 0.04543297494826039
The coefficient for Gender is 0.19825715816344822
The coefficient for Engineer is -0.04983406779229654
The coefficient for MBA is 0.05963535977202542
The coefficient for Work Exp is -0.02955184857514718
The coefficient for Salary is -0.008628446413510701
The coefficient for Distance is -0.032482366443639825
The coefficient for license is -0.3954409462694837

Let us check the intercept for the model:

The intercept for our model is 0.07345136229447868

R square on training data: 0.34894130957779157
R square on testing data: 0.31501466929614674
RMSE on training data: 0.3791229977315064
RMSE on testing data: 0.3839320749182315

Since this is regression, we plot the predicted y values against the
actual y values for the test data. A good model's predictions lie
close to the actual values, leading to a high R-square.
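A hedged sketch of how the R-square and RMSE figures above and the predicted-vs-actual plot could be produced; it reuses the regression_model fitted in the earlier sketch:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

# R-square on both splits via the estimator's own score() method.
print(regression_model.score(X_train, y_train))  # R square on training data
print(regression_model.score(X_test, y_test))    # R square on testing data

# RMSE is the square root of the mean squared error.
print(np.sqrt(mean_squared_error(y_train, regression_model.predict(X_train))))
print(np.sqrt(mean_squared_error(y_test, regression_model.predict(X_test))))

# Predicted vs actual on the test data; points close to the diagonal
# indicate good predictions.
y_pred = regression_model.predict(X_test)
plt.scatter(y_test, y_pred)
plt.xlabel('Actual y')
plt.ylabel('Predicted y')
plt.show()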

# Naive Bayes

Training accuracy: 0.7935483870967742
[[ 51  51]
 [ 13 195]]
              precision    recall  f1-score   support
           0       0.80      0.50      0.61       102
           1       0.79      0.94      0.86       208
    accuracy                           0.79       310
   macro avg       0.79      0.72      0.74       310
weighted avg       0.79      0.79      0.78       310

Test accuracy: 0.7910447761194029
[[22 20]
 [ 8 84]]
              precision    recall  f1-score   support
           0       0.73      0.52      0.61        42
           1       0.81      0.91      0.86        92
    accuracy                           0.79       134
   macro avg       0.77      0.72      0.73       134
weighted avg       0.78      0.79      0.78       134
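The same accuracy / confusion-matrix / classification-report output recurs for every classifier below, so here is a hypothetical evaluate() helper, sketched around GaussianNB, that the later sketches reuse; it is an illustration, not the report's actual code:

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

def evaluate(model, X_tr, y_tr, X_te, y_te):
    """Fit, then print accuracy, confusion matrix and classification
    report for the training and the test split."""
    model.fit(X_tr, y_tr)
    for X_, y_ in ((X_tr, y_tr), (X_te, y_te)):
        pred = model.predict(X_)
        print(accuracy_score(y_, pred))
        print(confusion_matrix(y_, pred))
        print(classification_report(y_, pred))
    return model

nb_model = evaluate(GaussianNB(), X_train, y_train, X_test, y_test)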

# KNN

Training accuracy: 0.7161290322580646
              precision    recall  f1-score   support
           0       0.62      0.36      0.46       102
           1       0.74      0.89      0.81       208
    accuracy                           0.72       310
   macro avg       0.68      0.63      0.63       310
weighted avg       0.70      0.72      0.69       310

Test accuracy: 0.5223880597014925
[[ 7 35]
 [29 63]]
              precision    recall  f1-score   support
           0       0.19      0.17      0.18        42
           1       0.64      0.68      0.66        92
    accuracy                           0.52       134
   macro avg       0.42      0.43      0.42       134
weighted avg       0.50      0.52      0.51       134
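A minimal KNN sketch reusing the evaluate() helper; feeding it the scaled features (and the default n_neighbors=5) are assumptions, since the report does not show its KNN settings:

from sklearn.neighbors import KNeighborsClassifier

# KNN is distance-based, so it gets the scaled features; n_neighbors=5
# is the sklearn default, not a value taken from the report.
knn_model = evaluate(KNeighborsClassifier(n_neighbors=5),
                     X_train_scaled, y_train, X_test_scaled, y_test)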

# Linear Discriminant Analysis

Training accuracy: 0.8
[[ 57  45]
 [ 17 191]]
              precision    recall  f1-score   support
           0       0.77      0.56      0.65       102
           1       0.81      0.92      0.86       208
    accuracy                           0.80       310
   macro avg       0.79      0.74      0.75       310
weighted avg       0.80      0.80      0.79       310

Train AUC: 0.834

Test accuracy: 0.8208955223880597
[[26 16]
 [ 8 84]]
              precision    recall  f1-score   support
           0       0.76      0.62      0.68        42
           1       0.84      0.91      0.87        92
    accuracy                           0.82       134
   macro avg       0.80      0.77      0.78       134
weighted avg       0.82      0.82      0.82       134

Test AUC: 0.810
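A sketch of the LDA fit and of the AUC figures, assuming the AUC is computed from the predicted probability of the positive class:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

lda_model = evaluate(LinearDiscriminantAnalysis(),
                     X_train, y_train, X_test, y_test)

# AUC uses the predicted probability of the positive class rather
# than the hard 0/1 labels.
print('the auc %.3f' % roc_auc_score(
    y_train, lda_model.predict_proba(X_train)[:, 1]))
print('the auc %.3f' % roc_auc_score(
    y_test, lda_model.predict_proba(X_test)[:, 1]))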

# Logistic Regression
LogisticRegression(max_iter=10000, n_jobs=2, penalty='none',
                   solver='newton-cg', verbose=True)

Training accuracy: 0.7870967741935484
[[ 58  44]
 [ 22 186]]
              precision    recall  f1-score   support
           0       0.72      0.57      0.64       102
           1       0.81      0.89      0.85       208
    accuracy                           0.79       310
   macro avg       0.77      0.73      0.74       310
weighted avg       0.78      0.79      0.78       310

Test accuracy: 0.8059701492537313
[[25 17]
 [ 9 83]]
              precision    recall  f1-score   support
           0       0.74      0.60      0.66        42
           1       0.83      0.90      0.86        92
    accuracy                           0.81       134
   macro avg       0.78      0.75      0.76       134
weighted avg       0.80      0.81      0.80       134

Test AUC: 0.816
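A sketch of the logistic regression fit via the evaluate() helper; the constructor arguments are copied verbatim from the output above:

from sklearn.linear_model import LogisticRegression

# penalty='none' disables regularisation in scikit-learn versions
# before 1.2; newer versions spell this penalty=None.
log_model = evaluate(
    LogisticRegression(max_iter=10000, n_jobs=2, penalty='none',
                       solver='newton-cg', verbose=True),
    X_train, y_train, X_test, y_test)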
# DecisionTreeClassifier

Training accuracy: 1.0
[[102   0]
 [  0 208]]
              precision    recall  f1-score   support
           0       1.00      1.00      1.00       102
           1       1.00      1.00      1.00       208
    accuracy                           1.00       310
   macro avg       1.00      1.00      1.00       310
weighted avg       1.00      1.00      1.00       310

Train AUC: 1.000

Test accuracy: 0.8134328358208955
[[29 13]
 [12 80]]
              precision    recall  f1-score   support
           0       0.71      0.69      0.70        42
           1       0.86      0.87      0.86        92
    accuracy                           0.81       134
   macro avg       0.78      0.78      0.78       134
weighted avg       0.81      0.81      0.81       134

Test AUC: 0.861
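The perfect 1.0 training accuracy against 0.81 test accuracy suggests the unconstrained tree overfits the training data. A sketch of both the unconstrained fit and an illustrative depth-limited variant (max_depth=5 is an arbitrary choice, not a value from the report):

from sklearn.tree import DecisionTreeClassifier

# The unconstrained tree reproduces the perfect training fit above.
dt_model = evaluate(DecisionTreeClassifier(random_state=1),
                    X_train, y_train, X_test, y_test)

# A depth-limited variant trades training accuracy for less overfitting.
dt_pruned = evaluate(DecisionTreeClassifier(max_depth=5, random_state=1),
                     X_train, y_train, X_test, y_test)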
# AdaBoost
AdaBoostClassifier(n_estimators=100, random_state=1)

Training accuracy: 0.8838709677419355
[[ 77  25]
 [ 11 197]]
              precision    recall  f1-score   support
           0       0.88      0.75      0.81       102
           1       0.89      0.95      0.92       208
    accuracy                           0.88       310
   macro avg       0.88      0.85      0.86       310
weighted avg       0.88      0.88      0.88       310

Train AUC: 0.959

Test accuracy: 0.7910447761194029
[[22 20]
 [ 8 84]]
              precision    recall  f1-score   support
           0       0.73      0.52      0.61        42
           1       0.81      0.91      0.86        92
    accuracy                           0.79       134
   macro avg       0.77      0.72      0.73       134
weighted avg       0.78      0.79      0.78       134

Test AUC: 0.959
# Gradient Boosting
GradientBoostingClassifier(random_state=1)

Training accuracy: 0.967741935483871
[[ 51  51]
 [ 13 195]]
              precision    recall  f1-score   support
           0       0.99      0.91      0.95       102
           1       0.96      1.00      0.98       208
    accuracy                           0.97       310
   macro avg       0.97      0.95      0.96       310
weighted avg       0.97      0.97      0.97       310

Train AUC: 0.998

Test accuracy: 0.7686567164179104
[[22 20]
 [ 8 84]]
              precision    recall  f1-score   support
           0       0.73      0.52      0.61        42
           1       0.81      0.91      0.86        92
    accuracy                           0.79       134
   macro avg       0.77      0.72      0.73       134
weighted avg       0.78      0.79      0.78       134

Test AUC: 0.815

# RandomForestClassifier
RandomForestClassifier(random_state=1)

Training accuracy: 1.0
[[ 51  51]
 [ 13 195]]
              precision    recall  f1-score   support
           0       1.00      1.00      1.00       102
           1       1.00      1.00      1.00       208
    accuracy                           1.00       310
   macro avg       1.00      1.00      1.00       310
weighted avg       1.00      1.00      1.00       310

Train AUC: 1.000

Test accuracy: 0.8059701492537313
[[22 20]
 [ 8 84]]
              precision    recall  f1-score   support
           0       0.73      0.52      0.61        42
           1       0.81      0.91      0.86        92
    accuracy                           0.79       134
   macro avg       0.77      0.72      0.73       134
weighted avg       0.78      0.79      0.78       134

Test AUC: 0.829
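A sketch that fits the three ensembles with the constructors printed above and then tabulates test accuracy for all models side by side, which is the comparison the problem statement calls for; the evaluate() helper and model variables come from the earlier sketches:

import pandas as pd
from sklearn.ensemble import (AdaBoostClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score

# Constructors as printed in the corresponding sections above.
ada_model = evaluate(AdaBoostClassifier(n_estimators=100, random_state=1),
                     X_train, y_train, X_test, y_test)
gbm_model = evaluate(GradientBoostingClassifier(random_state=1),
                     X_train, y_train, X_test, y_test)
rf_model = evaluate(RandomForestClassifier(random_state=1),
                    X_train, y_train, X_test, y_test)

# Test accuracy side by side; KNN is scored on the scaled features it
# was fitted on.
fitted = {'Naive Bayes': (nb_model, X_test),
          'KNN': (knn_model, X_test_scaled),
          'LDA': (lda_model, X_test),
          'Logistic Regression': (log_model, X_test),
          'Decision Tree': (dt_model, X_test),
          'AdaBoost': (ada_model, X_test),
          'Gradient Boosting': (gbm_model, X_test),
          'Random Forest': (rf_model, X_test)}
summary = pd.Series({name: accuracy_score(y_test, m.predict(X_))
                     for name, (m, X_) in fitted.items()},
                    name='Test accuracy')
print(summary.sort_values(ascending=False))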
Problem 2: A dataset of Shark Tank episodes is made available. It
contains 495 entrepreneurs making their pitch to the VC sharks.

Importing Libraries.

Importing Data.

Checking for missing values.

Are there any missing values?

We can observe there are 100 missing values in the data. Missing-value
treatment will be done.

2.1 Pick out the Deal (Dependent Variable) and Description columns into a
separate data frame.

      deal                                        description
1     True  Retail and wholesale pie factory with two reta...
2     True  Ava the Elephant is a godsend for frazzled par...
3    False  Organizing, packing, and moving services deliv...
4    False  Interactive media centers for healthcare waiti...
5     True  One of the first entrepreneurs to pitch on Sha...
...    ...                                                ...
490   True  Zoom Interiors is a virtual service for interi...
491   True  Spikeball started out as a casual outdoors gam...
492   True  Shark Wheel is out to literally reinvent the w...
493  False  Adriana Montano wants to open the first Cat Ca...
494   True  Sway Motorsports makes a three-wheeled, all-el...

387 rows × 2 columns
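A minimal pandas sketch of this selection; the file name, column names and the use of dropna() for the missing-value treatment are assumptions:

import pandas as pd

# Pull only the dependent variable and the pitch text into one frame.
shark = pd.read_csv('shark_tank.csv')
df2 = shark[['deal', 'description']].dropna()
print(df2)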

2.2 Create two corpora, one with those who secured a Deal, the other with
those who did not secure a deal.

Getting the info:

 0   deal         204 non-null   object
 1   description  204 non-null   object
dtypes: object(2)
memory usage: 4.8+ KB

True Corpus 50302
False Corpus 34899

 0   description  204 non-null   object
 1   chars        204 non-null   object
dtypes: object(2)
memory usage: 4.8+ KB
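A sketch of building the two corpora by joining the descriptions per outcome; reading the printed corpus sizes as character counts is an assumption:

# One corpus per outcome, built by joining the pitch descriptions.
true_corpus = ' '.join(df2.loc[df2['deal'] == True, 'description'])
false_corpus = ' '.join(df2.loc[df2['deal'] == False, 'description'])

# len() gives the corpus size in characters.
print('True Corpus', len(true_corpus))
print('False Corpus', len(false_corpus))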

We are importing the nltk library to use inaugural.fileids().

Print true words.
Print false words.
Word cloud – True (secured a deal)
Word cloud – False (did not secure a deal)
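A hedged sketch of generating the two word clouds with the wordcloud package, removing common English stop words via nltk (styling choices are arbitrary):

import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from wordcloud import WordCloud

# Requires a one-time nltk.download('stopwords').
stop_words = set(stopwords.words('english'))

for corpus, title in ((true_corpus, 'Secured a deal'),
                      (false_corpus, 'Did not secure a deal')):
    cloud = WordCloud(stopwords=stop_words, background_color='white',
                      width=800, height=400).generate(corpus)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation='bilinear')
    plt.axis('off')
    plt.title(title)
    plt.show()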
Q4: Refer to both the word clouds. What do you infer?

The 'secured a deal' word cloud contains words such as 'one',
'design', 'free', 'children', 'offer', 'easy', 'online' and 'use'.
These indicate that pitches catering to children, providing offers or
a free sample/product, easy to use, well designed and creatively
unique are more likely to secure a deal.

The 'did not secure a deal' word cloud contains words such as 'one',
'designed', 'help', 'device', 'bottle', 'premium' and 'use'. These
indicate that pitches with a mediocre design, less suited to solving a
problem, involving water bottles, carrying a premium price tag or
offering less usability are less likely to secure a deal.

It is also observed that words such as 'one', 'designed', 'system' and
'use' carry high weight in both word clouds. This indicates that these
were either not defining factors in whether a deal was made, or were
used in different contexts in the descriptions in each scenario.

Q5: Looking at the word clouds, is it true that the entrepreneurs who
introduced devices are less likely to secure a deal, based on your
analysis?

The word 'device' is not easily found in the 'secured a deal' word
cloud, while it is easily spotted in the 'did not secure a deal' word
cloud. This indicates that the word 'device' occurred frequently when
a deal was rejected, implying that the statement given in the question
is true.
