MACHINE LEARNING PROJECT REPORT

- MONICA SHARMA


TABLE OF CONTENTS
1. Problem 1
1.1 Define the problem and perform Exploratory Data Analysis - Problem definition - Check
shape, Data types, statistical summary - Univariate analysis - Bivariate analysis - Use appropriate
visualizations to identify the patterns and insights - Key meaningful observations on individual
variables and the relationship between variables
1.2 Data Pre-processing Prepare the data for modelling: - Outlier Detection (treat, if needed) -
Feature Engineering / drop redundant features (if needed) - Encode the data - Train-test split
1.3 Model Building - Bagging - Build a Bagging classifier - Build a Random Forest classifier - Check
the performance of the models across train and test set using different metrics and comment on the
same
1.4 Model Improvement - Bagging - Try and improve the model performance by tuning the
model (minimum 2 parameters to be tuned) - Bagging Classifier - Random Forest Classifier -
Comment on model performance after tuning the model
1.5 Model Building - Boosting - Build a Boosting classifier - Check the performance of the models
across train and test set using different metrics and comment on the same. Note: AdaBoost or
GradientBoosting classifier can be built
1.6 Model Improvement - Boosting - Try and improve the model performance by tuning the
model (minimum 2 parameters to be tuned) - Comment on model performance after tuning the
model
1.7 Actionable Insights & Recommendations - Compare all the models and choose the best
model with proper rationale - Conclude with the key takeaways (actionable insights and
recommendations) for the business
2. Problem 2
2.1 Data Preparation Data preparation and exploratory data analysis - Pick out the Deal
(Dependent Variable) and Description columns into a separate dataframe - Create two corpora - one
with those who secured a deal and the other with those who did not secure a deal - Find the number
of characters for both the corpora - Text preprocessing on corpora which secured the deal
2.1.1 Pick out the Deal (Dependent Variable) and Description columns into a separate data
frame
2.1.2 Create two corpora - one with those who secured a deal and the other with those who
did not secure a deal
2.1.3 Find the number of characters for both the corpora
2.1.4 Text pre-processing on corpora which secured the deal
2.2 Insight Generation - Create a wordcloud of common words used by companies who secure a
deal - Provide insights from the preprocessed data
2.3 Business Report Quality - Adhere to the business report checklist


LIST OF FIGURES
Figure 1-1: No. of male & female using different transport modes
Figure 1-2: Distribution of Age
Figure 1-3: Distribution of Work Experience
Figure 1-4: Observation on Gender
Figure 1-5: Distribution on preferred mode of transport
Figure 1-6: Gender Impact on mode of transport
Figure 1-7: Work Exp Impact on mode of transport
Figure 1-8: Age Impact on mode of transport
Figure 1-9: Outlier Plot

LIST OF TABLES
Table 1-1: Data Information
Table 1-2: Duplicate Value Information
Table 1-3: Shape of the data
Table 1-4: Statistical Information of the dataset
Table 1-5: Preferred mode of Transport wrt Gender
Table 1-6: Multivariate Analysis (Heat Map)
Table 2-1: Head of the dataset (Part 1)
Table 2-2: Head of the dataset (Part 2)
Table 2-3: Shape of the data
Table 2-4: Dataset type
Table 2-5: Dataset Information
Table 2-6: Null value of Dataset


1. Problem 1
Context
You are in discussions with ABC Consulting to provide transport for its employees. To that end, you
need to understand how the employees of ABC Consulting currently prefer to commute between home
and office. Using parameters such as age, salary and work experience from the data set
‘Transport.csv’, you are required to predict the preferred mode of transport. The project requires
you to build several Machine Learning models and compare them so that a final model can be chosen.

Objective
The objective is to build several Machine Learning models on this data set and, based on their
evaluation metrics, decide which model to finalize for predicting the mode of transport chosen by an
employee.

Data Dictionary
Age: Age of the Employee in Years

Gender: Gender of the Employee

Engineer: 1 for Engineer, 0 for Non-Engineer

MBA: 1 for MBA, 0 for Non-MBA

Work Exp: Work experience in years

Salary: Salary in lakhs per annum

Distance: Distance in km from home to office

license: 1 if the employee has a driving licence, 0 otherwise

Transport: Mode of Transport


1.1 Define the problem and perform Exploratory Data Analysis - Problem definition - Check shape,
Data types, statistical summary - Univariate analysis - Bivariate analysis - Use appropriate
visualizations to identify the patterns and insights - Key meaningful observations on individual
variables and the relationship between variables.
Data is imported and the following are the observations:

Table 1-1: Data Information

Table 1-2: Duplicate Value Information

Table 1-3: Shape of the data

• There are 444 employee records.
• There are 9 variables in total: Transport is the dependent variable and the other eight are independent variables.
• There are no duplicate records.
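The checks behind these observations can be reproduced with a few pandas calls. The following is a minimal sketch, assuming the data has been read from ‘Transport.csv’ into a DataFrame; the helper name `summarize_frame` is hypothetical and not taken from the report.

```python
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> dict:
    """Collect the basic facts reported above: shape, dtypes and duplicate count."""
    return {
        "n_rows": df.shape[0],
        "n_columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "n_duplicates": int(df.duplicated().sum()),
    }


# Usage (assumes the file is in the working directory):
# df = pd.read_csv("Transport.csv")
# print(summarize_frame(df))
```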

Statistical Summary


Table 1-4: Statistical Information of the dataset

• 50% of the employees have less than 5 years of work experience, and 75% have less than 8 years.
• The average employee age is 27.75 years.
• The average salary of an employee is 16.23 lakhs per annum.
• 75% of the employees travel less than 13 km between home and office.

Figure 1-1: No. of male & female using different transport modes.

• Of the 444 records, 316 are ‘Male’ and the remaining 128 are ‘Female’.

• 300 employees travel by Public transport and 144 by Private transport.


Figure 1-2: Distribution of Age

• The distribution of Age is right-skewed. The plot shows that most of the employees
are aged between 23 and 30 years.

Figure 1-3: Distribution of Work Experience

• The Work Exp variable is also right-skewed, with most of the employees having between
0 and 8 years of work experience.
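The skewness read off the histograms can also be checked numerically. This is a small sketch, assuming the column names from the data dictionary; the helper name `skew_report` is hypothetical.

```python
import pandas as pd


def skew_report(df: pd.DataFrame, columns: list[str]) -> pd.Series:
    """Sample skewness per column; values above 0 indicate a longer right tail."""
    return df[columns].skew()


# Usage: skew_report(df, ["Age", "Work Exp"])
```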


Figure 1-4: Observation on Gender

- The dataset is 71.2% male and 28.8% female.

Figure 1-5: Distribution on preferred mode of transport

- 300 employees use Public transport and the remaining 144 prefer Private transport.


Table 1-5:Preferred mode of Transport wrt Gender

Multivariate Analysis

Table 1-6:Multivariate Analysis (Heat Map)

- The heat map shows that Work Exp is highly correlated with both Salary and Age.
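The values behind a correlation heat map can be listed directly. A minimal sketch, assuming Pearson correlation on the numeric columns (the report does not state which coefficient was plotted); `correlation_with` is a hypothetical helper.

```python
import pandas as pd


def correlation_with(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every numeric column with `target`, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.sort_values(key=abs, ascending=False)


# Usage: correlation_with(df, "Work Exp")
```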


Figure 1-6: Gender Impact on mode of transport

- Females tend to prefer Private transport more than males do.
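Because the dataset has far more males than females, a comparison like this is easier to read as shares within each gender than as raw counts. A sketch, assuming the `Gender` and `Transport` columns from the data dictionary; the helper name is hypothetical.

```python
import pandas as pd


def transport_share_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Row-normalised crosstab: share of each transport mode within each gender."""
    return pd.crosstab(df["Gender"], df["Transport"], normalize="index")


# Usage: transport_share_by_gender(df)
```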

Figure 1-7: Work Exp Impact on mode of transport

- Employees with more work experience tend to prefer Private transport over Public
transport.


Figure 1-8: Age Impact on mode of transport

- Employees older than 30 generally prefer Private transport over Public
transport.


1.2 Data Pre-processing Prepare the data for modelling: - Outlier Detection (treat, if needed) -
Feature Engineering / drop redundant features (if needed) - Encode the data - Train-test split

Outlier Detection

Figure 1-9: Outlier Plot

• Outliers are present. For now, we will keep them and proceed with model
building.
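The number of outliers per variable can be counted without a plot. The sketch below assumes the usual box-plot rule of 1.5 × IQR, which the report does not state explicitly; `iqr_outlier_counts` is a hypothetical helper.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    num = df.select_dtypes("number")
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    outside = (num < q1 - k * iqr) | (num > q3 + k * iqr)
    return outside.sum()


# Usage: iqr_outlier_counts(df)
```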

Data Split

• Shape of training set: (310, 8)
• Shape of test set: (134, 8)
• Proportion of classes in training set:

1    0.674194
0    0.325806
Name: Transport, dtype: float64

• Proportion of classes in test set:

1    0.679104
0    0.320896
Name: Transport, dtype: float64
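A split with these shapes and near-identical class proportions is what a stratified 70/30 split of 444 rows produces. The sketch below is one way to obtain it; the 0/1 encodings, the category labels, the 0.30 test size and the random seed are all assumptions, since the report shows only the resulting shapes and proportions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def encode_and_split(df: pd.DataFrame, test_size: float = 0.30, seed: int = 1):
    """Encode Gender/Transport as 0/1 and make a stratified train-test split."""
    data = df.copy()
    # Assumed encodings: the report only shows 0/1 labels, not the mapping itself.
    data["Gender"] = (data["Gender"] == "Male").astype(int)
    data["Transport"] = (data["Transport"] == "Public Transport").astype(int)
    X = data.drop(columns="Transport")
    y = data["Transport"]
    # stratify=y keeps the class proportions similar in both subsets.
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


# Usage: X_train, X_test, y_train, y_test = encode_and_split(df)
```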


1.3 Model Building - Bagging - Build a Bagging classifier - Build a Random Forest classifier - Check
the performance of the models across train and test set using different metrics and comment on
the same.

Model evaluation criterion:

The model can make two kinds of wrong prediction:

1. The model predicts that Public transport is preferred, but the employee prefers Private
transport.
2. The model predicts that Private transport is preferred, but the employee prefers
Public transport.

Which case is more important?

Both are important, since either error distorts the estimate of how many employees prefer Private transport.

How to reduce the losses?

• The F1 score can be used as the evaluation metric: the higher the F1 score, the better the
model is at minimizing both False Negatives and False Positives.
• We will use balanced class weights so that the model focuses equally on both classes.

We have created functions to calculate the different metrics and plot the confusion matrix, so that
the same code does not have to be repeated for each model.

• The model_performance_classification_sklearn function will be used to check the performance of the models.
• The confusion_matrix_sklearn function will be used to plot the confusion matrix.
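The report names these helpers but does not show their bodies. The following is a sketch of what the first one might look like; the signature and the choice of four metrics are assumptions, and only the function name comes from the report.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def model_performance_classification_sklearn(model, predictors, target) -> pd.DataFrame:
    """Return accuracy, recall, precision and F1 of a fitted classifier as one row."""
    pred = model.predict(predictors)
    return pd.DataFrame(
        {
            "Accuracy": [accuracy_score(target, pred)],
            "Recall": [recall_score(target, pred)],
            "Precision": [precision_score(target, pred)],
            "F1": [f1_score(target, pred)],
        }
    )
```

Calling it once with the training data and once with the test data gives the two rows needed to judge overfitting for any fitted model.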

a. Bagging - Model Building

• Checking model performance on training set


• Checking model performance on test set


- The model is overfitting the training data. We will tune it to try to reduce
overfitting.

b. Random Forest - Model Building

• Checking model performance on training set

• Checking model performance on test set


- Like the Bagging model, the Random Forest model is overfitting.
We will tune it to try to reduce overfitting.
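Overfitting of this kind shows up as a gap between training and test scores. The sketch below fits both ensemble types with mostly default settings and prints that gap; it runs on synthetic stand-in data of the same shape as the report's split, so the numbers it prints are not the report's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded Transport data (444 rows, 8 features).
X, y = make_classification(n_samples=444, n_features=8, weights=[0.33], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1
)

models = {
    "Bagging": BaggingClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(class_weight="balanced", random_state=1),
}

gaps = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    f1_train = f1_score(y_train, model.predict(X_train))
    f1_test = f1_score(y_test, model.predict(X_test))
    # A training score far above the test score is the usual sign of overfitting.
    gaps[name] = f1_train - f1_test
    print(f"{name}: train F1={f1_train:.3f}, test F1={f1_test:.3f}")
```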


1.4 Model Improvement - Bagging - Try and improve the model performance by tuning the model
(minimum 2 parameters to be tuned) - Bagging Classifier - Random Forest Classifier - Comment
on model performance after tuning the model.

a. Hyperparameter Tuning – Bagging Classifier

• Checking model performance on test set


- The model is still found to overfit the training data, as the training metrics are high, but
the testing metrics are not.

b. Hyperparameter Tuning – Random Forest Classifier


- The model is still found to overfit the training data, as the training metrics are high, but
the testing metrics are not.
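The report does not show how the tuning was carried out. One common approach is a cross-validated grid search over at least two parameters, scored on F1. The sketch below illustrates that approach on synthetic stand-in data; the parameter grid, the fold count and the use of `GridSearchCV` itself are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the grid values below are illustrative, not the report's.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

param_grid = {
    "max_depth": [3, 5],         # shallower trees fit the training data less closely
    "min_samples_leaf": [1, 5],  # larger leaves smooth the predictions
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=1),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X, y)
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```

Restricting tree depth and leaf size limits how closely each tree can fit the training set, which is why these two parameters are a typical first choice when a forest overfits.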


1.5 Model Building - Boosting - Build a Boosting classifier - Check the performance of the models
across train and test set using different metrics and comment on the same Note: AdaBoost or
GradientBoosting classifier can be built.
a. Boosting - Model Building and Hyperparameter Tuning
• Checking model performance on training set

- On the training set, the model yields 206 true positives, 3 false negatives, 32 false
positives and 69 true negatives.

• Checking model performance on test set

- On the test set, the model yields 87 true positives, 4 false negatives, 17 false
positives and 26 true negatives.


1.6 Model Improvement - Boosting - Try and improve the model performance by tuning the model
(minimum 2 parameters to be tuned) - Comment on model performance after tuning the model.

- After tuning, the training set (310 records) shows 204 true positives, 5 false negatives,
40 false positives and 61 true negatives.


- After tuning, the test set (134 records) shows 86 true positives, 5 false negatives,
19 false positives and 24 true negatives.
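The confusion-matrix counts reported for the boosting model in sections 1.5 and 1.6 can be turned into the usual metrics by hand. The sketch below does the arithmetic; labelling each set of counts as train or test is inferred from the totals (310 and 134), which match the split sizes, and the helper name is hypothetical.

```python
def metrics_from_counts(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from the four confusion-matrix cells."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts as reported in the text for the boosting model: (TP, FN, FP, TN).
reported = {
    "untuned / train": (206, 3, 32, 69),
    "untuned / test": (87, 4, 17, 26),
    "tuned / train": (204, 5, 40, 61),
    "tuned / test": (86, 5, 19, 24),
}
for label, counts in reported.items():
    scores = metrics_from_counts(*counts)
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

On these counts the test-set F1 is about 0.892 before tuning and about 0.878 after, so tuning did not improve test performance for this model.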


1.7 Actionable Insights & Recommendations - Compare all the models and choose the best model
with proper rationale - Conclude with the key takeaways (actionable insights and
recommendations) for the business.

Observation
- Based on the results for all the models above, the AdaBoost classifier
provides the best predictions. Compared with the other models, it
shows better accuracy and precision.


- Looking at the feature importance of the AdaBoost classifier, the three most
important features are Salary, Distance and Age.
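A ranking like this is typically read from the fitted model's `feature_importances_` attribute. The sketch below shows the mechanics on synthetic stand-in data, so the ranking it prints is not the report's; the column names follow the data dictionary, and the model settings are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Column names follow the data dictionary; the values are synthetic stand-ins.
columns = ["Age", "Gender", "Engineer", "MBA", "Work Exp", "Salary", "Distance", "license"]
X_arr, y = make_classification(n_samples=444, n_features=8, random_state=1)
X = pd.DataFrame(X_arr, columns=columns)

model = AdaBoostClassifier(n_estimators=50, random_state=1).fit(X, y)

# Importances are non-negative; a larger value means the feature is used more by the ensemble.
importance = pd.Series(model.feature_importances_, index=columns).sort_values(ascending=False)
print(importance.head(3))
```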

Actionable Insights and Recommendations:

- The important variables are Salary, Age, Work Exp and Distance.

- Age and Work Exp are correlated.
- People with higher salaries prefer to use Private transport. However, the dataset
contains outliers.
- People older than 30 generally prefer Private transport over
Public transport.
- People with more work experience tend to prefer Private transport.
There are outliers among Public transport users with high work experience.


2. Problem 2
Context

A dataset of Shark Tank episodes is made available. It contains 495 entrepreneurs making their pitch
to the VC sharks. You will ONLY use the “Description” column for the initial text mining exercise.

2.1 Data Preparation Data preparation and exploratory data analysis - Pick out the
Deal (Dependent Variable) and Description columns into a separate dataframe -
Create two corpora - one with those who secured a deal and the other with those
who did not secure a deal - Find the number of characters for both the corpora -
Text preprocessing on corpora which secured the deal

a. Data Description

Table 2-1: Head of the dataset (Part 1)


Table 2-2: Head of the dataset (Part 2)

Table 2-3: Shape of the data


Table 2-4: Dataset type

Table 2-5: Dataset Information


Table 2-6: Null value of Dataset

- There are 495 rows and 19 columns.

- The dataset contains 2 Boolean, 5 integer and 12 object columns.
- There are null values in the entrepreneur and website columns. However, as we will
not be using these columns in our study, we can leave them as they are.


2.1.1 Pick out the Deal (Dependent Variable) and Description columns into a separate data
frame.
- The new dataframe “df2” has 495 rows and 2 columns, i.e., Deal and Description.

2.1.2 Create two corpora - one with those who secured a deal and the other with those who did not
secure a deal
We created two corpora: Corpus 1 (deal secured) and Corpus 2 (deal not secured).

2.1.3 Find the number of characters for both the corpora

- The number of characters in the corpus that secured a deal is 45002.

- The number of characters in the corpus that did not secure a deal is 47184.
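The three steps above (selecting two columns, building the two corpora, counting characters) can be sketched as follows. The exact column names `Deal` and `Description`, and joining the descriptions with a single space, are assumptions; the character counts depend on how the descriptions are joined, so this sketch is not guaranteed to reproduce the reported totals.

```python
import pandas as pd


def build_corpora(df: pd.DataFrame) -> tuple[str, str]:
    """Join the descriptions of funded and unfunded pitches into two corpora."""
    df2 = df[["Deal", "Description"]]
    deal_mask = df2["Deal"].astype(bool)
    secured = " ".join(df2.loc[deal_mask, "Description"].astype(str))
    not_secured = " ".join(df2.loc[~deal_mask, "Description"].astype(str))
    return secured, not_secured


# Usage:
# secured, not_secured = build_corpora(df)
# print(len(secured), len(not_secured))
```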


2.1.4 Text pre-processing on corpora which secured the deal.

We'll be doing text preprocessing on the corpus for those who secured the deal.

a. Removal of http links

b. De-contraction of words

c. Tokenization

d. Lowercasing: Lowercasing ALL your text data, although commonly overlooked, is one of the
simplest and most effective forms of text preprocessing.


e. Removal of Punctuation

• Removal of stop words:

- Stop words are a set of commonly used words in a language.

- Examples of stop words in English are “a”, “the”, “is” and “are”. The intuition behind
removing stop words is that, by dropping low-information words from the text, we can focus on
the important words instead.

f. Lemmatization

- On the surface, lemmatization is very similar to stemming: the goal is to
remove inflections and map a word to its root form.


g. Normalization (aggregating the pre-processing functions into one):
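The report's own normalization code is not reproduced in the text. The following is a minimal sketch of a single function that chains the steps listed above, using only the standard library. The stop-word set and contraction map are tiny illustrative stand-ins, tokenization is a plain whitespace split, and lemmatization is left as an optional callable (for example, the `lemmatize` method of NLTK's `WordNetLemmatizer` could be passed in); all of these are assumptions.

```python
import re
import string

# Tiny illustrative lists; a real run would use a full stop-word list and contraction map.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "to", "of", "it", "not", "we"}
CONTRACTIONS = {"can't": "can not", "won't": "will not", "n't": " not", "'re": " are"}


def normalize(text: str, lemmatize=None) -> list[str]:
    """Apply the pre-processing steps in order and return the cleaned tokens."""
    text = re.sub(r"https?://\S+", " ", text)                  # a. remove http links
    text = text.replace("’", "'")                              # unify apostrophes first
    for short, full in CONTRACTIONS.items():                   # b. de-contraction
        text = re.sub(re.escape(short), full, text, flags=re.IGNORECASE)
    tokens = text.split()                                      # c. tokenization (whitespace)
    tokens = [t.lower() for t in tokens]                       # d. lowercasing
    tokens = [t.strip(string.punctuation) for t in tokens]     # e. strip edge punctuation
    tokens = [t for t in tokens if t and t not in STOP_WORDS]  # stop-word removal
    if lemmatize is not None:                                  # f. lemmatization (optional)
        tokens = [lemmatize(t) for t in tokens]
    return tokens
```

Keeping the lemmatizer as a parameter lets the same function run with or without an NLP library installed.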


2.2 Insight Generation - Create a wordcloud of common words used by companies who secure a
deal - Provide insights from the preprocessed data.

- The word cloud shows that entrepreneurs who secured a deal used
words like ‘product’, ‘make’, ‘design’, ‘online’, ‘offer’ and ‘need’, along with other positive and
product-descriptive words, to attract the customer’s interest, which may have helped them secure the deal.
- To improve the chances of securing a deal, a pitch should therefore use more words that attract
the customer’s interest, in particular product- and design-oriented words.
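A word cloud sizes each word by how often it occurs, so the same insight can be checked from raw token counts. A minimal sketch, assuming the corpus has already been pre-processed into a list of tokens; `top_words` is a hypothetical helper.

```python
from collections import Counter


def top_words(tokens: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Most frequent tokens; a word cloud sizes each word by this count."""
    return Counter(tokens).most_common(n)


# Usage: top_words(cleaned_tokens, n=20)
```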

2.3 Business Report Quality - Adhere to the business report checklist
