
Machine Learning-2 Business Report

Define the problem and perform Exploratory Data Analysis

- Problem definition
- Check shape, data types, and statistical summary
- Univariate analysis
- Multivariate analysis
- Use appropriate visualizations to identify patterns and insights
- Key meaningful observations on individual variables and the relationships between variables

Observations:
 The dataset contains 1,525 rows and 9 columns (shape).
 The dataset contains no missing values.
 The dataset contains two object columns, named 'vote' and 'gender'.
 The dataset contains 7 integer columns and 2 object columns.
Observations:
 The minimum and maximum age are 24 and 93.
 The assessment of current national economic conditions ranges from 1 to 5.
 The assessment of current household economic conditions ranges from 1 to 5.
 The mean assessment of current national economic conditions is 3.245221.
 The mean assessment of current household economic conditions is 3.137772.
 The mean assessment of the Labour leader is 3.335531.
 The mean assessment of the Conservative leader is 2.749506.
 The mean of the 11-point scale measuring respondents' attitudes toward European integration is 6.740277. High scores represent 'Eurosceptic' sentiment.
 The mean knowledge of parties' positions on European integration is 1.540541.
 'Labour' is the most frequent vote, with 1,057 occurrences.
 'female' is the most frequent gender, with 808 occurrences.
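A minimal pandas sketch of how these summary figures could be reproduced; the file name is an assumption, while the column names follow the report:

    import pandas as pd

    # File name is illustrative; the report does not state it
    df = pd.read_excel("Election_Data.xlsx")

    print(df.shape)                      # (1525, 9)
    df.info()                            # 7 int64 columns, 2 object ('vote', 'gender')
    print(df.isnull().sum())             # no missing values
    print(df.describe().T)               # min, max and mean per numerical column
    print(df["vote"].value_counts())     # Labour appears 1057 times
    print(df["gender"].value_counts())   # female appears 808 times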


Observations:
Distplot and boxplot of age:
 The data is approximately normally distributed.
 Most people are aged between 40 and 70.
 Outliers are not present.
 The minimum value is 24 and the maximum value is 93.
 The mean value is 54.241.
Observations:
Distplots and boxplots of 'economic.cond.national', 'economic.cond.household', 'Blair', 'Hague', 'Europe' and 'political.knowledge':
 We can see that all the numerical variables are roughly normally distributed (not perfectly normal, and multimodal in some instances).
 There are outliers present in the 'economic.cond.national' and 'economic.cond.household' variables, which can also be seen from the boxplots on the right.
 Also, the boxplots do not show the minimum and maximum values of the variables very clearly; we can obtain them separately while checking for outliers (a plotting sketch follows below).
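A sketch of how such distribution and box plots could be produced with seaborn (histplot is the modern replacement for the deprecated distplot), assuming the dataframe 'df' from above:

    import matplotlib.pyplot as plt
    import seaborn as sns

    num_cols = ["age", "economic.cond.national", "economic.cond.household",
                "Blair", "Hague", "Europe", "political.knowledge"]

    for col in num_cols:
        fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 3))
        sns.histplot(df[col], kde=True, ax=ax_hist)  # distribution shape
        sns.boxplot(x=df[col], ax=ax_box)            # quartiles and outliers
        fig.suptitle(col)
        plt.show()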
Observations:
 We can clearly see that the Labour party has received more votes than the Conservative party.
 In every age group, the Labour party has received more votes than the Conservative party.
 Female votes are considerably higher than male votes in both parties.
 In both genders, the Labour party has received more votes than the Conservative party.
Observations:
 The Labour party has more votes overall.
 Out of 82 people who gave a score of 5, 73 voted for the Labour party.
 Out of 542 people who gave a score of 4, 450 voted for the Labour party. This is the largest group of Labour voters.
 Out of 607 people who gave a score of 3, 407 voted for the Labour party. This is the second-largest group of Labour voters. The remaining 200 people, who voted for the Conservative party, form the largest group of Conservative voters.
 Out of 257 people who gave a score of 2, 117 voted for the Labour party and 140 voted for the Conservative party. Here the Conservative party received more votes than the Labour party.
 Out of 37 people who gave a score of 1, 16 voted for the Labour party and 21 voted for the Conservative party.
 Scores of 3, 4 and 5 have more votes for the Labour party.
 Scores of 1 and 2 have more votes for the Conservative party.
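A sketch of the cross-tabulation such counts could come from; the same pattern applies to each of the other rating columns:

    import pandas as pd
    import seaborn as sns

    # Vote counts per national economic condition score
    print(pd.crosstab(df["economic.cond.national"], df["vote"], margins=True))

    # Visual comparison of votes per score
    sns.countplot(x="economic.cond.national", hue="vote", data=df)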


Observations:
 The Labour party has more votes overall.
 Out of 92 people who gave a score of 5, 69 voted for the Labour party.
 Out of 440 people who gave a score of 4, 353 voted for the Labour party. This is the second-largest group of Labour voters.
 Out of 648 people who gave a score of 3, 450 voted for the Labour party. This is the largest group of Labour voters. The remaining 198 people, who voted for the Conservative party, form the largest group of Conservative voters.
 Out of 280 people who gave a score of 2, 154 voted for the Labour party and 126 voted for the Conservative party.
 Out of 65 people who gave a score of 1, 37 voted for the Labour party and 28 voted for the Conservative party.
 Scores of 3, 4 and 5 have more votes for the Labour party.
 In every instance, the Labour party has more votes than the Conservative party.
Observations:
 The Labour party has more votes overall.
 Out of 153 people who gave a score of 5, 150 voted for the Labour party. The remaining 3, despite giving the Labour leader a score of 5, chose to vote for the Conservative party.
 Out of 836 people who gave a score of 4, 679 voted for the Labour party. The remaining 157, despite giving the Labour leader a score of 4, chose to vote for the Conservative party.
 Only 1 person gave a score of 3, and that person voted for the Conservative party.
 Out of 438 people who gave a score of 2, 242 voted for the Conservative party. The remaining 196, despite giving the Labour leader an unsatisfactory score of 2, chose to vote for the Labour party.
 Out of 97 people who gave a score of 1, 59 voted for the Conservative party. The remaining 38, despite giving the Labour leader the lowest score of 1, chose to vote for the Labour party.
 Scores of 4 and 5 have more votes for the Labour party.
 Scores of 1, 2 and 3 have more votes for the Conservative party.
Observations:
 The Labour party has more votes overall.
 Out of 73 people who gave a score of 5, 59 voted for the Conservative party. The remaining 14, despite giving the Conservative leader a score of 5, chose to vote for the Labour party.
 Out of 558 people who gave a score of 4, 287 voted for the Conservative party. The remaining 271, despite giving the Conservative leader a score of 4, chose to vote for the Labour party.
 Out of 37 people who gave a score of 3, 28 voted for the Labour party; the remaining 9 voted for the Conservative party.
 Out of 624 people who gave a score of 2, 528 voted for the Labour party. The remaining 96, despite giving the Conservative leader an unsatisfactory score of 2, chose to vote for the Conservative party.
 Out of 233 people who gave a score of 1, 222 voted for the Labour party. The remaining 11, despite giving the Conservative leader the lowest score of 1, chose to vote for the Conservative party.
 Scores of 4 and 5 have more votes for the Conservative party, although at a score of 4 the votes are almost equal between the two parties, with the Conservative party slightly ahead.
 Scores of 1, 2 and 3 have more votes for the Labour party. Still, a significant percentage of people who gave the Conservative leader a poor score chose to vote for 'Hague'.
Observations:
 Out of 338 people who gave a score of 11, 166 voted for the Labour party and 172 voted for the Conservative party.
 People who gave scores of 7 to 10 voted for Labour and Conservative almost equally, with the Conservative party slightly ahead in these instances.
 Out of 209 people who gave a score of 6, 173 voted for the Labour party and 36 voted for the Conservative party.
 People who gave scores of 1 to 6 predominantly voted for the Labour party. In total, 770 people gave scores from 1 to 6, and 672 of them voted for the Labour party; that is, 87.27% chose the Labour party.
 So, we can infer that the lower the 'Eurosceptic' sentiment, the higher the votes for the Labour party.
Observations:
 Out of 250 people who gave a score of 3, 178 voted for the Labour party and 72 voted for the Conservative party.
 Out of 782 people who gave a score of 2, 498 voted for the Labour party and 284 voted for the Conservative party.
 Out of 38 people who gave a score of 1, 27 voted for the Labour party and 11 voted for the Conservative party.
 Out of 455 people who gave a score of 0, 360 voted for the Labour party and 95 voted for the Conservative party.
 We can see that, in every instance, the Labour party gets the higher number of votes.
 Out of 1,525 people, 455 gave a score of 0. This means that 29.84% of the people are casting their votes without any political knowledge.
Observations:
 A pairplot shows the interaction of each variable with every other variable. As such, there is no strong relationship between the variables, though there is a mixture of positive and negative relationships, which is expected.
 Overall, it gives a rough estimate of the interactions; a clearer picture can be obtained from the heatmap values and other kinds of plots.
 A pairplot is a combination of histograms and scatterplots.
 From the histograms, we can see that the 'Blair', 'Europe' and 'political.knowledge' variables are slightly left-skewed.
 All other variables seem to be normally distributed.
 From the scatterplots, we can see that there is mostly no correlation between the variables.
 We can use the correlation heatmap to view this more clearly.
Observations:
 We can see from the heatmap that there is mostly no correlation in the dataset. Some variables are moderately positively correlated and some are slightly negatively correlated.
 'economic.cond.national' and 'economic.cond.household' have a moderate positive correlation.
 'Blair' has a moderate positive correlation with 'economic.cond.national' and 'economic.cond.household'.
 'Europe' and 'Hague' have a moderate positive correlation.
 'Hague' has a moderate negative correlation with 'economic.cond.national' and 'Blair'.
 'Europe' has a moderate negative correlation with 'economic.cond.national' and 'Blair'.
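A sketch of the pairplot and heatmap steps (seaborn silently drops the two object columns from the pairplot; numeric_only guards them in the correlation):

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Pairplot: histograms on the diagonal, scatterplots off the diagonal
    sns.pairplot(df)
    plt.show()

    # Correlation heatmap over the numerical columns
    sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
    plt.show()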

Data Pre-processing
Prepare the data for modelling:
- Outlier detection (treat, if needed)
- Encode the data
- Data split
- Scale the data (and state your reasons for scaling the features)
Observations:
 There are nearly no outliers in most of the numerical columns.
 Outliers are present only in the 'economic.cond.national' and 'economic.cond.household' variables, as can be seen from the boxplots.
 In Gaussian Naive Bayes, outliers will affect the shape of the Gaussian distribution and have the usual effects on the mean, etc. So, depending on our use case, it makes sense to treat the outliers.
Observations:
 As we can see, after treating the outliers with the cap-and-floor technique, all the outliers have been adjusted (see the sketch below).
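A sketch of a cap-and-floor treatment consistent with what the report describes; the 1.5×IQR whisker boundaries are an assumption:

    # Cap values outside the IQR whiskers to the whisker boundaries
    def cap_and_floor(series, k=1.5):
        q1, q3 = series.quantile(0.25), series.quantile(0.75)
        iqr = q3 - q1
        return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

    for col in ["economic.cond.national", "economic.cond.household"]:
        df[col] = cap_and_floor(df[col])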

Observations:
 From the above results, we can see that both variables contain only two classes each.
 We can use a simple categorical conversion (pd.Categorical(...).codes or dummy encoding with drop_first=True; both will work here), as sketched below.
 This will convert the values into 0 and 1. As there is no level or order in the subcategories, any encoding will give the same result.
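A sketch of the dummy-encoding option, assuming the dataframe 'df' from above:

    import pandas as pd

    # Each two-class object column becomes a single 0/1 column
    df = pd.get_dummies(df, columns=["vote", "gender"], drop_first=True, dtype=int)

    # Equivalent alternative for a single column:
    # df["vote"] = pd.Categorical(df["vote"]).codes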
Observations:
 The info of the dataset does not contain any object datatype after encoding.
 The 'vote' and 'gender' variables are converted to 0 and 1 after encoding.

Reasons for scaling the features:

 The dataset contains features that vary highly in magnitude, units and range between the 'age' column and the other columns.
 Since many machine learning algorithms use the Euclidean distance between two data points in their computations, this is a problem.
 If left alone, these algorithms only take in the magnitude of features, neglecting the units.
 The results would vary greatly between different units, e.g. 1 km and 1,000 metres.
 Features with high magnitudes will weigh in far more in the distance calculations than features with low magnitudes.
 To suppress this effect, we need to bring all features to the same level of magnitude. This can be achieved by scaling.
 In this case, we have a mix of encoded, ordinal, categorical and continuous variables, so we use the min-max scaler technique to scale the data (see the sketch below).
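A sketch of the split-then-scale step; the encoded target column name, test size and random state are assumptions, as the report does not state them:

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    X = df.drop("vote_Labour", axis=1)   # encoded target column; name assumed
    y = df["vote_Labour"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=1, stratify=y)

    # Fit the scaler on the training data only, then apply to both splits
    scaler = MinMaxScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)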
Model Performance evaluation

- Check the confusion matrix and classification metrics for all the models (for both train and test datasets)
- ROC-AUC score and plot the curve
- Comment on all the models' performance
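A sketch of how these metrics could be computed for any of the fitted classifiers; 'model' is a placeholder for the fitted KNN, Naive Bayes, Bagging or Boosting estimator:

    from sklearn.metrics import (classification_report, confusion_matrix,
                                 roc_auc_score, RocCurveDisplay)

    def evaluate(model, X, y, label):
        pred = model.predict(X)
        proba = model.predict_proba(X)[:, 1]
        print(f"--- {label} ---")
        print(confusion_matrix(y, pred))
        print(classification_report(y, pred))        # accuracy, precision, recall, F1
        print("AUC:", roc_auc_score(y, proba))
        RocCurveDisplay.from_predictions(y, proba)   # ROC curve plot

    evaluate(model, X_train_scaled, y_train, "Train")
    evaluate(model, X_test_scaled, y_test, "Test")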
K-Nearest Neighbor Model - Observation

Train data:

• Accuracy: 84% • Precision: 86% • Recall: 91% • F1-Score: 89% • AUC: 90.4%

Test data:

• Accuracy: 83% • Precision: 86% • Recall: 90% • F1-Score: 88% • AUC: 90.4%

Validness of the model:

• The model is not over-fitted. • As we can see, the train data has an 84% accuracy and the test data has 83% accuracy. The difference is very small, so we can infer that the KNN model has performed well.

Naïve Bayes Model - Observation

Train data:

• Accuracy: 83% • Precision: 88% • Recall: 88% • F1-Score: 88% • AUC: 88.7%

Test data:

• Accuracy: 82% • Precision: 89% • Recall: 86% • F1-Score: 87% • AUC: 88.7%

Validness of the model:

• The model is not over-fitted or under-fitted. • The error on the test data is slightly higher than on the train data, which is acceptable because the margin is small and the error on both train and test data is not too high. Thus, the model is not over-fitted or under-fitted.
Bagging Model - Observation

Train data:

• Accuracy: 100% • Precision: 100% • Recall: 100% • F1-Score: 100% • AUC: 100%

Test data:

• Accuracy: 80% • Precision: 86% • Recall: 86% • F1-Score: 86% • AUC: 100%

Validness of the model:

• The model is over-fitted. • As we can see, the train data has a 100% accuracy and the test data has 80% accuracy. The difference is large in this model, so we can infer that the Bagging model has not performed well.

Boosting Model - Observation

Train data:

• Accuracy: 89% • Precision: 91% • Recall: 93% • F1-Score: 92% • AUC: 95%

Test data:

• Accuracy: 83% • Precision: 89% • Recall: 87% • F1-Score: 88% • AUC: 95%

Validness of the model:

• The model is not over-fitted. • As we can see, the train data has an 89% accuracy and the test data has 83% accuracy. The difference is small, so we can infer that the Boosting model has performed well.

Model Performance improvement

- Improve the performance of the bagging and boosting models by tuning them
- Comment on the model performance improvement on training and test data

After 10-fold cross-validation, the scores on both the train and test data sets are almost the same across all 10 folds (a sketch of the procedure follows below).

Hence, our model is valid.
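A sketch of the 10-fold cross-validation described above, reusing the scaled splits from the pre-processing step:

    from sklearn.model_selection import cross_val_score

    train_scores = cross_val_score(model, X_train_scaled, y_train, cv=10)
    test_scores = cross_val_score(model, X_test_scaled, y_test, cv=10)
    print("Train folds:", train_scores.round(3))
    print("Test folds: ", test_scores.round(3))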

Bagging

 Before tuning: The model performed well on the training data with an accuracy of 1.00, indicating potential overfitting. On the test data, it had an accuracy of 0.80, with balanced precision and recall for both classes.
 After tuning: The model's test accuracy improved slightly to 0.83. Precision and recall improved for class 0, indicating better performance in predicting the minority class. The overall F1-score and accuracy improved, suggesting a better balance between precision and recall.

Boosting

 Before tuning: The model performed well on the training data with an accuracy of 0.89. On the test data, it had an accuracy of 0.83, with balanced precision and recall for both classes.
 After tuning: The model's test accuracy was 0.82. Precision and recall improved for class 0, indicating better performance in predicting the minority class. The overall F1-score improved, suggesting a better balance between precision and recall.
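A sketch of how the tuning could be done with a grid search; the parameter grids are illustrative, as the report does not list the ones actually used:

    from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    bag_grid = {"n_estimators": [50, 100, 200], "max_features": [0.6, 0.8, 1.0]}
    bag_search = GridSearchCV(BaggingClassifier(random_state=1), bag_grid,
                              cv=5, scoring="accuracy")
    bag_search.fit(X_train_scaled, y_train)

    gb_grid = {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1],
               "max_depth": [2, 3, 4]}
    gb_search = GridSearchCV(GradientBoostingClassifier(random_state=1), gb_grid,
                             cv=5, scoring="accuracy")
    gb_search.fit(X_train_scaled, y_train)

    best_bagging, best_gb = bag_search.best_estimator_, gb_search.best_estimator_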

Final Model Selection

 Compare all the models built so far
 Select the final model with proper justification
 Check the most important features in the final model and draw inferences

To compare all the models and select the final one, let us analyse their performance on various metrics: accuracy, F1-score, recall, precision and AUC-ROC score. Below is a summary of each model:

KNN:

(Test Data Set)

 Accuracy - 0.83
 Precision - 0.86
 Recall - 0.90
 F1-score - 0.88
 AUC-ROC - 0.90

Naive Bayes:

(Test Data Set)

 Accuracy - 0.82
 Precision - 0.89
 Recall - 0.86
 F1-score - 0.87
 AUC-ROC - 0.887

Bagging (after tuning):

(Test Data Set)

 Accuracy - 0.80
 Precision - 0.80
 Recall - 0.80
 F1-score - 0.90
 AUC-ROC - 0.82

Boosting (after tuning):

(Test Data Set)

 Accuracy - 0.83
 Precision - 0.80
 Recall - 0.80
 F1-score - 0.86
 AUC-ROC - 0.81

Conclusion:

• There is no under-fitting or over-fitting in any of the tuned models.

• All the tuned models score well. But, as we can see, the most consistent tuned model on both train and test data is the Boosting model.

• The tuned gradient boosting model performs the best, with a 79% accuracy score on train and an 83% accuracy score on test. It also has the best AUC score of 81% on both train and test data, which is the highest of all the models.

• It also has a precision score of 80% and a recall of 80%, which are also the highest of all the models. So, we conclude that the tuned Gradient Boosting model is the best/optimized model.
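For the "most important features" part of the task, a sketch of how they could be read off the tuned gradient boosting model; 'best_gb' refers to the estimator obtained in the tuning sketch above:

    import pandas as pd

    # Feature importances of the final (tuned gradient boosting) model
    importances = pd.Series(best_gb.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False))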
Actionable Insights & Recommendations

 Compare all four models
 Conclude with the key takeaways for the business

KNN:

(Test Data Set)

 Accuracy - 0.83
 Precision - 0.86
 Recall - 0.90
 F1-score - 0.88
 AUC-ROC - 0.90

Naive Bayes:

(Test Data Set)

 Accuracy - 0.82
 Precision - 0.89
 Recall - 0.86
 F1-score - 0.87
 AUC-ROC - 0.887

Bagging (after tuning):

(Test Data Set)

 Accuracy - 0.80
 Precision - 0.80
 Recall - 0.80
 F1-score - 0.90
 AUC-ROC - 0.82

Boosting (after tuning):

(Test Data Set)

 Accuracy - 0.83
 Precision - 0.80
 Recall - 0.80
 F1-score - 0.86
 AUC-ROC - 0.81

Insights:

• The Labour party has more than double the votes of the Conservative party.

• Most people gave a score of 3 or 4 for the national economic condition; the average score is 3.245221.

• Most people gave a score of 3 or 4 for the household economic condition; the average score is 3.137772.

• Blair has a higher number of votes than Hague, and the scores are much better for Blair than for Hague.

• The average score of Blair is 3.335531 and the average score of Hague is 2.749506, so Blair has the better score.

• On a scale of 0 to 3, about 30% of the total population has zero knowledge about politics/parties.

• People who gave a low score of 1 to a certain party's leader still decided to vote for that same party instead of the other party. This may be because of a lack of political knowledge among the people.

• People with higher Eurosceptic sentiment voted for the Conservative party; the lower the Eurosceptic sentiment, the higher the votes for the Labour party.

• Out of 455 people who gave a score of 0 for political knowledge, 360 voted for the Labour party and 95 voted for the Conservative party.

• All models performed well on the training data set as well as the test data set. The tuned models performed better than the regular models.

• There is no over-fitting in any model except the regular Random Forest and Bagging models.

• The tuned Gradient Boosting model is the best/optimized model.

Business recommendations:

• Hyper-parameter tuning is an important aspect of model building. There are limitations, as processing the many parameter combinations requires a huge amount of processing power, but if tuning can be done with more sets of parameters, we might get even better results.

• Gathering more data will also help in training the models and thus improve their predictive power.

• We can also create a function in which all the models predict the outcome in sequence. This will help in better understanding what the outcome, and its probability, will be.

• Use the Gradient Boosting model (which does not require scaling) for predicting the outcome, as it has the best optimized performance.

Problem 2 - Define the problem and Perform Exploratory Data Analysis

 Problem definition - Find the number of characters, words and sentences in all three speeches.
• President Franklin D. Roosevelt's speech has 1,323 words in total.

• President John F. Kennedy's speech has 1,364 words in total.

• President Richard Nixon's speech has 1,769 words in total.

• President Franklin D. Roosevelt's speech has 7,651 characters (including spaces).

• President John F. Kennedy's speech has 7,673 characters (including spaces).

• President Richard Nixon's speech has 10,106 characters (including spaces).

• The average word length (avg_word) is 4.78 in President Franklin D. Roosevelt's speech.

• The average word length is 4.62 in President John F. Kennedy's speech.

• The average word length is 4.71 in President Richard Nixon's speech.

• There are 632 stopwords in President Franklin D. Roosevelt's speech.

• There are 618 stopwords in President John F. Kennedy's speech.

• There are 899 stopwords in President Richard Nixon's speech.

• There are 14 numeric tokens in President Franklin D. Roosevelt's speech.

• There are 7 numeric tokens in President John F. Kennedy's speech.

• There are 10 numeric tokens in President Richard Nixon's speech.

• There is 1 uppercase word in President Franklin D. Roosevelt's speech.

• There are 5 uppercase words in President John F. Kennedy's speech.

• There are 13 uppercase words in President Richard Nixon's speech.

• There are 119 uppercase letters in President Franklin D. Roosevelt's speech.

• There are 94 uppercase letters in President John F. Kennedy's speech.

• There are 132 uppercase letters in President Richard Nixon's speech.
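A sketch of how these counts could be computed, assuming the three speeches are loaded from NLTK's inaugural corpus (the file ids are NLTK's standard ones for these three addresses):

    import nltk
    nltk.download("inaugural")
    nltk.download("punkt")
    from nltk.corpus import inaugural

    speeches = {
        "Roosevelt": inaugural.raw("1941-Roosevelt.txt"),
        "Kennedy": inaugural.raw("1961-Kennedy.txt"),
        "Nixon": inaugural.raw("1973-Nixon.txt"),
    }

    for name, text in speeches.items():
        words = text.split()
        avg_len = sum(len(w) for w in words) / len(words)
        print(name,
              "| characters:", len(text),                  # including spaces
              "| words:", len(words),
              "| sentences:", len(nltk.sent_tokenize(text)),
              "| avg word length:", round(avg_len, 2))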

Problem 2 - Text cleaning

 Stopword removal
 Stemming
 Find the three most common words used in all three speeches

After removal of stopwords:

• President Franklin D. Roosevelt's speech has 5,144 characters (including spaces).

• President John F. Kennedy's speech has 5,205 characters (including spaces).

• President Richard Nixon's speech has 6,557 characters (including spaces).

After removal of stopwords:

• President Franklin D. Roosevelt's speech has 662 words in total.

• President John F. Kennedy's speech has 723 words in total.

• President Richard Nixon's speech has 843 words in total.

 As we can see, '--' (63), 'us' (44) and 'new' (26) are the most frequent tokens (a sketch of the cleaning and counting step follows below).
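A sketch of the cleaning step; a plain whitespace split is assumed, which is why tokens such as '--' survive and show up in the counts:

    import nltk
    nltk.download("stopwords")
    from collections import Counter
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    stop_words = set(stopwords.words("english"))
    stemmer = PorterStemmer()

    def clean(text):
        # Lowercase, drop stopwords, then stem each remaining token
        tokens = [w.lower() for w in text.split() if w.lower() not in stop_words]
        return [stemmer.stem(w) for w in tokens]

    # Three most common tokens per cleaned speech
    for name, text in speeches.items():
        print(name, Counter(clean(text)).most_common(3))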
Observations:

The most frequent words used in all three speeches are:

• us - 44

• new - 26

• let - 25

• america - 15

• shall - 13

Here, 'every', 'peace' and 'people' all share 7th place with the same number of occurrences. The most frequent tokens overall are '--', 'us' and 'new'.
Problem 2 - Plot word clouds of all three speeches

 Show the most common words used in all three speeches in the form of word clouds.

We can see some highlighted words like 'let', 'us', 'new', 'nation', 'world', 'america', 'people', 'peace', etc. The bigger the word, the higher its frequency.
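A sketch of the word-cloud step using the wordcloud package, reusing the clean() helper and speeches dictionary from above:

    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    for name, text in speeches.items():
        # Word size scales with token frequency in the cleaned speech
        wc = WordCloud(width=800, height=400, background_color="white")
        wc.generate(" ".join(clean(text)))
        plt.imshow(wc, interpolation="bilinear")
        plt.axis("off")
        plt.title(name)
        plt.show()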

Insights:

 Our objective was to look at all the speeches and analyse them to find the strength and sentiment of each.
 Based on the outputs, we can see that some similar words are present in all the speeches.
 These words may reflect the message that inspired many people and helped win each speaker the seat of President of the United States of America.
 Among all the speeches, "nation" is the word that is significantly highlighted in all three.
