
Business Report Machine Learning-1

This business report discusses two machine learning projects for a news channel. The first project involves building a model to predict which political party voters will vote for using survey data with 9 variables. The best performing model was a Naive Bayes classifier, which provided insights such as Labour receiving more votes than Conservatives and factors influencing votes. The second project analyzes inaugural speeches from Roosevelt, Kennedy, and Nixon, finding the number of characters, words, and sentences in each, and generating word clouds with the most common words after removing stopwords.

Uploaded by

Yogesh Kulawade
Copyright
© All Rights Reserved

Business Report

PROJECT: MACHINE LEARNING


Problem 1-
You are hired by CNBE, one of the leading news channels, which wants to analyze
recent elections. A survey was conducted on 1525 voters with 9 variables. You
have to build a model to predict which party a voter will vote for on the basis of the
given information, in order to create an exit poll that will help in predicting the
overall win and the seats covered by a particular party.
1.1 Read the dataset. Do the descriptive statistics and do the null value condition
check.
1.2 Perform Univariate and Bivariate Analysis. Do exploratory data analysis. Check
for Outliers.
UNIVARIATE ANALYSIS
Scaling is necessary for algorithms that are distance based, such as KNN, and
for models whose fitted weights depend on the magnitude of the features.
Tree-based boosting models are largely insensitive to feature scale, so among
the models we are going to use it is mainly KNN that needs scaling. For our
first iteration we will not perform scaling, and we will compare the results
with the second iteration of the models. One more thing to note is that the
scales of all the features are in a similar range, so scaling could be skipped
in this problem. In general, however, it is good practice to scale the features.
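
As an illustration of what scaling does, the following is a minimal sketch of z-score standardization using only the Python standard library. The function name `standardize` and the sample values are hypothetical and are not taken from the survey data; the report itself does not state which scaler was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical feature values (assumption, for illustration only).
ratings = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = standardize(ratings)
print(scaled)
```

After this transformation each feature has mean 0 and standard deviation 1, so no single feature dominates a distance calculation such as the one used by KNN. In a real train/test setup the mean and standard deviation would normally be computed on the training split only and then reused on the test split.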
1.8) Based on these predictions, what are the insights?

1) Comparing all the performance measures, the Naïve Bayes model from the second iteration
performs best. Some other models, such as SVM and Extreme Boosting, perform almost as well
as Naïve Bayes. However, the Naïve Bayes model is very consistent when train and test
results are compared with each other. Its other metrics, such as recall, AUC score and
ROC curve, are also good.

2) The Labour party is performing better than the Conservative party by a huge margin.

3) Female voter turnout is greater than male voter turnout.

4) Voters who report better national economic conditions prefer to vote for the Labour party.

5) Persons with higher Eurosceptic sentiment prefer to vote for the Conservative party.

6) Those who have higher political knowledge have voted for the Conservative party.

7) Looking at the assessments of both leaders, the Labour leader is performing well, as he
has received better ratings.
Problem 2-
In this project, we are going to work on the inaugural corpus from nltk
in Python. We will be looking at the following speeches of the Presidents of the
United States of America:
1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973

2.1 Find the number of characters, words, and sentences for the mentioned
documents
No. of characters (with spaces) in Roosevelt Data = 7571
No. of characters (without spaces) in Roosevelt Data = 6174
No. of words in Roosevelt Data = 1360
No. of sentences in Roosevelt Data = 68

No. of characters (with spaces) in Kennedy Data = 7618
No. of characters (without spaces) in Kennedy Data = 6202
No. of words in Kennedy Data = 1390
No. of sentences in Kennedy Data = 55

No. of characters (with spaces) in Nixon Data = 9991
No. of characters (without spaces) in Nixon Data = 8122
No. of words in Nixon Data = 2028
No. of sentences in Nixon Data = 69
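
The counts above could be produced along the following lines. This is a minimal sketch, not the report's actual code: the function `text_stats`, the whitespace-based word split and the punctuation-based sentence split are assumptions, and the sample text is hypothetical. The nltk tokenizers tokenize differently (for example, they treat punctuation as separate tokens), so this sketch would not necessarily reproduce the figures listed above.

```python
import re


def text_stats(text):
    """Count characters, words and sentences in a plain-text string."""
    # Split after '.', '!' or '?' followed by whitespace (a rough heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "chars_with_spaces": len(text),
        "chars_without_spaces": sum(1 for ch in text if not ch.isspace()),
        "words": len(text.split()),
        "sentences": len(sentences),
    }


# Hypothetical sample text (assumption, not taken from any speech).
sample = "Hello world. This is a test! Is it?"
stats = text_stats(sample)
print(stats)
```

To apply this to a speech, the raw text of the document would be passed in place of `sample`.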

The word cloud of each of the speeches of the variable (after removing the stopwords):
ROOSEVELT WORDCLOUD-
KENNEDY WORDCLOUD-
NIXON WORDCLOUD-
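
A word cloud is driven by word frequencies computed after stopword removal. The following is a minimal sketch of that step using only the standard library. The function `word_frequencies`, the small `STOPWORDS` set and the sample text are hypothetical; the report would more likely have used a full stopword list, such as the English list shipped with nltk.

```python
import re
from collections import Counter

# Small illustrative stopword set (assumption); a real list is much longer.
STOPWORDS = {"the", "and", "of", "a", "to", "in"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, tokenize on letters and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Hypothetical sample text (assumption, not taken from any speech).
sample = "The nation and the people know the nation"
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

The resulting frequency mapping is the kind of input that a rendering library takes; for example, the `wordcloud` package accepts such a mapping through `WordCloud().generate_from_frequencies(freqs)`.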
