
Data Mining: Slide-1: Title Slide

This document summarizes a presentation on predicting whether MBA students will be placed after their studies and which factors influence placement outcomes the most. It provides an overview of the dataset used, which contains information on 215 students' demographics, academic performance, work experience, and placement outcomes. Exploratory data analysis is conducted to examine the relationships between these factors and placements. Key findings include that female students performed better academically but were placed less often than male students, certain degree specializations and work experience increased placement chances, and academic performance correlated with placement but did not guarantee it. Feature engineering steps are also briefly outlined.

Data Mining

Slide- 1: Title Slide


Job Offer Prediction for MBA Students
(Will you get placed or not?)

Intro:
We have all paid 11 lakhs, and all of this for placements...?

Problem Statement:
To predict whether a candidate was placed in a role after their MBA studies and, if so, which factors
helped the most (e.g., work experience, degree, school results, gender)

Slide- 2: Dataset Overview


Dataset Name: Campus Recruitment (Academic and Employability Factors influencing placement)

Link: https://www.kaggle.com/benroshan/factors-affecting-campus-placement

We selected the Campus Recruitment dataset from Kaggle, which was made available by Ben Roshan.

Snapshot of our dataset: df.head()

Shape: 215 * 15
This is a snapshot of the dataset. It contains 15 variables and a total of 215 observations with the
following details:

attribute       type     description

sl_no           factor   Serial Number
gender          factor   Gender: Male=‘M’, Female=‘F’
ssc_p           numeric  Secondary Education Percentage (grades 9 and 10) - exam at the end of 10th grade
ssc_b           factor   Board of Education - Central/Others
hsc_p           numeric  Higher Secondary Education (grades 11 and 12) - exam at the end of 12th grade
hsc_b           factor   Board of Education - Central/Others
hsc_s           factor   Specialization in Higher Secondary Education
degree_p        numeric  Degree Percentage
degree_t        factor   Undergraduate (Degree type) - Field of degree education
workex          factor   Work Experience - Yes/No
etest_p         numeric  Employability test Percentage (conducted by college)
specialisation  factor   Post Graduation (MBA) - Specialization
mba_p           numeric  MBA percentage
status          factor   Status of placement - Placed/Not placed
salary          numeric  Salary offered by corporate to candidates

We have 7 columns with numeric values (this count includes sl_no, which is stored as an integer) and 8
with object data type (categorical).

So, before moving ahead, we checked for NULL and NA values in the dataset.

Slide- 3: Dataset Overview

df.info()

Check for NULLs: [code and output screenshot]

Check for NAs: [code screenshot]
Number of NAs: 67

Number of NAs per column:

column          NAs

sl_no           0
gender          0
ssc_p           0
ssc_b           0
hsc_p           0
hsc_b           0
hsc_s           0
degree_p        0
degree_t        0
workex          0
etest_p         0
specialisation  0
mba_p           0
status          0
salary          67
So, we can see that there are no NULL values, but there are 67 NA values, and all of them are in the
salary field. Now, we need to check why we have these 67 NAs in the salary column. Is this missing
data, or is there another reason behind it?

Status       n

Not Placed   67
Placed       148

It looks like 67 NAs in the salary column are due to the fact that 67 students did not get a placement.
This makes sense and therefore, no further investigation is needed.
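
As a rough sketch, the same check can be written in plain Python. The four records below are made up for illustration (they are not rows from the dataset), and `None` stands in for NA.

```python
from collections import Counter

# Hypothetical sample records (not rows from the real dataset).
# None marks a missing (NA) salary.
records = [
    {"status": "Placed", "salary": 270000.0},
    {"status": "Not Placed", "salary": None},
    {"status": "Placed", "salary": 250000.0},
    {"status": "Not Placed", "salary": None},
]

# Count missing values per column.
na_counts = {col: sum(r[col] is None for r in records) for col in records[0]}

# Cross-check: which placement status do the rows with a missing salary have?
na_by_status = Counter(r["status"] for r in records if r["salary"] is None)

print("NAs per column:", na_counts)
print("Missing salary by status:", dict(na_by_status))
```

If every missing salary falls under "Not Placed", as it does in this toy sample, the NAs are structural rather than a data quality problem.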

Explanation:
From the dataset overview we can say that, except for hsc_s and degree_t with 3 classes each, all other
categorical variables have 2 classes each. We can also see that the data is somewhat imbalanced, as we
have 148 placed students and 67 not placed students. This means that around 31% of candidates were not
placed, which is sad, but let's see what the reasons were :) by performing the EDA, which will be
explained by Noor and Parikshit.
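
The 31% figure follows directly from the two status counts. A small worked calculation, using only the numbers quoted in these notes:

```python
# Counts of placed and not placed students, as reported in these notes.
placed, not_placed = 148, 67
total = placed + not_placed

# Share of students without a placement and the class imbalance ratio.
share_not_placed = not_placed / total
imbalance_ratio = placed / not_placed

print(f"total students: {total}")
print(f"not placed: {share_not_placed:.1%}")
print(f"placed : not placed = {imbalance_ratio:.2f} : 1")
```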

Exploratory Data Analysis


EDA ....
First, let’s check whether gender affects the candidate placements or not

Slide- 4: Does gender affect placements?


But before that, let’s check whether there are any gender-specific differences in the performance scores.

1. University Scores:

2. MBA:

Interpretation:
From the above two graphs we can see that female students scored significantly higher
percentages than male students at the university and MBA levels, while there is no significant
difference in performance at the secondary and higher secondary levels or in the employability
test (we haven’t added those graphs)

3. Gender Vs Placement Stats:


4. Gender Vs Salary Box- Whisker plot:

Interpretation:
1. From these two graphs we can see that the dataset contains a sample of 139
male students and 76 female students, which means there are almost twice as many
male students as female students
2. More outliers in the male box plot tell us that some male students are getting high-CTC jobs
3. Male students are offered slightly higher salaries than female students on average

Slide- 5: Does the board of education affect placements?


Now, let’s move on to check whether different boards make a significant difference in placement
offers.
Interpretation:
1. The count of central board students is very high compared to all other boards in ssc_b, but
it is the reverse in hsc_b
2. There is no significant difference in the number of people that received an offer from
either board at the secondary or higher secondary level

Slide- 6: Which degree and MBA specialization has the highest Salary?
The next variable is specialisation. Now, let’s check the impact of specialisation on the chances of
receiving an offer and a better salary.
Interpretation:
1. It looks like Commerce and Science degree students are preferred by companies, which is
not surprising
2. Students who opted for Others have a very low chance of placement
3. Specialisation is a clear indicator of placement. Significantly more Marketing and Finance
students received an offer compared to those who specialised in Marketing and HR. This
might be because companies have a lower requirement for HR roles
4. The last graph shows that Mkt & Fin students get highly paid jobs and also that Commerce &
Mgmt students occasionally get dream placements with high salaries

So, now let’s check if academic scores influence the chances of placements or not

Slide- 7: Does your academic score influence your chances of placement?


1. Correlation Plot:

In this correlation plot, the darker the colour, the higher the correlation.

Interpretation:
1. As we can observe, there are medium correlations between the academic
scores, which suggests that students who performed well in secondary school also
tended to perform well in their further education (i.e., higher secondary, university and
MBA)
2. We can also notice that employability test scores have a low correlation with
academic scores, which may indicate that these tests were more practical than
theoretical
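
For reference, each cell of a correlation plot like this one is typically a Pearson correlation coefficient. The sketch below computes one such coefficient by hand; the five pairs of scores are hypothetical and are not taken from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical percentage scores for five students (not taken from the dataset).
ssc_p = [67.0, 79.0, 65.0, 56.0, 85.0]
hsc_p = [91.0, 78.0, 68.0, 52.0, 74.0]

r = pearson(ssc_p, hsc_p)
print(f"correlation(ssc_p, hsc_p) = {r:.2f}")
```

A value near 1 means the two scores rise together, a value near 0 means there is little linear relationship, and a value near -1 means one falls as the other rises.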

Next let’s check how the scores for each level of education are distributed

2. What does the distribution of the scores look like for each level of education?
a. Average academic scores Vs Placement Status (How many students were placed?):
b. Secondary:

c. Higher Secondary:

d. University:
e. MBA:

f. Employability:

Interpretation:
1. From the 1st graph, we can see that most of the candidates' educational
performances are between 60 - 80%
2. The distribution becomes more concentrated around the median range (62 - 66%) as
the students progressed in their education, from secondary (a wide distribution)
to MBA (a narrow distribution)
3. The employability test shows a different trend, with a very wide and almost
equal distribution across the buckets

3. Box- Whiskers plot:


Interpretation:
These box plots tell us that:
1. Good percentages in the MBA do not guarantee placement
2. There is only a slight difference between the percentage scores of the two
groups, but the placed candidates still have the upper hand

Slide- 8: Did previous work experience matter?


After academic scores, let’s now check whether work experience helps in getting job offers.

Interpretation:
Significantly more students with work experience received offers than those without any work
experience. Work experience is a clear indicator of placement, and students who have it also tend to
land higher-CTC jobs.

Slide- 9: Salary
And the last variable is salary.

Interpretation:
1. Looking at the distribution, we can say that most of the students get a package between
200k and 400k, and most salaries above 400k are outliers.
2. Male candidates are making more money on average compared to female candidates
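
Box plots conventionally flag as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to a made-up list of salaries. It assumes the plot on this slide used the same convention, which these notes do not state.

```python
import statistics

# Hypothetical salary offers (not values from the dataset).
salaries = [200000, 220000, 240000, 250000, 260000,
            270000, 280000, 300000, 350000, 940000]

# Quartiles and interquartile range.
q1, _median, q3 = statistics.quantiles(salaries, n=4)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

print(f"fences: {lower_fence:.0f} to {upper_fence:.0f}")
print("outliers:", outliers)
```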

That was all about EDA.

Next is feature engineering, where we do the following:

Slide- 10: Feature Engineering


1. Create Dummy Variables: A dummy variable is a categorical variable that has been transformed
into a numeric one. For example, the Gender column contains "male" and "female", which we
transform into numbers by creating a new column, gender_id, where the male
category is coded as 0 and female as 1. In this manner all 6 variables (Gender, ssc_b, hsc_b,
degree_t, specialization, placement status) are coded.

2. Create a Correlation Matrix: Correlation is a statistical technique that can show whether and
how strongly pairs of variables are related

3. Feature selection: From the correlation matrix we can now select the features for our model
that are highly correlated with the placement status variable. ssc_p, hsc_p, degree_p, workex,
and specialisation are the significant features that will help our model identify patterns.
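
The three steps above can be sketched in plain Python. The records and the 0/1 codes below are illustrative assumptions, not rows or codes taken from the actual analysis.

```python
import math

# Hypothetical records (not rows from the real dataset).
students = [
    {"gender": "M", "workex": "Yes", "status": "Placed"},
    {"gender": "F", "workex": "No", "status": "Not Placed"},
    {"gender": "F", "workex": "Yes", "status": "Placed"},
    {"gender": "M", "workex": "No", "status": "Placed"},
    {"gender": "M", "workex": "No", "status": "Not Placed"},
    {"gender": "F", "workex": "Yes", "status": "Placed"},
]

# Step 1: dummy coding. Each two-class category maps to 0 or 1.
codes = {
    "gender": {"M": 0, "F": 1},
    "workex": {"No": 0, "Yes": 1},
    "status": {"Not Placed": 0, "Placed": 1},
}
encoded = [{col: codes[col][row[col]] for col in codes} for row in students]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Step 2: correlate each coded feature with the placement status.
target = [row["status"] for row in encoded]
correlations = {
    col: pearson([row[col] for row in encoded], target)
    for col in ("gender", "workex")
}

# Step 3: rank features by the strength of their correlation with the target.
ranked = sorted(correlations, key=lambda c: abs(correlations[c]), reverse=True)

for col in ranked:
    print(f"{col}: {correlations[col]:+.2f}")
```

In this toy sample workex ranks above gender, so it would be the first candidate for the model.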

Now that we have our variables decided, let’s move on to perform logistic regression as explained by
Ranjani

Slide- 11: Logistic Regression


SPSS Outputs (chi sq, model*, variables, correlation)
4. "When feature engineering is done, we usually tend to decrease the dimensionality by
selecting the "right" number of features that capture the essential." (I/p and O/P)
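
The model on this slide was fitted in SPSS, and its output is not reproduced in these notes. Purely as an illustration of what logistic regression does, here is a minimal hand-written version trained by gradient descent. The training data, learning rate and number of epochs are all assumptions chosen for the example. This is not the SPSS procedure.

```python
import math

def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return 1 (placed) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical training data: [degree_p scaled to 0-1, workex coded 0/1].
X = [[0.55, 0], [0.58, 0], [0.60, 0], [0.72, 1], [0.75, 1], [0.80, 1]]
y = [0, 0, 0, 1, 1, 1]  # 1 = placed, 0 = not placed

w, b = fit_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print("weights:", [round(wj, 2) for wj in w], "bias:", round(b, 2))
print("predictions:", predictions)
```

A positive weight means that a higher value of the feature raises the predicted probability of placement.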

Slide 12 (Confusion matrix):


Confusion Matrix + Formulae
It is important to define what “performance” means when it comes to choosing a model, because one type
of prediction error can be costlier than the other. For example, incorrectly predicting that someone
would be placed (false positive) is not as bad as incorrectly predicting that someone would not be
placed (false negative). The cost of the former is the time spent interviewing, while the cost of the
latter is losing out on a job that the student would have secured.
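
The formulae from the slide are not reproduced in these notes, so the sketch below uses the standard definitions of accuracy, precision and recall. The actual and predicted labels are made up for illustration (1 = placed, 0 = not placed).

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # predicted placed, was placed
        elif a == 0 and p == 1:
            fp += 1  # predicted placed, was not placed
        elif a == 1 and p == 0:
            fn += 1  # predicted not placed, was placed
        else:
            tn += 1  # predicted not placed, was not placed
    return tp, fp, fn, tn

# Hypothetical labels (not model output from the presentation).
actual = [1, 1, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are right
precision = tp / (tp + fp)                  # of predicted placements, share that were real
recall = tp / (tp + fn)                     # of real placements, share that were predicted

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Because false negatives are the costlier error here, recall is the metric to watch: it drops whenever a student who would have been placed is predicted as not placed.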

Slide 13 (Conclusion):
Here are a few things to keep in mind:

Specialisations Matter. Choose the right one.


Go for Internship. Work Experience helps.
Don't worry about grades for salary (although you need them to get placed).

1. Overall, the grades became more concentrated as the students progressed in their education; it
could be that it is harder for students to differentiate themselves on grades alone, so they focus
on other achievements (work experience, voluntary roles)
2. Successfully placed students performed significantly better than their counterparts during
secondary, higher secondary and university education, but not at the MBA level

Slide 14 (Thank you)
