Aug Batch Project Details

Title of the Project: Ensemble ML Modelling to classify Real/Fake jobs

Below are the details of the project for the August Machine Learning Batch.

Students need to work on Ensemble Learning (modeling) for a given problem.

Problem Statement: For a given set of input features, determine the most common prediction
across the models (as per majority). Also, state which algorithm gives the maximum accuracy
on the dataset.
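
The "most common answer (as per majority)" idea can be sketched as hard majority voting over the outputs of several classifiers. This is only a minimal illustration: the names majority_vote and ensemble_predict, the tie-breaking rule, and the prediction lists are all assumptions made up for the example, not part of the assignment or the dataset.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models for one sample.

    `predictions` holds one label per model. Ties are broken in favour of
    the label that appears first in the list (an arbitrary choice).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def ensemble_predict(per_model_predictions):
    """Combine per-model prediction lists into one majority-vote list."""
    return [majority_vote(list(votes)) for votes in zip(*per_model_predictions)]


# Made-up outputs of three classifiers on four job postings
# (1 = fraudulent, 0 = real). Not taken from the real dataset.
model_a = [0, 1, 0, 1]
model_b = [0, 1, 1, 0]
model_c = [1, 1, 0, 0]

combined = ensemble_predict([model_a, model_b, model_c])
print(combined)
```

Using an odd number of models avoids ties for a two-class target such as "Fraudulent".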

Note: In the attached dataset, "Fraudulent" is the target feature. A description of the dataset
can be found at https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction

Pose any 4 questions of your choice about the dataset and provide answers to them. For
instance, for the given dataset the questions could be as follows.

Q1) What are the most common titles used in job postings in the US?
Q2) Which department has the highest number of fake jobs?
Q3) Which department or function has high-paying jobs in the UK?
Q4) What are the top 3 most commonly used words in Company Profile (excluding stopwords)?
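
A question like Q4 can be answered with a simple word count. The sketch below is one possible approach using only the standard library; the stopword list is a small illustrative subset, the function name top_words is hypothetical, and the sample profiles are invented rather than taken from the dataset.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption); a real analysis would
# use a fuller list, for example from an NLP library.
STOPWORDS = {"a", "an", "and", "are", "the", "is", "in", "of", "to",
             "we", "our", "for", "with"}


def top_words(texts, n=3, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword words across all texts."""
    counts = Counter()
    for text in texts:
        if not isinstance(text, str):
            continue  # skip missing values such as None or NaN
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


# Invented company profiles, including one missing value.
profiles = [
    "We are a global team of engineers.",
    "Our team builds global software for engineers and designers.",
    None,
    "A global company with a growing team.",
]

result = top_words(profiles)
print(result)
```

With the real data, the list of texts would come from the company profile column after loading the file.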

Choose three classification algorithms of your own choice and build a Machine Learning model
with each. Compare the accuracy of all three and state which ML algorithm suits the given
problem best.
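
The comparison step might look like the following sketch. It assumes each model has already produced predictions on the same held-out test set; the model names and all label values are invented for illustration and are not results from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented test labels (1 = fraudulent, 0 = real).
y_test = [0, 0, 1, 0, 1, 0, 0, 1]

# Invented predictions from three hypothetical models on the same test set.
predictions = {
    "logistic_regression": [0, 0, 1, 0, 0, 0, 0, 1],
    "decision_tree":       [0, 1, 1, 0, 1, 0, 1, 1],
    "naive_bayes":         [0, 0, 0, 0, 0, 0, 0, 1],
}

scores = {name: accuracy(y_test, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Best by accuracy:", best)
```

All three models must be scored on the same test split; otherwise the accuracies are not comparable.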

NOTE: For the given dataset "Fraudulent" will be your dependent variable.

Evaluation will be done on the following points:

1) Exploratory data analysis and Data Cleaning if required
2) At least 3 visualizations of data using Matplotlib or any other visualization library
3) Questions asked on the dataset and their answers, with a brief explanation
4) Feature Selection and Feature Engineering if required, depending on the dataset
5) Ensemble Machine Learning Modelling (3 Classification Algorithms; 5 would do too)
6) Metrics calculation (along with justification of why a particular metric was used)
7) Summarised write-up at the end
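
Point 6 asks for a justification of the chosen metric. If the fraudulent class turns out to be rare in the dataset (an assumption to verify during exploratory analysis), accuracy alone can look good for a model that never flags a fake job. The sketch below, with an invented function name and made-up labels, computes precision, recall and F1 alongside accuracy to show the difference.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Report 0.0 when a ratio is undefined (no positive predictions or labels).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": acc, "precision": precision, "recall": recall, "f1": f1}


# Made-up example: 10 postings, 2 fraudulent, and a model that always says "real".
y_true = [0] * 8 + [1] * 2
always_real = [0] * 10

metrics = classification_metrics(y_true, always_real)
print(metrics)
```

Here accuracy is 0.8 while recall is 0.0, which is the kind of observation that can support choosing recall or F1 over accuracy in the write-up.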

OPTIONAL REQUIREMENT: It will be appreciated if any one algorithm is built from scratch
instead of using a library.
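
As one example of what "built from scratch" could mean, here is a minimal k-nearest-neighbours classifier written with the standard library only. The choice of k-NN, the class name KNNClassifier, and the two-feature toy data are assumptions for illustration; any other classification algorithm would satisfy the optional requirement equally well.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal k-nearest-neighbours classifier using Euclidean distance."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # k-NN has no training phase; it only stores the data.
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def _predict_one(self, x):
        # Sort training points by distance to x and vote among the k closest.
        distances = sorted(
            ((math.dist(x, row), label) for row, label in zip(self.X, self.y)),
            key=lambda pair: pair[0],
        )
        nearest = [label for _, label in distances[: self.k]]
        return Counter(nearest).most_common(1)[0][0]

    def predict(self, X):
        return [self._predict_one(x) for x in X]


# Toy numeric features invented for the example (1 = fraudulent, 0 = real).
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y_train = [0, 0, 0, 1, 1, 1]

model = KNNClassifier(k=3).fit(X_train, y_train)
preds = model.predict([[0.05, 0.05], [1.0, 0.95]])
print(preds)
```

Text columns in the real dataset would first need to be converted to numeric features before a distance-based method like this can be applied.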

Please explain all your steps with clear details and comments. State which variables in the
dataset are your independent and dependent variables.

Prepare a PDF/Word Document at the end with a Summary of this project and submit it.
Mail subject: Capstone Project August Machine Learning
