Aug Batch Project Details
Below are the details of the project for the August Machine Learning Batch.
Problem Statement: For a given set of input features, determine the most common answer (as per
the majority). Also, suggest which algorithm gives the maximum accuracy on the dataset you worked on.
Note: In the attached dataset, "Fraudulent" is the target feature. A description of the dataset can
be found here: https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction
Ask any 4 questions of your choice about the dataset and provide answers to them. For
instance, questions for the given dataset could be as follows.
Q1) What are the most common titles used in jobs in the US?
Q2) Which department has the largest number of fake jobs?
Q3) Which department or function has high-paying jobs in the UK?
Q4) What are the top 3 most commonly used words in Company Profile? (Excluding stopwords)
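A question like Q4 can be answered with a simple word count. The sketch below is one possible approach using only the Python standard library. The stopword list is a deliberately tiny, hypothetical one, the function name top_words is made up for illustration, and the sample profiles are invented rather than taken from the dataset.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "we", "our", "for", "with"}

def top_words(texts, n=3, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        if not text:  # skip missing profiles (None or empty string)
            continue
        # Lowercase the text and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Invented sample profiles, for illustration only.
sample_profiles = [
    "We are a global team and our team builds software.",
    "Software for the global market.",
    None,
    "A team of engineers.",
]
print(top_words(sample_profiles))
```

On the real data, the list of texts would come from the company profile column of the dataset, with missing values skipped as above.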
Pick three classification algorithms of your own choice and build a Machine
Learning model with each. Compare the accuracy of all three and suggest which ML algorithm best
suits the given problem.
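The accuracy comparison can be organised as a small harness that trains each model on the same split and scores it on the same held-out rows. This is only a sketch under assumptions: the helper names split_data, accuracy and compare_models are hypothetical, and MajorityBaseline is a stand-in reference model, not one of the three required algorithms. Any object with fit and predict methods (the interface scikit-learn classifiers also follow) can be passed in.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def split_data(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed and split into train and test parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit every model on the same training data and report test accuracy per model."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy(y_test, model.predict(X_test))
    return scores

class MajorityBaseline:
    """Reference model: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

if __name__ == "__main__":
    # Invented toy data: one numeric feature, label 1 marks the rare class.
    X = [[i] for i in range(8)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_tr, y_tr, X_te, y_te = split_data(X, y)
    print(compare_models({"majority baseline": MajorityBaseline()}, X_tr, y_tr, X_te, y_te))
```

Including a baseline alongside the three chosen algorithms gives a reference point: a model is only interesting if its accuracy clearly exceeds what always predicting the most frequent class achieves.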
NOTE: For the given dataset "Fraudulent" will be your dependent variable.
OPTIONAL REQUIREMENT: It will be appreciated if any one algorithm is built from scratch
instead of using a library.
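As one illustration of what "from scratch" could look like, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written with the standard library only. Naive Bayes is just an example choice, not a requirement of the brief. The class name NaiveBayesText and the whitespace tokenisation are assumptions made for brevity, and the training sentences are invented.

```python
import math
from collections import Counter

class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes_ = sorted(set(labels))
        # Per-class word frequencies, built from whitespace-separated lowercase tokens.
        self.word_counts_ = {c: Counter() for c in self.classes_}
        for doc, label in zip(docs, labels):
            self.word_counts_[label].update(doc.lower().split())
        self.vocab_ = set().union(*self.word_counts_.values())
        class_counts = Counter(labels)
        self.log_prior_ = {c: math.log(class_counts[c] / len(labels)) for c in self.classes_}
        self.total_words_ = {c: sum(self.word_counts_[c].values()) for c in self.classes_}
        return self

    def _log_score(self, doc, c):
        """Log prior plus the sum of smoothed log likelihoods of the document's words."""
        vocab_size = len(self.vocab_)
        score = self.log_prior_[c]
        for word in doc.lower().split():
            if word in self.vocab_:  # words never seen in training are ignored
                score += math.log(
                    (self.word_counts_[c][word] + 1) / (self.total_words_[c] + vocab_size)
                )
        return score

    def predict(self, docs):
        return [max(self.classes_, key=lambda c: self._log_score(d, c)) for d in docs]

# Invented toy data: label 1 marks a fraudulent posting, label 0 a genuine one.
train_docs = ["earn money fast", "earn cash now", "software engineer role", "engineer position open"]
train_labels = [1, 1, 0, 0]
model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict(["earn money", "engineer role"]))
```

Because the class exposes fit and predict, it can be evaluated in the same way as library-based models when comparing accuracy.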
Please explain all your steps in clear detail, with comments. Do mention which variables in the
dataset are your independent variables and which is your dependent variable.
Prepare a PDF/Word Document at the end with a Summary of this project and submit it.
Mail subject: Capstone Project August Machine Learning