Fake Job Entry Detection

The project focuses on detecting fake job postings using machine learning techniques to protect job seekers from online scams. It proposes an automated tool that compares various classifiers, revealing that ensemble classifiers, particularly Random Forest, significantly outperform single classifiers in accuracy. The study aims to enhance the reliability of job listings on online portals through effective detection methods.

Uploaded by

anirudh.v4444

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

PROJECT TITLE

FAKE JOB POST DETECTION USING MACHINE LEARNING
Presented by Under the guidance of
CONTENTS
 ABSTRACT
 INTRODUCTION
 LITERATURE SURVEY
 OBJECTIVES
 PROBLEM STATEMENT
 EXISTING SYSTEM
 DRAWBACKS
 PROPOSED SYSTEM
 ARCHITECTURE
 MODULES
 UML DIAGRAMS
 RESULTS
 CONCLUSION
 FEATURE ENHANCEMENT
 REFERENCES
ABSTRACT:
This project tackles the pervasive issue of fraudulent job postings on the internet. To help
job seekers avoid fake job posts, the project proposes an automated tool based on machine
learning classification techniques. Several classifiers are used to check for fraudulent posts
on the web, and their results are compared to identify the best employment scam detection
model across a vast number of listings. The study compares the performance of single
classifiers and ensemble classifiers in identifying fraudulent job posts. Experimental results
reveal that ensemble classifiers significantly outperform single classifiers in scam detection
accuracy. This research contributes to the development of an effective employment scam
detection model, safeguarding job seekers from online deception.
INTRODUCTION :

The rise of online job portals has transformed
the job search landscape, but it has also led
to an increase in fraudulent job postings.
These scams can result in financial losses,
identity theft, and emotional distress for
unsuspecting job seekers. With millions of job
postings online, manually identifying genuine
opportunities has become a daunting task.
This project aims to address this issue by
leveraging machine learning and natural
language processing techniques to detect
fake job posts.
LITERATURE SURVEY :
TITLE | YEAR | AUTHOR | PROS | CONS | DATASET | METRICS
A Survey of Machine Learning Based Approaches for blood Disease Prediction | 2021 | Shubham Bind | Used Realtime NIR Images | Less Accuracy | NIR Dataset | 75
Classification of blood's disease and its stages using machine learning | 2022 | John Michael Templeton | Active NIR patches are extracted | The training database consists of fewer images | NIR2D | 65
Analysis and Prediction of blood's Disease using Machine Learning Algorithms | 2022 | Sohom Sen | Used 30 NIR Images | The system was trained with fewer types of Disease | BRAINNIR | 84
Detection of blood Disease Using Machine Learning | 2020 | G. Priyadarshini, T. Gowtham | High Accuracy | Lowest accuracy with 40% | NIRDATA | 90
Predicting Severity Of blood's Disease Using Deep Learning | 2020 | Srishti Grover | Dataset belonging to Multiple category | Accuracy Low | Database with Disease by Name | 65
OBJECTIVES :
The primary objective of this project is to design and develop an automated tool using
machine learning based classification techniques to detect fraudulent job postings on
the internet. The specific objectives are:
 To develop an accurate machine learning based classification model that can
detect fraudulent job postings.
 To compare the performance of single classifiers (Naive Bayes, Support Vector
Machine, Logistic Regression) and ensemble classifiers in detecting fraudulent job
postings.
 To design and implement a system that can preprocess job posting data, classify
fraudulent postings and provide insights for job seekers.
 To evaluate the effectiveness of the proposed system in reducing the prevalence
of fraudulent job postings.
PROBLEM STATEMENT :

 The increasing volume of online job postings has made manual detection of fake posts
impractical. Automated solutions are needed to avoid:
 Financial losses
 Emotional distress
 Wasted time and resources
 Erosion of trust in online job portals
 An effective solution is required to combat this growing problem.
 This project aims to develop a machine learning based solution to address this
challenge.
EXISTING SYSTEM :

 The existing system makes its predictions using classifiers that have already been trained.

 When detecting fraudulent job postings, the following classifiers are used:

A. Naive Bayes

B. Support Vector Machine

C. Logistic Regression
DRAWBACKS :
 Fewer features are considered while training on the data, which results in low
accuracy.
 The Support Vector Machine model has lower accuracy compared
to the proposed system.
 It is also a complex and time-consuming process.
PROPOSED SYSTEM :
 The main purpose is to identify whether a job posting is genuine or not.
 Job seekers will be able to focus entirely on legitimate job openings if fake
job postings are identified and deleted.
 In this system, we plan to use a Kaggle dataset that contains information on
the job, including attributes such as job id, title, location and department.
 Next comes data preprocessing, which involves removing things like trivial
spaces, null entries, stopwords, and so on.
 Once the data has been preprocessed and cleaned to make it prediction ready,
it is provided to the classifier for predictions.
 The three-part method combines Machine Learning algorithms, which fall
under supervised learning techniques, with natural language processing
methods.
 Although each of these approaches can be used on its own to classify and detect
fake jobs, in order to increase the accuracy and be applicable to the social
media domain, they have been combined into an integrated algorithm as a
method for fake job post detection.
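The preprocessing step described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the dictionary keys (job_id, title, location, department) mirror the attributes named on the slide, while the tiny stopword list and the function names clean_text and preprocess are assumptions made for the example.

```python
import re

# Illustrative stopword list (an assumption; a real pipeline would use a fuller list).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "are", "on", "with"}

def clean_text(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

def preprocess(postings, text_fields=("title", "location", "department")):
    """Drop records whose text fields are all null or empty, then merge and clean the rest."""
    cleaned = []
    for post in postings:
        parts = [post.get(field) for field in text_fields]
        parts = [p for p in parts if p]  # discard None and empty strings
        if not parts:
            continue  # a null entry: nothing usable to classify
        cleaned.append({"job_id": post.get("job_id"),
                        "text": clean_text(" ".join(parts))})
    return cleaned

# Hypothetical sample records, not taken from the Kaggle dataset.
sample = [
    {"job_id": 1, "title": "  Data   Entry Clerk ", "location": "US, NY", "department": None},
    {"job_id": 2, "title": None, "location": None, "department": ""},
]
result = preprocess(sample)
print(result)
```

The second sample record is dropped because every text field is null or empty; the first is reduced to a single normalized text string that a classifier's feature extractor could consume.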
ARCHITECTURE :

[Architecture diagram: Data Collection -> Data Preprocessing -> Feature Extraction -> Data Split (Training - 80%, Testing - 20%) -> Classifier -> Data Prediction -> evaluation by Accuracy, Precision, Recall and F-Measure]
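The 80%/20% split and the four evaluation measures named in the diagram can be sketched as follows. This is a minimal standard-library illustration rather than the project's implementation; the function names, the fixed seed, and the toy labels (1 = fraudulent, 0 = genuine) are assumptions.

```python
import random

def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of the records and split them into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# 100 placeholder records split 80/20.
train, test = train_test_split(range(100))

# Toy labels: 1 = fraudulent posting, 0 = genuine posting.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
print(len(train), len(test), scores)
```

Because fraudulent postings are usually a small minority of a job dataset, precision, recall and F-measure on the fraudulent class tend to be more informative than accuracy alone.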
MODULES :
 Dataset : This Kaggle dataset contains 17,880 job posting data entries.
 Data Collection : We must first preprocess this data in order to prepare it for prediction
before fitting it into any of the machine learning models or classifiers. Some pre-processing
techniques are applied to this dataset before it is fitted to any classifier.
 Data Preparation : Before running the algorithm, the dataset is preprocessed to check for
missing values, noisy data, and other irregularities.
 Feature Extraction : By selecting and merging variables into features, feature extraction
aids in extracting the best features from those large data sets, effectively lowering the amount
of data.
 Implementation of Classifier : In the Random Forest model, a subset of data points and a
subset of features is selected for constructing each decision tree. Simply put, n random
records and m features are taken from the data set having k records.

• Individual decision trees are constructed for each sample.

• Each decision tree will generate an output.

• The final output is based on Majority Voting for classification and Averaging for
regression.
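The Random Forest steps listed above (sample n records and m features, build one tree per sample, combine the outputs by majority voting) can be sketched with one-level decision trees ("stumps") standing in for full decision trees. This is a simplified illustration under stated assumptions, not the project's classifier: the two numeric features and their values are hypothetical, and library implementations typically draw the feature subset at every split rather than once per tree.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) pair with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_trees=15, n_records=None, m_features=1, seed=0):
    """Train each tree on n records drawn with replacement and m random features."""
    rng = random.Random(seed)
    k = len(rows)
    n_records = n_records or k
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(k) for _ in range(n_records)]  # bootstrap sample
        feats = rng.sample(range(len(rows[0])), m_features)
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx], feats))
    return trees

def predict(trees, row):
    """Majority vote over the outputs of the individual trees."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical features per posting: [suspicious-word count, missing-field count].
rows = [[0, 1], [1, 0], [1, 2], [2, 1], [6, 5], [7, 6], [8, 7], [9, 5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = fraudulent, 0 = genuine
trees = fit_forest(rows, labels)
print(predict(trees, [0, 0]), predict(trees, [100, 100]))
```

For regression, the vote in predict would be replaced by the mean of the tree outputs.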
UML DIAGRAMS :
 USE CASE DIAGRAM : The main purpose of a use case diagram is to
show what system functions are performed for which actor. Roles of the
actors in the system can be depicted.
 CLASS DIAGRAM : A class diagram in the Unified Modeling
Language (UML) is a type of static structure diagram that describes the
structure of a system by showing the system's classes, their attributes,
operations (or methods), and the relationships among the classes. It
explains which class contains information.
 SEQUENCE DIAGRAM : A sequence diagram in Unified Modeling
Language (UML) is a kind of interaction diagram that shows how
processes operate with one another and in what order.
 ACTIVITY DIAGRAM : Activity diagrams are graphical
representations of workflows of stepwise activities and actions with
support for choice, iteration and concurrency. An activity diagram
shows the overall flow of control.
RESULTS :
CONCLUSION :
In conclusion, the "Fake Job Post Detection
Using Machine Learning" project addresses the
growing issue of deceptive job postings in the
digital landscape. The research demonstrates
the effectiveness of machine learning
algorithms in discerning genuine from
fraudulent advertisements, providing a basis
for a more secure and reliable job market.
A supervised approach is used to illustrate
the use of several classifiers for employment
scam detection. Experimental results indicate
that the Random Forest classifier outperforms
its peer classifiers. The proposed
approach achieved an accuracy of 98.37%, which is
much higher than the existing methods.
FEATURE ENHANCEMENT :

 Data Collection Enhancements
This Kaggle dataset contains 17,880 job posting data entries. We must first preprocess this
data in order to prepare it for prediction before fitting it into any of the machine learning
models or classifiers. Some pre-processing techniques are applied to this dataset before it is
fitted to any classifier.

 Feature Extraction Enhancements
Feature extraction is a step in the dimensionality reduction process, which divides and
reduces a large set of raw data into smaller groupings. The most important characteristic of
these enormous data sets is their large number of variables. To process these variables, a
large amount of computational power is required. So, by selecting and merging variables into
features, feature extraction aids in extracting the best features from those large data sets,
effectively lowering the amount of data.

 Machine Learning Enhancements
A random forest can tolerate missing values, depending on the implementation. It is a
high-performing technique, widely used in various industries for its efficiency. It can
handle binary, continuous, and categorical data. Overall, random forest is a fast, simple,
flexible, and robust model.

 Fake Job Post Detection Enhancements
Train-Test split: In a random forest, a separate validation split is not strictly required,
because each decision tree is trained on a bootstrap sample and, on average, roughly one
third of the records (about 37%) are not seen by that tree. These out-of-bag records can be
used to estimate its error.
Stability: Stability arises because the result is based on majority voting/averaging.
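The share of records that a single tree does not see can be checked numerically. Assuming each tree is trained on a bootstrap sample of k records drawn with replacement from the k available (the usual setup, though not stated explicitly on the slides), the probability that a given record is never drawn is (1 - 1/k)^k, which approaches 1/e, or about 36.8%, for large k. The sketch below computes this value and confirms it with a simulation for k = 17,880.

```python
import math
import random

def expected_oob_fraction(k):
    """Probability that a given record is never picked in k draws with replacement."""
    return (1 - 1 / k) ** k

def simulated_oob_fraction(k, seed=0):
    """Draw one bootstrap sample of size k and measure the share of unseen records."""
    rng = random.Random(seed)
    drawn = {rng.randrange(k) for _ in range(k)}
    return 1 - len(drawn) / k

k = 17880  # number of job postings in the dataset
expected = expected_oob_fraction(k)
simulated = simulated_oob_fraction(k)
print(f"expected: {expected:.4f}, simulated: {simulated:.4f}, 1/e: {math.exp(-1):.4f}")
```

These unseen ("out-of-bag") records are what allow a random forest to estimate its own generalization error without a dedicated hold-out set, although a held-out test set is still useful when comparing against other classifiers.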
REFERENCE :

 S. K. Goyal et al., “Detecting fake job posts using machine learning,” 2020. This paper
proposes a machine learning based approach to detect fake job posts using features from the job
description.
 A. K. Singh et al., “Fake job post detection using Text classification,” 2019. This paper
presents a text classification based approach to detect fake job posts using machine learning
algorithms.
 J. Howington, “Survey: More millennials than seniors victims of job scams,” 2015. [Online]. Available:
https://www.flexjobs.com/blog/post/survey-results-millennialsseniors-victims-job-scams
 Shivam Bansal, “[Real or Fake] Fake Job Posting Prediction,” Version 1, February 2020. Retrieved
March 29, 2020 from https://www.kaggle.com/shivamb/real-or-fake-fakejobposting-prediction
 N. Hussain, H. T. Mirza, G. Rasool, I. Hussain, and M. Kaleem, “Spam review detection
techniques: A systematic literature review.”
THANK YOU
