
Fake Job Prediction

The document presents a project titled 'Fake Job Prediction' aimed at identifying fraudulent job postings using machine learning and natural language processing techniques. It outlines the increasing prevalence of employment scams, the development of a predictive model to distinguish between legitimate and fake job postings, and the algorithms used for analysis. The project emphasizes the importance of protecting job seekers and improving job market security while suggesting future enhancements and collaborations.

Uploaded by

dummyboy353

FAKE JOB PREDICTION

GROUP NUMBER: 32

BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE & TECHNOLOGY

Submitted By
NAME ENROLLMENT NO. REGISTRATION NO.
ADITYA GHOSH 12020009022288 304202000901063
ANUBHAV SENAPATI 12020009022257 304202000901008
ANNWESHA MAHANTA 12020009022172 304202000900639
RAHUL DAS 12020009022215 304202000900682
SNEHA SARKAR 12020009027009 304202000900828
JOYEE SAHA 12020009022168 304202000900635

Under the guidance of


(Prof.) Dr. Sudipta Basu Pal & (Prof.) Dr. Piyali Chandra
Department of COMPUTER SCIENCE AND TECHNOLOGY

TABLE OF CONTENTS
1 ABSTRACT
2 INTRODUCTION
3 PROBLEM STATEMENT
4 SOLUTION
5 ALGORITHM AND FLOW CHART
6 ANALYSIS
7 RESULT AND OUTPUT
8 FUTURE WORK
9 CONCLUSION
10 REFERENCES
ABSTRACT
Employment scams are on the rise.
According to CNBC, the number of
employment scams doubled in 2018
compared to 2017. The current market
situation has led to high unemployment. This
project, “Fake Job Prediction”, is based on a
guided (supervised) model that predicts
whether a job posting is genuine or not. Based
on the opportunity offered and the identity of
the poster, ML can be used to check where a
job posting originated. Given the current state
of the job market and unemployment, it is
necessary to verify the authenticity of each
job posting.
INTRODUCTION
The prevalence of employment fraud is increasing due to the
current economic situation and the impact of the coronavirus,
which has led to high unemployment rates. This situation
creates an opportunity for scammers to take advantage of
vulnerable individuals. Many people are falling prey to these
scammers. The primary goal of these scammers is to extract
personal information, such as bank account details and
addresses, from their victims. Scammers often lure people
with lucrative work opportunities that seem profitable, only
to request payment later on. This poses a significant danger,
but it can be mitigated through the use of Machine Learning
techniques and Natural Language Processing (NLP) which can
differentiate between legitimate and fake job postings.
PROBLEM STATEMENT

With the rise of online job
portals and the increasing
number of remote job
opportunities, there has been a
corresponding increase in the
number of fake job postings that
are designed to scam job
seekers. These fake job postings
can be used to collect personal
information, steal money, or
carry out other fraudulent
activities.
SOLUTION
The goal of the fake job prediction problem is to
develop a model that can automatically distinguish
between legitimate job postings and fake job
postings. This requires analyzing a variety of
features, including the job description, company
information, and contact details. The model must be
able to accurately identify patterns and indicators of
fraud, such as unrealistic job requirements, vague
or misleading descriptions, or requests for personal
information.
The successful development of a fake job prediction
model has important implications for both job
seekers and job portals. It can help to protect job
seekers from falling victim to scams, and can also
help job portals to maintain the quality and
legitimacy of their job listings.
ALGORITHM

1. Natural Language Processing
2. Naïve Bayes Algorithm
3. SGD Classifier

Naïve Bayes and SGD Classifier are compared on accuracy and F1-scores and
a final model is chosen. These models are used on both the text and numeric
data separately and the final results are combined.
WHY THIS ALGORITHM?

Naïve Bayes Algorithm
Naïve Bayes is the baseline model. It is used because it can
compute the conditional probability of two events occurring
from the probabilities of each individual event, and encoding
those probabilities is extremely useful.

SGD Classifier
SGD Classifier is the comparative model. It is used because it
implements a plain stochastic gradient descent learning routine
that supports different loss functions and penalties for
classification. The classifier needs to incur a high penalty
when it classifies a posting incorrectly.
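
To make the Naïve Bayes idea concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier written from scratch. It is an illustration under stated assumptions, not the project's actual code: the toy postings, the labels, the function names `train_naive_bayes` and `predict`, and the use of add-one (Laplace) smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Estimate log priors and add-one smoothed log likelihoods per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + len(vocab)  # add-one smoothing denominator
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_like": {
                w: math.log((word_counts[label][w] + 1) / denom) for w in vocab
            },
        }
    return model

def predict(model, tokens):
    """Return the class with the highest posterior log score."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for w in tokens:
            # Words never seen in training are ignored.
            if w in params["log_like"]:
                score += params["log_like"][w]
        scores[label] = score
    return max(scores, key=scores.get)

# Toy, invented postings (already tokenized); not taken from the dataset.
docs = [
    ["earn", "money", "fast", "no", "experience"],
    ["send", "bank", "details", "earn", "money"],
    ["software", "engineer", "python", "experience"],
    ["marketing", "manager", "team", "experience"],
    ["python", "developer", "team"],
]
labels = ["fake", "fake", "real", "real", "real"]

model = train_naive_bayes(docs, labels)
print(predict(model, ["earn", "money", "bank"]))       # fake
print(predict(model, ["python", "engineer", "team"]))  # real
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.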
FLOW CHART

The following steps are taken for text processing:

Tokenization -> To Lower -> Stop word removal -> Lemmatization

Tokenization: The textual data is split into smaller units.
In this case the data is split into words.
To Lower: The split words are converted to lowercase.
Stop word removal: Stop words are words that do not
add much meaning to sentences, for example: the, a,
an, he, have. These words are removed.
Lemmatization: Inflected forms of a word are grouped
together so that they can be analyzed as a single item.
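
The four text-processing steps above can be sketched as a small pipeline. This is a minimal illustration, not the project's code: the stop-word set extends the examples given above with a few extra words, and the `LEMMAS` lookup table is a toy stand-in for a real lemmatizer (a library such as NLTK would normally be used). Both are assumptions made for this example.

```python
import re

# Stop words: the examples named above plus a few assumed extras.
STOP_WORDS = {"the", "a", "an", "he", "have", "we", "are", "for"}

# Toy lemma lookup standing in for a real lemmatizer (assumption).
LEMMAS = {
    "hiring": "hire",
    "engineers": "engineer",
    "paid": "pay",
    "opportunities": "opportunity",
}

def tokenize(text):
    """Split text into word tokens, dropping punctuation and digits."""
    return re.findall(r"[A-Za-z]+", text)

def preprocess(text):
    tokens = tokenize(text)                               # 1. tokenization
    tokens = [t.lower() for t in tokens]                  # 2. to lower
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 3. stop word removal
    return [LEMMAS.get(t, t) for t in tokens]             # 4. lemmatization

result = preprocess("We are hiring the Engineers for paid opportunities!")
print(result)  # ['hire', 'engineer', 'pay', 'opportunity']
```

Lowercasing comes before stop-word removal so that capitalized stop words such as "We" are also matched.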
ANALYSIS
DATA EXPLORATION
The data for this project is available at Kaggle -
https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction. The dataset consists of
17,880 observations and 18 features.
After initial assessment of the dataset, it could be seen
that since these job postings have been extracted from
several countries the postings were in different
languages. To simplify the process this project uses
data from US based locations that account for nearly
60% of the dataset. This was done to ensure all the
data is in English for easy interpretability. Also, the
location is split into state and city for further analysis.
The final dataset has 10593 observations and 20
features.
The dataset is highly imbalanced, with 9868 jobs (93%)
being real and only 725 (7%) being fraudulent. A count
plot of the two classes shows the disparity very clearly.
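
The class counts above explain why F1-score is reported alongside accuracy. The sketch below uses only the counts stated above (9868 real, 725 fraudulent) and evaluates a hypothetical classifier that labels every posting as real. The helper functions `accuracy` and `f1_score` are written for this example and are not part of the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

n_real, n_fake = 9868, 725          # counts stated in the text
y_true = [0] * n_real + [1] * n_fake
y_pred = [0] * len(y_true)          # hypothetical "always real" classifier

print(round(n_fake / len(y_true), 3))        # 0.068
print(round(accuracy(y_true, y_pred), 3))    # 0.932
print(f1_score(y_true, y_pred))              # 0.0
```

A model that never flags a fraudulent posting still reaches about 93% accuracy, while its F1-score on the fraudulent class is zero, so accuracy alone would hide a useless model.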
ANALYSIS CONTD.
EXPLORATORY ANALYSIS

The first step in visualizing the
dataset in this project is to
create a correlation matrix to
study the relationships
among the numeric features.

After the numeric features, the
textual features of this dataset are
explored. We start this
exploration with location.

The accompanying graph shows which states
produce the greatest number of
jobs. California, New York and
Texas have the highest number of
job postings.

A ratio is used to compute how many
fake jobs are available for every real
job, that is, the number of fake jobs
divided by the number of real jobs.

Only ratio values greater than or equal
to one are plotted.
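
The ratio computation can be sketched as follows. This assumes the ratio is the number of fake postings divided by the number of real postings, computed separately for each location; the grouping key, the function name `fake_to_real_ratio`, and the sample postings are hypothetical and chosen only for illustration.

```python
from collections import Counter

def fake_to_real_ratio(postings):
    """postings: iterable of (location, is_fraudulent) pairs.

    Returns fake/real ratio per location. Locations with no real
    postings are skipped to avoid division by zero (an assumption).
    """
    fake = Counter()
    real = Counter()
    for location, is_fraudulent in postings:
        if is_fraudulent:
            fake[location] += 1
        else:
            real[location] += 1
    return {loc: fake[loc] / real[loc] for loc in real}

# Invented sample data, not taken from the dataset.
postings = [
    ("TX, Houston", True),
    ("TX, Houston", True),
    ("TX, Houston", False),
    ("CA, San Francisco", False),
    ("CA, San Francisco", False),
    ("CA, San Francisco", True),
    ("NY, New York", False),
]

ratios = fake_to_real_ratio(postings)
# Keep only locations with at least one fake job per real job.
plotted = {loc: r for loc, r in ratios.items() if r >= 1}
print(plotted)  # {'TX, Houston': 2.0}
```

Filtering at a ratio of one keeps only the locations where fake postings are at least as common as real ones.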

A histogram of character counts
is used to visualize the
difference between real and
fake jobs. It shows that even
though the character count is
fairly similar for both real and
fake jobs, real jobs have a
higher frequency.
RESULT AND OUTPUT
The final model used for this analysis is SGD. This choice is based on its
metrics compared with those of the baseline model. The results for the
baseline model and SGD are presented in the table below:

MODEL                    ACCURACY    F1-SCORE
Naïve Bayes Algorithm    0.971       0.743
SGD Classifier           0.974       0.79

Based on these metrics, SGD performs slightly better
than the baseline model, so SGD is chosen as the
final model.
FUTURE WORK
The future scope of a project on identifying and preventing fake
job postings using machine learning can be vast, depending on
how the project is designed and implemented.
The project can be integrated with job portals to automatically scan
all job postings for signs of fraud. Machine learning
algorithms can be refined over time by continuously training them
on new data, improving their accuracy in identifying fake job
postings. The project can collaborate with law enforcement
agencies to identify and prosecute individuals or organizations
involved in posting fake job advertisements. The project can be
customized to different regions, languages, and cultures to improve
its effectiveness in identifying fake job postings specific to those
regions.
CONCLUSION

In conclusion, a project focused on identifying and
preventing fake job postings using machine
learning can be an effective solution for improving
job market security and protecting job seekers
from fraudulent activities. Additionally, natural
language processing techniques can be used to
analyze candidate resumes and identify any
inconsistencies between their skills and
qualifications and the job requirements stated in
the job posting. However, it is important to note
that such a project should be developed and
implemented with caution and trained on a
diverse range of data to avoid biases and ensure
fairness.
REFERENCE
THANK YOU