Sample IEEE Article Ready Format
Abstract—Many job advertisements on the internet, even on reputable job advertising sites, do not appear fake, yet after selection the so-called recruiters start asking for money and bank details. Many candidates fall into this trap and lose a lot of money as well as their current job. For this reason, the proponents developed a project that applies machine learning algorithms (SVM and Random Forest) to a fake job posting data set from Kaggle to identify whether a job advertisement posted on a site is real or fake. The accuracy of the SVM (96%) and of the Random Forest Classifier (97%) differs by only 1 percentage point, and both models are capable of predicting fake job postings with the given data set.

Keywords—machine learning, fake job posting, data set, SVM, random forest

I. INTRODUCTION

Fraudulent job postings are everywhere and exist for certain reasons, such as to evaluate the current talent pool, reinstate plagiarism, or scam those who are currently hunting for a job.

According to CNBC, the number of fraud cases doubled in 2018 compared to 2017. The current market situation has led to high unemployment, which has increased further due to the pandemic. Economic stress and the effects of the coronavirus have significantly reduced job availability and caused job loss for many people.

Such a situation gives scammers an opportunity. Many people fall victim to scammers who take advantage of the desperation caused by an unprecedented incident. Most scammers do this in order to obtain personal information from the target person. Personal information can include address, bank account details, social security number, and so on. Scammers offer users a very lucrative job opportunity and then charge them for it. Some may even require an investment from the job seeker with the promise of a job. This is a problem that machine learning and natural language processing (NLP) techniques can help address.

II. OBJECTIVES

This project creates a classifier that distinguishes real job postings from fake ones. Specifically, this project aims to:

1. Build a classification model using textual data characteristics and meta characteristics to predict which job description is fraudulent or real.
2. Identify key features (words, entities, phrases) of job descriptions that are fraudulent in nature.
3. Run a mock-up from closely related job descriptions.
4. Perform an exploratory data analysis of the data set to find interesting insights.

III. METHODOLOGY

The dataset used in the project was originally from Kaggle (2014). The dataset contains 17,014 real jobs and 866 fake jobs. A variety of measures were applied to the data, including synonymous adjectives and subsampling, to address class imbalances in the data set.

This project follows four phases, namely:

1. Data Collection
• The CSV file is imported into a data frame, which allows the data from the dataset to be collected/read by the system.
• Module installation.

2. Data Handling
• The data are then cleansed by identifying and correcting damaged or inaccurate records in the tabular data set. Data cleaning refers to identifying incomplete, incorrect or undefined (NaN), inaccurate, or irrelevant pieces of data and then replacing, changing, or deleting them.
• Data visualization and pre-processing were applied to the data set using graphs and tables.

3. Modeling
• Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. Its main objective is to find a hyperplane in N-dimensional space (where N is the number of features) that distinctly classifies the data points.
• Random forest is a supervised machine learning algorithm that combines N decision trees (DT), each trained on a different random subset of the data, and aggregates their predictions to produce a more reliable result.

4. Evaluation
• The final model uses all of the relevant posting data and provides an end result that determines whether the job posting is real or not.
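The modeling and evaluation phases above can be illustrated with a minimal sketch. It is not the proponents' actual code: the TF-IDF features, the choice of scikit-learn's `LinearSVC` as the SVM variant, the train/test split, and the toy postings are all assumptions made only for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy postings (assumption: NOT taken from the Kaggle data set).
roles = ["engineer", "accountant", "nurse", "teacher", "analyst",
         "designer", "electrician", "librarian", "chemist", "editor"]
real_posts = [f"{r} position with salary, benefits and an on-site interview" for r in roles]
fake_posts = [f"earn money fast as {r}, no interview, send bank details and a fee" for r in roles]

texts = real_posts + fake_posts
labels = [0] * len(real_posts) + [1] * len(fake_posts)  # 1 = fraudulent

# Stratified split keeps the real/fake ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Each model is a pipeline: TF-IDF text features followed by a classifier.
models = {
    "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "random_forest": make_pipeline(
        TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=0)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predicted),
        # Recall on the fraudulent class: share of fake postings that were caught.
        "fake_recall": recall_score(y_test, predicted, pos_label=1, zero_division=0),
    }

for name, scores in results.items():
    print(name, scores)
```

Recall on the fraudulent class is reported next to accuracy because the data set is heavily imbalanced: with 17,014 real and 866 fake postings, always predicting "real" would already give roughly 95% accuracy (17,014 / 17,880).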
IV. IMPLEMENTATION
The libraries and the fake job posting dataset (the CSV file from Kaggle) were imported as shown in Figures 2 and 3 below.
Figure 2. Import Library
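As a rough illustration of the import and data handling steps, the sketch below loads a CSV into a pandas data frame and replaces undefined (NaN) text fields. The inline CSV is a made-up stand-in for the Kaggle file, and the column names `title`, `description`, and `fraudulent` are assumed to match it; the proponents' actual code is in the figures and may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; with the real file one would
# pass its path to pd.read_csv instead of this in-memory text.
csv_text = """job_id,title,description,fraudulent
1,Data Analyst,Analyze sales data and build dashboards,0
2,Home Typist,,1
3,Nurse,Provide patient care in a clinic,0
4,Agent,Pay a registration fee to start earning,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Data handling: replace undefined (NaN) text fields with empty strings.
text_cols = ["title", "description"]
df[text_cols] = df[text_cols].fillna("")

# Combine the text fields into one column for later feature extraction.
df["text"] = (df["title"] + " " + df["description"]).str.strip()

# Check the class balance (0 = real, 1 = fraudulent).
class_counts = df["fraudulent"].value_counts().to_dict()
print(class_counts)
```

Checking the class counts right after loading makes the imbalance between real and fake postings visible before any model is trained.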
The proponents compared the results of the SVM and Random Forest Classifier models as shown in the figures below. The Random Forest Classifier yields a higher accuracy (97%) than the support vector machine (96%). The models differ by 1 percentage point in accuracy, yet both are effective in predicting fake job postings using the specified dataset.

Figure 21. SVM & Random Forest Implementation

Figure 22. Random Forest & SVM Tabular Results

ACKNOWLEDGMENT

The proponents would like to acknowledge the efforts of their instructor, Engr. Jodie Rey Fernandez, in educating them in this course with his knowledge and expertise in the field of Artificial Intelligence.

The proponents would also like to acknowledge the University of the Aegean, Laboratory of Information & Communication Systems Security, for creating the fake job posting data set, and the previous works of the people from GitHub.

REFERENCES

[1] Bureau of Labor Statistics, US Department of Labor. The Employment Situation - June 2020. Accessed 07/26/2020. https://www.bls.gov/news.release/pdf/empsit.pdf
[2] USC Career Center. Avoid Fraudulent Job Postings. Accessed 07/26/2020. https://careers.usc.edu/students/find-a-job/avoid-fraudulent-job-postings/
[3] Rajapakse, Thilina. Simple Transformers - Introducing the Easiest Way To Use BERT, RoBERTa, XLNet, and XLM. Accessed 07/26/2020. https://towardsdatascience.com/simple-transformers-introducing-the-easiest-bert-roberta-xlnet-and-xlm-library-58bf8c59b2a3
[4] D. (2020). dchen71/fake_job_classification. GitHub. https://github.com/dchen71/fake_job_classification?fbclid=IwAR06RGlQgrwI48NRXsqW55qyIzlx6xcj3LWDiwiVr4KhmBDKIwp4yblts8c
[5] A. (2021). Anshupriya2694/Fake-Job-Posting-Prediction. GitHub. https://github.com/Anshupriya2694/Fake-Job-Posting-Prediction
[6] A. (2020). anuragkumar/fake-job-posting-prediction. GitHub. https://github.com/anuragkumar/fake-job-posting-prediction
[7] S. (2020). saketh97/FakeJobPrediction. GitHub. https://github.com/saketh97/FakeJobPrediction
[8] E. (2020). estheryl/fake_job_posting. GitHub. https://github.com/estheryl/fake_job_posting
[9] [Real or Fake] Fake JobPosting Prediction. (2020, February 29). Kaggle. https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction?fbclid=IwAR004SFbIKxL89TQ73IVOELninMacqcOrCZ5N3boQtLGhKJYr2dzZOskKgw