1822 B.E Cse Batchno 220
by
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
MAY 2022
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Ranga
Avinash (Reg No: 38110449) who carried out the project entitled “A
COMPARATIVE STUDY ON FAKE JOB POST PREDICTION
USING DIFFERENT MACHINE LEARNING TECHNIQUES” under my
supervision from June 2021 to November 2021.
INTERNAL GUIDE
_____________________________________________________________________
DECLARATION
I, Ranga Avinash (Reg No: 38110449), hereby declare that the Project
Report entitled “A COMPARATIVE STUDY ON FAKE JOB POST
PREDICTION USING DIFFERENT MACHINE LEARNING
TECHNIQUES” done by me under the guidance of Dr. Prayla Shyry, M.E.,
Ph.D., is submitted in partial fulfillment of the requirements for the award of
the Bachelor of Engineering degree in 2018-2022.
DATE:
ACKNOWLEDGEMENT
I wish to express my thanks to all Teaching and Non-teaching staff members of the
Department of Computer Science and Engineering who were helpful in many
ways for the completion of the project.
ABSTRACT
TABLE OF CONTENTS
1 INTRODUCTION 8
1.1 OVERVIEW 8
1.3 OBJECTIVE 9
2 LITERATURE SURVEY 10
3 METHODOLOGY 15
3.6 DIAGRAMS 19
3.7 MODULES 28
4 SYSTEM STUDY 40
4.1 FEASIBILITY STUDY 40
5 CONCLUSION 48
5.1 CONCLUSION 48
REFERENCES 49
APPENDICES 51
A. SOURCE CODE 51
B. SCREENSHOTS 55
C. PLAGIARISM REPORT 74
D. JOURNAL PAPER 76
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
In modern times, developments in industry and technology have opened up a
huge range of new and diverse jobs for job seekers. Through advertisements for
these job offers, job seekers evaluate their options according to their time,
qualifications, experience, suitability and so on. The recruitment process is now
influenced by the power of the internet and social media. Since the successful
completion of a recruitment process depends on its advertisement, the impact of
social media on it is tremendous. Social media and advertisements in electronic
media have created ever newer opportunities to share job details. However, the
rapid growth of opportunities to share job posts has also increased the
percentage of fraudulent job postings, which causes harassment to job seekers.
As a result, people hesitate to show interest in new job postings in order to
preserve the security and consistency of their personal, academic and
professional information. Thus the true motive of valid job postings through
social and electronic media faces an extremely hard challenge in earning
people's trust. Technologies are around us to make our lives easy and
developed, not to create an insecure environment for professional life. If job
posts can be filtered properly by predicting false ones, this will be a great
advancement for recruiting new employees. Fake job posts make it difficult for
job seekers to find their preferred jobs and cause a huge waste of their time. An
automated system to predict false job posts opens a new window for tackling
these difficulties in the field of Human Resource Management.
1.2 MACHINE LEARNING
1.3 OBJECTIVE
In the era of modern technology and social communication, advertising new job
posts has become very common. The task of predicting fake job postings is
therefore a great concern for all. Like many other classification tasks, fake job
post prediction presents many challenges. The objective of this project is to
compare different machine learning techniques for detecting fraudulent job posts.
CHAPTER 2
LITERATURE SURVEY
Twitter spam has become a critical problem nowadays. Recent works focus on
applying machine learning techniques for Twitter spam detection, which make use
of the statistical features of tweets. In our labeled tweets data set, however, we
observe that the statistical properties of spam tweets vary over time, and thus, the
performance of existing machine learning-based classifiers decreases. This issue is
referred to as “Twitter Spam Drift”. In order to tackle this problem, we first carry
out a deep analysis on the statistical features of one million spam tweets and one
million non-spam tweets, and then propose a novel Lfun scheme. The proposed
scheme can discover “changed” spam tweets from unlabeled tweets and
incorporate them into the classifier's training process. A number of experiments are
performed to evaluate the proposed scheme. The results show that our proposed
Lfun scheme can significantly improve the spam detection accuracy in real-world
scenarios.
Information quality in social media is an increasingly important issue, but web-
scale data hinders experts' ability to assess and correct much of the inaccurate
content, or "fake news," present in these platforms. This paper develops a method
for automating fake news detection on Twitter by learning to predict accuracy
assessments in two credibility-focused Twitter datasets: CREDBANK, a
crowdsourced dataset of accuracy assessments for events in Twitter, and PHEME,
a dataset of potential rumors in Twitter and journalistic assessments of their
accuracies. We apply this method to Twitter content sourced from BuzzFeed's fake
news dataset and show models trained against crowdsourced workers outperform
models based on journalists' assessment and models trained on a pooled dataset of
both crowdsourced workers and journalists. All three datasets, aligned into a
uniform format, are also publicly available. A feature analysis then identifies
features that are most predictive for crowdsourced and journalistic accuracy
assessments, results of which are consistent with prior work. We close with a
discussion contrasting accuracy and credibility and why models of non-experts
outperform models of journalists for fake news detection in Twitter.
The popularity of Twitter attracts more and more spammers. Spammers send
unwanted tweets to Twitter users to promote websites or services, which are
harmful to normal users. In order to stop spammers, researchers have proposed a
number of mechanisms. The focus of recent works is on the application of machine
learning techniques into Twitter spam detection. However, tweets are retrieved in a
streaming way, and Twitter provides the Streaming API for developers and
researchers to access public tweets in real time. However, a performance
evaluation of existing machine learning-based streaming spam detection methods has been lacking.
In this paper, we bridged the gap by carrying out a performance evaluation, which
was from three different aspects of data, feature, and model. A big ground-truth of
over 600 million public tweets was created by using a commercial URL-based
security tool. For real-time spam detection, we further extracted 12 lightweight
features for tweet representation. Spam detection was then transformed to a binary
classification problem in the feature space and can be solved by conventional
machine learning algorithms. We evaluated the impact of different factors on the
spam detection performance, including the spam-to-nonspam ratio, feature
discretization, training data size, data sampling, time-related data, and machine
learning algorithms. The results show the streaming spam tweet detection is still a
big challenge and a robust detection technique should take into account the three
aspects of data, feature, and model.
In this paper, we view the task of identifying spammers in social networks from a
mixture modeling perspective, based on which we devise a principled unsupervised
approach to detect spammers. In our approach, we first represent each user of the
social network with a feature vector that reflects its behaviour and interactions with
other participants. Next, based on the estimated users' feature vectors, we propose a
statistical framework that uses the Dirichlet distribution in order to identify
spammers. The proposed approach is able to automatically discriminate between
spammers and legitimate users, while existing unsupervised approaches require
human intervention in order to set informal threshold parameters to detect
spammers. Furthermore, our approach is general in the sense that it can be applied
to different online social sites. To demonstrate the suitability of the proposed
method, we conducted experiments on real data extracted from Instagram and
Twitter.
Law Enforcement Agencies cover a crucial role in the analysis of open data and
need effective techniques to filter troublesome information. In a real scenario, Law
Enforcement Agencies analyze Social Networks, i.e. Twitter, monitoring events
and profiling accounts. Unfortunately, among the huge number of internet users
there are people who use microblogs to harass other people or spread
malicious content. User classification and spammer identification are useful
techniques for relieving Twitter traffic of uninformative content. This work
proposes a framework that exploits a non-uniform feature sampling inside a gray
box Machine Learning System, using a variant of the Random Forests Algorithm
to identify spammers inside Twitter traffic. Experiments are performed on a
popular Twitter dataset and on a newly provided dataset of Twitter users, which is
made up of users labeled as spammers or legitimate users, described by 54
features. Experimental results demonstrate the effectiveness of the enriched
feature-sampling method.
CHAPTER 3
METHODOLOGY
• Because of privacy issues, the Facebook dataset is very limited and many
details are not made public.
• Existing approaches have lower accuracy.
• They are more complex.
• Social networking sites make our social lives better, but nevertheless there
are many issues with using them.
• These issues include privacy, online bullying, potential for misuse, trolling,
etc. These are mostly carried out using fake job posts.
• In this project, we came up with a framework through which we can detect
fake job posts using machine learning algorithms, so that people's social
lives become more secure.
Random forest is a supervised learning algorithm that can be used for both
classification and regression problems in ML. It is based on the concept of
ensemble learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.
A random forest algorithm consists of many decision trees. The ‘forest’ generated
by the random forest algorithm is trained through bagging or bootstrap
aggregating. Bagging is an ensemble meta-algorithm that improves the accuracy of
machine learning algorithms.
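The bagging step described above can be sketched with scikit-learn's BaggingClassifier, whose default base learner is a decision tree. The dataset below is synthetic and for illustration only, not the project's job-post data:

```python
# A minimal sketch of bagging (bootstrap aggregating) on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 50 decision trees, each trained on a bootstrap sample of the training rows;
# their majority vote gives the final prediction.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=42)
bag.fit(X_train, y_train)
acc = bag.score(X_test, y_test)
print(round(acc, 2))
```

Because each tree sees a different bootstrap sample, averaging their votes reduces the variance of a single decision tree.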
The below diagram explains the working of the Random Forest algorithm:
ALGORITHM USED
The random forest algorithm begins by randomly selecting “k” features out
of the total “m” features. In the image, you can observe that features and
observations are taken at random.
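The “k out of m features” idea corresponds to the max_features parameter of scikit-learn's RandomForestClassifier; with max_features="sqrt", each split considers only sqrt(m) randomly chosen features (k = 4 of m = 16 below). The data and parameter values here are illustrative:

```python
# Sketch of per-split random feature selection in a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=16, random_state=0)

rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0)
rf.fit(X, y)
print(rf.n_features_in_)  # 16
```

Restricting each split to a random feature subset decorrelates the trees, which is what lets the forest outperform any single tree.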
3.4 HARDWARE REQUIREMENTS
RAM: 2 GB
3.6 DIAGRAMS
A data flow diagram (DFD) depicts information flow and the transformations
that are applied as data moves from input to output. The DFD is also known as
a bubble chart. A DFD may be used to represent a system at any level of
abstraction, and may be partitioned into levels that represent increasing
information flow and functional detail.
FIG 3.1
UML DIAGRAMS
GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modeling language so
that they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core
concepts.
3. Be independent of particular programming languages and development
processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations,
frameworks, patterns and components.
7. Integrate best practices.
Fig.3.2
ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities
and actions with support for choice, iteration and concurrency. In the Unified
Modeling Language, activity diagrams can be used to describe the business and
operational step-by-step workflows of components in a system. An activity
diagram shows the overall flow of control.
Fig 3.3
SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is
a construct of a Message Sequence Chart. Sequence diagrams are sometimes called
event diagrams, event scenarios, and timing diagrams.
FIG 3.4
CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a system
by showing the system's classes, their attributes, operations (or methods), and the
relationships among the classes. It explains which class contains information.
FIG 3.5
3.7 MODULES
➢ Data Collection
➢ Pre-Processing
➢ Train and Test
➢ Machine Learning Technique
➢ Detection of Fake Job post
MODULE DESCRIPTION
3.7.2 Pre-Processing
We convert the data into scalar format and then create new features, which are
passed to the algorithm; the features are saved in X and the labels in y.
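The scaling step can be sketched with MinMaxScaler, the scaler imported in the appendix source code; the numbers below are placeholders standing in for encoded job-post fields:

```python
# Minimal sketch of rescaling features into [0, 1] with MinMaxScaler.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

raw = np.array([[1.0, 200.0],
                [2.0, 400.0],
                [3.0, 600.0]])

scaler = MinMaxScaler()          # rescales each column into [0, 1]
X = scaler.fit_transform(raw)    # features saved in X
print(X.shape)
```

Scaling keeps no single feature (e.g. a large salary field) from dominating distance-based learners such as KNN.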
We then split the dataset into a training set and a test set. We use 70% of our
data for training and the remaining 30% for testing. To do this, we create a split
parameter that divides the data frame in a 70:30 ratio.
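The 70/30 split described above can be done with scikit-learn's train_test_split; the column names below are placeholders, not necessarily the project's exact schema:

```python
# Sketch of the 70/30 train/test split on a placeholder data frame.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"f1": range(100),
                   "f2": range(100, 200),
                   "fraudulent": [0, 1] * 50})

X = df.drop(columns="fraudulent")   # features saved in X
y = df["fraudulent"]                # labels saved in y

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)
print(len(X_train), len(X_test))  # 70 30
```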
3.7.4 Machine Learning Technique
After splitting the dataset into training and test sets, we instantiate a Random
Forest classifier and a KNN classifier and fit them to the training data using the
‘fit’ function. The fitted classifier is then stored as the model.
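A minimal sketch of this step, fitting both classifiers and persisting one with pickle. Synthetic data stands in for the preprocessed training set; the file name matches the one loaded by the Flask app in the appendix:

```python
# Fit a Random Forest and a KNN classifier, then pickle the model.
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X_train, y_train = make_classification(n_samples=200, n_features=10,
                                       random_state=1)

rf = RandomForestClassifier(random_state=1).fit(X_train, y_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

with open('random.pickle', 'wb') as f:
    pickle.dump(rf, f)            # stored as the model

with open('random.pickle', 'rb') as f:
    model = pickle.load(f)        # reloaded, as the Flask app does
print(round(model.score(X_train, y_train), 2))
```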
Describing the overall features of the software is concerned with defining the
requirements and establishing the high-level design of the system. During
architectural design, the various web pages and their interconnections are
identified and designed. The major software components are identified and
decomposed into processing modules and conceptual data structures, and the
interconnections among the modules are identified. The following modules are
identified in the proposed system.
FIG 3.6
3.8.1 Problem Statement:
In the era of modern technology and social communication, advertising new job
posts has become very common. The task of predicting fake job postings is
therefore a great concern for all. Like many other classification tasks, fake job
post prediction presents many challenges.
INPUT DESIGN
The input design is the link between the information system and the user. It
comprises developing specifications and procedures for data preparation, the
steps necessary to put transaction data into a usable form for processing. This
can be achieved by having the computer read data from a written or printed
document, or by having people key the data directly into the system. The design
of input focuses on controlling the amount of input required, controlling errors,
avoiding delay, avoiding extra steps and keeping the process simple. The input
is designed in such a way that it provides security and ease of use while
retaining privacy. Input design considered the following things:
OBJECTIVES
2. It is achieved by creating user-friendly screens for data entry that can handle
large volumes of data. The goal of designing input is to make data entry easier
and free from errors. The data entry screen is designed in such a way that all
data manipulations can be performed. It also provides record-viewing facilities.
3. When data is entered, it is checked for validity. Data can be entered with the
help of screens, and appropriate messages are provided as needed so that the
user is never left in confusion. Thus the objective of input design is to create an
input layout that is easy to follow.
OUTPUT DESIGN
A quality output is one which meets the requirements of the end user and
presents the information clearly. In any system, the results of processing are
communicated to the users and to other systems through outputs. In output
design it is determined how the information is to be displayed for immediate
need, as well as the hard copy output. It is the most important and direct source
of information to the user. Efficient and intelligent output design improves the
system's relationship with the user and helps user decision-making.
1. Designing computer output should proceed in an organized, well thought out
manner; the right output must be developed while ensuring that each output
element is designed so that people will find the system easy to use effectively.
When analysts design computer output, they should identify the specific output
that is needed to meet the requirements.
The output form of an information system should accomplish one or more of the
following objectives.
CHAPTER 4
SYSTEM STUDY
ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will
have on the organization. The amount of funds that the company can pour into
the research and development of the system is limited. The expenditures must
be justified. Thus the developed system is well within the budget, and this was
achieved because most of the technologies used are freely available. Only the
customized products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high
demand on the available technical resources, as this would lead to high
demands being placed on the client. The developed system must have modest
requirements, as only minimal or no changes are required to implement this
system.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the
user. This includes the process of training the user to use the system efficiently.
The user must not feel threatened by the system, but must instead accept it as a
necessity. The level of acceptance by the users depends solely on the methods
that are employed to educate the user about the system and to make him
familiar with it. His level of confidence must be raised so that he is also able to
offer some constructive criticism, which is welcomed, as he is the final user of
the system.
SYSTEM TESTING
System testing verifies that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various
types of test; each test type addresses a specific testing requirement.
TYPES OF TESTS
UNIT TESTING
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is
the testing of individual software units of the application, and is done after the
completion of an individual unit, before integration. This is structural testing
that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at the component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path
of a business process performs accurately to the documented specifications and
contains clearly defined inputs and expected results.
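As an illustration, a unit test for the label-mapping logic used by the prediction routes in the appendix might look like the following; label_for is a hypothetical helper, written here only to make the test self-contained:

```python
# Hypothetical helper mirroring the label logic in the /predict route.
def label_for(pred):
    """Map a classifier output (1 = fake, 0 = legit) to the UI label."""
    return "Fake Job Post" if pred == 1 else "Legit Job Post"

def test_label_for():
    # One defined input and expected result per unique path.
    assert label_for(1) == "Fake Job Post"
    assert label_for(0) == "Legit Job Post"

test_label_for()
print("label_for unit tests passed")
```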
INTEGRATION TESTING
FUNCTIONAL TEST
Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. In addition, systematic coverage pertaining to
identified business process flows, data fields, predefined processes, and
successive processes must be considered for testing. Before functional testing is
complete, additional tests are identified and the effective value of current tests
is determined.
SYSTEM TEST
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An
example of system testing is the configuration oriented system integration test.
System testing is based on process descriptions and flows, emphasizing pre-driven
process links and integration points.
BLACK BOX TESTING
Black box testing treats the software as a “black box”: you cannot “see” into it.
The test provides inputs and responds to outputs without considering how the
software works.
Unit testing is usually conducted as part of a combined code and unit test
phase of the software lifecycle, although it is not uncommon for coding and unit
testing to be conducted as two distinct phases.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format
• No duplicate entries should be allowed
• All links should take the user to the correct page.
6.2 INTEGRATION TESTING
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
CHAPTER-5
CONCLUSION
5.1 CONCLUSION
Job scam detection has become a great concern all over the world at present. In
this paper, we have analyzed the impact of job scams, which can be a very
promising research area that poses many challenges in detecting fraudulent job
posts. We have experimented with the EMSCAD dataset, which contains real-life
fake job posts. We have experimented with both machine learning algorithms
(SVM, KNN, Naive Bayes, Random Forest and MLP) and a deep learning model
(Deep Neural Network). This work presents a comparative study on the
evaluation of traditional machine learning and deep learning based classifiers.
Among the traditional machine learning algorithms, the Random Forest classifier
achieved the highest classification accuracy, while the Deep Neural Network
achieved 99% accuracy (fold 9) and 97.7% classification accuracy on average.
REFERENCES
[1] C. Chen, S. Wen, J. Zhang, Y. Xiang, J. Oliver, A. Alelaiwi, and M. M.
Hassan, ‘‘Investigating the deceptive information in Twitter spam,’’ Future Gener.
Comput. Syst., vol. 72, pp. 319–326, Jul. 2017.
[2] I. David, O. S. Siordia, and D. Moctezuma, ‘‘Features combination for the
detection of malicious Twitter accounts,’’ in Proc. IEEE Int. Autumn Meeting
Power, Electron. Comput. (ROPEC), Nov. 2016, pp. 1–6.
[3] M. Babcock, R. A. V. Cox, and S. Kumar, ‘‘Diffusion of pro- and anti-false
information tweets: The black panther movie case,’’ Comput. Math. Org. Theory,
vol. 25, no. 1, pp. 72–84, Mar. 2019.
[4] S. Keretna, A. Hossny, and D. Creighton, ‘‘Recognising user identity in Twitter
social networks via text mining,’’ in Proc. IEEE Int. Conf. Syst., Man, Cybern.,
Oct. 2013, pp. 3079–3082.
[5] C. Meda, F. Bisio, P. Gastaldo, and R. Zunino, ‘‘A machine learning approach
for Twitter spammers detection,’’ in Proc. Int. Carnahan Conf. Secur. Technol.
(ICCST), Oct. 2014, pp. 1–6.
[6] W. Chen, C. K. Yeo, C. T. Lau, and B. S. Lee, ‘‘Real-time Twitter content
polluter detection based on direct features,’’ in Proc. 2nd Int. Conf. Inf. Sci. Secur.
(ICISS), Dec. 2015, pp. 1–4.
[7] H. Shen and X. Liu, ‘‘Detecting spammers on Twitter based on content and
social interaction,’’ in Proc. Int. Conf. Netw. Inf. Syst. Comput., pp. 413–417, Jan.
2015.
[8] G. Jain, M. Sharma, and B. Agarwal, ‘‘Spam detection in social media using
convolutional and long short term memory neural network,’’ Ann. Math. Artif.
Intell., vol. 85, no. 1, pp. 21–44, Jan. 2019.
[9] M. Washha, A. Qaroush, M. Mezghani, and F. Sedes, ‘‘A topic-based hidden
Markov model for real-time spam tweets filtering,’’ Procedia Comput. Sci., vol.
112, pp. 833–843, Jan. 2017.
[10] F. Pierri and S. Ceri, ‘‘False news on social media: A data-driven survey,’’
2019, arXiv:1902.07539. [Online]. Available: https://arxiv.org/abs/1902.07539
[11] S. Sadiq, Y. Yan, A. Taylor, M.-L. Shyu, S.-C. Chen, and D. Feaster,
‘‘AAFA: Associative affinity factor analysis for bot detection and stance
classification in Twitter,’’ in Proc. IEEE Int. Conf. Inf. Reuse Integr. (IRI), Aug.
2017, pp. 356–365.
[12] M. U. S. Khan, M. Ali, A. Abbas, S. U. Khan, and A. Y. Zomaya,
‘‘Segregating spammers and unsolicited bloggers from genuine experts on
Twitter,’’ IEEE Trans. Dependable Secure Comput., vol. 15, no. 4, pp. 551–560,
Jul./Aug. 2018.
APPENDICES
A. Source code:
import numpy as np
import pandas as pd
from flask import Flask, request, jsonify, render_template, redirect, flash, send_file
from sklearn.preprocessing import MinMaxScaler
from werkzeug.utils import secure_filename
import pickle

app = Flask(__name__)  # Initialize the Flask app

# Load the fitted model and the text pipeline (vectorizer + classifier)
model = pickle.load(open('random.pickle', 'rb'))
vecs = pickle.load(open('vectorizers.pickle', 'rb'))
classifiers = pickle.load(open('classifiers.pickle', 'rb'))


@app.route('/')
@app.route('/index')
def index():
    return render_template('index.html')


@app.route('/chart')
def chart():
    return render_template('chart.html')


@app.route('/performance')
def performance():
    return render_template('performance.html')


@app.route('/login')
def login():
    return render_template('login.html')


@app.route('/upload')
def upload():
    return render_template('upload.html')


@app.route('/preview', methods=["POST"])
def preview():
    if request.method == 'POST':
        dataset = request.files['datasetfile']
        df = pd.read_csv(dataset, encoding='unicode_escape')
        df.set_index('Id', inplace=True)
        return render_template("preview.html", df_view=df)


@app.route('/fake_prediction')
def fake_prediction():
    return render_template('fake_prediction.html')


@app.route('/predict', methods=['POST'])
def predict():
    # Numeric form fields become the feature vector for the pickled model
    features = [float(x) for x in request.form.values()]
    final_features = [np.array(features)]
    y_pred = model.predict(final_features)
    if y_pred[0] == 1:
        label = "Fake Job Post"
    else:
        label = "Legit Job Post"
    return render_template('fake_prediction.html', prediction_texts=label)


@app.route('/text_prediction')
def text_prediction():
    return render_template("text_prediction.html")


@app.route('/job')
def job():
    abc = request.args.get('news')
    input_data = [abc.rstrip()]
    # transforming the input text with the fitted vectorizer
    tfidf_test = vecs.transform(input_data)
    # predicting the input
    y_preds = classifiers.predict(tfidf_test)
    if y_preds[0] == 1:
        labels = "Fake Job Post"
    else:
        labels = "Legit Job Post"
    return render_template('text_prediction.html', prediction_text=labels)


if __name__ == "__main__":
    app.run(debug=True)
B. SCREENSHOTS
Fig 5.1
• It is the home page of our website.
Fig 5.2
• This is the static login page for the user. It was created as a static login so
that everyone can access it. The username and password are common for all users.
Fig 5.3
Fig 5.4
Fig 5.5
• This is the preview section of the dataset. It displays details of the dataset,
such as its parameters.
• In this preview section, all the sample data is displayed.
Fig 5.6
Fig 5.7
• At the end of the preview page we can train the data using the KNN algorithm.
Fig 5.8
• This is the main part of the project: we can predict whether a job post is legit
or fake by choosing different parameters of the job post, such as employment
type, required education, required experience and function.
Fig 5.9
Fig 5.10
• For the corresponding parameters of the job post, the result is a legit job post.
Fig 5.11
• For the corresponding parameters, the given job post is a fake post.
Fig 5.12
Fig 5.13
Fig 5.14
Fig 5.15
Fig 5.16
Fig 5.17
C. PLAGIARISM REPORT