Mini Project
On
Fake News Detection using different
Machine Learning models
Gayatri Vidya Parishad College of Engineering
(Autonomous)
Madhurawada, Visakhapatnam - 530 048
Under the esteemed guidance of
S. Kanthi Kiran
Associate Professor
Department of Information Technology
Many news sources contain false information and are therefore “fake news.”
Because the web holds many “fake news” articles and much fabricated, misleading
information, we would like to determine which texts are legitimate (real) and
which are illegitimate (fake). We treat this as a binary classification
problem and investigate the effectiveness of different Natural Language
Processing models that convert character-based texts into numeric
representations, such as TF-IDF, CountVectorizer and Word2Vec. We aim to find
out which model preserves the most contextual information about the texts in a
fake news data set, and how helpful and effective it is in detecting whether a
text is fake news or not.
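To make the idea of converting text into numeric representations concrete, the following is a minimal sketch of count vectors and TF-IDF weights in plain Python. It is not the project's actual pipeline: the function names and the two sample sentences are hypothetical, and the weighting is the textbook formula (term frequency times log(N / document frequency)), which differs from the smoothed variant that library implementations commonly use.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip punctuation.
    return text.lower().split()


def count_vectors(docs):
    # Build a sorted vocabulary, then count each vocabulary term per document.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    # Textbook TF-IDF: tf = count / document length, idf = log(N / document frequency).
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([
            (c / total) * math.log(n_docs / doc_freq[j]) if total else 0.0
            for j, c in enumerate(row)
        ])
    return vocab, rows


# Hypothetical sample texts, for illustration only.
docs = ["the moon is made of cheese", "the moon orbits the earth"]
vocab, counts = count_vectors(docs)
_, weights = tfidf_vectors(docs)
```

In this sketch a word that occurs in every document (such as "the") receives a TF-IDF weight of zero, while a word unique to one document keeps a positive weight, which is why TF-IDF tends to emphasise distinctive terms more than raw counts do.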
TECHNOLOGIES USED:
• CountVectorizer
• TF-IDF
• Word2Vec
• ANNs
• LSTMs
• Logistic Regression
• Support Vector Machine
• Random Forest Classifier
PLATFORMS USED:
• PyCharm
• Python 3.6
CONVOLUTIONAL NEURAL NETWORK (CNN):
A convolutional neural network (CNN) is a neural network that has one or more
convolutional layers and is used mainly for image processing, classification and
segmentation. Each convolutional layer contains a series of filters known as
convolutional kernels. A filter is a matrix of numbers (weights) that is applied to a
subset of the input pixel values of the same size as the kernel. Each pixel is multiplied
by the corresponding value in the kernel, and the products are summed to give a single
value, which represents one grid cell, like a pixel, in the output channel/feature map.
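The multiply-and-sum step described above can be sketched in a few lines of plain Python. This is an illustrative example only: the function name, the 3×4 image and the 2×2 kernel are made up, there is no padding or stride, and, as is usual in deep learning frameworks, the kernel is applied without flipping (strictly a cross-correlation).

```python
def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely inside the image.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply each pixel in the window by the matching kernel value and sum.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Hypothetical input values, for illustration only.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_valid(image, kernel)
# feature_map is [[-4, -4, 2], [-4, -4, 4]]
```

Each entry of `feature_map` is one grid cell of the output channel; a 3×4 input with a 2×2 kernel yields a 2×3 feature map.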
Unsupervised pre-training to encode our texts into numeric representations
[Figure panels (1), (2) and (3)]
RECURRENT NEURAL NETWORK (RNN):
A recurrent neural network is a generalization of a feed-forward neural network that
has an internal memory.
An RNN is recurrent in nature because it performs the same task for every input, with
each output depending on the previous computations. Once an output is produced, it is
copied and sent back into the recurrent network.
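The recurrence described above can be illustrated with a minimal sketch of a plain (Elman-style) RNN step in Python. The weights, sizes and function names below are hypothetical and chosen only to show how the previous state is fed back in; this is not the project's network.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, bias):
    # New state mixes the current input with the previous state:
    # h = tanh(W_xh x + W_hh h_prev + b)
    h_new = []
    for i in range(len(bias)):
        total = bias[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(w_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, w_xh, w_hh, bias):
    # The same step (same weights) is applied to every input in the sequence.
    h = [0.0] * len(bias)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, bias)
        states.append(h)
    return states


# Hypothetical weights: 1 input feature, 2 hidden units.
w_xh = [[0.5], [-0.5]]
w_hh = [[0.1, 0.0], [0.0, 0.1]]
bias = [0.0, 0.0]
states = run_rnn([[1.0], [0.0], [0.0]], w_xh, w_hh, bias)
```

Only the first input is non-zero here, yet the later states are also non-zero: the information from the first step is carried forward through the hidden state, which is the "internal memory" of the network.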
CONNECTIONIST TEMPORAL CLASSIFICATION (CTC):
If you want a computer to recognize text, neural networks (NN) are a good choice, as
they currently outperform other approaches on this task. The NN for such use cases
usually consists of convolutional layers (CNN) to extract a sequence of features and
recurrent layers (RNN) to propagate information through this sequence. It outputs
character scores for each sequence element, which are represented by a matrix.
There are two things we want to do with this matrix:
• Train: calculate the loss value to train the NN.
• Infer: decode the matrix to get the text contained in the input image.
Both tasks are achieved by the CTC operation.
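As an illustration of the inference task only, the sketch below shows best-path (greedy) CTC decoding in plain Python: take the highest-scoring label at each time step, collapse consecutive repeats, then drop the CTC blank. The character set, the score matrix and the choice of the last column as the blank label are assumptions made for this example; the loss computation used during training is not shown.

```python
def ctc_best_path(matrix, charset, blank_index):
    # 1. Pick the highest-scoring label at each time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in matrix]
    # 2. Collapse consecutive repeats and 3. drop blank labels.
    decoded = []
    previous = None
    for label in best:
        if label != previous and label != blank_index:
            decoded.append(charset[label])
        previous = label
    return "".join(decoded)


# Hypothetical scores: rows are time steps, columns are "a", "b", blank.
charset = "ab"
blank = len(charset)
matrix = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.6, 0.3, 0.1],  # a
    [0.1, 0.8, 0.1],  # b
]
text = ctc_best_path(matrix, charset, blank)
# text is "aab"
```

The blank label is what allows a genuinely doubled character to be recognized: without the blank between them, the repeated "a" labels would all collapse into a single "a".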
EXISTING SYSTEM:
Drawbacks:
• It doesn’t have noise reduction.
• The input image is fed into the CNN layers. These layers are
trained to extract relevant features from the image.
• While training the NN, the CTC is given the RNN output matrix and
the ground truth text and it computes the loss value.
Figure: Overview of the NN operations (green) and the data flow through the NN (pink).
REQUIREMENTS SPECIFICATION:
HARDWARE REQUIREMENTS:
• RAM: 4 GB or higher
• Disk space: 1 TB
• Processor: Intel i5 or higher
SOFTWARE REQUIREMENTS:
• Operating system: Windows
• Python 3
• Packages: TensorFlow, NumPy, OpenCV, Keras
• PyCharm
FUNCTIONAL REQUIREMENTS:
• The system should process the input given by the user only if it is
an image file.
• The system will show an error message to the user when the input
given is not in the required format.
• Availability: The system will retrieve the handwritten character regions only if
the image contains written characters.
• Recognition ability: The software is easy to use and recognizes the
characters in the image.
• Reliability: The software should work reliably for any type of character image.
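The first two requirements above (accept only image files, otherwise show an error) could be checked along the following lines. This is a hedged sketch, not the project's code: the function names and the error text are hypothetical, and only PNG and JPEG are recognized, by comparing the first bytes of the file with the standard file signatures of those two formats.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_image(header):
    # Compare the first bytes of the file with known image file signatures.
    return header.startswith(PNG_SIGNATURE) or header.startswith(JPEG_SIGNATURE)


def validate_upload(path):
    # Return None if the file can be processed, otherwise an error message
    # that the user interface can display (hypothetical wording).
    with open(path, "rb") as handle:
        header = handle.read(8)
    if looks_like_image(header):
        return None
    return "Error: the input is not in the required format (PNG or JPEG image)."
```

Checking the file contents rather than the file extension avoids accepting a renamed non-image file; a real system would likely support more formats.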
SYSTEM ARCHITECTURE/FLOW CHART:
Flow chart: Start → Real Image → Noise Removal → Classification of image → Stop
Process flow: Image Acquisition → Preprocessing → Segmentation → Classification and Recognition → Post processing
Image Acquisition:
• In image acquisition, the recognition system acquires a scanned image as its input
image.
Pre-processing:
• The input to the NN is a gray-value image of size 128×32.
• Usually, the images from the dataset do not have exactly this size, so we
resize each image (without distortion) until it has either a width of 128 or a height of 32.
• Then, we copy the image into a (white) target image of size 128×32.
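The pre-processing steps above can be sketched in plain Python, with the image held as a list of rows of gray values. Several details are assumptions made for illustration and are not stated in the slides: nearest-neighbour sampling for the resize, placing the resized image in the top-left corner of the target, and 255 as the white value. The actual project would more likely use OpenCV for this.

```python
def fit_size(width, height, target_w=128, target_h=32):
    # Use one scale factor for both axes so the aspect ratio is preserved.
    scale = min(target_w / width, target_h / height)
    new_w = max(1, min(target_w, round(width * scale)))
    new_h = max(1, min(target_h, round(height * scale)))
    return new_w, new_h


def preprocess(image, target_w=128, target_h=32, white=255):
    height, width = len(image), len(image[0])
    new_w, new_h = fit_size(width, height, target_w, target_h)
    # Start from an all-white target image of the required size.
    canvas = [[white] * target_w for _ in range(target_h)]
    # Copy the resized image into the top-left corner (nearest-neighbour sampling).
    for y in range(new_h):
        src_y = min(height - 1, y * height // new_h)
        for x in range(new_w):
            src_x = min(width - 1, x * width // new_w)
            canvas[y][x] = image[src_y][src_x]
    return canvas


# Hypothetical all-black input of width 200 and height 100.
image = [[0] * 200 for _ in range(100)]
result = preprocess(image)
```

For the 200×100 input, the height is the limiting dimension, so the image is scaled to 64×32 and the remaining 64 columns of the 128×32 target stay white.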
Segmentation:
Post-Processing:
Use case diagram:
• Actor: User
• Use cases: Upload Image, Cancel, Convert Image-Gray, Initialize Scale, Pre-Process Image, Gray Scale to Binary format, System Recognize, Normalization, Generate Output
• Some use cases are linked by <<include>> relationships.
Output Screens:
Conclusions:
• This model is built to analyze handwritten text and convert it into
machine-readable text.
References:
https://towardsdatascience.com/2326a3487cd5
https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742
https://arxiv.org/pdf/1507.05717.pdf
Status: