
Main Project Seminar

On
Fake News Detection using different
Machine Learning models
Gayatri Vidya Parishad College of Engineering
(Autonomous)
Madhurawada, Visakhapatnam - 530 048
Under the esteemed guidance of
S. Kanthi Kiran
Associate Professor
Department of Information Technology

Project Team Members

K. Anusha        17131A1251
B. Kamal         17131A1217
A. Amruth        17131A1210
K. Adithya Pavan 17131A1255
INTRODUCTION

Many news sources publish false information and are therefore "fake news." Because the web contains many fabricated and misleading articles, we would like to determine which texts are legitimate (real) and which are illegitimate (fake). We treat this as a binary classification problem and investigate the effectiveness of different Natural Language Processing models that convert character-based texts into numeric representations, namely TF-IDF, CountVectorizer, and Word2Vec. We aim to find out which model preserves the most contextual information about the texts in a fake news data set, and how helpful and effective each representation is in detecting whether a piece of text is fake news or not.
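As an illustration of this pipeline, the following is a minimal sketch (not the project's exact code) that vectorizes article texts with scikit-learn's TfidfVectorizer and CountVectorizer and trains a logistic regression classifier; the file name "news.csv" and the column names "text" and "label" are assumptions for the example.

```python
# Minimal sketch of the text-to-vector + classifier pipeline described above.
# Assumes a CSV file "news.csv" with columns "text" and "label" (0 = real, 1 = fake).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("news.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

for name, vectorizer in [("TF-IDF", TfidfVectorizer(max_features=50000)),
                         ("CountVectorizer", CountVectorizer(max_features=50000))]:
    # Fit the vectorizer on the training texts only, then transform both splits.
    X_tr = vectorizer.fit_transform(X_train)
    X_te = vectorizer.transform(X_test)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_tr, y_train)
    print(name, "accuracy:", accuracy_score(y_test, clf.predict(X_te)))
```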
TECHNOLOGIES USED:
• CountVectorizer
• TF-IDF
• Word2Vec
• ANNs
• LSTMs
• Logistic Regression
• Support Vector Machine
• Random Forest Classifier

PLATFORMS USED:
• PyCharm
• Python 3.6
CONVOLUTIONAL NEURAL NETWORK (CNN):

A Convolutional Neural Network (CNN) is a neural network that has one or more convolutional layers and is used mainly for image processing, classification, and segmentation. Each convolutional layer contains a series of filters known as convolutional kernels. A filter is a small matrix of weights that is applied to a subset of the input pixel values of the same size as the kernel. Each pixel is multiplied by the corresponding value in the kernel, and the results are summed to produce a single value representing one grid cell (like a pixel) in the output channel/feature map.
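The following is a small NumPy sketch of the multiply-and-sum operation described above; the 5×5 toy image and the edge-detecting kernel are arbitrary values chosen only for illustration.

```python
# Illustration of the convolution step described above: a 3x3 kernel slides over a
# 5x5 single-channel "image", and each output cell is the sum of the element-wise
# products of the kernel and the image patch it currently covers.
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)       # simple edge-detecting filter

out_h, out_w = image.shape[0] - 2, image.shape[1] - 2
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)  # one cell of the output feature map

print(feature_map)
```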
Unsupervised Pre-training to encode our texts into numeric representations
[Figure panels (1)-(3) from the slide]
RECURRENT NEURAL NETWORK (RNN):

A Recurrent Neural Network is a generalization of a feed-forward neural network that has an internal memory. An RNN is recurrent in nature because it performs the same task for every element of the input sequence, with the output depending on the previous computations. Once an output is produced, it is copied and sent back into the recurrent network together with the next input.
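The sketch below illustrates this recurrence with a plain NumPy loop (weight shapes and the tanh activation are illustrative assumptions, not the project's exact cell): the hidden state produced at each step is fed back into the same computation together with the next input.

```python
# Minimal sketch of the recurrence described above: the internal memory (hidden
# state) from the previous step is reused at every new step.
import numpy as np

input_dim, hidden_dim, timesteps = 4, 8, 10
rng = np.random.default_rng(0)
W_x = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

inputs = rng.normal(size=(timesteps, input_dim))
h = np.zeros(hidden_dim)                         # internal memory, initially empty
for x_t in inputs:
    # The same task is performed for every input; the previous state h is fed back in.
    h = np.tanh(W_x @ x_t + W_h @ h + b)

print(h.shape)  # final hidden state summarizing the whole sequence
```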
CONNECTIONIST TEMPORAL CLASSIFICATION (CTC):

If you want a computer to recognize text, neural networks (NN) are a good choice, as they currently outperform all other approaches. The NN for such use cases usually consists of convolutional layers (CNN) to extract a sequence of features and recurrent layers (RNN) to propagate information through this sequence. It outputs character scores for each sequence element, which are represented by a matrix. There are two things we want to do with this matrix:
• Train: calculate the loss value to train the NN.
• Infer: decode the matrix to get the text contained in the input image.
Both tasks are handled by the CTC operation.
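The following is a hedged sketch of these two uses of the score matrix with TensorFlow's built-in CTC helpers; the batch size, the 32 time steps, the 80 character classes, and the label values are illustrative assumptions.

```python
# Sketch of the two uses of the character-score matrix described above.
import tensorflow as tf

batch, timesteps, num_classes = 2, 32, 80
y_pred = tf.nn.softmax(tf.random.normal((batch, timesteps, num_classes)))  # stand-in RNN output matrix

# --- training: compute the CTC loss against the ground-truth label sequences ---
labels = tf.constant([[5, 12, 7, 0, 0], [3, 3, 9, 1, 0]], dtype=tf.int32)  # padded label indices
label_length = tf.constant([[3], [4]], dtype=tf.int32)
input_length = tf.constant([[timesteps], [timesteps]], dtype=tf.int32)
loss = tf.keras.backend.ctc_batch_cost(labels, y_pred, input_length, label_length)
print("CTC loss per sample:", loss.numpy().ravel())

# --- inference: decode the matrix into the most likely character sequence ---
decoded, _ = tf.keras.backend.ctc_decode(y_pred,
                                         input_length=tf.fill([batch], timesteps),
                                         greedy=True)
print("decoded label indices:", decoded[0].numpy())
```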
EXISTING SYSTEM:

• Optical Character Recognition (OCR) is the existing system for character recognition.
• It is an electronic translation of images of hand-written, type-written or printed text into machine-editable text.

Drawbacks:
• It does not include noise reduction.
• Direct use of OCR remains a difficult problem, as it leads to low reading accuracy.
PROPOSED SYSTEM:

• We use a neural network (NN) for our task. It consists of convolutional NN (CNN) layers, recurrent NN (RNN) layers and a final Connectionist Temporal Classification (CTC) layer.
• The input image is fed into the CNN layers. These layers are trained to extract relevant features from the image.
• The RNN output sequence is mapped to a matrix of size 32×80.
• While training the NN, the CTC layer is given the RNN output matrix and the ground-truth text, and it computes the loss value. A minimal Keras sketch of this architecture is shown after the figure caption below.

Overview of the NN operations (green) and the data flow through the NN (pink).
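The sketch below is a minimal Keras version of the CNN → RNN → dense stack described above (the specific layer sizes are assumptions, not the exact project configuration): convolutional layers extract features from the 128×32 input image, a bidirectional LSTM propagates them along the sequence, and a dense layer produces the 32×80 character-score matrix that the CTC loss from the previous sketch consumes.

```python
# Assumed-architecture sketch: CNN feature extraction, RNN sequence modelling,
# and a dense layer producing the 32x80 score matrix for the CTC layer.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(128, 32, 1), name="image")            # gray-value image
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)                          # 64 x 16
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)                          # 32 x 8
x = layers.Reshape((32, 8 * 64))(x)                                   # sequence of 32 feature vectors
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
outputs = layers.Dense(80, activation="softmax")(x)                   # 32 x 80 character-score matrix

model = tf.keras.Model(inputs, outputs)
model.summary()  # the CTC loss from the earlier sketch is applied to `outputs` during training
```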
REQUIREMENTS SPECIFICATION:
HARDWARE REQUIREMENTS:
• RAM: 4 GB or higher
• Disk Space: 1 TB
• Processor: Intel i5 or higher

SOFTWARE REQUIREMENTS:
• Operating System: Windows
• Python 3
• Packages: TensorFlow, NumPy, OpenCV, Keras
• PyCharm
FUNCTIONAL REQUIREMENTS:

• The system should process the input given by the user only if it is an image file.
• The system will show an error message to the user when the input is not in the required format.
• The system should detect the characters present in the image.
• The system should retrieve the characters present in the image and display them to the user.
NON-FUNCTIONAL REQUIREMENTS:

• Performance: Handwritten characters in the input image will be recognized with high accuracy.
• Functionality: This software will deliver on the functional requirements mentioned in this document.
• Availability: The system will retrieve the handwritten character regions only if the image contains written characters.
• Recognition Ability: The software is very easy to use and recognizes the characters from the image.
• Reliability: This software will work reliably for any type of character image.
SYSTEM ARCHITECTURE/FLOW CHART:

Start → Real Image → Noise Removal → Classification of image → Extraction of text from image → Text contained in the image will be displayed → Stop

Process Flow: Image Acquisition → Preprocessing → Segmentation → Classification and Recognition → Post-processing
Image Acquisition:
• In image acquisition, the recognition system acquires a scanned image as the input image.
• The image should be in PNG format.

Pre-processing:
• The input is a gray-value image of size 128×32.
• Usually, the images from the dataset do not have exactly this size, so we resize each one (without distortion) until it either has a width of 128 or a height of 32.
• Then, we copy the resized image into a (white) target image of size 128×32.
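A minimal OpenCV/NumPy sketch of this pre-processing step is shown below; the file name "word.png" is a placeholder, not a file from the project.

```python
# Sketch of the pre-processing described above: resize without distortion so the
# image fits into 128x32, then paste it onto a white 128x32 target image.
import cv2
import numpy as np

def preprocess(path, target_w=128, target_h=32):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)           # gray-value input image
    h, w = img.shape
    scale = min(target_w / w, target_h / h)                # keep aspect ratio (no distortion)
    new_w, new_h = max(1, int(w * scale)), max(1, int(h * scale))
    img = cv2.resize(img, (new_w, new_h))

    target = np.full((target_h, target_w), 255, dtype=np.uint8)  # white target image
    target[:new_h, :new_w] = img                           # copy the resized image in
    return target

canvas = preprocess("word.png")   # "word.png" is a placeholder file name
print(canvas.shape)               # (32, 128)
```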
Segmentation:

• In this stage, an image of a sequence of characters is decomposed into sub-images of individual characters.
• The pre-processed input image is segmented into isolated characters by assigning a number to each character using a labelling process (a sketch of this step follows the list below).
• The labelling process provides information about the number of characters in the image.
• Each individual character is uniformly resized to a fixed pixel size.
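The following is a hedged sketch of such a labelling process using OpenCV's connected-component analysis (the binarization method and the 32×32 output size are assumptions): each character blob receives a label number, and the label count gives the number of characters in the image.

```python
# Sketch of character segmentation via connected-component labelling.
import cv2

def segment_characters(gray):
    # Binarize (characters become white on black) before labelling.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary)

    chars = []
    for label in range(1, num_labels):                 # label 0 is the background
        x, y, w, h, _ = stats[label]
        char_img = gray[y:y + h, x:x + w]              # sub-image of one character
        chars.append(cv2.resize(char_img, (32, 32)))   # uniform pixel size per character
    return chars

# Example usage with the preprocessed 128x32 canvas from the earlier sketch:
# characters = segment_characters(canvas); print(len(characters), "characters found")
```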


Classification and Recognition:

• The classification stage is the decision-making part of the recognition system.
• A feed-forward back-propagation neural network is used in this work for classifying and recognizing the handwritten characters.
• The total number of neurons in the output layer is 79, as the proposed system is designed to recognize English alphabets and digits.

Post-Processing:

• The post-processing stage is the final stage of the proposed recognition system.
• It prints the corresponding recognized characters in structured text form (a sketch of this step is shown below).
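The sketch below illustrates how the classification output could be turned into structured text: the network's 79-way scores for each segmented character are reduced to the most probable class, and the characters are joined into a string. The character ordering used here is a placeholder assumption, not the project's actual label mapping.

```python
# Sketch of classification + post-processing: pick the most probable of the 79
# output classes for each character image and join the results into text.
import string
import numpy as np

# Placeholder ordering for the 79 output neurons (digits, letters, punctuation).
CHARSET = (string.digits + string.ascii_uppercase + string.ascii_lowercase +
           string.punctuation)[:79]

def postprocess(char_probabilities):
    """char_probabilities: array of shape (num_characters, 79), one row per segment."""
    indices = np.argmax(char_probabilities, axis=1)        # classification decision
    return "".join(CHARSET[i] for i in indices)            # structured text output

# Example with random scores for 5 segmented characters:
print(postprocess(np.random.rand(5, 79)))
```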
UML DIAGRAM:

USE CASE DIAGRAM:

[Use case diagram elements: User — Upload Image, Cancel; Pre-Process Image <<include>> Convert Image to Gray, Initialize Scale, Gray Scale to Binary format, Normalization; System — Recognize, Generate Output]
Output Screens:
Conclusions:

• Handwritten Text Recognition is a complex problem that is not easily solvable; much depends on the dataset and database used.
• This model is built to analyze handwritten text and convert it into computer text.
• This application is applicable in many sectors, such as health care and the consumer sector.
• When used in health-care applications, this type of model can help capture people's handwritten notes and store each and every record digitally.
• Recognition of text depends on the writing style.
• Salt-and-pepper noise can throw off the results.

Future Scope:

• This work can further be extended to convert a handwritten paragraph in English into a structured format.
• It can also be extended to recognize other languages.

Status:

• Data Collection - 100%

• Model Building - 100%

• User Interface - 100%


Thank You
