
ISSN 2394-3777 (Print)

ISSN 2394-3785 (Online)


Available online at www.ijartet.com

International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)


Vol. 11, Issue 4, April 2024

Media Detection based on Natural Language Processing and Blockchain Approaches

1Mrs. C. Vasuki, 2Aishwarya S., 3Sowmiya R., 4Sugul G.
1Assistant Professor, 2,3,4Information Technology
Nandha Engineering College, Erode.

Abstract: The proposed solution for detecting false news involves a combination of Natural Language Processing (NLP) techniques, Reinforcement Learning (RL), and blockchain technology. The process begins with the collection of a comprehensive dataset of news articles and their associated metadata, followed by NLP-based pre-processing to clean and tokenize the text. Relevant features, such as word frequencies and readability, are then extracted and used to train an RL agent. The agent is trained to distinguish between true and false news using a reward and punishment system for learning. Once trained, the RL agent can classify new articles as true or false based on their extracted features. Blockchain technology is additionally employed to secure and authenticate the analysed data; its role is elaborated in the later sections. This approach is aimed at combating the dissemination of false information and misinformation in the digital news ecosystem.

Keywords: Natural Language Processing (NLP), Blockchain, Fake Media

I. INTRODUCTION

The identification of false information through unsupervised models presents an essential and innovative approach to combating the widespread dissemination of disinformation and misinformation in the modern digital age. With the proliferation of online platforms and social media, the propagation of inaccurate or deceptive content has become a pressing concern, posing risks to public discourse, democracy, and even public safety. Unsupervised models for detecting fake news rely on the inherent patterns and characteristics of textual data to differentiate between authentic news and fabricated material, without the need for pre-labeled training data. By utilizing techniques such as natural language processing, clustering, and anomaly detection, these models strive to automatically detect deceitful narratives and potentially harmful information, providing a scalable and proactive solution to the pervasive issue of fake news.

1.1 NATURAL LANGUAGE PROCESSING (NLP)

Natural language processing (NLP) is the field of computer science focused on making it possible for computers to comprehend text and spoken words in a manner similar to that of humans. NLP blends computational linguistics (rule-based modelling of human language) with statistical, machine learning, and deep learning models. With the help of these advancements, computers can now "understand" human language as text or audio data, including the speaker's or writer's intent and sentiment. Computer programs that translate text between languages, respond to spoken commands, and summarize enormous volumes of text quickly, even in real time, are all powered by NLP. NLP is used in voice-activated GPS systems, digital assistants, speech-to-text transcription programs, customer-care chatbots, and other consumer conveniences. NLP also contributes significantly to large-scale commercial strategies that improve critical business processes, promote employee productivity, and simplify operations.

1.2 BLOCKCHAIN

With the use of a blockchain, data can be stored in a form that makes system modification, hacking, and fraud difficult or impossible. In its simplest definition, a blockchain is a network of computers that copies and disseminates a digital record of transactions throughout the whole network. Each participant's ledger receives a copy of every new transaction that occurs on the blockchain, and each block in the chain consists of several transactions. The decentralized database that is controlled by several users is known as distributed ledger technology (DLT). A blockchain is a continually expanding database of unchangeable transactional records that have undergone cryptographic authentication and are shared by all network participants. Each record has a timestamp and references previous transactions, so anyone with access rights can use this information to trace back to any moment in a transactional event's past that belongs to any participant. A blockchain is one form of the more general concept of networked ledgers.
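To make the tamper-evident chaining described above concrete, the following minimal Python sketch (illustrative only, not part of the paper's implementation; the block fields and hashing scheme are assumptions) links each block to the hash of its predecessor:

```python
import hashlib, json, time

def block_hash(block):
    # Hash the block's canonical JSON serialization.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain, transactions):
    # Each new block embeds the hash of the previous block, so altering
    # any earlier record invalidates every later link.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"timestamp": time.time(),
                  "transactions": transactions,
                  "prev_hash": prev})

def verify_chain(chain):
    # Recompute every link; any tampering breaks the chain.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

ledger = []
append_block(ledger, ["article-123 registered"])
append_block(ledger, ["article-124 registered"])
print(verify_chain(ledger))   # True
ledger[0]["transactions"] = ["forged entry"]
print(verify_chain(ledger))   # False: the alteration is detected
```

Because every participant holds a copy of the ledger, a forged block would additionally have to be re-propagated to the rest of the network, which is what makes the structure resistant to manipulation.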
1.3 FAKE MEDIA

Fake news is information that is false or misleading yet is reported as news. The destruction of someone's or something's reputation, or the generation of advertising revenue, are frequent objectives of false news. Although false information has always been shared throughout history, the term "fake news" was first used in the 1890s, a time when sensational newspaper tales were common. The term, which has no precise definition, is frequently applied to any false information, and high-profile people have also used it to describe any negative news that pertains to them. Disinformation is the deliberate spread of misleading information, and it is commonly produced and spread by hostile foreign actors, especially during election seasons. Stories with sensationalist or clickbait headlines without any underlying substance are some examples of fake news, as are satirical articles that are misconstrued as the genuine thing. Due to the variety of false news sources, researchers are beginning to adopt the term "information disorder", since it is more objective and informative.

II. LITERATURE REVIEW

The surge in data traffic due to the rapid increase in communication technologies and smart devices has led to the generation of a massive amount of data every second by various applications, users, and devices. This has created a need for solutions that analyse the changes in data over time despite resource constraints; such changes are identified as concept drifts. In their paper, Ahmad Abbasi [1] et al. propose a novel approach called ElStream that uses ensemble and conventional machine learning techniques to detect concept drifts using both real and artificial data. ElStream utilizes the majority-voting technique so that only the optimum classifier votes for the decision. Experimental analysis shows that the ensemble learning approach provides consistent performance for both artificial and real-world datasets, with ElStream providing better accuracy than previous state-of-the-art studies and conventional machine learning algorithms. Big data has gained significant attention in the last decade due to its potential to provide invaluable insights and benefits such as cost reduction, faster decision-making, and innovation in new products across various industries. However, this data often arrives in the form of continuous streams, which poses a challenge for analysis; the complexity of big data renders the traditional approach to data analysis ineffective.

The issue of fake news [2] has become a significant problem in today's world, largely due to the widespread use of social media. To ensure the authenticity of information posted on social media, it is crucial to verify that it comes from reputable sources; however, assessing the integrity and sincerity of internet news remains a challenge. The cited study proposes an FNU-BiCNN model that uses NLTK features such as stop-word removal and stemming for data pre-processing, computes the TF-IDF, applies LSTM, batch normalization, and dense layers, and selects features using the WordNet lemmatizer. Bi-LSTM with ARIMA and CNN are used to train the datasets, and various machine learning techniques are employed to classify them. By deriving credibility ratings from textual data, the model develops an ensemble strategy for concurrently learning the representations of news stories, authors, and titles. To achieve greater accuracy, a voting ensemble classifier is used and compared with several machine learning algorithms such as SVM, DT, RF, KNN, and Naive Bayes; the voting ensemble classifier achieved the highest reported accuracy of 99.99%. Performance and efficacy are assessed using accuracy, recall, and F1-score.
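The hard majority-voting scheme that recurs in these studies can be sketched briefly with scikit-learn (an illustrative stand-in for the cited systems, not their actual code; the toy corpus is invented):

```python
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy labelled corpus: 1 = fake, 0 = real (placeholder data).
texts = ["shocking miracle cure discovered", "council approves road budget",
         "celebrity secretly an alien", "central bank holds interest rates"]
labels = [1, 0, 1, 0]

# Hard voting: each classifier casts one vote and the majority label wins.
ensemble = VotingClassifier(estimators=[
    ("svm", SVC()),
    ("dt", DecisionTreeClassifier()),
    ("rf", RandomForestClassifier()),
    ("knn", KNeighborsClassifier(n_neighbors=1)),
    ("nb", MultinomialNB()),
], voting="hard")

model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(texts, labels)
print(model.predict(["miracle cure found by celebrity"]))
```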
Chang Li [3] et al. have observed that online debates can provide valuable information on various perspectives. However, understanding the stances expressed in these debates is a difficult task that requires modelling both the textual content and the users' conversational interactions. Current approaches take a collective classification approach, disregarding the relationships between different debate topics. Their study treats the task as a representation learning problem and jointly embeds the text and authors based on their interactions. The model is evaluated on the Internet Argumentation Corpus, comparing different approaches for embedding structural information, and the experimental results demonstrate that it significantly outperforms previous competitive models. In recent years, social media platforms have played an increasingly important role in shaping political discourse. Online debate forums enable users to express their opinions and engage with others who hold different views. Understanding the interactions between users on these platforms can provide insights into current political discourse, argumentation strategies, and public sentiment on policy issues at a large scale.
Umar Mohammed Abatcha [4] et al. present the concept of document grouping, a significant task in the fields of data and software engineering. It involves accurately organizing documents into specific categories, which is considered a crucial method for sorting information. With the continuous advancement of personal computers and technology, the number of documents has been constantly increasing, so it is essential to arrange them based on their content. Text classification is commonly employed to categorize text into different classes, and it involves multiple stages that can be approached using various methods; the selection of the appropriate method for each category plays a vital role in enhancing the efficiency of text processing. Organizing documents into categories based on their content is a complex challenge that is central to the efforts of data experts and researchers. It plays a fundamental role in various applications, including designing, organizing, ordering, and efficiently managing large volumes of information, and is particularly important for publishers, news outlets, bloggers, and individuals dealing with extensive content repositories within an organization.

Aparna Kumari [5] et al. propose a novel technique for feature selection and demonstrate its application on a real data set. Specifically, the suggested approach generates subsets of attributes based on two criteria: (1) individual attributes exhibiting high discrimination (classification) power; and (2) the attributes within the subset complementing each other by misclassifying different classes. The method evaluates one attribute at a time, utilizing information from a confusion matrix. While achieving good classification accuracy is the primary objective in classification problems, identifying the attributes with the greatest separation power is also of interest. Moreover, in the case of large data sets, such as MRI images of the brain, feature selection greatly influences the classification process. This is primarily because, as the number of attributes increases, the data becomes more sparse, necessitating a significantly larger amount of training data to accurately represent such a vast domain. Consequently, high-dimensional data sets are typically underrepresented, a phenomenon commonly referred to in the literature as "the curse of dimensionality". For instance, a 2-attribute data set with 10 examples can adequately cover the domain defined by the corners (0,0) and (1,1).

III. EXISTING SYSTEM

Social media is heavily relied upon by users for news consumption and sharing, resulting in the widespread dissemination of both genuine and fake stories. The presence of misinformation across various social media platforms poses significant consequences for society. One major challenge in effectively detecting fake news on Twitter lies in the difficulty of distinguishing between different forms of false information. To address this issue, researchers have made progress by focusing on methods that can identify fake news. The existing approach utilizes the FNC-1 dataset, which consists of four categories for identifying false news. To evaluate and compare the state-of-the-art techniques for detecting fake news, big data technology (Spark) and machine learning are employed. The methodology involves the use of a decentralized Spark cluster to create a stacked ensemble model: after performing feature extraction using N-grams, hashing TF-IDF, and a count vectorizer, the proposed stacked ensemble classification model is applied.
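The feature-extraction stage described above could be expressed with Spark's ML pipeline API roughly as follows (a hedged sketch with invented column names and toy rows, not the original study's code):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import CountVectorizer, HashingTF, IDF, NGram, Tokenizer

spark = SparkSession.builder.appName("fnc1-features").getOrCreate()
df = spark.createDataFrame(
    [("police confirm the report", 0), ("miracle pill cures everything", 1)],
    ["text", "label"])  # placeholder rows standing in for FNC-1

tokenizer = Tokenizer(inputCol="text", outputCol="tokens")
bigrams = NGram(n=2, inputCol="tokens", outputCol="bigrams")
# Hashed term frequencies, re-weighted by inverse document frequency.
tf = HashingTF(inputCol="bigrams", outputCol="tf", numFeatures=1 << 18)
idf = IDF(inputCol="tf", outputCol="tfidf")
# Separate raw-count features over unigrams.
counts = CountVectorizer(inputCol="tokens", outputCol="counts")

pipeline = Pipeline(stages=[tokenizer, bigrams, tf, idf, counts])
features = pipeline.fit(df).transform(df)
features.select("tfidf", "counts", "label").show(truncate=False)
```

The resulting feature columns would then feed the stacked ensemble classifier distributed across the Spark cluster.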

IV. PROPOSED SYSTEM

A combination of Natural Language Processing, Reinforcement Learning, and blockchain technology is proposed for detecting false news. The system involves collecting a large dataset of news articles with metadata such as source, date, and author. The collected data would be pre-processed using NLP techniques to clean and tokenize the text. From the pre-processed data, features such as word frequencies, sentence length, and readability would be extracted. An RL agent would be trained on the extracted features to identify patterns that distinguish between true and false news. The agent would be rewarded for correctly identifying false news and penalized for incorrectly identifying true news as false. Once the agent is trained, it can be used to classify new news articles as true or false based on their extracted features.
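A minimal sketch of the feature-extraction step (illustrative only; the paper does not specify a readability measure, so a crude sentence-length/word-length proxy is assumed here):

```python
import re
from collections import Counter

def extract_features(article: str) -> dict:
    """Word frequencies, average sentence length, and a crude readability proxy."""
    sentences = [s for s in re.split(r"[.!?]+", article) if s.strip()]
    words = re.findall(r"[a-z']+", article.lower())
    avg_sentence_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(map(len, words)) / max(len(words), 1)
    # Assumed proxy: longer sentences and longer words read as harder text.
    readability = 0.5 * avg_sentence_len + 2.0 * avg_word_len
    return {"word_freq": Counter(words),
            "avg_sentence_len": avg_sentence_len,
            "readability": readability}

print(extract_features("Tim hit the ball. The crowd cheered wildly!"))
```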
V. MODULE DESCRIPTIONS

5.1 ORGANIZATION OF NEWS

One potential method for combating the dissemination of disinformation and fake news is the utilization of natural language processing and blockchain techniques to identify and detect fake media. A viable approach to this issue involves examining the organization of news articles, encompassing elements such as the headline, introduction, body, and conclusion. By scrutinizing the structure of news, it becomes feasible to uncover discernible patterns that could potentially signify the existence of fake media. Natural language processing, a subfield of artificial intelligence that concentrates on the interaction between computers and human language, can be employed to analyse the content of news articles and identify such patterns. For instance, NLP methodologies can be utilized to assess the language employed in news articles and identify irregularities that may suggest the presence of fake media.

5.2 DATA AUTHENTICATION

Data authentication techniques can further strengthen fake media detection based on natural language processing and blockchain approaches, since ensuring the legitimacy and integrity of the analysed information is crucial. One effective method to incorporate data authentication is the utilization of digital signatures, which authenticate the source of the news article. These digital signatures, created using cryptographic algorithms, serve to verify the authenticity of the information. By attaching the digital signature to the news article and storing it on a blockchain, the tamper-proof nature of the signature is ensured, allowing for easy verification (see the sketch after Figure 1). Additionally, machine learning algorithms can be employed to detect inconsistencies within the data; for instance, language inconsistencies between the headline and the body of a fake news article can be identified by training such algorithms and flagged as potentially fake media.

Figure 1. Block diagram. In the news collection and training phase, collected news passes through segmentation, cleaning, feature extraction, and indexing/word embedding into featured data storage. In the query phase, a news query from the data source passes through the same segmentation, cleaning, feature extraction, and indexing/word embedding steps into the trained machine learning model, which labels the article as real, fake, or suspicious.
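To make the digital-signature step of Section 5.2 concrete, here is a hedged Python sketch using the cryptography package (Ed25519 is an illustrative algorithm choice; the paper does not specify one):

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The publisher holds the private key; the public key is distributed openly.
publisher_key = Ed25519PrivateKey.generate()
public_key = publisher_key.public_key()

article = b"Council approves new road budget for 2024."
digest = hashlib.sha256(article).hexdigest()   # article hash stored on-chain
signature = publisher_key.sign(article)        # signature stored on-chain

# A verifier recomputes the hash and checks the signature.
assert hashlib.sha256(article).hexdigest() == digest
try:
    public_key.verify(signature, article)
    print("authentic: signed by the claimed publisher and unaltered")
except InvalidSignature:
    print("rejected: article was altered or not signed by this publisher")
```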

5.3 PROOF-OF-AUTHORITY (PoA)

A group of reliable validators is designated in a PoA framework to authenticate transactions on the blockchain. These validators are typically reputable organizations or individuals known for their honesty and integrity, and their responsibility is to validate the credibility of news articles and add them to the blockchain. PoA enables the creation of a system that can detect fake media and is resilient to attacks from malicious actors: since the validators are trustworthy and have a reputation to maintain, they are less likely to engage in fraudulent activities or collude with other validators to manipulate the system. Natural language processing techniques can be utilized to analyse the language used in news articles and identify potential instances of fake media, and the results of the analysis can then be presented to the validators for verification. If the validators confirm the legitimacy of the news article, it can be added to the blockchain; otherwise, it is rejected.
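A toy illustration of this validator gate (the validator names and the majority-approval threshold are assumptions for the sketch, not a specified protocol):

```python
# Illustrative PoA gate: only whitelisted validators may vote, and an
# article is appended to the ledger only on a strict majority approval.
VALIDATORS = {"press-council", "fact-check-org", "university-lab"}  # assumed set

def poa_approve(article, votes, ledger):
    authorized = {v: ok for v, ok in votes.items() if v in VALIDATORS}
    approvals = sum(authorized.values())
    if approvals * 2 > len(VALIDATORS):   # strict majority of authorities
        ledger.append(article)            # in practice, a signed block
        return True
    return False

ledger = []
print(poa_approve("verified article",
                  {"press-council": True, "fact-check-org": True,
                   "random-troll": True}, ledger))                  # True
print(poa_approve("dubious article",
                  {"fact-check-org": False, "university-lab": True},
                  ledger))                                          # False
```

Note how the vote of "random-troll" is discarded: authority, not stake or hash power, decides what enters the chain.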
5.4 FAKE MEDIA

The utilization of natural language processing and blockchain techniques can serve as a potent solution for detecting and combating the proliferation of fake media. Fake media refers to news articles, images, or videos that are intentionally created to deceive or mislead the public. Natural language processing methods can be employed to scrutinize the language used in news articles and identify potential instances of fake media; for instance, NLP can detect inconsistencies in the language used in a news article, such as a discrepancy between the headline and the body of the article. Additionally, NLP can analyse the sentiment of the article and identify any bias or misinformation. Blockchain technology can be utilized to establish a secure and tamper-proof system for storing and verifying news articles: each news article can be assigned a unique digital signature that is stored on the blockchain, making it straightforward to authenticate the article's legitimacy.

VI. ALGORITHM DETAILS

A. Natural Language Processing (NLP)

A formal definition of NLP frequently includes wording to the effect that it is a field of study using computer science, artificial intelligence, and formal linguistics concepts to analyse natural language. A less formal definition suggests that it is a set of tools used to derive meaningful and useful information from natural language sources such as web pages and text documents. A user query is processed using NLP techniques in order to generate a result page that the user can use. When we work with a language, the terms syntax and semantics are frequently encountered. The syntax of a language refers to the rules that govern valid sentence structure. For example, a common sentence structure in English starts with a subject followed by a verb and then an object, as in "Tim hit the ball"; we are not used to unusual sentence orders such as "Hit ball Tim". Although the rules of syntax for English are not as rigorous as those for computer languages, we still expect a sentence to follow basic syntax rules. The semantics of a sentence is its meaning: as English speakers, we understand the meaning of the sentence "Tim hit the ball". However, English and other natural languages can be ambiguous at times, and a sentence's meaning may only be determined from its context, so various machine learning techniques can be used to attempt to derive the meaning of text. In our application we use the Apache OpenNLP library.
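The application itself uses Apache OpenNLP, a Java library; purely to illustrate the same tokenize-and-tag step, here is an equivalent sketch in Python with NLTK (resource names may vary across NLTK versions):

```python
import nltk

# One-time model downloads (tokenizer and POS tagger).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Tim hit the ball"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g. [('Tim', 'NNP'), ('hit', 'VBD'), ('the', 'DT'), ('ball', 'NN')]
```

The tagger encodes exactly the subject-verb-object regularity discussed above; a scrambled input like "Hit ball Tim" yields a tag sequence that downstream rules can flag as anomalous.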
B. Reinforcement Learning Model

Reinforcement learning sits midway between supervised and unsupervised learning. In this method, some action is performed on the environment by the training network, and the network then obtains a feedback reaction from it. Based on the response obtained, the action is graded by the system as rewarding (a good action) or punishing (a bad action).
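A minimal sketch of this reward-and-punishment loop as it applies to the classifier of Section IV (an epsilon-greedy bandit over one discretized feature; the feature choice, reward values, and learning rate are all assumptions, not the paper's specification):

```python
import random

ACTIONS = ("true", "false")
q = {}  # learned value per (state, action); state = readability bucket

def choose(state, eps=0.1):
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def learn(state, action, reward, alpha=0.2):
    # Incremental update toward the observed reward (bandit-style).
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward - old)

# Simulated labelled stream: (readability bucket, correct label).
stream = [(0, "true"), (2, "false")] * 400
for state, label in stream:
    action = choose(state)
    # +1 reward for a correct call; -1 penalty, e.g. for branding true news fake.
    reward = 1.0 if action == label else -1.0
    learn(state, action, reward)

print(choose(0, eps=0.0), choose(2, eps=0.0))  # expected: true false
```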

C. Blockchain

Blockchain technology has emerged as a powerful force with the potential to reshape traditional paradigms and contribute to a significantly enhanced quality of life. Its practical applications are diverse: blockchain plays a major role in cryptocurrencies and supports sectors ranging from finance and supply chain operations to healthcare and governance. By providing transparency, security, and data integrity, blockchain is at the forefront of fostering trust in the digital age.

This study examines both the opportunities and challenges that blockchain presents in real-world use, emphasizing its capacity to change the way we interact with the world around us.

VII. RESULT ANALYSIS

Various metrics, such as precision, recall, and F1 score, can be used to evaluate the proposed system's effectiveness in detecting false news. The ratio of true positives to all actual positives is measured by recall, while the ratio of true positives to all predicted positives is measured by precision. A higher F1 score indicates better performance; the F1 score is the harmonic mean of precision and recall.

Algorithm     Accuracy (%)   Precision (%)   Recall (%)   F1 score (%)
NLP           89.67          88.78           86.18        87.46
RL            93.75          92.86           94.67        93.76
Blockchain    94.43          92.68           94.18        93.43

Table 1. Comparison table

Figure 2. Comparison graph

One of the most widely used metrics for assessing classification performance is accuracy, which is calculated as the ratio of correctly classified samples to all samples:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision: the number of positive class predictions that truly belong to the positive class is quantified by precision, which is estimated as follows:

Precision = TP / (TP + FP)

Recall: the ratio of true positives to total (real) positives in the data is known as recall or sensitivity; the two terms are synonymous:

Recall = TP / (TP + FN)

Specificity: the ratio of true negatives to total negatives in the data is known as specificity, i.e., the proportion of actually negative samples that are correctly designated as negative:

Specificity = TN / (TN + FP)
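These four formulas, plus F1, in a few lines of Python (the counts are placeholder values):

```python
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

print(classification_metrics(tp=94, tn=93, fp=7, fn=6))
```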
The proposed system's performance can be evaluated by comparing its predictions against a labelled dataset of true and false news articles; the predictions can then be analysed to determine the system's precision, recall, and F1 score. In addition, the system's efficiency can be evaluated by comparing it to other state-of-the-art false news detection methods. The quality of the dataset, the performance of the NLP techniques used to pre-process the data, the design of the RL agent, and the reliability of the blockchain technology used to safeguard the data all affect the proposed system's overall effectiveness in detecting false news. In order to evaluate the system's efficacy and identify areas for improvement, extensive testing and analysis are required.

VIII. CONCLUSION

To summarize, the identification of false news is a crucial undertaking in the present era, where the dissemination of misinformation can yield severe repercussions. The suggested approach to detecting false news, combining Natural Language Processing, Reinforcement Learning, and blockchain technology, presents a promising resolution to this issue. By employing NLP techniques to pre-process and extract features from news articles, an RL agent can be trained to discern patterns that differentiate between true and false news. Additionally, the implementation of blockchain technology guarantees the integrity and authenticity of the analysed data, rendering it arduous for anyone to manipulate the data without detection. In essence, this proposed system holds the potential to play a pivotal role in curbing the propagation of false news and fostering the dissemination of accurate information.

IX. FUTURE WORK

Further enhancements to the proposed system in the realm of false news detection could be pursued in future research. The feature extraction process presents an opportunity for potential improvement, as additional features could be investigated to bolster the RL agent's capacity to differentiate between true and false news. Additionally, the utilization of advanced NLP techniques, including deep learning models, holds promise for enhancing the system's overall performance.

REFERENCES

1. I. Augenstein, T. Rocktäschel, A. Vlachos, and K. Bontcheva, "Stance detection with bidirectional conditional encoding," arXiv:1606.05464, 2016.
2. M. Taulé, M. A. Martí, F. M. Rangel, P. Rosso, C. Bosco, and V. Patti, "Overview of the task on stance and gender detection in tweets on Catalan independence at IberEval 2017," in Proc. 2nd Workshop on Evaluating Human Language Technologies for Iberian Languages (CEUR-WS), vol. 1881, 2017.
3. M. Lai, A. T. Cignarella, D. I. Hernández Farías, C. Bosco, V. Patti, and P. Rosso, "Multilingual stance detection in social media political debates," Computer Speech and Language, art. no. 101075, Sep. 2020.
4. B. Riedel, I. Augenstein, G. P. Spithourakis, and S. Riedel, "A simple but tough-to-beat baseline for the Fake News Challenge stance detection task," May 2018.
5. C. Dulhanty, J. L. Deglint, I. B. Daya, and A. Wong, "Automatic disinformation assessment via deep bidirectional transformer language models for stance detection," 2019.
6. S. Ochoa, G. D. Mello, L. A. Silva, A. J. Gomes, A. M. R. Fernandes, and V. R. Q. Leithardt, "FakeChain: A blockchain architecture to ensure trust in social media networks," in Proc. Int. Conf. Qual. Inf. Commun. Technol., Algarve, Portugal: Springer, 2019, pp. 105–118.
7. Y. Wang, W. Yang, F. Ma, J. Xu, B. Zhong, Q. Deng, and J. Gao, "Weak supervision for fake news detection via reinforcement learning," in Proc. AAAI Conf. Artif. Intell., vol. 34, 2020, pp. 516–523.
8. Chokshi and R. Mathew, "Deep learning and natural language processing for fake news detection: A research," SSRN Electronic Journal, Jan. 2021. [Online]. Available: papers.ssrn.com/sol3/papers.cfm?abstract_id=3769884
9. J. A. Vijay, H. A. Basha, and J. A. Nehru, "A dynamic technique for identifying the false news using random forest classifier and NLP," in Computational Methods and Data Engineering, Springer, 2021, pp. 331–341.
10. D. Mouratidis, M. Nikiforos, and K. L. Kermanidis, "Deep learning for fake news identification in a paired textual input schema," Computation, vol. 9, no. 2, p. 20, Feb. 2021.
