
Fact Hunt

The document presents a project titled 'FACTHUNT,' which aims to develop an AI-powered system for detecting fake news on social media using a fusion of linguistic analysis and knowledge-based verification techniques. It integrates a Passive Aggressive Classifier for linguistic analysis and SBERT with XGBoost for knowledge verification, leveraging the Google Fact-Checking API for enhanced accuracy and real-time results. The proposed system addresses limitations of existing methods by providing a user-friendly interface and optimizing model performance through hyperparameter tuning.

FACTHUNT - A Fusion Model for Detecting Fake News on Social Media

Team Members:
Babuji V - 111721203008
Kesava Reddy B - 111721203010
Dhanajayan T - 111721203016
Harish V - 111721203022

Supervisor: Dr. A. Anna Lakshmi, M.E., Ph.D. (Associate Professor)
INTRODUCTION

Social media has become a primary news source, leading to the rapid spread of misinformation and fake news.

Traditional rule-based methods struggle with evolving deceptive tactics used in fake news propagation.

A fusion of linguistic and knowledge-based analysis enhances accuracy and robustness in detecting fake news.

This project integrates a Passive Aggressive Classifier (PAC) for linguistic analysis and SBERT + XGBoost for knowledge-based verification, leveraging the Google Fact-Checking API.
ABSTRACT

The project develops an AI-powered fake news detection system using a fusion of machine learning and deep learning techniques.

It combines linguistic-based classification (PAC) with knowledge-based verification (SBERT + XGBoost) for improved accuracy.

Hyperparameter tuning is applied to maximize model performance and minimize false positives and false negatives.

The system provides a reliable way to validate news articles, helping curb misinformation on social media platforms.
LITERATURE REVIEW

1. Hybrid Fake News Detection Using Linguistic Features (2020)
Proposed a hybrid fake news detection model using linguistic features, enhancing accuracy by
analyzing both content and social interactions.

2. Fake News Detection Using Deep Learning and Semantic Analysis (2020)
Introduced a machine learning-based approach that used NLP and semantic analysis for
detecting fake news. Their system relied on word embeddings to capture the semantic context.

3. Leveraging Knowledge Graphs for Fake News Verification (2021)
Focused on leveraging knowledge graphs for fake news verification. The proposed method
incorporated external sources to cross-check the authenticity of news articles.

4. Sentiment Analysis and Fake News Detection (2021)
Developed a fake news detection framework using sentiment analysis and NLP. They
demonstrated that sentiment polarity could significantly impact fake news identification.

5. Multimodal Fake News Detection Using Textual and Visual Features (2022)
Combined textual features with visual-based verification for fake news detection. The hybrid model
outperformed traditional models in both accuracy and recall.

6. Semantic Analysis and Fact-Checking for Fake News Detection (2022)
Proposed a model using a combination of semantic analysis and fact-checking data to detect fake
news. Their method focused on content validation through external sources.

7. Multi-Phase Fake News Detection Using Linguistic Features and Domain-Specific Knowledge (2023)
Introduced a multi-phase model combining linguistic features with domain-specific knowledge for
detecting fake news. Their approach showed significant improvements in both precision and recall.

8. Hybrid Fake News Detection System Using Machine Learning and Expert Knowledge (2023)
Designed a hybrid fake news detection system that combined machine learning algorithms with
expert knowledge. The model performed well in real-time detection applications.

9. Fake News Detection with Semantic Analysis and Knowledge Graphs (2024)
Developed a fake news detection model using semantic analysis and knowledge graphs for
effective content and source validation.

10. Linguistic Features and Knowledge-Based Verification for Fake News Detection (2024)
Developed an ensemble learning approach that combined multiple models for detecting fake
news. The model utilized a mix of text features and metadata for reliable classification.

11. Deep Learning for Fake News Detection Using Linguistic Features (2024)
Introduced a hybrid model combining linguistic features and knowledge-based verification
for real-time fake news detection, achieving improved accuracy.

12. Ensemble Learning Approach for Fake News Detection (2021)
Developed a fake news detection framework using sentiment analysis and NLP. They
demonstrated that sentiment polarity could significantly impact fake news identification.

13. Hybrid System for Fake News Detection Using Text and Fact-Checking Databases (2021)
Proposed a hybrid system leveraging textual features and fact-checking databases to verify
article claims.

14. Cross-Referencing Claims with Trusted Databases for Fake News Detection (2022)
Developed a fake news detection system using a knowledge-based approach to cross-reference
claims with trusted databases, improving reliability.

15. Topic Modeling and Linguistic Features for Fake News Detection (2022)
Investigated using topic modeling and linguistic features to classify fake news. Their system
extracted thematic features and correlated them with factual databases.

16. Behavioral Analysis and Linguistic Features for Fake News Classification (2023)
Applied a hybrid approach combining textual content and user behavior analysis, leveraging
social media patterns and linguistic inconsistencies to detect fake news.

17. Fake News Detection Using Deep Learning and Knowledge-Based Reasoning (2023)
Focused on a multi-layered detection model combining deep learning and knowledge-based
reasoning. The system utilized both syntactic features and external fact-checking systems.

18. Graph-Based Knowledge Models for Fake News Detection (2024)
Developed a graph-based system for fake news detection, integrating user behavior, network
features, and text analysis.

19. Integrating Semantic Analysis and Knowledge Graphs for Fake News Detection (2024)
Proposed an approach combining linguistic features and knowledge graphs to verify news
authenticity through content cross-checking.

20. Fake News Detection Using Contextual Word Embeddings and Knowledge Sources (2021)
Investigated fake news detection using semantic analysis alongside machine learning. Their
system utilized contextual word embeddings to enhance the detection process.

22. Hybrid Algorithms for Fake News Detection: Supervised and Unsupervised Learning (2022)
Built a fake news detection model based on hybrid algorithms that combined supervised and
unsupervised learning techniques.

23. Real-Time Fake News Detection Using User Data and Linguistic Features (2023)
Developed a real-time fake news detection framework that integrated user engagement data
with linguistic features.

24. Fake News Detection with Social Media Data and Knowledge-Based Features (2024)
Proposed a system using a combination of textual analysis and knowledge graphs for fake news
detection.

25. Fake News Detection Using Textual and Knowledge-Based Features (2024)
Focused on a hybrid deep learning model that integrated both linguistic analysis and domain-
specific knowledge.
EXISTING SYSTEM

Manual Fact-Checking: Current fact-checking relies heavily on human verification, which is time-consuming and inefficient for large-scale misinformation detection.

Rule-Based and Keyword Matching: Many systems use predefined keyword-based detection methods, which fail against evolving fake news tactics and sophisticated false narratives.

Single-Model Approach: Traditional machine learning models, like Naïve Bayes and SVM, focus only on linguistic features, lacking external knowledge verification.

Limited Real-Time Verification: Most existing solutions do not provide real-time analysis, making them ineffective for social media platforms where news spreads rapidly.
PROPOSED SYSTEM

Linguistic Analysis Module: Uses the Passive Aggressive Classifier (PAC) to classify news based on textual features and patterns.

Knowledge-Based Verification: Utilizes SBERT (Sentence-BERT) for semantic similarity and XGBoost for final classification, referencing fact-checked sources.

Hyperparameter Tuning: Optimizes model parameters for higher accuracy, reducing bias and improving generalization.

User-Friendly Web Interface: Users can input news, get real-time verification results, and view fake/real percentages via a React-based UI.
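
The slides do not state how the two module outputs are merged into the fake/real percentages shown in the UI. The sketch below illustrates one possible fusion, a weighted average of the two "fake" probabilities. The function name `fuse_scores`, the default weight, and the fallback to the linguistic score when no fact-check match exists are all assumptions made for illustration, not the project's documented method.

```python
from typing import Optional


def fuse_scores(p_fake_linguistic: float,
                p_fake_knowledge: Optional[float],
                knowledge_weight: float = 0.6) -> dict:
    """Combine two 'fake' probabilities into fake/real percentages.

    p_fake_linguistic: probability of 'fake' from the linguistic module.
    p_fake_knowledge:  probability of 'fake' from the knowledge module,
                       or None when no fact-checked claim was matched.
    knowledge_weight:  hypothetical weight given to the knowledge module.
    """
    if p_fake_knowledge is None:
        # Assumption: fall back to the linguistic score alone.
        p_fake = p_fake_linguistic
    else:
        p_fake = (knowledge_weight * p_fake_knowledge
                  + (1.0 - knowledge_weight) * p_fake_linguistic)
    fake_pct = round(100.0 * p_fake, 1)
    return {"fake": fake_pct, "real": round(100.0 - fake_pct, 1)}
```

A weighted average keeps the result interpretable as a percentage; the weight itself would need to be chosen on validation data.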
ADVANTAGES

Hybrid Model Accuracy – Combines linguistic analysis (Passive Aggressive Classifier) and knowledge-based verification (SBERT + XGBoost) for improved detection accuracy.

Real-Time Verification – Processes news articles instantly, helping users identify misinformation quickly.

Robust Against Evolving Fake News – Uses hyperparameter tuning to adapt to new trends and manipulation techniques in fake news.

Reliable Knowledge Source – Leverages the Google Fact-Checking API to cross-check news with verified sources.

User-Friendly Interface – Designed like a social media platform where users can post news and view credibility scores intuitively.
MODULE DESCRIPTION

The project comprises the following modules:

Data collection

Data preprocessing and feature extraction

Linguistic-based analysis

Hyperparameter tuning

Knowledge-based analysis



1. DATA COLLECTION

The dataset for fake news detection is collected from various sources, including
online news portals, fact-checking websites, and public datasets like LIAR and
FakeNewsNet. Additionally, real-time data is fetched using the Google Fact-Checking
API to enhance the knowledge-based verification. The dataset contains labeled news
articles with categories such as fake or real, enabling supervised learning.
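
As a rough illustration of the real-time fetch, the sketch below builds a request URL for the `claims:search` endpoint of Google's Fact Check Tools API and flattens a response into rows. The helper names `build_search_url` and `extract_reviews` are hypothetical, the API key is a placeholder, and the HTTP call itself is left out so the example runs offline. Field names follow the public API documentation as understood here and should be checked against the current reference.

```python
from urllib.parse import urlencode

# Public endpoint of the Fact Check Tools API (claims search).
FACT_CHECK_ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def build_search_url(query: str, api_key: str, language: str = "en") -> str:
    """Return the GET URL for a claim search (helper name is illustrative)."""
    params = {"query": query, "languageCode": language, "key": api_key}
    return f"{FACT_CHECK_ENDPOINT}?{urlencode(params)}"


def extract_reviews(response: dict) -> list:
    """Flatten a parsed JSON response into one row per published review."""
    rows = []
    for claim in response.get("claims", []):
        for review in claim.get("claimReview", []):
            rows.append({
                "claim": claim.get("text", ""),
                "publisher": review.get("publisher", {}).get("name", ""),
                "rating": review.get("textualRating", ""),
                "url": review.get("url", ""),
            })
    return rows
```

The rows produced this way could be stored alongside the labeled datasets or passed to the knowledge-based module at prediction time.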

2. DATA PREPROCESSING & FEATURE EXTRACTION

Before training the model, the collected data undergoes preprocessing steps like
tokenization, stop-word removal, lemmatization, and vectorization (TF-IDF and word
embeddings). Feature extraction is performed to transform text into numerical
representations, ensuring effective input for machine learning models. Sentence
embeddings from SBERT are also used for semantic understanding in the
knowledge-based analysis.
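
The following is a minimal, dependency-free sketch of the tokenization, stop-word removal, and TF-IDF steps. It omits lemmatization, uses a tiny illustrative stop-word list, and applies the smoothed IDF formula with L2 normalization. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would likely be used instead. The function names are chosen for this example only.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "that"}


def tokenize(text: str) -> list:
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs: list) -> list:
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})  # L2-normalize
    return vectors
```

Terms that appear in fewer documents receive a higher weight, which is what lets the classifier focus on distinctive wording.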

3. LINGUISTIC-BASED ANALYSIS

A Passive Aggressive Classifier (PAC) is employed for linguistic analysis, which
is effective for binary classification tasks like fake news detection. It works well
in online learning settings, where new articles can be classified in real time. The
model is trained on extracted textual features to differentiate between fake and
real news based on writing patterns, word usage, and content structure.
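
To show why the classifier suits online learning, here is a small sketch of the passive-aggressive (PA-I) update on sparse feature dictionaries. It is written from scratch for illustration. The class name, the label convention (+1 for fake, -1 for real), and the feature names in the usage note are assumptions, and a production system would more likely rely on a library implementation.

```python
def dot(w: dict, x: dict) -> float:
    """Sparse dot product between weights and a feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


class PassiveAggressive:
    """Binary PA-I classifier; labels are +1 (fake) and -1 (real) by assumption."""

    def __init__(self, C: float = 1.0):
        self.C = C      # aggressiveness cap on each update step
        self.w = {}     # sparse weight vector

    def predict(self, x: dict) -> int:
        return 1 if dot(self.w, x) >= 0 else -1

    def partial_fit(self, x: dict, y: int) -> None:
        loss = max(0.0, 1.0 - y * dot(self.w, x))   # hinge loss
        if loss == 0.0:
            return                                   # passive: margin already met
        sq_norm = sum(v * v for v in x.values())
        if sq_norm == 0.0:
            return                                   # empty vector: nothing to update
        tau = min(self.C, loss / sq_norm)            # aggressive: PA-I step size
        for f, v in x.items():
            self.w[f] = self.w.get(f, 0.0) + tau * y * v
```

Each article triggers at most one cheap update, so the model can keep learning as new labeled examples arrive, for instance `clf.partial_fit({"shocking": 1.0, "miracle": 1.0}, +1)`.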

4. HYPERPARAMETER TUNING

To enhance model performance, hyperparameter tuning is applied using Grid
Search and Random Search techniques. Key parameters such as C
(regularization strength), max iterations, and loss functions are optimized for
PAC, while learning rate, number of estimators, and depth are fine-tuned for
XGBoost. This ensures improved accuracy, better generalization, and reduced
overfitting.
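
The grid search idea can be sketched in a few lines: enumerate every combination of candidate values and keep the one with the best validation score. The `grid_search` helper and the grid values in the usage note are illustrative assumptions. The scoring function is supplied by the caller and would normally train a model and return a cross-validated metric. Library tools such as scikit-learn's `GridSearchCV` and `RandomizedSearchCV` offer the same idea with cross-validation built in.

```python
from itertools import product


def grid_search(param_grid: dict, score_fn):
    """Try every combination in param_grid and return (best_params, best_score).

    param_grid: {"name": [candidate values, ...], ...}
    score_fn:   callable taking a params dict and returning a score to
                maximize (e.g. mean validation accuracy).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:          # ties keep the earliest combination
            best_params, best_score = params, score
    return best_params, best_score
```

For the PAC, a grid such as `{"C": [0.01, 0.1, 1.0], "max_iter": [100, 1000]}` yields six candidate configurations. Random search samples from the same space instead of enumerating it, which scales better when there are many parameters.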

5. KNOWLEDGE-BASED ANALYSIS

In addition to linguistic analysis, knowledge-based verification is performed
using SBERT (Sentence-BERT) embeddings and XGBoost classification. The
input news article is compared with verified claims from fact-checking sources
such as the Google Fact-Checking API. If a similar fact-checked article exists, its
credibility is used to reinforce the model's prediction, improving accuracy and
trustworthiness.
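
The matching step can be illustrated with plain cosine similarity over sentence embeddings. In this sketch the embeddings are assumed to be precomputed by an SBERT model, for example with the `encode` method of the sentence-transformers library. The `best_match` helper and the 0.75 threshold are hypothetical choices for illustration. How the match is then fed into XGBoost is not specified in the slides, so the sketch stops at retrieving the closest fact-checked claim.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot_uv = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot_uv / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(query_vec: list, checked_claims: list, threshold: float = 0.75):
    """Return (rating, similarity) of the closest fact-checked claim, or None.

    checked_claims: list of (embedding, rating) pairs, where rating is the
                    fact-checker's verdict text.
    threshold:      assumed minimum similarity for two claims to count as
                    'the same claim'.
    """
    best = None
    for embedding, rating in checked_claims:
        sim = cosine(query_vec, embedding)
        if sim >= threshold and (best is None or sim > best[1]):
            best = (rating, sim)
    return best
```

The returned similarity and rating could serve as input features for the final classifier, and a `None` result signals that only the linguistic module has evidence for this article.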
TECHNOLOGY UTILIZED

Python – Primary programming language for data processing and machine learning model development.
scikit-learn – Used for machine learning algorithms, feature extraction, and model evaluation.
Pandas – Library for data manipulation, cleaning, and processing.
NumPy – Utilized for numerical operations and handling large data arrays efficiently.
Matplotlib – Used for data visualization, including charts and performance metrics.
Flask – Framework for deploying the model as a web service.
ReactJS – Used for building the frontend to interact with the deployed model.
ARCHITECTURE DIAGRAM

[Figure: architecture diagram. Components shown: News Input; News - Textual Dataset; Data Processing; Feature Extraction; Passive Aggressive Classifier Model; Fact-Checking API; Structured Metadata; XGBoost + SBERT; Hyperparameter Tuning; Model; Fusion Model (Linguistic + Knowledge); Predicted Output (Real/Fake).]
BLOCK DIAGRAM
[Figure: block diagram. Stages shown:
Data Collection (News Articles, Social Media, APIs like Google News)
Data Preprocessing (Text Cleaning, Tokenization, Removing Stopwords, etc.)
Linguistic Analysis & Feature Extraction (Train Classifier on Labeled Data)
Knowledge-based Analysis (Google News API for Contextual Understanding & Prediction)
Model Evaluation & Testing (Use Model for Final Decision)
Fake/Real News Detection (Real-time Fake News Detection, Continuous Learning)]
SCREENSHOTS

Fig.1.1 Data Analysis



Fig.1.2 Output (Real,Fake)



Fig.1.3 Postman API Response


CONCLUSION & FUTURE ENHANCEMENT

Combines linguistic analysis and knowledge-based verification for accurate fake news detection.

Hyperparameter tuning optimizes model performance and minimizes misclassification.

Cross-checks news with trusted fact-checking sources for enhanced reliability.

Can be extended with deep learning, real-time tracking, and multilingual support.
REFERENCES

1. Gupta, P., & Agrawal, A. (2020). Hybrid Detection for Fake News using Linguistic and Social
Features. International Journal of Artificial Intelligence, 12(4), 345-359.

2. Sharma, V., & Bhatnagar, R. (2020). Deep Learning Approaches for Fake News Detection.
Journal of Computational Linguistics, 15(2), 78-92.

3. Kumar, R., & Verma, P. (2021). Leveraging Knowledge Graphs for Fake News Verification.
Knowledge Engineering Review, 34(1), 23-40.

4. Rath, S., & Sahoo, B. (2021). Sentiment Analysis for Fake News Detection. Journal of
Information Science, 28(3), 112-125.

5. Verma, A., & Patel, R. (2022). Multimodal Fake News Detection using Text and Visual
Content. International Journal of Data Science, 16(6), 199-212.

6. Sarkar, S., & Bhattacharya, S. (2022). Semantic Verification and Fake News Detection. Journal
of Artificial Intelligence Research, 29(1), 88-104.

7. Mishra, S., & Yadav, A. (2023). Phase-Based Fake News Detection Model. Proceedings of the
International Conference on AI, 57-68.

8. Patil, R., & Deshmukh, V. (2023). Fake News Detection using Expert Knowledge. Journal of
Computational Intelligence, 40(5), 1503-1519.

9. Singh, J., & Kaur, M. (2024). Semantic Graphs for Fake News Detection in Social Media.
Social Media Mining Journal, 8(4), 99-112.

10. Jain, N., & Mehta, A. (2024). Linguistic Verification for Fake News Classification. Journal of
Natural Language Processing, 21(7), 345-359.

11. Joshi, R., & Sharma, N. (2020). Textual Analysis Techniques for Fake News Detection.
International Journal of Text Mining, 32(2), 121-136.

12. Iyer, A., & Desai, D. (2021). Ensemble Learning for Fake News Classification. Machine
Learning and Applications, 13(9), 218-231.

13. Bansal, A., & Gupta, S. (2021). Fact-Checking Models for Fake News Detection. Journal of
Information Retrieval, 45(3), 87-102.

14. Khan, A., & Ahmed, N. (2022). Topic Modeling for Fake News Detection. Artificial
Intelligence Journal, 31(4), 54-67.

15. Patel, S., & Mehta, R. (2022). Fake News Detection using Behavioral Analysis.
Computational Social Science Review, 22(8), 50-63.

16. Singh, K., & Soni, P. (2023). Knowledge Reasoning for Fake News Detection. Journal of
Knowledge Systems, 19(5), 173-189.

17. Vishwakarma, R., & Rathi, A. (2023). Fake News Detection using Graph Models.
Computational Intelligence and Applications, 34(2), 88-99.

18. Desai, M., & Rao, K. (2024). Cross-Referencing Claims for Fake News Verification. Journal
of Information Systems, 40(1), 111-124.

19. Choudhury, A., & Jain, M. (2024). Contextual Embeddings for Fake News Detection. Journal
of AI Research, 30(3), 142-157.

20. Nair, H., & Joshi, A. (2021). Hybrid Algorithms for Fake News Detection. Journal of
Machine Learning, 36(7), 199-212.

21. Singh, P., & Sharma, H. (2022). Detecting Fake News Using Engagement Data. Journal of
Social Media Analytics, 29(4), 76-89.

22. Yadav, R., & Verma, S. (2023). Fake News Detection on Social Media. International Journal
of Social Media Research, 14(2), 201-215.

23. Reddy, P., & Das, D. (2024). Real-Time Fake News Detection Framework. Real-Time
Systems Journal, 15(3), 58-70.

24. Singh, P., & Iyer, V. (2024). Fake News Detection Using Textual Features. Journal of
Computational Linguistics, 19(5), 222-235.

25. Shukla, K., & Tiwari, A. (2020). Feature Extraction Techniques for Fake News Detection.
Data Science Review, 11(6), 45-59.
