Deepfake Video Detection Research Paper

The document presents a hybrid ResNeXt CNN-LSTM model for detecting deepfakes, achieving a high accuracy of 96.5%. It addresses the limitations of existing methods by effectively combining spatial and temporal features for robust video authentication. The research highlights the importance of deepfake detection in various fields, including journalism, cybersecurity, and digital forensics.

Uploaded by

joshivedika10
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views12 pages

Deepfake Video Detection Research Paper

The document presents a hybrid ResNeXt CNN-LSTM model for detecting deepfakes, achieving a high accuracy of 96.5%. It addresses the limitations of existing methods by effectively combining spatial and temporal features for robust video authentication. The research highlights the importance of deepfake detection in various fields, including journalism, cybersecurity, and digital forensics.

Uploaded by

joshivedika10
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 12

Unmasking Deepfakes: A Hybrid ResNeXt CNN-LSTM Approach for Robust Video Authentication

Authors:
Deepak Patel, Aman Patel, Aditya Choudhary, Prof. Nidhi Nigam, Prof. Shruti Lashkari

Affiliation:
Department of Computer Science and Information Technology, Acropolis Institute of Technology and Research, Indore, India
Affiliated to: Rajiv Gandhi Proudyogiki Vishwavidyalaya (RGPV)
Introduction
What are Deepfakes?
Deepfakes are videos or audio clips generated using AI that look and sound real, but are completely fake.
They use deep learning to swap faces, mimic voices, and create realistic yet false content.

Why are Deepfakes a Concern?


• They can be used to spread fake news and misinformation.
• Pose threats to identity security and public trust.
• Can damage reputations and be used for fraud or cybercrime.

Why This Research is Important


As deepfakes become more advanced, traditional detection methods (like spotting visual glitches) no
longer work well. We need powerful AI models that can analyze both the visuals and timing (sequence of
frames) in videos to catch these fakes accurately.
Research Problem & Significance
Research Problem:
Can we build a real-time, accurate, and scalable deepfake video detection system that uses both spatial (image-based) and temporal (sequence-based) features?

Why This Matters:

Deepfake detection has real-world impact in:


• Journalism & Media: Detect fake news content

• Cybersecurity: Ensure authenticity of video communication

• Digital Forensics: Help in legal investigations

• Social Media: Flag manipulated videos to protect users

Our Aim:
To create a deep learning model that is both robust and generalizable, capable of detecting various deepfake techniques with high accuracy.
Literature Review & Research Gap
Existing Approaches
• CNNs detect pixel-level anomalies in single frames.
• LSTMs capture sequence patterns across video frames.
• Transformer models use attention for temporal analysis.

Limitations of These Methods


• CNNs miss subtle manipulations in smooth fakes.
• LSTMs can be heavy on computation for high-res videos.
• Most models don’t generalize well across datasets.
• Adversarial deepfakes often bypass these models.

Our Research Gap:
Few existing models combine spatial and temporal features effectively while maintaining both high accuracy and computational efficiency.

Our Solution:
We propose a hybrid ResNeXt CNN + LSTM model that improves detection performance across diverse datasets and deepfake types.
Proposed Method
Our Hybrid Deepfake Detection Model
We combine the strengths of two powerful deep learning techniques:
1. ResNeXt CNN
• Captures fine-grained spatial features from each video frame

• More powerful than traditional CNNs due to grouped convolutions

2. LSTM (Long Short-Term Memory)


• Analyzes the temporal sequence of frames

• Learns patterns and irregularities over time

🎯 Final Output:
• A classification layer predicts whether the video is real or fake
Datasets Used
FaceForensics++
• Real and manipulated videos using face-swapping techniques

• Commonly used benchmark for deepfake detection

Deepfake Detection Challenge (DFDC)


• Large-scale dataset by Meta (Facebook)

• Includes diverse actors, backgrounds, and manipulation methods

Celeb-DF
• High-quality deepfake videos of celebrities

• Designed to challenge models with realistic fakes


Preprocessing Pipeline
Before training, we prepared the data with several essential preprocessing steps:
1️⃣ Frame Extraction
• Videos are broken down into individual frames.

2️⃣ Face Detection & Cropping


• Facial regions are detected using OpenCV or Dlib.

• Only faces are cropped for focused analysis.

3️⃣ Normalization & Resizing


• Cropped faces are resized to 224x224 pixels.

• Normalized to ensure uniform input.

4️⃣ Sequence Structuring
• Frames grouped into sequences (e.g., 10 frames each).

• Helps LSTM model learn temporal patterns.


Model Architecture
1️⃣ ResNeXt CNN
• Extracts spatial features from each video frame.

• Uses grouped convolutions for enhanced feature extraction.

• More efficient and accurate in detecting fine-grained details.

2️⃣ LSTM
• Analyzes temporal information from the sequence of frames.

• Captures time-based patterns and changes between frames.

3️⃣ Classification Layer


• Output: Predicts whether the video is real or fake.

• Uses the features from both ResNeXt and LSTM to make accurate predictions.
Results & Analysis
Model Performance:
Accuracy: 96.5%

Evaluation Metrics:
• Precision: 94.2%
• Measures the accuracy of positive predictions.

• Recall: 97.8%
• Measures the ability to correctly identify all positive cases.

• F1-Score: 96.0%
• Harmonic mean of precision and recall, balancing both metrics.
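As a quick sanity check, the reported F1-score can be reproduced from the reported precision and recall, since F1 is their harmonic mean:

```python
# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall.
precision = 94.2  # reported above
recall = 97.8     # reported above

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 1))  # 96.0, matching the reported F1-score
```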
Limitations
• Overfitting:
• The model may overfit to specific features in the training data.

• Computation Cost:
• High computational resources required for training and inference.

• Low-quality Video Handling:


• Struggles with videos that have low resolution or significant noise.
Conclusion
Key Findings:
• Our hybrid ResNeXt CNN-LSTM model achieves high accuracy (96.5%) in detecting deepfakes.

• It significantly outperforms existing models in handling both spatial and temporal features.

Real-World Applicability:
• The model is highly suitable for use in journalism, cybersecurity, digital forensics, and social media
platforms to detect and prevent the spread of deepfakes.
Future Work
• Audio Deepfake Detection:

• Extend the model to include audio-based deepfake detection for multimedia integrity.

• Mobile Deployment:

• Develop lightweight versions of the model for real-time mobile deployment.

• Adversarial Robustness:

• Enhance the model's robustness against adversarially crafted deepfakes designed to evade detection.

• Livestream Detection:

• Explore techniques for detecting deepfakes in real-time live streams.
