Practicum Report

The project report outlines the development of a sentiment analysis tool for Reddit, leveraging Python libraries such as PRAW and TextBlob to extract and analyze user comments. It addresses the challenges of analyzing large volumes of unstructured data on social media, aiming to provide insights into public sentiment across various topics. The tool is designed for ease of use, making it accessible for students, researchers, and organizations interested in understanding online discourse.

Uploaded by

nimesh.chn
Copyright
© All Rights Reserved

A

PROJECT REPORT (Practicum)


On

SENTILYTICS FOR REDDIT


Submitted for partial fulfilment of award of the degree of

Bachelor of Technology
In
Artificial Intelligence and Machine Learning/
Artificial Intelligence and Data Science
Submitted by
Nimesh Chauhan – 00318011623

Under the Guidance of


Mr. Aman Gujjar
Assistant Professor

Department of Artificial Intelligence

DELHI TECHNICAL CAMPUS, GREATER NOIDA


(Affiliated to Guru Gobind Singh Indraprastha University, New Delhi)
Session 2023-2024 (EVEN SEM)

INDEX

S.NO CONTENT

1 Introduction
2 Problem Statement
3 Objectives
4 Feasibility Study
5 Need and Significance
6 Intended Users
7 Abbreviations and Acronyms
8 Literature Review
9 Methodology in Brief
10 Hardware Requirements
11 Software Requirements
12 Screenshots and Interface Demonstration
13 Results and Conclusion
14 Future Work
15 References

1. Introduction
In the current digital age, the role of social media platforms has evolved far beyond just
communication. These platforms now serve as primary sources for gauging public opinion,
identifying emerging trends, and conducting research in various fields such as politics,
economics, marketing, and social sciences. Reddit, being one of the largest and most diverse
online forums, provides a rich dataset of user-generated content ranging from casual discussions
to deeply insightful conversations. Unlike platforms that prioritize brevity, Reddit encourages
long-form content which provides a more nuanced understanding of public sentiment.

The motivation behind this project stems from the desire to tap into this unstructured data pool
and extract meaningful sentiment-related insights. By leveraging Reddit’s open API and
combining it with the powerful natural language processing capabilities of Python libraries, this
project establishes a basic yet functional sentiment analysis pipeline. Using PRAW (Python
Reddit API Wrapper), we can access and extract data from Reddit with ease. Coupled with
TextBlob, a sentiment analysis library that performs lexicon-based analysis, the system is
capable of classifying the nature of Reddit comments as Positive, Negative, or Neutral. The
ultimate goal is to create an easy-to-use, extensible tool that can be beneficial for students,
developers, analysts, and researchers interested in understanding online discourse.

2. Problem Statement
The exponential growth of data on the internet, especially on community-driven platforms like
Reddit, has led to a vast pool of information that remains underutilized. While Reddit contains
extremely valuable insights due to its open and anonymous nature, it also presents a
challenge: its content is massive, unstructured, and highly diverse. Attempting to manually sift
through thousands of comments and discussions to derive public opinion is not only inefficient
but also prone to bias and inconsistency.

Current market solutions for sentiment analysis often target Twitter or Facebook data, which tend
to be brief and sometimes lack context. Reddit, on the other hand, has not been fully explored
due to the complexity of its comment threading and the richness of its content. Therefore, there
exists a pressing need for a system that can automatically fetch Reddit content, analyze it using
natural language processing techniques, and present the findings in a clear and concise manner.
The lack of such systems in educational and academic settings further highlights the necessity of
developing a functional, user-friendly Reddit sentiment analyzer.

3. Objectives
The overarching aim of this project is to develop a sentiment analysis tool that utilizes Reddit’s
vast database of user comments. The specific objectives of the project can be outlined as follows:

● To design and implement a Python-based tool that utilizes the Reddit API through the
PRAW library for extracting posts and comments from specific subreddits.

● To employ the TextBlob library for performing lexicon-based sentiment analysis,
enabling the classification of user comments into three primary sentiment categories:
Positive, Negative, and Neutral.
● To present the sentiment distribution in a visually appealing and easily understandable
manner using data visualization tools such as Matplotlib and Seaborn.
● To provide a modular structure that allows for future enhancements, such as the use of
more advanced NLP models like VADER or BERT, support for multilingual analysis, or
integration with web applications.
● To facilitate ease of use by minimizing dependencies and maximizing code readability
and documentation.

4. Feasibility Study
4.1 Technical Feasibility:

The project is technically feasible because it relies on well-established
open-source technologies that are readily available and extensively documented.
Python serves as the programming language due to its simplicity and the
abundance of libraries supporting web scraping, sentiment analysis, and data
visualization. PRAW provides a seamless interface to interact with Reddit’s data,
and TextBlob simplifies the sentiment analysis task with its built-in polarity
scoring system. These tools, along with visualization libraries like Matplotlib,
reduce the complexity involved in implementing the solution.

4.2 Operational Feasibility:

From an operational standpoint, the system is easy to set up and use. It requires
only basic Python programming knowledge and can be run on standard
computing systems. The user simply needs to configure API credentials and
specify a subreddit or keyword, after which the system autonomously fetches,
processes, and analyzes the data. Additionally, the output is displayed in visual
formats, making interpretation intuitive and accessible to even non-technical
users.

4.3 Economic Feasibility:

This project is economically viable because it does not involve any proprietary
tools or expensive hardware. All the libraries used are open-source, and the only
prerequisite is an internet connection for accessing the Reddit API. Since the
computational requirements are minimal, it can be executed on most personal
computers or laptops. This makes the tool highly accessible for students,
hobbyists, and small businesses.

5. Need and Significance
With the constant evolution of data-driven decision-making, the importance of understanding
public sentiment cannot be overstated. Businesses, governments, media organizations, and
academic researchers all rely on public opinion to shape their strategies and outputs. Reddit
serves as a treasure trove of candid and diverse user opinions that reflect a wide range of societal,
cultural, and economic issues. However, due to its sheer size and complexity, this data often goes
unanalyzed.

The proposed sentiment analyzer addresses this gap by providing an automated system for
mining Reddit content and interpreting sentiment. It is especially relevant for students and
beginners in data science and NLP, as it uses accessible tools that are easy to learn and
implement. The project stands out because it demonstrates how open-source software can be
combined in a meaningful way to create practical, real-world solutions. Its modular nature also
ensures that it can be adapted and scaled to meet more complex needs over time.

6. Intended Users
This project is designed to be beneficial for a wide array of users, making it a versatile tool for
different fields and purposes:

6.1 Students and Learners:

For those who are just beginning to explore the field of Natural Language
Processing or sentiment analysis, this project serves as an ideal starting point. It
demonstrates the basic workflow of collecting, analyzing, and visualizing text
data.

6.2 Academic Researchers:

Scholars who are conducting studies in social sciences, psychology, marketing, or
political science can use this tool to gauge public sentiment on various topics
discussed in Reddit communities.

6.3 Software Developers and Data Analysts:

Developers can extend this system into more complex applications, while data
analysts can use it to quickly get a snapshot of sentiment trends in different
subreddits.

6.4 Organizations and Brands:

Companies looking to understand how their products or services are perceived in
online communities can use this as a cost-effective tool to perform initial
sentiment research.

7. Abbreviations and Acronyms

Abbreviation Full Form

NLP Natural Language Processing

API Application Programming Interface

PRAW Python Reddit API Wrapper

GUI Graphical User Interface

DFD Data Flow Diagram

ER Entity-Relationship

POS Part-of-Speech

IDE Integrated Development Environment

BERT Bidirectional Encoder Representations from Transformers

8. Literature Review
Sentiment analysis is a subfield of Natural Language Processing that focuses on identifying and
extracting opinions from textual data. The evolution of sentiment analysis has seen the transition
from basic lexicon-based techniques to sophisticated machine learning and deep learning models.
Various studies have examined the effectiveness of sentiment analysis on platforms like Twitter,
Facebook, and product reviews. These studies typically involve classifying text into binary or
multi-class sentiment categories and have shown significant success in marketing, politics, and
brand management.

Reddit, however, has not been explored to the same extent despite being one of the most
content-rich platforms on the internet. Its structure allows for nested discussions and long-form
opinions, which provide better sentiment depth but also require more nuanced analysis. Tools
like TextBlob are commonly used in introductory and educational projects because they offer
built-in functions for sentiment scoring and part-of-speech tagging. Combined with PRAW,
which provides a high-level interface for interacting with Reddit’s API, TextBlob is sufficient
for performing basic sentiment analysis.

This project builds upon previous academic and practical efforts by focusing on an underutilized
data source and demonstrating how even simple NLP techniques can yield meaningful insights
from complex social data.

9. Methodology in Brief
The development methodology of this Reddit Sentiment Analyzer can be broken down into
multiple structured steps, each contributing to the overall objective of extracting and classifying
sentiment data from Reddit:

9.1 Reddit Authentication and Data Retrieval:

○ The first step involves configuring access to Reddit’s API using the PRAW
library. API credentials such as client_id, client_secret, and
user_agent must be set up to authenticate the application.
○ Once authenticated, the system allows the user to input a subreddit name and
define the number of posts or comments to analyze. This makes the tool flexible
and user-driven.
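
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names make_client and fetch_comments are hypothetical, the use of the subreddit's "hot" listing is an assumption (the report does not say which listing is read), and "load more comments" placeholders are dropped rather than expanded.

```python
from typing import Any, List


def make_client(client_id: str, client_secret: str, user_agent: str) -> Any:
    """Build a read-only PRAW client from the three credentials named above."""
    import praw  # imported lazily so fetch_comments can be tried without PRAW

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def fetch_comments(reddit: Any, subreddit_name: str, post_limit: int = 10) -> List[str]:
    """Collect comment bodies from the first `post_limit` posts of the hot listing.

    Assumption: the hot listing is used; the report does not specify one.
    """
    bodies: List[str] = []
    for submission in reddit.subreddit(subreddit_name).hot(limit=post_limit):
        # Remove "load more comments" placeholders instead of expanding them,
        # which avoids extra API requests at the cost of skipping some comments.
        submission.comments.replace_more(limit=0)
        # list() flattens the nested comment tree into a single sequence.
        for comment in submission.comments.list():
            bodies.append(comment.body)
    return bodies
```

Calling fetch_comments(make_client(...), "python", post_limit=5) requires valid Reddit API credentials and network access; the subreddit name and limit correspond to the user inputs described above.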

9.2 Text Preprocessing and Sentiment Analysis:

○ After collecting the data, the comments are cleaned to remove unwanted
characters, URLs, punctuation, and other noise.
○ Each cleaned comment is passed through TextBlob to compute its sentiment
polarity, which ranges from -1.0 (most negative) to +1.0 (most positive).

○ A threshold-based rule then segments the polarity scores into
three categories: Positive, Negative, and Neutral.
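
The cleaning, scoring, and classification steps can be sketched as follows. This is an illustrative sketch only: the helper names clean_comment, polarity_of, and classify are hypothetical, and the neutral band (default 0.0, so only a score of exactly zero is Neutral) is an assumption, since the report does not state the thresholds it uses.

```python
import re
import string

# Matches http(s) links and bare "www." links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_comment(text: str) -> str:
    """Strip URLs and ASCII punctuation, then collapse runs of whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


def polarity_of(text: str) -> float:
    """Return TextBlob's polarity score, a float in the range [-1.0, 1.0]."""
    from textblob import TextBlob  # lazy import: only needed for scoring

    return TextBlob(text).sentiment.polarity


def classify(polarity: float, band: float = 0.0) -> str:
    """Map a polarity score to a label; scores within +/- band are Neutral.

    Assumption: `band` is a hypothetical tuning parameter, not from the report.
    """
    if polarity > band:
        return "Positive"
    if polarity < -band:
        return "Negative"
    return "Neutral"
```

A comment would then be labelled with classify(polarity_of(clean_comment(raw_text))). Note that removing all punctuation also removes apostrophes (for example, "don't" becomes "dont"), which may change how a lexicon-based scorer reads the text, so how aggressively to clean is a design choice.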

9.3 Result Aggregation and Visualization:

○ The results are aggregated to count the frequency of each sentiment type.
○ Visualization is done using libraries like Matplotlib and Seaborn, providing pie
charts and bar graphs to clearly show the sentiment distribution.
○ This visual representation helps in making quick and effective interpretations.
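
A minimal sketch of the aggregation and charting steps is given below. The function names are hypothetical, and the layout (one bar chart and one pie chart side by side, saved to a file) is an assumption chosen for illustration; only Matplotlib is used here, although the project also lists Seaborn.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("Positive", "Negative", "Neutral")


def sentiment_counts(labels: Iterable[str]) -> Dict[str, int]:
    """Count each of the three sentiment labels; other values are ignored."""
    counts = Counter(labels)
    return {label: counts.get(label, 0) for label in LABELS}


def sentiment_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert counts to percentages of the total (all zeros if no comments)."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: 100.0 * n / total for label, n in counts.items()}


def plot_distribution(counts: Dict[str, int], path: str = "sentiment.png") -> None:
    """Save a bar chart and a pie chart of the sentiment distribution."""
    import matplotlib

    matplotlib.use("Agg")  # file-only backend, so no display is required
    import matplotlib.pyplot as plt

    names, values = list(counts.keys()), list(counts.values())
    fig, (bar_ax, pie_ax) = plt.subplots(1, 2, figsize=(10, 4))
    bar_ax.bar(names, values)
    bar_ax.set_ylabel("Number of comments")
    if sum(values) > 0:  # a pie chart is undefined when every count is zero
        pie_ax.pie(values, labels=names, autopct="%1.1f%%")
    fig.savefig(path)
    plt.close(fig)
```

For example, sentiment_counts(["Positive", "Neutral", "Positive", "Negative"]) gives two Positive, one Negative, and one Neutral, which sentiment_percentages converts to 50%, 25%, and 25%.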

9.4 Logging and Export (Optional):

○ The sentiment results can optionally be stored in CSV files for record-keeping,
further analysis, or integration with other systems.
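
The optional export step could look like the sketch below. It uses Python's built-in csv module rather than Pandas to keep the example dependency-free; the function name write_results and the column names are hypothetical, not taken from the project.

```python
import csv
from typing import Iterable, TextIO, Tuple


def write_results(rows: Iterable[Tuple[str, float, str]], out: TextIO) -> int:
    """Write (comment, polarity, label) rows as CSV and return the row count."""
    writer = csv.writer(out)
    writer.writerow(["comment", "polarity", "sentiment"])  # header row
    written = 0
    for comment, polarity, label in rows:
        # Polarity is stored with four decimal places for readability.
        writer.writerow([comment, f"{polarity:.4f}", label])
        written += 1
    return written
```

When writing to disk, the file should be opened with newline="" so that the csv module controls line endings, for example: open("results.csv", "w", newline="", encoding="utf-8"), where the filename is only an example.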

10. Hardware Requirements

Component Specification

Processor Intel Core i3 or higher

RAM Minimum 4 GB

Storage At least 100 MB free space

Operating System Windows / Linux / macOS

Internet Required for API access

Display Minimum 720p resolution

11. Software Requirements
Front-End

● Optional use of Streamlit for browser-based interface

Back-End

● Python 3.x
● Required Python Libraries:
○ PRAW: For accessing Reddit’s API
○ TextBlob: For sentiment analysis
○ Pandas: For data handling and manipulation
○ Matplotlib / Seaborn: For visualization
● IDE: Visual Studio Code / PyCharm / Jupyter Notebook
● OS Compatibility: Works on Windows, macOS, and most Linux distributions

12. Screenshots and Interface Demonstration


This section demonstrates how the Reddit Sentiment Analyzer is operated. The following
screenshots show the application being started from the command line with Streamlit and the
output displayed in the user interface.

Streamlit command:

Figure 1 shows the terminal window in which the Streamlit application is launched from the
command line.

(Figure 1: terminal window with the Streamlit launch command)

This command initializes the Streamlit server and opens the application in the default web
browser, enabling users to input subreddit names and visualize sentiment analysis results in an
interactive format.

Application Interface:

Figure 2 displays the main interface of the Reddit Sentiment Analyzer
application. It includes a text input for entering the subreddit, a button to initiate sentiment
analysis, and sections for displaying the sentiment distribution graph, data summary, and user
instructions.

(Figure 2: main application interface)

Reddit page:

Figure 3 shows a sample analysis for a popular subreddit.

(Figure 3: sample analysis for a popular subreddit)

Sentiment analysis:

Figure 4 shows the sentiment histogram, which groups comments into Positive, Negative, and
Neutral based on polarity scores calculated using TextBlob. Additional statistics, such as total
comments analyzed and percentage breakdown, are also presented to offer meaningful insights.

(Figure 4: sentiment histogram with summary statistics)

13. Results and Conclusion


The Reddit Sentiment Analyzer successfully integrates Reddit's API with Python’s TextBlob to
perform real-time sentiment analysis. Upon inputting a subreddit, the tool fetches a collection of
user comments, analyzes each one’s sentiment polarity, and visually presents the results through
graphs. The system effectively distinguishes between positive, negative, and neutral comments,
and provides a clear statistical summary.

The results indicate that sentiment analysis can be a powerful tool in understanding public
opinion on various topics. Our analysis of multiple subreddits revealed diverse emotional
responses based on the theme of discussion. For instance, subreddits focusing on personal
growth and support tend to have more positive sentiments, while those discussing controversial
issues show a greater mix of sentiment polarity.

In conclusion, the project meets its objectives of real-time data extraction, efficient sentiment
classification, and user-friendly visualization. It serves as a functional prototype for larger-scale
sentiment analysis systems, demonstrating the potential of combining natural language
processing and social media analytics.

14. Future Work


While the current system is effective, there are several avenues for future development. First,
integrating other models, such as the rule-based VADER analyzer or the transformer-based
BERT, may increase sentiment classification accuracy, especially for complex sentence
structures and sarcasm.

Second, implementing a real-time dashboard with continuous subreddit monitoring could
provide ongoing sentiment trends, which would be useful for researchers and businesses.

Third, adding features like topic modeling or word cloud generation can give users deeper
insight into what themes are trending in particular communities.

Lastly, expanding the platform to support multilingual sentiment analysis would make it more
globally accessible.

15. References
[1] Loria, S. (2018). TextBlob: Simplified Text Processing.

[2] Reddit API Documentation.

[3] Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O'Reilly
Media.

[4] Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers.

[5] Pang, B., & Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends
in Information Retrieval, 2(1–2), 1–135.
