Group File

Uploaded by

maheshsharmamn
Copyright
© © All Rights Reserved
CHAT LENS

A Project report submitted


In partial fulfillment of the requirements for the degree of

Bachelor of Technology
in
Computer Science & Engineering
Name of the Student(s):
Mahesh Sharma, 2005110100061 (CSE)
Gargi Teotia, 2005110100036 (CSE)
Anjali Gautam, 2105111529002 (CSE-AI)
Mukesh, 2005110100070 (CSE)

Name of the Supervisor:
Mrs. Nandini Sharma, Assistant Professor (CSE)

Department of Computer Science & Engineering

GL BAJAJ GROUP OF INSTITUTIONS, MATHURA


Approved by AICTE, CoA & Affiliated to Dr APJ AKTU, Lucknow
NH#2, Mathura-Delhi Road, PO-Akbarpur, Mathura-281406 (UP)

Session: 2023-2024

Declaration

We hereby certify that the work presented in this B.Tech. Project Report entitled
“Chat Lens”, in partial fulfillment of the requirements for the degree of Bachelor of
Technology in Computer Science and Engineering, submitted to the Department of
Computer Science and Engineering of GL BAJAJ Group of Institutions, NH#2, Mathura-
Delhi Road, PO-Akbarpur, Mathura-281001 (UP), is an authentic record of our own work
carried out during the period from 13/05/2024 to 25/05/2024 under the supervision of
Mrs. Nandini Sharma, Assistant Professor in the Department of Computer Science and
Engineering.

The matter presented in this project report, in full or in part, has not been submitted
by us for the award of any other degree elsewhere and is free from plagiarism.

Name of the Candidate(s):

Gargi Teotia

2005110100036

Mahesh Sharma

2005110100061

Anjali Gautam

2105111529002

Mukesh

2005110100070

Certificate

This is to certify that the Project report entitled “Chat Lens” done by Mahesh
Sharma (2005110100061), Gargi Teotia (2005110100036), Anjali Gautam
(2105111529002) and Mukesh (2005110100070) is an original work carried out by
them in the Department of Computer Science & Engineering, GL Bajaj Group of
Institutions, Mathura under my guidance. The matter embodied in this project work has
not been submitted earlier for the award of any degree or diploma to the best of my
knowledge and belief.

Date: 31 May 2024

Place: Mathura

Signature
Mrs. Nandini Sharma
Assistant Professor (CSE)

Signature
Dr. Ramakant Baghel
Head of the Department (CSE)

Acknowledgement

The merciful guidance bestowed on us by the Almighty helped us see this project
through to a successful end. We humbly pray with a sincere heart for His guidance to
continue forever.

We thank our project guide, Mrs. Nandini Sharma, who gave us guidance and light
during this project. Her versatile knowledge helped us at critical times over the
span of this project.

We give special thanks to our Head of Department, Dr. Ramakant Baghel, who was
always present as a support and helped us in every possible way during this project.

We also take this opportunity to express our gratitude to all those who were
directly or indirectly with us during the completion of the project.

We want to thank our friends, who always encouraged us during this project.

Last but not least, we thank all the faculty of the CSE and CSE-AI departments, who
provided valuable suggestions during the project.

Abstract

Chat Lens is a web-based service that collects and analyzes chat histories from the
mobile messaging application WhatsApp; it can also analyze bank data to gain
insights from that data. It leverages the e-mail export feature of WhatsApp to obtain
the chat histories, which cannot be accessed otherwise due to encrypted storage on the
mobile device and end-to-end encrypted transmission over the Internet.

Thus, the major asset of the service is that real communication data can be collected
without the bias introduced by observing or surveying participants.

The collected communication data can be analyzed to provide valuable insights into
communication in WhatsApp and the resulting network traffic. To incentivize users
to send chat histories, the privacy of users is respected by anonymizing all
communication data.

Moreover, the bank data analysis provides valuable insights into various patterns, such
as: Which job types of customers are likely to subscribe to a term deposit? Are single
or married people more likely to subscribe to a term deposit? Does education have any
effect on subscription to a term deposit?

TABLE OF CONTENT

Declaration
Certificate
Acknowledgement
Abstract
Table of Content
List of Figures

Chapter 1. Introduction
1.1 Preliminaries
1.2 Motivation
1.3 Problem Statement
1.4 Aim and Objectives

Chapter 2. Literature Survey
2.1 Introduction
2.2 Existing System
2.3 Research Gap

Chapter 3. Proposed Methodology
3.1 Problem Formulation
3.2 System Analysis and Design

Chapter 4. Implementation
4.1 Introduction
4.2 Implementation Strategy
4.3 Tools/Hardware/Software Requirements
4.4 Expected Outcome (Performance metrics with details)

Chapter 5. Result & Discussion
5.1 Result
5.2 Discussion

Chapter 6. Conclusion & Future Scope
6.1 Conclusion
6.2 Future Scope

References

Appendix I: Plagiarism Report of Project Report (<=15%)

LIST OF FIGURES

Figure No. Description

Figure 3.1.1 Use Case Model (Social Media Chat)
Figure 3.1.2 Use Case Model (Bank Data Analysis)
Figure 3.2.1 Activity Diagram
Figure 3.2.2 Activity Diagram (Social Media Chat)
Figure 3.2.3 Activity Diagram (Employee Data)
Figure 4.1 Sequence Diagram
Figure 4.2 State Diagram
Figure 4.3 Collaboration Diagram
Figure 4.4 Export Chat Screenshot
Figure 4.5 Daily Activity Timeline
Figure 4.6 Monthly Activity Timeline
Figure 4.7 Emoji Analysis
Figure 4.8 Most Common Words
Figure 4.9 Word Cloud
Figure 4.10 Weekly Activity Map
Figure 4.11 Most Busy Users
Figure 4.12 Sentiment Analysis
Figure 4.13 Top Statistics
Figure 4.14 Analysis 1
Figure 4.15 Analysis 2
Figure 4.16 Analysis 3

Chapter 1
INTRODUCTION

1.1 PRELIMINARIES
WhatsApp is one of the most widely used messaging applications in the
world. Group chats have become an essential tool for communication,
with people using them for personal, educational, and business purposes.
The amount of data generated by these group chats, and the data produced
in a bank, can be overwhelming, making it difficult to extract meaningful
insights and patterns. To overcome this challenge, we have created
“Chat Lens”, a tool that uses data processing to extract valuable
information from these conversations and data sheets. The tool can
provide insights into the topics discussed, frequently used keywords,
which persons are more or less likely to subscribe to a term deposit, and
the sentiment of the messages exchanged. Chat Lens can be useful in
various domains, such as education, business, and social settings. In
education, instructors can analyze student group chats to identify topics
of interest and monitor engagement. In business, managers can analyze
group chats to identify areas of improvement and evaluate team
communication. In social settings, individuals can use the tool to analyze
their chat history and gain insights into their communication patterns.
This report presents a comprehensive overview of Chat Lens and its
applications. It provides an in-depth analysis of the WhatsApp chat data
and bank data used to extract insights, and of the challenges associated
with analyzing WhatsApp group chats and data related to a bank’s
employees.
1.2 MOTIVATION
The development of a Chat Lens stems from a profound motivation
rooted in the recognition of the significance of communication and data
in both personal and professional spheres. Understanding communication
patterns, sentiments, and collaboration dynamics is crucial for personal
development, team efficiency, and overall well-being. The motivations
behind this project can be categorized as follows:

1.2.1. Insights into Communication Dynamics


Analyzing social app chats and data sheets gives users an
opportunity to gain valuable insights into their
communication patterns and financial patterns. These include
aspects such as response times, active hours, the frequency
of interactions, and whether a person is more or less likely to
subscribe to a term deposit. By deciphering these patterns,
users can make informed decisions to enhance their strategies.

1.2.2. Sentiment Monitoring


The capability to monitor sentiment trends within group chats
or personal conversations is valuable. Recognizing shifts in
sentiment over time allows users to gauge the emotional tone
of their communications and adapt their messaging
accordingly.

1.2.3. Personal Data Visualization


The project is motivated by the desire to transform raw chat
data into visually engaging representations. Visualizations
enhance the interpretability of communication habits, making
it easier for users to grasp and act upon the insights derived
from their own data.

1.2.4. Educational and Skill Development


Building a Chat Lens serves as an educational endeavor,
providing a practical application for developing skills in data
analysis, natural language processing, and data visualization.
The project aligns with the objective of empowering
individuals to navigate and interpret data effectively.

1.2.5. Privacy and Security Consciousness


Privacy-conscious users seek alternatives to external services
for analyzing their chat data. The project addresses this
motivation by enabling users to perform analyses locally,
ensuring control over the handling and storage of sensitive
communication data.

In summary, the motivations behind the development of Chat Lens are
diverse, encompassing personal development, team collaboration
enhancement, and the acquisition of valuable skills. The project serves as
a tool for users to gain deeper insights into their communication and
financial habits, fostering a data-driven approach to improve and
optimize interpersonal interactions.

1.3 PROJECT OVERVIEW/SPECIFICATIONS


The Chat Lens project extracts and analyzes user chat data to provide
insights into communication habits. By focusing on sentiment analysis,
user engagement metrics, and keyword extraction, the tool aims to
enhance user understanding of historical chat and data patterns. The
project prioritizes privacy by operating locally, and its user-friendly
interface includes visualizations for intuitive data interpretation.

The authors aim to develop a complete interface where users can choose
whether to analyze a WhatsApp chat or bank data. After choosing between
the two, the user uploads either the WhatsApp chat in text format, obtained
by exporting the chat from WhatsApp, or the bank’s CSV file. When the user
selects ‘social media chat analyzer’, the tool offers two options for
studying the chats. On submitting the chat, the engine displays the
complete report with interactive graphs that are easy to understand. When
the user selects ‘Bank Data Analysis’, the user gets an in-depth idea of
which groups of people (educated or not, married or single, etc.) are more
or less likely to subscribe to a term deposit. The report will include the
following analyses:

1. Top Statistics
2. Activity timelines and Maps
3. Word Cloud
4. Most Common words
5. Emoji Analysis
6. Sentiment Analysis
7. Does education have any effect on subscription to a term deposit?
8. Are single or married people more likely to subscribe to a term deposit?
9. Which job types of customers are likely to subscribe to a term deposit?

1.4 AIM AND OBJECTIVES

1.4.1 AIM
The primary aim of Chat Lens is to empower users with a sophisticated
tool that not only extracts and organizes chat data and
employee’s data but goes further to offer meaningful insights
into their communication and financial habits. By leveraging data
analytics and natural language processing, the analyzer seeks to
provide users with a deeper understanding of the emotional
nuances, engagement dynamics, and prevalent topics within their
conversations.

1.4.2 OBJECTIVE
In this decade, upcoming technologies depend mainly on data.
Such data can be obtained only through research guided by the
requirements of the tool in question. Many machine learning
enthusiasts develop models to solve a variety of problems, so the
demand for appropriate data is very large. This project aims to
provide a better understanding of various types of chats. The
resulting analysis can serve as better input to machine learning
models that explore chat data; these models require proper learning
instances to achieve better accuracy. Our project aims to provide an
in-depth exploratory data analysis of various types of chats.

Chapter 2
LITERATURE SURVEY

2.1 INTRODUCTION
A survey of WhatsApp Chat Exploratory Data Analysis [1] has been
conducted, and various studies and analyses have been found. These
studies note that WhatsApp has been the most used mode of
communication, and an efficient one too. It hosts many group and
individual conversations, so there may be hidden facts in them. This
project takes those chats and provides a deep analysis of the data.
Whatever the topic of the chats, it provides the analysis in an
efficient and accurate way.

Another survey, WhatsApp Chat Analyzer [2], states that the most used and
efficient method of communication in recent times is the application
WhatsApp. WhatsApp chats consist of various kinds of conversations held
among groups of people, covering various topics. This information can
provide a large amount of data for recent technologies such as machine
learning.

A survey of how the application is also growing in schools is presented in
WhatsApp Goes To School [6]. It turns out that class WhatsApp groups are
used for four main purposes: communicating with students; nurturing the
social atmosphere; creating dialogue and encouraging sharing among
students; and as a learning platform. The participants mentioned the
technical advantages of WhatsApp, such as simple operation, low cost,
availability, and immediacy.
Python is a general-purpose language with an easily understandable syntax.
It is an effective and powerful language that lets programmers transfer
their skills, and it can be used in scientific research for theoretical
calculations and data analysis [4]. It is well suited to statistics and has
specific advantages such as strong features for data visualization.

Chinthapanti Bharath Sai Reddy et al., in the research paper
Analysing and Predicting the Emotion of WhatsApp Chats [3], observe that
everyone is curious about what the other person thinks of them during a
conversation, and that judging the other person cannot be done perfectly.
The paper therefore proposes an approach based on sentiment analysis of
the conversation. While chatting with another person, we always wonder
about our image in the other person’s mind. The process involves
preprocessing the data obtained from a WhatsApp chat, which is exported to
a server. Sentiment analysis is then applied to each message, the
sentiments of all messages are normalized using a proposed method, and the
overall sentiment is determined.

2.2 EXISTING SYSTEM


The current system has seen a lot of development. In the older version
there was no feature to display a status, to share documents, or to share
a location. In the current version, all of these features are available.
In the older version, images could not be shared in document format. In
the current system, the user can access WhatsApp on Windows through the
WhatsApp Web application, which is connected by scanning a QR code. There
is another feature called export chat, with which the user can send or
share the chat details for data analysis through email, Facebook, or a
messenger application.

2.3 BENEFITS OF THE PROJECT


Chat Analysis on WhatsApp using Machine Learning [5] analyzed
various benefits of this kind of project. This tool aims to offer a
thorough study of the information that WhatsApp provides. Regardless of
the subject around which the conversation is centered, our generated code
may be used to improve comprehension of the data. This project not only
analyzes WhatsApp chats but also provides insight into the data collected
from a bank about its employees, including insight into term deposit
subscriptions. A benefit of this tool is that it is implemented using
simple Python modules such as pandas, matplotlib, and seaborn, together
with sentiment analysis, which are used to produce data frames and plot
various graphs. Because this approach is effective and
resource-conserving, it can be readily applied to large datasets.

Chapter 3
PROPOSED METHODOLOGY

3.1 PROBLEM FOUNDATION


With the surge in the use of Social Media, users face difficulties in
making sense of the vast amounts of data generated through their chats.
Manual analysis is time-consuming and often impractical, leading to
missed opportunities for understanding user behavior, sentiment, and
communication patterns. Existing tools and methods for WhatsApp chat
analysis and Employee’s data analysis are often limited in their
capabilities, lacking advanced features and user-friendly interfaces. There
is a clear need for an innovative solution that can automate the analysis
process, providing users with actionable insights from their WhatsApp
conversations and financial insights.

The inability to effectively analyze chat data hampers individuals and
businesses from harnessing the full potential of their communication
history. This has implications for personal users seeking to understand
their messaging patterns or sentiments over time, as well as businesses
aiming to derive actionable intelligence for customer engagement,
marketing, and decision-making. By addressing this issue, Chat Lens
aims to empower users with the ability to gain valuable insights, enhance
communication strategies, and improve overall user experience.

This project addresses a pressing need for a comprehensive and
user-friendly tool to analyze chats and data. By automating the analysis
process and integrating advanced analytics, the project aims to empower
users with actionable insights, contributing to improved communication
strategies.

3.2 SYSTEM ANALYSIS AND DESIGN

3.2.1 SYSTEM ANALYSIS


3.2.1.1 FEASIBILITY STUDY
The main objective of the feasibility study is to assess
the technical, operational, and economic feasibility of
developing the application. Feasibility is the
determination of whether or not a project is worth
doing, and the process followed in making this
determination is called a feasibility study. All systems
are feasible, given unlimited resources and infinite
time. The feasibility study conducted for this project
covers:
• Technical Feasibility
• Operational Feasibility
• Economic Feasibility

The technical feasibility study reports whether the
required resources and technologies for project
development exist. It is a measure of the specific
technical solution and of the availability of technical
resources and expertise. In our project we use Jupyter
Notebook (a web-based application) and VS Code (a text
editor), both of which are open source software, along
with various Python libraries.

Economic feasibility analyzes the cost and benefit of
the project, that is, what the final development of the
product will cost. This project has no development
cost, since all the software and technologies used are
open source.

Operational feasibility determines whether the system
will be used after development and implementation. It
analyzes the degree to which the service meets the
requirements, which involves studying the utilization
and performance of the product. Our project shows the
whole analysis of the chats and data among people,
whether two people or a group, and provides various
information using charts in an easily readable format.

3.2.2 SYSTEM DESIGN

3.2.2.1 USE CASE MODEL


• In the use case diagram, the actor is the User.
• The user can use the chat upload use case to give
input to the system.
• The select time format use case lets the user
input the time format of the file to the system.
• The select user use case selects whose analysis
result is desired.
• The user can use the show analysis use case to
see the result of the entire analysis done by the
system.

Fig.3.1.1 Use Case Model(social media chat)

Fig. 3.1.2 Use Case Model(Bank Data Analysis)

3.2.2.2 ACTIVITY DIAGRAM
• In the activity diagram, the initial activity starts when the
user uploads the file as input. In the next action, the time
format is selected.
• The decision box "check chat format" represents the check on
the validity of the time format of the file.
• If the time format is correct, the analysis is performed
and the process ends.
• If the time format is wrong, the user has to check again
for the correct format.

Fig.3.2.1 Activity Diagram

Fig. 3.2.2 Activity Diagram(Social Media Chat)

Fig.3.2.3 Activity Diagram(Employee Data)

3.2.3 PROPOSED WORK


Data pre-processing: the initial part of the project is to understand the
implementation and usage of various Python modules. This helps us
understand why using existing modules is preferable to having the
developer implement those functions from scratch. These modules provide
better code representation and understandability. The libraries used
include streamlit, preprocessor, helper, numpy, pandas, matplotlib, NLTK,
seaborn, etc.

Exploratory data analysis: the first step is to apply a sentiment analysis
algorithm, which identifies the positive, negative, and neutral parts of
the chat and is used to plot a pie chart based on these parameters. The
next steps are to plot a line graph showing the author and message count
for each date, a line graph showing the message count for each author, an
ordered graph of date versus message count, and the media sent by each
author with their counts.
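
The counts and pie-chart shares described above can be sketched in plain Python. This is only an illustration under stated assumptions: the record layout (dictionaries with "user" and "message" keys), the function names, and the idea that sentiment labels have already been produced by some sentiment algorithm are all hypothetical; the report itself works with pandas data frames.

```python
from collections import Counter

# Marker WhatsApp writes in place of a media item in an exported chat.
MEDIA_MARKER = "<Media omitted>"


def messages_per_author(records):
    """Count messages per author, skipping group notifications."""
    return Counter(
        r["user"] for r in records if r["user"] != "group_notification"
    )


def media_per_author(records):
    """Count media items sent by each author."""
    return Counter(r["user"] for r in records if MEDIA_MARKER in r["message"])


def label_shares(labels):
    """Turn sentiment labels into percentage shares for a pie chart.

    The labels are assumed to come from a separate sentiment step.
    """
    total = len(labels)
    if total == 0:
        return {}
    return {label: 100 * n / total for label, n in Counter(labels).items()}
```

The dictionaries returned here could be handed directly to a plotting call such as Matplotlib's `pie()` or `plot()`.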

Chapter 4
IMPLEMENTATION

4.1 INTRODUCTION
This project is a social media chat analyzer built with Python and Streamlit.
The application provides various analyses on a chat log, including top
statistics, activity timelines, activity maps, word cloud, most common
words, emoji analysis, and sentiment analysis. The analysis can be done for
a specific user or for the overall chat.

4.2 IMPLEMENTATION STRATEGY

4.2.1 SEQUENCE DIAGRAM


Actors:
• User
• Front-end
• Server

Steps:
4.2.1.1. User Initiates Chat Upload:
• The user interacts with the front-end interface to upload a
chat file or a CSV file, as appropriate.

4.2.1.2. Front-end Validates Time Format and File Format:
• Upon upload, the front-end checks the time format of the
uploaded chat data to ensure it matches the time format
selected by the user, and also checks whether the uploaded
file is a .csv file. If the formats do not match:
• The front-end displays an error message.
• The sequence terminates, awaiting user action to correct the error.

4.2.1.3. Front-end Sends Data to Server:
• If the time format matches, the front-end sends the chat or
data to the server for analysis.

4.2.1.4. Server Receives Data:
• The server receives the uploaded chat or data and the
selected time format from the front-end.

4.2.1.5. Server Performs Analysis:
• The server analyzes the chat or data based on the specified
time format, performing operations such as sentiment
analysis, word frequency analysis, emoji analysis, or
term deposit analysis.

4.2.1.6. Server Generates Analysis Results:
• After completing the analysis, the server generates the
results, which may include sentiment scores, word
frequency lists, or emoji usage statistics.

4.2.1.7. Server Sends Results to Front-end:
• The server sends the analysis results back to the front-end
for display to the user.

4.2.1.8. Front-end Displays Results:
• The front-end receives the analysis results from the server
and displays them to the user on the interface.

4.2.1.9. Interaction Ends:
• The interaction between the user, front-end, and server
concludes, and the sequence diagram terminates.

Fig 4.1 Sequence Diagram
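
The validation in step 4.2.1.2 can be sketched as follows. The report does not list the time formats it supports or how it detects them, so the `TIME_FORMATS` table, the assumed line layout ("timestamp - user: message"), and all function names are illustrative assumptions rather than the project's actual code.

```python
from datetime import datetime

# Hypothetical mapping from the user's choice to a strptime format.
TIME_FORMATS = {
    "12-hour": "%d/%m/%y, %I:%M %p",
    "24-hour": "%d/%m/%y, %H:%M",
}


def first_timestamp(chat_text):
    """Return the text before the first ' - ' on the first line, or None."""
    first_line = chat_text.splitlines()[0] if chat_text.strip() else ""
    head, sep, _ = first_line.partition(" - ")
    return head if sep else None


def matches_time_format(chat_text, choice):
    """True if the first timestamp parses with the chosen format."""
    stamp = first_timestamp(chat_text)
    if stamp is None:
        return False
    try:
        datetime.strptime(stamp, TIME_FORMATS[choice])
    except ValueError:
        return False
    return True


def is_csv_upload(filename):
    """True if the uploaded file name ends in .csv (case-insensitive)."""
    return filename.lower().endswith(".csv")
```

When `matches_time_format` returns False, the front-end would show the error message and wait for the user to pick another format, as the steps describe.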

4.2.2 CONCEPTUAL LEVEL STATE DIAGRAM


Below is an elaboration of the conceptual level state diagram,
illustrating the process of uploading a file, selecting the time format,
performing analysis, and displaying results, with the added capability
for the user to select whose analysis they want to see:

4.2.2.1. States:
• File Upload: Initial state where the user uploads the chat file.
• Select Time Format: State where the user selects the
desired time format for analysis.
• Analysis: State where the analysis of the chat data is performed.
• Display Overall Result: State where the overall analysis
result is displayed on the user interface.
• Select User for Specific Analysis: State where the user
selects whose analysis they want to see.
• Display User-Specific Result: State where the analysis
result for the selected user is displayed on the user interface.

4.2.2.2. Transitions:

1. Upload File to Select Time Format:
• Trigger: Successful upload of the chat file.
• Action: Transition to the "Select Time Format" state.

2. Select Time Format to Analysis:
• Trigger: Valid time format selection.
• Action: Transition to the "Analysis" state.

3. Select Time Format to Invalid Time Format Error:
• Trigger: Invalid time format selection.
• Action: Transition to a state indicating an error, prompting
the user to select the correct time format.

4. Analysis to Display Overall Result:
• Trigger: Completion of analysis.
• Action: Transition to the "Display Overall Result" state.

5. Display Overall Result to Select User for Specific Analysis:
• Trigger: User selects an option to view analysis for a specific user.
• Action: Transition to the "Select User for Specific Analysis" state.

6. Select User for Specific Analysis to Display User-Specific Result:
• Trigger: User selects a specific user for analysis.
• Action: Transition to the "Display User-Specific Result" state.

4.2.2.3. Additional Feature:


User Interaction for Specific Analysis:
• After the overall result is displayed, the user has the
option to select whose analysis they want to see.
• This feature allows for a more detailed analysis tailored to
specific users' contributions to the chat.

Fig.4.2 State Diagram
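
The states and transitions listed above can be written down as a small transition table. This is a minimal sketch, not the project's code: the event names are invented for illustration, and the "retry" transition out of the error state is an assumption drawn from the statement that the user is prompted to select the correct time format.

```python
from enum import Enum, auto


class State(Enum):
    FILE_UPLOAD = auto()
    SELECT_TIME_FORMAT = auto()
    INVALID_TIME_FORMAT = auto()
    ANALYSIS = auto()
    DISPLAY_OVERALL = auto()
    SELECT_USER = auto()
    DISPLAY_USER = auto()


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.FILE_UPLOAD, "file_uploaded"): State.SELECT_TIME_FORMAT,
    (State.SELECT_TIME_FORMAT, "valid_format"): State.ANALYSIS,
    (State.SELECT_TIME_FORMAT, "invalid_format"): State.INVALID_TIME_FORMAT,
    # Assumption: the user returns to format selection after an error.
    (State.INVALID_TIME_FORMAT, "retry"): State.SELECT_TIME_FORMAT,
    (State.ANALYSIS, "analysis_done"): State.DISPLAY_OVERALL,
    (State.DISPLAY_OVERALL, "choose_user"): State.SELECT_USER,
    (State.SELECT_USER, "user_selected"): State.DISPLAY_USER,
}


def next_state(state, event):
    """Look up the next state, rejecting events the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"no transition from {state.name} on {event!r}"
        ) from None


def run(events, start=State.FILE_UPLOAD):
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

Keeping the transitions in one table makes it easy to check the diagram against the code: every arrow in Fig. 4.2 should correspond to exactly one entry.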

4.2.3 COLLABORATION DIAGRAM


• This collaboration diagram shows the relationship
between the objects in a system.
• An object consists of several features. Multiple objects
present in the system are connected to each other.

Fig.4.3 Collaboration Diagram

4.2.4 ALGORITHM

4.2.4.1 DATA PRE-PROCESSING

Input:
• A string, data, containing the raw export from a WhatsApp chat or a .csv file.
Output:
• A Pandas DataFrame with structured chat data, including
timestamps, user names, messages, and additional date-time
components.

Steps:
1. Initialize Regular Expression Patterns:
• A pattern to identify timestamps and delimiters in the chat log.

2. Split the Data:
• Use the timestamp pattern to split the data into individual
messages, excluding the very first split, which does not contain a
message.
• Find all instances of timestamps using the same pattern to
create a list of dates.

3. Create Initial DataFrame:
• Construct a Pandas DataFrame with two columns:
User_message for the messages and message_date for the
timestamps.

4. Format and Rename Date Column:
• Convert the message_date column to datetime format using
the specified format.
• Rename the message_date column to date.

5. Extract Users and Messages:
• Initialize empty lists for users and messages.
• Iterate through each entry in User_message, splitting it into the
user and message parts based on a colon followed by a space,
which separates the user name from their message.
• For messages that follow the standard format (user name
followed by message), add the user name and message to their
respective lists.
• For messages that do not follow the standard format (e.g.,
notifications), label them as group_notification and add the
whole text as the message.

6. Update DataFrame:
• Add new columns for user and message to the DataFrame, and
remove the original User_message column.

7. Extract Additional DateTime Components:
• Add columns for the year, month (as name), hour, minute,
numeric month, date only (without time), and day name
extracted from the date column.

8. Determine Message Period:
• Initialize an empty list for the message period.
• Iterate over each hour value in the DataFrame, assigning a
period based on the hour. The period is a string representing
the range of hours in which the message was sent (e.g.,
"23-00" for messages sent at 23:00).
• Add the period list as a new column in the DataFrame.

9. Return the Processed DataFrame:
• The DataFrame now contains structured data from the
WhatsApp chat log, with detailed datetime components and
separated user names and messages, ready for analysis.
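
The nine steps above can be sketched in plain Python as follows. Several details are assumptions made for illustration: the timestamp layout ("dd/mm/yy, HH:MM - "), the names `TIMESTAMP`, `hour_period`, and `preprocess`, and the choice to return a list of dictionaries instead of a Pandas DataFrame so that the sketch needs only the standard library. The field names follow the columns named in the steps.

```python
import re
from datetime import datetime

# Assumed export layout: "dd/mm/yy, HH:MM - user: message".
TIMESTAMP = re.compile(r"\d{1,2}/\d{1,2}/\d{2},\s\d{1,2}:\d{2}\s-\s")


def hour_period(hour):
    """Label the one-hour window a message falls in, e.g. 23 -> '23-00'."""
    return f"{hour:02d}-{(hour + 1) % 24:02d}"


def preprocess(data, fmt="%d/%m/%y, %H:%M - "):
    """Turn a raw chat export into one record per message."""
    # The first split piece precedes any timestamp, so it is dropped.
    messages = TIMESTAMP.split(data)[1:]
    dates = TIMESTAMP.findall(data)

    records = []
    for raw_date, user_message in zip(dates, messages):
        date = datetime.strptime(raw_date, fmt)
        # "user: message" is split at the first colon followed by a space.
        user, sep, message = user_message.partition(": ")
        if not sep:
            # No user prefix: treat the whole text as a notification.
            user, message = "group_notification", user_message
        records.append({
            "date": date,
            "user": user,
            "message": message,
            "only_date": date.date(),
            "year": date.year,
            "month_num": date.month,
            "month": date.strftime("%B"),
            "day_name": date.strftime("%A"),
            "hour": date.hour,
            "minute": date.minute,
            "period": hour_period(date.hour),
        })
    return records
```

If a DataFrame is wanted, as in the report, the result can be wrapped with `pandas.DataFrame(records)`. Lines without a timestamp stay attached to the preceding message, so multi-line messages are kept whole.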

4.2.4.2 DAILY AND MONTHLY ACTIVITY TIMELINE


1. Set Up Visualization Titles:
• Display a title for the monthly activity timeline using Streamlit's
st.title() function.

2. Generate Monthly Activity Data:
• Call a helper function (presumed to be monthly_timeline),
passing the selected_user and the chat DataFrame df as
arguments. This function is expected to aggregate the data on a
monthly basis, counting messages for each month for the
selected user or for all users if none is specified.

3. Create Monthly Activity Plot:
• Initialize a figure and axes object with a specified size using
Matplotlib's plt.subplots().
• Define a color palette with Seaborn to enhance the visual appeal.
• Plot the monthly activity data using Seaborn's lineplot()
function, with time on the x-axis and message count on the
y-axis, applying the defined color palette and line width.
• Rotate x-axis labels for better readability and label axes with
appropriate font sizes.
• Display the plot in the Streamlit app using st.pyplot().

4. Set Up Daily Activity Timeline Visualization:


 Similar to step 1, use Streamlit to display a title for the daily
activity timeline.

5. Generate Daily Activity Data:


 Invoke another helper function (presumed to be
daily_timeline) with selected_user and df to aggregate
message data on a daily basis.

6. Create Daily Activity Plot:


 Repeat the plotting process as in step 3, but this time plot the
daily activity data. The x-axis now represents individual dates
(only_date), and the y-axis represents the count of messages
per day.
 Ensure labels, rotation, and plot aesthetics match the monthly
activity plot for consistency.

7. Display Plots:
 Utilize Streamlit's st.pyplot() function to render each plot in
the web application, allowing users to visually analyze the
messaging activity over time, both on a monthly and daily
basis.
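
The two helper functions referred to above might look roughly as follows. This is only a sketch: the column names year, month_num, month, user and message are assumptions (only_date is the name used in this report), and the plotting and Streamlit calls are left out.

```python
import pandas as pd


def filter_user(df, selected_user):
    """Keep only one user's rows unless "Overall" is selected."""
    if selected_user != "Overall":
        df = df[df["user"] == selected_user]
    return df


def monthly_timeline(selected_user, df):
    """Count messages per (year, month), in chronological order."""
    df = filter_user(df, selected_user)
    timeline = (
        df.groupby(["year", "month_num", "month"])["message"]
        .count()
        .reset_index()
    )
    # Label such as "January-2024", intended for the x-axis of the plot.
    timeline["time"] = timeline["month"] + "-" + timeline["year"].astype(str)
    return timeline


def daily_timeline(selected_user, df):
    """Count messages per calendar date."""
    df = filter_user(df, selected_user)
    return df.groupby("only_date")["message"].count().reset_index()
```

The returned DataFrames can then be passed to the plotting step, with the time or only_date column on the x-axis and the message column on the y-axis.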

4.2.4.3 WORD CLOUD


1. Load Stopwords:
 Open and read a text file containing stopwords. These are
common words (like "the", "is", "in") that you wish to exclude
from the word cloud for better relevance.

2. Filter Data by Selected User (if applicable):


 If a specific user is selected (i.e., selected_user is not
"Overall"), filter the DataFrame df to include messages only
from this user.

3. Exclude Non-Relevant Messages:


 Further filter the DataFrame to exclude messages from
'group_notification' and messages that contain '<Media
omitted>\n', which indicates non-text content that can't be
rendered in a word cloud.

4. Remove Stopwords from Messages:


 Define a function remove_stop_words that takes a message as
input, converts it to lowercase, splits it into words, and
removes any word found in the loaded list of stopwords.
 Apply this function to every message in the filtered
DataFrame. This cleans the text data and makes it ready for
word cloud generation.

5. Generate Word Cloud:


 Initialize a WordCloud object with specified dimensions,
minimum font size, and background color.
 Generate the word cloud by combining all the cleaned
messages into a single string (using .str.cat(sep=" ")) and
passing this string to the generate method of the WordCloud
object. This creates a word cloud image where the size of each
word indicates its frequency in the messages.

6. Return Word Cloud Image:


 The WordCloud object, now containing the generated word
cloud image, is returned. This image can then be displayed in
the application interface.
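
A minimal sketch of steps 4 and 5 is given below. The function names build_corpus and make_word_cloud, and the image size, font size and background colour, are illustrative assumptions rather than the values used in the project. The wordcloud package is imported inside the last function so that the text-cleaning helpers can be used without it.

```python
def remove_stop_words(message, stop_words):
    """Lowercase a message and drop every word found in stop_words."""
    return " ".join(w for w in message.lower().split() if w not in stop_words)


def build_corpus(messages, stop_words):
    """Clean each message and join the results into a single string."""
    cleaned = (remove_stop_words(m, stop_words) for m in messages)
    return " ".join(c for c in cleaned if c)


def make_word_cloud(corpus):
    """Render the corpus; needs the third-party wordcloud package."""
    from wordcloud import WordCloud  # lazy import, see the note above

    # Assumed settings; the report does not state the actual values.
    cloud = WordCloud(width=500, height=500, min_font_size=10,
                      background_color="white")
    return cloud.generate(corpus)
```

The object returned by make_word_cloud can be drawn with Matplotlib's imshow() and shown in the Streamlit page.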

4.2.4.4 MOST COMMON WORDS


1. Load Stopwords:
 Open and read the stopwords.txt file to load stopwords into
memory. These words will be excluded from the analysis to
focus on more meaningful words in the chat messages.

2. User Selection Filtering:


 If the function is called with a specific user selected (selected_user
!= "Overall"), filter the DataFrame df to include messages only
from this user, narrowing down the analysis to their
contributions.

3. Preprocess Messages:
 Convert all messages in the DataFrame to lowercase and strip
leading and trailing whitespaces. This standardization helps in
accurately counting word frequencies by treating the same
words in different cases as identical.

4. Exclude Non-relevant Messages:


 Remove messages that are not relevant to word frequency
analysis, such as '<Media omitted>', 'group_notification', and
variants thereof. These entries typically do not contain text that
contributes to the conversation's content in a meaningful way.

5. Collect and Filter Words:


 Iterate through each message in the filtered DataFrame. Split
each message into words and collect those words into a list,
excluding any word that is in the loaded list of stopwords. This
step focuses the analysis on relevant words by removing
common, less informative words.

6. Count Word Frequencies:


 Utilize the Counter class to count the frequencies of each word
in the collected list. Counter produces a dictionary-like object
where keys are the words and values are the counts of those
words.

7. Prepare the Output DataFrame:


 Convert the Counter object into a DataFrame, taking the 20
most common words (.most_common(20)) for a focused
analysis. Rename the columns of this DataFrame to "Words"
and "Frequency" for clarity.

8. Return the Result:


 The function returns the DataFrame most_common_df, which
contains the top 20 most frequent words used in the messages
(after applying all filters and exclusions), along with their
respective frequencies.
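
The counting logic of steps 3 to 7 can be sketched with the standard library alone. For brevity this sketch takes plain (user, message) pairs instead of a DataFrame and returns a list of (word, count) tuples; the names most_common_words and EXCLUDED are hypothetical.

```python
from collections import Counter

# Messages (after lowercasing and stripping) that carry no text content.
EXCLUDED = {"<media omitted>"}


def most_common_words(rows, stop_words, top_n=20):
    """rows is an iterable of (user, message) pairs."""
    words = []
    for user, message in rows:
        text = message.lower().strip()
        if user == "group_notification" or text in EXCLUDED:
            continue
        words.extend(w for w in text.split() if w not in stop_words)
    # most_common() returns (word, count) pairs, most frequent first.
    return Counter(words).most_common(top_n)
```

Passing the returned list to pandas.DataFrame with columns "Words" and "Frequency" would give the output table described in step 7.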

4.2.4.5 EMOJI ANALYSIS


1. Filter Messages by Selected User (if applicable):
 If a specific user is selected (selected_user != "Overall"), the
function filters the DataFrame df to include only messages
from this user. This allows for a focused analysis on the usage
patterns of emojis by the selected user.

2. Extract Emojis from Messages:


 Initialize an empty list emojis to hold all extracted emojis from
the messages.
 Iterate through each message in the filtered DataFrame.
 For each message, check each character: convert it to its
demojized form (a textual representation produced by the
emoji.demojize method) and compare the result with the
original character. Characters that demojize to a different
string are treated as emojis and added to the emojis list.

3. Count Emoji Frequencies:


 Use the Counter class to count the frequencies of each
extracted emoji. The Counter is initialized with the emojis list,
producing a dictionary-like object where keys are the emojis
and values are their counts.

4. Prepare the Output DataFrame:


 Convert the Counter object to a DataFrame to facilitate easy
visualization and analysis. The conversion uses .most_common()
to sort emojis by their frequency, ensuring the most used
emojis are listed first.
 Rename the DataFrame columns to "Emoji" and "Frequency" for
better readability.

5. Return the Result:


 The function returns emoji_df, a DataFrame containing emojis
and their frequencies, sorted from most to least common. This
DataFrame is ready for further analysis or visualization,
providing insights into the emotional or expressive elements of
the chat conversations.
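
The character-by-character check described in step 2 can be sketched as follows. To keep the example independent of any installed package, the demojize function is passed in as a parameter, which is an assumption of this sketch; with the emoji package installed, emoji.demojize would be supplied for it.

```python
from collections import Counter


def extract_emojis(messages, demojize):
    """Collect every character whose demojized form differs from itself."""
    found = []
    for message in messages:
        found.extend(ch for ch in message if demojize(ch) != ch)
    return found


def emoji_frequencies(messages, demojize):
    """Return (emoji, count) pairs, most frequent first."""
    return Counter(extract_emojis(messages, demojize)).most_common()
```

A call such as emoji_frequencies(df["message"], emoji.demojize) would then give the data for the "Emoji" and "Frequency" columns. Note that a per-character check treats multi-character emoji sequences (for example, skin-tone variants) as separate characters.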

4.2.4.6 SENTIMENT ANALYSIS


1. Initialize Sentiment Analyzer:
 Create an instance (referred to below as sia) of
SentimentIntensityAnalyzer from the nltk library to analyze
the sentiment of text messages.

2. Prepare User Sentiments Dictionary:


 Create a dictionary (user_sentiments) to hold sentiment scores
for each user, structured as {user: {'compound': 0.0, 'pos': 0.0,
'neu': 0.0, 'neg': 0.0}}.

3. Iterate Over Messages:


 Loop through each message in the DataFrame df, extracting
the user and the message.

4. Text Sentiment Analysis:


 For each message, calculate sentiment scores using
sia.polarity_scores(message) and update the user's cumulative
sentiment scores in user_sentiments.

5. Emoji Sentiment Analysis (if applicable):


 Extract emojis from the message.
 Assign predefined sentiment scores to specific emojis
(positive, negative, neutral).
 Update the user's cumulative sentiment scores with emoji
sentiment values.

6. Average Sentiment Scores:


 For each user, divide the cumulative sentiment scores by the
total number of messages sent by that user to get the average
sentiment scores.

7. Filter by Selected User (if applicable):


 If a specific user is selected, return sentiment scores for that
user only. If "Overall" is selected, return the sentiment scores
for all users.
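
The accumulation and averaging in steps 2 to 6 can be sketched as below. The scoring function is passed in as the hypothetical parameter score_fn; with NLTK and its VADER lexicon available, SentimentIntensityAnalyzer().polarity_scores could be supplied. The emoji-based adjustment of step 5 is omitted because the report does not list the predefined emoji scores.

```python
KEYS = ("compound", "pos", "neu", "neg")


def average_sentiments(rows, score_fn):
    """rows: iterable of (user, message); returns per-user average scores."""
    totals, counts = {}, {}
    for user, message in rows:
        scores = score_fn(message)
        # Start each new user at zero for every score key.
        bucket = totals.setdefault(user, dict.fromkeys(KEYS, 0.0))
        for key in KEYS:
            bucket[key] += scores[key]
        counts[user] = counts.get(user, 0) + 1
    return {
        user: {key: value / counts[user] for key, value in bucket.items()}
        for user, bucket in totals.items()
    }
```

Filtering for a selected user (step 7) then amounts to looking up that user's entry in the returned dictionary.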

4.2.4.7 SENTIMENT SCORE FUNCTION


1. Calculate Average Sentiments (if Overall):
 If "Overall" is selected, calculate the average positive,
negative, and neutral scores across all users.

2. Extract Selected User's Sentiments:


 If a specific user is selected, extract that user's sentiment scores.

3. Determine Dominant Sentiment:


 Compare the positive, negative, and neutral scores to
determine which is dominant.
 Return a sentiment summary ("Positive 😊", "Negative 😠", or
"Neutral 🙂") based on the dominant sentiment score.
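
A minimal sketch of this function is shown below. It assumes that user_sentiments is a non-empty dictionary of per-user scores with the keys 'pos', 'neg' and 'neu', and that ties are resolved in the order positive, negative, neutral; both points are assumptions of the sketch, and the function names are hypothetical.

```python
LABELS = {"pos": "Positive 😊", "neg": "Negative 😠", "neu": "Neutral 🙂"}


def overall_scores(user_sentiments):
    """Average the pos/neg/neu scores across all users."""
    n = len(user_sentiments)
    return {
        key: sum(scores[key] for scores in user_sentiments.values()) / n
        for key in LABELS
    }


def sentiment_summary(selected_user, user_sentiments):
    """Return the label of the largest of the three scores."""
    if selected_user == "Overall":
        scores = overall_scores(user_sentiments)
    else:
        scores = user_sentiments[selected_user]
    # max() keeps the first key on ties, i.e. pos, then neg, then neu.
    dominant = max(LABELS, key=lambda key: scores[key])
    return LABELS[dominant]
```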

4.2.4.8 BANK DATA ANALYSIS


1. Import necessary libraries:
 streamlit for creating the web application.
 pandas and numpy for data manipulation.
 matplotlib.pyplot and seaborn for data visualization.

2. Define the bank_app() function:


 Set the title of the web application as "Bank Data Analysis".
 Allow users to upload a CSV file.
 If a file is uploaded, read the data into a DataFrame.
 Define numerical and categorical columns.

3. Create a matplotlib figure and subplot for categorical columns:


 Initialize a counter to keep track of the subplot index.
 Create a subplot for each categorical column.
 Plot the count of each category using a bar plot.

4. Display the subplot using st.pyplot().

5. Plot a bar chart for the 'deposit' variable.

6. Plot bar charts to analyze the relationship between 'deposit' and other
categorical variables:
 'job'
 'marital'
 'education'

7. Run the Streamlit application.
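
The aggregation behind steps 3 to 6 can be sketched with pandas alone. The helper names category_counts and deposit_by are hypothetical, and the Streamlit upload and plotting calls are left out; the column names 'deposit' and 'job' are the ones listed above.

```python
import pandas as pd


def category_counts(df, column):
    """Number of rows per category, e.g. for the 'deposit' bar chart."""
    return df[column].value_counts()


def deposit_by(df, column, target="deposit"):
    """Cross-tabulate a categorical column against the deposit outcome."""
    return pd.crosstab(df[column], df[target])
```

With Matplotlib installed, a table such as deposit_by(df, "job") can be drawn with its .plot(kind="bar") method and the resulting figure passed to st.pyplot().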

4.3 HARDWARE AND SOFTWARE REQUIREMENTS

4.3.1 HARDWARE SPECIFICATIONS


This section describes the logical and physical characteristics
of each interface between our software and the hardware
components of the system.
 Hardware required: any device that supports a web browser.
 Supported device types: the software is developed for
Windows (32-bit/64-bit), Android, and similar platforms.
 Nature of the data and control interactions between
the software and the hardware: an Internet connection.

4.3.2 SOFTWARE SPECIFICATIONS


Operating systems: the software is developed to run on all
operating systems.
The software depends on the following libraries:
 Streamlit
 NumPy
 Pandas
 WordCloud
 Regular expressions (re)
 Pyplot
 collections
 Matplotlib
 Emoji
 NLTK (nltk.sentiment)
 URLExtract

4.4 EXPECTED OUTCOME

Fig. 4.4 Chat Export

Fig. 4.5 Daily Activity Timeline

Fig. 4.6 Monthly Activity Timeline

Fig. 4.7 Emoji Analysis

Fig. 4.8 Most Common Words

Fig. 4.9 Word Cloud

Fig. 4.10 Weekly Activity Map

Fig. 4.11 Most Busy Users

Fig. 4.12 Sentiment Analysis

Fig. 4.13 Top Statistics

Fig. 4.14 Analysis 1

Fig. 4.15 Analysis 2

Fig. 4.16 Analysis 3

Chapter 5
RESULT AND DISCUSSION

The Chat Lens developed as part of this project successfully enables users to
upload chat files, analyze chat data based on selected time formats, perform
sentiment analysis, and visualize results. The system allows users to view
overall analysis results and also drill down to see individual user-specific
analysis.
Discussion:
 User Experience: The system enhances user experience by providing a
user-friendly interface for uploading chat files, selecting analysis
options, and visualizing results. Interactive features such as
user-specific analysis let users customize their analysis
experience.
 Insights for Communication Analysis: The chat analyzer offers
valuable insights for communication analysis in various contexts, such
as group projects, team collaborations, or social interactions, while
the bank data analysis shows whether customers subscribe to a deposit
or not. Understanding communication patterns, sentiment trends, and
individual contributions can inform decision-making and improve
communication strategies.
 Potential Applications: Beyond personal use, the chat or data analyzer
can find applications in academic projects, research studies, and
organizational analyses. For instance, in a college project report, the
analyzer can be used to analyze group chat conversations among
project members to assess communication effectiveness, identify key
topics of discussion, and evaluate team dynamics.

Chapter 6
CONCLUSION AND FUTURE WORK

6.1 CONCLUSION
In conclusion, WhatsApp chats and bank records are rich sources of data,
and the Python programming language is well suited to implementing the
data analysis intended here. This work discussed the WhatsApp
application and Python libraries in order to build an analysis of a
WhatsApp chat. We employed dataset manipulation techniques to gain a
better understanding of the WhatsApp chats present on our phones. The
system shows the most used emojis and the most frequently repeated
words. It also tracks our conversations over time and analyzes how much
time we are spending on them. The system was implemented in Python,
using libraries that include NumPy, Pandas, Matplotlib and Seaborn. At
the end of the work the expected results were obtained, and the analysis
was able to show the level of participation of the various individuals
in a given group chat. The system is able to analyze any exported
WhatsApp chat or any CSV data file uploaded to it.

6.2 FUTURE WORK


6.2.1. Support for Additional Analysis Metrics:
Expand the analysis capabilities to include additional metrics
such as topic modeling, named entity recognition, or language
translation. This would provide users with a more
comprehensive understanding of chat content and facilitate
deeper insights.

6.2.2. Real-Time Analysis:


Implement real-time analysis capabilities to enable users to
analyze chat conversations as they occur. This would be
particularly useful for monitoring ongoing discussions or
conducting live sentiment analysis during events or meetings.
6.2.3. Integration with External Data Sources:
Integrate the chat analyzer with external data sources such as
social media platforms or project management tools to
aggregate and analyze communication data from multiple
channels. This would provide users with a holistic view of
communication dynamics across various platforms.

6.2.4. Interactive Visualization Features:


Enhance the visualization capabilities by adding interactive
features such as drill-down functionality, filtering options, or
dynamic charts. This would allow users to explore and interact
with analysis results more effectively, gaining deeper insights
into chat content.

6.2.5. Customizable Analysis Settings:


Provide users with customizable analysis settings to tailor the
analysis process according to their specific requirements. This
could include options to customize sentiment analysis
thresholds, adjust time window settings, or define custom stop-
word lists.

6.2.7. Integration with Collaboration Tools:


Integrate the chat analyzer with popular collaboration tools such
as Slack, Microsoft Teams, or Google Workspace to provide
seamless communication analysis within existing workflows.
This would streamline the analysis process and enhance
collaboration among team members.

6.2.8. Machine Learning Models for Analysis:


Explore the use of machine learning models for more advanced
analysis tasks such as sentiment classification, topic modeling,
or user profiling. Training machine learning models on chat data
can improve the accuracy and granularity of analysis results.

6.2.9. Cross-Platform Compatibility:
Ensure cross-platform compatibility to support analysis of chat
data from various messaging platforms beyond WhatsApp. This
would make the analyzer more versatile and adaptable to
different communication ecosystems.

6.2.10. Privacy and Security Enhancements:


Implement robust privacy and security measures to protect
sensitive chat data and ensure compliance with data protection
regulations. This may include encryption of chat data, user
authentication mechanisms, and access control policies.

6.2.11. User Feedback and Iterative Improvements:


Solicit feedback from users to identify pain points, usability
issues, and feature requests. Continuously iterate and improve
the chat analyzer based on user feedback to enhance user
satisfaction and adoption.

REFERENCES
1. Anurag Kumar Singh, Rishabh Bhatia, Dr. Praveena Akki, "WhatsApp
Chat Exploratory Data Analysis", Volume 11, Issue V, May 2023.
Available at www.ijraset.com.

2. Ravishankara K, Dhanush, Vaisakh, Srajan I S, "WhatsApp Chat
Analyzer", Volume 09, Issue 05 (May 2020).

3. Chinthapanti Bharath Sai Reddy, Kowshik S, Rakesh Kumar M V, O
Nikhil Kumar Reddy, Gopichand G, "Analysing and Predicting the
Emotion of WhatsApp Chats Using Sentiment Analysis", Volume 83,
Page Number: 15454-15461, Publication Issue: March-April 2020.

4. Dr. D. Lakshminarayanan, S. Prabhakaran, "Dogo Rangsang
Research Journal", UGC Care Group I Journal, Vol-10, Issue-07,
No. 12, July 2020.

5. Lakshya Munjal, Anmol Arora, Neetu Narwal, "Sentiment Analysis of
WhatsApp Group Chat", Volume 4, January-December 2018, ISSN No.
2395-5457.

6. Dan Bouhnik and Mor Deshen, "WhatsApp Goes to School: Mobile
Instant Messaging between Teachers and Students", Journal of
Information Technology Education: Research, 13,
from http://www.jite.org/documents/Vol13/JITEv13ResearchP217-231Bouhnik0601.pdf

7. Marada Pallavi, Meesala Nirmala, Modugaparapu Sravani,
Mohammad Shameem, Dr. K. Soumya, "WHATSAPP CHAT
ANALYSIS", Volume:04/Issue:05/May-2022.
