e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:04/Issue:05/May-2022 Impact Factor- 6.752 www.irjmets.com
WHATSAPP CHAT ANALYSIS
Marada Pallavi*1, Meesala Nirmala*2, Modugaparapu Sravani*3, Mohammad
Shameem*4, Dr. K. Soumya*5
*1,2,3,4Student, Department Of Computer Science And Systems Engineering Andhra University
College Of Engineering For Women, Visakhapatnam, Andhra Pradesh, India.
*5Project Guide, Department Of Computer Science And Systems Engineering Andhra University
College Of Engineering For Women, Visakhapatnam, Andhra Pradesh, India.
ABSTRACT
WhatsApp is one of the most widely used modes of communication and an efficient one. It hosts many conversations, both in groups and between individuals, and these conversations may contain hidden patterns. This project takes exported chats and provides a detailed analysis of the data. Whatever the topic of the chats, it produces the analysis efficiently and accurately. The main advantage of this project is that it is built using libraries such as pandas, seaborn, matplotlib and emoji, which are used to create data frames and plot graphs efficiently.
Keywords: WhatsApp Chat, Python, Streamlit, Analysis, Natural Language Processing, Emoji, Pandas, Matplotlib.
I. INTRODUCTION
WhatsApp Chat Analyzer is a tool for analyzing WhatsApp chats. A chat file is exported from WhatsApp and the tool generates various plots and graphs showing, for example, the number of messages, emojis or images sent by each person and the most active member of a group. It helps us gain a better understanding of our WhatsApp chats. The system is based on data pre-processing and data analysis. Pre-processing is the first step, and it plays a major role in machine learning. Before the libraries can be applied, the data has to be pre-processed and stored in an efficient way.
II. LITERATURE REVIEW
2.1 Literature review on WhatsApp Chat Analysis:
A survey analysis on the usage and impact of WhatsApp Messenger [1] reports various studies and analyses, including studies of the impact of WhatsApp on students (youth).
The survey found that in the southern part of India, people aged 18 to 23 spend around 8 hours a day using WhatsApp and sometimes remain online for 12 to 16 hours a day. Most of them agreed that they use WhatsApp more than any other site. They exchange images, audio and video. The survey also reported that WhatsApp is the most widely used app on smartphones. It was conducted to understand the positive and negative impacts of using WhatsApp.
Since the survey indicates that WhatsApp is the app most used by the youth and by other generations, our project can give these users insights into their chats and reveal facts they were not aware of.
2.2 Literature review on Modules:
a. Streamlit: Streamlit is a free and open-source Python framework [2]. It lets us quickly develop web apps for Machine Learning and Data Science. Streamlit integrates easily with other popular Python packages such as NumPy, Pandas, Matplotlib and Seaborn, and provides a fast way to develop and deploy web apps.
b. Matplotlib: Matplotlib is a popular Python package used for data visualization. It is a cross-platform library for making plots from data in arrays. It helps in creating static, animated and interactive visualizations in Python.
c. Seaborn: Seaborn is a data visualization library used for making statistical graphs. Visualization is the central part of Seaborn, which supports exploration and better understanding of data. Seaborn integrates closely with pandas data structures.
d. Word cloud: Word Cloud is a data visualization library used for representing the most frequently used words within a given text. The most frequent and important words are shown in a bigger and bolder font.

www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science


e. Pandas:
• Pandas is an open-source Python library. It is used to convert string data into a data frame, a representation of data as a 2-dimensional table of rows and columns. Pandas lets us work with large data sets and has many built-in functions for data analysis, data cleaning, data exploration and data manipulation.
• In 2008, developer Wes McKinney started developing pandas because he needed a high-performance, flexible tool for data analysis.
2.3 Literature review on Natural Language Processing:
This research [3] gathered all scientific publications in urban studies that used NLP methods. To conduct it, journal articles and conference papers were taken from the databases EBSCO Urban Studies Abstracts, Scopus, ProQuest, and Web of Science. The research timeframe was “all years,” which means the results contained all publications to date (November 2019). The search settings for each database are listed in Table 1.
Table 1
| Database | Search term | Search field | Subject area | Source/document type | Another filter |
|---|---|---|---|---|---|
| EBSCO Urban Studies Abstracts | “Natural language processing” | “Title, Abstract, or Keywords” | N/A | N/A | N/A |
| Scopus | “Natural language processing” AND (city OR urban) | “Title, Abstract, or Keywords” | “Social sciences” | “Journals OR conference proceedings” | N/A |
| ProQuest | “Natural language processing” AND (city OR urban) | “Anywhere except full text” | N/A | “Conference Papers & Proceedings OR Scholarly Journals” | “Peer reviewed” |
| Web of Science | “Natural language processing” AND (city OR urban) | “Topic” (i.e., title, abstract, author keywords, and Keywords Plus) | N/A | “Article” | N/A |
2.4 Literature review on Python:
Python is a general-purpose language with an easily understandable syntax. It is an effective and powerful language that lets programmers transfer their skills across tasks, and it can be used in scientific research for theoretical calculations and data analysis [4]. It is well suited to statistics and offers strong features for data visualization. Python is free and gives open access to the required tools, which is a fundamental requirement for high-quality science. Unlike MATLAB or LabVIEW, Python can be used for any programming task. Researchers work with raw and complicated data, so they require tools such as those provided by Python to carry out efficient analysis easily.
Python serves scientific work and provides benefits for professors and students (Gergely, I., 2014). Tony, J. (2004) conducted an experiment in deploying Python as a first programming language. The researcher reports that a complex task involving a class took about two hours to solve in C++, whereas one of the students took less than an hour in Python.

Python is high-level, flexible and dynamic, and can be used in a vast domain of applications. It supports a dynamic type system and has a large and comprehensive standard library (Srinath, K.R., 2017). A survey found that Python interpreters are available for many operating systems, such as Windows, Linux, UNIX, Amiga, and Mac OS.
2.5 Literature review on Web Design:
Internet users number in the millions, and the number can be expected to increase over the years. Websites are a crucial medium for the transmission and dissemination of information [5]. This section reviews previous studies in the field of web development. The literature has proposed either sets of guidelines or assistive technologies, particularly web interfaces and adaptive systems. The acceptance and success of websites and electronic commerce depend on web design, so analysing users' perceptions and behaviors is necessary to achieve a successful e-commerce website.
According to a survey (Lee & Kozar, 2012), there is currently no consensus on how to properly operationalize and assess website usability. Nielsen associates usability with learnability, efficiency, memorability, errors, and satisfaction (Nielsen, 2012). At present there are no agreed guidelines that individuals can follow when designing websites to increase user engagement.
• "Hypertext" refers to the links that connect web pages to one another, either within a single website or between websites. Links are a fundamental aspect of the Web; content uploaded to the Internet is connected to the rest of the Web by linking it to other pages.
• HTML uses "markup" to annotate text, images, and other content for display in a Web browser.
• CSS is one of the core languages of the open web and is standardized across Web browsers according to W3C specifications. It describes the presentation of a document written in HTML or XML, that is, how elements should be rendered on screen, on paper, in speech, or on other media. It is the styling part of the webpage.
III. METHODOLOGY
3.1 DATA ANALYSIS
Data analysis is the process of cleaning, transforming, inspecting and modelling data with the goal of discovering useful information and reaching conclusions. Analysis means breaking a whole into its separate components for individual examination. In other words, data analysis acquires raw data and transforms it into information that users can use for decision-making. This project provides a basic statistical analysis of a WhatsApp chat. The following analyses are made:
• To find total messages, total words, total media and links shared in the WhatsApp chat.
• To find the most active people in the group.
• To find the most used emojis in the group.
• To find the busiest and least busy days in a month.
• To find the most frequently used words in the group.
• To find the frequency of chat for every day and month.
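As an illustration of the first item in the list, the top statistics can be computed in a few lines. This is a minimal sketch and not the project's code: it uses only the standard library instead of pandas, it assumes the chat has already been split into (user, message) pairs, it assumes that exports made without media contain the placeholder text "<Media omitted>", and it uses a simple regular expression as a stand-in for URLExtract. The function name `top_statistics` is hypothetical.

```python
import re

# Assumption: exports made "without media" replace each attachment with this text.
MEDIA_PLACEHOLDER = "<Media omitted>"
# Simple stand-in for URLExtract: anything starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://\S+")


def top_statistics(messages):
    """Return total messages, words, media items and links for (user, text) pairs."""
    total_words = 0
    total_media = 0
    total_links = 0
    for _user, text in messages:
        if text.strip() == MEDIA_PLACEHOLDER:
            total_media += 1
            continue  # a media placeholder contributes no words or links
        total_words += len(text.split())
        total_links += len(URL_PATTERN.findall(text))
    return {
        "messages": len(messages),
        "words": total_words,
        "media": total_media,
        "links": total_links,
    }


sample = [
    ("Alice", "Hello everyone"),
    ("Bob", "<Media omitted>"),
    ("Alice", "See https://example.com for details"),
]
stats = top_statistics(sample)
print(stats)
```

In this sketch a link is also counted as one word; whether that is desirable is a design choice.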
3.2 PROPOSED SYSTEM
Data pre-processing is the initial part of the project; it requires understanding the implementation and usage of various Python modules. These modules make the code easier to understand and to present. The libraries used include NumPy, pandas, matplotlib, sys, re, emoji and seaborn. The system analyses the data and gives top statistics such as total messages, total media, links and images shared; graphs showing the weekly and monthly activity maps, the monthly timeline and the daily timeline; the busiest users; a chart of the most common words; and the emojis used.
The working of the system is shown in the figure below.


Figure: Flowchart of Proposed System


3.3 WORKING
Steps to export a chat:
• Open the WhatsApp chat for a group -> click on the menu -> click on More -> select Export chat -> choose Without media.
Working of WhatsApp chat analysis:
1. Initially, open the WhatsApp chat analyzer web page.
2. Select the date format.
3. Upload the exported chat file.
4. Preprocessing of the data is done by the trained model.
5. Analysis of the data is done by the trained model.
6. Select overall or single-person analysis.
7. The trained model shows the analysis, which includes top statistics, word cloud, activity map, monthly timeline, daily timeline and emoji analysis.
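The date-format selection and pre-processing steps above can be sketched as follows. This is an illustrative sketch only: the line layout `12/05/22, 21:41 - Alice: Hello` is an assumption (the exported layout varies with phone and locale), the function name `parse_chat` and its `date_format` parameter are hypothetical, and plain dictionaries are used in place of a pandas data frame so that the example needs only the standard library.

```python
import re
from datetime import datetime

# Assumed export layout (it varies by phone and locale):
#   "12/05/22, 21:41 - Alice: Hello"
TIMESTAMP_PREFIX = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - (.*)$")


def parse_chat(text, date_format="%d/%m/%y"):
    """Split exported chat text into records with timestamp, user and message."""
    records = []
    for line in text.splitlines():
        match = TIMESTAMP_PREFIX.match(line)
        if match:
            date_str, time_str, rest = match.groups()
            user, separator, message = rest.partition(": ")
            if not separator:
                # No "name: " part, so treat the line as a system notice and skip it.
                continue
            timestamp = datetime.strptime(
                f"{date_str} {time_str}", f"{date_format} %H:%M"
            )
            records.append({"timestamp": timestamp, "user": user, "message": message})
        elif records:
            # A line without a timestamp continues the previous message.
            records[-1]["message"] += "\n" + line
    return records


sample = (
    '12/05/22, 21:40 - Alice created group "Trip"\n'
    "12/05/22, 21:41 - Alice: Hello\n"
    "12/05/22, 21:42 - Bob: Hi there\n"
    "second line\n"
)
records = parse_chat(sample)
for record in records:
    print(record["timestamp"], record["user"], repr(record["message"]))
```

Passing `date_format="%m/%d/%y"` reads the same text with the month first, which is why the date format has to be selected before the file is parsed.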
3.4 System Modules
(a) Install and import dependencies: In this step Streamlit, matplotlib, pandas, seaborn, emoji, WordCloud and URLExtract are installed and imported. The collections and re modules are part of the Python standard library and only need to be imported.
(b) Pre-processing: In this step pre-processing of the data is done. Here the data is formatted and separated into the date, time, name of the user and message of the user.
(c) Export chat document from WhatsApp and upload: Here the chat is exported from WhatsApp as a document. Steps to export a chat: open an individual or group chat -> tap Options -> More -> Export chat -> choose export without media -> the document is ready. Upload the chat file and click on analysis.
(d) Train chat model and analyze the data: Here the collected data is read and processed to train our machine learning classification model on it. The model is then evaluated and serialized. Analyses made:
1. Top Statistics: These involve total messages, total words, media shared and links shared.
2. Monthly Timeline: The frequency of chat in every month.
3. Daily Timeline: The frequency of chat in a day.
4. Activity Map: Shows the busiest and least busy days, and similarly the busiest and least busy months.
5. Weekly Activity Map.
6. Word cloud: The most frequently used words.
7. Most Busy Users: The most active people.
8. Emoji analysis: The most frequently used emojis.
(e) Make detections with model: Running the code, predictions of the user’s gestures using the above
trained model are made.
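The monthly and daily timelines listed under (d) amount to counting messages per month and per calendar day. The sketch below shows one way to do this with `collections.Counter`, assuming the message timestamps are already available as `datetime` objects; the function names are hypothetical and a pandas `groupby` would serve the same purpose.

```python
from collections import Counter
from datetime import datetime


def monthly_timeline(timestamps):
    """Count messages per "YYYY-MM" month, in chronological order."""
    counts = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    return sorted(counts.items())


def daily_timeline(timestamps):
    """Count messages per calendar day, in chronological order."""
    counts = Counter(ts.date() for ts in timestamps)
    return sorted(counts.items())


timestamps = [
    datetime(2022, 4, 30, 9, 0),
    datetime(2022, 5, 1, 10, 0),
    datetime(2022, 5, 1, 18, 30),
    datetime(2022, 5, 2, 8, 15),
]
monthly = monthly_timeline(timestamps)
daily = daily_timeline(timestamps)
print(monthly)
print(daily)
```

The resulting (label, count) pairs can be passed directly to a matplotlib line plot.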
IV. TESTING
Software testing is an investigation conducted to assess the quality of the product under test. It provides an objective view of the software so that developers can understand the risks of software implementation. Test techniques include executing a program or application in order to find software bugs or defects.
The sample test cases of this project are shown in the table below.
Table 2: Test Cases Of The Work
| S.NO | Test Case | Description | Expected Result | Test Result |
|---|---|---|---|---|
| 1 | Top Statistics | These involve total messages, total words, total media and links shared | Statistics of total messages, words, links, media shared in the group | PASS |
| 2 | Monthly Timeline | The frequency of chat in every month | Graph of monthly timeline | PASS |
| 3 | Activity Map | Shows the busiest day and least busy day in a month | Bar graph of most busy day and most busy month | PASS |
| 4 | Word Cloud | Most commonly and frequently used words | Word cloud of most used words | PASS |
V. RESULTS AND DISCUSSION

Fig 1. Activity Map


It shows the busiest days and months. We used the matplotlib library to plot the graph; the number of messages is counted for each day and each month and plotted against that day or month.
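The counting behind such an activity map can be sketched as below, assuming it groups messages by day of the week and by month name. This is an assumption-laden illustration rather than the project's code: the function name `activity_map` is hypothetical, and day and month names are taken from explicit lists so that the result does not depend on the system locale.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def activity_map(timestamps):
    """Count messages per weekday name and per month name."""
    by_day = Counter(DAY_NAMES[ts.weekday()] for ts in timestamps)
    by_month = Counter(MONTH_NAMES[ts.month - 1] for ts in timestamps)
    return by_day, by_month


timestamps = [
    datetime(2022, 5, 2, 9, 0),    # Monday
    datetime(2022, 5, 2, 21, 30),  # Monday
    datetime(2022, 5, 3, 8, 0),    # Tuesday
    datetime(2022, 5, 9, 12, 0),   # Monday
    datetime(2022, 6, 7, 19, 45),  # Tuesday
]
by_day, by_month = activity_map(timestamps)
busiest_day = by_day.most_common(1)[0][0]
busiest_month = by_month.most_common(1)[0][0]
print(by_day, by_month, busiest_day, busiest_month)
```

The two counters map directly onto the two bar charts of the activity map, one bar per day or month.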


Fig 2. Emoji Analysis


It shows the most commonly used emojis. We used the emoji library to pick out the emojis from the messages and plotted a pie chart using matplotlib.
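The counting step can be illustrated without the emoji library. The sketch below is a rough approximation and not the project's method: it treats a character as an emoji when its code point falls in two hand-picked Unicode ranges, which covers common pictographs but misses multi-code-point sequences such as flags and skin-tone variants. The names `EMOJI_RANGES`, `is_emoji` and `emoji_counts` are hypothetical.

```python
from collections import Counter

# Rough approximation (assumption): two code point ranges that cover most
# common pictographic emoji. The emoji library is far more complete.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]


def is_emoji(character):
    """Return True if the character falls in one of the assumed emoji ranges."""
    code_point = ord(character)
    return any(low <= code_point <= high for low, high in EMOJI_RANGES)


def emoji_counts(messages):
    """Count every emoji character across all message texts."""
    counter = Counter()
    for text in messages:
        counter.update(ch for ch in text if is_emoji(ch))
    return counter


messages = ["Good morning 😀", "😂😂 that was funny", "See you 😀"]
counts = emoji_counts(messages)
print(counts.most_common())
```

The counts can then be passed to a matplotlib pie chart, with the emojis as labels.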

Fig 3. Top Statistics


It shows statistics such as total messages, words, images and links shared. We converted the whole chat file into a data frame, separated the words and messages, and used URLExtract to find links.

Fig 4. Most Common words


It shows the most commonly used words. We used matplotlib to plot the graph, which displays the most frequently used words.
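A word-frequency count of this kind can be sketched with `collections.Counter`. The example is illustrative only: the stop-word set is a tiny made-up sample, the "<Media omitted>" placeholder is an assumption about the export format, and the function name `most_common_words` is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (assumption); a real list would be much longer.
STOP_WORDS = {"the", "a", "is", "to", "and", "in"}
MEDIA_PLACEHOLDER = "<Media omitted>"  # assumed marker for skipped attachments


def most_common_words(messages, top_n=5):
    """Return the top_n most frequent words, ignoring stop words and media markers."""
    counter = Counter()
    for text in messages:
        if text.strip() == MEDIA_PLACEHOLDER:
            continue
        words = re.findall(r"[a-z']+", text.lower())
        counter.update(word for word in words if word not in STOP_WORDS)
    return counter.most_common(top_n)


messages = [
    "The meeting is at noon",
    "Meeting moved to the evening",
    "<Media omitted>",
    "See you at the meeting",
]
top_words = most_common_words(messages, top_n=2)
print(top_words)
```

The same counter can feed either a horizontal bar chart or a word cloud.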

Fig 5. Most Busy Users


It shows the busiest users and their contribution to the chat. We used matplotlib to plot the graph; for each user, the number of messages sent is calculated and plotted.
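Each user's contribution can be expressed as a message count and a percentage share of all messages. The sketch below assumes the sender of every message is available as a list of names; the function name `busiest_users` is hypothetical.

```python
from collections import Counter


def busiest_users(senders, top_n=5):
    """Return (user, message count, percentage share) for the most active users."""
    counts = Counter(senders)
    total = sum(counts.values())
    return [
        (user, count, round(100 * count / total, 2))
        for user, count in counts.most_common(top_n)
    ]


senders = ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"]
ranking = busiest_users(senders)
print(ranking)
```

The counts suit a bar chart, while the percentage column can be shown next to it as a table.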

Fig 6. Daily Timeline


It gives the frequency of messages per day. We used matplotlib to plot the graph; the count of messages is calculated for each day and plotted.
VI. CONCLUSION
We conclude that WhatsApp chat data, combined with the Python programming language, was well suited to the data analysis we intended. The system was built in Python, and the libraries used include Streamlit, Emoji, NumPy, Pandas, re, Matplotlib, URLExtract, collections and Seaborn. The results we intended were obtained. In future, the project is mainly useful for organisers: they will get to know who is most and least active in the group and can take decisions accordingly.
VII. REFERENCES
[1] Ravishankara K, Dhanush, Vaisakh, Srajan I S, “International Journal of Engineering Research &
Technology (IJERT)”, ISSN: 2278-0181, Vol. 9 Issue 05, May-2020
[2] https://www.analyticsvidhya.com/blog/2021/06/build-web-app-instantly-for-machine-learning-using-streamlit/
[3] Meng Cai, “PubMed Central”, PMCID: PMC7944036, PMID: 33732917
[4] Dr. D. Lakshminarayanan, S. Prabhakaran, “Dogo Rangsang Research Journal”, UGC Care Group I
Journal, Vol-10 Issue-07 No. 12 July 2020
[5] https://www.interaction-design.org/literature/topics/web-design
