
Depression Detection in Tweets Using Logistic Regression Model

Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4, June 2021, URL: https://www.ijtsrd.com/papers/ijtsrd41284.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/41284/depression-detection-in-tweets-using-logistic-regression-model/rahul-kumar-sharma


International Journal of Trend in Scientific Research and Development (IJTSRD)

Volume 5 Issue 4, May-June 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470

Depression Detection in Tweets using Logistic Regression Model
Rahul Kumar Sharma¹, Vijayakumar A²
¹Final Year MCA Student, Dept of MCA, School of CS & IT, Jain Deemed-to-be University, Bengaluru
²Professor, Dept of MCA, School of CS & IT, Jain Deemed-to-be University, Bengaluru

Email: [email protected], [email protected]

ABSTRACT
In an increasingly modernized world, mental health issues such as depression, anxiety and stress are very common, and social media platforms such as Facebook, Instagram and Twitter have contributed to their growth. Everything has its merits and its drawbacks. During the pandemic, people are more likely to suffer from mental health issues because they are online 24*7 and cut off from the real world. Past studies have shown that individuals who spend more time on social media are more likely to be depressed. In this project, we identify people who are depressed based on their tweets, followers, following and other factors. To this end, we trained and tested a text classifier that distinguishes between users who are depressed and users who are not.

KEYWORDS: Depression, Flask, Mental Health, Natural Language Toolkit (NLTK), Twitter, Wordcloud

How to cite this paper: Rahul Kumar Sharma | Vijayakumar A "Depression Detection in Tweets using Logistic Regression Model" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4, June 2021, pp.724-727, URL: www.ijtsrd.com/papers/ijtsrd41284.pdf

Unique Paper ID: IJTSRD41284

Copyright © 2021 by author (s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)

I. INTRODUCTION
Mental health conditions such as depression are very common today. Ten in every hundred people suffer from a mental illness. Such conditions affect a person's outlook, mindset, feelings and behaviour towards the people around them. Most of the time people do not talk about their mental health, which sometimes leads to severe health issues. Depression is one of the primary causes of disability worldwide. In the early stage of depression, 70% of people do not consult a doctor or anyone else, which can cause serious harm later. There is a growing movement to use social media data to find, estimate and track changes in the occurrence of disease. Social media gives us the opportunity to improve the information available to psychiatrists and researchers, enabling them to be better informed. Depression is regarded as the prime reason why people have suicidal thoughts. Around 80% of those who attempt or die by suicide are suffering from depression. Because of denial or ignorance, however, it goes undetected and people suffer. It can be prevented if it is diagnosed at an early stage. This project helps determine whether a person is depressed based on their tweets. It can also be used to find signs of other mental health issues. In this study, we used tweets from different users, labelled 0 or 1. A result of 0 means the user is not depressed, and a result of 1 means the user is depressed.

II. Literature Survey
In the past few years, researchers have worked on ways to detect depression in users based on their tweets. A large number of systems have been proposed for this purpose. Approaches include sentiment analysis of text and web scraping of tweets. Thousands of records can be collected with mining algorithms that bring meaningful insight into the data.

Jamil, Z. (2017), University of Ottawa, proposed monitoring tweets for depression to detect at-risk users [1]. SVM, Naive Bayes (NB) and Decision Tree (DT) are some of the most commonly used algorithms in natural language processing tasks. The linear SVM classifier gives the best result. No single algorithm can perform every task, so researchers adapt the algorithm to the requirements of the task at hand.

a. Decision Tree
A Decision Tree classifies an instance based on its feature values. Han J., Pei J., & Kamber M. (Elsevier, 2011) presented the concepts and techniques of data mining [2]. Every node in a decision tree represents a feature, and every branch represents a value that the node can take. A decision tree partitions the data into subsets that contain instances with similar values. A key aspect of this process, called split selection, aims to find an attribute and its associated splitting function for each test node in the tree. Podgorelec, V., & Zorman, M. (2014) gave insight into decision tree learning in the Encyclopedia of Complexity and Systems Science 1-28 [3]. Splits are evaluated by calculating entropy.

Fig:2.1 Decision Tree

b. Random Forest
The Random Forest classifier builds multiple decision trees from randomly selected subsets of the training dataset. Rodriguez-Galiano, V. F., Ghimire, B., Rogan, J., Chica-Olmo, M., Rigol Sanchez, J. P. (2012) assessed the effectiveness of a random forest classifier for landcover classification, ISPRS Journal of Photogrammetry and Remote Sensing 67: 93-104 [4]. The classifier then aggregates the votes from the different decision trees to decide the final class of the test objects. Paul, A., Mukherjee, D. P., Das, P., Gangopadhyay, A., Chintha, A.R., Kundu, S. (2018) proposed an improved random forest for classification, IEEE Transactions on Image Processing 27 (8): 4012-4024 [5], in which random forest classification is performed with a reduced number of trees.

Fig:2.2 Random Forest

c. Support Vector Machine
Saitta L., (2000) proposed “Support Vector Networks.” Kluwer Acad. Publ. Bost.: 273–297 [6]. A Support Vector Machine is a machine learning algorithm that works for both regression and classification tasks, but it is mainly used for classification. Hamed, T., Dara, R., Kremer, S. C. (2014) presented an accurate, fast embedded feature selection for SVMs at the 2014 13th International Conference on Machine Learning and Applications, IEEE: 135-140 [7]. This classifier has recently been used in numerous applications because of its outstanding classification ability and performance: it separates the data linearly into two classes with the maximum distance between the two classes.

Fig:2.3 Support Vector Machine

d. K-Nearest Neighbour
K-Nearest Neighbour (K-NN) is possibly the simplest algorithm used in machine learning for classification and regression problems. KNN takes the existing data and classifies new data points according to a nearness (distance) measure. A new data point is assigned to the class that has the most nearest neighbours. Taneja, S., Gupta, C., Goyal, K., Gureja, D. (2014) proposed an enhanced k nearest neighbour algorithm using information gain and clustering at the Fourth International Conference on Advanced Computing & Communication Technologies, IEEE: 325-329 [8]. KNN is often used to classify future data because of its ease of implementation and adequate performance.

Fig:2.4 K-Nearest Neighbour

III. Existing System
In the growing world of the internet, people like to live in a virtual world. They share every thought on platforms like Facebook, Instagram and Twitter. They tend to compare themselves with other people. Past studies have shown that people who spend more time on social media are more likely to be depressed. If depression is detected at an early stage, it can be cured; in later stages, it becomes more difficult to cure.

This project is concerned with whether a person is depressed or not, based on their tweets.
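
The entropy-based split evaluation mentioned for decision trees in the Literature Survey can be illustrated with a small worked example. The following is a minimal sketch under stated assumptions, not the method of any cited work: the function names `entropy` and `information_gain` and the toy labels are invented for illustration. It computes the Shannon entropy of a set of labels and the reduction in entropy (information gain) that a candidate split would achieve.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy labels: 1 = depressed, 0 = not depressed.
parent = [1, 1, 1, 0, 0, 0]
perfect_split = [[1, 1, 1], [0, 0, 0]]  # each subset contains a single class
poor_split = [[1, 0, 1], [0, 1, 0]]     # each subset is still mixed

gain_perfect = information_gain(parent, perfect_split)
gain_poor = information_gain(parent, poor_split)
```

A split that separates the classes completely removes all of the parent's entropy, while a split that leaves both subsets mixed removes only a small part of it, so a tree builder using this criterion would prefer the first split.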

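
The K-Nearest Neighbour rule described in the Literature Survey (assign a new point to the class held by most of its nearest neighbours) can be sketched in a few lines. This is an illustrative sketch only: the function name `knn_predict`, the choice of Euclidean distance, the value k = 3 and the two-dimensional toy points are all assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)  # Euclidean distance to each training point
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters with labels 0 and 1.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = [0, 0, 0, 1, 1, 1]

near_first_cluster = knn_predict(points, labels, (0.15, 0.15))
near_second_cluster = knn_predict(points, labels, (0.95, 1.0))
```

An odd k is a common choice for two-class problems because it avoids tied votes.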
IV. Proposed System
The goal of this project is to build a model that can help detect whether a person is suffering from depression based on their tweets. A user can type a tweet in the text box; it is analysed by the model we have built, which returns the result. We use the Python web framework Flask to integrate the model and make it more interactive for the common user.

The proposed system can help make people aware of their mental health so that they can take the necessary measures and help themselves. We take the dataset and clean it for this purpose. The dataset contains tweets and a label (0 or 1). If the tweet is depressing, the label is 1; if the tweet is not depressing, the label is 0. We use the Logistic Regression model and check its accuracy. The model is then saved using the pickle library.

a. Logistic Regression Model
In statistics, the logistic model is used to model the probability of a certain class or event existing, such as win/lose, alive/dead or healthy/unhealthy. Logistic Regression is a classification algorithm. It is used to predict a binary outcome based on a set of independent variables. A binary outcome is one with only two possible scenarios: either the event occurs (1) or it does not occur (0). Independent variables are those variables or factors that may influence the outcome (the dependent variable).

Fig:4.1 Logistic Regression

b. Proposed System Architecture

Fig:4.2 Proposed System Architecture Twitter Sentiment Analysis

As we can see in the above diagram, the user logs in through their Twitter account and writes a tweet. Text pre-processing is applied to the tweet. A corpus is created, the tweet is tokenized and finally normalization is applied, in which all characters are converted to lower case and links, emoji and punctuation are removed. Hornik, K., & Grün, B. presented topicmodels, an R package for fitting topic models, J. of Stat. Softw. 40 (13), 1-30 (2011) [9]. Stemming is applied and a document-term matrix (DTM) is built for each record. The matrix shows the frequency of words in each tweet, where each row represents a document (a tweet) and each column represents a word used across all documents. TF-IDF is used to measure the weight of the words. The features derived from the DTM are then merged with account measures extracted from the social network and user activities. The merged result is treated as the set of independent variables in classification algorithms to predict the dependent variable, that is, the outcome of interest. Ultimately, we decided on the Logistic Regression algorithm.

V. Result and Discussions

Fig:5.1 Result of the Logistic Regression Model

Looking at the above picture, we can see that the model has performed well. The accuracy of our model is 96%. We can now integrate this model into our template so that it can be used by users.

a. Model Comparison

Table 1: Accuracy and Model comparison table
Model                    Accuracy (%)   Precision   F1-measure
Decision Tree            72.5           0.473684    0.5625
Support Vector Machine   72.5           0.681818    0.652174
Logistic Regression      98             0.98        0.98

In order to get a better result, we compare our model with other models that have different accuracies. In the above table we can see that Logistic Regression has the highest accuracy.

VI. Conclusion and Future Enhancements
Machine Learning and Artificial Intelligence play a significant role in healthcare, banking, stocks, cyber security, weather forecasting and other fields. As users, we gather data, clean it, train a model on it, test the model and make predictions. The accuracy of the result depends on the quality of the data and the type of algorithm used. Many pre-processing steps are performed, including data preparation and alignment, data labelling, and feature extraction and selection. The Logistic Regression model has achieved the desired accuracy and can be used to predict whether a user is depressed or not. This project can be considered a step towards building a social media platform that analyses user activity and predicts the user's state of mental health.

With this platform, the user does not need to create an account. They can type any text or tweet, and the model will assess whether the person is depressed or not. With the help of this result, the user can take precautions and consult a psychiatrist. It can act as a standalone application for depression detection. In future, voice and facial expressions could also be used to detect depression.
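
The processing chain described in Section IV (normalization, document-term matrix, TF-IDF weighting, logistic regression) can be sketched end to end with the Python standard library. This is a simplified illustration under stated assumptions, not the authors' implementation: stemming and the account measures are omitted, the model is fitted with plain stochastic gradient descent, and every function name, hyperparameter and example tweet below is invented for the sketch.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Normalize a tweet: lower-case it, drop links, keep alphabetic tokens only."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    return re.findall(r"[a-z]+", text)


def fit_idf(token_lists):
    """Inverse document frequency of every token seen in the training tweets."""
    n_docs = len(token_lists)
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}


def vectorize(tokens, vocab, idf):
    """One row of the document-term matrix, weighted by TF-IDF."""
    row = [0.0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:  # words unseen during training are ignored
            row[vocab[tok]] = (count / len(tokens)) * idf[tok]
    return row


def sigmoid(z):
    """Logistic function: maps a real-valued score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, learning_rate=0.5, epochs=500):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            bias -= learning_rate * error
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
    return weights, bias


def predict_proba(tweet, vocab, idf, weights, bias):
    """Estimated probability that the tweet belongs to class 1 (depressed)."""
    x = vectorize(preprocess(tweet), vocab, idf)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical toy data, invented for illustration (1 = depressed, 0 = not).
tweets = [
    "I feel so hopeless and alone tonight",
    "Nothing matters anymore, I am so tired of everything",
    "I cannot stop crying, feeling empty and worthless",
    "Had a great day at the beach with friends!",
    "So happy and excited about the new job https://example.com",
    "Loved the movie, what a wonderful evening",
]
labels = [1, 1, 1, 0, 0, 0]

token_lists = [preprocess(t) for t in tweets]
idf = fit_idf(token_lists)
vocab = {tok: i for i, tok in enumerate(sorted(idf))}  # word -> column index
rows = [vectorize(tokens, vocab, idf) for tokens in token_lists]
weights, bias = train(rows, labels)
```

A new tweet is scored with `predict_proba(tweet, vocab, idf, weights, bias)`; a value above 0.5 corresponds to label 1. Six tweets are far too few to say anything about real-world accuracy, so the sketch only shows how the steps fit together.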

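
Section IV states that the trained model is saved with the pickle library so that the web application can reuse it. A minimal sketch of that save-and-load round trip follows. The `model` dictionary is only a placeholder standing in for a trained classifier, and the file name `model.pkl` is a hypothetical choice; neither comes from the paper.

```python
import os
import pickle
import tempfile

# Placeholder standing in for a trained classifier; any picklable object works.
model = {"weights": [0.8, -1.2, 0.3], "bias": -0.1}

# Hypothetical file name inside a fresh temporary directory.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Save the model once, after training.
with open(model_path, "wb") as handle:
    pickle.dump(model, handle)

# Load it again later, for example when the web application starts.
with open(model_path, "rb") as handle:
    restored = pickle.load(handle)
```

Loading the file once at start-up avoids retraining on every request. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.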
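
The measures reported in Table 1 (accuracy, precision and F1-measure) are all derived from the counts of correct and incorrect predictions. The sketch below shows the standard definitions on hypothetical labels; the function name `evaluate` and the example predictions are assumptions and do not reproduce the paper's data or results.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions: 3 TP, 1 FN, 1 FP, 5 TN.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

scores = evaluate(y_true, y_pred)
```

Accuracy alone can look good when one class dominates the data, which is why precision and F1 are usually reported alongside it.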
References
[1] Jamil, Z. Monitoring tweets for depression to detect at-risk users. Université d'Ottawa/University of Ottawa, 2017.
[2] Han J., Pei J., & Kamber, M. Data mining: concepts and techniques (Elsevier, 2011).
[3] Podgorelec, V., & Zorman, M. Decision tree learning in Encyclopedia of Complexity and Systems Science 1-28 (2014).
[4] Rodriguez-Galiano, V. F., Ghimire, B., Rogan, J., Chica-Olmo, M., Rigol Sanchez, J. P. (2012) “An assessment of the effectiveness of a random forest classifier for landcover classification.” ISPRS Journal of Photogrammetry and Remote Sensing 67: 93-104.
[5] Paul, A., Mukherjee, D. P., Das, P., Gangopadhyay, A., Chintha, A.R., Kundu, S. (2018) "Improved random forest for classification." IEEE Transactions on Image Processing 27 (8): 4012-4024.
[6] Saitta, L., (2000) “Support Vector Networks.” Kluwer Acad. Publ. Bost.: 273–297.
[7] Hamed, T., Dara, R., Kremer, S. C. (2014) "An accurate, fast embedded feature selection for SVMs." In 2014 13th International Conference on Machine Learning and Applications, IEEE: 135-140.
[8] Taneja, S., Gupta, C., Goyal, K., Gureja, D. (2014) "An enhanced k nearest neighbour algorithm using information gain and clustering." Fourth International Conference on Advanced Computing & Communication Technologies, IEEE: 325-329.
[9] Hornik, K., & Grün, B. Topicmodels: An R package for fitting topic models. J. of Stat. Softw. 40 (13), 1-30 (2011).
