Sentiment Analysis of Twitter Data by Making Use of SVM Random Forest and Decision Tree Algorithm
Authorized licensed use limited to: Somaiya University. Downloaded on December 28,2023 at 13:38:52 UTC from IEEE Xplore. Restrictions apply.
Support Vector Machine Algorithm: SVM is an algorithm used mainly for classification. SVM selects a subset of the data points (the support vectors), and the data are mapped into a high-dimensional space with a non-linear kernel function. The SVM algorithm is stated in the pseudo-code below.

Inputs: different datasets for training and testing the classifier
Outputs: the calculated accuracy
1. Choose gamma and cost optimally for SVM
2. While (condition = true) do
3.     SVM training for every data point
4.     SVM testing for every data point
5. End while
6. Return accuracy

Decision Tree Algorithm: Another classifier we use is the decision tree classification algorithm, which is widely used for text classification.

Input: training and testing datasets (tweets)
Output: accuracy, precision, F1 measure, recall
Start
    Pre-processing and data normalization;
    For training dataset do
        Feature calculation;
        Decision tree algorithm;
        Classifier building;
    End
    Value is used for particular tweets;
    For testing dataset do
        Analyze accuracy;
    End
    (training, testing)
End

IV. IMPLEMENTATION

Requirements: The required software and hardware are listed below.
The design was implemented in the Python programming language. The hardware was as follows:
- a 15.6-inch HD WLED touchscreen (1366 x 768) with 10-finger multi-touch support
- a 10th Generation Intel Core i7-1065G7 processor (1.3 GHz, up to 3.9 GHz)
- 8 GB DDR4 SDRAM at 2666 MHz
- a 512 GB SSD and no optical drive
- Intel Iris Plus Graphics
- HD Audio with stereo speakers
- an HP TrueVision HD camera
- Realtek RTL8821CE 802.11b/g/n/ac and Bluetooth 4.2
- 1 HDMI 1.4, 1 USB 3.1 Gen 1 Type-C and 2 USB 3.1 Gen 1 Type-A ports

The Python programs were run on the Windows 10 64-bit operating system. The Python libraries used during implementation include NumPy, Pandas, Matplotlib, SciPy, Scikit-Learn, PyTorch, Seaborn, XGBoost, Plotly, TensorFlow, Keras, TextBlob, Stanford CoreNLP, Gensim, and Afterword.

Data Collection: Here, we explain the evaluation carried out with our proposed methodology to analyze Twitter data. Our experimental analysis is done on a Kaggle dataset obtained from the challenge given by the School of AI (Artificial Intelligence). We have used a portion of 4000 tweets for training and 10000 tweets for testing [13].

Data Set Processing: This section provides the details of the experiments that we performed to analyze the proposed methodology in the context of Twitter analysis. We performed tests on the Kaggle Twitter data set. The data set is based on the challenge launched by the KFC and McDonald's of AI - Algiers, which consists of building a system that can classify tweets as Sad or Happy. Currently, we check whether tweets are correct or incorrect. From the KFC and McDonald's dataset, we have taken 14000 tweets for our research: 10000 tweets for training and 4000 tweets for testing. The data set link is given below:
https://www.kaggle.com/mcdonalds/nutrition-facts
Details are shown in Figure 3 and Figure 4.

Figure 3. Data Before Processing

The raw tweets contain URLs, usernames, special characters, and repeated words and symbols. We therefore remove all hashtags (identified by the # symbol), special characters, URLs, usernames and repeated words.

Figure 4. Data After Processing
We have removed all private usernames, special characters, hashtags identified by the # symbol, repeated words and symbols.
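The cleaning step shown in Figures 3 and 4 removes URLs, usernames, hashtags, special characters and repeated words. It can be sketched with Python's standard `re` module. This is a minimal illustration, not the authors' code, and it rests on several assumptions:
- the regular expressions themselves;
- dropping each hashtag as a whole token;
- lower-casing the text;
- reading "repeated words" as consecutive duplicate tokens.

`clean_tweet` is a hypothetical helper name.

```python
import re

# Hypothetical patterns: the paper does not state its exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Clean one tweet (hypothetical helper; rules are assumptions)."""
    # Order matters: URLs, usernames and hashtags are removed before
    # special characters, otherwise '#', '@', ':' and '/' would vanish first.
    text = URL_RE.sub(" ", text)       # drop URLs
    text = USER_RE.sub(" ", text)      # drop @usernames
    text = HASHTAG_RE.sub(" ", text)   # drop whole #hashtag tokens (assumed)
    text = SPECIAL_RE.sub(" ", text)   # drop special characters and symbols
    tokens = text.lower().split()      # lower-casing is an assumption
    # Assumption: "repeated words" means consecutive duplicate tokens.
    deduped = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
    return " ".join(deduped)


if __name__ == "__main__":
    print(clean_tweet("@kfc_fan LOVE love this!!! #happy https://t.co/xyz"))
    # prints: love this
```

The patterns are applied in a fixed order so that the symbol-stripping step cannot break a URL or hashtag apart before it has been recognised.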
V. RESULT ANALYSIS
procedures on a Twitter dataset. Figures have been drawn using other products as calculated.
Our proposed method is evaluated using the following parameters:
▪ Recall
▪ Precision
▪ Accuracy
▪ F1-Score

Table 1. Evaluation Metrics with Contingency Table
Figure 5. Recall (%) between Existing Work and Proposed Work (SVM, Decision Tree, Random Forest)
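[Figure: Precision (%) of SVM, Decision Tree and Random Forest, Existing Work vs. Proposed Work]

$$\mathrm{recall} = \frac{tp}{tp + fn} \qquad (1)$$

$$\mathrm{precision} = \frac{tp}{tp + fp} \qquad (2)$$

$$\mathrm{accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \qquad (3)$$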
Recall: Recall measures the proportion of actual positive instances in the dataset that are correctly predicted as positive. The recall is calculated using equation (1).
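Equations (1) to (3) can be checked numerically with a short Python function that works directly on the contingency-table counts tp, tn, fp and fn.

The F1-Score is listed among the evaluation parameters but is not given as an equation in the text. The sketch assumes the standard definition, the harmonic mean of precision and recall. It also returns 0.0 when a denominator is zero, which is our own choice rather than something the paper specifies. The counts in the example are made up for illustration and are not results from this study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: return 0.0 instead of raising on a zero denominator.
    return num / den if den else 0.0


def evaluate(c: Counts) -> dict:
    recall = _safe_div(c.tp, c.tp + c.fn)                         # equation (1)
    precision = _safe_div(c.tp, c.tp + c.fp)                      # equation (2)
    accuracy = _safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)  # equation (3)
    # Assumed standard F1: harmonic mean of precision and recall.
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only, not taken from the paper.
    print(evaluate(Counts(tp=40, tn=45, fp=10, fn=5)))
```

With these illustrative counts:
- precision is 40/50 = 0.8;
- recall is 40/45 ≈ 0.889;
- accuracy is 85/100 = 0.85;
- F1 is 16/19 ≈ 0.842.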
Accuracy: Accuracy is the proportion of correctly predicted instances (true positives + true negatives) to the total test dataset. The accuracy is calculated using equation (3).

From the results above and Figure 8, we can say that the proposed approach is efficient. Running time is also reduced to an extent while keeping the quality of recommendation at its best. This indicates that the proposed method is scalable and can be applied to a large dataset.
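The comparison of SVM, Decision Tree and Random Forest in the result figures can be outlined with scikit-learn. The same sketch also follows the SVM and decision-tree pseudo-code given earlier. It is a hedged sketch, not the authors' pipeline, and all of the following are our assumptions:
- TF-IDF features;
- the RBF kernel;
- the small grid of gamma and C (cost) values;
- 3-fold cross-validation for choosing them;
- binary labels with 1 = Happy and 0 = Sad;
- the hypothetical toy tweets;
- the helper name `compare_classifiers`.

It reports the four evaluation parameters and the fit time of each model. That is one possible way to look at running time, not the measurement used in the paper.

```python
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(train_texts, train_labels, test_texts, test_labels):
    """Fit SVM, Decision Tree and Random Forest on TF-IDF features (assumed setup)."""
    models = {
        # RBF-kernel SVM; gamma and C are picked by grid search, loosely
        # mirroring step 1 of the SVM pseudo-code ("choose gamma and cost").
        "SVM": GridSearchCV(
            make_pipeline(TfidfVectorizer(), SVC(kernel="rbf")),
            param_grid={"svc__gamma": [0.1, 1.0], "svc__C": [1.0, 10.0]},
            cv=3,
        ),
        "Decision Tree": make_pipeline(
            TfidfVectorizer(), DecisionTreeClassifier(random_state=0)
        ),
        "Random Forest": make_pipeline(
            TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=0)
        ),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(train_texts, train_labels)       # training phase
        fit_seconds = time.perf_counter() - start
        pred = model.predict(test_texts)           # testing phase
        results[name] = {
            "accuracy": accuracy_score(test_labels, pred),
            "precision": precision_score(test_labels, pred, zero_division=0),
            "recall": recall_score(test_labels, pred, zero_division=0),
            "f1": f1_score(test_labels, pred, zero_division=0),
            "fit_seconds": fit_seconds,
        }
    return results


if __name__ == "__main__":
    # Hypothetical toy tweets; 1 = Happy, 0 = Sad (assumed label encoding).
    happy = ["love this burger", "great fries today", "so happy with my meal",
             "best chicken ever", "awesome service thanks", "delicious and fresh"]
    sad = ["cold fries again", "worst service ever", "so sad my order was wrong",
           "terrible burger today", "never coming back", "awful and stale food"]
    train_texts = happy + sad
    train_labels = [1] * len(happy) + [0] * len(sad)
    test_texts = ["great burger love it", "awful cold food",
                  "happy meal thanks", "wrong order again"]
    test_labels = [1, 0, 1, 0]
    for name, scores in compare_classifiers(
        train_texts, train_labels, test_texts, test_labels
    ).items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Wrapping each vectorizer and classifier in one pipeline means TF-IDF is refitted inside every cross-validation fold. This keeps the gamma/C search from seeing vocabulary statistics of its own validation split.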
[Figure: Accuracy of SVM, Decision Tree and Random Forest, Existing Work vs. Proposed Work]
Figure 8. F1-Score between Existing Work and Proposed Work

VI. CONCLUSION

Analysis of Twitter Data," 2019 Int. Conf. Comput. Inf. Sci. ICCIS 2019, 2019, doi: 10.1109/ICCISci.2019.8716464.
[8] A. Shelar and C. Y. Huang, "Sentiment analysis of twitter data," Proc. - 2018 Int. Conf. Comput. Sci. Comput. Intell. CSCI 2018, pp. 1301–1302, 2018, doi: 10.1109/CSCI46756.2018.00252.
[9] S. Zahoor and R. Rohilla, "Twitter Sentiment Analysis Using Lexical or Rule-Based Approach: A Case Study," ICRITO 2020 - IEEE 8th Int. Conf. Reliab. Infocom Technol. Optim. (Trends Futur. Dir.), pp. 537–542, 2020, doi: 10.1109/ICRITO48877.2020.9197910.
[10] S. Saini, R. Punhani, R. Bathla, and V. K. Shukla, "Sentiment Analysis on Twitter Data using R," 2019 Int. Conf. Autom. Comput. Technol. Manag. ICACTM 2019, pp. 68–72, 2019, doi: 10.1109/ICACTM.2019.8776685.
[11] M. R. Hasan, M. Maliha, and M. Arifuzzaman, "Sentiment
Analysis with NLP on Twitter Data," 5th Int. Conf. Comput.
Commun. Chem. Mater. Electron. Eng. IC4ME2 2019, pp. 1–4,
2019, doi: 10.1109/IC4ME247184.2019.9036670.
[12] A. S. Al Shammari, "Real-time Twitter Sentiment Analysis using
a 3-way classifier," 21st Saudi Comput. Soc. Natl. Comput. Conf.
NCC 2018, pp. 1–3, 2018, doi: 10.1109/NCG.2018.8593205.
[13] https://www.iflexion.com/blog/sentiment-analysis-python.
[14] S. A. El Rahman, F. A. AlOtaibi and W. A. AlShehri, "Sentiment
Analysis of Twitter Data," 2019 International Conference on
Computer and Information Sciences (ICCIS), Sakaka, Saudi
Arabia, 2019, pp. 1–4, doi: 10.1109/ICCISci.2019.8716464.