Final Report Final
Supervised By
DR. SHAHID IQBAL
PROJECT REPORT
VERSION: V 3.0
NUMBER OF MEMBERS: 2
TITLE: A Web App to Identify Eyewitness Messages from Twitter Data Using Textual Features
MEMBERS’ SIGNATURES
Supervisor’s Signatures
APPROVAL CERTIFICATE
This project, entitled “A Web App to Identify Eyewitness Messages from
Twitter using Textual Features”, has been approved for the award of
Committee Signatures:
Supervisor:
Project Coordinator:
Head of Department:
DECLARATION
We hereby declare that no portion of the work referred to in this project has been
submitted in support of an application for another degree or qualification of this or any other
university, institute, or other institution of learning. It is further declared that this
undergraduate project has not been copied, as a whole or in part, from any source,
except where references have been provided.
MEMBERS’ SIGNATURES
Table of Contents
Chapter 1
Introduction
1.1. Project Introduction
1.2. Problem Statement
1.3. Business Scope
1.4. Objectives
1.5. Useful Tools and Technologies
1.6. Project Work Break Down
1.7. Project Time Lapse
Chapter 2
Requirement Specification and Analysis
2.1. Functional Requirements
2.2. Non-Functional Requirements
2.3. Use Case Modeling
2.4. Use Case Diagram
2.5.1. Train Model Use Case Description
2.5.2. Test Model Use Case Description
Chapter 3
System Design
3.1. Layer Definition
3.1.1. Presentation Layer
3.1.2. Business Logic Layer
3.2. System Design Diagrams
3.2.1. High Level Design
3.2.2. System Sequence Diagrams
3.2.2.1. Train Model SSD
3.2.2.2. Test Model SSD
3.3. Domain Model
3.4. Flow Chart
3.4.1 View Dataset Flow Chart
3.4.2 Features Computation Flow Chart
3.4.3 Pre-Processing Flow Chart
3.4.4 Machine Learning Modeling Flow Chart
3.4.5 Evaluation Metrics Flow Chart
3.4.6 Validation Method Flow Chart
3.4.7 Save Model Flow Chart
3.4.8 Test Saved Model Flow Chart
3.5. User Interface Design
3.5.1. Homepage Interface
3.5.2. About Page
3.5.3. Train Model
3.5.4. View Dataset Interface
3.5.5. Feature Selection Interface
3.5.6. Data Preprocessing Interface
3.5.7. Classification Selection Interface
3.5.8. Classifier Result Interface
3.5.9. Test Model History Interface
3.5.10. Unseen Tweet Prediction Interface
3.5.11. Text Result Interface
Chapter 4
Software Development
4.1. Coding Standards
4.1.1 Indentation
4.1.2 Declaration
4.1.3 Statement Standards
4.1.4 Naming Convention
4.2 Front End Development Environment
4.3 Back End Development Environment
4.4 Software Description
4.4.1. Module Classifier Code
4.4.2. Module Feature Computation Code
4.4.3. Module Web App Code
Chapter 5
Software Testing
5.1 Testing Methodology
5.2 Test Cases
5.2.1 Choose Dataset Test Case
5.2.2 Train Model Test Case
5.2.3 Apply Feature Extraction Method on Dataset Test Case
5.2.4 Apply Part of Speech on Dataset Test Case
5.2.5 Remove Special Characters Test Case
5.2.6 Apply Preprocessing Technique on Dataset Test Case
5.2.7 Apply Lemmatization Technique on Dataset Test Case
5.2.8 Apply All Preprocessing Technique on Dataset Test Case
5.2.9 Moving to Classifier Test Case
5.2.10 Machine Learning Model Test Case
5.2.11 Evaluation Metrics Test Case
5.2.12 Evaluation Metrics Test Case
5.2.13 Apply Classifier Test Case
5.2.14 Save Model Test Case 1
5.2.15 Save Model Test Case 2
5.2.16 Test Model Test Case
5.2.17 Test Model Test Case 2
5.2.18 Unseen Prediction Test Case 1
5.2.19 Unseen Prediction Test Case 2
5.2.20 Unseen Prediction Test Case 3
5.2.21 About Page Test Case
Chapter 6
Software Deployment
6.1. Installation / Deployment Process Description
• GitHub
• Heroku
Chapter 7
REPORT APPROVAL CERTIFICATE
References
Webpage
List of Figures
Figure 1 Work Breakdown Chart
Figure 2 Project Time-lapse
Figure 3 Use Case Diagram
Figure 4 Train Model SSD
Figure 5 Test Model SSD
Figure 6 Domain Model
Figure 7 Domain Model
Figure 8 Flow Chart
Figure 9 Select Dataset Flowchart
Figure 10 View Dataset Flow Chart
Figure 11 Features Computation Flowchart
Figure 12 Preprocessing Flowchart
Figure 13 ML Model Flowchart
Figure 14 Evaluation Metric Flowchart
Figure 15 Validation Method Flow Chart
Figure 16 Save Model Flow Chart
Figure 17 Test Saved Model Flow Chart
Figure 18 Main Page Interface
Figure 19 About Page Interface
Figure 20 Train Model Choose Dataset Interface
Figure 21 View Dataset Interface
Figure 22 Feature Selection Interface
Figure 23 Data Preprocessing Interface
Figure 24 Classifier Selection Interface
Figure 25 Classifier Result Interface
Figure 26 Test Model History Interface
Figure 27 Unseen Tweet Prediction Interface
Figure 28 Text Result Interface
List of Tables
Table 1 Functional Requirements
Table 2 Non-Functional Requirements
Table 3 Train Model Use Case Description
Table 4 Test Model Use Case Description
Table 5 Layers Definition
Table 6 Choose Dataset Test Case
Table 7 Train Model Test Case
Table 8 Apply Feature Extraction Method on Dataset
Table 9 Apply Part of Speech on Dataset Test Case
Table 10 Remove Special Characters Test Case
Table 11 Apply Preprocessing Technique on Dataset Test Case
Table 12 Apply Lemmatization Technique on Dataset Test Case
Table 13 Apply Lemmatization Technique on Dataset Test Case
Table 14 Moving to Classifier Test Case
Table 15 Machine Learning Model Test Case
Table 16 Evaluation Metrics Test Case
Table 17 Evaluation Metrics Test Case
Table 18 Apply Classifier Test Case
Table 19 Save Model Test Case
Table 20 Save Model Test Case 2
Table 21 Test Model Test Case
Table 22 Test Model Test Case 2
Table 23 Unseen Prediction Test Case
Table 24 Unseen Prediction Test Case 2
Table 25 Unseen Prediction Test Case 3
Table 26 About Page Test Case
Table 27 Project Evaluation Guidelines
Chapter 1
Introduction
This chapter gives a brief summary of the project scope and specification. It covers the
existing system and the technologies used to develop the software, along with the project
timeline and work breakdown structure.
Direct eyewitnesses
Non eyewitnesses
Don’t know.
Moreover, we investigate various characteristics associated with each kind of eyewitness type.
We observe that words related to perceptual senses tend to be present in direct eyewitness
messages, whereas emotions, thoughts, and prayers are more common in indirect witnesses.
We use these characteristics and labeled data to train several machine learning classifiers. Our
results performed on several real-world Twitter datasets reveal that textual features (bag-of-
words) when combined with domain-expert features achieve better classification
performance. Our approach contributes a successful example for combining crowdsourced
and machine learning analysis, and increases our understanding and capability of identifying
valuable eyewitness reports during disasters. [3]
1.4. Objectives
This project has the following objectives:
Textual feature computation.
Applying machine learning algorithms for classification.
Evaluating model performance using 10-fold cross validation and the hold-out method.
Presenting model performance through evaluation metrics such as accuracy, precision, recall,
and F-measure.
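The two validation methods named in the objectives can be illustrated with a minimal sketch. This is not the project's implementation (the classifier module imports scikit-learn's `train_test_split` and `cross_val_score` for this purpose). The function names `hold_out_split` and `k_fold_indices`, the 80/20 ratio, and the fixed seed are assumptions made for illustration only.

```python
import random


def hold_out_split(n_samples, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx) for a single shuffled hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_ratio))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i, so fold sizes
    # differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

Hold-out trains once and tests once, so it is cheap but depends on a single split. In 10-fold cross validation the scores of the ten folds are typically averaged, which gives a steadier estimate at roughly ten times the training cost.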
maintenance, and use of database objects. [9]
Figure 1 Work Breakdown Chart
1.7. Project Time Lapse
A timeline is a chronological list of events that have happened or are about to happen.
A project timeline works the same way: it tells you which tasks need to be completed and
how much time is available to complete them. The timeline for this project is shown in Figure 2.
Chapter 2
S. No. | Functional Requirement | Type | Status
1. | The user should reach the classified text with one button press, if possible. | Usability |
2. | The system should also be user friendly for admins, because an admin may be anyone, not only a programmer. | Usability |
A use case depicts how actors interact with the system. Use case modeling is a methodology
used in system analysis to identify, clarify, and organize system requirements. A use case is
made up of a set of possible sequences of interactions between the system and its users in a
particular environment, related to a particular goal. The following use case diagram
depicts how our system works.
[Use case diagram. Actor: User. Use cases: Train model, Test model.]
2. User selects view dataset.
3. User cancels saving the model.
4. User selects view result.
2.5.2. Test Model Use Case Description
Table 4 Test Model Use Case Description
Chapter 3
System Design
Layers | Description
Presentation Layer | This layer will be used for the interaction with the user through a graphical user interface.
Business Logic Layer | This layer contains the business logic. All the constraints and the majority of the functions reside under this layer.
The presentation layer occupies the top level and displays information related to the services
available on a website. This tier communicates with other tiers by sending results to the
browser and to other tiers in the network.
The application layer, also called the middle tier, logic tier, or business logic tier, is pulled
from the presentation tier. It controls application functionality by performing
detailed processing.
3.2. System Design Diagrams
High-level design provides a view of the system at an abstract level. It shows how the major
pieces of the finished application will fit together and interact with each other. The high-
level design does not focus on the details of how the pieces of the application will work.
Those details can be worked out later during low-level design and implementation.
A system sequence diagram (SSD) is a sequence diagram that shows, for a particular scenario
of a use case, the events that external actors generate, their order, and possible inter-system
events.
The Train Model system sequence diagram shows that when the user clicks the Train model
button, the available datasets are displayed. The user first selects a dataset, and the system
confirms the selection. The user then selects the disaster type, which the system also
confirms. Next, the user selects the feature extraction method, the preprocessing technique,
the weighting technique, the ML model, the evaluation metrics, and the validation technique,
in that order. The user then requests the system to train the model, and the system trains it
successfully. Afterwards, the user can view the result and save the trained model, as shown
in Figure 4.
Figure 3 Train Model SSD
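The selections made in the Train Model sequence can be thought of as one request that is checked before training starts. The sketch below is a hypothetical illustration, not the web app's code: the class name `TrainingRequest`, its field names, and the example file name are assumptions, and the option lists are copied from the flow-chart sections of this report. Disaster type and weighting technique are left out because their options are not listed here.

```python
from dataclasses import dataclass

# Option lists taken from the flow-chart sections of this report.
FEATURES = {"Bag of words", "Part of speech tagging", "Unigram", "Bigram",
            "TF-IDF", "Word2vec", "FastText"}
PREPROCESSING = {
    "Stopwords Removal",
    "Stopwords Removal and Special Character Removal",
    "Stopwords Removal, Special Character Removal and Lemmatization",
}
MODELS = {"Naive Bayes", "Random Forest"}
VALIDATION = {"10-fold cross validation", "Hold-out"}


@dataclass(frozen=True)
class TrainingRequest:
    """One complete set of selections made before the model is trained."""
    dataset: str
    feature: str
    preprocessing: str
    model: str
    validation: str

    def __post_init__(self):
        # Reject any selection that is not one of the offered options.
        checks = [
            ("feature", self.feature, FEATURES),
            ("preprocessing", self.preprocessing, PREPROCESSING),
            ("model", self.model, MODELS),
            ("validation", self.validation, VALIDATION),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"Unknown {name} option: {value!r}")
```

For example, `TrainingRequest("earthquake.csv", "TF-IDF", "Stopwords Removal", "Random Forest", "Hold-out")` is accepted, while passing an unlisted model name raises `ValueError`.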
3.2.2.2. Test Model SSD
When the user clicks the Test model button, the user selects an already trained model and the
system displays the selected model in response. The user then enters the unseen data
and clicks the predict button, after which the system displays the predicted class label, as
shown in Figure 5.
3.3. Domain Model
The domain model is an organized and structured representation of knowledge about the
problem. It should represent the vocabulary and key concepts of the problem domain and
identify the relationships among all entities within the scope of the domain. Our system has
twelve entities. The user entity is used to register and log in to the system, and the train
model entity is used to train the model. Training first requires a dataset, so there is a choose
dataset entity. After choosing the data, a feature extraction method and a preprocessing
technique must be applied, so there are feature extraction and preprocessing entities. Next
come the machine learning model, evaluation metrics, and validation technique entities. After
training, the resulting model is stored in the database, so there is a save model entity. The
model can also be tested on an unseen tweet, so there is a test model entity. Finally, the
system predicts a result, so there is a prediction entity.
3.4. Flow Chart
[Flow chart. Nodes: Start, Select Features, Select Preprocessing Technique, Select Validation Technique, Train Model, End.]
3.4.1 View Dataset Flow Chart
[Flow chart. Nodes: Select Dataset, View Dataset, End.]
The user selects one of the following feature extraction methods:
Bag of words
Part of speech tagging
Unigram
Bigram
TF-IDF (Term Frequency – Inverse Document Frequency)
Word2vec
FastText
Figure 8 View Dataset Flow Chart
[Flow chart, Feature Computation. Nodes: Select Features, Bag of Words, Part of Speech Tagging, Unigram, Bigram, TF-IDF, TF-IDF Bigram, FastText, Word2vec, All Features, End.]
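To make the count-based feature options concrete, the following sketch computes bag-of-words counts, unigrams, bigrams, and TF-IDF weights with the standard library only. It is an illustration under stated assumptions, not the project's feature computation module: the function names are hypothetical, the tokenizer is a plain whitespace split, and the TF-IDF formula is the unsmoothed textbook variant tf × log(N / df), which differs from the smoothed variant some libraries use by default.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the list of n-grams (as space-joined strings) in token order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, n=1):
    """Count how often each unigram (n=1) or bigram (n=2) occurs in one tweet."""
    return Counter(ngrams(tokenize(text), n))


def tf_idf(corpus, n=1):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    bags = [bag_of_words(doc, n) for doc in corpus]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for bag in bags for term in bag)
    n_docs = len(corpus)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights
```

A term that occurs in every document gets weight zero under this formula, which is why TF-IDF tends to down-weight words shared by all tweets about the same disaster.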
3.4.3 Pre-Processing Flow Chart
The user selects one of the following preprocessing techniques:
Stopwords Removal
Stopwords Removal and Special Character Removal
Stopwords Removal, Special Character Removal and Lemmatization
[Flow chart, Preprocessing. Nodes: Select Stopword Removal.]
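The first two preprocessing options can be sketched as follows. This is a minimal illustration rather than the project's code: the stopword list is a tiny hypothetical sample, the function names are assumptions, and lemmatization is left out because it needs a lexical resource from an NLP library.

```python
import re

# A tiny illustrative stopword list; a real run would use a full list
# (for example from an NLP library), which is an assumption here.
STOPWORDS = {"a", "an", "the", "is", "are", "i", "in", "of", "to", "and"}


def remove_special_characters(text):
    """Replace anything that is not a letter, digit, or whitespace with a space."""
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)


def remove_stopwords(tokens):
    """Drop tokens that appear in STOPWORDS (comparison is case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def preprocess(text, technique):
    """Apply one of the preprocessing options by name."""
    if technique == "Stopwords Removal and Special Character Removal":
        text = remove_special_characters(text)
    elif technique != "Stopwords Removal":
        raise ValueError(f"Unsupported technique: {technique}")
    return " ".join(remove_stopwords(text.split()))
```

Special characters are removed before stopwords so that a token such as "the!" is reduced to "the" and then dropped.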
The user selects one machine learning model from the given options:
Naïve Bayes
Random Forest
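As an illustration of how a Naïve Bayes text classifier reaches a decision, the sketch below implements a small multinomial Naive Bayes with add-one smoothing over word counts. Note the swap: the project's classifier module imports scikit-learn's `GaussianNB` and `RandomForestClassifier`, so this is a simplified stand-in for teaching purposes, and the class name `NaiveBayesText` and the example labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            score = math.log(n_label / n_docs)  # log prior of the class
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.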
[Flow chart, ML Model. Nodes: Start, Select Random Forest, End.]
Accuracy
F-measure
Precision
Recall
[Flow chart, Evaluation Metric. Nodes: Start, Select Accuracy, Select Recall, Select F-Measure, End.]
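The four evaluation metrics can be computed from the true and predicted labels as in the sketch below. It treats one label as the positive class; the function name `evaluation_metrics` and the returned dictionary keys are assumptions made for illustration (the classifier module itself imports the corresponding scikit-learn functions).

```python
def evaluation_metrics(y_true, y_pred, positive):
    """Return accuracy, precision, recall, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Each ratio falls back to 0.0 when its denominator would be zero.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

Precision answers how many tweets predicted as the positive class really belong to it, recall answers how many of the true positives were found, and the F-measure is their harmonic mean.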
[Flow chart, Validation Technique. Nodes: Start, Select Hold-out Method, End.]
3.4.7 Save Model Flow Chart
The user saves the trained model to the system together with the following details:
Current date and time
Dataset Name
Feature Computation
Pre-Processing Technique
Machine Learning Model
Accuracy
Precision
Recall
F-measure
Validation Technique
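The details listed above can be gathered into a single record when a model is saved. The following sketch is hypothetical: the function names, the dictionary keys, and the choice of a JSON file are assumptions for illustration, and the report does not state here how the web app actually persists this information.

```python
import json
from datetime import datetime


def build_model_record(dataset_name, feature, preprocessing, ml_model,
                       validation, metrics, now=None):
    """Collect the saved-model details listed above into one dictionary."""
    now = now or datetime.now()
    return {
        "saved_at": now.strftime("%Y-%m-%d %H:%M:%S"),
        "dataset_name": dataset_name,
        "feature_computation": feature,
        "preprocessing_technique": preprocessing,
        "machine_learning_model": ml_model,
        "accuracy": metrics["accuracy"],
        "precision": metrics["precision"],
        "recall": metrics["recall"],
        "f_measure": metrics["f_measure"],
        "validation_technique": validation,
    }


def save_model_record(record, path):
    """Write the record as JSON; the fitted model itself would be stored separately."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
```

Keeping the record separate from the serialized model makes it possible to show a history of trained models without loading each model into memory.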
[Flow chart, Save Model. Nodes: Start, Save Dataset Name, Save Feature Computation Methods, Save Preprocessing Techniques, Save Machine Learning Model, Save Evaluation Metrics, End.]
[Flow chart, Test Saved Model. Nodes: Start, Enter Unseen Tweet, Predicted Label, End.]
The main page shows the tabs Train Model, Test Model, View Dataset, and About. The user
can select any of the tabs, as shown in Figure 16.
Figure 16 Main Page Interface
3.5.3. Train Model
The user selects the ML model, evaluation metrics, and validation technique and clicks the
train model button. The user can also click the back button to make changes, or the view
result button to check the results, as shown in Figure 18.
On this page the user can select the dataset, upload the file from a location on the system,
and press the view data button to preview the data, as shown in Figure 19.
Figure 19 View Dataset Interface
The user can select any of the features and click the next button to continue, or the back
button to return.
The user can select any of the preprocessing techniques and click the next button to continue,
or the back button to change the dataset, disaster, or feature, as shown in Figure 21.
Figure 21 Data Preprocessing Interface
Figure 23 Classifier Result Interface
Figure 25 Unseen Tweet Prediction Interface
Chapter 4
Software Development
4.1. Coding Standards
4.1.1 Indentation
Proper code indentation is used in this project. Indenting blocks of code improves
readability and understandability and makes the hierarchy of the code clear.
4.1.2 Declaration
In this project we use one declaration per line to increase the clarity and
understandability of the code. Declarations follow this order:
All the widgets are imported at the beginning.
Class variables are declared in the sequence public, protected, then private.
Instance variables are declared in the sequence public, then private.
Class constructors are then declared with proper names.
Class methods are grouped by functionality rather than by scope or accessibility, to
make the code easier to read and understand.
Local variables are declared only at the beginning of the code, after the packages and
libraries are imported.
Each line of code contains at most one declaration. Compound statements in this project
contain lines of code enclosed in braces. The inner block of a compound statement
begins on the line after the opening brace, and the lines inside it are properly indented.
Braces are used around all statements such as if-else, try-catch etc.
Proper naming conventions were followed during the implementation of this project, which
makes the programs easier to read and understand.
While implementing this project, we used words from natural language (English) to
assign understandable names to classes, variables and methods, such as Requests,
Document Collection, Basic Information etc., instead of unclear names like myc
method, a1, b1 etc.
Terminology applicable to the domain of the project is used. For example, if the user refers to
Email as Registration Number, then the term Registration Number is used.
Mixed case is used to make names readable: names are generally in lower case, with the
first letter of class names and interface names capitalized.
while adhering to modern web design principles like browser portability, device
independence, and graceful degradation. [7]
4.4 Software Description
4.4.1. Module Classifier Code
# from mlxtend.classifier import StackingClassifier
import time
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (precision_score, recall_score,
                             classification_report, accuracy_score,
                             f1_score)
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler, LabelEncoder
def get_preprocessing(pre_processing):
    if pre_processing == 'Stopwords Removal':
        return "a1"
    elif pre_processing == 'Stopwords + Special Characters':
        return "a2"
    else:
        return "a3"
        dataset.drop(dataset.columns[[0, -1]], axis=1, inplace=True)
        dataset.drop('text', axis=1, inplace=True)
        dataset.reset_index(drop=True, inplace=True)
    else:
        raise Exception('Unknown Feature Type')
    return dataset
def generate_random_forest(dataset):
    label_Label = LabelEncoder()
    # converting text labels into numbers
    dataset["label"] = label_Label.fit_transform(dataset['label'])
    X = dataset.drop("label", axis=1)
    y = dataset['label']
    # hold-out split; this line is missing from the listing and is assumed to mirror generateNaiveBayes
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    start = time.time()
    classifier = RandomForestClassifier(n_estimators=42, criterion='entropy')
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cv = ShuffleSplit(n_splits=5, test_size=0.3)
    scores = cross_val_score(classifier, X, y, cv=10)
    print(classification_report(y_test, y_pred))
    print("Random Forest accuracy after 10 fold CV: %0.2f (+/- %0.2f)" %
          (scores.mean(), scores.std() * 2) + ", " + str(
        round(time.time() - start, 3)) + "s")
    print("******************************")
    print("******************************")
    print("******************************")
    print("******************************************************************************************")
def generateNaiveBayes(dataset):
    start = time.time()
    label_Label = LabelEncoder()
    print(dataset["label"])
    # converting text labels into numbers
    dataset["label"] = label_Label.fit_transform(dataset['label'])
    print(dataset["label"])
    X = dataset.drop("label", axis=1)
    y = dataset['label']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    nb = GaussianNB()
    print(X.head())
    print(y)
    nb.fit(X_train, y_train)
    y_pred = nb.predict(X_test)
    cv = ShuffleSplit(n_splits=5, test_size=0.3)
    scores = cross_val_score(nb, X, y, cv=10)
    print(classification_report(y_test, y_pred))
    print("Naive Bayes accuracy after 10 fold CV: %0.2f (+/- %0.2f)" %
          (scores.mean(), scores.std() * 2) + ", " + str(
        round(time.time() - start, 3)) + "s")
    print("******************************")
    print("******************************")
    print("******************************")
    print(" ")
    print('Precision:', precision_score(y_test, y_pred, average='weighted'))
    precision = precision_score(y_test, y_pred, average='weighted')
    # print ('Precision:', precision_score(y_test, y_pred))
    print(" ")
    print(" ")
    print(" ")
    return accuracy, precision, recall, f1score, nb
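The classifier functions above report accuracy, precision, recall and F1-score using weighted averaging. As a rough illustration of what those numbers mean, the following standard-library sketch computes support-weighted scores by hand. The function name weighted_scores and the sample labels are hypothetical and are not part of the project code, which relies on sklearn.metrics instead.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) using support-weighted averaging."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        p_l = tp / (tp + fp) if tp + fp else 0.0
        r_l = tp / (tp + fn) if tp + fn else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if p_l + r_l else 0.0
        weight = support[label] / total  # classes with more samples count more
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return accuracy, precision, recall, f1


# Hypothetical labels: "eye" = eyewitness, "non" = non-eyewitness
y_true = ["eye", "eye", "non", "non", "non"]
y_pred = ["eye", "non", "non", "non", "eye"]
accuracy, precision, recall, f1 = weighted_scores(y_true, y_pred)
```

With weighted averaging the recall always equals the accuracy, which is one reason precision and F1-score are reported alongside it.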
Part-of-Speech Tagging
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language
and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc.
The following code performs POS tagging:
import re
import nltk
import pandas as pd
from NGRAM import output_to_csv
from NGRAM import stopword_removal
feature_list = ["NN", "NNP", "CD", "VBD", "VBN", "NNS", "JJ", "PRP", "RB",
                "VBP", "VBG", "VBZ", "IN", "DT", "NNPS",
                "VB", "CC", "JJS", "PRP$", "JJR", "MD", "WRB", "UH", "EX",
                "FW", "RBR", "WP", "TO", "RBS", "RP", "WDT",
                "PDT"]
def main():
    Tweets_df = pd.read_csv("D:/FYP/Dataset/hurricanes_eyewitness_annotations_2004.csv")
    texts_list = Tweets_df['text'].tolist()
    print(texts_list)
    pos_list = []
    # texts_list[0] = "Playing...."
    # for i in range(len(texts_list)):
    #     texts_list[i] = texts_list[i].lower()
    #     # Replace every non-word character ("!", "?" etc.) with a space
    #     texts_list[i] = re.sub(r'\W', ' ', texts_list[i])
    #     # Collapse runs of white-space into a single space
    #     texts_list[i] = re.sub(r'\s+', ' ', texts_list[i])
    #     print(texts_list[i])
    for text in texts_list:
        tokens = nltk.word_tokenize(text)
        print(tokens)
        tokens = stopword_removal(tokens)
        print(tokens)
        tagged = nltk.pos_tag(tokens)
        print(tagged)
        counts = nltk.Counter(tag for word, tag in tagged)
        print(counts)
        pos_list.append(counts)
    tokens = nltk.word_tokenize(text)
    tokens = stopword_removal(tokens)
    tagged = nltk.pos_tag(tokens)
    counts = nltk.Counter(tag for word, tag in tagged)
    counts = dict(counts)
    result = []
    for feature in feature_list:
        if feature in counts:
            result.append(counts[feature])
        else:
            result.append(0)
    print(result)
    return result
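The last step of the POS code, turning the tag counts of one tweet into a fixed-length feature vector, can be sketched without NLTK as follows. This is only an illustration: the tag list is shortened, the tagged tokens are written by hand (a real tagger may assign different tags), and the name pos_count_vector is hypothetical.

```python
from collections import Counter

# Shortened tag list for illustration; the report's feature_list has 32 tags.
FEATURES = ["NN", "VBD", "JJ", "PRP"]


def pos_count_vector(tagged_tokens, features=FEATURES):
    """Count each POS tag and return the counts in the fixed order of `features`."""
    counts = Counter(tag for _word, tag in tagged_tokens)
    return [counts.get(tag, 0) for tag in features]


# Hand-written (word, tag) pairs standing in for the tagger output
tagged = [("I", "PRP"), ("saw", "VBD"), ("flood", "NN"), ("water", "NN")]
vector = pos_count_vector(tagged)
```

Because every tweet is mapped onto the same ordered tag list, all feature vectors have the same length and can be stacked into one training table.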
Bag-of-words
A bag-of-words model, or BoW for short, is a way of extracting features
from text for use in modeling, such as with machine learning algorithms.
The approach is simple and flexible, and can be used in many ways to
extract features from documents.
Code
import re
# def stopword_rem(token):
#     tokens_without_sw = [word for word in token if not word in stopwords.words()]
#     return tokens_without_sw
# def lemmitization(token):
#     token = wordnet_lemmatizer.lemmatize(token, pos="v")
#     return token
def main():
    Review_df = pd.read_csv("C:/FYP/POS tagging/bagofwords/abc.csv")
    texts_list = Review_df['text'].tolist()
    # texts_list[0] = "Playing...."
    for i in range(len(texts_list)):
        texts_list[i] = texts_list[i].lower()
        # Replace every non-word character ("!", "?" etc.) with a space
        texts_list[i] = re.sub(r'\W', ' ', texts_list[i])
        # Collapse runs of white-space into a single space
        texts_list[i] = re.sub(r'\s+', ' ', texts_list[i])
    # TODO Number remove
    bag_of_words_list = []
    count = 0
"""
['The', 'The', asim] wordfreq['The'] wordfreq {
'key': value
The: 2
Samad: 1
}
sentence_1 = ['The', 'The', Asim] sentence_2 = ['The', 'BAG', asim]
            # token = wordnet_lemmatizer.lemmatize(token, pos="v")
            if token not in wordfreq.keys():
                wordfreq[token] = 1
            else:
                wordfreq[token] += 1
        count += 1
        bag_of_words_list.append(wordfreq)
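The word-frequency idea used above can be condensed into a short, self-contained sketch. It applies the same lower-casing and regular-expression cleaning as the listing and then builds one count vector per text over a shared vocabulary. The function name bag_of_words and the sample texts are assumptions made for illustration only.

```python
import re
from collections import Counter


def bag_of_words(texts):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in texts[i]."""
    cleaned = [
        re.sub(r"\s+", " ", re.sub(r"\W", " ", text.lower())).strip()
        for text in texts
    ]
    per_doc = [Counter(doc.split()) for doc in cleaned]
    vocabulary = sorted(set().union(*per_doc))  # all distinct words, in a stable order
    matrix = [[counts.get(word, 0) for word in vocabulary] for counts in per_doc]
    return vocabulary, matrix


vocabulary, matrix = bag_of_words(["The storm hit the coast!", "Storm warning"])
```

Each row of the matrix can then be used directly as a feature vector for a classifier.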
TF-IDF
TF-IDF is a statistical measure that evaluates how relevant a word is to a
document in a collection of documents. It has many uses, most importantly
in automated text analysis, and is very useful for scoring words in machine
learning algorithms for Natural Language Processing (NLP).
Code
import pandas as pd
import re
import nltk
from Pre_Processing
punctuations = "?:!.,;"
def compute_idf(doc_list):
    import math
    idf_dict = {}
    N = len(doc_list)
    # [{}, {}, {}]
    for doc in doc_list:
        for word, val in doc.items():
            if val > 0:
                if idf_dict.get(word):
                    idf_dict[word] += 1
                else:
                    idf_dict[word] = 1
    return idf_dict
    df = pd.DataFrame(data_list)
    df = df.fillna(0)
    df.index.name = "Review #"
    if review_df is not None:
        df['text'] = review_df['text']
        cols = df.columns.tolist()
        cols = cols[-1:] + cols[:-1]
        df = df[cols]
    df.to_csv(file_name)
def main():
    texts_list = ["it is going to rain today",
                  "today i am not going outside",
                  "i am going to watch the season premiere"]
    # reviews_df = pd.read_csv("abc.csv")
    # texts_list = reviews_df['text'].tolist()
    for i in range(len(texts_list)):
        texts_list[i] = texts_list[i].lower()
        texts_list[i] = re.sub(r'\W', ' ', texts_list[i])
        texts_list[i] = re.sub(r'\s+', ' ', texts_list[i])
    # Lemmatization
    for i in range(len(token)):
        token[i] = lemmitization(token[i])
    tf, freq = compute_tf(token)
    all_tfs.append(tf)
    all_freqs.append(freq)
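To make the TF-IDF measure concrete, the sketch below scores the three example sentences from the listing. It assumes one common textbook variant, tf = count / document length and idf = log(N / document frequency); the project's own compute_tf is not shown in the listing and may use a different weighting. The helper names term_frequency, inverse_document_frequency and tf_idf are hypothetical.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Relative frequency of each word within one document."""
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}


def inverse_document_frequency(token_lists):
    """log(N / df), where df is the number of documents containing the word."""
    n_docs = len(token_lists)
    doc_freq = Counter(word for tokens in token_lists for word in set(tokens))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(texts):
    token_lists = [text.lower().split() for text in texts]
    idf = inverse_document_frequency(token_lists)
    return [
        {word: tf * idf[word] for word, tf in term_frequency(tokens).items()}
        for tokens in token_lists
    ]


docs = ["it is going to rain today",
        "today i am not going outside",
        "i am going to watch the season premiere"]
scores = tf_idf(docs)
```

Under this variant a word that occurs in every document, such as "going", receives a score of zero, while a word unique to one document, such as "rain", receives the highest weight in that document.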
Pre-processing
In pre-processing we perform stop-word removal, special character removal and lemmatization.
Code
from nltk import WordNetLemmatizer
from nltk.corpus import stopwords
wordnet_lemmatizer = WordNetLemmatizer()
def stopword_rem(token):
    tokens_without_sw = [word for word in token if not word in stopwords.words()]
    return tokens_without_sw
def lemmitization(token):
    token = wordnet_lemmatizer.lemmatize(token, pos="v")
    return token
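The first two pre-processing steps can be sketched with the standard library alone. The stop-word set below is a tiny hand-picked stand-in for the NLTK stop-word corpus used in the project, the function names are hypothetical, and lemmatization is left out because it needs the WordNet data.

```python
import re

# Tiny illustrative stop-word set; the project uses nltk's stopwords corpus instead.
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "i", "to", "in", "of", "and"}


def remove_special_characters(text):
    """Lower-case the text, replace non-alphanumeric characters and collapse spaces."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


def preprocess(text):
    return remove_stop_words(remove_special_characters(text).split())


tokens = preprocess("The water is rising in #Houston!!")
```

Keeping the stop words in a set built once makes each membership check cheap, whereas calling stopwords.words() inside the list comprehension reloads the word list for every token.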
Word2vec Code
import xlrd
import sys
import codecs
import json
import nltk
from nltk import WordNetLemmatizer
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.tokenize import sent_tokenize
import gensim
import re
from gensim.models import Word2Vec
import array as ar
import xlsxwriter
wordnet_lemmatizer = WordNetLemmatizer()
from nltk.corpus import wordnet
# nltk.download('punkt')
# nltk.download('stopwords')
from Preprocessing import lemmitization
def get_wordnet_pos(word):
    """Map POS tag to first character lemmatize() accepts"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    # fall back to noun when the tag is not in the mapping
    return tag_dict.get(tag, wordnet.NOUN)
lemmatizer = WordNetLemmatizer()
word = 'feet'
print(lemmatizer.lemmatize(word, get_wordnet_pos(word)))
def lemmatize_stemming(text):
    # return stemmer.stem(WordNetLemmatizer().lemmatize(text, pos='v'))
    lemmatizer = WordNetLemmatizer()
    return lemmatizer.lemmatize(text, get_wordnet_pos(text))
def tokenize_Words(tokenized_text):
    processed_article = tokenized_text.lower()
    # Remove URLs
    processed_article = re.sub(
        r'''(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))''',
        " ", processed_article)
    processed_article = processed_article.lower()
    # Replace every non-word character ("!", "?" etc.) with a space
    processed_article = re.sub(r'\W', ' ', processed_article)
    # print(texts_list[i])
    # Number remove
    processed_article = re.sub(r"\d+", "", processed_article)
    # print(texts_list[i])
    # articles removal
    processed_article = re.sub(r'\s+(a|an|and|the|they|them|is|am|are)(\s+)', ' ', processed_article)
    # Collapse runs of white-space into a single space
    processed_article = re.sub(r'\s+', ' ', processed_article)
stopwords.words('english')]
    for rw in range(len(all_words)):
        for cl in range(len(all_words[rw])):
            all_words[rw][cl] = lemmatize_stemming(all_words[rw][cl])
    return all_words
loc = "wildfire.xlsx"
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
# For row 0 and column 0
tokenized_text = ""
reviewList = []
print(tokenized_text)
# Lemmatization
vecfinal = []
for vr in vecReview:
    count = 0
    temp = []
    for wr in vr:
        if count == 0:
            count = 1
            for cnt in range(0, 100):
                temp.append(wr[cnt])
        else:
            for cnt in range(0, 100):
                temp[cnt] = temp[cnt] + wr[cnt]
    vecfinal.append(temp)
# with open("tt.txt", "w") as output:
#     output.write(str(vecfinal))
print(len(vecfinal))
workbook = xlsxwriter.Workbook('res.xlsx')
worksheet = workbook.add_worksheet()
col = 0
row = 1
for ii in vecfinal:
    col = 0
    for jj in ii:
        worksheet.write(row, col, jj)
        col = col + 1
    row = row + 1
workbook.close()
print(vecfinal)
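The nested loop in the Word2vec code adds up the 100-dimensional word vectors of each text so that every text ends up with a single vector. The same aggregation can be sketched compactly as below, using 3-dimensional toy vectors instead of real Word2vec output. The function names are hypothetical, and the mean variant is shown only as a common alternative; it is not used in the project code.

```python
def sum_vectors(word_vectors):
    """Component-wise sum of equally long word vectors (as in the loop above)."""
    if not word_vectors:
        return []
    return [sum(components) for components in zip(*word_vectors)]


def mean_vectors(word_vectors):
    """Component-wise mean, which does not grow with the number of words."""
    summed = sum_vectors(word_vectors)
    return [value / len(word_vectors) for value in summed]


# Toy vectors for a two-word text; real vectors would come from a trained model
tweet_vectors = [[1.0, 2.0, 3.0], [3.0, 0.0, -1.0]]
tweet_sum = sum_vectors(tweet_vectors)
tweet_mean = mean_vectors(tweet_vectors)
```

Summing makes longer texts produce larger vectors, while averaging keeps texts of different lengths on a comparable scale.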
Home Component.html
<mat-drawer-container>
<mat-drawer-content>
<div class="silder">
<mat-icon aria-hidden="false" aria-label="Example home icon" class="desktop-hide"
(click)="drawer.toggle()">
reorder
</mat-icon>
<div class="silder-content">
<h1 style="font-family: flex;">EYEWITNESS IDENTIFICATION</h1>
</div>
</div>
<footer class="footer" style="background-color: gray; padding: auto; height: 155px;">
<div class="container-fluid">
About Component.html
<div class="main-content">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<div class="card">
<div class="header">
<h4 class="title" style="font-family: flex;">Identification of Eye Witness
Tweets</h4>
</div>
<div class="content">
<div>Identifying Eyewitnesses during Disasters
Identifying eyewitnesses is an important area of research in cognitive
psychology and human memory.
Eyewitnesses frequently play a vital role in uncovering the truth about a
disaster.
The evidence they provide can be critical in identifying, charging, and
ultimately saving stranded people.
That is why it is essential that eyewitness evidence be accurate and
reliable, and Twitter is one such source.
</div>
</div>
</div>
</div>
</div>
</div>
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<div class="card">
<div class="header">
<h4 class="title" style="font-family: flex;" >Major Steps Include</h4>
</div>
<div class="content">
<div>
<ul>
<li>Dataset Selection</li>
<li>Train Model</li>
<li>Data Preprocessing</li>
<li>Feature Extraction</li>
<li>Machine Learning Model</li>
<li>Model Validation Techniques</li>
<li>Evaluation</li> <li>Model Testing</li>
<li>Save Model</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<footer class="footer">
<div class="container-fluid">
</div>
</footer>
Sidebar Component.html
<div class="sidebar-wrapper">
<div class="logo">
<img src="/assets/img/angular2-logo-white.png"/>
</div> -->
</a>
</div>
<li *ngIf="isMobileMenu()">
</a>
</li>
<!-- <li class="dropdown" *ngIf="isMobileMenu()">
5 Notifications
<b class="caret"></b>
</p>
</a>
<ul class="dropdown-menu">
</ul>
</li> -->
<a>
<p class="hidden-lg hidden-md">Search</p>
</a>
</li> -->
<a href="">
<p>Account</p>
</a>
</li> -->
<p>Dropdown
<b class="caret"></b>
</p>
</a>
<ul class="dropdown-menu">
<li><a href="#">Action</a></li>
<li><a href="#">Something</a></li>
<li><a href="#">Something</a></li>
<li class="divider"></li>
<li><a href="#">Separated link</a></li>
</ul>
</li> -->
<a>
<p>Log out</p>
</a>
</li> -->
<a [routerLink]="[menuItem.path]">
<i class="{{menuItem.icon}}"></i>
<p>{{menuItem.title}}</p>
</a>
</li>
</ul>
</div>
<div class="main-content">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<div class="card">
</div>
</div>
The actual rendered columns are set as a property on the row definition" -->
<ng-container matColumnDef="ID">
</ng-container>
<ng-container matColumnDef="Dataset">
<ng-container matColumnDef="Feature">
</ng-container>
<ng-container matColumnDef="ClassBalancing">
<ng-container matColumnDef="Preprocessing">
</ng-container>
<ng-container matColumnDef="Model">
</ng-container>
<ng-container matColumnDef="Validation">
</ng-container>
<ng-container matColumnDef="Metrics">
</ng-container>
<ng-container matColumnDef="createDate">
</ng-container>
<ng-container matColumnDef="Action">
</div>
</td>
</ng-container>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
</ol>
</nav>
<div class="column">
<div>
<div class="form-group">
</div>
</div>
</div>
<div class="form-check">
Enable Text
</label>
</div>
<div>
<div class="form-group">
</div>
</div>
</div>
</div>
<div class="clearfix"></div>
<button mat-raised-button color='primary' matStepperNext
[disabled]='firstFormGroup.invalid || !enableViewButton'>Next</button>
</div>
</form>
</mat-step>
<mat-select matNativeControl required formControlName="feature">
<mat-option value="Part of Speech Tagging">Part of Speech
Tagging</mat-option>
<mat-option value="Bag of Words Technique">Bag of Words </mat-option>
<mat-option value="unigram">Uni-gram </mat-option>
<mat-option value="bigram">Bi-gram </mat-option>
<mat-option value="Tf-Idf Technique">TF-IDF </mat-option>
<mat-option value="Word2vec">Word2vec</mat-option>
</mat-select>
</mat-form-field>
<mat-checkbox
id="class-balancing"
color="primary"
formControlName="classBalancing">
Class Balancing
</mat-checkbox>
<div>
<button mat-raised-button color='accent' matStepperPrevious
style="margin-right: 4px;">Back</button>
<button mat-raised-button color='primary' matStepperNext
[disabled]='thirdFormGroup.invalid'>Next</button>
</div>
</form>
</mat-step>
<mat-step [stepControl]="secondFormGroup" [editable]="isEditable">
<form [formGroup]="secondFormGroup">
<ng-template matStepLabel>Pre-Processing</ng-template>
<mat-form-field appearance="outline">
<mat-label>Pre-Processing</mat-label>
<mat-select matNativeControl required
formControlName="preprocessing">
<mat-option value="Stopwords Removal">Stopwords Removal</mat-option>
<mat-option value="Stopwords + Special Characters">Stopwords +
Special Characters</mat-option>
<mat-option value="Stopwords + Special Characters + Lemmatization"
[disabled]='disableP3'>Stopwords + Special Characters + Lemmatization</mat-option>
</mat-select>
</mat-form-field>
<div>
<button mat-raised-button color='accent' matStepperPrevious
style="margin-right: 4px;">Back</button>
<button mat-raised-button color='primary' matStepperNext
[disabled]='secondFormGroup.invalid'>Next</button>
</div>
</form>
</mat-step>
<mat-step [stepControl]="forthFormGroup" [editable]="isEditable">
<form [formGroup]="forthFormGroup">
<ng-template matStepLabel>Prediction</ng-template>
<mat-form-field appearance="outline">
<mat-label>Machine Learning Model</mat-label>
<mat-select matNativeControl required
formControlName="machineLearning">
<mat-option value="Random Forest">Random Forest</mat-option>
<mat-option value="Naive Bayes">Naive Bayes</mat-option>
</mat-select>
</mat-form-field>
<mat-form-field appearance="outline">
<mat-label>Validation Techniques</mat-label>
<mat-select matNativeControl required formControlName="validation">
<mat-option value="10-fold Cross Validation">10-fold Cross
Validation</mat-option>
<mat-option value="Hold-Out Method">Hold-Out Method</mat-option>
</mat-select>
</mat-form-field>
<div *ngIf='enableSlider'>
<mat-label>Training Test Split (%)</mat-label>
<mat-slider
class="slider"
[invert]="false"
[max]="90"
[min]="50"
[step]="1"
[thumbLabel]="true"
formControlName="slider"
[vertical]="false">
</mat-slider>
</div>
<div>
<span class="example-list-section">
<mat-checkbox class="example-margin"
[checked]="allComplete"
[color]="task.color"
[indeterminate]="someComplete()"
(change)="setAll($event.checked)">
{{task.name}}
</mat-checkbox>
</span>
<span class="example-list-section">
<ul>
<li *ngFor="let subtask of task.subtasks;">
<mat-checkbox
[ngModelOptions]="{standalone: true}"
[color]="subtask.color"
[(ngModel)]="subtask.completed"
(change)='updateAllComplete()'>
{{subtask.name}}
</mat-checkbox>
</li>
</ul>
</span>
</div>
<div>
<button mat-raised-button color='accent' matStepperPrevious
style="margin-right: 4px;">Back</button>
<button mat-raised-button color='primary' matStepperNext
[disabled]='forthFormGroup.invalid' (click)='trainModel()'>TRAIN</button>
</div>
</form>
</mat-step>
<mat-step>
<ng-template matStepLabel>Results</ng-template>
<div *ngIf='response'>
<p style="font-family: flex;">Model has been trained successfully.</p>
</div>
<div style="margin-top: 16px;" *ngIf='displayResults'>
<div id="successAlert">
<div id='accuracy-div' class="progress"
*ngIf='task.subtasks[0].completed'>
<div id='accuracy-bar' class="progress-bar" role="progressbar"
aria-valuenow=response.accuracy aria-valuemin="0" aria-valuemax="100"
[style.width]="response.accuracy + '%'">
<span class="sr-only">60% Complete</span>
</div>
<span class="progress-type">Accuracy</span>
<span id='accuracy' class="progress-completed">{{response.accuracy.toFixed(2)}}%</span>
</div>
<div id='precision-div' class="progress"
*ngIf='task.subtasks[1].completed'>
<div id='precision-bar' class="progress-bar progress-bar-success"
role="progressbar" aria-valuenow=response.precision aria-valuemin="0" aria-valuemax="100"
[style.width]="response.precision + '%'">
<span class="sr-only">40% Complete (success)</span>
</div>
<span class="progress-type">Precision</span>
<span id='precision' class="progress-completed">{{response.precision.toFixed(2)}}%</span>
</div>
<div id='recall-div' class="progress" *ngIf='task.subtasks[2].completed'>
<div id='recall-bar' class="progress-bar progress-bar-info"
role="progressbar" aria-valuenow=response.recall aria-valuemin="0" aria-valuemax="100"
[style.width]="response.recall + '%'">
<span class="sr-only">20% Complete (info)</span>
</div>
<span class="progress-type">Recall</span>
<span id='recall' class="progress-completed">{{response.recall.toFixed(2)}}%</span>
</div>
<div id='f1score-div' class="progress" *ngIf='task.subtasks[3].completed'>
<div id='f1score-bar' class="progress-bar progress-bar-warning"
role="progressbar" aria-valuenow=response.f1score aria-valuemin="0" aria-valuemax="100"
[style.width]="response.f1score + '%'">
<span class="sr-only">60% Complete (warning)</span>
</div>
<span class="progress-type">F1-Score</span>
<span id='f1score' class="progress-completed">{{response.f1score.toFixed(2)}}%</span>
</div>
</div>
</div>
<div>
<button mat-raised-button color='warn' (click)="stepper.reset();
fileReset()" style="margin-right: 4px;">Reset</button>
<button mat-raised-button color='accent' style="margin-right: 4px;"
[disabled]='!response' (click)='onDisplay()'>Display Result</button>
<button mat-raised-button color='primary' [disabled]='!response'
(click)='onSave()'>Save Result</button>
</div>
</mat-step>
</mat-horizontal-stepper>
</div>
</div>
</div>
</div>
</div>
</div>
app.component.ts
@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent implements OnInit {
  ngOnInit(){
  }
  isMap(path){
    var titlee = this.location.prepareExternalUrl(this.location.path());
    titlee = titlee.slice( 1 );
    if(path == titlee){
      return false;
    }
    else {
      return true;
    }
  }
}
app.module.ts
import { AppRoutingModule } from './app.routing';
import { NavbarModule } from './shared/navbar/navbar.module';
import { FooterModule } from './shared/footer/footer.module';
import { SidebarModule } from './sidebar/sidebar.module';
import {CustomDatePipe} from './custom.datepipe';
// Material Modules
import { MatStepperModule } from '@angular/material/stepper';
import { MatFormFieldModule } from '@angular/material/form-field';
import { MatInputModule } from '@angular/material/input';
import { MatSelectModule } from '@angular/material/select';
import { MatButtonModule } from '@angular/material/button';
import { MatCheckboxModule } from '@angular/material/checkbox';
import { MatSliderModule } from '@angular/material/slider';
import { MatDialogModule } from '@angular/material/dialog';
import { MatProgressSpinnerModule } from '@angular/material/progress-spinner';
import { MatSnackBarModule } from '@angular/material/snack-bar';
import { MatTableModule } from '@angular/material/table';
import { TestPredictionComponent } from './test-model/test-prediction/test-prediction.component';
@NgModule({
  imports: [
    BrowserAnimationsModule,
    FormsModule,
    RouterModule,
    HttpClientModule,
    NavbarModule,
    FooterModule,
    SidebarModule,
    AppRoutingModule,
    MatStepperModule,
    BrowserModule,
    ReactiveFormsModule,
    MatFormFieldModule,
    MatInputModule,
    MatSelectModule,
    MatButtonModule,
    MatCheckboxModule,
    MatSliderModule,
    MatDialogModule,
    MatProgressSpinnerModule,
    MatSnackBarModule,
    MatTableModule
  ],
  declarations: [
    AppComponent,
    CustomDatePipe,
    AdminLayoutComponent,
    AboutComponent,
    ViewDataComponent,
    TrainModelComponent,
    TestModelComponent,
    ViewDataDialogComponent,
    TestPredictionComponent
  ],
  providers: [ModelService],
  bootstrap: [AppComponent]
})
export class AppModule { }
app.routing.ts
@NgModule({
  imports: [
    CommonModule,
    BrowserModule,
    RouterModule.forRoot(routes,{
      useHash: true
    })
  ],
  exports: [
  ],
})
export class AppRoutingModule { }
custom.datepipe.ts
@Pipe({
  name: 'customDate'
})
export class CustomDatePipe extends DatePipe implements PipeTransform {
  transform(value: any, args?: any): any {
    return super.transform(value, "EEEE d MMMM y h:mm a");
  }
}
http.service.ts
const httpOptions = {
  headers: new HttpHeaders({ 'Content-Type': 'application/json' })
};
@Injectable()
export class ModelService {
  base_url = 'http://localhost:5000/'
  constructor(private http:HttpClient) {}
  postModel(data: any){
    let url = this.base_url + 'train';
    return this.http.post(url, data, httpOptions)
  }
  saveModel(data: any){
    let url = this.base_url + 'save-model';
    return this.http.post(url, data, httpOptions)
  }
result.service.ts
@Injectable({
  providedIn: 'root'
})
export class ResultService {
  private results: Result[] = [];
  private selectedResult: Result = null;
  private resultSubject: BehaviorSubject<Result[]> = new BehaviorSubject(this.results);
  private selectedResultSubject: BehaviorSubject<Result> = new BehaviorSubject(this.selectedResult);
  constructor() {}
  public addToResult(result: Result){
    result.ID = this.results.length + 1;
    this.results.push(result);
    this.resultSubject.next(this.results);
  }
  public getResults(){
    return this.resultSubject;
  }
  public getSelectedResults(){
    return this.selectedResultSubject;
  }
  public clearResults(){
    this.results = [];
    this.resultSubject.next(this.results);
  }
Chapter 5
Software Testing
This chapter describes the adopted testing procedure, including the selected testing
methodology, the test suite and the test results of the developed software.
Expected Output
Dataset selected
Expected Exceptions
Date: 1/8/2021
Tweets Data
Expected Output
Actual Output
Expected Exceptions
Backend exception
5.2.3 Apply Feature Extraction method on Dataset Test Case
Table 8 Apply Feature Extraction method on Dataset
Date: 4/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
5.2.4 Apply Part of speech on Dataset Test Case
Table 9 Apply Part of speech on Dataset Test Case
Date: 4/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
5.2.5 Remove Special Characters Test Case
Table 10 Remove special characters Test Case
Date: 4/8/2021
Expected Output
System will apply special character removal from the given dataset.
Actual Output
Expected Exceptions
5.2.6 Apply Preprocessing Technique on Dataset Test Case
Date: 4/8/2021
Input
Stopword Removal
Expected Output
Actual Output
Expected Exceptions
Date: 1/8/2021
Lemmatization
Expected Output
Actual Output
Exceptions Backend
working
Invalid File
Date: 1/8/2021
Input
Expected Output
Actual Output
Exceptions Backend
working
Invalid File
Date: 4/8/2021
Input
No input
Expected Output
The system displayed the machine learning model and evaluation metrics.
Expected Exceptions
Backend exception
5.2.10 Machine Learning Model Test case
Table 15 Machine Learning Model Test Case
Date: 4/8/2021
Naive Bayes
Expected Output
Actual Output
Expected Exceptions
5.2.11 Evaluation Metrics Test case
Table 16 Evaluation Metrics Test Case
Date: 4/8/2021
Input
Expected Output
Actual Output
Exceptions
5.2.12 Evaluation Metrics Test case
Table 17 Evaluation Metrics Test Case
Date: 4/8/2021
Expected Output
Actual Output
Exceptions
5.2.13 Apply Classifier Test Case
Table 18 Apply Classifier Test Case
5.2.14 Save Model Test Case
Table 19 Save Model Test Case
Date: 4/8/2021
System: Automatic Eye-witness Identification
Objective: Apply Classifier
Test ID: 15
Version: 1
Test Type: Black Box Testing
Input
Expected Output
Actual Output
The system displayed the machine learning model and evaluation metrics results.
Expected Exceptions
Version: 1
Test Type: Black Box Testing
Input
Expected Output
Actual Output
Expected Exceptions
Date: 4/8/2021
System: Automatic Eye-witness Identification
Input
Expected Output
Actual Output
Expected Exceptions
5.2.16 Test Model Test Case
Date: 1/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
5.2.17 Test Model Test Case 2
Date: 1/8/2021
Expected Output
Actual Output
Expected Exceptions
5.2.18 Unseen Prediction Test Case 1
Date: 4/8/2021
Input
Expected Output
Actual Output
Direct-Eyewitness
Expected Exceptions
5.2.19 Unseen Prediction Test Case 2
Table 24 Unseen Prediction Test Case 2
Date: 4/8/2021
Input
Text field e.g. There are no tropical cyclones in the Atlantic at this time.
Expected Output
Non-Eyewitness
Actual Output
Don’t Know
Expected Exceptions
Date: 4/8/2021
Non-eyewitness
Expected Exceptions
Date: 1/8/2021
Expected Exceptions
None
• GitHub
First, we install Git on the system and create an account on GitHub.
Then we install the GitToolBox plugin in PyCharm.
• Then we push the project to GitHub using this tool.
• Heroku
• Then we check the status of the deployment on Heroku and also from PyCharm.
The report of the project, “Web App to classify Shopify User Review Using Textual
Features” has been approved based on the following evaluation guideline.
Artifacts Guidelines
Analysis and Design artifacts are syntactically correct (use-case model, SSDs,
domain model, class diagram, SDs, ERDs, Flow charts, Activity Diagram,
DFDs)
Consistency and traceability have been maintained among different artifacts
General Guidelines
Formatting (font style, indentation) is according to the FYP template and
consistent throughout the document
Captions are added to all the figures and tables. Figure captions must be placed
below each figure, and table captions must be provided above the table
Each figure or table is followed by some text describing what it represents
Webpage
[1] https://www.jetbrains.com/pycharm/features/, last accessed July 24, 2021.