Disaster Response Classification Using NLP
Under Supervision Of -
Mrs. Sonali Mathur
Contents
i. Project Introduction
ii. Research Paper 1
iii. Research Paper 2
iv. Implementation (DFD)
Our project aims to build a disaster response web application that classifies messages sent by users into different categories such as earthquake, typhoon, etc., with the help of Python, Flask, and NLP.
The project consists of three major parts: ETL Pipeline, ML Pipeline, and Web Application Dashboard.
IMPLEMENTATION PROCESS (DFD)
LITERATURE REVIEW
RESEARCH PAPER 1
Title : SMS Classification Method for Disaster Response using Naïve Bayes Algorithm.
Publication Year : 2019
Introduction : This research paper focuses on extracting information from SMS messages with the help of data mining and on training a model with that information.
Methodology : The paper focused on a performance analysis of Naïve Bayes and J48 classification. Their technique used pre-processing to eliminate insignificant words, also known as stop words, before building a learned classifier (a minimal version of this approach is sketched below).
Conclusion : The Naïve Bayes algorithm is probabilistic in nature and fits the task of text extraction well. The technique showed 89% accuracy with 11% false-negative results.
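A minimal sketch of this kind of approach, using scikit-learn's CountVectorizer for stop-word removal and MultinomialNB as the Naive Bayes classifier; the sample messages and labels below are invented for illustration and are not the paper's dataset.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy SMS messages and labels (illustrative only, not the paper's dataset)
messages = ["Earthquake felt in the city centre, buildings shaking",
            "Water levels rising fast, need rescue boats",
            "Happy birthday! See you at the party tonight"]
labels = ["disaster", "disaster", "not_disaster"]

# Stop-word removal happens inside CountVectorizer; Naive Bayes learns from the counts
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(messages, labels)
print(model.predict(["Flood warning issued, roads blocked"]))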
RESEARCH PAPER 2
Publication Year : 2016
Introduction : This paper presents a system for classifying disaster-related tweets. The focus is on Twitter data
generated before, during, and after Hurricane Sandy, which impacted New York in the fall of 2012.
Methodology : Three classification models were assessed: support vector machines (SVMs), maximum entropy (MaxEnt) models, and Naive Bayes. SVM gave the best F1 performance and was therefore used (a comparison of this kind is sketched below).
Conclusion : Their proposed classifiers are both more general (identifying all relevant tweets, not just those concerning situational awareness) and richer (providing fine-grained categorizations).
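Below is a hedged sketch of how such a model comparison can be run with scikit-learn, with MaxEnt modelled as logistic regression and each model scored by macro-averaged F1; the toy tweets and labels are invented for illustration, not the Hurricane Sandy corpus.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression  # MaxEnt modelled as logistic regression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Toy tweets and labels (illustrative only)
tweets = ["power out across lower manhattan", "subway flooded, stay home",
          "donating blankets at the shelter", "roads closed near the coast",
          "great concert last night", "new phone arrived today",
          "coffee with friends downtown", "watching a movie tonight"]
labels = ["disaster", "disaster", "disaster", "disaster",
          "other", "other", "other", "other"]

candidates = {"SVM": LinearSVC(), "MaxEnt": LogisticRegression(), "Naive Bayes": MultinomialNB()}
for name, clf in candidates.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    f1 = cross_val_score(pipe, tweets, labels, cv=2, scoring="f1_macro").mean()
    print(name, "mean F1:", round(f1, 2))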
RESEARCH PAPER 3
Title : Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages.
Introduction : Three well-known learning algorithms are presented here: Naïve Bayes (NB), Support Vector Machines (SVM), and Random Forest (RF). The authors created a large corpus of 52 million crisis-related tweets collected during 19 different crisis events to train their models.
Methodology : The main technique implemented in their paper is the use of unigrams and bigrams as the primary features. Initial vocabularies consisting of lexical variations are built to identify OOV (out-of-vocabulary) words (see the sketch below).
Conclusion : By training their models on a large corpus (52 million tweets), high accuracy can be achieved. Microblogging sites can be very helpful for information gathering. The word embeddings are generated using Continuous Bag of Words (CBOW).
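The sketch below illustrates the two ideas mentioned above: unigram/bigram features via scikit-learn's CountVectorizer, and CBOW word embeddings via gensim's Word2Vec (sg=0 selects CBOW). The tiny tweet list is invented for illustration, and gensim is assumed to be installed.

from sklearn.feature_extraction.text import CountVectorizer
from gensim.models import Word2Vec  # assumes gensim is installed

# Toy crisis-related tweets (illustrative only, not the 52M-tweet corpus)
tweets = ["bridge collapsed after the earthquake",
          "volunteers needed at the evacuation centre"]

# Unigram + bigram features, as described in the methodology
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(tweets)
print(vectorizer.get_feature_names_out())

# CBOW word embeddings (sg=0 selects CBOW in gensim's Word2Vec)
tokenized = [t.split() for t in tweets]
embeddings = Word2Vec(sentences=tokenized, vector_size=50, window=3, min_count=1, sg=0)
print(embeddings.wv["earthquake"][:5])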
IMPLEMENTATION OF THE PROJECT
ETL Pipeline
ML Pipeline
Web Application
1. Data Visualization Screen
2. Classification Screen
ETL Pipeline : the ETL pipeline is the set of processes for extracting data from an input source, transforming it, and loading it into an output destination, for example a database or data warehouse used for reporting, analysis, and data synchronization (a minimal version is sketched below).
Figure 1: ETL Pipeline
Load messages and categories and merge them.
This figure shows the uncleaned merged data from the messages and categories datasets.
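A minimal sketch of such an ETL pipeline with pandas and SQLAlchemy, covering the load-and-merge step described above; the file names messages.csv and categories.csv, the shared id column, and the DisasterResponse.db database name are assumptions for illustration.

import pandas as pd
from sqlalchemy import create_engine

# Extract: read the two raw datasets (file names are assumed for illustration)
messages = pd.read_csv("messages.csv")
categories = pd.read_csv("categories.csv")

# Transform: merge on a shared id column (assumed) and drop duplicate rows
df = messages.merge(categories, on="id").drop_duplicates()

# Load: write the cleaned table into a SQLite database for the ML pipeline to read
engine = create_engine("sqlite:///DisasterResponse.db")
df.to_sql("messages", engine, index=False, if_exists="replace")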
Figure 3:
A machine learning pipeline is a means of automating the machine learning workflow by enabling data to be transformed and correlated into a model that can then be analyzed to produce outputs. This type of ML pipeline makes the process of feeding data into the ML model fully automated.
1. Data collection.
2. Data cleaning.
3. Feature extraction (labelling and
dimensionality reduction)
4. Model validation.
5. Visualization.
ML Pipeline
Machine learning (ML) pipelines consist of several steps to train a model. They are iterative: every step is repeated to continuously improve the accuracy of the model and arrive at a successful algorithm.
A pipeline consists of a sequence of components, each of which performs a computation. Data is sent through these components and is manipulated by each computation in turn.
These sequential steps do everything from data extraction and preprocessing to model training and deployment (a minimal example is sketched below).
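A minimal sketch of such a pipeline with scikit-learn, where a CountVectorizer, a TfidfTransformer, and a classifier are chained as sequential components; the toy messages and labels are invented for illustration.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy labelled messages (illustrative only)
texts = ["earthquake reported downtown", "need food and water urgently",
         "lovely weather today", "meeting moved to friday"]
labels = [1, 1, 0, 0]

# Each step is a component; data flows through them in sequence
pipeline = Pipeline([
    ("counts", CountVectorizer()),      # raw text -> word counts
    ("tfidf", TfidfTransformer()),      # word counts -> TF-IDF weights
    ("clf", RandomForestClassifier()),  # TF-IDF features -> predicted category
])

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.5, stratify=labels)
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))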
Training and building a model
With the TfidfTransformer you first compute word counts using CountVectorizer, then compute the inverse document frequency (IDF) values, and only then compute the TF-IDF scores.
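The sketch below shows that order explicitly with scikit-learn: word counts first, then IDF values, then the final TF-IDF scores; the three short documents are invented for illustration.

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = ["storm damage reported", "storm shelters are open", "report road damage"]

# Step 1: word counts
counts = CountVectorizer().fit_transform(docs)

# Step 2: fit the IDF values on those counts; Step 3: compute the TF-IDF scores
tfidf = TfidfTransformer()
scores = tfidf.fit_transform(counts)
print(tfidf.idf_)        # learned IDF values
print(scores.toarray())  # final TF-IDF matrix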
Web Application
Screen 1: Data Visualization
Screen 2: Classification. The user submits a query (message) and the dashboard returns the results (categories related to the query).
THANK YOU