Disaster Response Classification Using NLP

This project addresses disaster management using Natural Language Processing (NLP). NLP is a field of Artificial Intelligence that gives machines the ability to read, understand, and derive meaning from human language. Disaster response is the most critical part of disaster management: every second of delay in response can make the difference between life and death.


DISASTER RESPONSE CLASSIFICATION USING NLP
Under the Supervision of Mrs. Sonali Mathur

Kumar Shantanu (1709110081)
Hardik Sharma (1709110065)
Karanveer Singh (1709110074)
Nikhil Vats (1709110096)
PRESENTATION OVERVIEW

Project Introduction
Project Implementation (DFD)
i. Research Paper 1
ii. Research Paper 2
iii. Research Paper 3
iv. Research Paper 4
Snapshot of Project
PROJECT INTRODUCTION

Our project aims at building a disaster response web application that classifies messages sent by users into categories such as earthquake, typhoon, etc., with the help of Python, Flask, and NLP.

The project consists of three major parts: an ETL pipeline, an ML pipeline, and a web application dashboard.
IMPLEMENTATION PROCESS (DFD)
LITERATURE REVIEW
RESEARCH PAPER 1

Title : SMS Classification Method for Disaster Response using Naïve Bayes Algorithm. 

Publication Year : 2019

Introduction : This research paper is focused on extraction of information from SMS with the help of data mining and
training a model with that information.

Methodology : The paper focused on a performance comparison of Naïve Bayes and J48 classification. Their technique used pre-processing to eliminate insignificant words, also known as stop words, before building a trained classifier.

Conclusion : The Naïve Bayes algorithm is probabilistic in nature and is well suited to the study of text extraction. The technique achieved 89% accuracy with 11% false-negative results.
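As a minimal sketch of the approach the paper describes, the snippet below trains a Multinomial Naïve Bayes classifier with stop-word removal using scikit-learn; the sample messages and labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy SMS data, invented for illustration only.
messages = [
    "Flood water rising near the river bank, need rescue",
    "Happy birthday! See you at the party tonight",
    "Earthquake felt downtown, buildings shaking",
    "Lunch at noon tomorrow?",
]
labels = [1, 0, 1, 0]  # 1 = disaster-related, 0 = not

# Stop words are dropped during vectorization, as in the paper.
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    MultinomialNB(),
)
model.fit(messages, labels)

print(model.predict(["flood rescue needed near the bank"]))  # expected: [1]
```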
RESEARCH PAPER 2

Title : Identifying And Categorizing Disaster-related Tweets

Publication Year : 2016

Introduction : This paper presents a system for classifying disaster-related tweets. The focus is on Twitter data
generated before, during, and after Hurricane Sandy, which impacted New York in the fall of 2012.
Methodology : Three classification models (support vector machines (SVMs), maximum entropy (MaxEnt) models, and Naive Bayes) were assessed; the SVM achieved the best F1 score and was therefore used.
Conclusion : Their proposed classifiers are both more general (identifying all relevant tweets, not just situational
awareness) and richer (with fine-grained categorizations).
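A hedged sketch of the kind of model comparison the paper describes, scoring candidate classifiers by cross-validated F1; the tweet data and feature extraction here are placeholders, and scikit-learn's LogisticRegression stands in for MaxEnt (maximum entropy classification is equivalent to multinomial logistic regression):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression  # stands in for MaxEnt
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder tweets; the paper used Hurricane Sandy data.
tweets = ["power lines down on main street", "great concert last night",
          "shelter open at the high school", "new phone arrived today"] * 5
labels = [1, 0, 1, 0] * 5

candidates = {
    "SVM": LinearSVC(),
    "MaxEnt": LogisticRegression(max_iter=1000),
    "NaiveBayes": MultinomialNB(),
}
for name, clf in candidates.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    f1 = cross_val_score(pipe, tweets, labels, scoring="f1", cv=5).mean()
    print(f"{name}: F1 = {f1:.3f}")
```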
RESEARCH PAPER 3
Title : Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages.

Publication Year : 2016

Introduction : Three well-known learning algorithms are evaluated: Naïve Bayes (NB), Support Vector Machines (SVM), and Random Forest (RF). The authors created a large corpus of 52 million crisis-related tweets, collected during 19 different crisis events, to train their models.
Methodology : The main technique implemented in the paper is the use of unigrams and bigrams as the primary features. Initial vocabularies consisting of lexical variations are built to identify out-of-vocabulary (OOV) words.

Conclusion : Training the models on a large corpus (52 million tweets) yields high accuracy. Microblogging sites can be very helpful for information gathering. Word embeddings are generated using the Continuous Bag of Words (CBOW) model.
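A small illustration of the unigram-and-bigram feature extraction the paper's methodology describes, using scikit-learn (the sample text is invented):

```python
from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(1, 2) extracts both unigrams and bigrams.
vectorizer = CountVectorizer(ngram_range=(1, 2))
vectorizer.fit_transform(["bridge collapsed after the earthquake"])

print(vectorizer.get_feature_names_out())
# ['after' 'after the' 'bridge' 'bridge collapsed' 'collapsed'
#  'collapsed after' 'earthquake' 'the' 'the earthquake']
```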
IMPLEMENTATION OF THE PROJECT

ETL Pipeline

ML Pipeline

Web Application
1. Data Visualization Screen
2. Classification Screen
ETL Pipeline: the set of processes for extracting data from an input source, transforming it, and loading it into an output destination (for example, a database or a data warehouse) for reporting, analysis, and data synchronization.

Figure 1: ETL Pipeline

Two important steps in Figure 1:

Load_data() function: loads the messages and categories datasets and merges them.
Clean_data() function: removes useless data from the dataset and cleans it.
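A minimal sketch of what load_data() and clean_data() might look like with pandas; the CSV inputs, the shared id column, and the semicolon-separated categories format are assumptions inferred from the figures:

```python
import pandas as pd

def load_data(messages_filepath, categories_filepath):
    """Load the messages and categories CSVs and merge them on a shared id."""
    messages = pd.read_csv(messages_filepath)
    categories = pd.read_csv(categories_filepath)
    return messages.merge(categories, on="id")

def clean_data(df):
    """Split the raw 'categories' column into ~35 binary category columns."""
    # Assumed raw format: "related-1;request-0;offer-0;..."
    categories = df["categories"].str.split(";", expand=True)
    categories.columns = [c.split("-")[0] for c in categories.iloc[0]]
    for column in categories:
        # Keep only the trailing 0/1 flag and convert it to an integer.
        categories[column] = categories[column].str[-1].astype(int)
    df = df.drop(columns=["categories"]).join(categories)
    return df.drop_duplicates()
```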


Figure 2: This figure shows the uncleaned merged data from the messages and categories datasets.
Figure 3: This figure shows the cleaned data. Columns with category names are generated so that it becomes easy to process the numbers 0 and 1 instead of processing language data. In this process around 35 category columns are created.
ML Pipeline

A machine learning pipeline is a means of automating the machine learning workflow by enabling data to be transformed and correlated into a model that can then be analyzed to produce outputs. This type of ML pipeline makes the process of feeding data into the ML model fully automated.

A typical machine learning pipeline would consist of the following processes:

1. Data collection.
2. Data cleaning.
3. Feature extraction (labelling and dimensionality reduction).
4. Model validation.
5. Visualization.
ML Pipeline

Machine learning (ML) pipelines consist of several steps to train a model. They are iterative: every step is repeated to continuously improve the accuracy of the model and achieve a successful algorithm. A pipeline consists of a sequence of components, each a compilation of computations. Data is sent through these components and manipulated by their computations. These multiple sequential steps do everything from data extraction and preprocessing to model training and deployment.
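As a concrete illustration, here is a hedged sketch of how such a pipeline could be assembled with scikit-learn's Pipeline, chaining vectorization, tf-idf weighting, and a multi-output classifier; the choice of estimators is an assumption, not the project's confirmed configuration:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Each component transforms the data and hands it to the next one.
pipeline = Pipeline([
    ("vect", CountVectorizer()),           # text -> token counts
    ("tfidf", TfidfTransformer()),         # counts -> tf-idf weights
    ("clf", MultiOutputClassifier(         # one classifier per category column
        RandomForestClassifier())),
])

# X: list of message strings; Y: table of ~35 binary category columns.
# pipeline.fit(X_train, Y_train)
# Y_pred = pipeline.predict(X_test)
```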
Training and building a model

Average accuracy ~95.56%


In order to use textual data for predictive modeling, the text must first be parsed into individual words, or tokens – this process is called tokenization. These words then need to be encoded as integers or floating-point values for use as inputs to machine learning algorithms. This process is called feature extraction (or vectorization).

Scikit-learn's CountVectorizer is used to convert a collection of text documents to a vector of term/token counts. It also enables the pre-processing of text data prior to generating the vector representation, which makes it a highly flexible feature-representation module for text.
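For example, a quick look at what CountVectorizer produces (the two sample sentences are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["water is rising fast", "send water and food"]
vect = CountVectorizer()
counts = vect.fit_transform(docs)

print(vect.get_feature_names_out())
# ['and' 'fast' 'food' 'is' 'rising' 'send' 'water']
print(counts.toarray())
# [[0 1 0 1 1 0 1]
#  [1 0 1 0 0 1 1]]
```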
Tf means term-frequency, while tf-idf means term-frequency times inverse document-frequency. This is a common term-weighting scheme in information retrieval that has also found good use in document classification. (In scikit-learn's SMART notation, idf is "t" when use_idf is enabled and "n" (none) otherwise.)

With TfidfTransformer you systematically compute word counts using CountVectorizer, then compute the inverse document-frequency (IDF) values, and only then compute the tf-idf scores.
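A short sketch of that two-step flow, reusing the counts from CountVectorizer (the sample documents are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = ["water is rising fast", "send water and food"]
counts = CountVectorizer().fit_transform(docs)

# fit() learns the IDF values; transform() then produces the tf-idf scores.
tfidf = TfidfTransformer()
scores = tfidf.fit_transform(counts)
print(scores.toarray().round(2))
```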
Web Application
Screen 1: Data Visualization

This screen contains a text box to input a message to classify that contains words related to a disaster.

Some visualizations are also added, based on the data used to train the classifier.
CATEGORIES WHICH ARE PRESENT IN THE TRAINING DATASET
Web Application

Screen 2: Message Classification

This screen contains the categories to which the message is related. Related categories are highlighted in the list.

QUERY (MESSAGE)
RESULTS (QUERY RELATED TO THESE)
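A minimal sketch of how the classification screen could be wired up in Flask; the route name, the go.html template, and the saved model artifacts are assumptions for illustration, not the project's confirmed layout:

```python
import joblib
from flask import Flask, render_template, request

app = Flask(__name__)

# Hypothetical artifact paths; the trained pipeline and the ~35
# category names would be saved at the end of the ML pipeline step.
model = joblib.load("models/classifier.pkl")
category_names = joblib.load("models/category_names.pkl")

@app.route("/classify")
def classify():
    query = request.args.get("query", "")
    # predict() returns one 0/1 flag per category column.
    labels = model.predict([query])[0]
    results = dict(zip(category_names, labels))
    # The template would highlight categories whose flag is 1.
    return render_template("go.html", query=query, results=results)

if __name__ == "__main__":
    app.run(debug=True)
```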
THANK YOU
