Analytical Study On Unstructured Data Management in Application Data Base Through NLP and Datamining
Analytical Study On Unstructured Data Management in Application Data Base Through NLP and Datamining
ISSN No:-2456-2165
The unstructured data have commonly appeared in II. CONVERTING UNSTRUCTURED DATA INTO
portals, blogs, bulk excel, emails, notes from call centers, and BUSINESS-ORIENTED DATA
all forms of human communications including the system to
stem processing. All these process and media starts producing To create entity of unstructured data, it needs to be
large amounts of unstructured and semi- structured data. associated with subject related application database structure.
Creating value and extracting the right information from large The unstructured data is to be transformed into more concrete
sets of unstructured data is a tiresome process. Many large and firm data that can be used by the organizations in
organizations like IBM, GE, and Siemens have developed development of database in application domain for decision-
analytical tools for unstructured data management; this system making process. This study proposes five processes for the
is superior in terms of handling data using natural language. said transformation from unstructured data to structured data,
However, many middle-level software companies not which are Data Extraction, Word embedding, Clustering,
adopting the above things due to complexity and hesitation. In Classification and Data Mapping. Data Extraction is about
this paper, some simple models for unstructured data identifying, analyzing unstructured data from multiple sources
management is proposed. and formats. Data Classification upon the extraction process,
the unstructured data need to be classified or categorized based
The objective of this paper is to provide insight on how on the requirements needed. The Two main steps involve are
to apply the principles of Data mining and AI via NLP to the determining the main data classes and categorizing the data
unstructured data in the application domain database. according to its main classes. Categorization of unstructured
data is important in helping the data searching process much
C. LSTM
LSTM is one of the forms of RNN and can be used for
learning long-term dependencies during classification and
efficient gradient-based technique. LSTM is designed to get
rid of the vanishing error problems [1]. It works extremely
well on a large variety of problems and are now widely used.
LSTMs to have this chain like structure, but the repeating
module has a slightly different structure, there are multiple
layers, interrelating in unique way. LSTM is efficient than
simple RNN [2].
CONFLICT OF INTEREST