

Volume 9, Issue 1, January – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Analytical Study on Unstructured Data Management in Application Data Base through NLP and Datamining

Anisha S¹ and Dr. S Thiyagarajan²
¹ Department of Computer Science, St. Joseph University, Dimapur, Nagaland, India
² Department of Computer Science, St. Joseph University, Dimapur, Nagaland, India

Abstract:- Business organizations are flooded with large pools of unstructured data, and loading this data into a business database involves many processes. Companies with BPO and KPO operations spend substantial resources converting unstructured data into their software databases through programming, with multiple queries and users. Dealing with this complexity calls for an automated system that saves a large amount of time and resources. The aim of the present research is to analyse methodically the technical work relating to the application of data mining, artificial intelligence (AI) and machine learning (ML) in the software industry. The paper combines different disciplines: data mining techniques, ML and natural language processing (NLP). Its objective is to improve an organization's business intelligence process through maximum exploitation of the unstructured data it owns. The paper primarily examines the applicability of a combination of data mining techniques, NLP and ML to handling unstructured data, reducing the burden on users by minimizing the use of multiple queries and making it easier to extract data from large databases.

Keywords:- Application Database, Data mining, ML, NLP.

I. INTRODUCTION

Unstructured data commonly appears in portals, blogs, bulk Excel files, emails, notes from call centers, and all forms of human communication, including system-to-system processing. All these processes and media produce large amounts of unstructured and semi-structured data. Creating value and extracting the right information from large sets of unstructured data is a tiresome process. Many large organizations such as IBM, GE and Siemens have developed analytical tools for unstructured data management; these systems are superior in handling data using natural language. However, many mid-level software companies have not adopted such tools because of their complexity and out of hesitation. In this paper, some simple models for unstructured data management are proposed.

The objective of this paper is to provide insight into how to apply the principles of data mining and AI, via NLP, to the unstructured data in the application domain database.

• Heterogeneous Data
Data from multiple sources and in multiple formats needs to be associated with organizational interest for it to be meaningful in decision-making and in subjects related to business activities. This study considers different sample data for examination. Even though unstructured data comes in various formats, we are concerned only with unstructured data in the form of text and Excel files. Extraction and classification of unstructured data according to subjects and issues can transform it into more concrete and firm data that the organization can use effectively in its decision-making process. Optimal utilization and manipulation of unstructured data requires a good business intelligence model that associates unstructured data with the subjects and issues of organizational interest.

The purpose of this study is to provide an in-depth overview of the applicability of various data mining and ML algorithms, instead of SQL queries, to unstructured data management in application domain databases. This paper addresses the following research questions:
• Can AI, NLP and other data mining processes replace the skilled resources and long database queries needed when converting structured and unstructured data into an application database?
• If so, is it reliable?
• Will it reduce the conversion time and the cost of testing the database?

II. CONVERTING UNSTRUCTURED DATA INTO BUSINESS-ORIENTED DATA

To turn unstructured data into entities, it needs to be associated with the structure of the subject-related application database. The unstructured data is transformed into more concrete and firm data that organizations can use to develop application-domain databases for the decision-making process. This study proposes five processes for this transformation from unstructured to structured data: Data Extraction, Word Embedding, Clustering, Classification and Data Mapping. Data Extraction is about identifying and analyzing unstructured data from multiple sources and formats. In Data Classification, which follows the extraction process, the unstructured data is classified or categorized according to the requirements. The two main steps involved are determining the main data classes and categorizing the data according to these classes. Categorization makes the data searching process much better by grouping unstructured data, which has been equipped with metadata, into classes with the same characteristics. Categorization is also essential for developing repositories for each data class and for mapping unstructured data from the main classes to thematic topics.

III. TEXT MINING THROUGH DEEP LEARNING

A. Word representation

In the mathematical model, a text is represented as a vector in which each dimension corresponds to a single vocabulary word. If a particular word is found in the sentence, its dimension is flagged as '1'; if not, '0'. The dimension of the vector therefore equals the size of the vocabulary:

$$wd_j = (v_{1,j}, v_{2,j}, \ldots, v_{t,j})$$

where $v_{i,j}$ flags whether word $i$ occurs in text $j$ and $t$ is the vocabulary size.

An embedding layer serves as a look-up table which takes the words' indexes in the vocabulary as input and outputs the corresponding word vectors. The table holds one vector per vocabulary word, so its size is the vocabulary size by the embedding dimension.

B. RNN
Recurrent Neural Networks (RNNs) were designed to work with sequence prediction problems. Sequence prediction comes in the following forms:

• One-to-Many: a single observation as input is mapped to multiple classes or labels as output.

• Many-to-One: the input is a sequence of words and the output is one single class.

• Many-to-Many: the input is a sequence of words and the output is multiple classes.

C. LSTM
Long short-term memory (LSTM) is a form of RNN, trained with an efficient gradient-based technique, that can learn long-term dependencies during classification. LSTM is designed to overcome the vanishing error problem [1]. LSTMs work well on a large variety of problems and are now widely used. They also have the chain-like structure of an RNN, but the repeating module has a different structure: there are multiple layers, interacting in a particular way. LSTM is more efficient than a simple RNN [2].

The key to LSTMs is the cell state, which acts like a conveyor belt. It runs straight down the entire chain, with only minor linear interactions, so it is easy for information to flow along it unchanged. The LSTM has the capacity to remove or add information to the cell state, carefully regulated by structures called gates [3].

Gates are a path for information to pass through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation. The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means "let nothing pass," while a value of one means "let everything pass!" An LSTM has the following three gates:

• Forget gate layer: the first step in the LSTM, which decides what information to throw away from the cell state.

• Input layer: decides what new information to store in the cell state. This has two components. First, a sigmoid layer called the "input gate layer" decides which values to update. Next, a further layer creates a vector of new candidate values that could be added to the state. These two are then combined to create an update to the state [3].

• Output gate: controls the value of the next hidden state, which carries information about previous inputs.

D. CNN

CNN models have achieved excellent results in semantic parsing and query retrieval and have been found to be effective for NLP [4], [5]. The CNN model is used as a feature extractor that encodes the semantic features of sentences before these features are fed to a classifier.

E. Support Vector Machine
The Support Vector Machine (SVM) approach is used to classify related documents using a vector method; the technique was studied following ref. [11]. In machine learning, SVMs are supervised learning models used to analyse data. An SVM addresses a two-class problem by finding a hyperplane that separates the data classes. The optimal hyperplane shown in Figure 1 is the one for which the margin, the distance from the plane to the nearest data points, is maximized; this maximum-margin hyperplane best divides the points shown in the figure. Only the points closest to the boundary matter when choosing the hyperplane; all others are irrelevant. These points are called support vectors, and the resulting hyperplane is known as a support vector classifier (SVC) because it assigns each point to one class or the other according to the side of the hyperplane on which it lies.

Fig. 1. SVM using hyperplane

• Proposed Framework: Data Loading Automation Model
The proposed framework is a simple model that combines data pre-processing, clustering and classification algorithms for easy implementation in NLP (Python). The model classifies unstructured text into predefined classes and uses various sets of data as input. Even though unstructured data comes in various formats, this study considers only unstructured data in the form of text.
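To make the binary word representation of Section III.A concrete, the following minimal sketch builds a vocabulary from a few sentences and flags each vocabulary word as 1 or 0 for a given sentence. The helper names (`build_vocabulary`, `binary_vector`) and the sample sentences are illustrative assumptions rather than part of the proposed framework, and whitespace tokenization is a simplification.

```python
def build_vocabulary(sentences):
    """Return a sorted list of the distinct lowercase words in the sentences."""
    words = set()
    for sentence in sentences:
        words.update(sentence.lower().split())
    return sorted(words)


def binary_vector(sentence, vocabulary):
    """Flag each vocabulary word as 1 if it occurs in the sentence, else 0."""
    present = set(sentence.lower().split())
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical sample texts, used only for illustration.
sentences = ["invoice received from vendor", "vendor payment pending"]
vocabulary = build_vocabulary(sentences)
vectors = [binary_vector(s, vocabulary) for s in sentences]
```

Each vector has one dimension per vocabulary word, so its length equals the vocabulary size, as stated in Section III.A.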

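The gate arithmetic of Section III.C can be sketched for a single time step as follows. This is a simplified illustration with scalar input and state. The use of tanh for the candidate value and the hidden state follows the standard LSTM formulation described in [3], and the function name `lstm_step` and all weights are hypothetical placeholders, not values from this study.

```python
import math


def sigmoid(z):
    """Squash z into (0, 1): near 0 lets nothing pass, near 1 lets everything pass."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step for scalar input x, hidden state h_prev and cell state c_prev.

    params maps "forget", "input", "candidate" and "output" to a tuple
    (w_h, w_x, b); these weights are hypothetical placeholders.
    """
    def pre_activation(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x + b

    f = sigmoid(pre_activation("forget"))                # what to throw away
    i = sigmoid(pre_activation("input"))                 # which values to update
    candidate = math.tanh(pre_activation("candidate"))   # new candidate value
    c = f * c_prev + i * candidate                       # updated cell state
    o = sigmoid(pre_activation("output"))                # how much state to expose
    h = o * math.tanh(c)                                 # next hidden state
    return h, c


# Illustrative weights only; a trained model would learn these.
params = {
    "forget": (0.5, 0.5, 0.0),
    "input": (0.5, 0.5, 0.0),
    "candidate": (0.5, 0.5, 0.0),
    "output": (0.5, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
```

When the forget gate saturates near one and the input gate near zero, the cell state passes through the step unchanged, which is the conveyor-belt behaviour described in Section III.C.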

• Data Extraction
The first step in this model is the extraction of data, using the Beautiful Soup, pandas and scikit-learn (sklearn) libraries for text extraction. Metadata management is used to identify the file types and sources.

Fig. 2. Data Loading Automation model

IV. DATA PRE-PROCESSING

This process removes all noise from the data through cleaning and padding, and fills blank data with a mean or constant value according to the business logic. It uses the sklearn pre-processing and pandas libraries. For word representation, a Gensim model is used, which generates a word embedding layer. The output generated here is used as input for the next step.

In data pre-processing, the data set is split into training and test sets. For this purpose, we use the function provided by sklearn. If the data set has many dimensions and long sentences, the method used is a combination of clustering and the classification (LSTM) model. To extract semantic structure efficiently, we use CNN with LSTM.

A. Clustering with Classification (LSTM)
The classifier works well with clustered data. The accuracy of a classifier can be improved by clustering the data set before applying the classification algorithm [6]. The clustering algorithm used in the proposed framework is K-means, implemented through the sklearn library in Python. Sentence vectors are clustered into k subclasses; the data can be trained table-wise or column-wise according to the database structure. A cluster ID is assigned to the resulting clustered data, and this is used as input to the classification. For each method, training and testing of the data sets are conducted separately.

Fig. 3. Flow chart: clustering with LSTM model

B. CNN with LSTM
The output generated from word representation as an embedding layer is used here as input. The embedding layer output is passed to a convolution layer, and these outputs are passed to the pooling layers. The resulting outputs are merged and reduced to a linear layer output, which is passed to the LSTM. The CNN learns spatial structure, and the learned spatial structure is passed to the LSTM layer for further learning.
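The two pre-processing operations described in Section IV, filling blanks with a mean value and splitting the data into training and test sets, can be sketched with the standard library alone. This is a stand-in for the sklearn and pandas calls the paper refers to, which are not named in the text. The function names, the sample values and the 25% test fraction are assumptions made for illustration.

```python
import random
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical numeric column with blanks (None).
column = [22.0, None, 38.0, 26.0, None, 34.0]
clean_column = fill_missing_with_mean(column)
train, test = split_train_test(clean_column, test_fraction=0.25, seed=0)
```

Whether a mean or a constant is the right fill value depends on the business logic, as noted above. The fixed seed only makes the split repeatable.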

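The idea in Section IV.A of clustering first and then passing the cluster ID to the classifier can be illustrated with a small, pure-Python K-means. The paper's framework uses the sklearn implementation, so this sketch is only a stand-in. The two-dimensional sample vectors, the seeding of centroids from the first k points and the function names are all assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D sentence vectors; real ones would come from word embeddings.
vectors = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
labels, centroids = kmeans(vectors, k=2)

# Append the cluster ID to each vector so the classifier sees it as a feature.
features = [vec + [float(label)] for vec, label in zip(vectors, labels)]
```

The `features` rows, each carrying its cluster ID as an extra column, are what would be handed to the classification step.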

Fig. 4. Flow chart: CNN with LSTM model

C. SVM
The above preprocessing steps were performed for prediction, and the resulting data set was classified using the SVM algorithm. As illustrated in Figure 1, the support vector machine places the hyperplane so that data points belonging to two different classes are separated with the largest gap. It can be observed that the predicted values of the SVM model are very close to the actual values. The confidence interval of the SVM model is 0.986, and the model can be used for prediction and feature extraction.

V. MODEL ANALYSIS USING THE TITANIC DATASET AND IMDB

Deep neural networks were trained on the IMDB dataset. In our investigation, we used several deep learning techniques. Models that use the IMDB dataset for classification were successful in reaching their validation accuracy levels. For data cleaning and clustering analysis, we used the Titanic data set. It includes details about the people on board the Titanic, such as their age, gender, class of travel, cabin and survival status. In this project, we use Python to conduct preprocessing, clustering and classification on the Titanic and IMDB datasets. The transformation of unstructured data into structured data requires data cleaning; we have to deal with the dataset's outliers, inconsistent values and missing values.

VI. OBSERVATION AND ANALYSIS

NLP, machine learning techniques and data mining can map unstructured text into structured form and enable automatic identification and extraction of relevant information, which can then be loaded into the database of the application domain.

The procedure of loading unstructured data into a database through long queries with human intervention can be replaced by applying the same logic in code written with Python libraries and ML algorithms, with minimal coding. The cleaning done during conversion can be replaced with data mining and ML algorithms. Data mapping can be done efficiently with AI techniques. Accuracy can be improved by applying a clustering technique before the classification algorithm on the data set and by combining CNN with LSTM.

Classification accuracy is calculated using the formula below, and the results are shown in Table 1:

$$\text{Accuracy} = \frac{\text{total no. of correct data}}{\text{total no. of test data}} \times 100 \qquad (1)$$

Table 1. Comparison of the data loading process with different models

| Model | Accuracy |
|---|---|
| LSTM | 86% |
| CNN with LSTM | 88% |
| SQL | NA |

Fig. 5. Model accuracy graph

Table 2. Comparison of the data loading process with SQL

| Stage | SQL and programming | Data mining with NLP and Python libraries |
|---|---|---|
| Data extraction | Programs; database queries; more human intervention | pandas; Beautiful Soup; sklearn libraries; minimal human intervention |
| Data preprocessing and cleaning | Long database procedures; more human intervention | NLTK, Gensim |
| Classification | More human intervention; database query procedures; application programs | Clustering and classification algorithms; K-means, deep learning model; LSTM, CNN; Keras, TensorFlow |
| | Field mapping | AI algorithms or labeling through classification algorithm |

VII. FUTURE WORK

Future work will examine how automated field mapping can be done efficiently with AI features in accordance with organizational interest, and will evaluate the performance of different ML algorithms on different data sets for text classification.

VIII. CONCLUSION

The purpose of this paper is to scrutinize the applicability of data mining and ML techniques, using NLP and Python instead of SQL queries and programs, for extracting unstructured data into the application domain databases of software firms. The model bridges the gap between the SQL developer and data mining algorithms.

In this model, we examined a few classification and clustering algorithms: the K-means algorithm, a deep learning model (LSTM), and CNN combined with LSTM. The combination is applied after the preprocessing steps for better results. Implementing it through Python libraries (the Keras, TensorFlow and PyTorch frameworks) is less complex than writing long SQL queries and programs.

We conclude that whatever is done through SQL queries for unstructured data management in the application domain can be done through data mining and deep learning algorithms (RQ1). The efficiency and accuracy depend on how the data set is trained and the model is constructed (RQ2). Performance efficiency depends on the input data and the choice of classification model (RQ3).

Efficient data management enables programmers to spend minimal time creating programming code and to focus more time on aligning the right data to solve complex business issues. The study identified how to overcome the existing gap between theoretical research and application domain programmers, and thereby helps to improve the decision-making process of organizations.

CONFLICT OF INTEREST

• Anisha S is a part-time Research Scholar at St. Joseph University, Dimapur, Nagaland.
• Dr. S Thiyagarajan is an internal Research Supervisor at St. Joseph University, Dimapur, Nagaland.

This paper was written for academic purposes as part of pursuing a PhD. The work is non-sponsored, with no financial conflicts of interest. It presents the primary findings of the analysis and study.

REFERENCES

[1]. Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
[2]. Ilya Sutskever, Oriol Vinyals, Quoc V. Le. "Sequence to Sequence Learning with Neural Networks." Google.
[3]. Christopher Olah, http://colah.github.io/posts/2015-08-Understanding-LSTMs
[4]. Yih, X. He, C. Meek. 2014. "Semantic Parsing for Single-Relation Question Answering." 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference.
[5]. Shen, X. He, J. Gao, L. Deng, et al. "Learning Semantic Representations Using Convolutional Neural Networks for Web Search." In Proceedings of WWW 2014.
[6]. Yaswanth Kumar Alapati and Korrapati Sindhu. "Combining Clustering with Classification: A Technique to Improve Classification Accuracy." International Journal of Computer Science Engineering (IJCSE), Vol. 5 No. 06, Nov 2016.
[7]. Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, et al. "Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies." Wiley-IEEE Press; 2001.
[8]. Kyosuke Nishida, Kugatsu Sadamitsu, Ryuichiro Higashinaka, et al. "Understanding the Semantic Structures of Tables with a Hybrid Deep Neural Network Architecture." Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).
[9]. Gers, F. A.; Schmidhuber, J.; and Cummins, F. A. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation 12(10):2451–2471.
[10]. Yelong Shen, Xiaodong He, et al. "Learning Semantic Representations Using Convolutional Neural Networks for Web Search." Microsoft.
[11]. Ertekin, Şeyda. Learning in extreme conditions: online and active learning with massive, imbalanced and noisy data. A dissertation.

IJISRT24JAN1677 www.ijisrt.com 1790

No ratings yet
Smart Narrator Robot: Enhancing Experiential Learning through Conditional Autonomy
6 pages
Assessing the Achievements of the Re-Alignment of an Industry Educatiocal Based System in Society
No ratings yet
Assessing the Achievements of the Re-Alignment of an Industry Educatiocal Based System in Society
5 pages
Design and Implementation of a GPS-GSM based Real-Time Vehicle Theft Tracking System for Urban Security in Uganda
No ratings yet
Design and Implementation of a GPS-GSM based Real-Time Vehicle Theft Tracking System for Urban Security in Uganda
7 pages
Continuing Training and Professional Performance of Primary School Teachers in Tchad: The Case of Teachers in the Farchana Refugee Camp
No ratings yet
Continuing Training and Professional Performance of Primary School Teachers in Tchad: The Case of Teachers in the Farchana Refugee Camp
7 pages
Development of Mirror Biosensor in Saliva pH Measurement in Health Services
No ratings yet
Development of Mirror Biosensor in Saliva pH Measurement in Health Services
7 pages
Architecture as a Reflection of Cultural Continuity: A Study of Traditional Trends
No ratings yet
Architecture as a Reflection of Cultural Continuity: A Study of Traditional Trends
3 pages
EduTech Portal: An AI-Powered Student Assistant Chatbot
No ratings yet
EduTech Portal: An AI-Powered Student Assistant Chatbot
12 pages
ResumeMatch: Intelligent Resume Enhancement & Job Fit Analysis
No ratings yet
ResumeMatch: Intelligent Resume Enhancement & Job Fit Analysis
7 pages
Behavior Addiction in Adolescents Post COVID 19: A Systematic Mental Health Review
No ratings yet
Behavior Addiction in Adolescents Post COVID 19: A Systematic Mental Health Review
8 pages
A Decade of Genome Editing: Comparative Review of ZFN, Talen, and CRISPR/CAS9
No ratings yet
A Decade of Genome Editing: Comparative Review of ZFN, Talen, and CRISPR/CAS9
10 pages
Enhancing the Robustness of Computer Vision Models to Adversarial Perturbations Using Multi-Scale Attention Mechanisms
No ratings yet
Enhancing the Robustness of Computer Vision Models to Adversarial Perturbations Using Multi-Scale Attention Mechanisms
14 pages
Analysis of the Export Competitiveness of Indonesia's Horticultural Fruit Products in the International Market
No ratings yet
Analysis of the Export Competitiveness of Indonesia's Horticultural Fruit Products in the International Market
8 pages
Evaluating the Impact of Shopee Mall on Consumer Purchase: Basis for Developing an Effective Marketing Plan
No ratings yet
Evaluating the Impact of Shopee Mall on Consumer Purchase: Basis for Developing an Effective Marketing Plan
61 pages
Kali Linux - Wikipedia
No ratings yet
Kali Linux - Wikipedia
6 pages
Lesson 1 Geographic Linguistic and Ethnic Dimensions of Philippine Literary History From Pre Colonial To The Contemporary
No ratings yet
Lesson 1 Geographic Linguistic and Ethnic Dimensions of Philippine Literary History From Pre Colonial To The Contemporary
45 pages
Minerva MX/MZX Range of Addressable Controllers Product Application & Design Information List of Contents
100% (1)
Minerva MX/MZX Range of Addressable Controllers Product Application & Design Information List of Contents
94 pages
UNIT 4 Emotional Intelligence
No ratings yet
UNIT 4 Emotional Intelligence
5 pages
QB - Boundaries (Medium and Hard)
No ratings yet
QB - Boundaries (Medium and Hard)
23 pages
Horetzky48 A4
No ratings yet
Horetzky48 A4
1 page
Present Simple or Continuous - Stative Verbs Worksheet - Live Worksheets
No ratings yet
Present Simple or Continuous - Stative Verbs Worksheet - Live Worksheets
1 page
SC Compiler Users Guide
No ratings yet
SC Compiler Users Guide
263 pages
Weekly Home Learning Plan: Week 1, Quarter 1, September 13-17, 2021
No ratings yet
Weekly Home Learning Plan: Week 1, Quarter 1, September 13-17, 2021
3 pages
Statistics Assignment
No ratings yet
Statistics Assignment
3 pages
Homework Sheets For 6 Year Olds
100% (1)
Homework Sheets For 6 Year Olds
4 pages
Pavan's Resume
No ratings yet
Pavan's Resume
1 page
2 Telepon Operator Theory
No ratings yet
2 Telepon Operator Theory
14 pages
Year-1-Transit-Forms SKKB 2022 1
No ratings yet
Year-1-Transit-Forms SKKB 2022 1
15 pages
Colour or Tint Your Screen
No ratings yet
Colour or Tint Your Screen
4 pages
Brahmavaivarta Purana 4 (1922) PDF
100% (2)
Brahmavaivarta Purana 4 (1922) PDF
364 pages
CompilerConstruction ClassNotesRSJ
No ratings yet
CompilerConstruction ClassNotesRSJ
48 pages
The Vicar of The Wakefield
No ratings yet
The Vicar of The Wakefield
18 pages
EBCDIC Encoding Essentials
No ratings yet
EBCDIC Encoding Essentials
1 page
Quarter 1 WEEK 1.2: 21 Century Literature From The Philippines and The World
100% (3)
Quarter 1 WEEK 1.2: 21 Century Literature From The Philippines and The World
6 pages
Allama Iqbal Was The Greatest Philospher and Poet of The Present Era
No ratings yet
Allama Iqbal Was The Greatest Philospher and Poet of The Present Era
3 pages
Lec9a SpaceComplex
No ratings yet
Lec9a SpaceComplex
19 pages
Assignment 1 Essay Part B
No ratings yet
Assignment 1 Essay Part B
10 pages
Writing Paragraph - Topic 2
No ratings yet
Writing Paragraph - Topic 2
7 pages
Assignment of Github
No ratings yet
Assignment of Github
2 pages
Pressure Vessel Calculations Manual Examples
100% (2)
Pressure Vessel Calculations Manual Examples
2 pages
Dll-Catch-Up-Friday-Grade-5-Jan 19
No ratings yet
Dll-Catch-Up-Friday-Grade-5-Jan 19
8 pages
Manual VOSviewer 1.6.5 PDF
No ratings yet
Manual VOSviewer 1.6.5 PDF
33 pages
