Automated Dynamic Schema Generation Using Knowledge Graph
Priyank Kumar Singh1, Sami Ur Rehman1, Darshan J.1, Shobha G.2, Deepamala N.2
1 Department of Computer Science and Engineering, R.V College of Engineering, Bengaluru, India
2 Department of Computer Science and Engineering, Faculty of Computer Science and Engineering, R.V College of Engineering,
Bengaluru, India
Corresponding Author:
Priyank Kumar Singh
Department of Computer Science and Engineering
R.V College of Engineering
Bangalore-560059, Karnataka, India
Email: [email protected]
1. INTRODUCTION
The database is an integral part of any software application, and the relational database is the most
popular type of database system available. Designing a database is a tedious and critical part of any
software engineering process, and a large share of the total project time is spent on it. Professional
database designers are usually required so that design problems do not surface later. As software
engineering continues to grow, so does the demand for efficient data storage and processing techniques.
Within software development there is therefore an implicit need to automate the process of designing a
database, which is a key support of any software project.
A huge amount of data is generated and stored in both structured and unstructured formats. Database
schemas are a structured method of storing data. Even though the data is structured, there often exist
multiple related schemas and tables whose relations cannot be established by simple databases. One way to
represent schemas as a combined knowledge source is to design a knowledge graph (KG). The term “knowledge
graph” has appeared in the literature since at least 1973, in a paper by Edward W. Schneider [1]. Google’s
own knowledge graph, announced in 2012, is among the most sophisticated embodiments of the idea seen in the
technology world [2]. Since then, knowledge graphs have become more important in both research and
application, and several KG products have been developed. Cyc [3], Freebase [4], Google Knowledge
Vault [5], DBpedia [6], YAGO [7], NELL [8], PROSPERA [9], and Microsoft’s Probase [10] are among the
prominent KGs. Many real-world applications also make use of generic knowledge graphs, including semantic
web search engines [11], named entity recognition and disambiguation [12], [13], information
extraction [14], [15], recommendation systems [16], Siri [17], Apple’s personal assistant, and Watson [18]
from IBM. In medicine, the medical KG [19] has recently been gaining a lot of attention.
A graph neural network (GNN) is another technique used in graph applications. Its input is handled in
the same way as in an ordinary neural network, except that the nodes are first converted into graph
embeddings, which is an important step in any GNN model. The main reason for using a GNN is to predict
relationships (edges) between previously unrelated nodes, a task known as edge prediction [20], [21].
Zhao et al. [22] discuss the construction and architecture of KGs, outline the construction process, and
review the approaches used and the difficulties encountered. Building a KG requires semantic relatedness,
which plays a key role in linking a new word with a word that already exists in the knowledge graph. It is
a quantitative measure of how two words or concepts are related in a given context, irrespective of the
syntactic form of the words. Conceptually, semantic relatedness is based on functional relations such as
hyponymy, hypernymy, and meronymy [23]. When restricted to hyponymy/hypernymy relations, the measure
evaluates the semantic similarity between two words or concepts.
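As a minimal sketch of similarity restricted to hyponymy/hypernymy relations, the snippet below scores two words by the length of the shortest is-a path between them. The toy taxonomy, the function names, and the 1 / (1 + path length) score are illustrative assumptions and are not taken from the paper.

```python
from collections import deque

# Hypothetical toy is-a taxonomy (assumption): child -> parent (hypernym).
HYPERNYM = {
    "student": "person",
    "teacher": "person",
    "person": "entity",
    "school": "organization",
    "organization": "entity",
}


def neighbours(word):
    """Return the words one hypernym or hyponym step away from `word`."""
    out = set()
    if word in HYPERNYM:
        out.add(HYPERNYM[word])  # step up to the hypernym
    # step down to every direct hyponym
    out.update(child for child, parent in HYPERNYM.items() if parent == word)
    return out


def path_similarity(a, b):
    """Return 1 / (1 + shortest is-a path length), or 0.0 if unconnected."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == b:
            return 1.0 / (1 + dist)
        for nxt in neighbours(word) - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return 0.0


# "student" and "teacher" share the hypernym "person" (path length 2),
# while "student" and "school" only meet at "entity" (path length 4).
print(path_similarity("student", "teacher"))
print(path_similarity("student", "school"))
```

Under this scoring, words that share a close hypernym rank as more similar than words that only meet near the root of the taxonomy.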
This paper discusses a method to build a robust knowledge graph and to query it in order to build a
database by suggesting tables and columns in an automated way. The proposed system creates a smaller
graph for each nuclear schema and then merges that graph into the global KG based on the similarity
between nodes, which is measured by comparing their word vectors. The proposed algorithm therefore not
only suggests database schemas but also serves as an efficient tool for storing information while
minimizing redundancy.
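The merge step described above can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: graphs are plain dictionaries from node name to a set of properties, the two-dimensional word vectors are made up, and the 0.8 cosine threshold is a hypothetical value.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def merge_into_global(global_kg, local_kg, vectors, threshold=0.8):
    """Merge local_kg into global_kg in place and return global_kg.

    Both graphs map a node name to a set of property names, and `vectors`
    maps a node name to its word vector. The threshold is an assumption.
    """
    for node, props in local_kg.items():
        best, best_sim = None, threshold
        for existing in global_kg:
            sim = cosine(vectors[node], vectors[existing])
            if sim >= best_sim:
                best, best_sim = existing, sim
        if best is None:
            global_kg[node] = set(props)   # no similar node: add a new one
        else:
            global_kg[best].update(props)  # similar node: merge properties
    return global_kg


# Made-up vectors: "pupil" lies close to "student" and far from "school".
vectors = {"student": [1.0, 0.0], "pupil": [0.9, 0.1], "school": [0.0, 1.0]}
global_kg = {"student": {"name", "roll_no"}, "school": {"address"}}
local_kg = {"pupil": {"name", "grade"}}
merged = merge_into_global(global_kg, local_kg, vectors)
print(sorted(merged["student"]))
```

Because "pupil" is folded into the existing "student" node instead of being stored twice, the shared property "name" appears only once, which is the redundancy reduction the text refers to.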
2. METHOD
This section describes the approach used to produce a knowledge graph of different entities. The
workflow of the pipeline is shown in Figure 1. In short, the framework includes the following steps:
− Schema to knowledge graph: each schema is converted to a knowledge graph.
− Executing the algorithm: link prediction is performed using multiple natural language processing
and machine learning tools.
− Storing the data: the final knowledge graph is stored in the TinkerPop database.
− Interaction with the system: the user is asked multiple questions to get complete insights.
− Query refining and output: keywords are extracted using natural language processing tools and
the final schema is given to the user.
Figure 1. Flowchart of the AI-based models and experimental methods applied
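The first step of the pipeline, converting a schema to a knowledge graph, might look like the sketch below. The input layout (a dictionary with "columns" and "references" keys) and the edge labels "has_attribute" and "related_to" are hypothetical choices made for illustration; the paper does not specify them.

```python
def schema_to_graph(schema):
    """Turn a schema description into (subject, relation, object) triples.

    `schema` maps a table name to a dict with optional "columns" and
    "references" lists. This layout is an assumption for the sketch.
    """
    triples = []
    for table, spec in schema.items():
        for column in spec.get("columns", []):
            # each column becomes an attribute node linked to its table
            triples.append((table, "has_attribute", column))
        for other in spec.get("references", []):
            # each referenced table becomes an edge between table nodes
            triples.append((table, "related_to", other))
    return triples


schema = {
    "student": {"columns": ["name", "grade"], "references": ["school"]},
    "school": {"columns": ["address"]},
}
triples = schema_to_graph(schema)
for triple in triples:
    print(triple)
```

Triples of this kind can then be loaded into a graph store, with tables and columns as nodes and the relation names as edge labels.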
Automated dynamic schema generation using knowledge graph (Priyank Kumar Singh)
1264 ISSN: 2252-8938
properties (attributes) are merged to get a single node. The final knowledge graph is generated as shown
in Figure 4. It contains much more context than the individual knowledge graphs.
Figure 4. Merging the student and school schemas; the result is a combined knowledge graph
Figure 6. Output grade schema with details up to level 2 when queried over the merged graph
For the query “Give the schema for payments” shown in Figure 10, the query is processed to
remove unwanted words, and the extracted keyword is “payments”. Using this keyword, a subgraph is
extracted from the merged graph shown in Figure 7. The generated subgraph has details up to depth 2, as
shown in Figure 11.
Figure 11. Payments schema when queried over the merged graph
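The query step above can be sketched as keyword extraction followed by a depth-limited breadth-first traversal. The stop-word list, the adjacency-list graph, and its node names are illustrative assumptions; the paper states only that natural language processing tools are used and that the subgraph has details up to depth 2.

```python
from collections import deque

# Hypothetical stop-word list (assumption); a real system would likely
# rely on an NLP library instead of a hand-written set.
STOP_WORDS = {"give", "the", "schema", "for", "a", "an", "of", "me", "show"}


def extract_keywords(query):
    """Lower-case the query, strip punctuation, and drop stop words."""
    words = (w.strip(".,?!\"'").lower() for w in query.split())
    return [w for w in words if w and w not in STOP_WORDS]


def subgraph(graph, start, max_depth=2):
    """Return the edges reachable within `max_depth` hops of `start`.

    `graph` maps a node to a list of its neighbours.
    """
    edges, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand nodes at the depth limit
        for nxt in graph.get(node, []):
            edges.append((node, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return edges


# Made-up merged graph for illustration only.
graph = {
    "payments": ["amount", "customer"],
    "customer": ["name", "address"],
    "name": ["first_name"],
}
keywords = extract_keywords("Give the schema for payments")
edges = subgraph(graph, keywords[0], max_depth=2)
print(keywords)
print(edges)
```

With a depth limit of 2, the traversal includes the attributes of "customer" but stops before "first_name", which sits three hops away from "payments".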
4. CONCLUSION
Technological advancements in the software industry have led to increased use of data storage
systems, and relational database systems in particular are used extensively. Designing a database
schema takes a considerable share of the total development time in any project, so there is a need to
automate the process. In this paper, various approaches to the problem were discussed, and the knowledge
graph-based approach was found to suit the purpose best. The use of a KG was investigated to dynamically
generate schemas that satisfy all the constraints discussed in the paper while also reducing the
redundancy of data in the knowledge graph. The existing KG, built from various schemas, is queried to
suggest schemas to the user. The proposed model was tested with multiple schemas to generate the global
knowledge graph, which was then queried and verified manually. This method helps the user develop
relationships between given schemas to build the KG and query it to suggest a schema.
REFERENCES
[1] E. W. Schneider, “Course modularization applied: the interface system and its implications for sequence control and data
analysis,” Association for the Development of Instructional Systems (ADIS), pp. 10–73, 1973, doi: 10.1037/e436252004-001.
[2] L. Ehrlinger and W. Wöß, “Towards a definition of knowledge graphs,” in Joint Proceedings of the Posters and
Demos Track of the 12th International Conference on Semantic Systems (SEMANTiCS2016) and the 1st International
Workshop on Semantic Change & Evolving Semantics (SuCCESS16), 2016, vol. 1695, pp. 1–4.
[3] D. B. Lenat, “CYC: A large-scale investment in knowledge infrastructure,” Communications of the ACM, vol. 38, no. 11,
pp. 33–38, 1995, doi: 10.1145/219717.219745.
[4] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, “Freebase: A collaboratively created graph database for structuring
human knowledge,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1247–1249, 2008,
doi: 10.1145/1376616.1376746.
[5] X. Dong et al., “Knowledge vault: A web-scale approach to probabilistic knowledge fusion,” Proceedings of the ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 601–610, 2014, doi: 10.1145/2623330.2623623.
[6] J. Lehmann et al., “DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia,” Semantic Web, vol. 6,