
Unit III

This document discusses various data models and query languages, focusing on the relational model, document model, key-value pairs, and NoSQL databases. It highlights the properties, advantages, and use cases of each model, emphasizing the flexibility and performance of document databases and the limitations of traditional relational databases. Additionally, it addresses the object-relational mismatch and the challenges of representing complex relationships in relational schemas compared to more flexible formats like JSON.

Uploaded by

DIVYALAKSHMI K

UNIT III

DATA MODELS AND QUERY LANGUAGES


Relational Model, Document Model, Key-Value Pairs, NoSQL, Object-Relational
Mismatch, Many-to-One and Many-to-Many Relationships, Network data models, Schema
Flexibility, Structured Query Language, Data Locality for Queries, Declarative Queries,
Graph Data models, Cypher Query Language, Graph Queries in SQL, The Semantic Web,
CODASYL, SPARQL
Relational model:
The relational model represents a database as a collection of relations. A relation is a
table of values: every row in the table denotes a real-world entity or relationship, and the
names of the table and its columns help us interpret the meaning of the values in each row.
In the relational model, then, data is stored in the form of tables.

Properties of a Relational Model


The relational databases consist of the following properties:
 Every row is unique
 All of the values present in a column hold the same data type
 Values are atomic
 The sequence of columns is not significant
 The sequence of rows is not significant
 The name of every column is unique
Important Terminologies
Here are some Relational Model concepts in DBMS:
 Attribute: A column of a table. Attributes are the properties that define a relation,
e.g., Employee_ID, Student_Rollno, SECTION, NAME, etc.
 Tuple: A single row of a table, which holds a single record.
 Table: In the relational model, every relation is stored in table format. A table has
two dimensions, rows and columns: rows represent records and columns represent attributes.
 Degree: The total number of attributes in the relation.
 Relation schema: The relation’s name together with its attributes.
 Column: The set of values for a particular attribute.
 Cardinality: The total number of rows present in the table.
 Relation instance: The finite set of tuples held in the relation at a given moment. A
relation instance never has duplicate tuples.
 Attribute domain: The predefined set of values (type and scope) that an attribute may
take.
 Relation key: One or more attributes whose values uniquely identify each row of the
relation.
 NULL value: A value that is not known or is unavailable. It is often displayed as a
blank, but it is distinct from an empty string or zero.
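
The terminology above can be made concrete with a small sketch. The example below uses Python's built-in sqlite3 module; the student table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical example relation; table, column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_rollno INTEGER PRIMARY KEY,"  # relation key: uniquely identifies a row
    " name TEXT NOT NULL,"
    " section TEXT"                          # this attribute may hold NULL
    ")"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "A"), (2, "Ravi", "B"), (3, "Meena", None)],
)

cursor = conn.execute("SELECT * FROM student")
attributes = [col[0] for col in cursor.description]  # attribute (column) names
tuples = cursor.fetchall()                            # the relation instance

degree = len(attributes)    # number of attributes
cardinality = len(tuples)   # number of tuples (rows)

# A second row with the same key value is rejected, so every row stays unique.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate', 'A')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(attributes, degree, cardinality, duplicate_rejected)
```

Here the degree and the cardinality both happen to be 3, and the unknown section of the third student comes back as Python's None, which is how sqlite3 reports a NULL.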

Document model:
The document model stores all of an object’s information in a single instance in the
database, and every object in the database can be starkly different than the next. This
capability, in theory, removes the need for an object-relational mapper (ORM) depending on
the use case.
A document database is a type of NoSQL database that can be used to store and query
data as JSON-like documents. JavaScript Object Notation (JSON) is an open data interchange
format that is both human and machine-readable. Developers can use JSON documents in
their code and save them directly into the document database. The flexible, semi-structured,
and hierarchical nature of documents and document databases allows them to evolve with
applications’ needs.

Advantages of document databases


Document databases enable flexible indexing, powerful ad hoc queries, and analytics over
collections of documents. The main benefits are described below.
 Ease of development
JSON documents map to objects—a common data type in most programming
languages. When building applications, developers can flexibly create and update
documents directly from the code. This means they spend less time creating data
models beforehand. Therefore, application development is more rapid and efficient.
 Flexible schema
A document-oriented database allows you to create multiple documents with
different fields within the same collection. This can be handy when storing
unstructured data like emails or social media posts. However, some document
databases offer schema validation, so you can impose some restrictions on the
structure.
 Performance at scale
Document databases offer built-in distribution capabilities. You can scale
them horizontally across multiple servers without impacting performance, which is
cost-efficient as well. Moreover, document databases provide fault tolerance and
availability through built-in replication.
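
As a rough illustration of the flexible schema described above, the sketch below models a collection as a plain Python list of dictionaries. It is not the API of any real document database: the insert, validate, and find helpers and the required fields are assumptions made for the example.

```python
# A "collection" is modelled as a plain list of dicts; this is not a real database API.
collection = []


def validate(doc, required_fields=("_id", "type")):
    """Optional schema validation: reject documents that lack the required fields."""
    missing = [field for field in required_fields if field not in doc]
    if missing:
        raise ValueError(f"missing fields: {missing}")


def insert(doc):
    """Validate the document, then add it to the collection."""
    validate(doc)
    collection.append(doc)


def find(**criteria):
    """Return every document whose fields equal all of the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Two documents with different fields live in the same collection.
insert({"_id": 1, "type": "email", "subject": "Hello", "to": ["a@example.com"]})
insert({"_id": 2, "type": "post", "text": "First post", "likes": 3})

print(find(type="post"))
```

The two documents share only the fields that the hypothetical validation step requires; everything else is free to differ from one document to the next.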

Use cases of document databases


The document model works well with use cases such as content management,
catalogs, sensor management, and more. For each use case, each document is unique and
evolves over time.
 Content management
A document database is an excellent choice for content management
applications such as blogs and video platforms. With a document database, each entity
the application tracks can be stored as a single document. The document database is a
more intuitive way for a developer to update an application as the requirements
evolve. In addition, if the data model needs to change, only the affected documents
need to be updated. No schema update is required and no database downtime is
necessary to make the changes.
 Catalogs
Document databases are efficient and effective for storing catalog information.
For example, in an e-commerce application, different products usually have different
numbers of attributes. Managing thousands of attributes in relational databases is
inefficient, and the reading performance is affected. Using a document database, each
product’s attributes can be described in a single document for easy management and
faster reading speed. Changing the attributes of one product won’t affect others.
 Sensor management
The Internet of Things (IoT) has resulted in organizations regularly collecting
data from smart devices like sensors and meters. Sensor data typically comes in as a
continuous stream of variable values. Due to latency issues, some data objects might
be incomplete, duplicated, or missing. Additionally, you must collect a large volume
of data before you can filter or summarize it for analytics.

Document stores are more convenient in this case. The sensor data can be easily
stored as it is, without cleaning it or making it conform to pre-determined schemas. You can
also scale it as required and delete entire documents once analytics is done.
Advantages of document databases:
 Schema-less
o No restrictions on the format and structure of the stored data.
 Faster creation and care
o Minimal maintenance is required once you create the document, which can be
as simple as adding your complex object once.
 No foreign keys
o Without this relationship dynamic, documents can be independent of one
another.
 Open formats
o A clean build process that uses XML, JSON, and other derivatives to describe
documents.
 Built-in versioning
o As documents grow in size, they can also grow in complexity; versioning
decreases conflicts.

Key-value Pairs:
A key-value data model or database is also referred to as a key-value store. Keys are
unique identifiers for the values. A key-value database is a collection of key-value pairs
stored as separate records; it has no predefined structure and no query language.
 A key-value store can be considered the most basic and simplest kind of database.
 Data is stored as a one-way mapping from the key to the value.
 Keys in key-value pairs must be unique.
 Since the values are accessed directly through the keys, you don't have to search
through the database sequentially.
 Values can be accessed quickly using keys (low latency and high throughput).
 Querying and retrieval are done by the application, key by key, rather than through a
query language.
 For simple key-based lookups, key-value stores typically provide much higher
performance than an RDBMS.

Types of key value databases


Key value databases are optimized for performance, and depending on the use case,
there are different types of key value databases. For example, if the purpose of using a key
value database is caching, then use an in-memory key value store. If looking for persistent
storage (disk), a persistent key value database can be used. There are also multi-model key
value databases that support multiple data models, like document, graph, and key value, thus
offering flexibility. Examples of popular key value databases are:
 In-memory key value databases: Redis, Memcached.
 Persistent key value databases: RocksDB, LevelDB.
 Multi-model key value databases: ArangoDB.

How do key value databases work?


A key value database, also known as a key value store, associates a value (which can be
anything from a number or simple string to a complex object) with a unique identifier (key),
which is used to keep track of the object. In its simplest form, a key value store is like a
dictionary/array/map object as it exists in most programming languages, but is managed by a
database management system (DBMS).
There are three main operations performed by a key value database:
 put(key, value): insert or update a value into the database.
 get(key): read a value from the database.
 delete(key): delete a value from the database.
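
The three operations can be sketched with a toy in-memory store backed by a Python dictionary. The KeyValueStore class and the "user:251" key format are illustrative assumptions; a real key value database adds persistence, concurrency control, and networking on top of this idea.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a dict (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Insert a new value, or overwrite the value already stored under key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Read the value stored under key, or return default if it is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key and its value; do nothing if the key is absent."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:251", "Bill")
store.put("user:251", "Bill Gates")   # same key: the value is updated
name = store.get("user:251")
store.delete("user:251")
after_delete = store.get("user:251")

print(name, after_delete)
```

Each operation touches exactly one key, which is why no sequential search through the data is needed.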

Key value database schema


Unlike relational databases, key value data stores do not follow a specific schema,
making them flexible. This makes them a good choice for unstructured and semi-structured
data. Storing and retrieving data is much simpler in a key value store than in a relational
table structure. When the value is a JSON structure, all the data is grouped together. You
can store anything from simple types to complex types as values. When fetching the data, you
can retrieve the individual records by iterating through the JSON data.
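
A minimal sketch of storing a JSON value under a key and iterating through it after retrieval is shown below. The plain dictionary stands in for a key value database, and the key name is a hypothetical choice.

```python
import json

store = {}  # stand-in for a key-value database (assumption for illustration)

profile = {
    "first_name": "Bill",
    "last_name": "Gates",
    "positions": [
        {"job_title": "Co-chair"},
        {"job_title": "Co-founder, Chairman"},
    ],
}

# The whole nested structure is serialized and stored as one value under one key.
store["user:251"] = json.dumps(profile)

# On retrieval, the application decodes the value and iterates over the records inside.
loaded = json.loads(store["user:251"])
titles = [position["job_title"] for position in loaded["positions"]]

print(titles)
```

The store itself sees only an opaque string; interpreting the structure is left to the application.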

Features of a key value database:


 Flexible data models
 No query language
 Support for complex data types
 Indexing support for performance

NoSQL:
Problems with conventional database approaches:
 Relational databases are great for things that fit easily into rows and columns.
 However, some problems require a different approach. Not everything fits into rows
and columns.
 There are also some scenarios in which the relationships themselves can hold their own
metadata.
 Each of the preceding scenarios has a type of NoSQL database that overcomes the
limitations of an RDBMS for those data types.
 Some problems:
o Schema redesign overhead
o Unstructured data explosion
o The sparse data problem
o Dynamically changing relationships
o Global distribution and access.

Limitations of RDBMS:
 Maintenance problem
o Maintaining a relational database becomes more difficult over time as the
data grows. Developers and programmers have to spend a lot of time
maintaining the database.
 Cost
o A relational database system can be costly to set up and maintain. The initial
cost of the software alone can be high for smaller businesses.
 Physical storage
o A relational database is organized into rows and columns, which can require a
lot of physical storage because each operation depends on separately stored
data. The storage requirement may increase over time as the data grows.
 Lack of scalability
o As the database becomes larger or is distributed across a greater number of
servers, latency and availability issues can arise and affect overall
performance.
 Decrease in performance over time
o A large number of tables and a large volume of data increase the complexity of
the system. This can lead to slow query responses, or even failed queries,
depending on how many users are connected to the server at any given time.
What is NoSQL?
NoSQL refers to non-relational database management systems, which differ from traditional
relational database management systems in some significant ways. It is designed for
distributed data stores with very large-scale storage needs (for example, Google or
Facebook, which collect terabits of data every day for their users). This type of data store
may not require a fixed schema, avoids join operations, and typically scales horizontally.

Features of NoSQL:
NoSQL DBs are optimized for horizontal scalability and agile development. Here are
some of the main features and benefits (compared to relational DBs) of NoSQL databases:
 Flexible data structures, instead of standard tabular relationships.
 Low latency.
 Horizontal scalability.
 Large number of concurrent users supported.
 Optimized for large data volumes — either structured, semi-structured or
unstructured.
 Distributed architecture that allows handling bigger amounts of data.
 Adapted to agile development sprints.
 Higher performance, speed and scalability.

The Object-Relational Mismatch


Most application development today is done in object-oriented programming languages,
which leads to a common criticism of the SQL data model: if data is stored in relational
tables, an awkward translation layer is required between the objects in the application code
and the database model of tables, rows, and columns. The disconnect between the models is
sometimes called an impedance mismatch.
Object-relational mapping (ORM) frameworks like ActiveRecord and Hibernate reduce
the amount of boilerplate code required for this translation layer, but they can’t completely
hide the differences between the two models.
For example, the following diagram illustrates how a résumé (a LinkedIn profile) could be
expressed in a relational schema. The profile as a whole can be identified by a unique
identifier, user_id. Fields like first_name and last_name appear exactly once per user, so they
can be modeled as columns on the users table. However, most people have had more than one
job in their career (positions), and people may have varying numbers of periods of education
and any number of pieces of contact information. There is a one-to-many relationship from
the user to these items, which can be represented in various ways:
• In the traditional SQL model (prior to SQL:1999), the most common normalized
representation is to put positions, education, and contact information in separate tables, with a
foreign key reference to the users table.
• Later versions of the SQL standard added support for structured datatypes and XML
data; this allowed multi-valued data to be stored within a single row, with support for
querying and indexing inside those documents. These features are supported to varying
degrees by Oracle, IBM DB2, MS SQL Server, and PostgreSQL [6, 7]. A JSON datatype is
also supported by several databases, including IBM DB2, MySQL, and PostgreSQL [8].
• A third option is to encode jobs, education, and contact info as a JSON or XML
document, store it in a text column in the database, and let the application interpret its
structure and content. In this setup, you typically cannot use the database to query for values
inside that encoded column.
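
The difference between the first and the third option can be sketched with Python's built-in sqlite3 and json modules. The table and column names below are assumptions made for illustration; only the sample values come from the résumé example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    -- First option: one-to-many items in their own table with a foreign key
    CREATE TABLE positions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        job_title TEXT,
        organization TEXT
    );
    -- Third option: the same items encoded as JSON in a plain text column
    CREATE TABLE users_encoded (user_id INTEGER PRIMARY KEY, positions_json TEXT);
""")
conn.execute("INSERT INTO users VALUES (251, 'Bill', 'Gates')")
positions = [
    ("Co-chair", "Bill & Melinda Gates Foundation"),
    ("Co-founder, Chairman", "Microsoft"),
]
conn.executemany(
    "INSERT INTO positions (user_id, job_title, organization) VALUES (251, ?, ?)",
    positions,
)
conn.execute(
    "INSERT INTO users_encoded VALUES (251, ?)",
    (json.dumps([{"job_title": t, "organization": o} for t, o in positions]),),
)

# First option: the database itself can filter on individual values.
option1 = conn.execute(
    "SELECT user_id FROM positions WHERE organization = 'Microsoft'"
).fetchall()

# Third option: the application fetches the text, decodes it, and filters by itself.
option3 = [
    user_id
    for user_id, encoded in conn.execute(
        "SELECT user_id, positions_json FROM users_encoded"
    )
    if any(p["organization"] == "Microsoft" for p in json.loads(encoded))
]

print(option1, option3)
```

Both approaches find the same user, but in the third option every row has to be read and decoded by the application, because the database treats the column as opaque text.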

Figure: Representing a LinkedIn profile using a relational schema. Photo of Bill Gates
courtesy of Wikimedia Commons, Ricardo Stuckert, Agência Brasil.
For a data structure like a résumé, which is mostly a self-contained document, a JSON
representation can be quite appropriate:

Representing a LinkedIn profile as a JSON document

{
  "user_id": 251,
  "first_name": "Bill",
  "last_name": "Gates",
  "summary": "Co-chair of the Bill & Melinda Gates... Active blogger.",
  "region_id": "us:91",
  "industry_id": 131,
  "photo_url": "/p/7/000/253/05b/308dd6e.jpg",
  "positions": [
    {"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"},
    {"job_title": "Co-founder, Chairman", "organization": "Microsoft"}
  ],
  "education": [
    {"school_name": "Harvard University", "start": 1973, "end": 1975},
    {"school_name": "Lakeside School, Seattle", "start": null, "end": null}
  ],
  "contact_info": {
    "blog": "http://thegatesnotes.com",
    "twitter": "http://twitter.com/BillGates"
  }
}

Some developers feel that the JSON model reduces the impedance mismatch between
the application code and the storage layer. However, there are also problems with JSON as a
data encoding format. The lack of a schema is often cited as an advantage.
The JSON representation has better locality than the multi-table schema in the above
diagram. If you want to fetch a profile in the relational example, you need to either perform
multiple queries (query each table by user_id) or perform a messy multiway join between the
users table and its subordinate tables. In the JSON representation, all the relevant information
is in one place, and one query is sufficient.
The one-to-many relationships from the user profile to the user’s positions,
educational history, and contact information imply a tree structure in the data, and the JSON
representation makes this tree structure explicit.
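
The locality argument can be illustrated with a small sketch that fetches the same profile in both ways. It uses Python's built-in sqlite3 and json modules; the table layout, the helper function names, and the dictionary used as a document store are assumptions made for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE positions (user_id INTEGER, job_title TEXT, organization TEXT);
    CREATE TABLE education (user_id INTEGER, school_name TEXT);
    INSERT INTO users VALUES (251, 'Bill', 'Gates');
    INSERT INTO positions VALUES (251, 'Co-founder, Chairman', 'Microsoft');
    INSERT INTO education VALUES (251, 'Harvard University');
""")


def fetch_profile_relational(user_id):
    """Three queries, one per table, stitched together in the application."""
    first, last = conn.execute(
        "SELECT first_name, last_name FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    positions = conn.execute(
        "SELECT job_title, organization FROM positions WHERE user_id = ?", (user_id,)
    ).fetchall()
    education = conn.execute(
        "SELECT school_name FROM education WHERE user_id = ?", (user_id,)
    ).fetchall()
    return {
        "first_name": first,
        "last_name": last,
        "positions": [{"job_title": t, "organization": o} for t, o in positions],
        "education": [{"school_name": s} for (s,) in education],
    }


# Document version: the whole tree is stored together and read in a single lookup.
documents = {251: json.dumps(fetch_profile_relational(251))}


def fetch_profile_document(user_id):
    """One lookup returns the complete profile, nested items included."""
    return json.loads(documents[user_id])


print(fetch_profile_document(251) == fetch_profile_relational(251))
```

The result is identical either way; what changes is how many separate reads are needed to assemble it.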

Many-to-One and Many-to-Many Relationships:


In the JSON document in the preceding section, region_id and industry_id are given as
IDs, not as plain-text strings "Greater Seattle Area" and "Philanthropy".
If the user interface has free-text fields for entering the region and the industry, it
makes sense to store them as plain-text strings. But there are advantages to having
standardized lists of geographic regions and industries, and letting users choose from a drop-
down list or autocompleter:
• Consistent style and spelling across profiles
• Avoiding ambiguity (e.g., if there are several cities with the same name)
• Ease of updating: the name is stored in only one place, so it is easy to update
across the board if it ever needs to be changed (e.g., change of a city name due to
political events)
• Localization support: when the site is translated into other languages, the standardized
lists can be localized, so the region and industry can be displayed in the viewer’s language
• Better search: e.g., a search for philanthropists in the state of Washington can match this
profile, because the list of regions can encode the fact that Seattle is in Washington (which
is not apparent from the string "Greater Seattle Area")

The advantage of using an ID is that because it has no meaning to humans, it never needs
to change: the ID can remain the same, even if the information it identifies changes. Anything
that is meaningful to humans may need to change sometime in the future—and if that
information is duplicated, all the redundant copies need to be updated. That incurs write
overheads, and risks inconsistencies (where some copies of the information are updated but
others aren’t). Removing such duplication is the key idea behind normalization in databases.

Unfortunately, normalizing this data requires many-to-one relationships (many people live
in one particular region, many people work in one particular industry), which don’t fit nicely
into the document model. In relational databases, it’s normal to refer to rows in other tables
by ID, because joins are easy. In document databases, joins are not needed for one-to-many
tree structures, and support for joins is often weak.
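
The many-to-one idea can be sketched as follows: each profile stores only IDs, and the application resolves them against standardized lists when the profile is displayed. The dictionaries, the second user, the render helper, and the new region name are hypothetical; the ID values and their original names come from the example above.

```python
# Standardized lists: each human-readable name is stored exactly once.
regions = {"us:91": "Greater Seattle Area"}
industries = {131: "Philanthropy"}

# Many profiles refer to the same region and industry by ID (many-to-one).
profiles = [
    {"user_id": 251, "first_name": "Bill", "region_id": "us:91", "industry_id": 131},
    # Hypothetical second user, added only to show the many-to-one relationship.
    {"user_id": 252, "first_name": "Example", "region_id": "us:91", "industry_id": 131},
]


def render(profile):
    """Application-side join: resolve the IDs to human-readable names."""
    return {
        **profile,
        "region": regions[profile["region_id"]],
        "industry": industries[profile["industry_id"]],
    }


# Renaming the region touches one place; every profile then shows the new name.
regions["us:91"] = "Seattle Metropolitan Area"  # hypothetical new name
rendered = [render(p) for p in profiles]

print([r["region"] for r in rendered])
```

Because the lookup happens in application code, this is the kind of join that a document database with weak join support leaves to the developer.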
Moreover, even if the initial version of an application fits well in a join-free document
model, data has a tendency of becoming more interconnected as features are added to
applications. For example, consider some changes we could make to the résumé example:
Organizations and schools as entities
In the previous description, organization (the company where the user worked) and
school_name (where they studied) are just strings. Perhaps they should be references to
entities instead? Then each organization, school, or university could have its own web page
(with logo, news feed, etc.); each résumé could link to the organizations and schools that it
mentions, and include their logos and other information (see Figure 2-3 for an example from
LinkedIn).
Recommendations
Say you want to add a new feature: one user can write a recommendation for another user.
The recommendation is shown on the résumé of the user who was recommended, together
with the name and photo of the user making the recommendation. If the recommender
updates their photo, any recommendations they have written need to reflect the new photo.
Therefore, the recommendation should have a reference to the author’s profile.

Figure 2-3: The company name is not just a string, but a link to a company entity. Screenshot
of linkedin.com.
The following diagram illustrates how these new features require many-to-many
relationships. The data within each dotted rectangle can be grouped into one document, but
the references to organizations, schools, and other users need to be represented as references,
and require joins when queried.
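
The recommendation feature can be sketched in the same spirit. Each recommendation holds references to two user profiles, so a change to the recommender's photo is picked up the next time the recommendation is read. The second user, the photo paths for that user, and the helper function are hypothetical.

```python
users = {
    251: {"name": "Bill Gates", "photo_url": "/p/7/000/253/05b/308dd6e.jpg"},
    # Hypothetical recommender, added only for illustration.
    300: {"name": "Hypothetical Recommender", "photo_url": "/photos/old.jpg"},
}

# Each recommendation stores references (IDs), not copies of the name and photo.
recommendations = [
    {"author_id": 300, "recipient_id": 251, "text": "Great to work with."},
]


def recommendations_for(user_id):
    """Resolve author references at read time (an application-side join)."""
    return [
        {
            "text": rec["text"],
            "author_name": users[rec["author_id"]]["name"],
            "author_photo": users[rec["author_id"]]["photo_url"],
        }
        for rec in recommendations
        if rec["recipient_id"] == user_id
    ]


before = recommendations_for(251)[0]["author_photo"]
users[300]["photo_url"] = "/photos/new.jpg"  # the recommender updates their photo
after = recommendations_for(251)[0]["author_photo"]

print(before, after)
```

One user can write many recommendations and receive many, which is what makes the relationship many-to-many.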

Figure: Extending résumés with many-to-many relationships

Network data models
