Unit III
Document model:
The document model stores all of an object’s information in a single instance in the
database, and every object in the database can be starkly different from the next. In
theory, this capability removes the need for an object-relational mapper (ORM), depending
on the use case.
A document database is a type of NoSQL database that can be used to store and query
data as JSON-like documents. JavaScript Object Notation (JSON) is an open data interchange
format that is both human and machine-readable. Developers can use JSON documents in
their code and save them directly into the document database. The flexible, semi-structured,
and hierarchical nature of documents and document databases allows them to evolve with
applications’ needs.
Document stores are especially convenient for data such as sensor readings. The sensor
data can be stored as it is, without cleaning it or making it conform to predetermined
schemas. You can also scale the store as required and delete entire documents once the
analytics are done.
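The idea of storing differently shaped documents side by side can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real document database: the `collection` dictionary, the `insert` and `find` helpers, and the sensor readings are all hypothetical names and values chosen for the example.

```python
import json

# Hypothetical in-memory "collection": maps a document ID to its JSON text.
collection = {}

def insert(doc_id, document):
    """Store a document as-is, serialized to JSON; no schema is enforced."""
    collection[doc_id] = json.dumps(document)

def find(field, value):
    """Return every document whose top-level `field` equals `value`."""
    results = []
    for raw in collection.values():
        doc = json.loads(raw)
        if doc.get(field) == value:
            results.append(doc)
    return results

# Two sensor readings with different shapes live in the same collection.
insert("r1", {"sensor": "temp-01", "celsius": 21.5})
insert("r2", {"sensor": "gps-07",
              "position": {"lat": 12.97, "lon": 77.59},
              "tags": ["mobile"]})

matches = find("sensor", "gps-07")
print(matches[0]["position"]["lat"])  # prints 12.97

# Delete an entire document once the analytics are done.
del collection["r1"]
```

Neither document had to be declared in advance, and the nested `position` object is stored and returned as part of its parent document.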
Advantages of document databases:
Schema-less
o No restrictions on the format and structure of data storage.
Faster creation and care
o Minimal maintenance is required once you create the document, which can be
as simple as adding your complex object once.
No foreign keys
o In the absence of this relationship dynamic, documents can be independent of one
another.
Open formats
o A clean build process that uses XML, JSON, and other derivatives to describe
documents.
Built-in versioning
o As documents grow in size, they can also grow in complexity.
o Versioning decreases conflicts.
Key-value Pairs:
A key-value data model or database is also referred to as a key-value store. Keys are
unique identifiers for the values. A collection of key-value pairs stored as separate
records is called a key-value database; it has no predefined structure or query
language.
A key-value store can be considered the most basic and simplest kind of
database.
One-way mapping from the key to the value to store data.
Keys in key-value pairs must be unique.
Since the values are accessed directly through the keys, you don't have to search
through the database sequentially one by one.
Values can be accessed easily using keys (low latency and high throughput).
Data querying and retrieval are done manually, since there is no built-in query language.
Key-value stores can provide much higher performance than an RDBMS for simple key lookups.
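The properties listed above (unique keys, a one-way mapping from key to value, direct access without a sequential scan) can be illustrated with a small Python sketch built on a dictionary. The `KeyValueStore` class and the session keys are hypothetical and only stand in for a real key-value database.

```python
class KeyValueStore:
    """A minimal in-memory key-value store (illustrative only)."""

    def __init__(self):
        self._data = {}  # hash table: at most one value per key

    def put(self, key, value):
        # Writing to an existing key overwrites it, so keys stay unique.
        self._data[key] = value

    def get(self, key, default=None):
        # Direct lookup by key; no sequential scan of the records.
        return self._data.get(key, default)

    def delete(self, key):
        # Removing a missing key is treated as a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book"]})
store.put("session:42", {"user": "alice", "cart": ["book", "pen"]})  # overwrite

print(store.get("session:42")["cart"])  # prints ['book', 'pen']
print(store.get("session:99"))          # prints None
```

The store treats each value as opaque: it can return the value for a key, but finding all sessions with a pen in the cart would have to be coded by the application.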
NoSQL:
Problems with conventional database approaches:
Relational databases are great for things that fit easily into rows and columns.
However, some problems require a different approach. Not everything fits into rows
and columns.
There are also some scenarios in which the relationships themselves can hold their own
metadata.
Each of the preceding scenarios has a type of NoSQL database that overcomes the
limitations of an RDBMS for those data types.
Some problems:
o Schema redesign overhead
o Unstructured data explosion
o The sparse data problem
o Dynamically changing relationships
o Global distribution and access.
Limitations of RDBMS:
Maintenance problem
o The maintenance of a relational database becomes difficult over time as the
data grows. Developers and programmers have to spend a lot of time
maintaining the database.
Cost:
o The relational database system is costly to set up and maintain. The initial cost
of the software alone can be quite pricey for smaller businesses.
Physical storage:
o A relational database is composed of rows and columns, which requires a lot
of physical memory because each operation performed depends on separate
storage. The physical memory requirements may increase over time
as the data grows.
Lack of scalability
o As the database becomes larger or more distributed across a greater number of
servers, latency and availability issues arise that affect overall
performance.
Decrease in performance over time
o A large number of tables and a large volume of data in the system increase
its complexity. This can lead to slow query response times, or even to queries
failing completely, depending on how many people are logged into the
server at any given time.
What is NoSQL?
NoSQL refers to non-relational database management systems, which differ from traditional
relational database management systems in some significant ways. It is designed for
distributed data stores with very large-scale data storage needs (for example, Google or
Facebook, which collect terabits of data every day for their users). This type of data store
may not require a fixed schema, avoids join operations, and typically scales horizontally.
Features of NoSQL:
NoSQL DBs are optimized for horizontal scalability and agile development. Here are
some of the main features and benefits (compared to relational DBs) of NoSQL databases:
Flexible data structures, instead of standard tabular relationships.
Low latency.
Horizontal scalability.
Large number of concurrent users supported.
Optimized for large data volumes — either structured, semi-structured or
unstructured.
Distributed architecture that allows handling bigger amounts of data.
Adapted to agile development sprints.
Higher performance, speed and scalability.
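Horizontal scalability usually means spreading the data over several servers. One common way to decide which server holds which record is to hash the key; the Python sketch below shows simple modulo partitioning. The node names and the `node_for` helper are hypothetical, and real systems often use more elaborate schemes, so treat this as an illustration of the idea only.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical server names

def node_for(key, nodes=NODES):
    """Pick a node by hashing the key (simple modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Each node holds only its own share of the keys.
shards = {name: {} for name in NODES}
for user_id in range(1000):
    key = f"user:{user_id}"
    shards[node_for(key)][key] = {"id": user_id}

# Show how many records landed on each node.
for name, data in shards.items():
    print(name, len(data))
```

Because the same key always hashes to the same node, a lookup only has to contact one server. A known drawback of plain modulo partitioning is that changing the number of nodes moves most keys to a different node.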
The following diagram represents a LinkedIn profile using a relational schema. Photo of
Bill Gates courtesy of Wikimedia Commons, Ricardo Stuckert, Agência Brasil.
For a data structure like a résumé, which is mostly a self-contained document, a JSON
representation can be quite appropriate:
Representing a LinkedIn profile as a JSON document
{
"user_id": 251,
"first_name": "Bill",
"last_name": "Gates",
"summary": "Co-chair of the Bill & Melinda Gates... Active blogger.",
"region_id": "us:91",
"industry_id": 131,
"photo_url": "/p/7/000/253/05b/308dd6e.jpg",
"positions": [
{"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"},
{"job_title": "Co-founder, Chairman", "organization": "Microsoft"}
],
"education": [
{"school_name": "Harvard University", "start": 1973, "end": 1975},
{"school_name": "Lakeside School, Seattle", "start": null, "end": null}
],
"contact_info": {
"blog": "http://thegatesnotes.com",
"twitter": "http://twitter.com/BillGates"
}
}
Some developers feel that the JSON model reduces the impedance mismatch between
the application code and the storage layer. However, there are also problems with JSON as a
data encoding format. The lack of a schema is often cited as an advantage.
The JSON representation has better locality than the multi-table schema in the above
diagram. If you want to fetch a profile in the relational example, you need to either perform
multiple queries (query each table by user_id) or perform a messy multiway join between the
users table and its subordinate tables. In the JSON representation, all the relevant information
is in one place, and one query is sufficient.
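The difference in locality can be sketched with Python's built-in `sqlite3` module. The table and column names below are assumptions made for the example (the notes do not spell out the relational schema), and storing the document as a JSON string in a `profiles` table merely stands in for a document database.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE positions (user_id INTEGER, job_title TEXT, organization TEXT);
    CREATE TABLE education (user_id INTEGER, school_name TEXT);
    CREATE TABLE profiles  (user_id INTEGER PRIMARY KEY, doc TEXT);
""")
conn.execute("INSERT INTO users VALUES (251, 'Bill', 'Gates')")
conn.executemany("INSERT INTO positions VALUES (?, ?, ?)", [
    (251, "Co-chair", "Bill & Melinda Gates Foundation"),
    (251, "Co-founder, Chairman", "Microsoft"),
])
conn.executemany("INSERT INTO education VALUES (?, ?)", [
    (251, "Harvard University"),
    (251, "Lakeside School, Seattle"),
])

# Relational style: one query per table, assembled in application code.
user = conn.execute(
    "SELECT first_name, last_name FROM users WHERE user_id = ?", (251,)).fetchone()
jobs = conn.execute(
    "SELECT job_title, organization FROM positions WHERE user_id = ?", (251,)).fetchall()
schools = conn.execute(
    "SELECT school_name FROM education WHERE user_id = ?", (251,)).fetchall()
relational_profile = {
    "first_name": user[0],
    "last_name": user[1],
    "positions": [{"job_title": t, "organization": o} for t, o in jobs],
    "education": [{"school_name": s} for (s,) in schools],
}

# Document style: the whole profile is stored and fetched as one JSON value.
conn.execute("INSERT INTO profiles VALUES (?, ?)",
             (251, json.dumps(relational_profile)))
(raw,) = conn.execute(
    "SELECT doc FROM profiles WHERE user_id = ?", (251,)).fetchone()
document_profile = json.loads(raw)

print(document_profile == relational_profile)  # prints True
```

Both approaches yield the same profile, but the relational version needs three queries (or a multiway join) while the document version needs one.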
The one-to-many relationships from the user profile to the user’s positions,
educational history, and contact information imply a tree structure in the data, and the JSON
representation makes this tree structure explicit.
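In the profile above, region_id and industry_id are given as IDs rather than as plain-text
strings. The advantage of using an ID is that, because it has no meaning to humans, it never needs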
to change: the ID can remain the same, even if the information it identifies changes. Anything
that is meaningful to humans may need to change sometime in the future—and if that
information is duplicated, all the redundant copies need to be updated. That incurs write
overheads, and risks inconsistencies (where some copies of the information are updated but
others aren’t). Removing such duplication is the key idea behind normalization in databases.
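The write overhead and the inconsistency risk can be seen in a small Python sketch. The region names, the second user, and the `regions` lookup table are made-up values for illustration; only the ID `us:91` comes from the profile above.

```python
# Hypothetical lookup table: one authoritative copy of each region name.
regions = {"us:91": "Greater Seattle Area"}

# Normalized: profiles store only the ID.
normalized = [
    {"user_id": 251, "region_id": "us:91"},
    {"user_id": 252, "region_id": "us:91"},
]

# Denormalized: every profile carries its own copy of the text.
denormalized = [
    {"user_id": 251, "region": "Greater Seattle Area"},
    {"user_id": 252, "region": "Greater Seattle Area"},
]

# Renaming the region takes a single write in the normalized layout.
regions["us:91"] = "Seattle Metropolitan Area"

# The denormalized layout needs one write per copy. Here only the first
# copy is updated, on purpose, to show how an inconsistency arises.
denormalized[0]["region"] = "Seattle Metropolitan Area"

print([regions[p["region_id"]] for p in normalized])
# prints ['Seattle Metropolitan Area', 'Seattle Metropolitan Area']
print([p["region"] for p in denormalized])
# prints ['Seattle Metropolitan Area', 'Greater Seattle Area']
```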
Unfortunately, normalizing this data requires many-to-one relationships (many people live
in one particular region, many people work in one particular industry), which don’t fit nicely
into the document model. In relational databases, it’s normal to refer to rows in other tables
by ID, because joins are easy. In document databases, joins are not needed for one-to-many
tree structures, and support for joins is often weak.
Moreover, even if the initial version of an application fits well in a join-free document
model, data has a tendency of becoming more interconnected as features are added to
applications. For example, consider some changes we could make to the résumé example:
Organizations and schools as entities
In the previous description, organization (the company where the user worked) and
school_name (where they studied) are just strings. Perhaps they should be references to
entities instead? Then each organization, school, or university could have its own web page
(with logo, news feed, etc.); each résumé could link to the organizations and schools that it
mentions, and include their logos and other information (see Figure 2-3 for an example from
LinkedIn).
Recommendations
Say you want to add a new feature: one user can write a recommendation for another user.
The recommendation is shown on the résumé of the user who was recommended, together
with the name and photo of the user making the recommendation. If the recommender
updates their photo, any recommendations they have written need to reflect the new photo.
Therefore, the recommendation should have a reference to the author’s profile.
The company name is not just a string, but a link to a company entity. Screenshot of
linkedin.com.
The following diagram illustrates how these new features require many-to-many
relationships. The data within each dotted rectangle can be grouped into one document, but
the references to organizations, schools, and other users need to be represented as references,
and require joins when queried.
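When the database offers weak join support, the application has to follow such references itself. The Python sketch below does this for the recommendation feature; the `users` dictionary, the field names, and the photo paths are hypothetical stand-ins for documents in a document store.

```python
# Hypothetical documents keyed by user ID. A recommendation stores only the
# author's ID, not a copy of the author's name or photo.
users = {
    1: {"name": "Alice", "photo_url": "/photos/alice-v1.jpg",
        "recommendations": [{"author_id": 2, "text": "Great colleague."}]},
    2: {"name": "Bob", "photo_url": "/photos/bob-v1.jpg",
        "recommendations": []},
}

def render_recommendations(user_id):
    """Resolve each author_id to the author's current name and photo."""
    rendered = []
    for rec in users[user_id]["recommendations"]:
        author = users[rec["author_id"]]  # the "join": a second lookup
        rendered.append({
            "text": rec["text"],
            "author_name": author["name"],
            "author_photo": author["photo_url"],
        })
    return rendered

before = render_recommendations(1)
users[2]["photo_url"] = "/photos/bob-v2.jpg"  # the recommender updates their photo
after = render_recommendations(1)

print(before[0]["author_photo"])  # prints /photos/bob-v1.jpg
print(after[0]["author_photo"])   # prints /photos/bob-v2.jpg
```

Because the recommendation holds a reference instead of a copy, the new photo appears without touching the recommended user's document, at the cost of an extra lookup per recommendation.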