NoSQL (Lesson 11)

Uploaded by AbdulSamad
NoSQL

Copyright © 2020, Victorian Institute of Technology.
The contents contained in this document may not be reproduced in any form or by any means, without the written permission of VIT, other than for the purpose for which it has been supplied. VIT and its logo are trademarks of Victorian Institute of Technology.

Total Slides: 35
MITS4003 [Lesson 11] Copyright © 2020 VIT, All Rights Reserved 1
Topics

• Introduction of NoSQL
• NoSQL Means
• Features of NoSQL
• Problems with conventional approaches
• Advantages of NoSQL over RDBMS



Introduction of NoSQL

• The data landscape has changed. During the past 15 years,
the explosion of the World Wide Web, social media, web
forms you have to fill in, and greater connectivity to the
Internet means that more than ever before a vast array of
data is in use.
• New and often crucial information is generated hourly, from
simple tweets about what people have for dinner to critical
medical notes by healthcare providers.
• As a result, systems designers no longer have the luxury of
closeting themselves in a room for a couple of years
designing systems to handle new data.
• Instead, they must quickly create systems that store data
and make information readily available for search,
consolidation, and analysis.
Introduction of NoSQL (Cont…)

• All of this means that a particular kind of systems
technology is needed. The good news is that a huge array
of these kinds of systems already exists in the form of
NoSQL databases.
• The concepts behind NoSQL developed slowly over several
years. Independent groups then took those ideas and
applied them to their own data problems, thereby creating
the various NoSQL databases that exist today.



Introduction of NoSQL (Cont…)

• In 2006, Google released a paper that described Bigtable as
follows: “Bigtable is a distributed storage system for
managing structured data that is designed to scale to a very
large size: petabytes of data across thousands of
commodity servers.”
• Similar to an RDBMS model at first sight, Bigtable stores
rows with a single key and stores data in the rows within
related column families. Therefore, accessing all related
data is as easy as retrieving a record by using an ID rather
than a complex join, as in relational database SQL.



Introduction of NoSQL (Cont…)

• This model also means that distributing data is more
straightforward than with relational databases. By using
simple keys, related data - such as all pages on the same
website - can be grouped together, which increases the
speed of analysis.
• We can think of Bigtable as an alternative to many tables
with relationships. That is, with Bigtable, column families
allow related data to be stored in a single record.
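The row-key and column-family idea can be sketched in a few lines of Python. This is a toy in-memory model, not Bigtable's actual API: the class name ColumnFamilyTable, its methods, and the sample page data are all assumptions made for illustration. The reversed-domain row keys are chosen so that pages of the same website sort next to each other.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy model (hypothetical): row key -> column family -> column -> value."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        # Column families are fixed up front; columns inside them are free-form.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get_row(self, row_key):
        # One lookup by key returns every family stored for the row; no join.
        return {fam: dict(cols) for fam, cols in self.rows.get(row_key, {}).items()}

    def scan_prefix(self, prefix):
        # Keys that share a prefix are adjacent in sorted order.
        return [key for key in sorted(self.rows) if key.startswith(prefix)]


pages = ColumnFamilyTable(families=["contents", "anchor"])
pages.put("com.example/index.html", "contents", "html", "<html>home</html>")
pages.put("com.example/index.html", "anchor", "org.other/links", "Example home")
pages.put("com.example/about.html", "contents", "html", "<html>about</html>")
pages.put("org.other/links", "contents", "html", "<html>links</html>")

home = pages.get_row("com.example/index.html")
site_pages = pages.scan_prefix("com.example/")
```

Here home holds both the page contents and the anchors pointing at it, fetched with one key, and site_pages lists every stored page of the same site.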



Introduction of NoSQL (Cont…)

• In 2007, Amazon released a paper describing its Dynamo data
storage application: “Dynamo is used to manage the state of
services that have very high reliability requirements and
need tight control over the tradeoffs between availability,
consistency, cost‐effectiveness and performance.”
• The paper describes how a lot of Amazon data is stored by use of a
primary key, how consistent hashing is used to partition and
distribute data, and how object versioning is used to
maintain consistency across data centers.
• These two papers inspired many different organizations to
create their own NoSQL databases.
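Consistent hashing, mentioned above, can be illustrated with a minimal sketch. This is not Amazon's implementation: the HashRing class, the node names, the choice of SHA-256, and the number of virtual nodes are assumptions for the example. Each node is placed at several positions on a ring, and a key belongs to the first node position at or after the key's own hash.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 is an arbitrary choice)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Several positions per node spread its share of keys around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring entry at or after the key's position, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (ring_position(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"cart:{n}" for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is visible in moved: when node-d joins, only the keys that node-d takes over change owner, and every other key stays where it was.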



Introduction of NoSQL (Cont…)

• Many open-source NoSQL databases had emerged by 2009.
– Riak
– MongoDB
– HBase
– Accumulo
– Hypertable
– Redis
– Cassandra
– Neo4j



NoSQL Means

• The NoSQL movement includes hundreds of database products,
which has led to a variety of definitions for the term - some
with very common tenets, and others not so common.
• This explosion of databases happened because
nonrelational approaches have been applied to a wide
range of problems where an RDBMS has traditionally been
weak.
• NoSQL databases were also created for data structures and
models that in an RDBMS required considerable
management or shredding and the reconstitution of data in
complex plumbing code.



NoSQL Means (Cont…)

• Each problem resulted in its own solution - and its own
NoSQL database, which is why so many new databases
emerged.
• It’s unlikely that one NoSQL database can solve all the
issues in a particular business area.



Features of NoSQL

• Schema agnostic: A database schema is the description of
all possible data and data structures in a relational
database. With a NoSQL database, a schema isn’t required,
giving you the freedom to store information without doing
up‐front schema design.
• Nonrelational: Relations in a database establish connections
between tables of data. For example, a list of transaction
details can be connected to a separate list of delivery
details. With a NoSQL database, this information is stored
as an aggregate - a single record with everything about the
transaction, including the delivery address.
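The two features above can be compared side by side in a small sketch. Plain Python lists and dictionaries stand in for tables and for a NoSQL store here; the field names, the sample transaction, and the helper join_transaction are assumptions for illustration only.

```python
# Relational shape: two "tables" linked by transaction_id.
transactions = [{"transaction_id": 1001, "item": "USB cable", "amount": 7.50}]
deliveries = [{"transaction_id": 1001, "address": "1 Example St, Melbourne"}]


def join_transaction(transaction_id):
    """Rebuild one transaction by matching rows across both tables."""
    txn = next(t for t in transactions if t["transaction_id"] == transaction_id)
    dlv = next(d for d in deliveries if d["transaction_id"] == transaction_id)
    return {**txn, "delivery": {"address": dlv["address"]}}


# Aggregate shape: one self-contained record, stored and fetched by key.
aggregate_store = {
    1001: {
        "transaction_id": 1001,
        "item": "USB cable",
        "amount": 7.50,
        "delivery": {"address": "1 Example St, Melbourne"},
    }
}

# Schema agnostic: a record with a field nobody declared is stored as-is.
aggregate_store[1002] = {
    "transaction_id": 1002,
    "item": "Mouse",
    "amount": 19.0,
    "gift_note": "Happy birthday",
}

joined = join_transaction(1001)
fetched = aggregate_store[1001]
```

Both joined and fetched describe the same transaction; the aggregate version simply skips the step of reassembling it.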



Features of NoSQL (Cont…)

• Commodity hardware: Some databases are designed to
operate best (or only) with specialized storage and
processing hardware. With a NoSQL database, cheap
off‐the‐shelf servers can be used. Adding more of these
cheap servers allows NoSQL databases to scale to handle
more data.
• Highly distributable: Distributed databases can store and
process a set of information on more than one device. With
a NoSQL database, a cluster of servers can be used to hold
a single large database.



Features of NoSQL (Cont…)

• Open‐source: NoSQL software is unique because the
open‐source movement, rather than a set of commercial
companies, has driven its development. When developers
couldn’t find a NoSQL database for their needs, they
created one, and published it initially as open‐source.



Problems with conventional approaches

• Relational databases are great for things that fit easily into
rows and columns.
• However, some problems require a different approach. Not
everything fits well into rows and columns - for example, a
book with a tree structure of cover, parts, chapters, main
headings, and subheadings.
• Likewise, what if a particular record has a field that could
contain two or more values? Breaking this out into another
sheet or table is a bit of overkill, and makes it harder to work
with the data as a single unit.



Problems with conventional approaches (Cont…)
• Schema redesign overhead
– Consider a retail website. The original design has a single order with
a single set of delivery information. What if the retailer now needs to
package the products into potentially multiple deliveries?
– With a relational system, you now have to spend a lot of time
deciding how best to handle this redesign.
– If you use a document NoSQL database instead, you can start
storing your new structure immediately. Queries on indexes still work
because the same data is stored in a single document, just
elsewhere within it.
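The retail example can be sketched as follows. The order documents, their field names, and the helper find_values are hypothetical; a real document database would use a proper index rather than a walk over every document. The sketch only shows why a query on a field keeps working when that field moves to a different place in the document.

```python
def find_values(doc, field):
    """Yield every value stored under `field`, at any depth of the document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            if key == field:
                yield value
            else:
                yield from find_values(value, field)
    elif isinstance(doc, list):
        for item in doc:
            yield from find_values(item, field)


# Original design: one order, one set of delivery information.
old_order = {
    "order_id": "A1",
    "items": ["kettle", "toaster"],
    "delivery": {"postcode": "3000", "carrier": "post"},
}

# New design: the same order shape, now split across multiple deliveries.
new_order = {
    "order_id": "A2",
    "items": ["desk", "lamp"],
    "deliveries": [
        {"items": ["desk"], "postcode": "3000", "carrier": "freight"},
        {"items": ["lamp"], "postcode": "3121", "carrier": "post"},
    ],
}

orders = [old_order, new_order]  # both shapes live in the same collection


def orders_with_postcode(postcode):
    return [o["order_id"] for o in orders if postcode in find_values(o, "postcode")]


city_orders = orders_with_postcode("3000")
```

No migration step was needed before storing new_order, and the postcode query returns orders of both shapes.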



Problems with conventional approaches (Cont…)
• The sparse data problem
– Relational databases can suffer from a sparse data problem: columns
exist because they may hold a value, but for most rows they are
blank.
– Consider a contact management system, which may have a field for
home phone, cell phone, Twitter ID, email, and other contact fields.
Using an RDBMS requires a null value be placed into unused
columns. Potentially, there could be 200 different fields, 99 percent
with null values. An RDBMS may still reserve disk space for
these columns, though, because they potentially could have a value
after future edits of the contact data. This is a great waste of
resources. It’s also inefficient to retrieve 198 null values over SQL in
a result set.
– NoSQL databases are designed to bypass this problem. They store
and index only what is provided by the client application. No nulls
are stored, and no storage space is allocated in advance. You just
store what you need to use.
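The contact example can be made concrete with a short sketch. The 200 column names and the two helper functions are invented for illustration; a tuple stands in for a fixed-width relational row and a dictionary for a sparse NoSQL record.

```python
# 200 hypothetical contact columns, as in the example above.
CONTACT_FIELDS = ["name", "email", "home_phone", "cell_phone", "twitter_id"] + [
    f"other_{n}" for n in range(195)
]


def as_relational_row(contact):
    """Fixed-width row: every column is present, unused ones hold None (NULL)."""
    return tuple(contact.get(column) for column in CONTACT_FIELDS)


def as_document(contact):
    """Sparse record: keep only the fields the client actually supplied."""
    return {key: value for key, value in contact.items() if value is not None}


contact = {"name": "Ada", "email": "ada@example.com", "home_phone": None}

row = as_relational_row(contact)
doc = as_document(contact)
null_count = sum(value is None for value in row)
```

With two fields filled in, the row carries 198 nulls while the document carries two entries and nothing else.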
Problems with conventional approaches (Cont…)
• Dynamically changing relationships
– You may discover facts and relationships over time. Consider
LinkedIn, where someone may be a second‐level connection (a friend
of a friend). You realize you know the person, so you add her as a
first‐level relationship by inserting a single fact or relationship in the
application.
– You could go one step further and define subclasses of these
relationships, such as worked with, friends with, or married to. You
may even add metadata to these relationships, such as a “known
since” date.
– Relational databases aren’t great at managing these things
dynamically. Sure, you could model the above relationships, but what
if you discover or infer a new class of relationship between entities or
subjects that wasn’t considered during the original system design?



Problems with conventional approaches (Cont…)
• Dynamically changing relationships (Cont…)
– Using an RDBMS for this would require an ever‐increasing storm of
many‐to‐many relationships and linking tables, one table schema for
each relationship class. This approach would be hard to keep up
with and maintain.
– NoSQL databases are designed with dynamically changing
relationships in mind.
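A toy triple store shows how such relationships can be added as single facts. This is a sketch under assumptions, not the API of any real triple or graph database: the TripleStore class, the predicate names, the people, and the date are all made up, and connections are treated as one-directional to keep the code short.

```python
class TripleStore:
    """Hypothetical store of (subject, predicate, object) facts with metadata."""

    def __init__(self):
        self.triples = {}  # (subject, predicate, object) -> metadata dict

    def add(self, subject, predicate, obj, **metadata):
        self.triples[(subject, predicate, obj)] = metadata

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]


graph = TripleStore()
graph.add("alice", "connected_to", "bob")
graph.add("bob", "connected_to", "carol")


def second_level(person):
    """People reachable in exactly two hops who are not already first-level."""
    first = {o for _, _, o in graph.match(person, "connected_to")}
    second = {o for f in first for _, _, o in graph.match(f, "connected_to")}
    return second - first - {person}


before = second_level("alice")

# Promoting carol to a first-level connection is one new fact.
graph.add("alice", "connected_to", "carol")
after = second_level("alice")

# A relationship class nobody planned for needs no new linking table,
# and it can carry metadata such as a "known since" date.
graph.add("alice", "worked_with", "carol", known_since="2015-03-01")
colleagues = graph.match(predicate="worked_with")
```

Each change to the relationship model is one call to add; nothing about the store's structure has to be redesigned.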



Advantages of NoSQL over RDBMS

• Support for Unstructured Text
– The vast majority of data in enterprise systems is unstructured. Many
NoSQL databases can handle indexing of unstructured text.
– Being able to manage unstructured text greatly increases the
information available and can help organizations make better decisions.
• Ability to Handle Change over Time
– Because of the schema agnostic nature of NoSQL databases,
they’re very capable of managing change.
– Microsoft DocumentDB and MarkLogic Server both provide this
capability.



Advantages of NoSQL over RDBMS (Cont…)
• No Reliance on SQL Magic
– Structured Query Language (SQL) is the predominant language
used to query relational database management systems.
– Although several NoSQL databases support SQL access, they do so
for compatibility with existing applications such as business
intelligence (BI) tools.
– NoSQL databases support their own access languages that can
interpret the data being stored, rather than require a relational model
within the underlying database.
– This more developer‐centric approach to the design of databases
and their application programming interfaces (APIs) is the
reason NoSQL databases have become very popular among
application developers.
– Application developers don’t need to know the inner workings and
vagaries of databases before using them.



Advantages of NoSQL over RDBMS (Cont…)
• Ability to Scale Horizontally on Commodity Hardware
– NoSQL databases handle partitioning of a database across several
servers. So, if data storage requirements grow, you can
continue to add inexpensive servers and connect them to the database
cluster (horizontal scaling), making them work as a single data
service.
– Contrast this with the relational database world, where you typically
need to buy new, more powerful and thus more expensive hardware to
scale up (vertical scaling).



Advantages of NoSQL over RDBMS (Cont…)
• Breadth of Functionality
– Most relational databases support the same features but in a slightly
different way, so they are all similar.
– NoSQL databases, in contrast, come in four core types:
• Key‐value
• Columnar
• Document
• Triple stores
– Within these types, we can find a database to suit our particular
needs. With so much choice, we are likely to find a NoSQL database
that will solve our application woes.



Advantages of NoSQL over RDBMS (Cont…)
• Support for Multiple Data Structures
– Many applications need simple object storage, whereas others
require highly complex and interrelated structure storage.
– NoSQL databases provide support for a range of data structures.
• Simple binary values, lists, maps, and strings can be handled at
high speed in key‐value stores.
• Related information values can be grouped in column families
within Bigtable clones.
• Highly complex parent‐child hierarchical structures can be
managed within document databases.
• A web of interrelated information can be described flexibly and
related in triple and graph stores.



Advantages of NoSQL over RDBMS (Cont…)
• Vendor Choice
– The NoSQL industry is awash with databases, though many have
been around for less than ten years. For example, IBM, Microsoft,
and Oracle entered this market only recently.
– Consequently, many vendors are targeting particular audiences with
their own brew of innovation.
– Open‐source variants are available for most NoSQL databases,
which enables companies to explore and start using NoSQL
databases at minimal risk.



Advantages of NoSQL over RDBMS (Cont…)
• No Legacy Code
– Because they are so new, NoSQL databases don’t have legacy
code, which means they don’t need to provide support for old
hardware platforms or keep strange and infrequently used
functionality updated.
– NoSQL databases enjoy a quick pace in terms of development and
maturation. New features are released all the time, and new and
existing features are updated frequently. In fact, new major releases
occur annually rather than every three to five years.



Advantages of NoSQL over RDBMS (Cont…)
• Executing Code Next to the Data
– NoSQL databases were created in the era of Hadoop. Hadoop’s
highly distributed file system (HDFS) and batch‐processing
environment (MapReduce) signaled changes in the way data is
stored, queried, and processed.
– Queries and processing work now pass to several servers, which
provides high levels of parallelization for both ingest and query
workloads. Being able to calculate aggregations next to the data has
also become the norm.
– Analysis is passed to the database for execution next to the data,
which means you don’t have to ship a lot of data around a network to
achieve locally combined analysis.
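The idea of aggregating next to the data can be sketched in the map/reduce style. This is an illustration only, not Hadoop code: the three in-memory lists stand in for shards on separate servers, a thread pool stands in for the cluster, and the function names local_sum and merge and the sample records are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Three hypothetical shards, each holding its own slice of the order records.
shards = [
    [{"region": "VIC", "total": 20.0}, {"region": "NSW", "total": 5.0}],
    [{"region": "VIC", "total": 12.5}],
    [{"region": "NSW", "total": 7.5}, {"region": "QLD", "total": 30.0}],
]


def local_sum(records):
    """Map step, run where the data lives: reduce a shard to per-region totals."""
    totals = Counter()
    for record in records:
        totals[record["region"]] += record["total"]
    return totals


def merge(partials):
    """Reduce step, run by the coordinator: combine the small partial results."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds values key by key
    return dict(combined)


# Threads stand in for servers working on their shards in parallel.
with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    partials = list(pool.map(local_sum, shards))

totals = merge(partials)
```

Only the per-region partial sums cross the "network" here; the individual order records never leave their shard.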



Summary

• Revision of Key Concepts

• Questions and Answers
