Unit 6 NOSQL Databases and Data Warehousing

Uploaded by dwightschrute826
Copyright © All Rights Reserved

Contents
• NoSQL Databases:
• Introduction to NOSQL Databases,
• Types of NOSQL Databases
• BASE properties
• CAP theorem
• Data Warehousing: Architecture and Components of Data
Warehouse, OLAP
NoSQL databases "not only SQL"
• NoSQL databases (aka "not only SQL") are
non-tabular databases and store data
differently than relational tables.
• NoSQL databases come in a variety of
types based on their data model. The
main types are document, key-value,
wide-column, and graph.
• Traditional RDBMSs use SQL syntax to
store and retrieve data for further insights.
In contrast, a NoSQL database system
encompasses a wide range of database
technologies that can store structured,
semi-structured, and unstructured data.
Types of NoSQL databases
• NoSQL databases are mainly categorized into four types: key-value pair,
column-oriented (wide-column), graph-based, and document-oriented.
Every category has its own unique attributes and limitations.
Types of NoSQL databases
• Key Value Pair Based
• Data is stored in key/value pairs. Key-value stores are designed to handle large
amounts of data and heavy load.
• Key-value databases store data as a hash table in which each
key is unique, and the value can be JSON, a BLOB (Binary Large Object),
a string, etc.
• This kind of NoSQL database is used like a collection, dictionary, or
associative array. Key-value stores let developers store
schema-less data. They work best for use cases such as shopping cart contents.
• Redis, Dynamo, and Riak are some examples of key-value
databases.
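The hash-table model above can be sketched with a Python dict. This is a minimal in-memory illustration, not how Redis, Dynamo, or Riak are implemented; the class `KeyValueStore`, its methods, and the `cart:user42` key are hypothetical. The store treats each value as opaque bytes, so the shopping cart is serialized to JSON before it is stored.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store backed by a hash table (a Python dict)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Keys are unique: writing an existing key overwrites its value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # The store does not interpret the value; it just returns the bytes.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
# A shopping cart serialized as JSON and stored under a per-user key.
cart = {"items": [{"sku": "A100", "qty": 2}]}
store.put("cart:user42", json.dumps(cart).encode("utf-8"))
loaded = json.loads(store.get("cart:user42"))
```

Because the store never looks inside the value, any schema change to the cart is the application's concern. This is what "schema-less" means here.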
Types of NoSQL databases
• Column-based
• Column-oriented databases
work on columns and are based
on Google's Bigtable paper.
Every column is treated
separately.
• More specifically, column
databases use the concept
of a keyspace, which is roughly analogous to
a schema in relational models.
The keyspace contains all the
column families, which
contain rows, which in turn
contain columns.
• HBase, Cassandra, and
Hypertable are examples of column-based
databases.
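The keyspace → column family → row → column nesting can be sketched with nested dictionaries. This is a simplified illustration, not the storage engine of HBase or Cassandra. The helpers `new_keyspace`, `put`, and `get_row` and the sample data are hypothetical. Rows in the same column family may hold different columns.

```python
from collections import defaultdict


def new_keyspace():
    """keyspace -> column family -> row key -> {column name: value}."""
    return defaultdict(lambda: defaultdict(dict))


def put(keyspace, family, row_key, column, value):
    keyspace[family][row_key][column] = value


def get_row(keyspace, family, row_key):
    """Return a copy of one row's columns (empty if the row does not exist)."""
    return dict(keyspace[family].get(row_key, {}))


shop = new_keyspace()
put(shop, "users", "u1", "name", "Ada")
put(shop, "users", "u1", "email", "ada@example.com")
put(shop, "users", "u2", "name", "Lin")  # u2 simply has no email column
```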
Types of NoSQL databases

• Document-Oriented:
• A document-oriented NoSQL database stores and retrieves data as key-value
pairs, but the value part is stored as a document in JSON or XML format.
The database understands the structure of the value, so the value can be
queried.
• The document type is mostly used for content management systems (CMS), blogging platforms,
real-time analytics, and e-commerce applications.
• Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are
popular document-oriented DBMSs.
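The difference from a key-value store is that the document's fields can be queried. The sketch below shows this with a toy collection. It does not reproduce MongoDB's or CouchDB's query API. The `insert` and `find` functions and the sample posts are hypothetical, and only top-level equality matches are supported.

```python
import json

# A toy document collection: each document is stored under a key, but unlike
# a plain key-value store the value is parsed, so its fields can be queried.
collection = {}


def insert(doc_id, json_text):
    collection[doc_id] = json.loads(json_text)


def find(**criteria):
    """Return sorted ids of documents whose top-level fields equal all criteria."""
    return sorted(
        doc_id
        for doc_id, doc in collection.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )


insert("post1", '{"title": "Hello", "author": "ann", "tags": ["intro"]}')
insert("post2", '{"title": "CAP", "author": "bob"}')
insert("post3", '{"title": "BASE", "author": "ann"}')
```

For example, `find(author="ann")` returns `["post1", "post3"]`.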
Types of NoSQL databases
• Graph-Based
• A graph database stores entities as well as
the relations among those entities. Each
entity is stored as a node, and each relationship
as an edge.
• An edge represents a relationship between nodes.
Every node and edge has a unique identifier.
• Graph databases are mostly used for social
networks, logistics, and spatial data.
• Neo4j, Infinite Graph, OrientDB, and FlockDB are
some popular graph-based databases.
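The node/edge model can be sketched with two dictionaries keyed by unique identifiers. A social-network style question ("how many hops apart are two people?") is then answered with breadth-first search. This is a hypothetical illustration with made-up data, not how Neo4j stores or queries graphs.

```python
from collections import deque

# Nodes and edges each have a unique identifier (hypothetical sample data).
nodes = {1: {"name": "Ann"}, 2: {"name": "Bob"}, 3: {"name": "Cy"},
         4: {"name": "Dee"}, 5: {"name": "Eve"}}
edges = {  # edge id -> (node id, relationship type, node id)
    "e1": (1, "FRIENDS_WITH", 2),
    "e2": (2, "FRIENDS_WITH", 3),
    "e3": (3, "FRIENDS_WITH", 4),
}


def neighbours(node_id):
    """Nodes connected to node_id by any edge (edges treated as undirected)."""
    for src, _rel, dst in edges.values():
        if src == node_id:
            yield dst
        elif dst == node_id:
            yield src


def hops(start, goal):
    """Breadth-first search for the fewest relationship hops, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

Here `hops(1, 4)` is 3 (Ann → Bob → Cy → Dee). Eve has no edges, so `hops(1, 5)` is `None`.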
What is the CAP Theorem?
• The CAP theorem states that it is impossible for a distributed data store to simultaneously provide more than two
of the following three guarantees:
1. Consistency
2. Availability
3. Partition Tolerance
• Consistency:
• The data should remain consistent after the execution of an operation. This means that once
data is written, any future read request should return that data. For example, after
the order status is updated, all clients should see the same data.
• Availability:
• The database should always be available and responsive. It should not have any downtime.
• Partition Tolerance:
• Partition tolerance means that the system continues to function even if
communication among the servers is unreliable. For example, the servers may be partitioned
into multiple groups that cannot communicate with each other. If part of the
database becomes unreachable, the other parts continue to operate.
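The trade-off can be made concrete with a toy simulation. It uses two in-memory replicas, and a flag stands in for a network partition. This is a teaching sketch, not a real distributed system; the class names and the "CP"/"AP" modes are hypothetical. During a partition, the "CP" store rejects writes so the replicas never disagree. The "AP" store accepts the write, and the other replica then serves stale data.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoReplicaStore:
    """Toy store with two replicas, a and b; mode is "CP" or "AP"."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True while a and b cannot communicate

    def write(self, value):
        """Clients write through replica a."""
        if not self.partitioned:
            self.a.value = value
            self.b.value = value  # replicate to b before acknowledging
        elif self.mode == "CP":
            # Cannot reach b: refuse the write so the replicas never disagree.
            raise RuntimeError("write rejected: network partition")
        else:
            # AP: accept the write on a; b keeps serving the old value.
            self.a.value = value

    def read_from_b(self):
        return self.b.value
```

In both modes the system keeps running during the partition (P). The only choice left is between C (reject the write) and A (accept it and tolerate stale reads).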
BASE: Basically Available, Soft state, Eventual consistency
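• Basically available means the database is
available all the time, in the sense of the
availability guarantee of the CAP theorem.
• Soft state means the system state may change
over time, even without new input.
• Eventual consistency means that
the system will become consistent
over time, once updates have propagated to all replicas.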
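Soft state and eventual consistency can be illustrated with two replicas that exchange their latest versions. Conflicts are resolved by last-writer-wins, one common policy (an assumption here; real systems use various strategies). The `Replica` class and `sync` function are hypothetical. Between a write and the next sync, the replicas disagree. After the sync, they converge.

```python
class Replica:
    """A replica that keeps only the newest version it has seen."""

    def __init__(self):
        self.version = 0  # logical timestamp of the latest accepted write
        self.value = None

    def write(self, value, version):
        # Last-writer-wins: ignore writes older than what we already hold.
        if version > self.version:
            self.version, self.value = version, value


def sync(r1, r2):
    """Anti-entropy step: both replicas converge on the newer value."""
    if r1.version > r2.version:
        r2.write(r1.value, r1.version)
    else:
        r1.write(r2.value, r2.version)


a, b = Replica(), Replica()
a.write("shipped", version=1)
# Until the replicas synchronize, b still returns the old (empty) state.
stale = b.value
sync(a, b)
```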
Data Warehousing
• Data Warehousing (DW) is a process for
collecting and managing data from
varied sources to provide meaningful
business insights.
• A data warehouse is a collection of
software tools that facilitates analysis
of a large set of business data to
help an organization make decisions.
• A data warehouse is mainly a data management system designed to
enable and support business intelligence (BI) activities, particularly analytics.
Data warehouses are intended for querying, cleaning, manipulating,
transforming, and analyzing data.
• Much of the data in a data warehouse comes from numerous sources,
such as internal applications (marketing, sales, and finance) and customer-facing apps.
Why Do You Need a Data Warehouse?
• In the early days, you may use your
regular database to run SQL queries for analytics.
But as the size of the data and the number of
people using it for analysis grow, your regular
database becomes extremely slow at query processing.
• This is where companies recognized the need
for a data warehouse, which, by contrast, is
designed to handle huge volumes of data. It allows
you to quickly filter, sort, aggregate, and analyze
the data.
• It allows organizations to make better business
decisions. By improving data analytics, a data warehouse
can also help an organization increase revenue and
compete more strategically in the market.
Characteristics of data warehousing
• Subject-Oriented
• A data warehouse focuses on the modeling and analysis of
data for decision-makers. Therefore, data warehouses
typically provide a concise and straightforward view
of a particular subject, such as customer, product, or
sales, instead of the organization's ongoing global
operations.
• Integrated
• In a data warehouse, integration means establishing
a common unit of measure for all similar data coming from
dissimilar databases.
• Time-Variant
• The data contains an element of time, explicitly or implicitly.
• Non-volatile
• The data warehouse is non-volatile, meaning that
prior data is not erased when new data is entered
into it.
Architecture & Components of Data Warehouse
• The architecture of a data warehouse is the arrangement of its elements,
the software and hardware components needed to build an efficient data warehouse.
The elements and components vary with each organization's requirements
and circumstances.
Architecture & Components of Data Warehouse
• 1. Source Data Component:
• In a data warehouse, source data comes from different places.
It is grouped into four categories:
• External Data: Most executives and data analysts rely on external
sources for much of the information they use, for example
statistics relating to their organization that are published by
external agencies and departments.
• Internal Data: In every organization, users keep their “private” spreadsheets,
reports, customer profiles, and often even departmental databases.
• Operational System data: Operational systems are principally meant to run the business.
In each operational system, old data is periodically taken out and stored in archived files.
• Flat files: A flat file is a text database that stores data in plain text format.
Flat files are generally text files with all data processing and structure markup
removed. A flat file contains a table with a single record per line.
Architecture & Components of Data Warehouse
• 2. Data Staging: After the data is extracted from the various sources, the data files must be prepared
for storing in the data warehouse. The extracted data must be transformed into a format that is
suitable for the data warehouse, ready for querying and analysis.

• Data Extraction: This stage handles the various data sources. Data analysts should employ suitable
techniques for each data source.
• Data Transformation: Transformation consists of many individual tasks. First, the data extracted
from each source is cleaned. When transformation is complete, we have a set of integrated data that
is clean, standardized, and summarized.
• Data Loading: Once the structure of the data warehouse has been designed and built, the data is
initially loaded into the data warehouse storage.
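The three staging steps can be sketched as three small functions. The sketch uses two hypothetical sources: a flat file given as CSV text and an export from an operational system. The source data, column names, and the `extract`/`transform`/`load` helpers are all assumptions for illustration. Transformation drops a row with a missing amount (cleaning). It converts dollars to cents, a common unit of measure (integration), and normalizes region codes (standardization).

```python
import csv
import io

# Hypothetical sources: a flat file (CSV text) and an operational export.
FLAT_FILE = """order_id,amount_usd,region
1, 19.99 ,north
2,,south
3,5.00,NORTH
"""
OPERATIONAL_ROWS = [{"id": 4, "amount_cents": 1250, "region": "South"}]


def extract():
    csv_rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))
    return csv_rows, OPERATIONAL_ROWS


def transform(csv_rows, op_rows):
    clean = []
    for r in csv_rows:
        amount = r["amount_usd"].strip()
        if not amount:  # cleaning: drop rows with a missing amount
            continue
        clean.append({
            "order_id": int(r["order_id"]),
            "amount_cents": round(float(amount) * 100),  # common unit: cents
            "region": r["region"].strip().lower(),  # standardized region code
        })
    for r in op_rows:
        clean.append({
            "order_id": r["id"],
            "amount_cents": r["amount_cents"],
            "region": r["region"].lower(),
        })
    return clean


def load(rows, warehouse):
    warehouse.extend(rows)  # initial load into the warehouse table


warehouse = []
load(transform(*extract()), warehouse)
```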
Architecture & Components of Data Warehouse
• 3. Data Storage in Warehouse: Data storage for data
warehousing is split into multiple repositories.
• These data repositories contain structured data in a highly
normalized form for fast and efficient processing.

• Metadata: Metadata is data about data: it summarizes basic details
about the data, making it easier to find and work with particular instances of data.
• Raw Data: Raw data is data that has been delivered from a particular data source to the data
supplier but has not yet been processed by machine or human.
• Summary Data: Summary data is a brief, condensed view of a large body of
detailed data. Analysts typically write code that aggregates the detailed data and report
the final result in the form of summarized data.
Architecture & Components of Data Warehouse
• 4. Data Marts:
• A data mart stores the information for a specific function of an
organization that is handled by a
single authority.
• There may be any number of data marts in an
organization, depending on its functions. In short, data
marts contain subsets of the data stored in data
warehouses.
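Since a data mart holds a subset of the warehouse data for one function, deriving one can be as simple as a filter. This is a minimal sketch with hypothetical rows and a hypothetical `dept` column. Real data marts are usually separate, physically stored databases, refreshed from the warehouse.

```python
# Hypothetical warehouse rows tagged with the owning department.
WAREHOUSE = [
    {"dept": "sales", "metric": "revenue", "value": 120},
    {"dept": "finance", "metric": "cost", "value": 80},
    {"dept": "sales", "metric": "units", "value": 7},
]


def build_data_mart(warehouse, dept):
    """Return the subset of warehouse rows owned by one department."""
    return [row for row in warehouse if row["dept"] == dept]


sales_mart = build_data_mart(WAREHOUSE, "sales")
```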

• 5. Users/Analysts
• Now, the users and analysts can use data for various applications like reporting,
analyzing, mining, etc. The data is made available to them whenever required.

• https://www.analyticsvidhya.com/blog/2021/07/a-brief-introduction-to-data-warehouse/
Detail architecture

• https://www.jamesserra.com/archive/2013/07/why-you-need-a-data-warehouse/
Online Analytical Processing (OLAP)
• OLAP stands for On-Line Analytical Processing. OLAP is a category of
software technology that enables analysts, managers, and
executives to gain insight into information through fast, consistent,
interactive access to a wide variety of possible views of data that has
been transformed from raw information to reflect the real dimensionality
of the enterprise as understood by its users.
• OLAP implements multidimensional analysis of business information
and supports complex calculations, trend analysis, and
sophisticated data modeling.
• It is rapidly becoming the essential foundation for intelligent solutions,
including Business Performance Management, Planning, Budgeting,
Forecasting, Financial Reporting, Analysis, Simulation Models,
Knowledge Discovery, and Data Warehouse Reporting.
Online Analytical Processing (OLAP)
• How OLAP systems work
• To facilitate this kind of analysis, data is collected
from multiple data sources and stored in data
warehouses, then cleansed and organized into data
cubes.
• Each OLAP cube contains data categorized by
dimensions (such as customers, geographic sales
region, and time period) derived from dimension
tables in the data warehouse.
• Dimensions are then populated by members (such
as customer names, countries, and months) that are
organized hierarchically.
• OLAP cubes are often pre-summarized across
dimensions to drastically improve query time compared with
relational databases.
• A data cube is a multidimensional array of values
used to bring together data to be organized and
modeled for analysis.
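Pre-summarizing can be sketched by mapping each combination of dimension members to an aggregated value. The example uses hypothetical sales facts with customer, region, and month dimensions. It builds a two-dimensional (region, month) cube by summing over customers, so a question like "US sales in January" becomes a single lookup instead of a scan. The names `FACTS` and `build_cube` are assumptions.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer, region, month, sales amount).
FACTS = [
    ("Ann", "EU", "Jan", 100),
    ("Ann", "EU", "Feb", 50),
    ("Bob", "US", "Jan", 70),
    ("Cy", "US", "Jan", 30),
]


def build_cube(facts):
    """Pre-summarize sales into (region, month) cells, summing over customers."""
    cube = defaultdict(int)
    for _customer, region, month, amount in facts:
        cube[(region, month)] += amount
    return dict(cube)


cube = build_cube(FACTS)
# A query such as "US sales in January" is now a single lookup.
us_jan = cube[("US", "Jan")]
```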
Online Analytical Processing (OLAP)
• Analysts can then perform five types of OLAP analytical operations against
multidimensional databases:
• Roll-up. Also known as consolidation or drill-up, this operation
summarizes the data along a dimension.
• Drill-down. This allows analysts to navigate deeper among the dimensions
of data, for example drilling down from "time period" to "years" and
"months" to chart sales growth for a product.
• Slice. This enables an analyst to take one level of information for display,
such as "sales in 2017."
• Dice. This allows an analyst to select data from multiple dimensions to
analyze, such as "sales of blue beach balls in Iowa in 2017."
• Pivot. Analysts can gain a new view of data by rotating the data axes of
the cube.
Online Analytical Processing (OLAP)
• Roll-up. Also known as the aggregation operation or drill-up, this operation
summarizes the data along a dimension. For example, consider the following cube of
temperature readings per week:

Temperature  64  65  68  69  70  71  72  75  80  81  83  85
Week1         1   0   1   0   1   0   0   0   0   0   1   0
Week2         0   0   0   1   0   0   1   2   0   1   0   0

• Suppose we want to set up the levels hot (80-85), mild (70-75), and cool (64-69) for
temperature in the above cube.
• To do this, we group the columns and add up the values according to the
concept hierarchy, which gives:

Temperature  cool  mild  hot
Week1           2     1    1
Week2           1     3    1

• This operation is known as a roll-up.
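The roll-up can be checked with a short Python sketch. It hard-codes the detailed week table and the cool/mild/hot ranges from the text. The names `TEMPS`, `COUNTS`, `level`, and `roll_up` are hypothetical. Each raw temperature is mapped to its level in the concept hierarchy, and the counts are summed per level.

```python
TEMPS = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
COUNTS = {
    "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
}


def level(temp):
    """Concept hierarchy: map a raw temperature to cool / mild / hot."""
    if 64 <= temp <= 69:
        return "cool"
    if 70 <= temp <= 75:
        return "mild"
    if 80 <= temp <= 85:
        return "hot"
    raise ValueError(f"no level defined for {temp}")


def roll_up(counts):
    """Sum each week's counts per temperature level."""
    result = {}
    for week, row in counts.items():
        totals = {"cool": 0, "mild": 0, "hot": 0}
        for temp, n in zip(TEMPS, row):
            totals[level(temp)] += n
        result[week] = totals
    return result


rolled = roll_up(COUNTS)
```

Running it gives cool 2, mild 1, hot 1 for Week1 and cool 1, mild 3, hot 1 for Week2.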
Online Analytical Processing (OLAP)
• Drill-down. This allows analysts to navigate deeper among the dimensions of data,
for example drilling down from "time period" to "years" and "months" to chart sales
growth for a product. In the temperature example, drilling down from weeks to days gives:

Temperature  cool  mild  hot
Day 1           0     0    0
Day 2           0     0    0
Day 3           0     0    1
Day 4           0     1    0
Day 5           1     0    0
Day 6           0     0    0
Day 7           1     0    0
Day 8           0     0    0
Day 9           1     0    0
Day 10          0     1    0
Day 11          0     1    0
Day 12          0     1    0
Day 13          0     0    1
Day 14          0     0    0
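Drill-down goes in the opposite direction from roll-up: it needs the finer-grained data. The sketch below stores the day-level table above and shows both directions. It assumes, as a hypothesis, that days 1-7 form Week1 and days 8-14 form Week2. The names `DAY_LEVELS`, `week_of`, `roll_up_to_weeks`, and `drill_down` are hypothetical.

```python
LEVELS = ("cool", "mild", "hot")

# Day-level table: day -> level of that day's reading.
# Days 1, 2, 6, 8 and 14 have zero counts in every level.
DAY_LEVELS = {3: "hot", 4: "mild", 5: "cool", 7: "cool", 9: "cool",
              10: "mild", 11: "mild", 12: "mild", 13: "hot"}


def week_of(day):
    # Assumption: days 1-7 form Week1 and days 8-14 form Week2.
    return "Week1" if day <= 7 else "Week2"


def roll_up_to_weeks(day_levels):
    """Coarser view: count readings per (week, level)."""
    weeks = {w: dict.fromkeys(LEVELS, 0) for w in ("Week1", "Week2")}
    for day, lvl in day_levels.items():
        weeks[week_of(day)][lvl] += 1
    return weeks


def drill_down(day_levels, week):
    """Finer view: expand one week into per-day counts for each level."""
    days = range(1, 8) if week == "Week1" else range(8, 15)
    return {day: {lvl: int(day_levels.get(day) == lvl) for lvl in LEVELS}
            for day in days}
```

Under this week assumption, rolling the day table back up gives cool 2, mild 1, hot 1 for Week1 and cool 1, mild 3, hot 1 for Week2.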
Online Analytical Processing (OLAP)
• Slice. This enables an analyst to take one level of information for display,
such as "sales in 2017."
• For example, if we make the selection temperature = cool, we will obtain the
following cube:

Temperature  cool
Day 1           0
Day 2           0
Day 3           0
Day 4           0
Day 5           1
Day 6           1
Day 7           1
Day 8           1
Day 9           1
Day 11          0
Day 12          0
Day 13          0
Day 14          0

• Dice. This allows an analyst to select data from multiple dimensions to analyze.
• The dice operation defines a subcube by performing a selection on two or more
dimensions.
• For example, applying the selection (time = day 3 OR time = day 4) AND
(temperature = cool OR temperature = hot) to the original cube gives the following
subcube (still two-dimensional):

Temperature  cool  hot
Day 3           0    1
Day 4           0    0
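Slice and dice are both selections on a cube stored as (day, level) → count cells. A slice fixes one dimension and drops it. A dice keeps every dimension but restricts several of them at once. The sketch uses a hypothetical excerpt covering days 3-5 only; the values for those days agree with the tables above. The helper names are assumptions.

```python
# Cube cells: (day, temperature level) -> count. Hypothetical excerpt (days 3-5).
CUBE = {
    (3, "cool"): 0, (3, "mild"): 0, (3, "hot"): 1,
    (4, "cool"): 0, (4, "mild"): 1, (4, "hot"): 0,
    (5, "cool"): 1, (5, "mild"): 0, (5, "hot"): 0,
}


def slice_cube(cube, level):
    """Slice: fix temperature = level; the result is indexed by day only."""
    return {day: n for (day, lvl), n in cube.items() if lvl == level}


def dice(cube, days, levels):
    """Dice: keep cells whose day is in `days` AND whose level is in `levels`."""
    return {(day, lvl): n for (day, lvl), n in cube.items()
            if day in days and lvl in levels}


cool_slice = slice_cube(CUBE, "cool")
sub = dice(CUBE, days={3, 4}, levels={"cool", "hot"})
```

`sub` reproduces the dice example: day 3 has cool 0 and hot 1, and day 4 has cool 0 and hot 0.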
Online Analytical Processing (OLAP)
• Pivot. Analysts can gain a new view of data by rotating the data axes of
the cube.
• It may involve swapping the rows and columns, or moving one of the row
dimensions into the column dimensions.
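For a two-dimensional view held as a nested dictionary, pivoting amounts to swapping the row and column keys. This minimal sketch uses hypothetical sales-by-region-and-month data and a hypothetical `pivot` helper. Pivoting twice returns the original layout.

```python
def pivot(table):
    """Swap row and column dimensions: table[row][col] -> result[col][row]."""
    result = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            result.setdefault(col_key, {})[row_key] = value
    return result


# Hypothetical view: rows are regions, columns are months.
by_region = {"EU": {"Jan": 100, "Feb": 50}, "US": {"Jan": 100}}
by_month = pivot(by_region)  # rows are now months, columns are regions
```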
Types of OLAP
• The three main types of OLAP servers are as follows:
• ROLAP stands for Relational OLAP, an application based on relational
DBMSs.
• MOLAP stands for Multidimensional OLAP, an application based on
multidimensional DBMSs.
• HOLAP stands for Hybrid OLAP, an application using both relational and
multidimensional techniques.
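In a ROLAP setup the cube operations are answered by the relational engine, typically with SQL aggregation. The sketch below uses Python's built-in `sqlite3` module and an in-memory database. The `sales` table and its rows are hypothetical. A roll-up over month becomes a `GROUP BY region`, and a slice on month = 'Jan' becomes a `WHERE` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", "Jan", 100), ("EU", "Feb", 50), ("US", "Jan", 70), ("US", "Jan", 30)],
)

# Roll-up over the month dimension: total sales per region.
per_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Slice: fix month = 'Jan', then aggregate by region.
jan = conn.execute(
    "SELECT region, SUM(amount) FROM sales WHERE month = ? "
    "GROUP BY region ORDER BY region",
    ("Jan",),
).fetchall()
```

A MOLAP server would instead answer such queries from precomputed multidimensional arrays. A HOLAP server mixes the two approaches.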
