Unit 6 NOSQL Databases and Data Warehousing

Uploaded by

dwightschrute826

Contents
• NOSQL Databases :
• Introduction to NOSQL Databases,
• Types of NOSQL Databases
• BASE properties
• CAP theorem
• Data Warehousing: Architecture and Components of Data
Warehouse, OLAP
NoSQL databases "not only SQL"
• NoSQL databases (aka "not only SQL") are
non-tabular databases and store data
differently than relational tables.
• NoSQL databases come in a variety of
types based on their data model. The
main types are document, key-value,
wide-column, and graph.
• A traditional RDBMS uses SQL to store and
retrieve data. A NoSQL database system, by
contrast, encompasses a wide range of
database technologies that can store
structured, semi-structured, and
unstructured data.
Types of NoSQL databases
• NoSQL databases are mainly categorized into four types: key-value pair,
column-oriented (wide-column), graph-based, and document-oriented.
Every category has its own attributes and limitations.
Types of NoSQL databases
• Key Value Pair Based
• Data is stored in key/value pairs. This design is intended to handle large
volumes of data and heavy load.
• Key-value databases store data as a hash table where each key is unique
and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
• This kind of NoSQL database is used like a collection, dictionary, or
associative array. Key-value stores let the developer store schema-less
data. They work well for shopping cart contents.
• Redis, Dynamo, and Riak are examples of key-value stores.
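As a minimal sketch of the key-value model, the following uses a plain Python dictionary as a stand-in for a store such as Redis. The class name `KeyValueStore`, the `cart:<user>` key format, and the cart contents are hypothetical and chosen only for illustration.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store: unique keys, opaque values."""

    def __init__(self):
        self._data = {}  # hash table: key -> serialized value

    def put(self, key, value):
        # The store does not interpret the value; it keeps it as an opaque string.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("cart:alice", {"items": [{"sku": "A100", "qty": 2}]})

# Updating means reading the whole value, changing it, and writing it back.
cart = store.get("cart:alice")
cart["items"].append({"sku": "B200", "qty": 1})
store.put("cart:alice", cart)

print(store.get("cart:alice"))
```

Because the store only looks things up by key, it cannot answer a question such as "which carts contain sku A100" without scanning every value.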
Types of NoSQL databases
• Column-based
• Column-oriented databases
work on columns and are based
on the BigTable paper by Google.
Every column is treated
separately.
• More specifically, column
databases use the concept
of a keyspace, which is roughly
like a schema in the relational
model. The keyspace contains
all the column families, which
contain rows, which in turn
contain columns.
• HBase, Cassandra, and
Hypertable are examples of
column-based databases.
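The keyspace, column family, row, column nesting described above can be sketched with nested dictionaries. This is only an illustration of the layout, not how any particular product stores data; the `put` and `get_row` helpers and the sample rows are hypothetical.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace = {}


def put(column_family, row_key, column, value):
    """Write one column of one row, creating the family and row on demand."""
    keyspace.setdefault(column_family, {}).setdefault(row_key, {})[column] = value


def get_row(column_family, row_key):
    """Return all columns stored for a row (empty dict if the row is absent)."""
    return keyspace.get(column_family, {}).get(row_key, {})


put("users", "u1", "name", "Asha")
put("users", "u1", "email", "asha@example.com")
put("users", "u2", "name", "Ravi")  # rows need not share the same columns

print(get_row("users", "u1"))
print(get_row("users", "u2"))
```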
Types of NoSQL databases

• Document-Oriented:
• A document-oriented NoSQL database stores and retrieves data as key-value
pairs, but the value part is stored as a document. The document is stored
in JSON or XML format. The value is understood by the database and can be
queried.
• The document type is mostly used for content management systems (CMS),
blogging platforms, real-time analytics, and e-commerce applications.
• Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are
popular document-oriented DBMS systems.
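The difference from a plain key-value store is that the database can look inside the value. A minimal sketch, assuming JSON documents held in a dictionary; the `find` helper and the sample posts are hypothetical and do not mirror any real product's query API.

```python
import json

# key -> document, stored as JSON text
collection = {
    "post:1": json.dumps({"title": "Intro to NoSQL", "author": "asha", "tags": ["nosql"]}),
    "post:2": json.dumps({"title": "OLAP basics", "author": "ravi", "tags": ["olap", "dw"]}),
    "post:3": json.dumps({"title": "CAP in practice", "author": "asha"}),
}


def find(field, value):
    """Return keys of documents whose top-level `field` equals `value`."""
    return [
        key
        for key, raw in collection.items()
        if json.loads(raw).get(field) == value
    ]


print(find("author", "asha"))
```

Note that the third document has no `tags` field at all: documents in the same collection do not have to share a schema.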
Types of NoSQL databases
• Graph-Based
• A graph database stores entities as well as
the relations among those entities. Each
entity is stored as a node, and each
relationship as an edge.
• An edge gives a relationship between nodes.
Every node and edge has a unique identifier.
• Graph databases are mostly used for social
networks, logistics, and spatial data.
• Neo4J, Infinite Graph, OrientDB, and FlockDB are
some popular graph-based databases.
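A small sketch of the node-and-edge model, using dictionaries keyed by unique identifiers. The node properties, relationship names, and the two helper functions are hypothetical; real graph databases offer dedicated query languages for this kind of traversal.

```python
# node id -> properties
nodes = {
    "n1": {"label": "Person", "name": "Asha"},
    "n2": {"label": "Person", "name": "Ravi"},
    "n3": {"label": "City", "name": "Pune"},
}

# edge id -> (source node, relationship type, target node)
edges = {
    "e1": ("n1", "FRIEND_OF", "n2"),
    "e2": ("n1", "LIVES_IN", "n3"),
    "e3": ("n2", "LIVES_IN", "n3"),
}


def neighbors(node_id, relationship):
    """Follow outgoing edges of one relationship type from a node."""
    return [
        nodes[target]["name"]
        for source, rel, target in edges.values()
        if source == node_id and rel == relationship
    ]


def residents(city_id):
    """Follow LIVES_IN edges backwards to find who lives in a city."""
    return [
        nodes[source]["name"]
        for source, rel, target in edges.values()
        if rel == "LIVES_IN" and target == city_id
    ]


print(neighbors("n1", "FRIEND_OF"))
print(residents("n3"))
```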
What is the CAP Theorem?
• The CAP theorem states that it is impossible for a distributed data store to provide more than
two of the following three guarantees at the same time:
1. Consistency
2. Availability
3. Partition Tolerance
• Consistency:
• The data should remain consistent after an operation executes: once data is written, every
later read should return that data. For example, after an order status is updated, all clients
should see the same status.
• Availability:
• The database should always be available and responsive. Every request should receive a
response, with no downtime.
• Partition Tolerance:
• The system should continue to function even when communication among the servers is
unreliable. For example, a network fault can split the servers into groups that cannot
communicate with each other. If part of the database becomes unreachable, the remaining
parts should continue to operate.
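The trade-off can be illustrated with a deliberately simplified two-node model. Everything here (the `TwoNodeCluster` class, the "CP" and "AP" mode names, the `Unavailable` exception) is a hypothetical teaching sketch; real systems use quorums, leader election, and similar mechanisms rather than a single flag.

```python
class Unavailable(Exception):
    """Raised when a CP-style cluster refuses a request during a partition."""


class TwoNodeCluster:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.partitioned = False
        self.replicas = [None, None]  # value held by each node

    def write(self, node, value):
        if not self.partitioned:
            self.replicas = [value, value]  # replicate to both nodes
        elif self.mode == "CP":
            # Keep consistency: refuse the write, giving up availability.
            raise Unavailable("cannot replicate during partition")
        else:
            # Keep availability: accept locally, giving up consistency.
            self.replicas[node] = value

    def read(self, node):
        return self.replicas[node]


ap = TwoNodeCluster("AP")
ap.write(0, "shipped")
ap.partitioned = True
ap.write(0, "delivered")
print(ap.read(0), ap.read(1))  # the two nodes now disagree

cp = TwoNodeCluster("CP")
cp.partitioned = True
try:
    cp.write(0, "shipped")
except Unavailable as exc:
    print("rejected:", exc)
```

While the network is healthy both modes behave identically; the choice between consistency and availability only has to be made once a partition occurs.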
BASE: Basically Available, Soft state, Eventual consistency
• Basically available means the database
responds to requests at all times, i.e.
it favors availability in the sense of
the CAP theorem
• Soft state means the system state may
change even without input
• Eventual consistency means that
the system will become consistent
over time, once it stops receiving
new input
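Eventual consistency can be sketched with replicas that converge through a background synchronization step. This is a toy last-write-wins scheme with integer versions, assumed purely for illustration; the `Replica` class and `anti_entropy` function are hypothetical names.

```python
class Replica:
    def __init__(self):
        self.version = 0   # logical timestamp of the value held
        self.value = None

    def write(self, version, value):
        # Last-write-wins: keep the value with the highest version.
        if version > self.version:
            self.version, self.value = version, value


def anti_entropy(replicas):
    """Bring every replica up to the newest version any replica holds."""
    newest = max(replicas, key=lambda r: r.version)
    for replica in replicas:
        replica.write(newest.version, newest.value)


replicas = [Replica(), Replica(), Replica()]
replicas[0].write(1, "pending")
replicas[1].write(2, "shipped")   # the newer write reaches only one replica

before = [r.value for r in replicas]
anti_entropy(replicas)            # background sync, no client input involved
after = [r.value for r in replicas]

print(before)
print(after)
```

Before the sync, a read may return different answers depending on which replica serves it. The sync changes replica state without any client request (soft state), and afterwards all replicas agree (eventual consistency).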
Data Warehousing
• Data warehousing (DW) is the process of
collecting and managing data from
varied sources to provide meaningful
business insights.
• A data warehouse is a collection of
software tools that facilitates analysis
of a large set of business data to
help an organization make decisions.
• A data warehouse is mainly a data management system that is designed to
enable and support business intelligence (BI) activities, particularly analytics.
Data warehouses are intended for querying, cleaning, manipulating,
transforming, and analyzing data.
• A large amount of the data in a data warehouse comes from numerous
sources, such as internal applications (marketing, sales, and finance) and
customer-facing apps.
Why You Need a Data Warehouse?
• In the early days, you may run SQL queries for
analytics on your regular database. But as the
data grows and more people use it for analysis,
the regular database becomes extremely slow at
query processing.
• This is where companies recognized the need
for a data warehouse, which is designed to handle
huge volumes of data. It allows you to quickly
filter, sort, aggregate, and analyze the data.
• It allows organizations to make quality business
decisions. Beyond improving data analytics, a data
warehouse can also help an organization increase
revenue and compete more strategically in the
market.
Characteristics of data warehousing
• Subject-Oriented
• A data warehouse focuses on the modeling and analysis of
data for decision-makers. It therefore typically provides
a concise and straightforward view of a particular
subject, such as customer, product, or sales, rather than
of the organization's ongoing operations as a whole.
• Integrated
• In a data warehouse, integration means establishing
a common unit of measure for all similar data drawn from
dissimilar databases.
• Time-Variant
• The data contains an element of time, explicitly or implicitly.
• Non-volatile
• The data warehouse is non-volatile, meaning that
prior data is not erased when new data is entered
into it.
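To make the "integrated" characteristic concrete, here is a small sketch that brings weights recorded in different units by two source systems onto one common unit. The source records, field names, and the `to_kg` helper are hypothetical.

```python
KG_PER_LB = 0.45359237  # the international pound, defined in kilograms

# Hypothetical records from two source systems that use different units.
source_a = [{"sku": "A100", "weight": 2.0, "unit": "kg"}]
source_b = [{"sku": "B200", "weight": 10.0, "unit": "lb"}]


def to_kg(record):
    """Return a copy of the record with its weight expressed in kilograms."""
    factor = {"kg": 1.0, "lb": KG_PER_LB}[record["unit"]]
    return {"sku": record["sku"], "weight_kg": round(record["weight"] * factor, 3)}


integrated = [to_kg(record) for record in source_a + source_b]
print(integrated)
```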
Architecture & Components of Data Warehouse
• The architecture of a data warehouse is the arrangement of its software and hardware
components so that the warehouse works efficiently. The components vary with the
requirements and circumstances of the organization.
Architecture & Components of Data Warehouse
• 1. Source Data Component:
• In the data warehouse, the source data comes from different places.
It is grouped into four categories:
• External Data: Executives and data analysts rely on external sources
for a large share of the information they use. This includes
statistics relating to their organization that are produced by
external sources and departments.
• Internal Data: In every organization, users keep their “private” spreadsheets,
reports, client profiles, and sometimes even departmental databases.
• Operational System data: Operational systems are principally meant to run the business.
Periodically, old data is taken out of each operational system and stored in archived files.
• Flat files: A flat file is a text database that stores data in a plain text format.
Flat files generally are text files that have all data processing and structure markup
removed. A flat file contains a table with a single record per line.
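A flat file of the kind described above can be read with Python's standard `csv` module. The sketch below uses `io.StringIO` as a stand-in for a file on disk; the column names and values are made up.

```python
import csv
import io

# Stand-in for a flat file on disk: one header line, then one record per line.
flat_file = io.StringIO(
    "order_id,region,amount\n"
    "1001,North,250.00\n"
    "1002,South,99.50\n"
)

records = list(csv.DictReader(flat_file))
for record in records:
    record["amount"] = float(record["amount"])  # plain text carries no types

print(records)
```

Every field arrives as a string, so any typing (numbers, dates) has to be applied by the program that reads the file.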
Architecture & Components of Data Warehouse
• 2. Data Staging: After the data is extracted from the various sources, the data files must
be prepared for storage in the data warehouse. The extracted data must be transformed
into a format that is suitable for saving in the data warehouse for querying and
analysis.

• Data Extraction: This stage handles the various data sources. Data analysts should employ suitable
techniques for every data source.
• Data Transformation: Many individual tasks are performed as part of data transformation. First,
the data extracted from each source is cleaned. Once data transformation ends, the result is a set
of integrated data that is clean, standardized, and summarized.
• Data Loading: Once the design and construction of the data warehouse are complete, the
initial load moves the data into the data warehouse storage.
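The three staging steps can be sketched end to end with the standard library, using an in-memory SQLite database as a stand-in for warehouse storage. The source rows, the cleaning rules, and the `sales` table are hypothetical; real pipelines involve far more validation.

```python
import sqlite3


def extract():
    """Pull raw rows from two hypothetical sources with different conventions."""
    crm = [{"customer": " Asha ", "country": "in", "amount": "250.00"}]
    web = [
        {"customer": "RAVI", "country": "IN", "amount": "99.5"},
        {"customer": "", "country": "IN", "amount": "10"},  # missing name
    ]
    return crm + web


def transform(rows):
    """Clean and standardize: trim names, upper-case countries, parse amounts."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue  # drop records that fail a basic quality check
        cleaned.append((name, row["country"].upper(), float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Create the target table if needed and insert the prepared rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT customer, country, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```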
Architecture & Components of Data Warehouse
• 3. Data Storage in Warehouse: Data storage for data
warehousing is split into multiple repositories.
• These data repositories contain structured data in a highly
normalized form for fast and efficient processing.

• Metadata: Metadata is data about data. It summarizes basic details about the
data, which makes it easier to find and work with particular instances of data.
• Raw Data: Raw data is data that was delivered from a particular data entity
to the data supplier and has not yet been processed by machine or human.
• Summary Data or Data summary: A data summary is a short conclusion drawn
from a large body of data. Typically, analysts write code against the detailed
data and present the final result in summarized form.
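The three kinds of stored data can be contrasted in a few lines. The sales rows and the shape of the `metadata` dictionary below are hypothetical and only meant to show how each kind relates to the others.

```python
from collections import defaultdict

# Raw data: one row per transaction, exactly as delivered.
raw_sales = [
    ("North", "2024-01", 250.0),
    ("North", "2024-01", 100.0),
    ("South", "2024-01", 99.5),
    ("North", "2024-02", 40.0),
]

# Metadata: a description of the raw data rather than the data itself.
metadata = {"columns": ["region", "month", "amount"], "row_count": len(raw_sales)}

# Summary data: totals per (region, month), precomputed for fast reporting.
summary = defaultdict(float)
for region, month, amount in raw_sales:
    summary[(region, month)] += amount

print(metadata)
print(dict(summary))
```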
Architecture & Components of Data Warehouse
• 4. Data Marts:
• A data mart stores the information of a specific function of an
organization that is handled by a single authority.
• There may be any number of data marts in a particular
organization, depending upon its functions. In short, data
marts contain subsets of the data stored in data
warehouses.
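One way to picture a data mart as a subset of the warehouse is a database view restricted to a single function. This is only a sketch using SQLite; in practice a data mart is often a separate physical store. The `warehouse` table, the `sales_mart` view, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse (department TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?, ?)",
    [
        ("sales", "order-1", 250.0),
        ("finance", "invoice-7", 80.0),
        ("sales", "order-2", 99.5),
    ],
)

# The data mart exposes only the subset that the sales function needs.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT item, amount FROM warehouse WHERE department = 'sales'"
)

mart_rows = conn.execute("SELECT item, amount FROM sales_mart ORDER BY item").fetchall()
print(mart_rows)
```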

• 5. Users/Analysts
• Users and analysts can then use the data for applications such as reporting,
analysis, and mining. The data is made available to them whenever required.

• https://www.analyticsvidhya.com/blog/2021/07/a-brief-introduction-to-data-warehouse/
Detail architecture

• https://www.jamesserra.com/archive/2013/07/why-you-need-a-data-warehouse/
Online Analytical Processing (OLAP)
• OLAP stands for On-Line Analytical Processing. OLAP is a category of
software technology that enables analysts, managers, and
executives to gain insight into information through fast, consistent,
interactive access to a wide variety of possible views of data that has
been transformed from raw information to reflect the real dimensionality
of the enterprise as understood by its users.
• OLAP implements the multidimensional analysis of business information
and supports complex calculations, trend analysis, and
sophisticated data modeling.
• It is rapidly becoming the essential foundation for intelligent solutions
including Business Performance Management, Planning, Budgeting,
Forecasting, Financial Reporting, Analysis, Simulation Models,
Knowledge Discovery, and Data Warehouse Reporting.
Online Analytical Processing (OLAP)
• How OLAP systems work
• To facilitate this kind of analysis, data is collected
from multiple data sources and stored in data
warehouses, then cleansed and organized into data
cubes.
• Each OLAP cube contains data categorized by
dimensions (such as customers, geographic sales
region, and time period) derived from dimension
tables in the data warehouse.
• Dimensions are then populated by members (such
as customer names, countries, and months) that are
organized hierarchically.
• OLAP cubes are often pre-summarized across
dimensions to drastically improve query time over
relational databases.
• A data cube is a multi-dimensional array of values
in which data is brought together, organized, and
modeled for analysis.
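The idea of a pre-summarized cube can be sketched with a dictionary keyed by one member per dimension. The fact rows are made up, and the `ALL` marker is a convention assumed for this example to mean "summed over the whole dimension".

```python
from collections import defaultdict
from itertools import product

# Fact rows: (customer, region, month, sales). All values are made up.
facts = [
    ("Asha", "North", "Jan", 100),
    ("Asha", "North", "Feb", 150),
    ("Ravi", "South", "Jan", 200),
    ("Ravi", "North", "Jan", 50),
]

ALL = "*"  # marker meaning "summed over every member of this dimension"

# Pre-summarize: add each fact to every cell that covers it, so that any
# combination of specific members and ALL can be answered by one lookup.
cube = defaultdict(int)
for customer, region, month, sales in facts:
    for key in product((customer, ALL), (region, ALL), (month, ALL)):
        cube[key] += sales

asha_total = cube[("Asha", ALL, ALL)]       # total for one customer
north_jan = cube[(ALL, "North", "Jan")]     # total for one region and month
grand_total = cube[(ALL, ALL, ALL)]         # total over everything

print(asha_total, north_jan, grand_total)
```

The aggregation cost is paid once when the cube is built; afterwards each query is a single dictionary lookup instead of a scan over the fact rows.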
Online Analytical Processing (OLAP)
• Analysts can then perform five types of OLAP analytical operations against
multidimensional databases:
• Roll-up. Also known as consolidation or drill-up, this operation
summarizes the data along a dimension.
• Drill-down. This allows analysts to navigate deeper among the dimensions
of data, for example drilling down from "time period" to "years" and
"months" to chart sales growth for a product.
• Slice. This enables an analyst to take one level of information for display,
such as "sales in 2017."
• Dice. This allows an analyst to select data from multiple dimensions to
analyze, such as "sales of blue beach balls in Iowa in 2017."
• Pivot. Analysts can gain a new view of data by rotating the data axes of
the cube.
Online Analytical Processing (OLAP)
• Roll-up. Also known as aggregation or drill-up, this
operation summarizes the data along a dimension.

Temperature   64  65  68  69  70  71  72  75  80  81  83  85
Week1          1   0   1   0   1   0   0   0   0   0   1   0
Week2          0   0   0   1   0   0   1   2   0   1   0   0

• Suppose we want to group the temperature values in the cube above into
levels: cool (64-69), mild (70-75), and hot (80-85).
• To do this, we group the columns and add up the values according to the
concept hierarchy.

Temperature   cool  mild  hot
Week1            2     1    1
Week2            1     3    1

• This operation is known as a roll-up.
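The roll-up on temperature can be reproduced in a few lines, using the weekly counts per temperature value from the cube above and the three temperature levels as the concept hierarchy. The function names `level` and `roll_up` are hypothetical.

```python
temperatures = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
cube = {
    "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
}


def level(temperature):
    """Concept hierarchy: map a temperature value to its level."""
    if 64 <= temperature <= 69:
        return "cool"
    if 70 <= temperature <= 75:
        return "mild"
    if 80 <= temperature <= 85:
        return "hot"
    raise ValueError(f"no level defined for {temperature}")


def roll_up(cube):
    """Sum the per-temperature counts into per-level counts for each week."""
    rolled = {}
    for week, counts in cube.items():
        totals = {"cool": 0, "mild": 0, "hot": 0}
        for temperature, count in zip(temperatures, counts):
            totals[level(temperature)] += count
        rolled[week] = totals
    return rolled


rolled = roll_up(cube)
print(rolled)
```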
Online Analytical Processing (OLAP)
• Drill-down. This allows analysts to navigate deeper among the dimensions
of data, for example drilling down from "time period" to "years" and
"months" to chart sales growth for a product.
• For the temperature cube, drilling down on the time dimension from weeks
to days gives the following cube:

Temperature   cool  mild  hot
Day 1            0     0    0
Day 2            0     0    0
Day 3            0     0    1
Day 4            0     1    0
Day 5            1     0    0
Day 6            0     0    0
Day 7            1     0    0
Day 8            0     0    0
Day 9            1     0    0
Day 10           0     1    0
Day 11           0     1    0
Day 12           0     1    0
Day 13           0     0    1
Day 14           0     0    0
Online Analytical Processing (OLAP)
• Slice. This enables an analyst to take one level of information for
display, such as "sales in 2017."
• For example, if we make the selection temperature = cool on the day-level
cube, we obtain the following cube:

Temperature   cool
Day 1            0
Day 2            0
Day 3            0
Day 4            0
Day 5            1
Day 6            0
Day 7            1
Day 8            0
Day 9            1
Day 10           0
Day 11           0
Day 12           0
Day 13           0
Day 14           0

• Dice. This allows an analyst to select data from multiple dimensions to
analyze.
• The dice operation defines a subcube by performing a selection on two or
more dimensions.
• For example, applying the selection (time = day 3 OR time = day 4) AND
(temperature = cool OR temperature = hot) to the day-level cube gives the
following subcube (still two-dimensional):

Temperature   cool  hot
Day 3            0    1
Day 4            0    0
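Slice and dice can both be expressed as selections over the day-level cube from the drill-down example. In this sketch the cube is a dictionary from day number to counts per temperature level; the helper names `slice_cube` and `dice_cube` are hypothetical.

```python
levels = ["cool", "mild", "hot"]

# Day-level cube from the drill-down example: day -> counts per level.
cube = {
    1: [0, 0, 0], 2: [0, 0, 0], 3: [0, 0, 1], 4: [0, 1, 0], 5: [1, 0, 0],
    6: [0, 0, 0], 7: [1, 0, 0], 8: [0, 0, 0], 9: [1, 0, 0], 10: [0, 1, 0],
    11: [0, 1, 0], 12: [0, 1, 0], 13: [0, 0, 1], 14: [0, 0, 0],
}


def slice_cube(cube, level):
    """Fix one member of the temperature dimension; only time remains."""
    index = levels.index(level)
    return {day: counts[index] for day, counts in cube.items()}


def dice_cube(cube, days, wanted_levels):
    """Select members on both dimensions; the result is a smaller cube."""
    return {
        day: {lvl: cube[day][levels.index(lvl)] for lvl in wanted_levels}
        for day in days
    }


cool = slice_cube(cube, "cool")
sub = dice_cube(cube, days=[3, 4], wanted_levels=["cool", "hot"])

print(cool)
print(sub)
```

A slice fixes a single member on one dimension and so reduces the dimensionality, while a dice keeps every dimension but restricts each to a chosen set of members.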
Online Analytical Processing (OLAP)
• Pivot. Analysts can gain a new view of data by rotating the data axes of
the cube.
• It may involve swapping the rows and columns, or moving one of the row
dimensions into the column dimensions.
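For a two-dimensional table, a pivot amounts to swapping the row and column dimensions. The sketch below uses weekly counts per temperature level obtained by summing the per-temperature counts of the example cube; the `pivot` function name is hypothetical.

```python
# Weeks as rows, temperature levels as columns (summed from the example cube).
by_week = {
    "Week1": {"cool": 2, "mild": 1, "hot": 1},
    "Week2": {"cool": 1, "mild": 3, "hot": 1},
}


def pivot(table):
    """Swap the row and column dimensions of a two-dimensional table."""
    pivoted = {}
    for row, columns in table.items():
        for column, value in columns.items():
            pivoted.setdefault(column, {})[row] = value
    return pivoted


by_level = pivot(by_week)
print(by_level)
```

The values themselves do not change; only the orientation does, so pivoting twice returns the original table.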
Types of OLAP
• There are three main types of OLAP servers:
• ROLAP stands for Relational OLAP, an application based on relational
DBMSs.
• MOLAP stands for Multidimensional OLAP, an application based on
multidimensional DBMSs.
• HOLAP stands for Hybrid OLAP, an application using both relational and
multidimensional techniques.
