9 - Analytics Databases

Analytics databases typically store data copied from transactional databases, so that analytics queries can be optimized separately. The data is modeled with a star or snowflake schema and stored in a column-oriented layout to make better use of the CPU cache. Writes go first to an in-memory LSM tree and are later written in bulk to the column files. Materialized views precompute common queries, at the cost of slower writes and less flexibility.

Uploaded by

rohit kumar


Analytics Databases

Analytics Databases Background

Businesses often want to run large internal queries that perform full table scans
across their data. However, running such queries against the database that serves
client traffic can severely degrade its performance. Hence, they typically keep a
second database for analytics processing, into which data is copied some time
after it is written.

This is done using an ETL (Extract, Transform, Load) process, which is typically
scheduled as a batch job (we will discuss these later).
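
As a rough illustration of the three ETL steps, here is a minimal sketch using two in-memory SQLite databases. The table names (orders, fact_orders), the columns, and the cents-to-dollars transform are all made up for this example; a real pipeline would run on a schedule and usually copy only new rows.

```python
import sqlite3

def run_etl(oltp: sqlite3.Connection, olap: sqlite3.Connection) -> int:
    """Copy orders from the transactional DB into the analytics DB."""
    # Extract: read raw rows from the (hypothetical) transactional table.
    rows = oltp.execute(
        "SELECT id, amount_cents, created_at FROM orders"
    ).fetchall()
    # Transform: convert cents to dollars and keep only the date part.
    transformed = [(oid, cents / 100.0, ts[:10]) for oid, cents, ts in rows]
    # Load: bulk insert into the analytics table.
    olap.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, order_date TEXT)"
    )
    olap.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", transformed
    )
    olap.commit()
    return len(transformed)

# Set up a tiny transactional database to copy from.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, amount_cents INTEGER, created_at TEXT)"
)
oltp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1250, "2024-01-05T10:00:00"), (2, 400, "2024-01-05T11:30:00")],
)

olap = sqlite3.connect(":memory:")
loaded = run_etl(oltp, olap)

# The analytics query now runs against the copy, not the client-facing DB.
daily_total = olap.execute(
    "SELECT order_date, SUM(amount) FROM fact_orders GROUP BY order_date"
).fetchall()
```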
Stars and Snowflakes
In the transactional database, there may be many tables with many different kinds of relationships between them. However, in an analytics database, there is typically one central table known as the “fact” table. The fact table has many foreign keys which reference other tables known as dimension tables - this is called the “star” schema.

If the dimension tables in turn reference sub-dimension tables, this is known as a snowflake schema.
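
To make the star shape concrete, here is a small sketch in SQLite. The tables fact_sales, dim_product and dim_store and their columns are invented for illustration only. Moving "category" out of dim_product into its own table referenced by dim_product would turn this star into a snowflake.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables describe the "who/what/where" of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);

-- The central fact table holds one row per event, plus foreign keys.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    quantity   INTEGER
);

INSERT INTO dim_product VALUES (1, 'apple', 'fruit'), (2, 'kale', 'vegetable');
INSERT INTO dim_store   VALUES (1, 'Oslo'), (2, 'Lima');
INSERT INTO fact_sales  VALUES (1, 1, 1, 3), (2, 2, 1, 1), (3, 1, 2, 5);
""")

# A typical analytics query: aggregate the facts, grouped by a dimension.
units_by_category = db.execute("""
    SELECT p.category, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```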
Column Oriented Storage Background
Most transaction-based databases use row-oriented storage: they store the entire
contents of a row together on disk to improve locality.

However, analytical queries rarely need the entire row; they usually aggregate
the values of one column across a table. Hence, it makes more sense to store all
the values of each column together, which is known as column-oriented
storage.

Note that every column must store its values in the same row order, so that the i-th entry of each column belongs to the same row.
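
The difference between the two layouts can be sketched with plain Python lists. The column names and values below are made up; the point is that an aggregate reads a single list, and that the shared ordering lets us rebuild any row.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"user_id": 1, "country": "NO", "purchase": 30},
    {"user_id": 2, "country": "PE", "purchase": 12},
    {"user_id": 3, "country": "NO", "purchase": 8},
]

# Column-oriented: one list per column, all in the same row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate touches only the column it needs.
total_purchase = sum(columns["purchase"])

def reconstruct_row(cols, i):
    """Rebuild row i by taking the i-th entry of every column."""
    return {name: values[i] for name, values in cols.items()}
```

Here reconstruct_row(columns, 1) gives back the second record, which only works because every column list uses the same order.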

Compression
If a column has a lot of duplicate values, we can compress it to save space.
Imagine the following column representing coding proficiency out of 10:

8, 8, 8, 4, 5, 2, 2, 2, 2, 1

Bitmap encodings:

For val 8: 1 1 1 0 0 0 0 0 0 0

For val 4: 0 0 0 1 0 0 0 0 0 0

For val 5: 0 0 0 0 1 0 0 0 0 0

For val 2: 0 0 0 0 0 1 1 1 1 0

For val 1: 0 0 0 0 0 0 0 0 0 1
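
A minimal sketch of how such bitmaps could be built: one list of bits per distinct value, with a 1 wherever the column holds that value. The function name bitmap_encode is just an illustrative choice.

```python
def bitmap_encode(column):
    """Return {value: bitmap}, where bitmap[i] is 1 if column[i] == value."""
    bitmaps = {}
    for i, value in enumerate(column):
        # Create an all-zero bitmap the first time a value is seen.
        bitmaps.setdefault(value, [0] * len(column))[i] = 1
    return bitmaps

column = [8, 8, 8, 4, 5, 2, 2, 2, 2, 1]
bitmaps = bitmap_encode(column)
# bitmaps[8] is [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```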
Run-length encodings (for each bitmap: the number of leading zeros, then the number of ones, then zeros again, alternating; the counts add up to the 10 rows):

For val 8: 0, 3, 7

For val 4: 3, 1, 6

For val 5: 4, 1, 5

For val 2: 5, 4, 1

For val 1: 9, 1
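
One way to compute run lengths in this style is sketched below, assuming the convention that the first number counts leading zeros (so it is 0 when the bitmap starts with a 1). The helper names are illustrative.

```python
def run_length_encode(bitmap):
    """Encode a bitmap as alternating run lengths, starting with the zeros."""
    runs = []
    current, count = 0, 0
    for bit in bitmap:
        if bit == current:
            count += 1
        else:
            runs.append(count)          # close the run that just ended
            current, count = bit, 1
    runs.append(count)                  # close the final run
    return runs

def run_length_decode(runs):
    """Inverse of run_length_encode: expand the runs back into bits."""
    bitmap, bit = [], 0
    for length in runs:
        bitmap.extend([bit] * length)
        bit ^= 1                        # runs alternate between 0 and 1
    return bitmap

val_8 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
val_1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
encoded_8 = run_length_encode(val_8)    # [0, 3, 7]
encoded_1 = run_length_encode(val_1)    # [9, 1]
```

A quick sanity check on any encoding is that its run lengths sum to the number of rows in the column.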

Compression Continued
Bitwise operations on the bitmap encodings find the rows where multiple conditions
hold: AND the bitmaps for column A = 10 AND column B = 20, or OR the bitmaps
for column A = 10 OR column A = 15.
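
The idea can be sketched by packing each bitmap into a Python integer, so that a single & or | combines every row at once. The columns col_a and col_b and the helper names are assumptions made for this example.

```python
def to_bitmap(column, value):
    """Pack matches into an int: bit i is set when column[i] == value."""
    bits = 0
    for i, v in enumerate(column):
        if v == value:
            bits |= 1 << i
    return bits

def matching_rows(bits):
    """List the row indexes whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

col_a = [10, 15, 10, 20, 10]
col_b = [20, 20, 30, 20, 20]

# WHERE a = 10 AND b = 20
a10_and_b20 = to_bitmap(col_a, 10) & to_bitmap(col_b, 20)
# WHERE a = 10 OR a = 15
a10_or_a15 = to_bitmap(col_a, 10) | to_bitmap(col_a, 15)

rows_and = matching_rows(a10_and_b20)   # [0, 4]
rows_or = matching_rows(a10_or_a15)     # [0, 1, 2, 4]
```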

Compressed columns also allow more data to fit in the CPU cache.

If you want the data sorted in a different way, you can keep a replica of the
analytics database with a different sort order:
● Acts as an index for efficient querying if you have a common query pattern
● Allows more column compression, since sorting groups equal values into longer runs
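
A small sketch of why sort order helps compression: counting runs of equal adjacent values in a made-up column before and after sorting. In a real table the other columns would have to be reordered the same way, and the benefit is largest for the columns the data is sorted by.

```python
from itertools import groupby

def count_runs(column):
    """Number of runs of equal adjacent values; fewer runs compress better."""
    return sum(1 for _ in groupby(column))

unsorted_col = ["NO", "PE", "NO", "US", "PE", "NO", "US", "PE"]
sorted_col = sorted(unsorted_col)

runs_before = count_runs(unsorted_col)  # every value differs from its neighbor
runs_after = count_runs(sorted_col)     # one run per distinct value
```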
Writing to Column Oriented Storage
Writing rows in place is inefficient, because a single insert would have to modify every column file

Instead, all writes go to a sorted tree in memory (as in an LSM tree), which is
written in bulk to all of the column files once it gets too big
● Reads must check both the tree and the column files and merge them
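
The write path above can be sketched as a toy class. This is only an illustration under simplifying assumptions: a sorted Python list stands in for the in-memory tree, plain lists stand in for the column files, and the flush re-sorts everything instead of doing a proper merge. The class and method names are invented.

```python
import bisect

class ColumnStore:
    """Toy column store with an in-memory sorted write buffer."""

    def __init__(self, column_names, key, flush_threshold=3):
        self.names = column_names
        self.key = key                                  # column the data is sorted by
        self.columns = {n: [] for n in column_names}    # stand-in for column files
        self.buffer = []                                # sorted (key, row) pairs
        self.flush_threshold = flush_threshold

    def write(self, row):
        """Insert into the sorted buffer; flush once it gets too big."""
        entry = (row[self.key], tuple(row[n] for n in self.names))
        bisect.insort(self.buffer, entry)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def _all_rows(self):
        """Stored rows plus buffered rows, sorted by the key column."""
        k = self.names.index(self.key)
        stored = list(zip(*(self.columns[n] for n in self.names)))
        return sorted(stored + [r for _, r in self.buffer], key=lambda r: r[k])

    def flush(self):
        """Write the buffer to every column in one bulk pass."""
        merged = self._all_rows()
        for i, n in enumerate(self.names):
            self.columns[n] = [r[i] for r in merged]
        self.buffer = []

    def scan(self, name):
        """Read one column, merging flushed data with unflushed writes."""
        i = self.names.index(name)
        return [r[i] for r in self._all_rows()]

store = ColumnStore(["day", "sales"], key="day", flush_threshold=3)
for row in [{"day": 3, "sales": 30}, {"day": 1, "sales": 10}, {"day": 2, "sales": 20}]:
    store.write(row)                       # the third write triggers a flush
store.write({"day": 0, "sales": 5})        # stays in the buffer for now
```

After these writes the "sales" column file holds three values, while store.scan("sales") also includes the buffered fourth row, which is the merge step reads have to perform.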
Materialized Views
Idea: The database precomputes the results of common queries so that they do not
have to be rerun every time

Pros:
● Do not have to rerun certain expensive common aggregations

Cons:
● Writes take longer since materialized views must be updated
● Less flexibility than querying raw data
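
The trade-off can be sketched with a toy table that maintains one precomputed aggregate. The class SalesTable and its fields are hypothetical; the view only answers "total per product", which is the loss of flexibility, and every insert pays to keep it in sync.

```python
from collections import defaultdict

class SalesTable:
    """Raw rows plus one materialized view: total amount per product."""

    def __init__(self):
        self.rows = []                              # raw data
        self.total_by_product = defaultdict(int)    # the materialized view

    def insert(self, product_id, day, amount):
        self.rows.append((product_id, day, amount))
        # Extra work on every write: keep the view up to date.
        self.total_by_product[product_id] += amount

    def total_for(self, product_id):
        """Fast path: answered directly from the view."""
        return self.total_by_product.get(product_id, 0)

    def total_for_slow(self, product_id):
        """What the query costs without the view: a full scan."""
        return sum(a for p, _, a in self.rows if p == product_id)

table = SalesTable()
table.insert("p1", "2024-01-01", 5)
table.insert("p2", "2024-01-01", 7)
table.insert("p1", "2024-01-02", 3)
```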
Data Cube
Idea: Special type of materialized view that precomputes a multidimensional table
of aggregates (e.g. total sales of every product_id on every day in the database)
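
A two-dimensional cube over product and day can be sketched with dictionaries. The sample sales rows are made up; the cube holds one total per (product_id, day) cell, and the per-product and per-day totals are the sums along each dimension.

```python
from collections import defaultdict

# (product_id, day, amount) facts; values are illustrative only.
sales = [
    ("p1", "2024-01-01", 5),
    ("p2", "2024-01-01", 7),
    ("p1", "2024-01-02", 3),
    ("p1", "2024-01-01", 2),
]

cube = defaultdict(int)         # (product_id, day) -> total amount
by_product = defaultdict(int)   # total along the day dimension
by_day = defaultdict(int)       # total along the product dimension

for product_id, day, amount in sales:
    cube[(product_id, day)] += amount
    by_product[product_id] += amount
    by_day[day] += amount
```

Looking up cube[("p1", "2024-01-01")] or by_day["2024-01-01"] is then a single dictionary access instead of a scan over the raw facts.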
Analytics Databases Summary
It is very useful to decouple analytics databases from transactional ones, as analytics
queries can take a very long time.

Additionally, using an analytics database allows us to store our data in a
column-oriented manner, which, once compressed, makes better use of the CPU’s cache
and can be iterated over in tight loops (no function calls). When writing to
column-oriented storage, we can use an LSM tree as a buffer.

Finally, materialized views are a potentially very useful caching mechanism that
analytics databases use to avoid recomputing popular aggregations, at
the cost of slower writes.
