06 BigDataAndBigDataDesign
1
Knowledge objectives
1. Define the impedance mismatch
2. Identify applications handling different kinds of data
3. Name four different kinds of NOSQL systems
4. Explain three consequences of schema variability
5. Explain the consequences of physical independence
6. Explain the two dimensions to classify NOSQL systems according to
how they manage schema
7. Explain the three elements of the RUM conjecture
2
Understanding objectives
1. Decide whether two NOSQL designs have a more or less explicit/fixed schema
3
Application objectives
1. Given a relatively small UML conceptual diagram, translate it into a logical representation of the data, considering a flexible schema representation
4
From SQL to NOSQL
The need for alternative families of database technologies
5
Law of the instrument
“Over-reliance on a familiar tool.”
Wikipedia
If the only tool you have is a hammer, everything looks like a nail.
A. Maslow
6
Law of the Relational Database
• Since we only know relational databases, every time we want to model a new domain we will automatically think about how to represent it as columns and rows (Ireland et al.)
7
The end of an architectural era
WEB 2.0 – Write Era
Real-time communication
Big Data
8
RDBMS: why aren’t they enough?
Michael Stonebraker and Ugur Çetintemel. One size fits all: an idea whose time has come and gone. ACM Books, pp. 441-462, 2019.

RDBMS
• Generic architecture that can be tuned according to the needs:
  • Mainly-write OLTP Systems
    • Normalization
    • Indexes: B+, Hash
    • Joins: BNL, RNL, Hash-Join, Merge Join
  • Read-only DW (OLAP) Systems
    • Denormalized data
    • Indexes: Bitmaps
    • Joins: Star-join
    • Materialized views

Designed for consistency and integrity, making them excellent for applications with complex queries and transaction processing.
We need to deal with massive reads and writes at the same time!
Data fragmentation + Data replication → Distributed RDBMS
9
Distributed RDBMS: limitations
• ACID Transactions
• Relational databases follow ACID (Atomicity, Consistency, Isolation, Durability) properties
to ensure data integrity. Maintaining them across distributed nodes introduces
complexity and performance bottlenecks
• Locking and Contention
• To enforce data integrity, relational databases use locking mechanisms to manage
simultaneous transactions, which is very costly in a distributed environment.
• Schema Rigidity
• Relational databases rely on predefined schemas, which makes it difficult to adapt to
changes in the data structure without significant overhead
• Joins across Nodes
• Queries in relational models often involve complex joins between tables. Distributing
tables across multiple nodes makes these joins inefficient and costly
11
New challenges for data management
Volume, Veracity, Velocity, Variability, Variety
12
NOSQL Goals
• Schemaless: Allow flexible (even runtime) schema definition
• Reliability / availability: Keep delivering service even if its software or
hardware components fail
• Scalability: Continuously evolve to support a growing amount of work
• Efficiency: How well the system performs, usually measured in terms of
response time (latency) and throughput (bandwidth)
13
Aggregate data models
• The relational model, with simple record structures, referential integrity, transactions, … is not suitable for distribution → it was not designed to run on clusters
• We need to operate on data in units that have a more complex structure
(complex records)
• Think of a complex record as a structure that allows lists and other
record structures to be nested inside it
• These complex records are sometimes referred to as aggregates
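As a sketch of what such a complex record looks like, the nested structure below (plain Python, illustrative field names, values taken from the Orders example used later in these slides) bundles a customer, its payment data and its line items into a single aggregate:

```python
# A single order stored as one aggregate: a complex record that nests
# lists and other records inside it (illustrative field names).
order_aggregate = {
    "id": 1001,
    "customer": {"name": "Ann", "phone": "234"},
    "payment": {"card": "Amex", "cc_number": "12345", "expiry": "04/28"},
    "line_items": [                       # nested list of nested records
        {"product": "03214", "price": 50, "qty": 3, "total": 150},
        {"product": "03222", "price": 40, "qty": 1, "total": 40},
    ],
}

# The aggregate is read and written as a unit; no joins are needed to
# reassemble the order.
total_of_listed_items = sum(i["total"] for i in order_aggregate["line_items"])
```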
14
Aggregate data models
• Aggregate orientation fits well with scaling out (i.e., use lots of small
machines in a cluster)
• The aggregate is a natural unit for distribution
• The aggregate makes a natural unit for replication and sharding
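A minimal sketch of why the aggregate is a natural unit for distribution: hashing the aggregate key decides which node of the cluster stores (and replicates) the whole order. The node names and helper function are hypothetical, not any specific product’s API.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]   # hypothetical cluster nodes

def node_for(aggregate_key: str) -> str:
    """Pick the shard for an aggregate by hashing its key."""
    digest = hashlib.md5(aggregate_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every read and write of order 1001 is routed to the same node,
# and that node's content can be replicated as a whole.
print(node_for("1001"))
```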
15
Aggregate data models: Orders and Aggregates (1)
Example of Relations
Relational database perspective: no aggregates
23
One size does not fit all
Michael Stonebraker and Ugur Çetintemel. One size fits all: an idea whose time has come and gone. ACM Books, pp. 441-462, 2019.
24
Evolution of different data models
25
Schema definition
26
Schema variability
• CREATE TABLE Students(id int, name varchar(50), surname varchar(50), enrolment date);
• INSERT INTO Students VALUES (1, 'Sergi', 'Nadal', '01/01/2012', true, 'Igualada'); WRONG
• INSERT INTO Students VALUES (1, 'Sergi', 'Nadal', NULL); OK
• INSERT INTO Students VALUES (1, 'Sergi', 'Nadal', '01/01/2012'); OK
• Consequences
• Gain flexibility
• Lose semantics (also consistency)
• The data independence principle is lost (!)
• The ANSI / SPARC architecture is not followed → Implicit schema
• Applications can access and manipulate the database internal structures
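A minimal sketch of the implicit-schema consequence, reusing the Students example above (a plain Python list stands in for a schemaless collection; no particular NOSQL product is assumed):

```python
# A schemaless store accepts records with different attributes.
students = []   # stands in for a schemaless collection

# The tuple rejected by the relational table above is accepted here ...
students.append({"id": 1, "name": "Sergi", "surname": "Nadal",
                 "enrolment": "01/01/2012", "active": True,
                 "city": "Igualada"})
# ... and so is a record that simply omits the enrolment date.
students.append({"id": 1, "name": "Sergi", "surname": "Nadal"})

# The schema now lives in the application code: every reader must check
# which attributes are actually present (lost semantics / consistency).
for s in students:
    print(s.get("enrolment", "enrolment date missing"))
```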
27
ANSI/SPARC (recap)
Physical independence
28
ANSI/SPARC
Physical independence
29
Database models

RELATIONAL
• Based on the relational model
• Tables, rows and columns
• Sets, instances and attributes
• Constraints are allowed
  • PK, FK, Check, …
• When creating the tables you MUST specify their schema (i.e., columns and constraints)
• Data is restructured when brought into memory (impedance mismatch)

NOSQL
• No single reference model
• Key-value, document, stream, graph
• Ideally, the schema should be defined at insertion time and not at definition time (schemaless databases)
• The closer the data model in use looks to the way data is stored internally, the better (read/write throughput)
30
Considered database models
• Relational
city(name, population, region) VALUES ('BCN', '2,000,000', 'CAT')
• Key-Value
['BCN', '2,000,000;CAT']
• Document
{id: 'BCN', population: '2,000,000', region: 'CAT'}
• Wide-Column (Column-Family)
['BCN', population: {value: '2,000,000'}, region: {value: 'CAT'}]
['BCN', all: {value: '2,000,000;CAT'}]
['BCN', all: {population: '2,000,000'; region: 'CAT'}]
31
Relevant schema dimensions
Some new models lack an explicit schema (declared by the user)
• An implicit schema (hidden in the application code) always remains
• May reduce the impedance mismatch
32
Key-value and Document
Data Models
33
Key-value and Document Data Models
Key-value and document databases are strongly aggregate-oriented
• In a key-value database, the aggregate is opaque to the database: just some big
blob of bits. The advantage of opacity is that we can store whatever we like in the
aggregate. It is the responsibility of the application to understand what was stored.
Since key-value stores always use primary-key access, they generally offer great performance.
• In contrast, a document database is able to see a structure in the aggregate, but
imposes limits on what we can place in it, defining allowable structures and types.
In return, however, we get more flexibility when accessing data.
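The contrast can be sketched with plain Python (dictionaries stand in for the stores; no particular key-value or document product is assumed):

```python
import json

# Key-value store: the aggregate is an opaque blob of bits.
# The only access path is a lookup by key; the application must
# understand what it stored.
kv_store = {"1001": json.dumps({"customer": "Ann", "price": 210}).encode()}
order = json.loads(kv_store["1001"])            # lookup by primary key only

# Document store: the database sees the structure of the aggregate,
# so (in a real document database) we can also query by inner fields.
doc_store = [{"_id": "1001", "customer": "Ann", "price": 210},
             {"_id": "1002", "customer": "Dan", "price": 230}]
ann_orders = [d for d in doc_store if d["customer"] == "Ann"]
```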
34
Aggregate data models: Orders example (1NF)

Customer(CustKey, Name, Phone)
  1, Ann, 234
  2, Dan, 211

CreditCard(CustKey, CCNum, Expiry)
  1, 02345, 04/28
  2, 01221, 05/24

Orders(…) and LineItem(…) hold the orders and their order lines (line items)
35
Key-values: Orders Example

Key → Value
1001 → 03214_$50_3_$150, 03222_$40_1_$40 …

Compare with the relational LineItem(OrderKey, LineNum, PriceItem, Qty, TotPrice): (1001, 03214, $50, 3, $150), …

In a key-value store, we can only access an aggregate by lookup based on its key.
36
The Document data model – Orders Example

.json
  ID: 1001
  customer: Ann
  line items: …

Compare with the relational tables:
Orders(OrderID, CustKey, Price): (1001, 1, $210), (1002, 2, $230)
Customer(CustKey, Name, Phone): (1, Ann, 234), (2, Dan, 211)
38
The Column-Family data model – Example

Column-Families are organized in terms of distributed maps. The column-family model can be seen as a two-level aggregate structure:
• As with key-value stores, the first key is often described as a row identifier, picking up the aggregate of interest
• This row aggregate is itself formed of a map of more detailed values. These second-level values are referred to as columns, each being a key-value pair
• Columns are organized into column families. Each column has to be part of a single column family (data for a particular column family will usually be accessed together)
• Each row identifier (i.e., first-level key) is unique

{
  "1001" : {                        ← row identifier
    "profile" : {                   ← column family
      "customer": "Ann",            ← columns
      "card": "Amex",
      "cc_number": "12345",
      "expiry": "04/28" },
    "line-items" : {
      "items": "[[03214,$50,3,$150],…]" }
  },
  "1002" : {
    "profile" : "…",
    "line-items" : "…"
  }
}
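The access pattern implied by this two-level map can be sketched with nested Python dictionaries (no specific wide-column system assumed):

```python
# row identifier -> column family -> columns (key-value pairs)
rows = {
    "1001": {
        "profile": {"customer": "Ann", "card": "Amex",
                    "cc_number": "12345", "expiry": "04/28"},
        "line-items": {"items": "[[03214,$50,3,$150],...]"},
    },
}

# Pick the row aggregate by its unique identifier, then read only the
# column family (and the columns) that the query needs.
profile = rows["1001"]["profile"]          # one column family
customer_name = profile["customer"]        # one column -> "Ann"
```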
39
The Column-Family data model - Characteristics
• Column Family Structure: Data is organized into rows and columns grouped into column families,
allowing efficient data retrieval
• Sparse Data Handling: Optimized for storing sparse datasets, where rows can have a variable
number of columns, saving storage space
• High Write Performance: Designed for high-throughput write operations, making it suitable for
write-heavy workloads
• Scalability: Supports horizontal scaling across distributed nodes, ideal for handling large volumes of
data
• Efficient Data Access: Queries are optimized to read only the necessary columns within a column
family, improving performance
• Data Locality: Related columns are stored together on disk, making access patterns efficient for
certain types of queries
• Flexible Schema: Allows easy addition of new columns to existing rows without schema changes,
supporting evolving data requirements
40
Design choices
• Denormalization
• Partitioning/Fragmenting
• Horizontal
• Vertical
• Data placement
• Distribution
• Clustering
43
Alternative storage structures
44
The problem is not SQL
• Relational systems are too generic…
• OLTP: stored procedures and simple queries
• OLAP: ad-hoc complex queries
• Documents: large objects
• Streams: time windows with volatile data
• Scientific: uncertainty and heterogeneity
• …but the overhead of RDBMS has nothing to do with SQL
• Low-level, record-at-a-time interface is not the solution
Michael Stonebraker
SQL Databases vs. NoSQL Databases
Communications of the ACM, 53(4), 2010
45
The RUM conjecture
“Designing access methods that set an upper bound for two of the RUM
overheads, leads to a hard lower bound for the third overhead which
cannot be further reduced.”
M. Athanassoulis et al.
46
Example of RUM conjecture
• Unsorted (append-only) storage: Update(x) is a cheap append, but Find(x) is hard (no order to exploit); adding an index to speed up Find(x) introduces a space overhead, and the index must be updated on every write
• Sorted storage: Find(x) is a binary search, but Update(x) means reordering the data
M. Athanassoulis et al.
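The two storage layouts of the example can be mimicked with Python lists to make the trade-off concrete (a sketch, not an actual access-method implementation):

```python
import bisect

# Append-only (unsorted) storage: Update is a cheap append,
# but Find(x) must scan the whole list.
unsorted_data = []
unsorted_data.append(42)                     # cheap write
found = 42 in unsorted_data                  # expensive read: linear scan

# Sorted storage: Find(x) is a binary search,
# but Update(x) has to keep the order (elements are shifted).
sorted_data = []
bisect.insort(sorted_data, 42)               # expensive write: reordering
pos = bisect.bisect_left(sorted_data, 42)
found = pos < len(sorted_data) and sorted_data[pos] == 42

# An auxiliary index would make both faster, at the price of the third
# RUM overhead: extra memory and the cost of keeping the index up to date.
```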
47
RUM classification space
Manos Athanassoulis, Michael S. Kester, Lukas M. Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, Mark Callaghan: Designing Access Methods: The RUM Conjecture. EDBT 2016.
48
Data Storage

RELATIONAL
• Generic architecture that can be tuned according to the needs:
  • Mainly-write OLTP Systems
    • Normalization
    • Indexes: B+, Hash
    • Joins: BNL, RNL, Hash-Join, Merge Join
  • Read-only DW Systems
    • Denormalized data
    • Indexes: Bitmaps
    • Joins: Star-join
    • Materialized Views

NOSQL
• Specific architectures for a specific need:
  • Primary indexes
  • Sequential reads
  • Vertical partitioning
  • Compression
  • Fixed-size values
  • In-memory processing
• Very specific, but very good, at solving a particular problem
49
Different internal structures
Aina Montalban
50
51
Closing
52
Summary
• NOSQL systems
• Schemaless databases
• Impedance mismatch
• Aggregate data models
• Key-value data models
• Document data models
• Column-family data models
53
References
54