02 DistributedDataManagement

Uploaded by

silvshootss

Distributed Data Management
Big Data Management

1
Knowledge objectives
1. Give a definition of Distributed System
2. Enumerate the 6 challenges of a Distributed System
3. Give a definition of Distributed Database
4. Explain the different transparency layers in DDBMS
5. Identify the requirements that distribution imposes on the ANSI/SPARC architecture
6. Draw a classical reference functional architecture for DDBMS
7. Enumerate the 8 main features of Cloud Databases
8. Explain the difficulties of Cloud Database providers to have multiple tenants
9. Enumerate the 4 main problems tenants/users need to tackle in Cloud Databases
10. Distinguish the cost of sequential and random access
11. Explain the difference between the cost of sequential and random access
12. Distinguish vertical and horizontal fragmentation
13. Recognize the complexity and benefits of data allocation
14. Explain the benefits of replication
15. Discuss the alternatives of a distributed catalog

2
Understanding Objectives
• Decide when a fragmentation strategy is correct

3
Distributed System

Distributed DBMS

Cloud DBMS

Distributed Systems

4
Distributed system
“One in which components located at networked computers communicate
and coordinate their actions only by passing messages.”
G. Coulouris et al.
• Characteristics:
• Concurrency of components
• Independent failures of components
• Lack of a global clock

5
Challenges of distributed systems
• Openness
• Scalability
• Quality of service
   • Performance/Efficiency
   • Reliability/Availability
   • Confidentiality
• Concurrency
• Transparency
• Heterogeneity of components

6
Scalability
Cope with large workloads
• Scale up
• Scale out

• Use:
• Automatic load-balancing

• Avoid:
• Bottlenecks
• Unnecessary communication
• Peer-to-peer

7
Performance/Efficiency
Efficient processing
• Minimize latencies
• Maximize throughput

• Use
• Parallelism
• Network optimization
• Specific techniques

8
Reliability/Availability
a) Keep consistency
b) Keep the system running
• Even in the case of failures

• Use
• Replication
• Flexible routing
• Heartbeats
• Automatic recovery
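
As an illustration of the heartbeat technique listed above, the following is a minimal sketch of a timeout-based failure detector. The class name HeartbeatMonitor, the timeout_s parameter, and the injectable clock are assumptions made for this example; they do not come from the slides or from any particular system.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # node name -> time of last heartbeat

    def heartbeat(self, node):
        # Called whenever a heartbeat message arrives from `node`.
        self.last_seen[node] = self.clock()

    def suspected(self):
        # Nodes whose last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)
```

A node is only suspected, not known to have failed: without a global clock and with independent failures, a slow network looks the same as a crashed component, so the timeout is a tuning choice.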

9
Concurrency
Share resources as much as possible

• Use
• Consensus Protocols

• Avoid
• Interferences
• Deadlocks

10
Transparency
a) Hide implementation (i.e., physical) details from the users
b) Make all the mechanisms that solve the other challenges transparent
to the user

11
Further objectives
• Use
• Platform-independent software

• Avoid
• Complex configurations
• Specific hardware/software

12
Distributed System

Distributed DBMS

Cloud DBMS

Distributed Database Systems

13
Distributed database
“A Distributed DataBase (DDB) is an integrated collection of databases that is physically
distributed across sites in a computer network. A Distributed DataBase Management
System (DDBMS) is the software system that manages a distributed database such that
the distribution aspects are transparent to the users.”
Encyclopedia of Database Systems

14
Transparency layers (I)
• Fragmentation transparency
• The user must not be aware of the existence of different fragments
• Replication transparency
• The user must not be aware of the existing replicas
• Network transparency
• Data access must be independent of where the data is located
• Each data object must have a unique name
• Data independence at the logical and physical level must be guaranteed
• Inherited from centralized DBMSs (ANSI SPARC)

15
Transparency layers (II)

16
Classification According to Degree of Autonomy

                   Autonomy   Central schema   Query transparency   Update transparency
DDBMS              No         Yes              Yes                  Yes
T.C. Federated     Low        Yes              Yes                  Limited
L.C. Federated     Medium     No               Yes                  Limited
Multi-database     High       No               No                   No

17
Extended ANSI-SPARC Architecture of Schemas

• Global catalog (Mappings between ESs – GCS and GCS – LCSs)

• Each node has a local catalog (Mappings between LCSi – ISi)
18
Centralized DBMS Functional Architecture

[Figure: components of a centralized DBMS]
• Query Manager: View Manager, Security Manager, Constraint Checker, Query Optimizer
• Execution Manager
• Scheduler
• Data Manager, Recovery Manager, Buffer Manager
• Log, Buffer pool (Memory)
• Operating system, File system

19
Distributed DBMS Functional Architecture
[Figure: components of a distributed DBMS]
• One coordinator
   • Global Query Manager: View Manager, Security Manager, Constraint Checker, Query Optimizer
   • Global Execution Manager
   • Global Scheduler
   • GLOBAL CATALOG: External Schema, Global Conceptual Schema, Fragment Schema, Allocation Schema
• Many workers
   • Local Query Manager
   • Local Execution Manager
   • Data Manager, Recovery Manager, Buffer Manager
   • Log, Buffer pool (Memory)
   • Operating system, File system
   • LOCAL CATALOG: Local Conceptual Schema, Local Internal Schema
20
Distributed System

Distributed DBMS

Cloud DBMS

Cloud Databases

21
Parallel database architectures

D. DeWitt & J. Gray. Figure by D. Abadi

22
Key Features of Cloud Databases
• Scalability
   a) Ability to horizontally scale (scale out)
• Quality of service
   • Performance/Efficiency
      b) Fragmentation: Replication & Distribution
      c) Indexing: Distributed indexes and RAM
   • Reliability/Availability
• Concurrency
   d) Weaker concurrency model than ACID
• Transparency
   e) Simple call level interface or protocol
      • No declarative query language
• Further objectives
   f) Flexible schema
      • Ability to dynamically add new attributes
   g) Quick/Cheap set up
   h) Multi-tenancy

23
Multi-tenancy platform problems (provider side)
• Difficulty: Unpredictable load characteristics
• Variable popularity
• Flash crowds
• Variable resource requirements
• Requirement: Support thousands of tenants
a) Maintain metadata about tenants (e.g., activated features)
b) Self-managing
c) Tolerating failures
d) Scale-out is necessary (sooner or later)
• Rolling upgrades one server at a time
e) Elastic load balancing
• Dynamic partitioning of databases

24
Data management problems (tenant side)
I. (Distributed) data design
• Data fragmentation
• Data allocation
• Data replication
II. (Distributed) catalog management
• Metadata fragmentation
• Metadata allocation
• Metadata replication
III. (Distributed) transaction management
• Enforcement of ACID properties
• Distributed recovery system
• Distributed concurrency control system
• Replica consistency
• Latency & Availability vs. Update performance
IV. (Distributed) query processing
• Optimization considering
1) Distribution/Parallelism
• Communication overhead
2) Replication

25
(Distributed) Data Design
Challenge I

26
DDB Design
• Given a DB and its workload, how should the DB be split and allocated to
sites so as to optimize certain objective functions?
• Minimize resource consumption for query processing

• Two main issues:
   • Data fragmentation
   • Data allocation
      • Data replication (allocation to more than one site)

27
Data Fragmentation
• Usefulness
• An application typically accesses only a subset of data
• Different subsets are (naturally) needed at different sites
• The degree of concurrency is enhanced
• Facilitates parallelism
• Fragments can even be defined dynamically (i.e., at query time, not at design time)

• Difficulties
• Complicates the catalog management
• May lead to poorer performance when multiple fragments need to be joined
• Fragments likely to be used jointly can be colocated to minimize communication overhead
• Costly to enforce the dependency between attributes in different fragments

28
Fragmentation Correctness
• Completeness
• Every datum in the relation must be assigned to a fragment
• Disjointness
• There is no redundancy and every datum is assigned to only one fragment
• The decision to replicate data is in the allocation phase
• Reconstruction
• The original relation can be reconstructed from the fragments
• Union for horizontal fragmentation
• Join for vertical fragmentation
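
The three correctness rules above can be checked mechanically. The sketch below is one possible reading, under stated assumptions: relations are duplicate-free lists of row tuples, vertical fragments are sets of attribute names, and the key attributes are repeated in every vertical fragment so that the join is possible (so disjointness is checked on non-key attributes only). The function names check_horizontal and check_vertical are hypothetical.

```python
def check_horizontal(relation, fragments):
    """relation: list of row tuples; fragments: list of lists of row tuples."""
    rel = set(relation)
    frag_sets = [set(f) for f in fragments]
    union = set().union(*frag_sets)
    return {
        # Every row of the relation appears in some fragment.
        "complete": rel <= union,
        # No row appears in more than one fragment.
        "disjoint": sum(len(f) for f in frag_sets) == len(union),
        # The union of the fragments gives back exactly the relation.
        "reconstructs": union == rel,
    }


def check_vertical(attributes, key, fragments):
    """attributes, key: attribute names; fragments: list of attribute sets."""
    attrs, key = set(attributes), set(key)
    frag_sets = [set(f) for f in fragments]
    non_key = [f - key for f in frag_sets]
    complete = set().union(*frag_sets) == attrs
    return {
        # Every attribute is assigned to some fragment.
        "complete": complete,
        # Non-key attributes appear in exactly one fragment.
        "disjoint": sum(len(f) for f in non_key) == len(set().union(*non_key)),
        # Joining on the key needs the key in every fragment.
        "reconstructs": complete and all(key <= f for f in frag_sets),
    }
```

For example, splitting (id, name, salary) into {id, name} and {id, salary} passes all three checks, while {id, name} and {salary} fails reconstruction because the second fragment cannot be joined back on the key.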

29
Finding the best fragmentation strategy
• Consider it per table
• Finding the optimal fragmentation is NP-hard
• Needed information
• Workload
• Frequency of each query
• Access plan and cost of each query
• Take intermediate results and repetitive access into account
• Value distribution and selectivity of predicates
• Work in three phases
1. Determine primary partitions (i.e., attribute subsets often accessed together)
2. Generate a disjoint and covering combination of primary partitions
3. Evaluate the cost of all combinations generated in the previous phase
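
Phase 1 above can be approximated very simply. The sketch below groups attributes that are accessed by exactly the same set of queries; this is only one possible interpretation of "often accessed together", it ignores query frequencies, and it is not presented as the algorithm the slides have in mind. The function name primary_partitions and the query_attrs input format are assumptions for this example.

```python
from collections import defaultdict


def primary_partitions(attributes, query_attrs):
    """Group attributes used by exactly the same set of queries.

    attributes:  iterable of attribute names of one table
    query_attrs: dict mapping a query name to the set of attributes it accesses
    """
    groups = defaultdict(list)
    for attr in attributes:
        # The "signature" of an attribute is the set of queries that touch it.
        signature = frozenset(q for q, used in query_attrs.items() if attr in used)
        groups[signature].append(attr)
    return sorted(sorted(group) for group in groups.values())


workload = {
    "q1": {"id", "name"},
    "q2": {"id", "salary", "dept"},
}
partitions = primary_partitions(["id", "name", "salary", "dept"], workload)
```

Here salary and dept end up in the same primary partition because only q2 reads them, while id forms its own partition because both queries read it. Phases 2 and 3 would then combine these partitions and cost each combination against the workload.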

30
Data Allocation
• Given a set of fragments, a set of sites on which a number of applications are
running, allocate each fragment such that some optimization criterion is met (subject
to certain constraints)
• It is known to be an NP-hard problem
• The optimal solution depends on many factors
• Location in which the query originates
• The query processing strategies (e.g., join methods)
• Furthermore, in a dynamic environment the workload and access patterns may change
• The problem is typically simplified with certain assumptions
• E.g., only communication cost considered
• Typical approaches build cost models and any optimization algorithm can be
adapted to solve it
• Sub-optimal solutions
• Heuristics are also available
• E.g., best-fit for non-replicated fragments
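
A minimal sketch of a best-fit style heuristic for non-replicated fragments follows, assuming that only communication cost matters and that site capacity is unlimited: each fragment is placed at the site that accesses it most often. The function names and the access-frequency numbers are hypothetical and serve only to illustrate the idea.

```python
def best_fit(access_freq):
    """access_freq: dict fragment -> dict site -> accesses per time unit.

    Returns a dict fragment -> chosen site (one copy per fragment).
    """
    allocation = {}
    for fragment, per_site in access_freq.items():
        # Highest access frequency wins; ties are broken by site name.
        allocation[fragment] = min(per_site, key=lambda s: (-per_site[s], s))
    return allocation


def remote_accesses(access_freq, allocation):
    # Accesses that must cross the network under the given allocation.
    return sum(count
               for fragment, per_site in access_freq.items()
               for site, count in per_site.items()
               if site != allocation[fragment])


freq = {
    "F1": {"A": 10, "B": 3},
    "F2": {"A": 2, "B": 7},
    "F3": {"A": 5, "B": 5},
}
allocation = best_fit(freq)
cost = remote_accesses(freq, allocation)
```

Because each fragment is decided independently, the heuristic runs in time linear in the input, but it can be far from optimal once capacity limits or join locality between fragments are taken into account.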

31
Data Replication
• Generalization of Allocation (for more than one location)
• Provides execution alternatives
• Improves availability
• Generates consistency problems
• Especially useful for read-only workloads
• No synchronization required
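
The trade-off above can be made concrete with a toy model, given here as a sketch only: reads are served from any single copy without coordination, while a write that does not reach every copy leaves the replicas inconsistent. The class ReplicatedValue and its method names are assumptions for this illustration, not a real replication protocol.

```python
class ReplicatedValue:
    """One data item stored at several sites (hypothetical toy model)."""

    def __init__(self, sites, initial):
        self.copies = {site: initial for site in sites}

    def read(self, site):
        # Any single copy can serve a read; no coordination is needed.
        return self.copies[site]

    def write_all(self, value):
        # Synchronized update: every replica receives the new value.
        for site in self.copies:
            self.copies[site] = value

    def write_one(self, site, value):
        # Unsynchronized update: only one replica changes.
        self.copies[site] = value

    def consistent(self):
        return len(set(self.copies.values())) == 1


item = ReplicatedValue(["A", "B", "C"], initial=0)
item.write_all(1)
after_write_all = item.consistent()
item.write_one("B", 2)
after_write_one = item.consistent()
```

With a read-only workload the write paths are never exercised, which is why replication then brings availability and extra execution alternatives at no synchronization cost.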

32
(Distributed) Catalog
Management
Challenge II

33
DDBMS Catalog Characteristics
• Fragmentation
   • Global metadata
      • External schemas
      • Global conceptual schema
      • Fragment schema
      • Allocation schema
   • Local metadata
      • Local conceptual schema
      • Physical schema
• Allocation
   • Global metadata in the coordinator node
   • Local metadata in the workers
• Replication
   a) Single-copy (Coordinator node)
      • Single point of failure
      • Poor performance (potential bottleneck)
   b) Multi-copy (Mirroring, Secondary node)
      • Requires synchronization

34
Closing

35
Summary
• Distributed Systems
• Distributed Database Systems
• Distributed Database Systems Architectures
• Cloud Databases
• Distributed Database Design
• Fragmentation
• Kinds
• Characteristics
• Allocation
• Replication
• Distributed Catalog

36
References
• D. DeWitt & J. Gray. Parallel Database Systems: The Future of High
Performance Database Systems. Communications of the ACM, June
1992
• N. J. Gunther. A Simple Capacity Model of Massively Parallel Transaction
Systems. CMG National Conference, 1993
• L. Liu, M.T. Özsu (Eds.). Encyclopedia of Database Systems. Springer, 2009
• M. T. Özsu & P. Valduriez. Principles of Distributed Database Systems, 3rd
Ed. Springer, 2011
• G. Coulouris et al. Distributed Systems: Concepts and Design, 5th Ed.
Addison-Wesley, 2012
