Database

• Database: a logically interrelated collection of shared data, along with a description of that data, physically distributed over a computer network.

Distributed Database
• A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer
network.

A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

A DDBMS is mainly classified into two types:

• Homogeneous distributed database management systems
• Heterogeneous distributed database management systems

Characteristics
• All sites are interconnected by a network.
• Fragments can be replicated.
• The data at the sites is a collection of logically related shared data.
• The data at each site is under the control of a DBMS.
• The DBMS at each site participates in at least one global application.

Functionality
• Security
• Keeping track of data
• Replicated data management
• System catalog management
• Distributed transaction management
• Distributed database recovery

Homogeneous DDBMS
• In a homogeneous distributed database, all sites have identical software, are aware of each other, and agree to cooperate in processing user requests.
• A homogeneous system is much easier to design and manage.
• The operating system used at each location must be the same or compatible.
• The database application (or DBMS) used at each location must be the same or compatible.

Heterogeneous DDBMS
• In a heterogeneous distributed database, different sites may use different schemas and software.
• Different nodes may have different hardware and software, and the data structures at the various nodes or locations may be incompatible.
• Different computers, operating systems, database applications, or data models may be used at each of the locations.
• In a heterogeneous system, translations are required to allow communication between the different sites (or DBMSs).
• Full heterogeneity is often not technically or economically feasible; in such a system, a user at one location may be able to read, but not update, the data at another location.

Advantages
• Less danger of a single point of failure: when one computer fails, its workload is picked up by other workstations.
• Data are distributed across multiple sites.
• An end user is able to access any available copy of the data, and the request is processed by a processor at the data's location.
• Improved communications, because local sites are smaller and located closer to customers.
• Reduced operating costs: it is more cost-effective to add workstations to a network than to upgrade a mainframe system.
• Faster data access and faster data processing.
• A distributed database system spreads out the system's workload by processing data at several sites.

Disadvantages
• Complexity of management and control.
• Applications must recognize data locations and must be able to stitch together data from various sites.
• Security: the probability of security lapses increases when data are located at multiple sites.
• Increased storage and infrastructure requirements: multiple copies of the data have to be kept at different sites, so additional disk storage space is required.

What is a parallel database?

• A parallel database system aims to improve performance through the parallelization of various operations, such as loading data, building indexes, and evaluating queries.
• The distribution is done solely on the basis of performance.
• Parallel databases improve processing and input/output speeds by using multiple CPUs and disks in parallel.
• Many operations are performed simultaneously.
• Data may be stored in a distributed fashion.

Data fragmentation
• Fragmentation is the process of dividing or mapping a table, based on its columns and rows, into smaller units of data.
• Data that has been broken down in this way can still be recombined to reconstruct the complete data collection.
• Fragmentation is a database server feature that allows you to control where data is stored at the table level.
• Fragmentation enables you to define groups of rows or index keys within a table, as the sketch below illustrates.
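
As an illustrative sketch only (the table, column names, and predicate below are hypothetical, not from these notes): horizontal fragmentation splits a table's rows by a predicate, while vertical fragmentation splits its columns, keeping the key so the original rows can be reconstructed.

```python
# Hypothetical example of horizontal and vertical fragmentation in Python.

employees = [
    {"id": 1, "name": "Asha",  "dept": "Sales", "salary": 50000},
    {"id": 2, "name": "Ravi",  "dept": "IT",    "salary": 60000},
    {"id": 3, "name": "Meera", "dept": "Sales", "salary": 55000},
]

# Horizontal fragmentation: split rows by a predicate (here, by department).
sales_fragment = [r for r in employees if r["dept"] == "Sales"]
it_fragment    = [r for r in employees if r["dept"] == "IT"]

# Vertical fragmentation: split columns, keeping the key ("id") in every
# fragment so the table can be rebuilt by joining the fragments.
public_fragment  = [{"id": r["id"], "name": r["name"], "dept": r["dept"]}
                    for r in employees]
private_fragment = [{"id": r["id"], "salary": r["salary"]} for r in employees]

# Recombining the fragments reconstructs the complete data collection.
rebuilt = [{**p, **q} for p in public_fragment
           for q in private_fragment if p["id"] == q["id"]]
assert sorted(rebuilt, key=lambda r: r["id"]) == employees
```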

Replication
• Replication means storing several copies of a relation or relation fragment. An entire relation can be replicated at one or more sites.
• Similarly, one or more fragments of a relation can be replicated at other sites.
• For example, if a relation R is fragmented into R1,R2, and R3, there might be just one copy of R1, whereas R2 is
replicated at two other sites and R3 is replicated at all sites.
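
A minimal sketch of the example just given, with hypothetical site names S1..S4: R1 has a single copy, R2 is replicated at two sites, and R3 at all sites.

```python
# Hypothetical placement of fragments R1, R2, R3 across sites S1..S4,
# mirroring the example above.
sites = ["S1", "S2", "S3", "S4"]

placement = {
    "R1": ["S1"],            # a single copy of R1
    "R2": ["S2", "S3"],      # R2 replicated at two other sites
    "R3": list(sites),       # R3 replicated at all sites
}

def sites_holding(fragment):
    """Return the sites at which a copy of the given fragment is stored."""
    return placement.get(fragment, [])

print(sites_holding("R2"))   # ['S2', 'S3']
```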

Twofold Motivation for Replication


The motivation for replication is twofold:

1. Increased Availability of Data: If a site that contains a replica goes down, we can find the same data at other sites.
Similarly, if local copies of remote relations are available, we are less vulnerable to failure of communication links.
2. Faster Query Evaluation: Queries can execute faster by using a local copy of a relation instead of going to a remote
site.

Distributed Transaction
• In a distributed DBMS, a given transaction is submitted at some one site, but it can access data at other sites as
well.
• When a transaction is submitted at some site, the transaction manager at that site breaks it up into a collection of one or more sub-transactions that execute at different sites, submits them to the transaction managers at those sites, and coordinates their activity (a splitting step sketched after this list).
• Distributed Concurrency Control: How can locks for objects stored across several sites be managed?
• Distributed Recovery: Transaction atomicity must be ensured: when a transaction commits, all of its actions, across all the sites at which it executes, must persist. Similarly, when a transaction aborts, none of its actions may be allowed to persist.
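
A minimal sketch of the splitting step, assuming a hypothetical `site_of` catalog lookup that maps each data object to the site holding it:

```python
def split_transaction(operations, site_of):
    """Group a transaction's operations into sub-transactions, one per site,
    using the (hypothetical) catalog lookup site_of(object) -> site."""
    subtxns = {}
    for op in operations:
        subtxns.setdefault(site_of(op["object"]), []).append(op)
    return subtxns   # submitted to the transaction managers at those sites

ops = [{"object": "acct_1", "action": "debit",  "amount": 100},
       {"object": "acct_9", "action": "credit", "amount": 100}]
print(split_transaction(ops, site_of=lambda o: "S1" if o == "acct_1" else "S2"))
# {'S1': [ops touching acct_1], 'S2': [ops touching acct_9]}
```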

Distributed Concurrency Control


• The choice of technique determines which objects are to be locked; when locks are obtained and released is determined by the concurrency control protocol. Here we consider how lock and unlock requests are implemented in a distributed environment. Lock management can be distributed across sites in several ways:
• Centralized: A single site is in charge of handling lock and unlock requests for all objects.
• Primary Copy: One copy of each object is designated the primary copy. All requests to lock or unlock a copy of this object are handled by the lock manager at the site where the primary copy is stored, regardless of where the copy itself is stored (sketched below).
• Fully Distributed: Requests to lock or unlock a copy of an object stored at a site are handled by the lock manager at the site where the copy is stored.
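
A minimal sketch of the primary-copy scheme, with hypothetical object names, site names, and in-memory lock tables (a real lock manager would queue waiters and support shared as well as exclusive locks):

```python
# Sketch of primary-copy lock routing. The site names, object names, and
# in-memory lock tables are hypothetical illustrations, not a real DDBMS API.

primary_site = {"account_42": "S1", "order_7": "S2"}   # object -> primary site
lock_tables  = {"S1": {}, "S2": {}}                    # per-site lock state

def lock(obj, txn):
    """Route the request to the lock manager holding obj's primary copy."""
    table = lock_tables[primary_site[obj]]
    if obj in table and table[obj] != txn:
        return False          # held by another transaction: caller must wait
    table[obj] = txn
    return True

def unlock(obj, txn):
    table = lock_tables[primary_site[obj]]
    if table.get(obj) == txn:
        del table[obj]

assert lock("account_42", "T1")
assert not lock("account_42", "T2")   # T2 must wait until T1 unlocks
unlock("account_42", "T1")
assert lock("account_42", "T2")
```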

DISTRIBUTED RECOVERY
• Recovery in a distributed DBMS is more complicated than in a centralized DBMS for the following reasons:
• New kinds of failure can arise: failure of communication links, and failure of a remote site at which a sub-transaction is executing.
• Either all sub-transactions of a given transaction must commit, or none must commit, and this property must be guaranteed despite any combination of site and link failures. This guarantee is achieved using a commit protocol, classically two-phase commit (sketched below).
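
Below is a minimal two-phase commit coordinator sketch. The participant handles and their prepare/commit/abort methods are hypothetical stand-ins for remote sites; a real coordinator would also force-write log records before each message so the decision survives crashes.

```python
def two_phase_commit(participants):
    """Minimal 2PC coordinator sketch. Each participant is a hypothetical
    handle to a site running a sub-transaction, exposing
    prepare() -> bool, commit(), and abort()."""
    # Phase 1 (voting): ask every site to prepare, i.e. to vote.
    if all(p.prepare() for p in participants):
        # Phase 2 (decision): unanimous yes, so every site must commit.
        for p in participants:
            p.commit()
        return "committed"
    # Any "no" vote forces a global abort so that atomicity holds.
    for p in participants:
        p.abort()
    return "aborted"
```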

Concepts Of Locks
• A lock is used when multiple users need to access a database concurrently. This prevents data from being
corrupted or invalidated when multiple users try to write to the database.
• Any single user can modify only those database records (that is, items in the database) to which they have applied a lock, which gives them exclusive access to the record until the lock is released. Locking not only provides exclusive write access but also prevents (or controls) the reading of unfinished modifications.
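
A minimal single-process analogy in Python: `threading.Lock` gives each writer exclusive access to the record until the lock is released, which is exactly the exclusivity described above (real DBMS lock managers add shared/exclusive modes and deadlock handling).

```python
import threading

# A single record and its (exclusive) lock; real DBMSs manage many such locks.
record = {"balance": 100}
record_lock = threading.Lock()

def withdraw(amount):
    # Only one thread may modify the record at a time; the others block here,
    # so the read-check-write sequence is never interleaved.
    with record_lock:
        if record["balance"] >= amount:
            record["balance"] -= amount

threads = [threading.Thread(target=withdraw, args=(30,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(record["balance"])   # 10: three withdrawals succeed, the fourth is refused
```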

Byzantine General's Problem


The Problem: "Several divisions of the Byzantine army are camped outside an enemy city, each division commanded by
its own general. After observing the enemy, they must decide upon a common plan of action. Some of the generals
may be traitors, trying to prevent the loyal generals from reaching agreement."

Goal:
• All loyal generals decide upon the same plan of action.
• A small number of traitors cannot cause the loyal generals to adopt a bad plan.
• The paper considers a slightly different version of the problem, from the standpoint of one general (i.e., one process) and multiple lieutenants.

Goal:
• All loyal lieutenants obey the same order.
• If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
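
As a minimal illustration only (not Lamport's full OM(m) algorithm), the sketch below shows the majority-vote step a loyal lieutenant can apply to the orders it has received; this voting is what lets a small number of traitors be outvoted.

```python
from collections import Counter

def decide(orders_received):
    """A loyal lieutenant's decision rule: take the majority of all the
    orders it has seen (the general's plus those relayed by the others)."""
    return Counter(orders_received).most_common(1)[0][0]

# Hypothetical run: two loyal relays say "attack", one traitor says "retreat".
print(decide(["attack", "attack", "retreat"]))   # 'attack'
```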

Hadoop
• Hadoop is an open-source software framework for storing data and running applications on clusters of commodity
hardware.
• It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually
limitless concurrent tasks or jobs.
• The core of Apache Hadoop consists of a storage part (HDFS) and a processing part (MapReduce).
• Developer - Apache Software Foundation
• Written in Java

Benefits
• Computing power - Distributed computing model ideal for big data
• Flexibility - Store any amount of any kind of data.
• Fault Tolerance - If a node goes down, jobs are automatically redirected to other nodes. And it automatically stores
multiple copies/replicas of all data.
• Low Cost - The open-source framework is free and uses commodity hardware to store large quantities of data.
• Scalability - The system can be grown easily by adding more nodes.

HDFS Goals
• Detection of faults and automatic recovery.
• High throughput of data access rather than low latency.
• Provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster.
• Write-once-read-many access model for files.
• Applications move themselves closer to where the data is located.
• Easily portable.

Some Nomenclature
• A Rack is a collection of nodes that are physically stored close together and are all on the same network.
• A Cluster is a collection of racks.
• NameNode - Manages the file system namespace and regulates access by clients. There is a single NameNode per cluster.
• DataNode - Serves read, write requests, and performs block creation, deletion, and replication upon instruction
from NameNode.
• A file is split in one or more blocks and a set of blocks are stored in DataNodes.
• A Hadoop block is a file on the underlying file system. The default size is 64 MB (128 MB in newer Hadoop versions). All blocks in a file except the last block are the same size.

Replica Management
• The NameNode keeps track of the rack ID each DataNode belongs to.
• The default replica placement policy is as follows (sketched after this list):
• One third of replicas are on one node
• Two thirds of replicas (including the above) are on one rack
• The other third are evenly distributed across the remaining racks.
• This policy improves write performance without compromising data reliability or read performance.
• HDFS tries to satisfy a read request from a replica that is closest to the reader.
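
A minimal sketch of that default policy for a replication factor of 3, using a hypothetical topology format that maps rack IDs to node lists:

```python
import random

def place_replicas(writer_node, topology, rf=3):
    """Sketch of the default policy: first replica on the writer's node,
    two more on two different nodes of one remote rack, and any further
    replicas spread across the remaining racks. `topology` is a
    hypothetical {rack_id: [node, ...]} map."""
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    remote_racks = [r for r in topology if r != local_rack]
    remote = random.choice([r for r in remote_racks if len(topology[r]) >= 2])
    replicas = [writer_node] + random.sample(topology[remote], 2)
    leftover = [n for r in remote_racks if r != remote for n in topology[r]]
    random.shuffle(leftover)
    while len(replicas) < rf and leftover:
        replicas.append(leftover.pop())
    return replicas[:rf]

topology = {"rackA": ["a1", "a2"], "rackB": ["b1", "b2"], "rackC": ["c1"]}
print(place_replicas("a1", topology))   # e.g. ['a1', 'b2', 'b1']
```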

NameNode
• Stores the HDFS namespace
• Records every change to the file system metadata in a transaction log called the EditLog
• The namespace, including the mapping of blocks to files and file system properties, is stored in a file called FsImage
• Both EditLog and FsImage are stored on the NameNode's local file system
• Keeps an image of the namespace and file blockmap in memory

• On startup:
• Reads the FsImage and EditLog from disk
• Applies all transactions from the EditLog to the in-memory copy of the FsImage
• Flushes the modified FsImage onto disk
• This process is called checkpointing
• Checkpointing currently occurs only when the NameNode starts up, not afterwards
• After checkpointing, the NameNode enters safemode
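
A minimal sketch of that startup sequence, using a toy JSON format for the FsImage and EditLog (the real files are binary and carry many more record types):

```python
import json

def startup_checkpoint(fsimage_path, editlog_path):
    """Toy NameNode checkpoint: read FsImage, replay EditLog, flush image."""
    # 1. Read the last persisted namespace image, e.g. {"/a.txt": ["blk_1"]}.
    with open(fsimage_path) as f:
        namespace = json.load(f)
    # 2. Apply every logged metadata change to the in-memory copy.
    with open(editlog_path) as f:
        for line in f:
            op, path, blocks = json.loads(line)
            if op == "create":
                namespace[path] = blocks
            elif op == "delete":
                namespace.pop(path, None)
    # 3. Flush the modified image back to disk and truncate the edit log.
    with open(fsimage_path, "w") as f:
        json.dump(namespace, f)
    open(editlog_path, "w").close()
    return namespace   # kept in memory; the NameNode now enters safemode
```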

Safemode:
• Replication of data blocks does not occur in safemode
• Receives Heartbeat and Blockreport from DataNodes
• Blockreport contains list of data blocks at a DataNode
• Each block has a specified minimum number of replicas
• A block is considered safely replicated when the minimum number of replicas has checked in with the NameNode.
• After a configurable percentage of safely replicated data blocks has checked in, the NameNode exits safemode.
• It then replicates any blocks that were not safely replicated.
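
A minimal sketch of that exit test, assuming `blocks` maps each block ID to the number of replicas that have checked in via Blockreports; the threshold and minimum are the configurable values mentioned above.

```python
def should_exit_safemode(blocks, min_replicas=1, threshold=0.999):
    """Leave safemode once the configured percentage of blocks has at
    least the minimum number of replicas checked in with the NameNode."""
    if not blocks:
        return True
    safe = sum(1 for n in blocks.values() if n >= min_replicas)
    return safe / len(blocks) >= threshold

print(should_exit_safemode({"blk_1": 3, "blk_2": 1, "blk_3": 0}))   # False
```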

DataNode
• Stores HDFS data in files in its local file system
• Has no knowledge about HDFS files
• Stores each HDFS block in a separate file
• Stores files in subdirectories instead of one single directory
• On startup:
• Scans through its local file system
• Generates a list of all HDFS data blocks
• Sends the report to the NameNode
• This report is called the Blockreport
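
A minimal sketch of generating a Blockreport, assuming (hypothetically) that each block is stored as a local file named blk_<id> somewhere under the DataNode's data directory:

```python
import os

def block_report(data_dir):
    """Scan the DataNode's local file system and list every HDFS block
    stored here; the blk_<id> naming is a simplifying assumption."""
    blocks = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.startswith("blk_"):
                path = os.path.join(root, name)
                blocks.append({"id": name, "length": os.path.getsize(path)})
    return blocks   # sent to the NameNode as the Blockreport
```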

Staging
• A client request to create a file does not reach the NameNode immediately
• Initially, the client caches file data into a temporary local file
• Once the local file has data over one HDFS block size, the NameNode is contacted
• The NameNode inserts the file name into the file system hierarchy and allocates a data block for it
• It replies with the identity of the DataNode and the destination data block
• It also sends a list of the DataNodes replicating the block.
• The client then flushes the block of data to the DataNode.
• When a file is closed, the remaining data is also flushed to the DataNode
• It then tells the NameNode that the file is closed
• The NameNode commits the file creation operation into a persistent store.
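
A minimal client-side sketch of this staging behavior; `allocate_block` is a hypothetical callback standing in for the NameNode RPC that inserts the file name and returns the target DataNodes.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # one HDFS block (64 MB, as in the notes above)

class StagingWriter:
    """Toy client-side staging: buffer writes into a local temporary buffer
    and contact the NameNode (via the hypothetical allocate_block callback)
    only once a full block has accumulated."""

    def __init__(self, allocate_block):
        self.allocate_block = allocate_block   # -> list of target DataNodes
        self.buffer = bytearray()

    def write(self, data):
        self.buffer.extend(data)
        while len(self.buffer) >= BLOCK_SIZE:
            block = bytes(self.buffer[:BLOCK_SIZE])
            del self.buffer[:BLOCK_SIZE]
            self._flush(block, self.allocate_block())

    def close(self):
        if self.buffer:                         # flush the final partial block
            self._flush(bytes(self.buffer), self.allocate_block())
            self.buffer.clear()

    def _flush(self, block, datanodes):
        """Ship the block to the first DataNode in the replication pipeline."""
        pass   # placeholder for the data transfer itself
```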

Replication Pipelining
• The client sends the data block to the DataNode in small portions
• The DataNode writes each portion to its local filesystem
• It then passes on the portion to another DataNode for replication as determined by the NameNode
• Each DataNode, on receiving a portion, writes it to its local file system and passes it on to the next DataNode
• This continues until the portion reaches the last DataNode holding a replica of the data block.
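
A minimal sketch of the pipeline, where each (hypothetical) DataNode handle persists the portion locally and forwards it downstream:

```python
def pipeline_write(portion, datanodes):
    """Write a small portion of a block at each DataNode in the pipeline in
    turn; `datanodes` is the replica list chosen by the NameNode, and
    write_local is a hypothetical stand-in for the node's local write."""
    if not datanodes:
        return
    head, rest = datanodes[0], datanodes[1:]
    head.write_local(portion)        # persist the portion at this replica
    pipeline_write(portion, rest)    # forward it to the next DataNode
```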

HDFS:
• The Hadoop Distributed File System (HDFS) is the file system component of Hadoop. It is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications. Both goals are achieved by replicating file content on multiple machines (DataNodes).
• HDFS is a block-structured file system: files are broken into blocks of 128 MB (configurable per file).
• A file can be made up of several blocks, and these blocks are stored across a cluster of one or more machines with data storage capacity.
• Each block of a file is replicated across a number of machines to prevent loss of data.
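
A minimal sketch of block-structured splitting: every block gets the full block size except, possibly, the last one.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # per-file configurable; 128 MB here

def split_into_blocks(file_length, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block of a file: all blocks are
    block_size long except, possibly, the last one."""
    blocks = []
    offset = 0
    while offset < file_length:
        length = min(block_size, file_length - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

print(split_into_blocks(300 * 1024 * 1024))
# [(0, 134217728), (134217728, 134217728), (268435456, 46137344)]
```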

Distributed Hash Table


Definition:
• A distributed hash table (DHT) is a class of decentralized distributed system that provides a lookup service similar to a hash table (key-value pairs).
• Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption.
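
A minimal consistent-hashing sketch, the mechanism many DHTs (e.g. Chord-style systems) use to achieve that minimal disruption: each key is owned by the first node clockwise from its hash on a ring, so adding or removing a node only remaps the keys between that node and its predecessor.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string onto the ring (SHA-1 keyspace)."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Toy DHT ring: adding or removing a node only remaps the keys that
    fall between that node and its predecessor on the ring."""

    def __init__(self, nodes=()):
        self.ring = sorted((_hash(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self.ring, (_hash(node), node))

    def remove(self, node):
        self.ring.remove((_hash(node), node))

    def lookup(self, key):
        """Return the node responsible for key: the first node clockwise."""
        if not self.ring:
            raise LookupError("no nodes in the ring")
        i = bisect.bisect(self.ring, (_hash(key),)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))   # the node currently owning this key
```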
