Unit1 Ddbms Notes PDF
Advantages –
• Since all data is stored at a single location, it is easier to access and coordinate.
• The centralized database has minimal data redundancy, since all data is stored in a single place.
• It is generally cheaper to set up and maintain than the other types of database.
Advantages (distributed database) –
• This database can be easily expanded as data is already spread across different physical locations.
• The distributed database can easily be accessed from different networks.
• This database is more secure in comparison to centralized database.
Disadvantages –
• This database is very costly and difficult to maintain because of its complexity.
• It is difficult to provide a uniform view to the user, since the data is spread across different physical locations.
Centralized database compared with distributed database:
2. Centralized: The data access time in the case of multiple users is higher.
   Distributed: The data access time in the case of multiple users is lower.
3. Centralized: The management, modification, and backup of this database are easier, as the entire data is present at the same location.
   Distributed: The management, modification, and backup of this database are very difficult, as it is spread across different physical locations.
4. Centralized: This database provides a uniform and complete view to the user.
   Distributed: Since it is spread across different locations, it is difficult to provide a uniform view to the user.
5. Centralized: This database has more data consistency in comparison to a distributed database.
   Distributed: This database may have some data replication, so data consistency is lower.
6. Centralized: The users cannot access the database if a database failure occurs.
   Distributed: If one database fails, users still have access to the other databases.
Reliability is basically defined as the probability that a system is running at a certain time whereas
Availability is defined as the probability that the system is continuously available during a time interval.
When the data and DBMS software are distributed over several sites, one site may fail while the other sites
continue to operate. Only the data that exists at the failed site cannot be accessed, which improves both reliability and availability.
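As a rough illustration of why distribution helps here, the sketch below assumes that sites fail independently and that each site is up with the same probability (both are simplifying assumptions, not claims made in these notes). Under those assumptions, a data item replicated at several sites stays reachable as long as at least one of those sites is up. The function name `replicated_availability` is hypothetical.

```python
def replicated_availability(p_site_up: float, n_replicas: int) -> float:
    """Probability that at least one of n independent replicas is reachable."""
    if not 0.0 <= p_site_up <= 1.0:
        raise ValueError("p_site_up must be a probability in [0, 1]")
    if n_replicas < 1:
        raise ValueError("n_replicas must be at least 1")
    # All replicas are down with probability (1 - p)^n; otherwise the data is reachable.
    return 1.0 - (1.0 - p_site_up) ** n_replicas


single = replicated_availability(0.9, 1)   # data stored at one site only
triple = replicated_availability(0.9, 3)   # same data replicated at three sites
print(f"1 replica: {single:.3f}, 3 replicas: {triple:.3f}")
```

With these illustrative numbers, going from one copy to three copies raises the chance of reaching the data noticeably, which is the intuition behind the paragraph above.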
If data is distributed in an efficient manner, then user requests can be met from local data itself, thus providing a faster
response. On the other hand, in centralized systems, all queries have to pass through the central computer for processing, which increases the response time.
• Modular Development
If the system needs to be expanded to new locations or new units, in centralized database systems the action requires
substantial effort and disrupts existing operations. However, in distributed databases, the work simply
requires adding new computers and local data to the new site and finally connecting them to the distributed system, with no interruption to current functions.
The more replicas of a relation there are, the greater the chances that the required data is found where the
transaction is executing. Hence, data replication reduces the movement of data among sites and increases the speed of
processing.
Production databases must be fully managed for regular backups, database optimization, and other common tasks.
With a single large database, these routine tasks can be very difficult to accomplish, if only in terms of the time
window required for completion. Routine table and index optimizations can stretch from hours to days, in some cases
making regular maintenance infeasible. By using the sharding approach, each individual “shard” can be maintained
independently, providing a far more manageable scenario in which such maintenance tasks run in parallel.
The scalability of sharding is apparent and achieved through the distribution of processing across multiple shards and
servers in the network. What is less apparent is the fact that each individual shard database will outperform a single
large database due to its smaller size. By hosting each shard database on its own server, the ratio between memory
and data on disk is properly balanced, thereby reducing disk I/O and maximizing system resources. This results in less
contention, greater join performance, faster index searches, and fewer database locks. Therefore, not only can a
sharded system scale to new levels of capacity, but individual transaction performance improves as well.
Most database sharding implementations take advantage of low-cost open-source databases and commodity
databases. The technique can also take full advantage of reasonably priced “workgroup” versions of many
commercial databases. Sharding works well with commodity multi-core server hardware, systems that are far less
expensive when compared to high-end, multi-CPU servers, and expensive storage area networks (SANs). The overall
reduction in cost due to savings in license fees, software maintenance, and hardware investment is substantial in some cases.
You can distribute your data across multiple nodes using several different system architectures:
Shared Memory
CPUs have access to common memory address space via a fast interconnect.
• Each processor has a global view of all the in-memory data structures.
• Each DBMS instance on a processor has to “know” about the other instances.
Shared Disk
All CPUs can access a single logical disk directly via an interconnect, but each has its own private memory.
• CPUs must send messages to each other to learn about their current state.
Shared Nothing
Each DBMS instance has its own CPU, memory, and disk. Nodes only communicate with each other via a network.
• Hard to increase capacity.
• Hard to ensure consistency.
• Better performance and efficiency.
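The shared-nothing idea can be sketched in a few lines. This is only an illustration under stated assumptions: the classes `Node` and `ShardedStore`, the CRC32-modulo placement rule, and the sample account numbers are all hypothetical and are not taken from any particular DBMS.

```python
import zlib


class Node:
    """One shared-nothing node: it owns its rows and shares no memory or disk."""

    def __init__(self, name):
        self.name = name
        self.rows = {}   # private storage of this node, keyed by record key


class ShardedStore:
    """Routes each key to exactly one node; nodes never touch each other's data."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _node_for(self, key):
        # CRC32 gives the same value on every run, unlike Python's salted hash().
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, row):
        self._node_for(key).rows[key] = row

    def get(self, key):
        return self._node_for(key).rows.get(key)


store = ShardedStore([Node("node-0"), Node("node-1"), Node("node-2")])
for acc_no in ("A101", "A102", "A103", "A104"):
    store.put(acc_no, {"acc_no": acc_no, "balance": 1000})

for node in store.nodes:
    print(node.name, sorted(node.rows))
```

In this sketch the placement of a key depends on the number of nodes, so adding a node would force many rows to move. That is one plausible reading of the "hard to increase capacity" point above.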
4. Review of database
Depending upon the usage requirements, the following types of databases are available in the market −
• Centralised database.
• Distributed database.
• Personal database.
• End-user database.
• Commercial database.
• NoSQL database.
• Operational database.
• Relational database.
• Cloud database.
• Object-oriented database.
• Graph database.
(ii) Distributed Database
In contrast to the centralized database concept, a distributed database combines data from the common database with information captured by local computers. The data is not kept in one place; it is distributed across various sites of an organization. These sites are connected to each other by communication links, which let them access the distributed data easily.
You can think of a distributed database as one in which portions of the database are stored in multiple physical locations, along with application procedures that are replicated and distributed among various points in a network.
There are two kinds of distributed database: homogeneous and heterogeneous. In a homogeneous DDB, all physical locations have the same underlying hardware and run the same operating systems and application procedures. In a heterogeneous DDB, the operating systems, underlying hardware, and application procedures can differ from site to site.
(iii) Personal Database
Data is collected and stored on personal computers, which keeps it small and easily manageable. The data is generally used
by a single department of an organization and is accessed by a small group of people.
(iv) End User Database
The end user is usually not concerned with the transactions or operations performed at various levels and is only aware of
the product, which may be software or an application. This is therefore a shared database designed specifically
for end users, such as managers at different levels. A summary of the whole information is collected in this
database.
(v) Commercial Database
These are paid versions of very large databases, designed for users who want to access the information
for help. These databases are subject specific, and users could not afford to maintain such a large volume of information themselves. Access to
such databases is provided through commercial links.
A cloud database also gives enterprises the opportunity to support business applications in a software-as-a-service
deployment.
(x) Object-Oriented Databases
An object-oriented database combines object-oriented programming with relational database concepts. Items
created using object-oriented programming languages such as C++ and Java can be stored in relational
databases, but object-oriented databases are better suited to them.
An object-oriented database is organized around objects rather than actions, and data rather than logic. For example,
a multimedia record in a relational database can be a definable data object, as opposed to an alphanumeric value.
(xi) Graph Databases
The graph is a collection of nodes and edges where each node is used to represent an entity and each edge describes
the relationship between entities. A graph-oriented database, or graph database, is a type of NoSQL database that uses
graph theory to store, map and query relationships.
Graph databases are basically used for analyzing interconnections. For example, companies might use a graph database
to mine data about customers from social media.
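To make the node-and-edge model concrete, here is a minimal in-memory sketch. It is not a graph database, only an illustration of storing labelled relationships and following them. The class `Graph`, the relationship labels, and the sample names are all hypothetical.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory graph: nodes are strings, edges carry a relationship label."""

    def __init__(self):
        # node -> relationship label -> set of neighbour nodes
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_edge(self, src, label, dst):
        self._edges[src][label].add(dst)

    def neighbours(self, node, label):
        # Use .get so that lookups do not create empty entries as a side effect.
        return set(self._edges.get(node, {}).get(label, set()))

    def two_hops(self, node, label):
        """Nodes two `label` steps away that are neither the start node nor a direct neighbour."""
        direct = self.neighbours(node, label)
        result = set()
        for middle in direct:
            result |= self.neighbours(middle, label)
        return result - direct - {node}


g = Graph()
g.add_edge("alice", "FOLLOWS", "bob")
g.add_edge("bob", "FOLLOWS", "carol")
g.add_edge("bob", "FOLLOWS", "alice")
g.add_edge("alice", "LIKES", "product-42")

print(g.two_hops("alice", "FOLLOWS"))
```

The `two_hops` query is the kind of interconnection analysis mentioned above: it follows relationships rather than joining tables.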
5. Review of Networks:
A system of interconnected computers and computerized peripherals such as printers is called a computer network.
This interconnection facilitates information sharing among the computers. Computers may connect to each
other by either wired or wireless media.
Classification of Computer Networks
Computer networks are classified based on various factors. They include:
• Geographical span
• Inter-connectivity
• Administration
• Architecture
(i) Geographical Span
• It may span your table, among Bluetooth-enabled devices, ranging no more than a few meters.
• It may span a whole building, including intermediate devices to connect all floors.
• It may span a whole city.
• It may span multiple cities or provinces.
• It may be one network covering the whole world.
(ii) Inter-Connectivity
Components of a network can be connected to each other in different ways. By connectedness we mean
logically, physically, or both.
• Every single device can be connected to every other device on the network, making the network a mesh.
• All devices can be connected to a single medium but be geographically disconnected, creating a bus-like
structure.
• Each device is connected to its left and right peers only, creating a linear structure.
• All devices can be connected to a single central device, creating a star-like structure.
• Devices can be connected arbitrarily using a mix of the previous ways, resulting in a hybrid
structure.
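The structures listed above differ mainly in which pairs of devices are linked. The sketch below builds the link sets for three of them and counts the links. The function names and device labels are hypothetical, and the bus is left out because it is a single shared medium rather than a set of pairwise links.

```python
from itertools import combinations


def mesh(devices):
    """Every device linked to every other device."""
    return set(combinations(devices, 2))


def linear(devices):
    """Each device linked only to its left and right peers."""
    return set(zip(devices, devices[1:]))


def star(devices, hub):
    """Every device linked to one central device."""
    return {(hub, d) for d in devices if d != hub}


devices = ["A", "B", "C", "D", "E"]
for name, links in [
    ("mesh", mesh(devices)),
    ("linear", linear(devices)),
    ("star", star(devices, "A")),
]:
    print(f"{name}: {len(links)} links")
```

The mesh needs a link for every pair of devices, so its link count grows much faster with the number of devices than the linear or star layouts.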
(iii) Administration
From an administrator’s point of view, a network can be a private network, which belongs to a single autonomous system
and cannot be accessed outside its physical or logical domain, or a public network, which can be accessed by all.
(iv) Network Architecture
Computer networks can be classified into various types, such as client-server, peer-to-peer, or hybrid,
depending upon their architecture.
• There can be one or more systems acting as Server. The other systems, acting as Clients, send requests to the Server,
which processes those requests on behalf of the Clients.
• Two systems can be connected point-to-point, or in back-to-back fashion. They both reside at the same
level and are called peers.
• There can be a hybrid network that involves the network architecture of both the above types.
Network Applications
Computer systems and peripherals are connected to form a network. They provide numerous advantages:
OSI Model
• Application Layer:
o The application layer is in charge of providing an interface to the application user. This layer contains
protocols that communicate directly with the user.
• Presentation Layer:
o This layer deals with the appearance and format of the data on the end devices.
• Session Layer:
o This layer is responsible for maintaining connections between remote hosts.
o For example, after user/password authentication is complete, the remote host retains the session and does not
request authentication again within that time period.
• Transport Layer:
o The Transport Layer is in charge of end-to-end delivery between hosts.
• Network Layer:
o This layer is in charge of assigning addresses and uniquely addressing hosts in a network.
• Data Link Layer:
o The Data Link Layer is in charge of reading and writing data from and onto the line. At this layer, link
problems are identified.
• Physical Layer:
o This layer describes the hardware, cabling, wiring, power output, pulse rate, and so on.
TCP/IP Model
• Application Layer:
o The application layer specifies the protocols that allow users to communicate with the network. FTP and HTTP
are examples of such protocols.
• Transport Layer:
o The Transport Layer describes how data should move between hosts.
o The Transmission Control Protocol (TCP) is the most important protocol at this layer.
o This layer guarantees that data transferred between hosts arrives in the correct sequence and is in charge of
end-to-end delivery.
• Internet Layer:
o The Internet Protocol (IP) operates on this layer.
o This layer makes host addressing and identification easier.
o This layer is responsible for routing.
• Network Interface Layer:
o This layer provides the mechanism for sending and receiving actual data.
o Unlike its OSI Model equivalent, this layer is independent of the underlying network architecture and
hardware.
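One way to picture how the layers above cooperate is encapsulation: each layer on the sending host wraps the data from the layer above with its own header, and the receiving host strips the headers in the reverse order. The sketch below uses bracketed text tags as stand-ins for headers. The tags, the `LAYERS` list, and the function names are hypothetical and do not reflect real protocol header formats.

```python
LAYERS = ["application", "transport", "internet", "network-interface"]


def encapsulate(payload):
    """Wrap the payload with one tag per layer, top layer first."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def decapsulate(frame):
    """Strip the tags in the reverse order, as the receiving host would."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"expected {header} header")
        frame = frame[len(header):]
    return frame


wire = encapsulate("GET /index.html")
print(wire)
print(decapsulate(wire))
```

Because each function only touches its own tag, one layer's format could change without affecting the others.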
Advantages of Layered Architecture
• Divide-and-conquer method:
o The divide-and-conquer technique divides unmanageable tasks into small, manageable jobs during the
design phase.
o In a nutshell, this technique minimises the complexity of the design.
• Modularity:
o Layered architecture has a higher level of modularity.
o Modularity provides layer independence, making the design easier to comprehend and apply.
• Simple to modify:
o Layer independence allows the implementation of one layer to change without affecting other
layers.
• Simple to test:
o Each layer of the layered architecture may be studied and tested separately.
Distribution transparency is the property of distributed databases by virtue of which the internal details of the
distribution are hidden from the users. The DDBMS designer may choose to fragment tables, replicate the fragments
and store them at different sites. However, since users are unaware of these details, they find the distributed database
as easy to use as any centralized database.
The three dimensions of distribution transparency are −
• Location transparency
• Fragmentation transparency
• Replication transparency
Location Transparency
Location transparency ensures that the user can query any table(s) or fragment(s) of a table as if they were stored
locally at the user’s site. The fact that the table or its fragments are stored at a remote site in the distributed database
system should be completely hidden from the end user. The address of the remote site(s) and the access mechanisms
are completely hidden.
To provide location transparency, the DDBMS should have access to an updated and accurate data dictionary
and DDBMS directory, which contain the details of the locations of data.
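The role of the directory can be sketched as a simple lookup from table name to site. This is a minimal illustration, not how any specific DDBMS implements it. The classes `Site` and `DDBMS`, the site names, and the sample rows are hypothetical.

```python
class Site:
    """A database site holding some tables locally."""

    def __init__(self, name, tables):
        self.name = name
        self.tables = tables   # table name -> list of rows stored at this site


class DDBMS:
    """Answers queries by table name only, using a directory to find the site."""

    def __init__(self, sites):
        self._sites = {s.name: s for s in sites}
        # Directory built from the sites' contents: table name -> site name.
        self._directory = {t: s.name for s in sites for t in s.tables}

    def select_all(self, table):
        """The caller names only the table; the site lookup happens here."""
        site_name = self._directory.get(table)
        if site_name is None:
            raise KeyError(f"unknown table: {table}")
        return list(self._sites[site_name].tables[table])


db = DDBMS([
    Site("pune", {"Account": [("A101", 4000)]}),
    Site("delhi", {"Customer": [("C1", "Asha")]}),
])
print(db.select_all("Customer"))
```

The caller of `select_all` never mentions "pune" or "delhi", which is the essence of location transparency. The sketch also shows why a stale directory would break queries.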
Fragmentation Transparency
Fragmentation transparency enables users to query upon any table as if it were unfragmented. Thus, it hides the fact
that the table the user is querying on is actually a fragment or union of some fragments. It also conceals the fact that
the fragments are located at diverse sites.
This is somewhat similar to users of SQL views, where the user may not know that they are using a view of a table
instead of the table itself.
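The analogy with SQL views can be tried out directly. The sketch below uses Python's built-in `sqlite3` module and keeps both fragments in one in-memory database for simplicity, whereas in a real distributed database they would live at different sites. The fragment table names `Account_Pune` and `Account_Delhi` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two horizontal fragments of Account, kept in one database for simplicity.
    CREATE TABLE Account_Pune  (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT, Type TEXT);
    CREATE TABLE Account_Delhi (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT, Type TEXT);
    INSERT INTO Account_Pune  VALUES ('A101', 40000, 'Pune',  'Savings');
    INSERT INTO Account_Delhi VALUES ('A201', 15000, 'Delhi', 'Current');

    -- The view presents the fragments as one unfragmented table.
    CREATE VIEW Account AS
        SELECT * FROM Account_Pune
        UNION ALL
        SELECT * FROM Account_Delhi;
""")

# The query names only Account; it never mentions the fragments.
rows = conn.execute(
    "SELECT Acc_No, Branch_Name FROM Account WHERE Balance < 50000 ORDER BY Acc_No"
).fetchall()
print(rows)
conn.close()
```

The user-facing query is written exactly as it would be against a single `Account` table.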
Replication Transparency
Replication transparency ensures that replication of databases is hidden from the users. It enables users to query
a table as if only a single copy of the table exists.
Replication transparency is associated with concurrency transparency and failure transparency. Whenever a user
updates a data item, the update is reflected in all the copies of the table. However, this operation should not be visible
to the user. This is concurrency transparency. Also, if a site fails, the user can still proceed with their
queries using replicated copies without any knowledge of the failure. This is failure transparency.
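The behaviour described above can be sketched as "write to every copy, read from any reachable copy". This is a deliberately simplified illustration: the class `ReplicatedTable` and the site names are hypothetical, and recovery of failed sites and concurrent updates are ignored.

```python
class ReplicatedTable:
    """Keeps identical copies of a key-value table at several sites."""

    def __init__(self, site_names):
        self._copies = {name: {} for name in site_names}
        self._down = set()   # sites currently marked as failed

    def fail(self, site):
        self._down.add(site)

    def write(self, key, value):
        # Simplification: update every reachable copy. A real system must also
        # bring failed sites up to date when they recover.
        for name, copy in self._copies.items():
            if name not in self._down:
                copy[key] = value

    def read(self, key):
        # Any reachable copy will do; the caller never learns which one answered.
        for name, copy in self._copies.items():
            if name not in self._down:
                return copy.get(key)
        raise RuntimeError("no replica is reachable")


accounts = ReplicatedTable(["pune", "delhi", "baroda"])
accounts.write("A101", 4000)
accounts.fail("pune")            # one site goes down
print(accounts.read("A101"))     # still answered from another copy
```

The caller uses a single `write` and a single `read`, so neither the number of copies nor the failure of one site is visible to it.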
Combination of Transparencies
In any distributed database system, the designer should ensure that all the stated transparencies are maintained to a
considerable extent. The designer may choose to fragment tables, replicate them and store them at different sites, all
hidden from the end user. However, complete distribution transparency is a tough task and requires considerable
design effort.
What is fragmentation?
• The process of dividing the database into multiple smaller parts is called fragmentation.
Horizontal fragmentation divides a relation (table) horizontally into groups of rows to create subsets
of the table.
Example:
Account (Acc_No, Balance, Branch_Name, Type).
In this example, suppose the Branch_Name column holds the values Pune, Baroda, and Delhi.
Example:
Fragmentation1:
SELECT * FROM Account WHERE Branch_Name = 'Pune' AND Balance < 50000;
Fragmentation2:
SELECT * FROM Account WHERE Branch_Name = 'Delhi' AND Balance < 50000;
Fragmentation1:
SELECT * FROM Account WHERE Branch_Name = 'Baroda' AND Balance < 50000;
Fragmentation2:
SELECT * FROM Account WHERE Branch_Name = 'Delhi' AND Balance < 50000;
• The complete horizontal fragmentation generates a set of horizontal fragments that together include every tuple of the
original relation.
• Completeness is required for reconstruction of the relation, so every tuple must belong to at least one of the partitions.
d) Disjoint horizontal fragmentation
The disjoint horizontal fragmentation generates a set of horizontal fragments in which no two fragments have
tuples in common. That means every tuple of the relation belongs to only one fragment.
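Completeness and disjointness can be checked mechanically. The sketch below uses Python's built-in `sqlite3` module with one selection predicate per fragment, then compares the fragments against the whole relation. The sample rows and the fragment names are hypothetical, and the fragments are defined by branch only so that every row falls into exactly one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT, Type TEXT);
    INSERT INTO Account VALUES
        ('A101', 40000, 'Pune',   'Savings'),
        ('A102', 75000, 'Pune',   'Current'),
        ('A201', 15000, 'Delhi',  'Savings'),
        ('A301', 30000, 'Baroda', 'Savings');
""")

# One selection predicate per horizontal fragment.
predicates = {
    "pune":   "Branch_Name = 'Pune'",
    "delhi":  "Branch_Name = 'Delhi'",
    "baroda": "Branch_Name = 'Baroda'",
}
fragments = {
    name: {row[0] for row in conn.execute(f"SELECT Acc_No FROM Account WHERE {pred}")}
    for name, pred in predicates.items()
}
all_keys = {row[0] for row in conn.execute("SELECT Acc_No FROM Account")}
covered = set().union(*fragments.values())

# Complete: the union of the fragments is the whole relation.
is_complete = covered == all_keys
# Disjoint: no tuple is counted in more than one fragment.
is_disjoint = sum(len(f) for f in fragments.values()) == len(covered)
print(is_complete, is_disjoint)
conn.close()
```

Adding a condition such as `Balance < 50000` to every predicate would leave account A102 outside all fragments, so the fragmentation would no longer be complete.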
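Vertical fragmentation divides a relation (table) vertically into groups of columns to create subsets of the table.
Example:
Fragmentation1:
SELECT Acc_No FROM Account;
Fragmentation2:
SELECT Balance FROM Account;
(For the relation to be reconstructed by a join, each fragment must also carry the key attribute, here Acc_No.)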
• The complete vertical fragmentation generates a set of vertical fragments that together include all the attributes of the
original relation.
• Reconstruction of vertical fragmentation is performed by using Full Outer Join operation on fragments.
• Hybrid fragmentation can be achieved by performing horizontal and vertical partition together.
Fragmentation1:
SELECT Emp_Name FROM Employee WHERE Emp_Age < 40;
Fragmentation2:
SELECT Emp_Id FROM Employee WHERE Emp_Address = 'Pune' AND Salary < 14000;
(Here the relation is assumed to be named Employee.)
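Vertical fragmentation and its reconstruction can also be tried with Python's built-in `sqlite3` module. The sketch keeps the key `Acc_No` in both fragments and rebuilds the relation with a join on that key. It uses a plain `JOIN` rather than a full outer join because not every SQLite version supports the latter; the two give the same result here since every key appears in both fragments. The fragment table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT, Type TEXT);
    INSERT INTO Account VALUES
        ('A101', 40000, 'Pune',  'Savings'),
        ('A201', 15000, 'Delhi', 'Current');

    -- Each vertical fragment keeps the key Acc_No plus a subset of the other columns.
    CREATE TABLE Account_Balance AS SELECT Acc_No, Balance FROM Account;
    CREATE TABLE Account_Branch  AS SELECT Acc_No, Branch_Name, Type FROM Account;
""")

original = conn.execute(
    "SELECT Acc_No, Balance, Branch_Name, Type FROM Account ORDER BY Acc_No"
).fetchall()

# Reconstruction: join the fragments on the shared key.
rebuilt = conn.execute("""
    SELECT b.Acc_No, b.Balance, r.Branch_Name, r.Type
    FROM Account_Balance AS b
    JOIN Account_Branch AS r ON r.Acc_No = b.Acc_No
    ORDER BY b.Acc_No
""").fetchall()

print(rebuilt == original)
conn.close()
```

Without the shared key there would be no reliable way to pair a balance with its branch, which is why the key is repeated in every vertical fragment.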
Database control refers to the task of enforcing regulations so as to provide correct data to authorized users and
applications of a database. So that correct data is available to users, all data should conform to the integrity
constraints defined in the database. In addition, data should be shielded from unauthorized users so as to maintain
the security and privacy of the database. Database control is one of the primary tasks of the database administrator
(DBA).
The three dimensions of database control are −
• Authentication
• Access rights
• Integrity constraints
Authentication
In a distributed database system, authentication is the process through which only legitimate users can gain access to
the data resources.
Authentication can be enforced in two levels −
• Controlling Access to Client Computer − At this level, user access is restricted when logging in to the client
computer that provides the user interface to the database server. The most common method is a
username/password combination. However, more sophisticated methods like biometric authentication may
be used for high-security data.
• Controlling Access to the Database Software − At this level, the database software/administrator assigns
some credentials to the user. The user gains access to the database using these credentials. One of the
methods is to create a login account within the database server.
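A username/password check of the kind described above might look like the following sketch. It is an assumption-laden illustration, not the mechanism of any particular database server: the in-memory `_accounts` table, the function names, and the choice of salted PBKDF2 hashing (from Python's standard `hashlib`) are all choices made here for the example.

```python
import hashlib
import hmac
import os

_ITERATIONS = 100_000
_accounts = {}   # username -> (salt, derived key); stands in for the server's login table


def create_login(username, password):
    """Store a salted hash of the password instead of the password itself."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, _ITERATIONS)
    _accounts[username] = (salt, key)


def authenticate(username, password):
    """Return True only if the user exists and the password matches."""
    record = _accounts.get(username)
    if record is None:
        return False
    salt, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, _ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)


create_login("ABC", "s3cret")
print(authenticate("ABC", "s3cret"), authenticate("ABC", "wrong"))
```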
Access Rights
A user’s access rights refer to the privileges that the user is given regarding DBMS operations, such as the rights to
create a table, drop a table, add/delete/update tuples in a table, or query the table.
In distributed environments, since there are a large number of tables and an even larger number of users, it is not feasible
to assign individual access rights to users. So, the DDBMS defines certain roles. A role is a construct with certain
privileges within a database system. Once the different roles are defined, the individual users are assigned one of
these roles. Often a hierarchy of roles is defined according to the organization’s hierarchy of authority and
responsibility.
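The idea of granting privileges to roles and then assigning users to roles can be modelled in a few lines. This sketch is purely illustrative: the role names, their privilege sets, and the user "XYZ" are hypothetical, and a real DDBMS would enforce this inside the database engine rather than in application code.

```python
# Privileges are attached to roles, not to individual users.
ROLE_PRIVILEGES = {
    "clerk":      {"SELECT"},
    "accountant": {"SELECT", "INSERT", "UPDATE"},
    "dba":        {"SELECT", "INSERT", "UPDATE", "DELETE", "CREATE", "DROP"},
}

# Each user is assigned one role.
USER_ROLE = {"ABC": "accountant", "XYZ": "clerk"}


def is_allowed(user, operation):
    """A user may perform an operation only if their role carries that privilege."""
    role = USER_ROLE.get(user)
    return role is not None and operation in ROLE_PRIVILEGES.get(role, set())


print(is_allowed("ABC", "UPDATE"), is_allowed("XYZ", "UPDATE"))
```

Changing what accountants may do means editing one entry in `ROLE_PRIVILEGES` rather than updating every accountant individually, which is the practical benefit of roles noted above.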
For example, the following SQL statements create a role "Accountant" and then assign this role to user "ABC".