IntroToDbms-Test1Notes
A Database Management System (DBMS) is the software used to manage and interact with
the database, enabling users to store, retrieve, and edit data. The combination of the DBMS
and the data it manages is often referred to as a “database system,” or simply a “database.”
Databases are stored on servers either on-premises at an organization’s office or off-
premises at an organization’s data center (or even within their cloud infrastructure).
Databases come in many formats in order to do different things with various types of data.
Advantages of databases
Computerized databases were first introduced to the world in the 1960s and have since
become the foundation for products, analysis, business processes and more. Many of the
services you use online every day (banking, social media, shopping, email) are all built on
top of databases.
We use databases because they are an extremely efficient way of holding vast amounts of
data and information. Databases around the world store everything from your credit card
transactions to every click you make within one of your social media accounts. Given there
are nearly eight billion people on the planet, that’s a lot of data.
Databases allow access to various services which, in turn, allow you to access your accounts
and perform transactions all across the internet. For example, your bank’s login page will
ping a database to figure out if you’ve entered the right password and username. Your
favorite online shop pings your credit card’s database to pull down the funds needed for you
to buy that item you’ve been eyeing.
Databases allow for easy information updates on a regular basis. Adding a video to your
social media account, directly depositing your salary into your bank account or buying a
plane ticket for your next vacation are all updates made to a database and displayed back to
you almost instantaneously.
Databases Simplify Data Analysis
Databases make research and data analysis much easier because they are highly structured
storage areas of data and information. This means businesses and organizations can easily
analyze databases once they know how a database is structured. Common structures (e.g.
table formats, cell structures like date or currency fields) and common database querying
languages (e.g., SQL) make database analysis easy and efficient.
History of databases
The database as we know it today dates back to the 1960s when the use of computers
became popular. Below are some of the main milestones in the history of databases.
SQL in 1970s:
In 1970, IBM computer scientist Edgar Codd published his paper “A Relational Model of
Data for Large Shared Data Banks.” This paper coined the term “relational database” and
established a new way to store and access data.
Following Codd’s paper, Michael Stonebraker and Eugene Wong at the University of
California, Berkeley created INGRES (Interactive Graphics and Retrieval System). INGRES
was a relational database system that used the QUEL query language. IBM began work on its
own relational database, System R, in 1974; its query language, originally named SEQUEL,
became Structured Query Language (SQL).
RDBMS in 1980s:
Relational databases grew in popularity during the 1980s, and SQL became the standard
language for querying and managing the data. Database Management Systems (DBMSes)
became essential tools for handling data storage, retrieval, and security for multiple users.
Internet in 1990s:
The rise of the internet in the 1990s fueled the next round of growth in the database
industry. The Relational Database Management System (RDBMS) model, designed to
manage the data of a single organization, wasn’t prepared to handle the volume of data that
web applications were generating. Furthermore, with the decline in performance and
increase in maintenance costs, developers looked for a new solution, and found MySQL, an
open-source relational database.
This period also saw the need to organize data more efficiently, leading to advancements in
database architecture and the management of structured and unstructured data.
NoSQL in 2000s:
The term NoSQL (“not only SQL”) was coined in 1998 and originally referred to
databases that did not use SQL as their query language. As the internet continued to
grow, there was a need for a new kind of database that could store unstructured and
semi-structured data. This led to the emergence of NoSQL databases, which became popular due
to their speed and flexibility in handling large amounts of unstructured data.
NoSQL databases support different data models, including document, key-value, graph, and
column-family. They also provide solutions for modern applications that require scalability
and fast access to data.
Today:
In recent years organizations have increasingly been adopting cloud-native and purpose-
built databases. They are moving away from on-premises and legacy databases to cloud-
native databases to improve agility, scalability, and decrease total cost of ownership.
Modern databases now support hybrid cloud computing platforms and integrated data
stores with both structured and unstructured data. These advancements help manage
distributed data across multiple users and systems. They also ensure data security and
compliance.
About SQL
SQL is a query language used to communicate with relational databases. The
American National Standards Institute (ANSI) adopted SQL as a standard in 1986, and it
remains the standard language for relational database management systems. SQL statements
are used to add, remove, modify, and query data, and they can also be used to grant
permissions to users or roles. Popular RDBMSes that use SQL include Oracle, Microsoft SQL
Server, IBM Db2, MySQL, PostgreSQL, Microsoft Access, and Ingres.
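The statement kinds listed above can be sketched with Python's standard-library sqlite3
module. This is a minimal illustration, not part of the original notes: the accounts table
and its columns are hypothetical, and SQLite is used only because it needs no server.
SQLite has no GRANT statement, so permission management is not shown.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# Add rows (INSERT).
conn.executemany(
    "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "Ana", 100.0), (2, "Ben", 50.0), (3, "Cy", 0.0)],
)

# Modify a row (UPDATE).
conn.execute("UPDATE accounts SET balance = balance + 25 WHERE id = ?", (2,))

# Remove a row (DELETE).
conn.execute("DELETE FROM accounts WHERE id = ?", (3,))

# Query the data (SELECT).
rows = conn.execute("SELECT id, owner, balance FROM accounts ORDER BY id").fetchall()
print(rows)
```

The ? placeholders pass values separately from the SQL text, which is the usual way to
avoid building statements by string concatenation.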
Applications of Databases
When used correctly, databases can be a helpful tool for organizations in various industries
looking to better arrange their information. Common use cases include:
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the
information.
o It can provide a clear and logical view of the process that manipulates data.
o It contains automatic backup and recovery procedures.
o It supports the ACID properties (Atomicity, Consistency, Isolation, Durability),
which keep data in a consistent state in case of failure.
o It can reduce the complexity of relationships between data.
o It supports manipulation and processing of data.
o It provides security of data.
o It lets users view the database from different viewpoints according to their
requirements.
Types of Databases
There are many types of databases used today. Below are some of the more prominent
ones.
1. Hierarchical Databases
Hierarchical databases were the earliest form of databases. You can think of these
databases like a simplified family tree. There’s a singular parent object (like a table) that has
child objects (or tables) under it. A parent can have one or many child objects but a child
object only has one parent. The benefit of these databases is that they’re incredibly fast
and efficient, and there’s a clear, threaded relationship from one object to another. The
downside to hierarchical databases is that they’re very rigid: data that does not fit a
single-parent tree, such as a many-to-many relationship, is hard to represent.
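The one-parent, many-children rule can be sketched in a few lines of Python. The Node
class and the sample names below are hypothetical and only model the shape of the data;
a real hierarchical DBMS stores and navigates such trees on disk.

```python
class Node:
    """One record in a hierarchy: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        # A child is created under exactly one parent.
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        # Follow the single parent link up to the root.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Node("Company")
sales = root.add_child("Sales")
east = sales.add_child("East")
west = sales.add_child("West")
print(east.path())
```

Because every node has a single parent, the path from any record to the root is
unambiguous, which is the "clear, threaded relationship" described above.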
2. Relational Databases
Relational databases are perhaps the most popular type of database. Relational databases
are set up to connect their objects (like tables) to each other with keys. For example, there
might be one table with user information (name, username, date of birth, customer
number) and another table with purchase information (customer number, item purchased,
price paid). In this example, the key that creates a relationship between the tables is the
customer number.
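As a minimal sketch of the example above, the two tables can be created and joined on the
customer number using Python's sqlite3 module. The table names, column names, and sample
rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    customer_number INTEGER PRIMARY KEY,
    name TEXT,
    username TEXT,
    date_of_birth TEXT
);
CREATE TABLE purchases (
    customer_number INTEGER,
    item_purchased TEXT,
    price_paid REAL
);
INSERT INTO users VALUES (1, 'Ana Lee', 'ana', '1990-01-01');
INSERT INTO users VALUES (2, 'Ben Roy', 'ben', '1985-06-15');
INSERT INTO purchases VALUES (1, 'Keyboard', 45.0);
INSERT INTO purchases VALUES (1, 'Mouse', 20.0);
INSERT INTO purchases VALUES (2, 'Monitor', 150.0);
""")

# The shared customer_number column is the key that relates the two tables.
purchases_by_user = conn.execute("""
    SELECT u.name, p.item_purchased, p.price_paid
    FROM users AS u
    JOIN purchases AS p ON p.customer_number = u.customer_number
    ORDER BY u.customer_number, p.item_purchased
""").fetchall()
print(purchases_by_user)
```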
3. Non-Relational Databases
Non-relational databases were invented more recently than relational and
hierarchical databases, in response to the growing complexity of web applications. A
non-relational database is any database that doesn’t use the relational model. You might
also see them referred to as NoSQL databases. Non-relational databases store data in
different ways, for example as key-value pairs, as structured documents, or as a graph.
Relational databases are based on a rigid structure, whereas non-relational databases are
more flexible.
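The flexibility mentioned above can be sketched with a toy document store in Python. The
put and get helpers and the sample documents are hypothetical; real document databases add
persistence, indexing, and querying on top of the same idea.

```python
import json

# A toy "document store": each key maps to one JSON document.
store = {}


def put(key, document):
    # Documents are serialized whole; no table schema is declared anywhere.
    store[key] = json.dumps(document)


def get(key):
    return json.loads(store[key])


# Two documents of the same kind with different fields.
put("user:1", {"name": "Ana", "emails": ["ana@example.com"]})
put("user:2", {"name": "Ben", "address": {"city": "Pune"}, "tags": ["vip"]})

city = get("user:2")["address"]["city"]
has_address = "address" in get("user:1")
print(city, has_address)
```

Unlike rows in a relational table, the two documents do not have to share the same set of
fields, so the application has to handle fields that may be absent.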
4. Cloud Databases
Cloud databases refer to information that’s accessible in a hybrid or cloud environment. All
users need is an internet connection to reach their files and manipulate them like any other
database. A convenience of cloud databases is that they don’t require extra hardware to
create more storage space. Users can either build a cloud database themselves or pay for a
service to get started.
5. Centralized Databases
Centralized databases are contained within a single computer or another physical system.
Although users may access data through devices connected within a network, the database
itself operates from one location. This approach may work best for larger companies or
organizations that want to prioritize data security and efficiency.
6. Distributed Databases
Distributed databases run on more than one device. That can be as simple as operating
several computers on the same site, or a network that connects to many devices. An
advantage of this method is that if one computer goes down, the other computers and
devices keep functioning.
7. Object-Oriented Databases
Object-oriented databases perceive data as objects and classes. Objects are specific data —
like names and videos — while classes are groups of objects. Storing data as objects means
users don’t have to distribute data across tables. This makes it easier to determine the
relationships between variables and analyze the data.
8. Graph Databases
Graph databases highlight the relationships between various data points. While users may
have to do extra work to determine trends in other types of databases, graph databases
store relationships right next to the data itself. Users can then immediately see how various
data points are connected to each other.
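A minimal sketch of the idea, assuming a small hypothetical "follows" graph held in a
Python dictionary: because each entry stores its relationships directly, finding out
whether two data points are connected is a simple traversal rather than a series of table
joins.

```python
from collections import deque

# Each key stores its outgoing relationships next to the data point itself.
follows = {
    "ana": ["ben"],
    "ben": ["cy"],
    "cy": [],
    "dee": ["ana"],
}


def is_connected(graph, start, target):
    """Breadth-first search: can target be reached from start?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


print(is_connected(follows, "dee", "cy"))
print(is_connected(follows, "cy", "ana"))
```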
Database Languages
Storage. A DBMS provides efficient data storage and retrieval by ensuring that
data is stored in tables, rows and columns.
Centralized view. A DBMS provides a centralized view of data that multiple users
can access from multiple locations in a controlled manner. A DBMS can limit
what data end users see and how they view the data, providing many views of a
single database schema. End users and software programs are free from having
to understand where the data is physically located or on what type of storage
medium it resides because the DBMS handles all requests.
Data independence. A DBMS offers both logical and physical data independence
to protect users and applications from having to know where data is stored or
from being concerned about changes to the physical structure of data. As long as
programs use the application programming interface (API) for the database that
the DBMS provides, developers won't have to modify programs just because
changes have been made to the database.
Database Schema
A database schema logically describes a part or all of a database by displaying the data
structure in tables, fields, and relationships. You can think of it as a blueprint for
understanding an organization’s data resources.
Within a database management system (DBMS), the term "schema" pertains to the logical
structure or arrangement of data, dictating how it is stored and accessed. "Architecture"
denotes the comprehensive layout and organization of the database. The three-schema
architecture in DBMS segregates the logical and physical aspects of the system, enabling
modifications to one layer without impacting the others. This segregation facilitates the
preservation of data integrity and consistency.
External Layer
Conceptual Layer
Internal Layer
In a DBMS, the External layer offers a logical perspective of the database, serving as the
accessible portion that users interact with. This topmost layer is specifically designed to
provide a user-friendly interface for the database. To illustrate, consider an example of an
Employee Management system. When an employee logs into the system, the External layer
enables the display of the employee’s information.
The Conceptual schema defines the entities stored in the database, their attributes, and
the relationships among them, establishing the overall structure of the database. For
instance, in an employee database, it outlines the columns or attributes of the table. It
serves as a high-level representation of the database. The Conceptual schema is commonly
depicted using the Entity-Relationship Model (ER Model), which employs symbols to visually
represent data elements and relationships specific to a given system. In an ER Model, the
database is portrayed through an ER Diagram. For an Employee Management system, for
example, the ER Diagram illustrates the relationships among the Employee, Department,
Employee’s Role, and Login System.
The internal schema in a database management system (DBMS) refers to the lowest level of
the three-schema architecture. It describes the physical storage structure and organization
of data within the database. The internal schema defines how the data is stored on the
storage media, such as disks or tapes, and how it is accessed by the system. This includes
details like data file formats, indexing techniques, storage allocation methods, and any
physical constraints or optimizations implemented in the database. The internal schema is
primarily concerned with the efficient storage and retrieval of data, and it is hidden from
the users and applications that interact with the database through the higher-level schemas.
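The separation between the three levels can be sketched with SQLite, under the assumption
that a table definition stands in for the conceptual level, a view for the external
level, and an index for the internal level. The employee table, the it_staff view, and the
index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: which tables and columns exist.
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
INSERT INTO employee VALUES (1, 'Ana', 'HR', 5000), (2, 'Ben', 'IT', 4200), (3, 'Cy', 'IT', 4800);
-- External level: what one group of users is shown (no salaries, IT only).
CREATE VIEW it_staff AS SELECT emp_id, name FROM employee WHERE dept = 'IT';
""")

query = "SELECT name FROM it_staff ORDER BY emp_id"
before = conn.execute(query).fetchall()

# Internal level: adding an index changes physical storage only.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after = conn.execute(query).fetchall()

print(before, after)
```

The query against the view is identical before and after the index is created and returns
the same rows, which is the kind of layer independence the three-schema architecture is
meant to provide.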
Components of DBMS
1. Query Processor:
It interprets the requests (queries) received from the end user via an application program
into instructions. It also executes the user request received from the DML compiler.
The Query Processor contains the following components –
DML Compiler: It translates DML statements into low-level instructions
(machine language) so that they can be executed.
DDL Interpreter: It processes DDL statements into a set of tables containing
metadata (data about data).
Embedded DML Pre-compiler: It processes DML statements embedded in an
application program into procedural calls.
Query Optimizer: It chooses an efficient way to carry out the instructions generated by
the DML Compiler before they are executed.
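SQLite lets you look at the plan its query processor settles on through the EXPLAIN QUERY
PLAN statement. The sketch below, with a hypothetical employee table, compares the plan for
the same query before and after an index exists. The wording of the plan text is not
stable between SQLite versions, so it is printed rather than relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ana", "HR"), (2, "Ben", "IT"), (3, "Cy", "IT")],
)


def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


sql = "SELECT name FROM employee WHERE dept = ?"
plan_without_index = plan(sql, ("IT",))

# Same query text, but the query processor now has an index it can use.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_with_index = plan(sql, ("IT",))

print(plan_without_index)
print(plan_with_index)
```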
2. Storage Manager:
Storage Manager is a program that provides an interface between the data stored in the
database and the queries received. It is also known as Database Control System. It
maintains the consistency and integrity of the database by applying the constraints and
executing the DCL statements. It is responsible for updating, storing, deleting, and
retrieving data in the database.
It contains the following components –
Authorization Manager: It enforces role-based access control, i.e., it checks
whether a particular user is privileged to perform the requested operation
or not.
File Manager: It manages the file space and the data structure used to
represent information in the database.
Buffer Manager: It is responsible for cache memory and the transfer of data
between the secondary storage and main memory.
3. Disk Storage:
It contains the following components:
Data Files: They store the data itself.
Data Dictionary: It contains the information about the structure of any
database object. It is the repository of information that governs the metadata.
Indices: They provide faster retrieval of data items.
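In SQLite, the role of the data dictionary is played by the built-in sqlite_master catalog
table, which can be queried like any other table. The sketch below uses a hypothetical
employee table and index; other DBMSes expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")

# sqlite_master records every table and index, i.e. data about the data.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

print(catalog)
print(columns)
```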
Properties of Transaction
Taken together, the ACID properties ensure the correctness and consistency of a database:
each transaction is a group of operations that acts as a single unit, produces consistent
results, runs in isolation from other operations, and makes updates that are durably
stored.
ACID properties are the four key characteristics that define the reliability and consistency
of a transaction in a Database Management System (DBMS). The acronym ACID stands for
Atomicity, Consistency, Isolation, and Durability. Here is a brief description of each of
these properties:
1. Atomicity: Atomicity ensures that a transaction is treated as a single, indivisible
unit of work. Either all the operations within the transaction are completed
successfully, or none of them are. If any part of the transaction fails, the entire
transaction is rolled back and the database returns to the state it was in before
the transaction began, ensuring data consistency and integrity.
2. Consistency: Consistency ensures that a transaction takes the database from
one consistent state to another consistent state. The database is in a consistent
state both before and after the transaction is executed. Constraints, such as
unique keys and foreign keys, must be maintained to ensure data consistency.
3. Isolation: Isolation ensures that multiple transactions can execute concurrently
without interfering with each other. Each transaction must be isolated from
other transactions until it is completed. At the strictest isolation level, this
prevents dirty reads, non-repeatable reads, and phantom reads.
4. Durability: Durability ensures that once a transaction is committed, its changes
are permanent and will survive any subsequent system failures. The
transaction’s changes are saved to the database permanently, and even if the
system crashes, the changes remain intact and can be recovered.
Overall, ACID properties provide a framework for ensuring data consistency, integrity, and
reliability in DBMS. They ensure that transactions are executed in a reliable and consistent
manner, even in the presence of system failures, network issues, or other problems. These
properties make DBMS a reliable and efficient tool for managing data in modern
organizations.
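Atomicity and consistency can be sketched with a money transfer in SQLite. The accounts
table, the CHECK constraint, and the transfer helper are hypothetical. The sketch relies on
the documented behaviour of Python's sqlite3 connection when used as a context manager:
it commits if the block succeeds and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts"))


ok = transfer(1, 2, 30)       # both updates succeed and are committed
failed = transfer(1, 2, 500)  # the debit violates the CHECK constraint
print(ok, failed, balances())
```

In the failed transfer the credit to account 2 has already been applied when the debit is
rejected, so the rollback is what keeps the two balances from drifting apart.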
Advantages of ACID Properties in DBMS
1. Data Consistency: ACID properties ensure that the data remains consistent and
accurate after any transaction execution.
2. Data Integrity: ACID properties maintain the integrity of the data by ensuring
that constraints hold and that committed changes are permanent and cannot be lost.
3. Concurrency Control: ACID properties help to manage multiple transactions
occurring concurrently by preventing interference between them.
4. Recovery: ACID properties ensure that in case of any failure or crash, the
system can recover the data up to the point of failure or crash.
Relational databases are the backbone of many applications and systems in today's digital
world. They provide a structured way to store, organize, and retrieve data. In this article, we
will delve into the fundamental components of a relational database: tables, records, and
fields.
Tables
A table is a collection of related data organized into rows and columns. Each table
represents one kind of entity, and each column describes one attribute of that entity.
For example, a 'Customers' table might include columns for CustomerID, FirstName,
LastName, Email, and PhoneNumber. Each column represents a different attribute of the
customer entity.
Records
Each row in a table is known as a record. A record is a set of related data items that are
grouped together. In the 'Customers' table example, each record would represent a single
customer. The record would include the customer's ID, first name, last name, email, and
phone number.
Fields
A field is a single piece of data within a record. In the 'Customers' table, the 'FirstName' field
of a record would contain the first name of a specific customer. Each field in a table is
associated with a specific data type, such as integer, text, date/time, etc., which determines
what kind of data it can store.
Relationships
Relational databases get their name from the fact that they allow relationships to be
established between different tables. These relationships are based on the use of keys.
A primary key is a unique identifier for a record in a table. For example, in the 'Customers'
table, 'CustomerID' could be the primary key. A primary key ensures that each record in the
table is unique.
A foreign key is a field (or collection of fields) in one table that refers to the primary
key of a record in another table. The table containing the foreign key is called the child
table, and the table containing the referenced key is called the referenced or parent table.
For example, in an 'Orders' table, there might be a 'CustomerID' field that acts as a foreign
key linking each order to a specific customer in the 'Customers' table. This allows for a
relationship to be established between the 'Customers' and 'Orders' tables, where each
customer can have multiple associated orders.
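The Customers and Orders relationship can be sketched in SQLite as follows. The OrderID and
Item columns and the add_order helper are assumptions added for illustration. SQLite
enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the
connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    FirstName TEXT, LastName TEXT, Email TEXT, PhoneNumber TEXT
);
CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customers (CustomerID),
    Item TEXT
);
INSERT INTO Customers VALUES (1, 'Ana', 'Lee', 'ana@example.com', '555-0100');
INSERT INTO Orders VALUES (10, 1, 'Keyboard');
INSERT INTO Orders VALUES (11, 1, 'Mouse');
""")


def add_order(order_id, customer_id, item):
    """Insert an order; return False if the foreign key check rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", (order_id, customer_id, item))
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = add_order(12, 99, "Monitor")  # there is no customer 99
order_count = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE CustomerID = 1"
).fetchone()[0]
print(orphan_accepted, order_count)
```

One customer row is linked to several order rows, while an order that points at a
non-existent customer is refused by the database itself.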
In conclusion, understanding the concepts of tables, records, and fields, and how they
interact, is fundamental to working with relational databases. These components provide
the structure that allows data to be efficiently stored, organized, and retrieved in a
relational database.
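What is a Primary key in SQL?
A primary key is a column (or set of columns) whose value uniquely identifies each row in
a table. The primary key column holds unique values and doesn’t store repeating values. A
primary key can never take NULL values.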
For example, in the case of a student when identification needs to be done in the class, the
roll number of the student plays the role of Primary key.
Similarly, when we talk about employees in a company, the employee ID is functioning as
the Primary key for identification.
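The two rules, unique and never NULL, can be checked with a small SQLite sketch based on the
roll number example. The students table and the try_insert helper are hypothetical. NOT
NULL is written out explicitly because SQLite, unlike the SQL standard, otherwise tolerates
NULL in primary key columns that are not of type INTEGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students VALUES ('R001', 'Ana')")
conn.commit()


def try_insert(roll_no, name):
    """Attempt an insert and report whether the primary key rules allowed it."""
    try:
        with conn:
            conn.execute("INSERT INTO students VALUES (?, ?)", (roll_no, name))
        return "inserted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("R002", "Ben"),  # new roll number
    try_insert("R001", "Cy"),   # duplicate roll number
    try_insert(None, "Dee"),    # missing roll number (NULL)
]
print(results)
```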
What is a Foreign key in SQL?
A Foreign key is useful when we connect two or more tables so that data from both can
be used together.
A foreign key is a field or collection of fields in a table that refers to the Primary key of the
other table. It is responsible for managing the relationship between the tables.
The table which contains the foreign key is often called the child table, and the table whose
primary key is being referred by the foreign key is called the Parent Table.
For example: When we talk about students and the courses they have enrolled in, if we
try to store all the data in a single table, the problem of redundancy arises.
To solve this problem, we make two tables: the student detail table and the
department table. In the student table, we store the details of students and the courses
they have enrolled in.
In the department table, we store all the details of the department. Here the courseId
acts as the Primary key for the department table, whereas it acts as the Foreign key in the
student table.