Chapter 1

Uploaded by

Mihai
Copyright
© All Rights Reserved

Database class

Chapter 1: Introduction to Database Concepts

We build together 1
Agenda

1.1 The purpose and definition of a database


1.2 The advantages of a database
1.3 Introduction to the table concepts: columns, rows, primary key, candidate key, foreign key, indexes

1.1 The purpose and definition of a database

Definition of a Database
A database is a structured collection of data organized and stored electronically in a computer system. In a relational
database, the data is held in one or more tables, each made up of rows and columns: each row represents a record or
entity, and each column represents one attribute or characteristic of that record.

Purpose of a Database
The primary purpose of a database is to provide an efficient and reliable method for storing, managing, and retrieving
data. It serves as a centralized repository where information can be stored, organized, and accessed as needed by users
or applications.

Key Components of a Database


Data Storage:
A database stores data in a structured format, allowing it to be easily accessed, updated, and managed. Data is typically
organized into tables, with each table representing a specific type of entity or concept (e.g., customers, orders,
products).
Data Management:
Databases provide mechanisms for managing data integrity, security, and concurrency. Data integrity ensures
that the data remains accurate and consistent, while security features such as authentication, authorization, and
encryption protect it from unauthorized access or modification. Concurrency control mechanisms enable
multiple users or applications to access and manipulate data simultaneously without conflicting with each
other.
Data Retrieval:
Databases support efficient data retrieval through query languages such as SQL (Structured Query Language).
Users can write queries to retrieve specific data based on criteria such as filters, sorting, and aggregation.
Indexes and other optimization techniques are used to enhance query performance and speed up data retrieval
operations.
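
As a rough illustration of filtering, sorting, and aggregation in SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The orders table, its columns, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database; the "orders" table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ana", 120.0), ("Ana", 30.0), ("Dan", 75.5), ("Ion", 10.0)],
)

# Filter (WHERE), aggregate (SUM with GROUP BY), and sort (ORDER BY) in one query.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 20
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
```

The WHERE clause drops the small order before the sums are computed, so only customers with at least one order of 20 or more appear in the result.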

Types of Databases
Relational Databases:
Relational databases organize data into tables with predefined relationships between them. They use structured
query language (SQL) to manipulate and retrieve data. Examples include MySQL, PostgreSQL, Oracle
Database, and Microsoft SQL Server.
NoSQL Databases:
NoSQL databases are designed to handle large volumes of unstructured or semi-structured data. They provide
flexible data models and scalable architectures suitable for modern applications with diverse data
requirements. Examples include MongoDB, Cassandra, Redis, and Couchbase.
NewSQL Databases:
NewSQL databases combine the benefits of traditional relational databases with the scalability and
performance characteristics of NoSQL databases. They aim to address the limitations of traditional databases
in handling distributed architectures and large-scale data processing. Examples include Google Spanner,
CockroachDB, and VoltDB.


Conclusion
In summary, a database is a vital component of modern information systems, providing a centralized repository
for storing, managing, and retrieving data. Its purpose is to facilitate efficient data storage, organization, and
access, enabling users and applications to make informed decisions and perform critical business operations.

1.2. The advantages of a database

Databases offer numerous advantages over traditional methods of data management. Understanding these advantages
is crucial for appreciating the importance of adopting a database system for efficient data handling.
Below are some key advantages:
Data Organization and Structured Storage
Databases provide a structured approach to organizing and storing data. Unlike flat file systems or spreadsheets,
where data is often stored in an unstructured or semi-structured format, databases enforce a predefined schema that
specifies the structure of the data. This structured storage ensures consistency and coherence in data representation,
making it easier to manage and retrieve information.
Data Integrity and Consistency
One of the fundamental principles of databases is maintaining data integrity and consistency. Databases enforce
integrity constraints and validation rules to ensure that data remains accurate and reliable. Common integrity
constraints include primary key constraints, foreign key constraints, unique constraints, and check constraints. By
adhering to these constraints, databases prevent data anomalies such as duplicate records, inconsistent data, and
referential integrity violations.

Data Security and Access Control
Databases offer robust security features to protect sensitive information from unauthorized access, modification,
or disclosure. Authentication mechanisms such as username/password login and multi-factor authentication (MFA)
verify who is connecting, so that only authorized users can reach the database. Access control (authorization)
mechanisms such as role-based access control (RBAC) then let administrators define user roles and privileges,
restricting access to specific data based on user permissions. Encrypting data at rest and in transit further
protects it against unauthorized reading, interception, or tampering.
Concurrent Access and Transaction Management
Databases support concurrent access by multiple users or applications, allowing them to simultaneously read
and modify data without conflicts.

Transaction management ensures data consistency and isolation by grouping database operations into atomic,
consistent, isolated, and durable (ACID) transactions. ACID properties guarantee that transactions are executed
reliably and maintain data integrity, even in the event of system failures or crashes. Databases use locking
mechanisms, concurrency control algorithms, and transaction isolation levels to manage concurrent access and
ensure data consistency.
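
To make atomicity more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the transfer helper are hypothetical. The point is only that a failed transaction leaves no partial changes behind.

```python
import sqlite3

# Hypothetical "accounts" table used only to illustrate atomicity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so the credit was undone as well


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
print(ok, failed, balances(conn))
```

In the second call the credit to bob is applied first and then undone when the debit violates the CHECK constraint, so the balances stay as they were after the first transfer.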

Scalability and Performance Optimization
Databases are designed to scale efficiently to accommodate growing volumes of data and increasing user loads.
They employ various scalability techniques such as sharding, replication, partitioning, and clustering to
distribute data across multiple nodes and handle large-scale deployments. Performance optimization techniques
such as query optimization, indexing, caching, and database tuning improve the efficiency and responsiveness of
database operations and help keep performance acceptable under heavy workloads.
Data Analysis and Decision Support
Databases serve as a valuable resource for data analysis, reporting, and decision support. They provide tools and
techniques for extracting, transforming, and analyzing data to derive meaningful insights and support informed
decision-making. Business intelligence (BI) tools, data analytics platforms, and reporting solutions integrate
with databases to visualize data, generate reports, and perform advanced analytics tasks. Databases enable
organizations to leverage their data assets effectively and gain a competitive advantage through data-driven
decision-making.

Conclusion

In conclusion, databases offer a wide range of advantages that make them indispensable for modern data
management. From organizing and securing data to supporting concurrent access and enabling data
analysis, databases play a crucial role in facilitating efficient and effective data handling across diverse
applications and industries.

1.3. Introduction to the table concepts

1.3. Introduction to table concepts – tables, columns, rows
Tables
A table is the database counterpart of a single worksheet in Microsoft Excel. It can also be thought of as a
standalone dataset. Tables are used to keep closely related data together. A very basic example is a table
about people that contains their names, job titles, manager numbers, hiring dates, salaries, and commissions.

This information is stored in a column and row format. Rows and columns are the very foundation of a table.
Columns store the different pieces of information about one person, while rows store information about
different people. Paired together, they form a table full of information.
Columns
Columns are used to differentiate the information we have on a single
observable entity.

In a table that contains information about people, each column holds a different piece of information. If a
table, as mentioned above, contains people’s names, job titles, manager numbers, hiring dates, salaries, and
commissions, then it will have 6 columns plus a Primary Key column, which is discussed in later sections.
Each column can be set up to accept only a specific type of information, which provides much-needed data
integrity. For example, a salary column should only contain numbers. The people operating the database are
human, however, and can accidentally enter something else. To prevent this, a column can be defined so that
only a specific type of value can be entered.
The same goes for an email column: anything that does not end in the expected ‘@abc.com’ should not be
allowed in that column.
Columns can be customized extensively, using both built-in data types and constraints and custom rules.
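
A minimal sketch of such column rules, using Python's built-in sqlite3 module. The people table, the sample values, and the try_insert helper are hypothetical, and the email rule is a simple pattern check rather than a full validator. SQLite is loosely typed by default, so the numeric rule is spelled out as a CHECK constraint here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "people" table; the CHECK rules are illustrative, not a full email validator.
conn.execute("""
    CREATE TABLE people (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL NOT NULL CHECK (typeof(salary) IN ('integer', 'real') AND salary >= 0),
        email  TEXT CHECK (email LIKE '%_@abc.com')
    )
""")


def try_insert(name, salary, email):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on failure
            conn.execute(
                "INSERT INTO people (name, salary, email) VALUES (?, ?, ?)",
                (name, salary, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid_row = try_insert("John Doe", 3000, "john@abc.com")   # accepted
text_salary = try_insert("Jane Roe", "a lot", "jane@abc.com")  # salary is not a number
bad_email = try_insert("Jim Poe", 2500, "jim-at-abc")       # email does not end in @abc.com
print(valid_row, text_salary, bad_email)
```

Other database systems enforce column types more strictly on their own, so the typeof check is specific to this SQLite sketch.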

Rows
Each row of a table represents one observable entity, so the number of rows is the number of entities we are
looking at. To put it simply, if the people table has 3 rows, it holds the data of 3 different people. Each row
represents an individual person, and the columns display that person’s information.
Rows allow us to see individual entries in the table. Each row also contains a Primary Key value that allows us
to search for individual entries with ease.
Keys
Keys allow unique identification for all rows in the table. Without keys
there would be no way to differentiate between entries that have
identical information in their columns. Two people in a table can have
the same names and birthdays and without a unique key, it will be hard
to differentiate between them and can lead to unnecessary confusion.
Suppose you’re an HR person who has to send a termination letter to a guy named John Doe and a
promotion letter to another person with the same name. Imagine the two get mixed up and both receive the
termination letter, or both the promotion letter. Talk about a corporate nightmare, right?

1.3. Introduction to table concepts – keys
Types of Keys in Relational Model (Candidate, Super, Primary, Alternate and Foreign)

What is Primary Key?

Definition
A primary key is a single attribute or a collection of attributes that uniquely identifies each record in the
table. (An attribute is a characteristic of an entity, stored as a column of the table.) In general, we can
find the required record (or row) in the table with the help of its primary key.
The primary key is a minimal super key, that is, a candidate key, which the designer has chosen as the main
identifier. A table may have several candidate keys, but only one of them is designated as the primary key.
The primary key is unique and not null. This means that the attribute selected as the primary key must have a
different value for every record and must never hold a null value.

Example
To understand the primary key with an example, consider an Employee table.

The table has 4 attributes: Employee_Id, Employee_Name, Employee_Designation, and Employee_Salary. Two
employees can have the same name, a designation is not unique either, and salaries can also be the same. In
this table, only Employee_Id can be used as the primary key.
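
The following sketch recreates this idea with Python's built-in sqlite3 module. The column names follow the text, but the rows are made-up sample data, not the contents of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the text; the rows are made-up sample data.
conn.execute("""
    CREATE TABLE employee (
        employee_id          INTEGER PRIMARY KEY,
        employee_name        TEXT NOT NULL,
        employee_designation TEXT,
        employee_salary      REAL
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", "Analyst", 4000.0),
        (2, "John Doe", "Analyst", 4000.0),  # same name, title, and salary is allowed
    ],
)

# Reusing an existing employee_id violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Mary Major', 'Manager', 6000.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key pinpoints exactly one of the two otherwise identical rows.
row = conn.execute("SELECT * FROM employee WHERE employee_id = 2").fetchone()
print(duplicate_rejected, row)
```

Two employees who share every other attribute remain distinguishable because their employee_id values differ.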

What is Candidate Key?
Definition
A candidate key is a single attribute (single column) or a collection of attributes that uniquely identifies
records in the table. This is where confusion between the primary key and the candidate key arises. A table can
have more than one candidate key, but only one of them is chosen as the primary key. The remaining ones are
called alternate keys.

A candidate key does not allow the same value to appear twice, since by definition it uniquely identifies the
record (or tuple). In strict relational theory a candidate key also contains no null values. In practice, most SQL
systems accept nulls in a column declared UNIQUE, but such a column cannot serve as the primary key. Also, unlike
super keys in general, candidate keys cannot contain redundant attributes. In other words, a candidate key
contains the minimum set of fields needed for uniqueness.

Example
A Student Records table illustrates the purpose of the candidate key.

The table contains 4 attributes: Id No, Name, Email-id, and Department of Student. Among these, two attributes,
Id No. and Email-id, can be considered candidate keys, because each of them can uniquely identify records or
rows in the table.
Note that the email-id of Sachin is not present. In SQL practice the column can still be declared unique and
treated as a candidate key, but one of the two attributes has to become the primary key of the table. Here
Id No. becomes the primary key, because Email-id contains a null value and a primary key does not allow null
values.
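
A small sketch of this situation with Python's built-in sqlite3 module. Only the name Sachin comes from the text; the other rows, the departments, and the insert_ok helper are hypothetical. It assumes SQLite's behaviour, shared by most SQL systems, of allowing nulls in a UNIQUE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id_no is chosen as the primary key; email is the other candidate, declared UNIQUE.
conn.execute("""
    CREATE TABLE student (
        id_no      INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        email      TEXT UNIQUE,
        department TEXT
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Sachin', NULL, 'Physics')")  # missing email is accepted
conn.execute("INSERT INTO student VALUES (2, 'Asha', 'asha@example.com', 'Math')")


def insert_ok(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


dup_email = insert_ok((3, "Ravi", "asha@example.com", "Math"))  # duplicate email is rejected
second_null = insert_ok((4, "Mira", None, "Biology"))           # another missing email is accepted
print(dup_email, second_null)
```

Because email can be missing for several students, it cannot identify every row, which is why id_no is the one declared as the primary key.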

Differences Between Primary Key and Candidate Key

Primary Key: Uniquely identifies a record or row in the table. There is exactly one primary key per table.
Candidate Key: Also uniquely identifies a record or row in the table. A table can have one or many candidate keys.

Primary Key: Is a minimal super key, namely the candidate key chosen as the main identifier.
Candidate Key: Is any minimal super key, whether or not it is chosen as the primary key.

Primary Key: The attribute selected as the primary key always has unique and non-null values.
Candidate Key: The attribute selected as a candidate key always has unique values. In SQL it may contain null
values unless the attribute constraint is specified as not null.

Primary Key: Is chosen from the set of candidate keys.
Candidate Key: Is identified from the set of super keys (those with no redundant attributes).

Primary Key: A poor choice of primary key can lead to unsuccessful relations between tables and eventually
complicates the whole database schema.
Candidate Key: Involves no such choice. The candidate keys follow from the data itself.

Primary Key: When a relationship between two tables is created, the primary key of one table becomes a
foreign key in the other table.
Candidate Key: Does not play much of a role in creating relationships between tables.

What is a Foreign Key?
Definition
A foreign key is generally used to establish the relationships
between tables in the RDBMS. That is, it is used to navigate
between two tables and maintain referential integrity. The primary
key of one table acts as a foreign key for another and helps us
to cross-reference two or more tables.
To ensure the links between a foreign key and primary keys aren't
broken, foreign key constraints are created to prevent actions that
would damage the links between the two tables and prevent
erroneous data from being added to the foreign key column.
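
The sketch below shows a foreign key constraint blocking both kinds of damage, using Python's built-in sqlite3 module. The department and employee tables and the rejected helper are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'HR')")
conn.execute("INSERT INTO employee VALUES (1, 'John Doe', 10)")  # valid reference


def rejected(sql):
    """Run a statement and report whether a constraint blocked it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


orphan_blocked = rejected("INSERT INTO employee VALUES (2, 'Jane Roe', 99)")  # no department 99
delete_blocked = rejected("DELETE FROM department WHERE dept_id = 10")        # still referenced
print(orphan_blocked, delete_blocked)
```

The first statement would add erroneous data to the foreign key column, and the second would break an existing link. The constraint refuses both.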

Importance of foreign keys
Foreign keys are important for several reasons, including the following:
• Streamline data sets. With foreign keys, database administrators don't have to store repeated data in
multiple tables. They make data available to different tables without creating redundant data sets. In other
words, they act as a cross-referencing system among tables.
• Promote efficiency. Primary and foreign keys work together to create structure in relational databases,
which helps the database sort, search and query related data faster.
• Ensure data integrity. Primary and foreign key relationships also help maintain the data integrity of
relational databases. They ensure that every foreign key value matches an existing value in the primary
(referenced) table, and that the reference stays valid even when rows in the primary table are changed or
deleted.

1.3. Introduction to table concepts – index
Understanding Database Indexes
In simple terms, an index is a pointer to data in a table. The pointer helps the storage engine
locate data with reduced latency. In other words, an index is like a supercharged table of contents in
a book. Just as a table of contents helps you quickly find where certain topics are discussed in a
book, an index in a database helps the database system quickly find where specific data is stored in a
table.
How Indexes Work
When you create an index on a column or set of columns in a database table, the database system
creates a separate data structure that sorts and stores the values of those columns. This data
structure is like a quick-reference guide that allows the database to jump directly to the relevant rows
in the table without having to search through every row.
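
One way to observe this is to ask the database how it plans to run a query before and after an index exists. The sketch below does so with Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The people table, its data, and the index name are hypothetical, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and generated sample rows.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [(f"person{i}", f"city{i % 50}") for i in range(1000)],
)


def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the readable detail


query = "SELECT * FROM people WHERE city = 'city7'"
plan_before = plan(query)  # without an index: a scan over the whole table
conn.execute("CREATE INDEX idx_people_city ON people (city)")
plan_after = plan(query)   # with the index: a search that uses idx_people_city
print(plan_before)
print(plan_after)
```

The first plan reports a scan of the table. The second reports a search through idx_people_city, which is the jump to the relevant rows that the text describes.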

Benefits of Indexes
• Faster Data Retrieval: Indexes speed up data retrieval by allowing the database to quickly locate
rows that match certain criteria in queries. Instead of scanning every row in a table, the database
can use the index to narrow down the search and locate the desired rows much faster.
• Improved Query Performance: Queries that involve columns with indexes can often be executed
more efficiently, resulting in faster query execution times. This is especially beneficial for tables with
large amounts of data, as indexes can significantly reduce the time it takes to process queries.

Common Types of Indexes


There are several types of indexes commonly used in databases, including:
• Single-Column Index: An index created on a single column.
• Composite Index: An index created on multiple columns. Composite indexes can be useful for
queries that involve multiple columns in the WHERE clause.

• Unique Index: An index that enforces uniqueness on the indexed columns, preventing duplicate
values from being inserted into the table.
• Clustered Index: In some database systems, such as SQL Server (by default) and MySQL with the InnoDB
storage engine, the primary key constraint automatically creates a clustered index. In a clustered index,
the physical order of rows in the table matches the order of the index, which can further improve query
performance for certain types of queries. PostgreSQL does not do this: its primary key index is an
ordinary index and the table is not kept in index order.
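
As a minimal sketch, the first three index types can be created as follows with Python's built-in sqlite3 module. The users table, its columns, and the index names are hypothetical. Clustered indexes are left out because they depend on the database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, columns, and index names.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, last_name TEXT, first_name TEXT)"
)
conn.execute("CREATE INDEX idx_users_last ON users (last_name)")                   # single-column
conn.execute("CREATE INDEX idx_users_full_name ON users (last_name, first_name)")  # composite
conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email)")               # unique

conn.execute(
    "INSERT INTO users (email, last_name, first_name) VALUES ('a@example.com', 'Doe', 'John')"
)
# A second row with the same email violates the unique index and is rejected.
try:
    conn.execute(
        "INSERT INTO users (email, last_name, first_name) VALUES ('a@example.com', 'Roe', 'Jane')"
    )
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

# List the indexes SQLite now keeps for the table (column 1 of each row is the index name).
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(users)"))
print(duplicate_email_rejected, index_names)
```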

Considerations When Using Indexes
While indexes offer significant performance benefits, there are some trade-offs to consider:
• Storage Overhead: Indexes consume additional storage space in the database. This is because the
index data structure needs to be maintained alongside the table data. As a result, creating indexes
on every column may not be practical or efficient, especially for tables with a large number of
columns.
• Impact on Write Operations: While indexes speed up read operations such as SELECT queries,
they can also slightly slow down write operations such as INSERT, UPDATE, and DELETE queries.
This is because the database may need to update the index data structure whenever data in the
indexed columns is modified.


[Figure: searching for the key 120 in a given B-Tree, with the solution]
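
Many database indexes are stored as B-trees, and a lookup walks from the root down to one leaf. The sketch below shows such a search in Python. The tree is a hypothetical one built by hand, not the tree from the original figure, and the Node class and search function are illustrative names only.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted keys stored in this node
    children: list = field(default_factory=list)  # empty for leaf nodes


def search(node, key, path=None):
    """Return the keys of each node visited if key is found, otherwise None."""
    path = (path or []) + [node.keys]
    i = bisect_left(node.keys, key)  # first position whose key is >= the target
    if i < len(node.keys) and node.keys[i] == key:
        return path                  # found in this node
    if not node.children:
        return None                  # reached a leaf without finding the key
    return search(node.children[i], key, path)  # descend into the i-th subtree


# Hypothetical tree, not the one from the original figure.
root = Node(
    keys=[100],
    children=[
        Node(keys=[35, 65], children=[Node([10, 20]), Node([40, 50]), Node([70, 80, 90])]),
        Node(keys=[130, 180], children=[Node([110, 120]), Node([140, 160]), Node([190, 240, 260])]),
    ],
)
path_to_120 = search(root, 120)
print(path_to_120)
```

At each node the search compares the target with the stored keys and follows a single child pointer, so only three nodes are visited here instead of every key in the tree.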

Conclusion
In summary, database indexes are powerful tools for improving query performance and speeding up
data retrieval in a database. By creating indexes on columns that are frequently queried or used as
search criteria, you can optimize the performance of your database and provide a better user
experience for applications that rely on it. However, it's essential to consider the trade-offs and
carefully choose which columns to index to avoid unnecessary storage overhead and slower write
operations.

1.3. Introduction to table concepts - Database Normalization
Database normalization is a process used in designing databases to organize data efficiently and reduce redundancy.
It helps ensure that data is stored logically and prevents certain types of data anomalies, such as insertion, deletion,
and update anomalies.

By organizing the data this way, we can:


1. Reduce redundancy: Each piece of data is stored in only one place, making the database more efficient and easier
to maintain.
2. Ensure data consistency: Since data is stored in a structured way, it's less likely to become inconsistent or
contradictory.
3. Improve data integrity: By defining relationships between tables, you can enforce rules that ensure the data
remains accurate and reliable.

Overall, normalization is about structuring data in a way that makes it easier to manage, query, and update, while also
reducing the risk of errors and inconsistencies.

First Normal Form (1NF):
In the first normal form, each column in a table must contain atomic values, meaning that each value in a
column should be indivisible.
Example: Consider a table storing information about books. Without following 1NF, you might have a table
like this:

In this example, the ISBN column contains multiple values separated by commas, violating 1NF because the
ISBN is not atomic. To bring it into 1NF, you would split the ISBNs into separate rows:

Each row now contains only atomic values in the ISBN column, satisfying the requirements of 1NF.
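
The split into separate rows can be sketched with Python's built-in sqlite3 module. The table names, book titles, and ISBN-like strings below are hypothetical placeholders, not the data from the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical un-normalized table: several ISBN-like values packed into one text column.
conn.execute("CREATE TABLE books_raw (title TEXT, isbns TEXT)")
conn.executemany(
    "INSERT INTO books_raw VALUES (?, ?)",
    [("Book A", "111-1, 111-2"), ("Book B", "222-1")],
)

# 1NF version: one atomic ISBN value per row.
conn.execute(
    "CREATE TABLE books_1nf (title TEXT NOT NULL, isbn TEXT NOT NULL, PRIMARY KEY (title, isbn))"
)
for title, isbns in conn.execute("SELECT title, isbns FROM books_raw").fetchall():
    for isbn in isbns.split(","):
        conn.execute("INSERT INTO books_1nf VALUES (?, ?)", (title, isbn.strip()))

rows_1nf = conn.execute("SELECT title, isbn FROM books_1nf ORDER BY title, isbn").fetchall()
print(rows_1nf)
```

Once each ISBN sits in its own row, a query can match a single ISBN with a plain equality test instead of searching inside a comma-separated string.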

Second Normal Form (2NF):
In the second normal form, a table must first be in 1NF, and all non-key attributes must be fully functionally
dependent on the entire primary key.
Example: Let's expand our book example. Suppose we have a table with information about books and the authors:

In this table, the Author Name depends only on the Author ID, not on the Book ID. Assuming the primary key is
the combination (Book ID, Author ID), Author Name depends on just one part of the key. This partial
dependency violates 2NF. To bring the table into 2NF, we split it into two tables, Books and Authors.

Now, the Author Name is fully dependent on the Author ID, the key of the Authors table, satisfying 2NF.
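
A minimal sketch of this split with Python's built-in sqlite3 module. It assumes the starting table is keyed by the pair (book_id, author_id), since a partial dependency needs a composite key. The table layouts, names, and rows are hypothetical rather than taken from the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed starting point: key is (book_id, author_id); author_name depends on author_id alone.
conn.execute("""
    CREATE TABLE book_authors_raw (
        book_id INTEGER, author_id INTEGER, author_name TEXT,
        PRIMARY KEY (book_id, author_id)
    )
""")
conn.executemany(
    "INSERT INTO book_authors_raw VALUES (?, ?, ?)",
    [(1, 7, "A. Writer"), (2, 7, "A. Writer"), (2, 8, "B. Author")],
)

# 2NF split: author_name moves to a table keyed by author_id only.
conn.execute("CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE books (
        book_id   INTEGER,
        author_id INTEGER REFERENCES authors (author_id),
        PRIMARY KEY (book_id, author_id)
    )
""")
conn.execute("INSERT INTO authors SELECT DISTINCT author_id, author_name FROM book_authors_raw")
conn.execute("INSERT INTO books SELECT book_id, author_id FROM book_authors_raw")

authors = conn.execute("SELECT author_id, author_name FROM authors ORDER BY author_id").fetchall()
links = conn.execute("SELECT book_id, author_id FROM books ORDER BY book_id, author_id").fetchall()
print(authors, links)
```

The author name that appeared twice in the starting table is now stored once, in the row for author 7.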

Third Normal Form (3NF):
In the third normal form, a table must first be in 2NF, and it must not have any transitive dependencies.
Example: Extending our book example, suppose we add information about the publishers:

In this table, Publisher Name depends on Publisher ID, not on the Book ID. This is a transitive dependency
because Publisher Name is dependent on Publisher ID, which is dependent on Book ID.
To bring it into 3NF, we split it into three tables:

Now, the Publisher Name is no longer transitively dependent on the Book ID, satisfying 3NF.
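
The practical benefit can be sketched with Python's built-in sqlite3 module. Only the books and publishers tables are shown, and their columns, names, and rows are hypothetical. The publisher name is stored once, so changing it is a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical 3NF layout: publisher_name lives only in the publishers table.
conn.execute(
    "CREATE TABLE publishers (publisher_id INTEGER PRIMARY KEY, publisher_name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE books (
        book_id      INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        publisher_id INTEGER REFERENCES publishers (publisher_id)
    )
""")
conn.execute("INSERT INTO publishers VALUES (1, 'Old Press')")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [(1, "Book A", 1), (2, "Book B", 1)])

# Renaming the publisher touches one row, yet every book sees the new name through the join.
conn.execute("UPDATE publishers SET publisher_name = 'New Press' WHERE publisher_id = 1")
listing = conn.execute("""
    SELECT b.title, p.publisher_name
    FROM books AS b JOIN publishers AS p ON p.publisher_id = b.publisher_id
    ORDER BY b.book_id
""").fetchall()
print(listing)
```

With the publisher name repeated in every book row, the same rename would have to touch each of those rows, and missing one would leave the data inconsistent.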


Questions?


Thank you!
