Reviewer in Database
TOPIC 1
Importance of Databases
Efficient data organization and storage.
Quick and reliable data access and retrieval.
Support for concurrent data access by multiple users.
Data security and integrity.
Scalability to handle large volumes of data.
Crucial for business intelligence, analytics, and decision-making.
Risks/Disadvantages
Can be expensive.
Requires education and training.
Types of Databases
Relational Databases: Organized in tables with predefined relationships between data.
NoSQL Databases: Flexible schema, suitable for unstructured or semi-structured data.
Object-Oriented Databases: Store objects and their relationships directly.
Graph Databases: Store data in nodes and edges for efficient relationship representation.
Components of a Database
Data: Represents the actual information stored in the database.
Database Management System (DBMS): Software that manages the database and facilitates interactions.
Users: Individuals or applications that interact with the database.
NoSQL Databases
Non-relational databases designed for flexible and scalable data storage.
Suitable for big data, real-time applications, and dynamic data structures.
Types: Document-based (e.g., MongoDB), Key-Value (e.g., Redis), Column-Family (e.g., Cassandra), Graph (e.g., Neo4j).
Database Design
Crucial step in creating an efficient and reliable database.
Key considerations: Data modeling, normalization, indexing, and relationships.
Conclusion
Databases are the backbone of modern data-driven applications.
Understanding databases is essential for developers, data analysts, and business professionals.
Continuous learning and exploration are needed in the ever-evolving field of databases.
Data Types
Data types define the type of data that can be stored in a column.
Connecting to a Database
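Use a connection command such as CONNECT (for example, in Oracle SQL*Plus) to establish a connection, and the USE statement (for example, in MySQL or SQL Server) to select the database to work with.
Provide the necessary credentials and connection details.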
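In application code, the connection is usually opened through a driver rather than with a CONNECT statement. Below is a minimal sketch using Python's built-in sqlite3 module, assuming an SQLite database. SQLite is file-based, so it needs no credentials. Server databases such as MySQL or Oracle take host, user and password details through their own drivers. The file name school.db is hypothetical.

```python
import sqlite3

# Open (or create) a database. ":memory:" keeps it in RAM; pass a file path
# such as "school.db" (hypothetical) to store it on disk instead.
conn = sqlite3.connect(":memory:")

# Statements run through a cursor obtained from the connection.
cur = conn.cursor()
cur.execute("SELECT sqlite_version()")
version = cur.fetchone()[0]
print("Connected to SQLite", version)

conn.close()
```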
ALTER TABLE: Used to modify an existing table by adding, modifying, or dropping columns.
Example: ALTER TABLE table_name ADD column_name datatype;
Example (MySQL, Oracle): ALTER TABLE table_name MODIFY column_name new_datatype;
(PostgreSQL and SQL Server use ALTER COLUMN instead of MODIFY.)
Example: ALTER TABLE table_name DROP COLUMN column_name;
DROP TABLE: Used to delete an existing table and its data permanently.
Example: DROP TABLE table_name;
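ALTER TABLE and DROP TABLE can be tried out with the sketch below. It uses Python's built-in sqlite3 module and the SQLite dialect, and the students table is hypothetical. SQLite supports ADD COLUMN but has no MODIFY, so changing a column's type there means rebuilding the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")

# ALTER TABLE ... ADD COLUMN appends a new column to an existing table.
conn.execute("ALTER TABLE students ADD COLUMN email TEXT")
# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(students)")]
print(columns)  # ['student_id', 'name', 'email']

# DROP TABLE removes the table definition together with all of its rows.
conn.execute("DROP TABLE students")
remaining = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(remaining)  # []
conn.close()
```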
CREATE INDEX: Used to create an index on one or more columns to improve query performance.
Example: CREATE INDEX index_name ON table_name (column1, column2, ...);
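The sketch below creates an index over two columns, following the (column1, column2, ...) pattern above. It is a minimal SQLite example via Python's sqlite3 module. The orders table and the index name are hypothetical. PRAGMA index_info is SQLite-specific and lists the columns the index covers, in order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT)"
)

# Composite index: rows are ordered by customer_id first, then order_date.
conn.execute("CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date)")

# Each index_info row is (seqno, cid, name); keep the column names.
indexed_columns = [row[2] for row in conn.execute("PRAGMA index_info(idx_orders_customer_date)")]
print(indexed_columns)  # ['customer_id', 'order_date']
conn.close()
```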
FOREIGN KEY: Establishes a relationship between two tables based on a column's value.
Example: CREATE TABLE table_name (..., column_name datatype, FOREIGN KEY (column_name) REFERENCES other_table (column_name));
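A foreign key rejects references to rows that do not exist, as the sketch below shows. It is a minimal SQLite example using Python's sqlite3 module with hypothetical tutors and courses tables. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is set on the connection. Most other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE tutors (tutor_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE courses (
           course_id INTEGER PRIMARY KEY,
           course_name TEXT,
           tutor_id INTEGER,
           FOREIGN KEY (tutor_id) REFERENCES tutors (tutor_id)
       )"""
)
conn.execute("INSERT INTO tutors VALUES (1, 'Ana')")
conn.execute("INSERT INTO courses VALUES (10, 'Databases', 1)")  # valid reference

try:
    conn.execute("INSERT INTO courses VALUES (11, 'Networks', 99)")  # tutor 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
conn.close()
```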
UNIQUE: Ensures the uniqueness of a column's values, but allows NULL values.
Example: CREATE TABLE table_name (column1 datatype UNIQUE, ...);
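The sketch below shows both halves of the UNIQUE rule. It assumes SQLite through Python's sqlite3 module, with a hypothetical users table. A repeated non-NULL value is rejected, while several NULLs are accepted because NULLs are not treated as equal. NULL handling varies by system: SQL Server, for example, allows only one NULL in a UNIQUE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# A second identical non-NULL value violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

# NULLs are not considered equal to each other, so SQLite accepts several.
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES (NULL)")
null_count = conn.execute("SELECT COUNT(*) FROM users WHERE email IS NULL").fetchone()[0]
print(duplicate_allowed, null_count)  # False 2
conn.close()
```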
DML (Data Manipulation Language) is a subset of SQL used for interacting with and manipulating data stored in a database.
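The core DML statements are INSERT, UPDATE, DELETE and SELECT. The sketch below runs one of each against SQLite through Python's sqlite3 module. The students table and its data are hypothetical. The "?" placeholders are sqlite3's way of passing values separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)")

# INSERT adds rows.
conn.executemany(
    "INSERT INTO students (student_id, name, grade) VALUES (?, ?, ?)",
    [(1, "Ana", 85), (2, "Ben", 72), (3, "Cara", 90)],
)
# UPDATE changes existing rows that match the WHERE condition.
conn.execute("UPDATE students SET grade = grade + 5 WHERE name = ?", ("Ben",))
# DELETE removes matching rows.
conn.execute("DELETE FROM students WHERE grade >= ?", (90,))
# SELECT reads what is left.
rows = conn.execute("SELECT name, grade FROM students ORDER BY student_id").fetchall()
print(rows)  # [('Ana', 85), ('Ben', 77)]
conn.commit()
conn.close()
```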
Arithmetic Operators
SQL supports various arithmetic operators for numeric calculations.
These operators are used in SELECT, WHERE, and other clauses.
Addition (+): Performs addition of two numeric values.
Example: SELECT column1 + column2 AS sum_result FROM table_name;
Order of Evaluation
SQL follows the standard order of evaluation for arithmetic operations: multiplication, division and modulo are evaluated before addition and subtraction.
Parentheses can be used to control the order of evaluation.
Null Handling
Arithmetic operations involving NULL may produce unexpected results.
SQL treats NULL as an unknown value, and any arithmetic operation with NULL results in NULL.
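The sketch below runs the three arithmetic points above in SQLite via Python's sqlite3 module: column addition, operator precedence, and NULL propagation. The scores table is hypothetical. COALESCE, which replaces NULL with a default before calculating, is a common remedy for NULL propagation and is available in most SQL systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (quiz INTEGER, exam INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [(10, 80), (15, None)])

# Addition across columns; a NULL operand makes the whole result NULL (None).
sums = conn.execute("SELECT quiz + exam AS sum_result FROM scores ORDER BY rowid").fetchall()
print(sums)  # [(90,), (None,)]

# COALESCE substitutes 0 for NULL before adding.
safe = conn.execute("SELECT quiz + COALESCE(exam, 0) FROM scores ORDER BY rowid").fetchall()
print(safe)  # [(90,), (15,)]

# Multiplication binds tighter than addition; parentheses override it.
precedence = conn.execute("SELECT 2 + 3 * 4, (2 + 3) * 4").fetchone()
print(precedence)  # (14, 20)
conn.close()
```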
SQL Functions
SQL functions are built-in operations that can be applied to data in the database to perform specific tasks
or calculations. Let's explore some common SQL functions with examples:
Aggregate Functions - These functions perform operations on multiple rows and return a single result.
COUNT(): Counts the number of rows. For example, SELECT COUNT(*) FROM Tracks; returns the total number of tracks in the "Tracks" table.
SUM(): Calculates the sum of values in a column.
For example, applying SUM() to the amount column of an invoices table gives the total invoice amount for all invoices.
AVG(): Computes the average of values in a column.
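The sketch below runs COUNT(), SUM() and AVG() against tiny tables in SQLite, using Python's sqlite3 module. It assumes miniature hypothetical versions of the Tracks and Invoices tables. The Total column name is an assumption, not taken from any particular sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature Tracks and Invoices tables.
conn.execute("CREATE TABLE Tracks (TrackId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Invoices (InvoiceId INTEGER PRIMARY KEY, Total REAL)")
conn.executemany("INSERT INTO Tracks (Name) VALUES (?)", [("Intro",), ("Verse",), ("Outro",)])
conn.executemany("INSERT INTO Invoices (Total) VALUES (?)", [(1.5,), (2.5,), (5.0,)])

# Each aggregate collapses many rows into a single value.
track_count = conn.execute("SELECT COUNT(*) FROM Tracks").fetchone()[0]
invoice_sum = conn.execute("SELECT SUM(Total) FROM Invoices").fetchone()[0]
invoice_avg = conn.execute("SELECT AVG(Total) FROM Invoices").fetchone()[0]
print(track_count, invoice_sum, invoice_avg)  # 3 9.0 3.0
conn.close()
```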
Scalar Functions:
UPPER(): Converts text to uppercase.
Mathematical Functions
ABS(): Returns the absolute value of a number.
SQRT(): Calculates the square root of a number.
RAND(): Generates a random number (MySQL and SQL Server; PostgreSQL and SQLite use RANDOM()).
POWER(): Raises a number to a specified power.
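The sketch below shows scalar functions returning one value per input, using SQLite through Python's sqlite3 module. SQRT() and POWER() are left out because SQLite provides them only when it is built with its math functions enabled. MySQL, PostgreSQL and SQL Server offer both directly. The random function is spelled RANDOM() here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UPPER() converts text to uppercase.
upper_name = conn.execute("SELECT UPPER('database')").fetchone()[0]
# ABS() returns the absolute value of a number.
absolute = conn.execute("SELECT ABS(-42)").fetchone()[0]
# SQLite's RANDOM() returns a random 64-bit integer.
random_value = conn.execute("SELECT RANDOM()").fetchone()[0]
print(upper_name, absolute, isinstance(random_value, int))  # DATABASE 42 True
conn.close()
```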
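SQL Wildcards
Wildcards are used with the LIKE operator: % matches any sequence of zero or more characters, and _ matches exactly one character.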
Combining Wildcards
You can combine wildcards for more complex patterns.
Escaping Wildcards
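To search for the actual % or _ characters, escape them with a backslash (the default escape character in MySQL and PostgreSQL), or name the escape character explicitly with the ESCAPE clause, e.g. WHERE name LIKE '100\%' ESCAPE '\'.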
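The sketch below exercises single wildcards, a combined pattern, and escaping, using SQLite via Python's sqlite3 module. The items table and the match helper are hypothetical. SQLite has no default escape character, so the sketch uses the ESCAPE clause with a backslash. Note that SQLite's LIKE is case-insensitive for ASCII letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("cat",), ("cut",), ("coat",), ("c_t",), ("100%",), ("100 percent",)],
)

def match(pattern, escape=None):
    """Return the names matching a LIKE pattern, optionally with an ESCAPE character."""
    sql = "SELECT name FROM items WHERE name LIKE ?"
    params = [pattern]
    if escape is not None:
        sql += " ESCAPE ?"
        params.append(escape)
    return [row[0] for row in conn.execute(sql + " ORDER BY name", params)]

one_char = match("c_t")                    # _ stands for exactly one character
combined = match("c%t")                    # c, then any run of characters, then t
literal_underscore = match("c\\_t", "\\")  # \_ means a real underscore
literal_percent = match("%\\%", "\\")      # \% at the end means a real percent sign
print(one_char)            # ['c_t', 'cat', 'cut']
print(combined)            # ['c_t', 'cat', 'coat', 'cut']
print(literal_underscore)  # ['c_t']
print(literal_percent)     # ['100%']
conn.close()
```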
Subqueries
A subquery, also known as a nested query, is a query within another query.
Used to retrieve data based on results from another query.
A subquery is enclosed within parentheses and can be used in various parts of a query, such as the
SELECT, FROM, WHERE, or HAVING clauses.
Subqueries in the WHERE clause can be used to filter rows based on conditions from another table.
Subqueries in the SELECT clause can be used to calculate values dynamically.
Subqueries in the FROM clause can be used to create temporary result sets that can be further queried.
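The sketch below places one subquery in each of the three clauses mentioned above. It uses SQLite via Python's sqlite3 module, with hypothetical students and grades tables. The SELECT-clause version is a correlated subquery: it runs once per outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE grades (student_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cara")])
conn.executemany("INSERT INTO grades VALUES (?, ?)", [(1, 90), (1, 70), (2, 60), (3, 85)])

# WHERE: filter students using the result of a query on another table.
where_rows = conn.execute(
    "SELECT name FROM students WHERE student_id IN "
    "(SELECT student_id FROM grades WHERE score >= 80) ORDER BY name"
).fetchall()

# SELECT: compute a value dynamically for each outer row.
select_rows = conn.execute(
    "SELECT name, (SELECT MAX(score) FROM grades g WHERE g.student_id = s.student_id) "
    "FROM students s ORDER BY name"
).fetchall()

# FROM: a temporary result set (derived table) queried like a normal table.
avg_of_averages = conn.execute(
    "SELECT AVG(avg_score) FROM "
    "(SELECT student_id, AVG(score) AS avg_score FROM grades GROUP BY student_id)"
).fetchone()[0]

print(where_rows)       # [('Ana',), ('Cara',)]
print(select_rows)      # [('Ana', 90), ('Ben', 60), ('Cara', 85)]
print(avg_of_averages)  # 75.0
conn.close()
```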
Principle 4: Indexing
Implement indexes on frequently queried columns.
Speed up data retrieval and query performance.
Be cautious not to over-index, as it may impact insert and update operations.
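To check whether an index actually speeds up a query, inspect the query plan. The sketch below is SQLite-specific: it uses EXPLAIN QUERY PLAN through Python's sqlite3 module, on a hypothetical customers table. The exact wording of the plan text differs between SQLite versions. The general pattern is a full table SCAN before the index exists and an index SEARCH after it is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

def plan(sql):
    """Return the detail text (last column) of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT name FROM customers WHERE email = 'a@example.com'"
before = plan(query)  # a full scan, e.g. something like 'SCAN customers'
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
after = plan(query)   # an index lookup mentioning idx_customers_email
print(before)
print(after)
conn.close()
```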
Principle 6: Scalability
Design the database to handle increasing data volumes.
Consider horizontal and vertical scaling options.
Use sharding and partitioning techniques for large-scale databases.
Principle 7: Security
Implement robust security measures to protect sensitive data.
Authenticate and authorize users with appropriate access levels.
Encrypt data at rest and during transmission.
DEMO 2
Principle 1: Data Modeling
A database schema is a structured representation of data storage in a database.
The meaning of schema varies across different database systems.
Three-Schema Architecture:
The external (user view), conceptual (logical), and internal (physical) schemas form the basis of the Three-Schema Architecture.
Table Structure
A database holds multiple tables, often referred to as relations.
In a logical sense, a table is also known as an entity.
In object-oriented databases (OODB), an entity is an object with attributes (similar to columns in a table).
In relational databases, rows are called records.
Every column has a data type defining the type of values it can hold.
Domains define legal values for attributes (columns).
Each record in a table is uniquely identified by a primary key.
Tables in a database are not isolated; they have relationships.
Tables are linked through a key column, known as the primary key in one table and the foreign key in the related table.
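When tables are linked through a key column, a query can follow that link with a JOIN. The sketch below shows this in SQLite via Python's sqlite3 module, with hypothetical departments and employees tables. Each employee's dept_id (foreign key) is matched to a department's dept_id (primary key).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, emp_name TEXT, "
    "dept_id INTEGER REFERENCES departments (dept_id))"
)
conn.executemany("INSERT INTO departments VALUES (?, ?)", [(1, "Sales"), (2, "IT")])
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [(10, "Ana", 2), (11, "Ben", 1)])

# Follow the foreign key (employees.dept_id) to the primary key (departments.dept_id).
rows = conn.execute(
    "SELECT e.emp_name, d.dept_name FROM employees e "
    "JOIN departments d ON e.dept_id = d.dept_id ORDER BY e.emp_name"
).fetchall()
print(rows)  # [('Ana', 'IT'), ('Ben', 'Sales')]
conn.close()
```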
Types of attributes:
Single-Valued Attributes: Store one value per entity (e.g., date of birth).
Composite Attributes: Can be split into sub-components (e.g., name -> first and last name).
Derived Attributes: Values are derived from other attributes (e.g., age from date of birth).
Key Attributes: Hold unique values to identify entities (e.g., student ID).
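The four attribute types can be mapped onto a single entity, as in the Python sketch below. The Student class and its field names are hypothetical. Name is split into its composite parts, date of birth is single-valued, student_id is the key, and age is derived on demand rather than stored.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Student:
    student_id: int      # key attribute: unique per student
    first_name: str      # sub-components of the composite attribute "name"
    last_name: str
    date_of_birth: date  # single-valued attribute

    @property
    def full_name(self) -> str:
        # Recombine the composite attribute's parts.
        return f"{self.first_name} {self.last_name}"

    def age_on(self, today: date) -> int:
        # Derived attribute: computed from date_of_birth, never stored.
        dob = self.date_of_birth
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        return today.year - dob.year - (0 if had_birthday else 1)

s = Student(1001, "Ana", "Cruz", date(2004, 5, 20))
print(s.full_name)                  # Ana Cruz
print(s.age_on(date(2024, 5, 19)))  # 19
print(s.age_on(date(2024, 5, 20)))  # 20
```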
In many contexts, the terms "simple attribute" and "single attribute" are used interchangeably to refer to attributes that represent a single, indivisible piece of data associated with an entity in a database. Both terms describe attributes that are not composed of multiple subparts or components.
To clarify:
A "simple attribute" is an attribute that represents a single, atomic piece of data within an entity in a database. It cannot be further divided into smaller meaningful attributes.
A "single attribute" is also an attribute that represents a single piece of data within an entity, without any subcomponents.
So, whether you use the term "simple attribute" or "single attribute," you are referring to the same concept: an attribute that is not composite and represents a single data element.
Considerations in Database Design:
Only include entities and attributes relevant to your project.
Capture data that facilitates user tasks and activities.
By understanding entities and attribute types, you can design a well-structured relational database.
Keys in depth
In this reading, you’ll explore the concept of keys in more depth, so you have a better understanding of how
to choose a primary key from a list of “candidate keys”, and how to connect tables together with foreign
keys in SQL.
A relational database is a collection of data that is managed and maintained in a database management
system such as Oracle or MySQL.
A relational database enables you to retrieve every single piece of stored data. This can be done by
specifying:
the name of the target table (or tables),
the name of the required column (or columns),
and the primary key value of the required row (or rows).
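The three pieces listed above (table name, column name and primary key value) together pinpoint a single stored value. The sketch below shows this in SQLite via Python's sqlite3 module, using a hypothetical students table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", "Manila"), (2, "Ben", "Cebu")],
)

# Table (students) + column (city) + primary key value (2) identify one field.
city = conn.execute("SELECT city FROM students WHERE student_id = ?", (2,)).fetchone()[0]
print(city)  # Cebu
conn.close()
```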
In a relational database, the primary key can be selected from any candidate key attribute that contains a
unique instance value in each row of the table.
Composite Key - consists of multiple attributes that together uniquely identify a record.
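The sketch below declares a composite primary key in SQLite via Python's sqlite3 module, using a hypothetical enrollments table. Neither student_id nor course_id is unique on its own, but each (student_id, course_id) pair may appear only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Neither column is unique alone; together they identify one enrollment.
conn.execute(
    """CREATE TABLE enrollments (
           student_id INTEGER,
           course_id INTEGER,
           enrolled_on TEXT,
           PRIMARY KEY (student_id, course_id)
       )"""
)
conn.execute("INSERT INTO enrollments VALUES (1, 10, '2024-06-01')")
conn.execute("INSERT INTO enrollments VALUES (1, 20, '2024-06-01')")  # same student, new course
conn.execute("INSERT INTO enrollments VALUES (2, 10, '2024-06-02')")  # same course, new student

try:
    conn.execute("INSERT INTO enrollments VALUES (1, 10, '2024-07-01')")  # repeats the pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
conn.close()
```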
As a database engineer, you will commonly deal with duplicate data and multiple values in table columns.
Normalization helps in organizing data for easier viewing, searching, and sorting.
This section covers how to design a database in First Normal Form (1NF), enforce the atomicity rule, and eliminate repeating groups of data.
1NF
Normalization Overview:
Normalization simplifies basic database tasks and resolves insert, delete, and update anomalies.
Three fundamental normalization forms: 1NF, 2NF, and 3NF.
Focus here is on achieving 1NF, which enforces data atomicity and eliminates repeating groups.
Data Atomicity:
Data atomicity means having only one single instance value of a column attribute in any table field. Tables
should contain only one value per field to avoid redundancy and inconsistency.
Course ID - Key Attribute
Course Name - Simple Attribute
Tutor Name - Simple Attribute
Tutor Surname - Simple Attribute
Contact Number - Multi-valued attribute
Fixing Data Atomicity:
Option 1: Create a new row for each number, ensuring one value per field.
Problem: Primary key no longer unique due to multiple rows with the same course ID.
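The Python sketch below applies Option 1 and exposes its problem. The course rows and phone numbers are hypothetical, and the comma-separated contact field stands in for a multi-valued attribute. Splitting it gives one atomic value per field, but course_id then repeats and can no longer serve as the primary key.

```python
# Hypothetical rows: the contact field holds several numbers, breaking atomicity.
courses = [
    {"course_id": "C1", "course_name": "Databases", "tutor": "Ana Cruz",
     "contact_numbers": "0917-111, 0917-222"},
    {"course_id": "C2", "course_name": "Networks", "tutor": "Ben Reyes",
     "contact_numbers": "0918-333"},
]

def one_row_per_number(rows):
    """Option 1: copy the row once per contact number so each field is atomic."""
    out = []
    for row in rows:
        for number in row["contact_numbers"].split(","):
            out.append(dict(row, contact_numbers=number.strip()))
    return out

atomic_rows = one_row_per_number(courses)
ids = [r["course_id"] for r in atomic_rows]
print(ids)                        # ['C1', 'C1', 'C2']
print(len(ids) == len(set(ids)))  # False: course_id is no longer unique
```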
Redesigning for 1NF:
Identify repeating groups of data: Tutor's name and contact numbers.
Identify entities: Course and Tutor.
Split the course table into two separate tables: one for courses and one for tutors.
Benefits:
Achieve data atomicity and eliminate repeating groups of data.
Keep tutor details consistent, since they are updated in the tutor table rather than in every course row.
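The two-table redesign can be sketched in SQLite via Python's sqlite3 module. All names and numbers are hypothetical. The sketch assumes each contact number gets its own row in the tutor table, keyed on (tutor_id, contact_number); a separate contact-number table would be an equally valid choice. course_id is unique again, and tutor details change without touching the courses table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tutors (
        tutor_id TEXT,
        tutor_name TEXT,
        tutor_surname TEXT,
        contact_number TEXT,
        PRIMARY KEY (tutor_id, contact_number)  -- one atomic number per row
    );
    CREATE TABLE courses (
        course_id TEXT PRIMARY KEY,             -- unique again
        course_name TEXT,
        tutor_id TEXT                           -- links each course to its tutor
    );
    INSERT INTO tutors VALUES ('T1', 'Ana', 'Cruz', '0917-111'),
                              ('T1', 'Ana', 'Cruz', '0917-222'),
                              ('T2', 'Ben', 'Reyes', '0918-333');
    INSERT INTO courses VALUES ('C1', 'Databases', 'T1'),
                               ('C2', 'Networks', 'T2');
    """
)
# Tutor details change in the tutors table only; courses is untouched.
conn.execute("UPDATE tutors SET tutor_surname = 'Santos' WHERE tutor_id = 'T1'")
rows = conn.execute(
    "SELECT c.course_id, t.tutor_surname, t.contact_number FROM courses c "
    "JOIN tutors t ON c.tutor_id = t.tutor_id ORDER BY c.course_id, t.contact_number"
).fetchall()
print(rows)
# [('C1', 'Santos', '0917-111'), ('C1', 'Santos', '0917-222'), ('C2', 'Reyes', '0918-333')]
conn.close()
```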
Conclusion
Understanding 1NF and its rules is essential for effective database design.
Applying these principles helps in maintaining clean and consistent data in your database.
2NF
Why Database Normalization?
Database developers aim for a well-structured database to reduce duplication and ensure accurate data
analysis and retrieval.
Functional Dependency:
Functional dependency is the relationship between two attributes in a table.
One column's unique values determine the values of another column.
For example, in a table R with columns X and Y, where X is unique (e.g., the primary key) and Y is not, each value of X determines exactly one value of Y; this is written X -> Y.
Sample:
Date of Birth is fully functionally dependent on StudentID.
Name is also fully functionally dependent on StudentID.
StudentID determines both Name and Date of Birth.
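A functional dependency X -> Y can be checked against sample rows: every X value must map to exactly one Y value. The Python sketch below does this. The determines helper and the student rows are hypothetical. Sample data can only disprove a dependency, since the dependency is a rule about all valid data.

```python
def determines(rows, lhs, rhs):
    """True if every value of column lhs maps to exactly one value of column rhs."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        if seen.setdefault(key, value) != value:
            return False
    return True

students = [
    {"student_id": 1, "name": "Ana", "dob": "2004-05-20"},
    {"student_id": 2, "name": "Ben", "dob": "2003-11-02"},
    {"student_id": 3, "name": "Ana", "dob": "2005-01-15"},  # same name, different student
]
print(determines(students, "student_id", "name"))  # True
print(determines(students, "student_id", "dob"))   # True
print(determines(students, "name", "dob"))         # False: "Ana" maps to two dates
```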
Partial Dependency:
Can only occur when a table has a composite primary key (composed of two or more columns).
A partial dependency exists when a non-key attribute depends on only part of the composite key; every non-key attribute should depend on the entire primary key.
Sample:
Vaccine Name depends only on Vaccine ID, which is just part of the composite key (Patient ID, Vaccine ID), so it is a partial dependency.
Patient Name depends only on Patient ID, so it is also a partial dependency.
Status is fully functionally dependent on both Patient ID and Vaccine ID.
Upgrading to 2NF:
To meet 2NF, ensure all non-key columns depend on the entire primary key.
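The vaccination sample can be upgraded to 2NF as sketched below in SQLite, via Python's sqlite3 module. Table and column names are hypothetical. Patient Name and Vaccine Name move to tables keyed by the single column they depend on. Status stays with the composite key, so renaming a vaccine now touches only one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Patient Name depends on patient_id alone: give it its own table.
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, patient_name TEXT);
    -- Vaccine Name depends on vaccine_id alone: give it its own table.
    CREATE TABLE vaccines (vaccine_id INTEGER PRIMARY KEY, vaccine_name TEXT);
    -- Status depends on the whole composite key, so it stays here.
    CREATE TABLE vaccinations (
        patient_id INTEGER REFERENCES patients (patient_id),
        vaccine_id INTEGER REFERENCES vaccines (vaccine_id),
        status TEXT,
        PRIMARY KEY (patient_id, vaccine_id)
    );
    INSERT INTO patients VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO vaccines VALUES (100, 'Measles'), (200, 'Polio');
    INSERT INTO vaccinations VALUES (1, 100, 'Done'), (1, 200, 'Pending'), (2, 100, 'Done');
    """
)
# Renaming a vaccine now updates exactly one row.
changed = conn.execute("UPDATE vaccines SET vaccine_name = 'MMR' WHERE vaccine_id = 100").rowcount
rows = conn.execute(
    "SELECT p.patient_name, v.vaccine_name, x.status FROM vaccinations x "
    "JOIN patients p ON p.patient_id = x.patient_id "
    "JOIN vaccines v ON v.vaccine_id = x.vaccine_id "
    "ORDER BY p.patient_name, v.vaccine_name"
).fetchall()
print(changed)  # 1
print(rows)     # [('Ana', 'MMR', 'Done'), ('Ana', 'Polio', 'Pending'), ('Ben', 'MMR', 'Done')]
conn.close()
```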
3NF
3NF and Transitive Dependency:
To achieve 3NF, a database must first be in 1NF and 2NF.
In addition to these rules, 3NF deals with transitive dependency, where a non-key attribute cannot be
functionally dependent on another non-key attribute.
Transitive Dependency Explained:
Key attribute: Helps uniquely identify a row of data in a table.
Transitive dependency occurs when one non-key attribute depends on another non-key attribute. For instance, if A (the key) determines B, and B (a non-key attribute) determines C, then C depends on A only transitively, represented as A -> B -> C.
Sample
Country functionally determines Language, and both are non-key attributes, so Language depends on the key only through Country (transitive dependency).
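The A -> B -> C pattern can be tested on sample rows, as in the Python sketch below. The authors table and the determines helper are hypothetical, with author_id as the key and country and language as non-key attributes. The usual 3NF fix is to move country and language into a separate table keyed by country.

```python
def determines(rows, lhs, rhs):
    """True if each lhs value maps to a single rhs value in these rows (lhs -> rhs)."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

# Hypothetical table: author_id is the key; country and language are non-key.
authors = [
    {"author_id": 1, "country": "Japan", "language": "Japanese"},
    {"author_id": 2, "country": "Brazil", "language": "Portuguese"},
    {"author_id": 3, "country": "Japan", "language": "Japanese"},
]

key_to_country = determines(authors, "author_id", "country")      # A -> B
country_to_language = determines(authors, "country", "language")  # B -> C
country_to_key = determines(authors, "country", "author_id")      # B is not a key
transitive = key_to_country and country_to_language and not country_to_key
print(transitive)  # True: author_id -> country -> language
```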
Entity Relationship Diagram (ER-D)
This reading covers the Entity Relationship Diagram (ER-D) and demonstrates how to use it to support the design of a
relational database.
The relational database model organizes information into tables to ensure a good data structure to maintain
consistency and accuracy, which makes the design of the tables and their relationships very crucial.
The relational database design is very well connected with the entity relationship modelling process including
entities, attributes and relationship identification and definition.
The entity-relationship diagram (ER-D) is commonly used to represent and document the entity relationship
models.
The use of entity relationship diagrams helps to provide the big picture of your database.
It also ensures the data requirements and operations are well defined and documented in your project.
The entities, attributes and relationships between entities can be shown in a variety of diagrammatic formats in
the ER diagrams.
Entity representation
In the ER-D, a box with two compartments is used to represent the entity and its related attributes.
The top compartment represents the entity name, and the bottom compartment includes the related attributes.
Relationship representation
The ER diagram uses different styles of lines to define the distinct types of relationships between entities.
The line style depends on the cardinality of the relationship, which refers to how many instances of one entity can be related to instances of the other, as clarified in the following three cases.
1:1 (one-to-one): The ER-D uses a straight-line representation for a one-to-one cardinality relationship.
For example, each passenger on a train should have only one ticket.
1:N (one-to-many): The ER-D is a straight line with a crow’s foot notation on one side only to represent a
one-to-many cardinality relationship. For example, one parent can have many children.
M:N (many-to-many): The ER-D is a straight line with crow’s foot notations on both sides of entities to
represent a many-to-many cardinality relationship. For example, many players play many games.
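Each cardinality usually becomes a different table pattern. The sketch below shows common mappings in SQLite via Python's sqlite3 module, and all table names are hypothetical. For 1:1, use a UNIQUE foreign key. For 1:N, use a plain foreign key on the "many" side. For M:N, use a junction table whose composite key pairs the two entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- 1:1  A UNIQUE foreign key allows at most one ticket per passenger.
    CREATE TABLE passengers (passenger_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tickets (
        ticket_id INTEGER PRIMARY KEY,
        passenger_id INTEGER UNIQUE REFERENCES passengers (passenger_id)
    );
    -- 1:N  Many child rows may point at the same parent.
    CREATE TABLE parents (parent_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE children (
        child_id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parents (parent_id)
    );
    -- M:N  A junction table pairs players with games.
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE player_games (
        player_id INTEGER REFERENCES players (player_id),
        game_id INTEGER REFERENCES games (game_id),
        PRIMARY KEY (player_id, game_id)
    );
    INSERT INTO passengers VALUES (1, 'Ana');
    INSERT INTO tickets VALUES (500, 1);
    INSERT INTO parents VALUES (1, 'Carlos');
    INSERT INTO children VALUES (1, 1), (2, 1);
    INSERT INTO players VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO games VALUES (1, 'Chess'), (2, 'Go');
    INSERT INTO player_games VALUES (1, 1), (1, 2), (2, 1);
    """
)
children_of_carlos = conn.execute("SELECT COUNT(*) FROM children WHERE parent_id = 1").fetchone()[0]
games_of_ana = conn.execute("SELECT COUNT(*) FROM player_games WHERE player_id = 1").fetchone()[0]
try:
    conn.execute("INSERT INTO tickets VALUES (501, 1)")  # second ticket for passenger 1
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True
print(children_of_carlos, games_of_ana, one_to_one_enforced)  # 2 2 True
conn.close()
```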
Attributes representation
Each entity has a set of attributes that hold relevant information about it. Each attribute must be defined
with a data type.
FULL NORMALIZATION
Principle 3: Normalization
The normalization process aims to minimize data duplications, avoid errors during data modifications and
simplify data queries from the database.
The three fundamental normalization forms are known as:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
First normal form
To simplify the data structure of the surgery table, let’s apply the first normal form rules to enforce the data
atomicity rule and eliminate unnecessary repeating groups of data.
The data atomicity rule means that you can only have one single instance value of the column attribute in
any cell of the table.
The atomicity problem only exists in the columns of data related to the patients. Therefore, it is important
to create a new table for patient data to fix this.
In other words, you can organize all data related to the patient entity in one separate table, where each cell of any column contains only one single instance of data.
Second normal form
In the second normal form, you need to avoid any partial dependency relationships between data. Partial dependency can only occur in tables with a composite primary key, namely a key that consists of a combination of two or more columns, where a non-key attribute value depends on only one part of the composite key.
Third normal form
For a relation in a database to be in the third normal form, it must already be in the second normal form
(2NF). In addition, it must have no transitive dependency.
This means that no non-key attribute in the surgery table may be functionally dependent on another non-key attribute in the same table.
In the surgery table, the postcode and the council are non-key attributes, and the postcode is dependent on
the council.
Therefore, if you change the council value, you must also change the postcode. This is called transitive
dependency, which is not allowed in the third normal form.
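Location table
Moving the council and postcode into a separate location table removes the transitive dependency from the surgery table. This ensures that the database now conforms to first, second and third normal forms.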
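The location-table split can be sketched in SQLite via Python's sqlite3 module. The sketch follows the dependency stated above, in which the council determines the postcode. Column names, council names and postcodes are hypothetical. The postcode is stored once in the location table, so a single update keeps every surgery consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical location table: the council determines the postcode.
    CREATE TABLE locations (council TEXT PRIMARY KEY, postcode TEXT);
    -- The surgery table keeps only the council, which references locations.
    CREATE TABLE surgeries (
        surgery_id INTEGER PRIMARY KEY,
        surgery_name TEXT,
        council TEXT REFERENCES locations (council)
    );
    INSERT INTO locations VALUES ('Northvale', 'NV1 2AB');
    INSERT INTO surgeries VALUES (1, 'Elm Street', 'Northvale'),
                                 (2, 'Oak Road', 'Northvale');
    """
)
# One UPDATE in locations changes the postcode seen by every surgery.
conn.execute("UPDATE locations SET postcode = 'NV1 9ZZ' WHERE council = 'Northvale'")
postcodes = conn.execute(
    "SELECT DISTINCT l.postcode FROM surgeries s JOIN locations l ON s.council = l.council"
).fetchall()
print(postcodes)  # [('NV1 9ZZ',)]
conn.close()
```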
The third normal form is typically good enough to deal with the three anomaly challenges – insertion,
update and deletion anomalies – that the normalization process aims to tackle.