Database 1
SCIENCE (DECCOMS)
UGHELLI, DELTA STATE.
in affiliation with
TEMPLE GATE POLYTECHNIC
ABA, ABIA STATE.
LECTURE NOTES
ON
DATABASE DESIGN I
(COM 312)
CHAPTER ONE
Database:
A database typically consists of one or more tables, which represent entities or objects in the
real world. Each table is composed of rows (records) and columns (fields) that define the
specific data attributes or properties. The tables are interconnected through relationships that
establish associations between the data entities.
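As a simple illustration (the table and column names here are hypothetical), the SQL below defines a table whose columns are the fields and whose inserted rows are the records:
CREATE TABLE Student (
    student_id INT PRIMARY KEY,      -- uniquely identifies each record (row)
    name       VARCHAR(100),         -- each column is a field/attribute
    email      VARCHAR(100),
    department VARCHAR(50)
);

-- Each INSERT adds one row (record) whose values fill the table's fields.
INSERT INTO Student (student_id, name, email, department)
VALUES (1, 'Ada Obi', 'ada.obi@example.com', 'Computer Science');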
Database System: A database system refers to the complete software, hardware, and data
management components that work together to manage and control access to databases. It
encompasses the database software, the underlying hardware infrastructure, and the set of
tools and utilities used to perform various database operations.
A database system provides a set of functionalities and services for creating, organizing, and
maintaining databases. It includes components such as:
Database Management System (DBMS): The DBMS is the software responsible for managing
the database. It provides mechanisms to create, modify, and delete data, enforce data integrity
constraints, handle concurrency control, and optimize data retrieval and storage operations.
Popular DBMSs include MySQL, Oracle, Microsoft SQL Server, and PostgreSQL.
Data Storage Management:
The database system manages the storage of data on disk or other storage media. It defines the
structure of files used to store data, such as pages, blocks, or other storage units. Efficient
storage mechanisms are used to organize and access data quickly.
Query Processing and Optimization:
The database system includes query processing and optimization components that analyze and
optimize queries written in a database query language (such as SQL) to retrieve data efficiently.
It determines the most efficient execution plan for executing queries based on factors like
indexing, join algorithms, and data access methods.
Data Security and Access Control:
Database systems provide mechanisms for securing data and controlling access to it. Users and
roles are defined with appropriate permissions to ensure data confidentiality, integrity, and
availability. Access control mechanisms limit unauthorized access and protect the database
from security threats.
Data Backup and Recovery
Database systems include features for backing up data to prevent data loss in the event of
hardware failure, human error, or other disasters. Recovery mechanisms allow for restoring
data to a consistent state after a failure.
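As a hedged sketch, the statements below show how a backup and restore might look in Microsoft SQL Server's T-SQL; the database name and file path are hypothetical, and other DBMSs provide their own backup commands or utilities:
-- Write a full backup of the database to a file
BACKUP DATABASE SchoolDB
TO DISK = 'C:\backups\SchoolDB.bak';

-- Restore the database to a consistent state from that backup
RESTORE DATABASE SchoolDB
FROM DISK = 'C:\backups\SchoolDB.bak';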
Database systems are vital for managing large volumes of data, ensuring data integrity,
enabling efficient data retrieval, and supporting complex data operations in various domains
such as business, finance, healthcare, and more. They provide a structured and organized
approach to store, retrieve, and manipulate data effectively and securely.
Types of information needs can vary depending on the context and the specific requirements of
individuals or organizations. Here are some common types of information needs:
Descriptive Information Needs:
Descriptive information needs refer to the need for factual and objective information about a
particular topic or subject. It includes information about entities, events, processes, or
characteristics. Descriptive information helps in understanding and describing the current state
or condition of something.
Diagnostic Information Needs:
Diagnostic information needs involve the need for information to diagnose problems,
determine causes, or analyze situations. This type of information helps in identifying the root
cause of an issue or understanding the underlying factors contributing to a specific situation or
problem.
Predictive Information Needs:
Predictive information needs pertain to the requirement for information that can help in
making forecasts, predictions, or projections about future events or outcomes. This type of
information is valuable for planning, decision-making, and anticipating potential trends or
scenarios.
Procedural Information Needs:
Procedural information needs refer to the requirement for information that guides individuals
or organizations through specific processes, tasks, or procedures. It includes step-by-step
instructions, guidelines, or manuals that explain how to perform certain activities or tasks
effectively.
Comparative Information Needs:
Comparative information needs involve the need for information that allows for comparisons
between different entities, options, or alternatives. Comparative information helps in
evaluating and making informed choices by considering the similarities, differences,
advantages, and disadvantages of various options.
Normative Information Needs:
Normative information needs relate to the requirement for information that establishes
standards, norms, guidelines, or benchmarks for performance, behavior, or quality. Normative
information helps in assessing performance, compliance, or adherence to established criteria.
Strategic Information Needs:
Strategic information needs involve the requirement for information that supports long-term
planning, decision-making, and the formulation of organizational strategies. This type of
information helps in understanding market trends, competitive intelligence, industry analysis,
and other factors crucial for strategic planning.
These are some general categories of information needs, and in practice, an information need
can often fall into multiple categories or have unique characteristics based on the specific
context and purpose. Identifying the type of information need is essential to effectively search,
evaluate, and fulfill the information requirements of individuals or organizations.
Database systems serve various purposes and provide numerous benefits for managing and
organizing data. Some key purposes of database systems include:
Data Storage and Organization:
Database systems offer a structured and efficient way to store, organize, and manage large
volumes of data. They provide mechanisms for defining tables, records, and fields, enabling
systematic storage of data in a structured format.
Data Retrieval and Querying:
Database systems allow users to retrieve and query data based on specific criteria. They
provide powerful querying languages (e.g., SQL) that enable users to extract relevant
information from the database quickly and efficiently.
Data Integrity:
Database systems enforce data integrity by implementing constraints and rules to ensure the
accuracy and consistency of data. They support the validation and enforcement of business
rules, referential integrity, and data validation rules, minimizing data inconsistencies.
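A minimal sketch of such constraints, using hypothetical Department and Employee tables:
CREATE TABLE Department (
    dept_id   INT PRIMARY KEY,               -- primary key enforces uniqueness
    dept_name VARCHAR(50) UNIQUE NOT NULL    -- uniqueness and mandatory value
);

CREATE TABLE Employee (
    emp_id  INT PRIMARY KEY,
    dept_id INT NOT NULL,
    salary  DECIMAL(10,2) CHECK (salary >= 0),            -- business rule as a CHECK constraint
    FOREIGN KEY (dept_id) REFERENCES Department(dept_id)  -- referential integrity
);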
Data Security and Access Control:
Database systems provide mechanisms for securing data and controlling access to it. They allow
for defining user roles and permissions, ensuring that only authorized users can access and
modify the data. Security features include authentication, encryption, and auditing capabilities.
Data Concurrency and Transaction Management:
Database systems handle concurrent access to data by multiple users or processes. They ensure
data consistency by managing transactions, which are units of work that involve multiple
database operations. Transactions ensure that data is correctly updated and maintained, and
provide mechanisms for rollback and recovery in case of failures.
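For example, a transfer between two hypothetical accounts can be wrapped in a single transaction so that either both updates take effect or neither does (the exact syntax varies slightly between DBMSs):
BEGIN TRANSACTION;
UPDATE Account SET balance = balance - 100 WHERE account_id = 1;  -- debit one account
UPDATE Account SET balance = balance + 100 WHERE account_id = 2;  -- credit the other
COMMIT;       -- make both changes permanent
-- ROLLBACK;  -- would instead undo both changes if an error occurred before COMMIT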
Data Backup and Recovery:
Database systems offer features for backing up data to prevent data loss in case of system
failures, disasters, or human errors. They provide mechanisms for restoring data to a consistent
state and ensure data availability and recoverability.
Data Sharing and Collaboration:
Database systems facilitate data sharing and collaboration among multiple users or
applications. They allow concurrent access to data, support data sharing across different
systems or platforms, and provide features for data integration and interoperability.
Data Analysis and Decision Support:
Database systems provide capabilities for data analysis, reporting, and business intelligence.
They support the storage and processing of large datasets, enabling organizations to extract
insights, generate reports, and make informed decisions based on data-driven analysis.
Scalability and Performance:
Database systems are designed to handle the storage and retrieval of large amounts of data
efficiently. They offer mechanisms for optimizing query performance, indexing data, and
ensuring the system scales effectively as the data volume and user load increase.
Application Development Support:
Database systems provide APIs, tools, and frameworks to support the development of
applications that interact with the database. They offer data modeling capabilities, support for
defining relationships, and integration with programming languages, making it easier to build
robust and scalable applications.
These purposes highlight the significant advantages of using database systems, ranging from
efficient data storage and retrieval to ensuring data integrity, security, and support for decision-
making processes. Database systems play a crucial role in managing and leveraging data
effectively within organizations and across various domains.
CHAPTER TWO
Data View:
A data view refers to a logical representation or subset of a database that presents data in a
specific way to meet the needs of particular users or applications. It provides a customized and
simplified perspective of the underlying database, tailored to the requirements of a specific
user or group.
Data views are designed to abstract the complexities of the underlying database structure and
present data in a more meaningful and relevant manner. They can include selected tables,
specific columns, filtered rows, computed fields, and even join multiple tables together.
Key benefits of data views include:
Simplification: Data views simplify data access by providing a focused and concise
representation of relevant data, eliminating the need to navigate through the entire database
schema.
Data Security: Data views can enforce access control by restricting users to specific data
subsets. They allow organizations to grant appropriate privileges and limit sensitive information
exposure.
Data Integration: Data views can combine and integrate data from multiple tables or even
multiple databases, providing a consolidated view for reporting or analysis purposes.
Data Model:
A data model is a conceptual representation of how data is organized and structured within a
database or information system. It defines the entities, attributes, relationships, and constraints
that govern the organization and manipulation of data.
Data models serve as blueprints or diagrams that document the logical and physical structure of
a database. They provide a visual representation of the database schema, allowing stakeholders
to understand the data's structure and relationships.
Common types of data models include:
Entity-Relationship Model (ER Model): This model describes data in terms of entities, their
attributes, and the relationships among them. It is widely used to design relational databases.
Relational Model: The relational model organizes data into tables (relations) with rows (tuples)
and columns (attributes). It emphasizes the relationships and dependencies between tables.
Hierarchical Model: The hierarchical model represents data in a tree-like structure, where each
record has a single parent and can have multiple children.
Network Model: The network model extends the hierarchical model by allowing records to
have multiple parent-child relationships, forming complex network structures.
Object-Oriented Model: The object-oriented model represents data as objects with properties
and behaviors, encapsulating both data and the methods that operate on that data.
NoSQL Data Models: NoSQL databases, such as document-oriented, key-value, columnar, and
graph databases, have their own data models optimized for specific use cases and scalability
requirements.
Data models serve as a foundation for database design and development. They facilitate
communication among stakeholders, guide the construction of database schemas, and ensure
data integrity and consistency. Data models provide a structured approach to organizing data
and serve as a basis for database management and application development.
Database Administrators:
Database administrators (DBAs) are responsible for the overall management, maintenance, and
performance of a database system. Their primary role is to ensure the smooth and efficient
operation of databases within an organization. Some key responsibilities of database
administrators include:
Database Design and Creation: DBAs participate in the design and development of the
database schema, defining tables, relationships, constraints, and other database structures.
Database Security: DBAs implement and manage security measures to protect the database
from unauthorized access or data breaches. They define user roles, permissions, and access
controls, and ensure compliance with security policies and regulations.
Data Backup and Recovery: DBAs establish backup and recovery strategies to prevent data loss
in case of system failures, disasters, or human errors. They perform regular backups, monitor
data integrity, and implement recovery procedures.
Database Maintenance and Upgrades: DBAs handle routine maintenance tasks such as
database backups, software patches, and version upgrades. They ensure database systems are
up-to-date, stable, and reliable.
Database Users:
Database users are individuals or applications that interact with a database system to perform
various operations. They can be categorized into different types based on their roles and access
privileges:
End Users: End users are individuals who interact with the database through user interfaces or
applications. They perform tasks such as data entry, querying the database for information,
generating reports, and analyzing data.
Database Application Developers: These users develop software applications that interact with
the database system. They write code, implement business logic, and design user interfaces to
access and manipulate the data stored in the database.
Data Analysts: Data analysts are responsible for interpreting and analyzing data to derive
insights and make data-driven decisions. They use tools and techniques to extract, transform,
and analyze data from databases to uncover patterns, trends, and relationships.
Database Managers: Database managers oversee the usage and administration of databases
within an organization. They work closely with DBAs, coordinate database-related activities,
and ensure compliance with data governance and data management policies.
Database Languages:
Database languages are used to communicate with and manipulate databases. There are three
main types of database languages:
Data Definition Language (DDL): DDL is used to define the structure and schema of a database.
It includes commands such as CREATE, ALTER, and DROP, which create tables, modify their
structure, or remove them.
Data Manipulation Language (DML): DML is used to retrieve, insert, update, and delete data
within a database. It includes commands such as SELECT, INSERT, UPDATE, and DELETE, which
perform operations on the data stored in tables.
Data Control Language (DCL): DCL is used to control access to the database and manage user
privileges. It includes commands such as GRANT and REVOKE, which grant or revoke
permissions and privileges to users or roles.
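The sketch below groups one hypothetical statement from each category (the table, column, and role names are illustrative, and GRANT syntax differs slightly between DBMSs):
-- DDL: define the structure
CREATE TABLE Course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(100)
);

-- DML: manipulate and retrieve the data
INSERT INTO Course (course_id, title) VALUES (101, 'Database Design I');
SELECT title FROM Course WHERE course_id = 101;

-- DCL: control access
GRANT SELECT ON Course TO lecturer_role;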
Popular examples of database languages include Structured Query Language (SQL), which is
widely used for interacting with relational databases, and other query languages specific to
certain database management systems.
These database languages allow users and applications to interact with databases, retrieve and
manipulate data, define database structures, and control access, providing a means to
communicate with the database system effectively.
Different types of data models provide varying ways to organize and represent data. Here are
explanations of three types: hierarchical, network, and relational models:
Hierarchical Model:
The hierarchical model organizes data in a tree-like structure with a top-down hierarchical
relationship. In this model, data is structured as parent-child relationships, where each parent
can have multiple children, but each child has only one parent. It follows a one-to-many
relationship.
Key features of the hierarchical model include:
Each parent can have multiple children, but each child has only one parent.
Examples of hierarchical databases include IBM's Information Management System (IMS) and
the Windows Registry.
The hierarchical model works well for data with clear parent-child relationships, but it can be
inflexible for representing complex or dynamic relationships between data entities.
Network Model:
The network model extends the hierarchical model by allowing records to have multiple parent-
child relationships, forming complex network structures. It supports many-to-many
relationships, which the hierarchical model does not handle easily.
The network model is well-suited for representing complex relationships and can handle more
diverse data structures than the hierarchical model.
Examples of database systems based on the network model include Integrated Data Store (IDS)
and Integrated Database Management System (IDMS).
While the network model allows for more complex relationships, it can be complex to
implement and maintain compared to other models.
Relational Model:
The relational model is the most widely used data model in modern database systems. It
organizes data into tables (relations) consisting of rows (tuples) and columns (attributes). The
relationships between tables are established through keys (primary and foreign keys).
Key features of the relational model include:
Relationships between tables are defined using primary and foreign keys.
Each row represents a record, and each column represents an attribute or data field.
Data retrieval and manipulation are performed using the structured query language (SQL).
The relational model provides a high level of data independence and flexibility.
Relational databases, such as Oracle, MySQL, and Microsoft SQL Server, are based on the
relational model. The relational model's strength lies in its ability to handle complex
relationships and provide flexibility for querying and data manipulation.
The choice of data model depends on the nature of data and the requirements of the
application. While the hierarchical and network models have specific use cases, the relational
model's versatility has made it the dominant choice in modern database systems.
The concepts of Entity-Relationship (E-R) modeling can be expressed in simpler terms as follows:
Entity Sets:
In E-R modeling, an entity set represents a collection or group of similar entities. An entity is a
distinct object, person, place, concept, or event that we want to represent and store data
about. For example, in a university database, the "Student" entity set represents all the
students enrolled in the university, while the "Course" entity set represents all the courses
offered by the university.
Entity Relationship:
An entity relationship describes the association or connection between two or more entities. It
represents how entities are related to each other in a database. Relationships can have various
types, such as one-to-one, one-to-many, or many-to-many, depending on the cardinality and
participation constraints. For example, in a university database, a relationship could exist
between the "Student" entity set and the "Course" entity set, indicating that students enroll in
courses.
Weak Entity Sets:
A weak entity set is an entity set that does not have sufficient attributes to uniquely identify its
entities without considering the attributes of another related entity set. It depends on a strong
or identifying relationship with another entity set. The identifying relationship provides the
necessary attributes to uniquely identify the weak entity set's entities. For example, in a
database for a bank, the "Account" entity set may be weak if it relies on the "Customer" entity
set's identifier to uniquely identify each account.
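A hedged sketch of the bank example, with hypothetical column names: each account row is identified by its owning customer's identifier together with a discriminator (account_number):
CREATE TABLE Customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100)
);

CREATE TABLE Account (
    customer_id    INT,            -- borrowed from the identifying (strong) entity set
    account_number INT,            -- discriminator (partial key) of the weak entity set
    balance        DECIMAL(12,2),
    PRIMARY KEY (customer_id, account_number),
    FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
);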
E-R modeling is a popular technique used in database design to represent the structure and
relationships of data entities in a clear and intuitive manner. It helps in conceptualizing and
visualizing the organization of data within a database system, facilitating the design process and
ensuring data integrity.
E-R Database Schema
Entities:
Entities represent the real-world objects, concepts, or things that we want to store data about.
Each entity in the schema is represented as a rectangle and is labeled with its name. For
example, in a university database, entities could include "Student," "Course," "Faculty," etc.
Attributes:
Attributes are the properties or characteristics that describe the entities. They provide details
or information about the entities. Each entity has one or more attributes associated with it.
Attributes are depicted as ovals connected to the corresponding entity. For example, a
"Student" entity may have attributes like "StudentID," "Name," "Age," and "Email."
Relationships:
Relationships define the associations or connections between entities. They represent how
entities are related to each other and provide important insights into the database structure.
Relationships are depicted as lines connecting entities, with labels indicating the nature of the
relationship (e.g., one-to-one, one-to-many, many-to-many). For example, a relationship
between the "Student" and "Course" entities could represent the fact that a student can enroll
in multiple courses.
Cardinality and Participation Constraints:
Cardinality refers to the number of instances or occurrences of one entity that can be
associated with another entity through a relationship. It defines the multiplicity or how many
entities can be involved in a relationship. Participation constraints define whether an entity's
presence is mandatory (total participation) or optional (partial participation) in a relationship.
Keys:
Keys are attributes that uniquely identify each instance of an entity. A primary key is a specific
key attribute chosen to uniquely identify each entity instance within an entity set. It ensures
data integrity and uniqueness. Primary keys are typically underlined in the E-R schema.
The E-R database schema provides a high-level, conceptual view of the database structure,
emphasizing the entities, their attributes, and the relationships between them. It serves as a
foundation for the logical and physical design of the database, guiding the creation of tables,
data types, constraints, and indexes. The E-R schema helps in visualizing and communicating
the database structure, facilitating the database design and development process.
Reduction of an E-R Schema into Tables
The process of reducing an Entity-Relationship (E-R) schema into tables is known as schema
mapping or E-R-to-relational mapping. It involves converting the conceptual representation of
entities, relationships, and attributes into a set of relational database tables with well-defined
structures.
Here are the steps involved in reducing an E-R schema into tables:
Identify Entities and Attributes:
Review the entities in the E-R schema and identify their attributes. Each entity will become a
separate table, and its attributes will become the table columns.
Determine Primary Keys:
Determine the primary key for each table. The primary key uniquely identifies each row (tuple)
in the table and is essential for data integrity. It is typically chosen from one or more attributes
that uniquely identify the entity instances.
Define Relationships:
Identify the relationships between entities and determine how they are represented in the
table structure. There are three common approaches for representing relationships:
One-to-One Relationships: In this case, you can include the primary key of one entity as a
foreign key in the other entity's table.
One-to-Many Relationships: In this case, the table representing the "one" side of the
relationship will include the primary key as a foreign key in the table representing the "many"
side.
Many-to-Many Relationships: In this case, you need to create an additional table, known as a
junction or associative table, to represent the relationship. This table will include foreign keys
referencing the primary keys of the entities involved in the relationship.
Handle Weak Entities:
If there are weak entities in the E-R schema, where the identification depends on a related
entity, you need to consider how to represent them in the tables. Typically, you will include a
foreign key referencing the related entity's primary key as part of the weak entity's table.
Handle Many-to-Many Relationships:
If there are many-to-many relationships, as mentioned earlier, you will need to create a
separate table to represent the relationship. This table will have foreign keys referencing the
primary keys of the entities involved.
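As a sketch of the many-to-many case (table and column names are hypothetical), the relationship between students and courses becomes a separate junction table whose foreign keys reference both entity tables:
CREATE TABLE Student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(100)
);

CREATE TABLE Course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(100)
);

-- Junction (associative) table representing the many-to-many relationship
CREATE TABLE Enrollment (
    student_id INT,
    course_id  INT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES Student(student_id),
    FOREIGN KEY (course_id)  REFERENCES Course(course_id)
);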
Normalize the Tables:
Normalize the tables to eliminate redundancy and ensure data integrity. This involves applying
normalization rules, such as removing repeating groups and dependencies, to organize the data
more efficiently.
Specify Data Types and Constraints:
Specify the appropriate data types for each table column, considering factors such as the
nature of the data and storage requirements. Additionally, define constraints such as unique
constraints, foreign key constraints, and any other constraints necessary to maintain data
consistency and integrity.
By following these steps, the E-R schema is effectively transformed into a set of well-structured
relational database tables that represent the entities, attributes, and relationships of the
original schema. This mapping process ensures that data can be stored, retrieved, and
manipulated efficiently and accurately within a relational database management system.
Relational-database design plays a critical role in ensuring the efficiency, reliability, and
integrity of data storage and retrieval. However, there are several pitfalls that designers should
be aware of to avoid potential issues. Here are some common pitfalls in relational-database
design:
Insufficient Normalization:
Failing to properly normalize the database can lead to data redundancy, inconsistencies, and
anomalies. Designers should follow normalization rules to eliminate data redundancy and
dependency issues, ensuring data integrity and reducing storage space requirements.
Over-Reliance on Denormalization:
While denormalization can improve query performance in certain cases, overusing it can
introduce redundancy and compromise data consistency. Designers should carefully assess the
trade-offs and apply denormalization judiciously based on specific performance requirements.
Ignoring Data Integrity Constraints:
Neglecting the enforcement of data integrity constraints, such as primary key, foreign key, and
unique constraints, can result in data inconsistencies and invalid relationships. Designers should
define and enforce appropriate constraints to maintain data accuracy and integrity.
Poorly Defined Data Types:
Incorrectly selecting data types for database columns can lead to wasted storage space,
inefficient queries, or potential data loss. It is crucial to choose data types that accurately
represent the data being stored while considering factors like size, range, and expected usage.
Inadequate Security Measures:
Insufficient attention to database security can expose sensitive data to unauthorized access and
compromise data confidentiality. Designers should implement appropriate security measures
such as access controls, encryption, and authentication mechanisms to safeguard the database.
Lack of Scalability Planning:
Failing to anticipate future growth and scalability needs can result in performance issues and
system limitations. Designers should consider factors like data volume, concurrent user access,
and potential expansion requirements to ensure the database can handle future demands.
Inadequate Documentation:
Poor documentation of the database design can lead to difficulties in understanding the
database structure, relationships, and business rules. It is important to document the design
decisions, schema definitions, constraints, and other relevant information for future reference
and maintenance.
Avoiding these pitfalls requires careful planning, adherence to best practices, and thorough
understanding of the application requirements. Regular reviews, performance monitoring, and
maintenance activities can help identify and rectify design issues and ensure the database
functions optimally and reliably over time.
Decomposition and normalization are two fundamental concepts in relational database design
that focus on breaking down and organizing data into well-structured tables to ensure data
integrity, minimize redundancy, and optimize query performance. Here's an explanation of
each concept:
Decomposition:
Decomposition refers to the process of breaking down a complex table or relation into multiple
smaller tables based on functional dependencies and normalization principles. It involves
identifying and extracting logical components or attributes from a table to form separate tables.
The key benefits of decomposition include:
Eliminating data redundancy: By dividing data into smaller tables, duplication of information is
minimized, ensuring that each piece of data is stored in only one place.
Preserving data integrity: Decomposition helps ensure that relationships between entities and
attributes are accurately represented and maintained, reducing the risk of data anomalies and
inconsistencies.
Improving query performance: Decomposing large tables into smaller, more specialized tables
can enhance query performance by allowing for efficient data retrieval and reducing
unnecessary data access.
Normalization:
Normalization is the process of applying a set of rules or normal forms to a relational database
schema to eliminate data redundancy, dependencies, and anomalies. It helps ensure data
integrity and consistency by organizing data into well-structured tables with proper
relationships.
The main normal forms are:
First Normal Form (1NF): Ensures atomicity by requiring each cell in a table to hold a single
value.
Second Normal Form (2NF): Eliminates partial dependencies by ensuring that all non-key
attributes depend on the entire primary key.
Third Normal Form (3NF): Eliminates transitive dependencies by ensuring that non-key
attributes depend only on the primary key and not on other non-key attributes.
Boyce-Codd Normal Form (BCNF): Ensures that every determinant (attribute that determines
the value of another attribute) is a candidate key, removing all non-trivial functional
dependencies.
Fourth Normal Form (4NF): Eliminates multivalued dependencies, ensuring that independent
multi-valued facts about an entity are stored in separate tables.
Fifth Normal Form (5NF): Addresses join dependencies by decomposing tables so that they can
be reconstructed losslessly from their smaller parts, removing redundancy that the lower normal
forms cannot eliminate.
Each normal form builds upon the previous one, with higher normal forms resulting in more
rigorous data organization and fewer dependencies.
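As a brief illustration (hypothetical tables), a customer's city depends on the customer rather than on the order, so Third Normal Form suggests decomposing an order table that repeats the city into two tables:
-- Before: customer_city depends on customer_id, not on the key order_id (a transitive dependency)
CREATE TABLE OrderUnnormalized (
    order_id      INT PRIMARY KEY,
    customer_id   INT,
    customer_city VARCHAR(50),
    order_date    DATE
);

-- After: decomposition into 3NF
CREATE TABLE Customer (
    customer_id   INT PRIMARY KEY,
    customer_city VARCHAR(50)
);

CREATE TABLE CustomerOrder (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    order_date  DATE,
    FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
);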
Normalization helps improve data integrity, reduces redundancy, and minimizes update
anomalies. However, it's important to strike a balance between normalization and performance
optimization, as excessive normalization can lead to complex query operations and potential
performance drawbacks.
Overall, decomposition and normalization are crucial steps in relational database design to
create efficient, well-structured, and maintainable database schemas that accurately represent
the underlying data and promote data integrity.
Domain-Key Normal Form (DK/NF) is an extension of the Boyce-Codd Normal Form (BCNF) and
a more stringent level of normalization in database design. It aims to address certain types of
anomalies that can arise due to the presence of multiple candidate keys and composite
attributes.
In DK/NF, the focus is on ensuring that every constraint on the data is a logical consequence of
just two kinds of constraints, which eliminates the functional dependencies that can lead to data
redundancy and anomalies:
Domain Constraints: Each attribute must hold a single, atomic value drawn from its defined
domain of permissible values.
Key Constraints: Every non-key attribute must be functionally dependent on an entire candidate
key rather than on only a portion of it; no proper subset of a key should determine any non-key
attribute.
By enforcing these conditions, DK/NF ensures that data is organized in a way that eliminates
partial dependencies and minimizes redundancy and anomalies related to multiple candidate
keys and composite attributes.
It's important to note that DK/NF is considered a higher level of normalization beyond BCNF.
While achieving DK/NF can further improve data integrity and reduce redundancy, it may come
at the cost of more complex table structures and potential performance implications. Designers
need to carefully evaluate the benefits and trade-offs based on specific requirements and
considerations.
DK/NF is not as widely known or used as BCNF or other lower normal forms like 3NF. Its usage
is more specialized and typically found in scenarios where complex composite keys and multiple
candidate keys are present, and stricter normalization is desired to achieve a higher level of
data integrity.
In database design, there are alternative approaches or methodologies that can be adopted
depending on the specific requirements, constraints, and goals of the project. Here are some
notable alternative approaches to consider:
NoSQL Databases:
NoSQL (Not only SQL) databases offer a departure from traditional relational database models
and provide flexible data models better suited for specific use cases. NoSQL databases, such as
document-oriented, key-value, columnar, and graph databases, prioritize scalability, high
performance, and schema flexibility. They are often used for applications with rapidly changing
requirements, large-scale data storage, and the need for distributed computing.
Data Warehousing and Dimensional Modeling:
Data warehousing focuses on organizing and analyzing large volumes of data to support
business intelligence and decision-making. Dimensional modeling is a technique used in data
warehousing to structure data for efficient analysis. It involves creating dimensional schemas
using facts (measurable data) and dimensions (descriptive data). Data warehousing and
dimensional modeling are particularly useful for reporting, analytics, and data-driven decision-
making.
Agile Database Design:
Agile database design takes an iterative and incremental approach to database development,
aligning with the principles of Agile software development methodologies. It emphasizes
collaboration, adaptability, and responsiveness to changing requirements. Agile database
design encourages close collaboration between developers, database administrators, and
stakeholders to continuously refine and evolve the database design based on feedback and
emerging needs.
Data Vault Modeling:
Data Vault modeling is a methodology that focuses on creating a flexible and scalable data
architecture to support data integration and historical tracking. It involves modeling the data
using three primary components: hubs (business keys), links (associations between hubs), and
satellites (attributes of hubs and links). Data Vault modeling is particularly useful for data
warehousing, data integration, and handling complex data scenarios.
Big Data Design:
Big data design approaches are tailored for managing and processing massive volumes of data
from diverse sources. This includes techniques like distributed file systems (e.g., Hadoop),
parallel processing, and data partitioning to handle the velocity, variety, and volume of big data.
Big data design focuses on scalability, fault tolerance, and high-performance processing of
large-scale data sets.
Each alternative approach has its strengths and limitations, and the choice of approach
depends on factors such as the nature of the data, scalability requirements, performance
considerations, and specific project needs. Understanding these alternative approaches allows
designers to select the most suitable methodology that aligns with the project's objectives and
requirements.
CHAPTER THREE
SQL (Structured Query Language) is a standard programming language for managing relational
databases. It was developed in the early 1970s by IBM researchers Donald D. Chamberlin and
Raymond F. Boyce, who initially called it SEQUEL (Structured English Query Language). SQL was
designed to provide a user-friendly and standardized method for interacting with relational
database systems.
The development of SQL was influenced by the need for a language that could effectively
retrieve and manipulate data stored in databases. Prior to SQL, data manipulation was typically
performed using low-level programming languages or specialized query languages specific to
individual database systems, which created challenges in portability and ease of use.
With SQL, users could use a declarative approach to express queries and commands in a more
natural language-like syntax. This made it easier for non-technical users and programmers to
interact with databases and perform operations such as querying data, inserting, updating, and
deleting records, creating tables, defining relationships, and managing database structures.
Over the years, SQL evolved as a standard language for relational databases, and different
versions and implementations emerged. The American National Standards Institute (ANSI) and
the International Organization for Standardization (ISO) have developed SQL standards to
ensure consistency and compatibility across different database management systems.
The SQL language consists of several components, including Data Definition Language (DDL) for
creating and modifying database structures, Data Manipulation Language (DML) for
manipulating data, Data Control Language (DCL) for managing permissions and access control,
and Transaction Control Language (TCL) for managing transactions.
SQL has become ubiquitous in the world of databases and is supported by most relational
database management systems, including popular systems like Oracle, MySQL, Microsoft SQL
Server, PostgreSQL, and SQLite. It has also influenced the development of other database
languages and technologies.
Today, SQL is widely used for a variety of purposes, including database administration, data
analysis, application development, and business intelligence. Its ease of use, portability, and
standardization have made it a fundamental tool for working with relational databases and
managing data effectively.
The basic structure of SQL (Structured Query Language) consists of several components that
allow users to interact with relational databases. These components include:
Data Definition Language (DDL):
DDL statements are used to define and modify the structure of the database. They include
commands such as CREATE, ALTER, and DROP to create and modify database objects like tables,
views, indexes, and constraints.
Data Manipulation Language (DML):
DML statements are used to retrieve, insert, update, and delete data within the database. The
main DML commands are SELECT, INSERT, UPDATE, and DELETE, which allow users to query,
add, modify, and remove data from database tables.
Data Control Language (DCL):
DCL statements are used to control access to the database and manage user privileges. They
include commands such as GRANT and REVOKE to grant or revoke permissions and privileges
for database objects.
Transaction Control Language (TCL):
TCL statements are used to manage transactions within the database. Transactions ensure that
a series of database operations are performed as a single, indivisible unit. TCL commands
include COMMIT to save changes, ROLLBACK to undo changes, and SAVEPOINT to set
intermediate points for rollback.
Keywords:
SQL statements begin with keywords that define the type of operation being performed, such
as SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, GRANT, and so on.
Clauses:
Clauses provide additional details and conditions for the SQL statement. Common clauses
include FROM (specifying the table to retrieve data from), WHERE (specifying conditions for
data retrieval or modification), GROUP BY (grouping data for aggregation), ORDER BY (sorting
the result set), and others.
Expressions:
SQL statements often involve expressions, which are calculations, comparisons, or data
manipulations. Expressions can involve arithmetic operators (+, -, *, /), logical operators (AND,
OR, NOT), comparison operators (=, <, >, etc.), and functions (e.g., AVG, COUNT, MAX, MIN) to
perform calculations or transformations on data.
Semicolon:
SQL statements are typically terminated with a semicolon (;) to indicate the end of the
statement.
For example:
SELECT column1, column2
FROM table_name
WHERE condition;
In this example, the SELECT keyword is used to retrieve specific columns from a table, the
FROM clause specifies the table to retrieve data from, and the WHERE clause provides a
condition to filter the data.
By following this basic structure and utilizing appropriate SQL statements, users can perform a
wide range of operations to manage and manipulate data in relational databases.
Nested Sub-Queries
Nested sub-queries, also known as subquery within a subquery or nested queries, refer to the
use of one query inside another query within an SQL statement. They allow for more complex
and advanced data retrieval and manipulation by combining multiple levels of queries.
In a nested sub-query, the inner query (subquery) is executed first, and its results are then used
as input or criteria for the outer query. The subquery can be used in various parts of the outer
query, such as the SELECT, FROM, WHERE, or HAVING clauses.
Let's assume we have two tables: "Customers" and "Orders." We want to retrieve the names of
customers who have placed at least one order.
For example:
SELECT customer_name
FROM Customers
WHERE customer_id IN (
SELECT customer_id
FROM Orders
);
In this example, the outer query selects the "customer_name" column from the "Customers"
table. The subquery, enclosed in parentheses, retrieves the "customer_id" column from the
"Orders" table.
The subquery is executed first, retrieving a list of customer IDs from the "Orders" table. The
outer query then uses this list of customer IDs as a criterion in the WHERE clause to retrieve the
corresponding customer names from the "Customers" table.
Nested sub-queries provide flexibility and power in SQL because they allow for more complex
conditions and comparisons. The results of the inner query can be used to filter, aggregate, join,
or perform other operations in the outer query.
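For instance, a correlated sub-query (a hedged sketch using the same hypothetical tables) can count each customer's orders directly inside the WHERE clause:
SELECT customer_name
FROM Customers c
WHERE (SELECT COUNT(*)
       FROM Orders o
       WHERE o.customer_id = c.customer_id) > 3;  -- customers with more than three orders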
It's important to note that nesting sub-queries can impact performance and readability of SQL
statements. Care should be taken to optimize the query execution plan and ensure the
readability and maintainability of the code.
Additionally, some database systems have limitations on the level of nesting allowed, so it's
important to be aware of any restrictions imposed by the specific database platform being
used.
Nested sub-queries are a valuable tool in SQL for performing intricate data retrieval and
manipulation tasks, offering the ability to combine multiple levels of queries to obtain the
desired result set.
Derived relations and views are concepts in relational database systems that allow users to
create virtual or temporary representations of data based on existing tables. Both derived
relations and views serve similar purposes but differ in their underlying implementation and
usage.
Derived Relations:
A derived relation, also known as a derived table or virtual table, is a result of applying
operations (such as selections, projections, joins, or aggregations) on one or more existing
tables in the database. Derived relations are not explicitly stored in the database but are
computed or derived on the fly whenever they are referenced or queried.
Derived relations are typically used for intermediate calculations or complex queries to simplify
data access and reduce redundancy.
They provide a way to transform or manipulate data from existing tables without modifying the
underlying schema or data.
Views:
A view, also known as a virtual table or logical table, is a named and saved query that acts as a
virtual table in the database. Views are created by defining a query on one or more tables, and
the result set of the query is saved as a view in the database. Users can then interact with the
view as if it were a physical table.
They encapsulate complex queries and provide a simplified and controlled way of accessing
data.
Views can be used to restrict access to certain columns or rows, enforcing security and privacy
policies.
Changes made to the underlying tables are automatically reflected in the view, maintaining
data consistency.
Views are often used to present a specific subset of data or to aggregate data from multiple
tables into a single logical table.
While both derived relations and views provide virtual representations of data, views offer
more flexibility and persistence as they are stored objects in the database. Views can be
queried and manipulated just like regular tables, while derived relations are generated on the
fly and exist only during the execution of a specific query or operation.
Both derived relations and views are powerful tools in database systems, allowing users to
create customized data representations, simplify complex queries, and enforce security and
data integrity.
Views
Views in a relational database system are virtual tables that are created based on the result of a
query. They provide a way to present a subset of data from one or more tables or perform
complex calculations on the fly. Views appear and can be queried like regular tables, but they
do not store any data themselves. Instead, they dynamically retrieve and present data from the
underlying tables based on the defined query.
Customized Data Access:
Views allow users to create customized virtual tables that present a subset of data from one or
more tables. Views can include specific columns, filter rows based on certain conditions, or
combine data from multiple tables into a single logical view. This abstraction helps simplify data
access and presents a tailored perspective of the database.
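A minimal sketch, assuming a hypothetical Student table that includes a department column:
CREATE VIEW cs_students AS
SELECT student_id, name, email
FROM Student
WHERE department = 'Computer Science';  -- only selected columns and filtered rows are exposed

-- The view is queried like a table, but it stores no data of its own:
SELECT name FROM cs_students;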
Security and Access Control:
Views can be used to enforce security and access control by limiting the visibility of certain
columns or rows. With views, you can grant or restrict users' access to specific data, ensuring
that they only see the information they are authorized to access. This provides an additional
layer of data protection and privacy.
Data Integrity and Consistency:
Views maintain data integrity and consistency by reflecting changes made to the underlying
tables. When data is modified in the original tables, the corresponding changes are
automatically reflected in the view. This ensures that the data presented through the view is
always up to date and consistent with the underlying data.
Query Simplification:
Views are commonly used to simplify complex queries by encapsulating them into a single
named object. Instead of writing complex and lengthy queries repeatedly, users can create a
view with the desired query logic. The view can then be queried like a regular table, making the
query process more efficient and readable.
Query Optimization:
Views can also improve query performance and optimization. By predefining and saving
frequently used queries as views, the database system can optimize the execution plan for
those queries. This can result in faster query execution and reduced overhead in query
compilation.
It's important to note that while views provide a convenient way to interact with data, they do
not physically store any data. Views are based on the underlying tables and their defined
queries. For updatable views, modifications made through the view (insertion, update, or
deletion) are propagated to the underlying tables, while views defined over complex queries may
be read-only.
Views are widely used in database systems to simplify data access, improve security, enhance
query performance, and provide a customizable perspective of the data. They offer a powerful
mechanism to abstract and manipulate data without altering the underlying table structure,
making database management more efficient and flexible.
Joined Relations
Joined relations, also known as joins, are an essential concept in relational database systems
that enable the combination of data from multiple tables based on common columns or
relationships. Joining tables allows users to retrieve and analyze data that is distributed across
different tables, providing a comprehensive view of the related information.
Join operations bring together related data from multiple tables based on shared columns or
specified relationships. By joining tables, users can access information from different tables in a
single result set, enabling more comprehensive and meaningful analysis of the data.
Common types of joins include:
Inner Join: Returns only the rows that have matching values in both tables being joined.
Left Join: Returns all the rows from the left table and the matching rows from the right table. If
there are no matches, null values are included.
Right Join: Returns all the rows from the right table and the matching rows from the left table.
If there are no matches, null values are included.
Full Join: Returns all the rows from both tables, including both matching and non-matching
rows. Null values are included where there are no matches.
Cross Join: Returns the Cartesian product of the two tables, generating all possible
combinations of rows between the tables.
Join Conditions:
Join operations rely on specific conditions to determine how the tables are linked together.
These conditions are typically specified in the ON clause or the WHERE clause of the SQL
statement. Join conditions define the relationship between the columns in the joined tables,
such as equality between the values or other comparison operators.
Joining Multiple Tables:
It is possible to join more than two tables together to combine data from multiple sources. This
is done by sequentially joining additional tables to the result of a previous join operation.
Complex queries may involve joining several tables to retrieve the desired information.
Table Aliases:
When joining tables, it is common to use table aliases to provide shorter and more readable
names for the tables being joined. Aliasing helps differentiate columns from different tables,
especially when there are column name conflicts.
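A hedged sketch using hypothetical Customers and Orders tables, with table aliases and the join condition in the ON clause:
-- Inner join: only customers that have at least one matching order
SELECT c.customer_name, o.order_id, o.order_date
FROM Customers c
INNER JOIN Orders o ON c.customer_id = o.customer_id;

-- Left join: all customers, with NULLs for those that have no orders
SELECT c.customer_name, o.order_id
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id;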
Joined relations are a fundamental concept in relational databases, allowing users to extract
meaningful information by combining related data from multiple tables. The ability to join
tables effectively is crucial in creating comprehensive queries and obtaining a holistic view of
data stored across different tables in a database.
CHAPTER FOUR
Data Definition Language (DDL) and Embedded SQL are two concepts related to database
management and the interaction between programming languages and databases.
Data Definition Language (DDL):
DDL is a subset of SQL that is used to define and manage the structure and schema of a
database. It includes commands for creating, altering, and dropping database objects such as
tables, views, indexes, constraints, and stored procedures. DDL statements are responsible for
defining the logical and physical structure of the database, specifying data types, constraints,
relationships, and other properties.
Common DDL commands include:
CREATE: Used to create new database objects such as tables, views, and indexes.
ALTER: Used to modify the structure of existing database objects.
DROP: Used to remove a database object and its data from the database.
TRUNCATE: Used to delete all data from a table while keeping its structure intact.
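A short sketch of these statements on a hypothetical table:
CREATE TABLE Staff (
    staff_id INT PRIMARY KEY,
    name     VARCHAR(100)
);

ALTER TABLE Staff ADD email VARCHAR(100);  -- modify the structure

TRUNCATE TABLE Staff;                      -- remove all rows but keep the structure

DROP TABLE Staff;                          -- remove the table entirely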
DDL statements are executed outside the scope of a specific transaction and are automatically
committed once executed successfully. They are typically executed by database administrators
or authorized users responsible for managing the database structure.
Embedded SQL:
Embedded SQL refers to the integration of SQL statements within a host programming
language, such as C, Java, or Python. It allows programmers to interact with a database from
within their application code by embedding SQL statements directly within the programming
language's syntax. The embedded SQL statements are recognized and processed by a
preprocessor or compiler specific to the programming language.
Embedded SQL offers the advantage of combining the power and flexibility of SQL with the
programming capabilities of the host language. It enables seamless interaction with the
database, allowing the application to retrieve, modify, and manage data stored in the database.
Both DDL and Embedded SQL are important components in managing databases and
developing database-driven applications. DDL defines and manages the structure of the
database, while Embedded SQL enables seamless integration of SQL statements within
application code to interact with the database. Together, they facilitate effective database
management and application development.
CENTRALIZED SYSTEMS
In a centralized system, all data, resources, and operations are managed and controlled from a
central point. This includes data storage, processing, access control, security measures, and
decision-making.
Centralized Administration:
The administration and management of the system, including maintenance, upgrades, and
system configuration, are performed by a central authority or team. This centralized
administration ensures consistent and coordinated management of the system.
Single Point of Access:
Users and clients typically access the centralized system through a single point of entry, such as
a central server or mainframe computer. This provides a unified and controlled access point for
users to interact with the system.
Data and Service Sharing:
Centralized systems often facilitate data and service sharing among different users or
components of the system. Data and services can be centrally stored and shared, enabling
efficient collaboration and access across the system.
While centralized systems offer certain advantages, such as centralized control and resource
optimization, they can also have limitations, including single points of failure, potential
performance bottlenecks, and limited scalability. As a result, decentralized or distributed
systems have gained prominence, where data and control are distributed across multiple nodes
or entities to achieve better fault tolerance, scalability, and performance.
CLIENT-SERVER SYSTEMS
Clients:
Clients are end-user devices or applications that initiate requests for services or resources.
Clients can be desktop computers, laptops, smartphones, or other devices. They interact with
the user, gather input, and send requests to servers for processing or data retrieval. Clients can
range from simple web browsers to complex software applications.
Servers:
Servers are dedicated systems or processes that receive and fulfill client requests. They provide
services, resources, or data to clients based on the nature of the request. Servers are designed
to handle multiple client connections simultaneously and can range from web servers,
application servers, file servers, database servers, or other specialized servers.
Request-Response Model:
The communication between clients and servers follows a request-response model. Clients
send requests to servers, specifying the desired service or resource, and servers process those
requests and send back the corresponding response. The response can include data, results of a
computation, or an acknowledgment.
Distributed Processing:
In a client-server system, the processing workload is distributed between clients and servers.
Clients handle the presentation logic, user interaction, and local data processing, while servers
handle the intensive processing, data storage, and business logic. This division of labor allows
for scalable and efficient utilization of resources.
Network Communication:
Client-server systems rely on network communication for clients and servers to interact. Clients
send requests over the network to servers, which receive and process those requests. The
communication can occur over a local area network (LAN), wide area network (WAN), or the
Internet using protocols such as HTTP, TCP/IP, or other network protocols.
Scalability and Flexibility:
Client-server systems can be designed to scale by adding more servers to handle increasing
client loads. This scalability allows for handling a large number of concurrent users or
accommodating growing demands. Additionally, client devices can be updated or replaced
independently without impacting the server infrastructure, providing flexibility in client
management.
Client-server systems are widely used in various domains, such as web applications, enterprise
systems, cloud computing, and distributed databases. They offer advantages like centralized
management, scalability, resource utilization, and separation of concerns between clients and
servers. However, they also introduce challenges such as maintaining server availability,
managing network communication, and ensuring data integrity and security.
PARALLEL SYSTEMS
Parallel systems are computing architectures designed to carry out multiple computations or
tasks simultaneously, thereby increasing processing speed and performance. Here are some
examples of parallel systems:
Parallel Database Systems:
Parallel database systems distribute the storage and processing of data across multiple nodes
or servers, enabling efficient handling of large-scale databases. These systems employ parallel
query processing techniques to execute queries across multiple processors simultaneously,
improving query response time and scalability. Examples of parallel database systems include
Oracle Parallel Database and Teradata.
Graphics Processing Units (GPUs):
Graphics Processing Units (GPUs) are highly parallel processors originally designed for rendering
graphics. However, their architecture and parallel processing capabilities have made them
popular for general-purpose computing. GPU parallelism enables tasks to be divided into
parallel threads that can execute simultaneously, accelerating computations in areas such as
scientific simulations, machine learning, and data analytics.
Multi-Core Processors:
Multi-core processors incorporate multiple processing cores on a single chip, allowing for
parallel execution of tasks. Each core can execute instructions independently, enabling
concurrent processing and improving overall performance. Applications that are designed to
take advantage of multi-core architectures can execute multiple threads or tasks in parallel,
enhancing efficiency and responsiveness.
DISTRIBUTED SYSTEMS
Distributed systems refer to computing systems that consist of multiple interconnected nodes
or computers that work together to perform a unified task or provide a shared service. In a
distributed system, these nodes communicate and coordinate with each other to achieve a
common goal. Here's an explanation of distributed systems and some common network types
associated with them:
Local Area Network (LAN):
A Local Area Network is a network that covers a small geographic area, such as a home, office,
or campus. In a distributed system, nodes within a LAN can communicate directly with each
other at high speeds, enabling efficient data sharing and coordination. LANs are commonly used
in environments where multiple computers need to interact and share resources.
Wide Area Network (WAN):
A Wide Area Network covers a larger geographical area and typically spans multiple locations,
such as cities, states, or even countries. WANs connect LANs over long distances, allowing
distributed systems to function across different physical locations. WANs rely on
telecommunications networks, such as leased lines, satellite links, or the Internet, to establish
connectivity between nodes.
Metropolitan Area Network (MAN):
A Metropolitan Area Network is a network that spans a larger area than a LAN but smaller than
a WAN, typically covering a city or metropolitan region. MANs provide connectivity between
different LANs within a defined geographic area. They enable distributed systems to operate
within a specific locality or city-wide region, facilitating efficient communication and data
sharing.
Internet:
The Internet is a global network that connects computers and networks worldwide. It provides
a vast infrastructure for distributed systems to operate on a global scale. Distributed systems
can leverage the Internet to communicate and exchange data between nodes located in
different regions or countries. The Internet's reach and ubiquity make it an essential network
type for distributed systems.
Wireless Networks: