Unit I
Introduction:
Concepts and Definitions:
1. NoSQL Databases
2. NewSQL Databases
Definition: Modern relational databases that aim to provide the same scalability and
performance as NoSQL systems but maintain SQL-based querying.
Examples: Google Spanner, CockroachDB, NuoDB.
3. Distributed Databases
Definition: Databases that spread data across multiple locations or nodes to ensure
reliability and scalability.
Characteristics:
o Replication: Duplication of data across different nodes for redundancy.
o Sharding: Splitting data into smaller chunks and distributing them across
nodes to improve performance.
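As a rough illustration of the two characteristics above, the following Python sketch assigns each record key to a shard with a stable hash and then places extra copies on neighbouring nodes. The function names, the key format, and the "next node" replica placement are assumptions made for this example, not how any particular database does it.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Sharding: map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def replicas_for(key: str, num_nodes: int, copies: int) -> list:
    """Replication: keep `copies` copies on consecutive nodes, starting at the key's shard."""
    first = shard_for(key, num_nodes)
    return [(first + i) % num_nodes for i in range(copies)]

# Hypothetical keys; each maps to two distinct nodes out of four.
placement = {k: replicas_for(k, num_nodes=4, copies=2) for k in ["user:1", "user:2", "user:3"]}
```

Simple modulo hashing moves many keys when the number of nodes changes, which is one reason real systems often use more elaborate placement schemes.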
4. Data Warehousing
Definition: A system used for reporting and data analysis, designed to handle large
volumes of data from different sources.
Components:
o ETL (Extract, Transform, Load): Processes to gather, clean, and integrate
data.
o OLAP (Online Analytical Processing): Techniques for querying and
analyzing multidimensional data.
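The two components can be sketched on a very small scale with Python's standard library. The CSV text, the `sales` table, and the cleaning rule (drop rows with a missing amount) are all made up for illustration; the final GROUP BY query stands in for an OLAP-style aggregation along one dimension.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with untidy values.
RAW = "region,product,amount\nnorth,widget, 10\nnorth,gadget,5\nsouth,widget,7\nsouth,widget,\n"

def extract(text):
    """Extract: read rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, convert types, drop rows with no amount."""
    cleaned = []
    for r in rows:
        amount = r["amount"].strip()
        if not amount:
            continue
        cleaned.append((r["region"].strip(), r["product"].strip(), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)

# Analytical query: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```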
5. Big Data Technologies
Definition: Technologies designed to process and analyze large volumes of data that
traditional databases cannot handle efficiently.
Tools:
o Hadoop: An open-source framework for distributed storage and processing.
o Spark: A unified analytics engine for large-scale data processing.
6. Data Lakes
7. Multi-Model Databases
Definition: Databases that support multiple data models (e.g., relational, document,
graph) within a single system.
Examples: ArangoDB, OrientDB.
Relational models:
Relational models in database management systems (DBMS) are based on the concept of
relations, which are essentially tables. Here's a brief overview of the key components and
concepts:
1. Tables (Relations):
o Rows (Tuples): Each row represents a single record or data entry.
o Columns (Attributes): Each column represents a field or property of the
record.
2. Primary Key:
o A unique identifier for each row in a table. It ensures that each record can be
uniquely identified. For example, a student ID in a student table.
3. Foreign Key:
o An attribute in one table that links to the primary key of another table. It
establishes a relationship between the two tables. For example, a course ID in
a student enrollment table that refers to the course ID in a courses table.
4. Schema:
o The structure of the database, including tables, columns, data types, and
relationships between tables.
5. Normalization:
o The process of organizing data to reduce redundancy and improve data
integrity. It involves dividing tables into smaller tables and defining
relationships between them.
6. SQL (Structured Query Language):
o A language used to interact with the relational database, allowing you to
query, insert, update, and delete data.
7. Integrity Constraints:
o Rules applied to ensure the accuracy and consistency of data. Examples
include primary key constraints, foreign key constraints, and unique
constraints.
8. Relationships:
o One-to-One: Each record in one table is related to one record in another table.
o One-to-Many: A record in one table can be related to multiple records in
another table.
o Many-to-Many: Records in one table can be related to multiple records in
another table, and vice versa. This often requires a junction table to manage
the relationships.
The relational model provides a systematic way to manage data and relationships, making it
easier to handle and query complex datasets.
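The components listed above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table and column names are invented for the example, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default

conn.executescript("""
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,          -- primary key
    name       TEXT NOT NULL
);
CREATE TABLE courses (
    course_id  INTEGER PRIMARY KEY,
    title      TEXT NOT NULL UNIQUE          -- unique constraint
);
-- Junction table: resolves the many-to-many relationship students <-> courses.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),  -- foreign key
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),    -- foreign key
    PRIMARY KEY (student_id, course_id)
);
""")

conn.execute("INSERT INTO students VALUES (1, 'Ada')")
conn.execute("INSERT INTO courses VALUES (101, 'Databases')")
conn.execute("INSERT INTO enrollments VALUES (1, 101)")

# An enrollment that points at a non-existent course breaks the foreign key constraint.
try:
    conn.execute("INSERT INTO enrollments VALUES (1, 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```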
In database management systems (DBMS), data modeling and query languages are
fundamental for designing, managing, and interacting with databases. Here’s a breakdown of
both concepts:
Data Modeling
Data modeling is the process of creating a conceptual framework for organizing and
structuring data in a database. It involves defining how data will be stored, related, and
managed. The primary components of data modeling are:
1. Entities: Objects or things in the real world that are relevant to the database (e.g.,
customers, products).
2. Attributes: Properties or details about entities (e.g., customer name, product price).
3. Relationships: Associations between entities (e.g., customers place orders, orders
contain products).
4. Schema: The overall structure of the database, including tables, columns, and
relationships.
Query Languages
Query languages are used to retrieve, manipulate, and manage data in a database. The most
common query languages are:
1. SQL (Structured Query Language): The standard language for relational databases.
It includes commands for querying (e.g., SELECT), updating (e.g., UPDATE), inserting
(e.g., INSERT), and deleting (e.g., DELETE) data.
o DML (Data Manipulation Language): Includes commands like SELECT,
INSERT, UPDATE, and DELETE.
o DDL (Data Definition Language): Includes commands like CREATE, ALTER,
and DROP to define and modify database structures.
o DCL (Data Control Language): Includes commands like GRANT and REVOKE
for managing permissions.
2. NoSQL Query Languages: Used in non-relational databases (NoSQL). These can
vary widely depending on the database type. Examples include:
o MongoDB Query Language (MQL): For querying MongoDB databases.
o CQL (Cassandra Query Language): For querying Cassandra databases.
o Gremlin: For querying graph databases.
3. SPARQL: Used for querying RDF (Resource Description Framework) data in
semantic web and linked data contexts.
Data modeling provides the structure and design for how data will be stored, while query
languages enable users to interact with and manipulate this data. Effective data modeling
ensures that the database is well-organized and optimized for querying, and understanding the
query language helps users to efficiently retrieve and manage the data they need.
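A short sketch of the DDL and DML command groups, run through Python's sqlite3 module. The `products` table and its values are hypothetical. DCL is not shown because SQLite, used here for convenience, has no GRANT or REVOKE; those commands belong to server-based systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and modify the structure.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("ALTER TABLE products ADD COLUMN in_stock INTEGER DEFAULT 1")

# DML: insert, update, delete, and query the data.
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("book", 12.0)],
)
conn.execute("UPDATE products SET price = ? WHERE name = ?", (2.0, "pen"))
conn.execute("DELETE FROM products WHERE name = ?", ("book",))
rows = conn.execute("SELECT name, price, in_stock FROM products").fetchall()

# DDL again: remove the structure.
conn.execute("DROP TABLE products")
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.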
Database Objects:
In a Database Management System (DBMS), database objects are various structures that
store and organize data. Here’s a rundown of some common types:
1. Tables: The fundamental building blocks where data is stored. They consist of rows
(records) and columns (fields).
2. Views: Virtual tables created by querying one or more tables. They present data in a
specific format or subset, without storing it separately.
3. Indexes: Structures that improve the speed of data retrieval operations on a table.
They work like book indexes, allowing quick lookups.
4. Sequences: Objects used to generate unique numbers, often for primary keys. They
provide a way to automatically generate sequential numbers.
5. Stored Procedures: Precompiled collections of SQL statements that can be executed
as a unit. They help encapsulate logic and improve performance.
6. Functions: Similar to stored procedures, but they return a single value. They can be
used in SQL statements like expressions.
7. Triggers: Procedures that are automatically executed in response to certain events on
a table, like insertions or updates.
8. Constraints: Rules applied to table columns to enforce data integrity, such as primary
keys, foreign keys, unique constraints, and check constraints.
9. Schemas: Collections of database objects that group together tables, views, indexes,
etc. They help organize and manage these objects.
These objects work together to store, manage, and retrieve data efficiently in a DBMS.
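Several of these objects can be created in a few lines with Python's sqlite3 module. The sketch below uses invented names (`orders`, `audit_log`, `big_orders`) and covers a table, a check constraint, an index, a view, and a trigger. Sequences and stored procedures are left out because SQLite does not provide them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    total    REAL CHECK (total >= 0)              -- check constraint
);
CREATE TABLE audit_log (order_id INTEGER, note TEXT);

CREATE INDEX idx_orders_customer ON orders(customer);  -- speeds up lookups by customer

CREATE VIEW big_orders AS                              -- virtual table, stores no data
    SELECT id, customer, total FROM orders WHERE total >= 100;

CREATE TRIGGER log_new_order AFTER INSERT ON orders    -- runs on every insert
BEGIN
    INSERT INTO audit_log VALUES (NEW.id, 'created');
END;
""")

conn.execute("INSERT INTO orders (customer, total) VALUES ('acme', 250.0)")
conn.execute("INSERT INTO orders (customer, total) VALUES ('bolt', 20.0)")

big = conn.execute("SELECT customer FROM big_orders").fetchall()
logged = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
```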
Normalization Techniques:
Functional Dependency:
Formally, if X and Y are sets of attributes in a relation, we say that Y is
functionally dependent on X, denoted X → Y, if and only if, for any two tuples t1 and t2
in the relation, t1[X] = t2[X] implies t1[Y] = t2[Y].
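The definition translates almost directly into a check over the rows of a table. The sketch below is one way to write it in Python; the function name `holds` and the sample rows are assumptions for illustration. A check like this can show that a dependency is violated by the current data, but passing it does not prove the dependency is a rule of the schema.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample data.
enrollment = [
    {"StudentID": 1, "CourseID": 101, "CourseName": "Databases"},
    {"StudentID": 2, "CourseID": 101, "CourseName": "Databases"},
    {"StudentID": 1, "CourseID": 102, "CourseName": "Networks"},
]

course_determines_name = holds(enrollment, ["CourseID"], ["CourseName"])
student_determines_course = holds(enrollment, ["StudentID"], ["CourseID"])
```

Here `CourseID → CourseName` holds, while `StudentID → CourseID` does not, because student 1 appears with two different courses.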
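First Normal Form (1NF): a table is in 1NF if every column holds only atomic (single) values.
Example: Consider a table that records student information, with a Subjects column listing
the subjects each student is enrolled in.
This table is not in 1NF because the Subjects column contains multiple values. To convert it
to 1NF, you would split the subjects into separate rows.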
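The conversion can be sketched in Python as follows. The sample rows are invented for this example (they are not the table from these notes), and the comma-separated format of the Subjects column is an assumption.

```python
# Hypothetical un-normalized rows: Subjects holds several values in one column.
unnormalized = [
    {"StudentID": 1, "Name": "Ada", "Subjects": "Math, Physics"},
    {"StudentID": 2, "Name": "Alan", "Subjects": "Chemistry"},
]

def to_1nf(rows, column, sep=","):
    """Produce one row per value of the multi-valued column."""
    flat = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)            # copy the other attributes unchanged
            new_row[column] = value.strip()
            flat.append(new_row)
    return flat

flat_rows = to_1nf(unnormalized, "Subjects")
```

After the split, Name is repeated for every subject of the same student. That repetition is the kind of redundancy the later normal forms address.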
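Second Normal Form (2NF): a table is in 2NF if it is in 1NF and no non-key attribute depends
on only part of a composite primary key (no partial dependencies).
Example: If you have a table where StudentID and CourseID together form the primary
key, but CourseName depends only on CourseID, this is a partial dependency.
Here, CourseName is dependent only on CourseID, not the entire composite key (StudentID,
CourseID). To convert this to 2NF, you would split the table into: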
Students-Courses Table:
StudentID  CourseID
1          101
1          102
Courses Table:
Third Normal Form (3NF): a table is in 3NF if:
It is in 2NF.
There are no transitive dependencies (i.e., non-key attributes should not depend on
other non-key attributes).
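Example: Suppose you have a table where StudentID is the primary key and there are
attributes AdvisorName and AdvisorOffice. If AdvisorOffice depends on AdvisorName, which in
turn depends on StudentID, then AdvisorOffice depends on StudentID only transitively. To
convert this to 3NF, you would split the table into: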
Students Table:
Advisors Table:
Boyce-Codd Normal Form (BCNF): a table is in BCNF if:
It is in 3NF.
For every one of its non-trivial functional dependencies, the left side is a superkey.
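Example: Suppose a larger table (say, one that also records which student takes which
course) contains both CourseID and Instructor, where CourseID determines Instructor and
Instructor determines CourseID. This violates BCNF because neither CourseID nor Instructor
alone is a superkey of that table. Moving the pair into its own table, where CourseID is a
key, removes the violation:
CourseID  Instructor
101       Dr. Smith
102       Dr. Jones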
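Whether the left side of a dependency is a superkey can be tested with the attribute-closure algorithm. The following Python sketch lists the dependencies that break the BCNF condition. The helper names and the extra StudentID attribute in the first schema are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Non-trivial dependencies whose left side is not a superkey of the relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not set(all_attrs) <= closure(lhs, fds)
    ]

fds = [
    (("CourseID",), ("Instructor",)),
    (("Instructor",), ("CourseID",)),
]

# Hypothetical larger table: neither CourseID nor Instructor determines StudentID.
wide = bcnf_violations({"StudentID", "CourseID", "Instructor"}, fds)

# Two-column table: CourseID and Instructor are both keys, so nothing is violated.
narrow = bcnf_violations({"CourseID", "Instructor"}, fds)
```

In the wide table both dependencies are reported. In the two-column table the list is empty, since each attribute determines the whole row.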
Definition
A Multi-Valued Dependency (MVD) occurs when, for a given value of an attribute A, a table
contains multiple values for another attribute B independently of a third attribute C.
Formally, for a relation R with attributes A, B, and C, a multi-valued dependency A →→ B
means that for each value of A, the set of values of B is independent of C. That is, the
set of B values associated with a given A value is the same for every C value that appears
with that A.
Example
In this relation, StudentID →→ Hobby means that the set of hobbies for each student is
independent of the course. For each StudentID, the set of hobbies remains the same
regardless of which course they are taking.
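One way to test this on actual rows: for each StudentID, the (Hobby, Course) pairs present must be every combination of that student's hobbies and courses. The Python sketch below does this. The function name and the sample rows are assumptions for illustration, not the table from these notes.

```python
from itertools import product

def mvd_holds(rows, x, y):
    """Check the multi-valued dependency x ->> y on `rows` (a non-empty list of dicts)."""
    z = [a for a in rows[0] if a not in x and a not in y]   # the remaining attributes
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    # Independence: within each group, all Y-values must appear with all Z-values.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())

# Hypothetical data: student 1 takes two courses and has two hobbies.
rows = [
    {"StudentID": 1, "Course": "Math", "Hobby": "Chess"},
    {"StudentID": 1, "Course": "Math", "Hobby": "Music"},
    {"StudentID": 1, "Course": "Physics", "Hobby": "Chess"},
    {"StudentID": 1, "Course": "Physics", "Hobby": "Music"},
]

complete = mvd_holds(rows, ["StudentID"], ["Hobby"])
incomplete = mvd_holds(rows[:-1], ["StudentID"], ["Hobby"])
```

With all four combinations present the dependency holds. Dropping one row breaks it, because the hobbies would then depend on the course.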
Importance in Normalization
MVDs are significant in database normalization. They help in achieving the Fourth Normal
Form (4NF), which states that a relation is in 4NF if it is in Boyce-Codd Normal Form
(BCNF) and has no non-trivial multi-valued dependencies.
To decompose a relation with multi-valued dependencies into 4NF, you split it into two
relations: one for the multi-valued dependency and one for the remaining attributes.
Decomposition
This decomposition ensures that the multi-valued dependency is preserved while eliminating
redundancy and potential anomalies.
In database management systems (DBMS), the concepts of loss-less join and dependency
preservation are crucial when decomposing a database schema into multiple relations. Here’s
a breakdown of these concepts:
1. Loss-less Join
Definition: A decomposition of a relation R into R1 and R2 is said to be loss-less if, when
you join R1 and R2, you recover the original relation R without any loss of information.
Formally: For a decomposition of R into R1 and R2 to be loss-less, the following must hold:
R = R1 ⋈ R2, where ⋈ denotes the natural join operation.
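The condition can be tested on a concrete table by projecting it onto the two attribute sets, joining the projections, and comparing the result with the original rows. The Python sketch below does that. The helper names and the sample rows are assumptions, and it checks only one instance of the data. At schema level, the usual test for a two-way split is that the shared attributes form a superkey of R1 or of R2.

```python
def project(rows, attrs):
    """Projection onto `attrs`, with duplicate tuples removed."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]

def natural_join(r1, r2):
    """Natural join of two non-empty relations on their common attributes."""
    common = [a for a in r1[0] if a in r2[0]]
    return [{**a, **b} for a in r1 for b in r2 if all(a[c] == b[c] for c in common)]

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    as_set = lambda rel: {frozenset(t.items()) for t in rel}
    return as_set(joined) == as_set(rows)

# Hypothetical sample data.
enrollment = [
    {"StudentID": 1, "CourseID": 101, "CourseName": "Databases"},
    {"StudentID": 1, "CourseID": 102, "CourseName": "Networks"},
    {"StudentID": 2, "CourseID": 101, "CourseName": "Databases"},
]

good_split = is_lossless(enrollment, ["StudentID", "CourseID"], ["CourseID", "CourseName"])
bad_split = is_lossless(enrollment, ["StudentID"], ["CourseID", "CourseName"])
```

The second split shares no attribute, so the join pairs every student with every course and produces a row (student 2 in course 102) that was never in the original table.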
2. Dependency Preservation
Formally: Given a set of functional dependencies F on R, the decomposition of R into R1 and
R2 is dependency-preserving if (F_R1 ∪ F_R2)+ = F+, where F_R1 and F_R2 are the functional
dependencies in F+ that involve only attributes of R1 and R2, respectively, and + denotes
closure (the set of all dependencies that can be inferred).
Why It's Important: Dependency preservation ensures that all functional dependencies can be
enforced directly on the decomposed relations without requiring a join operation to check
them.
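A common way to test this without listing every projected dependency is to grow the left side of each dependency using only what can be derived inside one decomposed relation at a time. The Python sketch below follows that idea. The attribute names A, B, C and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, schemas):
    """True if every dependency in `fds` can be enforced on the decomposed schemas alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                # Attributes derivable from `result` using only this schema's attributes.
                gained = closure(result & set(schema), fds) & set(schema)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

chain = [(("A",), ("B",)), (("B",), ("C",))]          # A -> B, B -> C
kept = preserves(chain, [("A", "B"), ("B", "C")])
lost = preserves(chain, [("A", "B"), ("A", "C")])     # B -> C spans both relations

cyclic = [(("A", "B"), ("C",)), (("C",), ("B",))]     # AB -> C, C -> B
bcnf_split = preserves(cyclic, [("C", "B"), ("A", "C")])
```

The last case is the standard example of a BCNF decomposition that loses a dependency: after splitting into (C, B) and (A, C), the rule AB → C can no longer be checked within a single relation.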
In practice, achieving both loss-less join and dependency preservation can be challenging.
Some decompositions might preserve dependencies but not be loss-less, or they might be
loss-less but not preserve all dependencies. Therefore, careful consideration is needed when
designing a schema.
For example, the BCNF (Boyce-Codd Normal Form) decomposition guarantees loss-less join
but does not necessarily preserve all functional dependencies. On the other hand, a 3NF (Third
Normal Form) decomposition that is both loss-less and dependency-preserving always exists
(for example, via the standard synthesis algorithm), although it may leave some redundancy
that BCNF would remove.
Practical Approach