
RDBMS Concepts

This document provides an overview of data modeling and management. It discusses relational data modeling, including concepts like the relational model, SQL, relational database design and normalization. It also covers NoSQL data modeling and management, data distribution, transaction processing, consistency models, and large scale data handling. The document introduces related topics in data engineering and the data science process life cycle.

Uploaded by

surangauor

DATA MODELING AND MANAGEMENT

MGT 6302 - Management Information Systems


Agenda

DATA MODELING
DATA MANAGEMENT

DATA MODELING AND MANAGEMENT

• Relational Data Model and Management
• NoSQL Data Modeling and Management
• Data Distribution
• Transaction Processing and Consistency Models
• Large Scale Data Handling
• Application and Case Studies
• Data Engineering
• Data Science Process Life Cycle
• Introduction to Related Topics

Relational Data Model and Management

• Relational Model Concepts
• SQL
• Relational Database Design and Normalization
• Relational Database Management Systems (RDBMSs)

WHAT IS THE RELATIONAL MODEL?

The relational model, proposed by E.F. Codd, represents the database as a set of relations used to store and process the data in the database. A relation is nothing more than a table of values. Each relation has columns and rows, which are formally called attributes and tuples.

Although the relational model is based on mathematics and uses mathematical terms such as domains, unions and ranges, the characteristics and conditions it describes are easy to define using English.

The physical storage of data is independent of how the data is logically organized. The relational model is popular for its simplicity and its ability to hide low-level implementation details from developers and database users.

THE 12 RULES OF CODD

01 In a relational database, all data, including the data dictionary itself, is represented in one form, in two-dimensional tables.
02 Null values are supported to represent information not available or not applicable, regardless of the domain of the respective attributes.
03 Although a relational system can support several languages, there should be at least one language that comprehensively supports data definition, data manipulation, integrity constraints and authorization.
04 Each data element is uniquely determined by the combination of the name of the table where it is stored, the primary key value and the respective column (attribute).
05 The metadata is represented and accessed in the same way as the data itself.
06 In a view, all changes made to updatable data should be translated into the base tables.
07 There is the ability to treat a table (base or virtual) as a single operand (i.e. using a set-oriented language), both in query and update operations.
08 Changes in the database schema (conceptual level), which do not involve element removals, should not affect the external level - logical independence.
09 The fact that a database is centralized on one machine, or distributed over several machines, should not have any impact on the level of data manipulation.
10 Changes in the physical organization of the database files or in the methods of accessing them (internal level) should not affect the conceptual level - physical independence.
11 The integrity constraints must be able to be specified in a relational language, independently of the application programs, and stored in the data dictionary.
12 If there is a lower-level, record-oriented language in the system, it should not allow the integrity and security constraints to be bypassed.

OBJECTIVES OF THE RELATIONAL MODEL

• To provide a high degree of data independence.
• To simplify the potentially formidable job of the DBA.
• To allow the expansion of set-oriented data manipulation languages.
• To provide a community view of the data of spartan simplicity.
• To eliminate redundancy and consistency problems, etc.

RELATIONAL MODEL CONCEPTS

Attribute - Each column in a table. Attributes are the properties which define a relation.
Table - In the relational model, relations are saved in table format. A table is stored along with its entities.
Tuple - A single row of a table, which contains a single record.
Relation Schema - A relation schema represents the name of the relation with its attributes.
Degree - The total number of attributes in the relation is called the degree of the relation.
Cardinality - The total number of rows present in the table.
Column - The column represents the set of values for a specific attribute.
Relation Instance - A relation instance is a finite set of tuples. Relation instances never have duplicate tuples.

RELATIONAL MODEL CONCEPTS

Relation Key - One or more attributes of a row whose values identify that row are called the relation key.
Attribute Domain - Every attribute has a pre-defined set of permitted values and scope, which is known as the attribute domain.
Primary Key - Selected from among the various candidate keys, to effectively identify each tuple.
Candidate Key - A minimal subset of attributes of a relation that together uniquely identify any tuple.
Foreign Key - Attribute, or set of attributes, of a relation that is the primary key in another relation.
Entity - Representative of a class of objects on which information is to be stored.

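The concepts above can be observed on a small schema. The following is a minimal sketch using Python's built-in sqlite3 module; the doctor and specialty tables and their rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE specialty (
    cod_esp INTEGER PRIMARY KEY,                            -- primary key
    name    TEXT NOT NULL
);
CREATE TABLE doctor (
    cod_med INTEGER PRIMARY KEY,                            -- primary key
    name    TEXT NOT NULL,
    cod_esp INTEGER NOT NULL REFERENCES specialty(cod_esp)  -- foreign key
);
INSERT INTO specialty VALUES (1, 'Cardiology'), (2, 'Dermatology');
INSERT INTO doctor VALUES (10, 'Ana', 1), (11, 'Rui', 1), (12, 'Eva', 2);
""")

# Degree: number of attributes (columns) of the relation.
degree = len(conn.execute("SELECT * FROM doctor").description)

# Cardinality: number of tuples (rows) of the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM doctor").fetchall()[0][0]

# A foreign key value must match a primary key value in the referenced table.
try:
    conn.execute("INSERT INTO doctor VALUES (13, 'Max', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Here the doctor relation has degree 3 and cardinality 3, and the insert that references a non-existent specialty is rejected.
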
SQL
STRUCTURED QUERY LANGUAGE

In 1986, a standard for SQL was defined by the American National Standards Institute (ANSI), which was adopted in 1987 as an international standard by the International Organization for Standardization (ISO, 1987). This standard is called SQL and has proven to be the standard relational database language. Most Database Management Systems support this language, which makes it almost universal.

As a language, the ISO SQL standard has two main components:
• Data Definition Language (DDL) - to define the structure of the database and control data access;
• Data Manipulation Language (DML) - for data retrieval and updating.

SQL does not involve the specification of data access methods.
Like most modern languages, SQL is essentially free-format, which means that parts of a statement do not have to be written at particular positions.

OBJECTIVES OF SQL

A database language should allow a user to:

• create the database and the relation structures;
• perform simple and complex queries;
• perform basic data management tasks such as entering, modifying and deleting data from relations.

A database language should perform these tasks with minimal effort from the user, and its structure and syntax should be easy to learn.

SQL LANGUAGE

Data Manipulation Language (DML)

SELECT - to query data in the database.
INSERT - to insert data into a table.
UPDATE - to update data in a table.
DELETE - to delete data from a table.

SQL LANGUAGE: SELECT

The processing sequence of a SELECT instruction is:

1. FROM - specifies the table or tables to be used
2. WHERE - filters the rows subject to some condition
3. GROUP BY - forms groups of rows with the same value(s) in the grouping column(s)
4. HAVING - filters the groups subject to some condition
5. SELECT - specifies which columns should appear in the output
6. ORDER BY - specifies the output order

The order in which the clauses are written in the SELECT instruction cannot be changed.
Only the SELECT and FROM clauses are mandatory; the rest are optional.

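The processing sequence can be traced on one query. This is a minimal sketch with Python's sqlite3 module; the appointment table and its rows are hypothetical. The numbered SQL comments give the logical evaluation order, which differs from the order in which the clauses are written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE appointment (doctor TEXT, specialty TEXT, price REAL);
INSERT INTO appointment VALUES
    ('Ana', 'Cardiology', 80), ('Ana', 'Cardiology', 80),
    ('Rui', 'Cardiology', 20),
    ('Eva', 'Dermatology', 60), ('Eva', 'Dermatology', 60), ('Eva', 'Dermatology', 60);
""")

rows = conn.execute("""
    SELECT doctor, COUNT(*) AS n, SUM(price) AS total  -- 5. choose the output columns
    FROM appointment                                   -- 1. pick the source table
    WHERE price >= 50                                  -- 2. filter individual rows
    GROUP BY doctor                                    -- 3. form groups
    HAVING COUNT(*) >= 2                               -- 4. filter whole groups
    ORDER BY total DESC                                -- 6. sort the output
""").fetchall()
```

Rui's only row is removed by WHERE before any group is formed, so only Eva (3 rows, total 180) and Ana (2 rows, total 160) appear in the result, in that order.
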
SQL LANGUAGE: DISTINCT

The SELECT DISTINCT statement is used to return only distinct (different) values.

Inside a table, a column often contains many duplicate values, and sometimes you only want to list the different (distinct) values.

SELECT DISTINCT column1, column2, ...
FROM table_name;

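A short sketch of the effect of DISTINCT, using Python's sqlite3 module on a hypothetical patient table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (name TEXT, city TEXT);
INSERT INTO patient VALUES ('Ana', 'Lisbon'), ('Rui', 'Porto'), ('Eva', 'Lisbon');
""")

# Without DISTINCT every row contributes a value, duplicates included.
all_cities = [r[0] for r in conn.execute("SELECT city FROM patient ORDER BY city")]

# With DISTINCT each different value is listed once.
distinct_cities = [r[0] for r in conn.execute("SELECT DISTINCT city FROM patient ORDER BY city")]
```
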
SQL LANGUAGE: WHERE

• Comparison - compares the value of one expression with the value of another expression.
• Range - checks whether the value of an expression falls within a specified range of values.
• Set membership - checks whether the value of an expression belongs to a set of values.
• Pattern match - checks whether a string matches a specified pattern.
• Null - tests whether a column has a null (unknown) value.

SELECT column1, column2, ... FROM table_name WHERE condition;

SQL LANGUAGE: OPERATORS

Comparison operators:

=   equals
<>  different (ISO standard)
!=  different (allowed in some DBMSs)
<   is less than
<=  is less than or equal to
>   is greater than
>=  is greater than or equal to

Rules for logical operators: AND, OR and NOT, with parentheses (if necessary or desired) to change the normal precedence:
• an expression is evaluated from left to right;
• subexpressions in brackets are evaluated first;
• NOTs are evaluated before AND and OR operators;
• ANDs are evaluated before ORs.

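The precedence rules matter as soon as AND and OR are mixed. A minimal sketch with Python's sqlite3 module, using a hypothetical doctor table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctor (name TEXT, specialty TEXT, age INTEGER);
INSERT INTO doctor VALUES
    ('Ana', 'Cardiology', 45), ('Rui', 'Dermatology', 30), ('Eva', 'Dermatology', 52);
""")

def names(condition):
    """Return the doctor names matching a WHERE condition, sorted by name."""
    sql = f"SELECT name FROM doctor WHERE {condition} ORDER BY name"
    return [r[0] for r in conn.execute(sql)]

# AND is evaluated before OR: Cardiology OR (Dermatology AND age > 50).
default = names("specialty = 'Cardiology' OR specialty = 'Dermatology' AND age > 50")

# Brackets are evaluated first: (Cardiology OR Dermatology) AND age > 50.
grouped = names("(specialty = 'Cardiology' OR specialty = 'Dermatology') AND age > 50")
```

Without brackets the query returns Ana and Eva; with brackets only Eva satisfies the age condition.
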
SQL LANGUAGE: BETWEEN

The BETWEEN operator selects values within a given range. The values can
be numbers, text, or dates.

The BETWEEN operator is inclusive: begin and end values are included.

SELECT column_name(s)
FROM table_name
WHERE column_name
BETWEEN value1 AND value2;
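
The inclusive behaviour can be checked directly. A minimal sketch with Python's sqlite3 module on a hypothetical specialty table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specialty (name TEXT, price INTEGER);
INSERT INTO specialty VALUES ('A', 20), ('B', 50), ('C', 80), ('D', 100), ('E', 120);
""")

# Both boundary values, 50 and 100, are part of the result.
prices = [r[0] for r in conn.execute(
    "SELECT price FROM specialty WHERE price BETWEEN 50 AND 100 ORDER BY price"
)]
```
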

SQL LANGUAGE: IN

The IN operator allows you to specify multiple values in a WHERE clause.

The IN operator is a shorthand for multiple OR conditions.

SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);

OR

SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);

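Both forms of IN, and the equivalent chain of OR conditions, can be compared side by side. A minimal sketch with Python's sqlite3 module; the tables and the helper function names() are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specialty (cod_esp INTEGER PRIMARY KEY, price INTEGER);
CREATE TABLE doctor (name TEXT, cod_esp INTEGER);
INSERT INTO specialty VALUES (1, 80), (2, 60), (3, 40);
INSERT INTO doctor VALUES ('Ana', 1), ('Rui', 2), ('Eva', 3);
""")

def names(condition):
    """Return the doctor names matching a WHERE condition, sorted by name."""
    sql = f"SELECT name FROM doctor WHERE {condition} ORDER BY name"
    return [r[0] for r in conn.execute(sql)]

in_list = names("cod_esp IN (1, 2)")              # explicit list of values
or_chain = names("cod_esp = 1 OR cod_esp = 2")    # the same test written with OR
in_subquery = names("cod_esp IN (SELECT cod_esp FROM specialty WHERE price > 50)")
```

All three conditions select the same doctors here, Ana and Rui.
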
SQL LANGUAGE: LIKE

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
SQL has two special pattern matching symbols:

% represents any sequence of zero or more characters (wildcard).
_ represents any single character.

SELECT column1, column2, ...
FROM table_name
WHERE column LIKE pattern;

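A minimal sketch of the two pattern symbols, using Python's sqlite3 module on a hypothetical patient table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (name TEXT);
INSERT INTO patient VALUES ('Ana'), ('Anabela'), ('Eva'), ('Ivo');
""")

def names(pattern):
    """Return the names matching a LIKE pattern, sorted by name."""
    sql = "SELECT name FROM patient WHERE name LIKE ? ORDER BY name"
    return [r[0] for r in conn.execute(sql, (pattern,))]

starts_with_ana = names("Ana%")   # 'Ana' followed by zero or more characters
v_in_the_middle = names("_v_")    # exactly three characters with 'v' in the middle
```
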
SQL LANGUAGE: NULL

It is not possible to test for NULL values with comparison operators, such as =, <, or <>.
We have to use the IS NULL and IS NOT NULL operators instead.

SELECT column_names
FROM table_name
WHERE column_name IS NULL;

OR

SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;

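The difference between = NULL and IS NULL is easy to miss, because the comparison does not raise an error; it simply matches nothing. A minimal sketch with Python's sqlite3 module on a hypothetical patient table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (name TEXT, contact TEXT);
INSERT INTO patient VALUES ('Ana', '555-0100'), ('Rui', NULL);
""")

def names(condition):
    """Return the patient names matching a WHERE condition, sorted by name."""
    sql = f"SELECT name FROM patient WHERE {condition} ORDER BY name"
    return [r[0] for r in conn.execute(sql)]

with_equals = names("contact = NULL")      # the comparison is never true
with_is_null = names("contact IS NULL")
with_is_not_null = names("contact IS NOT NULL")
```
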
SQL LANGUAGE: ORDER BY

The ORDER BY clause allows rows to be displayed in ascending (ASC) or descending (DESC) order of any column or combination of columns, regardless of what appears in the result.

In some implementations ORDER BY only allows SELECT list elements. In both cases, ORDER BY should always be the last clause of the SELECT statement.

SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;

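A minimal sketch of sorting by a column that does not appear in the result, using Python's sqlite3 module on a hypothetical doctor table. SQLite accepts this; as noted above, some implementations may not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctor (name TEXT, age INTEGER);
INSERT INTO doctor VALUES ('Ana', 45), ('Rui', 30), ('Eva', 52);
""")

# The output contains only the name, but the rows are ordered by age.
oldest_first = [r[0] for r in conn.execute("SELECT name FROM doctor ORDER BY age DESC")]
```
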
SQL LANGUAGE: AGGREGATE FUNCTIONS

The ISO standard defines five aggregate functions:

COUNT - returns the number of values in a specified column;
SUM - returns the sum of the values of a specified column;
AVG - returns the average of the values of a specified column;
MIN - returns the smallest value of a specified column;
MAX - returns the highest value of a specified column.

COUNT, MIN and MAX apply to all column types; SUM and AVG can only be used on numeric columns. Each function eliminates nulls first and operates only on the remaining non-null values. COUNT(*) has a special use: it counts all rows of a table.

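The handling of nulls is the part of this slide that most often surprises. A minimal sketch with Python's sqlite3 module; the appointment table with one NULL price is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE appointment (id INTEGER PRIMARY KEY, price INTEGER);
INSERT INTO appointment VALUES (1, 10), (2, NULL), (3, 30);
""")

count_all, count_price, total, average, lowest, highest = conn.execute("""
    SELECT COUNT(*), COUNT(price), SUM(price), AVG(price), MIN(price), MAX(price)
    FROM appointment
""").fetchone()
```

COUNT(*) sees all 3 rows, while COUNT(price) sees only the 2 non-null prices, so the average is 40 / 2 = 20 rather than 40 / 3.
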
SQL LANGUAGE: GROUP BY

The columns referenced in GROUP BY are called grouping columns.
When GROUP BY is used, each item in the SELECT list must be single-valued per group.

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

SQL LANGUAGE: HAVING

The HAVING clause was added to SQL because the WHERE keyword
could not be used with aggregate functions.

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
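
The reason for HAVING can be demonstrated by trying the same condition in both places. A minimal sketch with Python's sqlite3 module; the appointment table is hypothetical, and the exact error class is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE appointment (doctor TEXT, price REAL);
INSERT INTO appointment VALUES ('Ana', 80), ('Ana', 80), ('Rui', 60);
""")

# An aggregate in WHERE is rejected: WHERE is evaluated per row, before groups exist.
try:
    conn.execute("SELECT doctor FROM appointment WHERE COUNT(*) > 1 GROUP BY doctor")
    where_accepted = True
except sqlite3.OperationalError:
    where_accepted = False

# HAVING is evaluated per group, so the same condition is valid there.
busy = conn.execute(
    "SELECT doctor, COUNT(*) FROM appointment GROUP BY doctor HAVING COUNT(*) > 1"
).fetchall()
```
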
SQL LANGUAGE: MULTI TABLE QUERIES

A JOIN clause is used to combine rows from two or more tables, based
on a related column between them.

SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID = Customers.CustomerID;

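The query above can be run against sample data. This is a minimal sketch with Python's sqlite3 module; the rows, including the order that references a missing customer, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Ana'), (2, 'Rui');
INSERT INTO Orders VALUES
    (100, 1, '2024-01-05'), (101, 2, '2024-01-06'), (102, 9, '2024-01-07');
""")

rows = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()
```

Order 102 refers to customer 9, which does not exist, so an inner join leaves it out of the result.
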
SQL LANGUAGE: EXISTS AND NOT EXISTS

They are used with subqueries and produce a true or false result.

- EXISTS is true if, and only if, there is at least one row in the table returned by the subquery; it is false if the subquery returns an empty result.
- NOT EXISTS is the opposite of EXISTS.

EXISTS and NOT EXISTS only verify whether rows exist in the result of the subquery.

SELECT column_name(s)
FROM table_name
WHERE EXISTS
(SELECT column_name FROM table_name WHERE condition);

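A minimal sketch of EXISTS and NOT EXISTS with a correlated subquery, using Python's sqlite3 module; the Customers and Orders rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customers VALUES (1, 'Ana'), (2, 'Rui'), (3, 'Eva');
INSERT INTO Orders VALUES (100, 1), (101, 2), (102, 2);
""")

def customers(predicate):
    """Return customer names for which the subquery test (EXISTS or NOT EXISTS) is true."""
    sql = f"""
        SELECT c.CustomerName FROM Customers AS c
        WHERE {predicate} (SELECT 1 FROM Orders AS o WHERE o.CustomerID = c.CustomerID)
        ORDER BY c.CustomerName
    """
    return [r[0] for r in conn.execute(sql)]

with_orders = customers("EXISTS")
without_orders = customers("NOT EXISTS")
```

The subquery's column list is irrelevant; only whether it returns at least one row for the current customer matters.
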
SQL LANGUAGE: COMBINE TABLES (UNION, INTERSECT, EXCEPT)

In SQL, we can use the operations of Union, Intersection and Difference (EXCEPT) to combine the results of two or more queries in a single result:

• The Union of two tables, A and B, is a table with all the rows that are in the first table A or the second table B or both.
• The Intersection of two tables, A and B, is a table with all rows that are common to both tables A and B.
• The Difference (EXCEPT) of two tables, A and B, is a table with all rows that are in table A but not in table B.

SQL LANGUAGE: UNION, INTERSECT, EXCEPT

SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

(SELECT column_name FROM table1)
INTERSECT
(SELECT column_name FROM table2);

(SELECT column_name FROM table1)
EXCEPT
(SELECT column_name FROM table2);

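A minimal sketch of the three operations, using Python's sqlite3 module on two hypothetical single-column tables. The operands are written without surrounding brackets here, because SQLite does not accept parenthesized SELECTs in a compound query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (2), (3), (4);
""")

def combine(operator):
    """Combine tables a and b with the given set operator and return the sorted values."""
    sql = f"SELECT x FROM a {operator} SELECT x FROM b ORDER BY x"
    return [r[0] for r in conn.execute(sql)]

union = combine("UNION")             # rows in a, in b, or in both
intersection = combine("INTERSECT")  # rows common to a and b
difference = combine("EXCEPT")       # rows in a that are not in b
```
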
SQL LANGUAGE: DATABASE CHANGES

SQL is a complete data manipulation language and can be used to change database information in addition to query operations. There are three SQL commands available to modify the contents of the tables in the database:

INSERT - adds new rows to a table;
UPDATE - changes the data in a table;
DELETE - removes rows from a table.

SQL LANGUAGE: INSERT

It is possible to write the INSERT INTO statement in two ways. The first way specifies both the column names and the values to be inserted. The second way only specifies the table and the values.
In the second form, make sure the values are in the same order as the columns in the table.

INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

OR

INSERT INTO table_name
VALUES (value1, value2, value3, ...);

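A minimal sketch of the two forms, using Python's sqlite3 module on a hypothetical patient table. With column names, the values may come in any order and omitted columns are left NULL; without them, every column must be supplied in table order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (n_ben INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# First form: columns are named, so order is free and city is simply omitted.
conn.execute("INSERT INTO patient (name, n_ben) VALUES ('Ana', 1)")

# Second form: no column list, so values follow the table's column order.
conn.execute("INSERT INTO patient VALUES (2, 'Rui', 'Porto')")

rows = conn.execute("SELECT n_ben, name, city FROM patient ORDER BY n_ben").fetchall()
```
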
SQL LANGUAGE: UPDATE

The UPDATE statement allows the contents of existing rows in a table to be changed.
The new data value(s) must be compatible with the data type(s) of the corresponding column(s).

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

SQL LANGUAGE: DELETE

The DELETE statement allows rows to be deleted from a table.
The search condition is optional; if omitted, all table rows are deleted. This does not delete the table - to delete the contents and definition of a table we use the DROP TABLE command.

DELETE FROM table_name
WHERE condition;

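The difference between a filtered DELETE, an unfiltered DELETE and DROP TABLE can be checked step by step. A minimal sketch with Python's sqlite3 module; the patient table is hypothetical, and sqlite_master is SQLite's own catalog table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (n_ben INTEGER PRIMARY KEY, city TEXT);
INSERT INTO patient VALUES (1, 'Lisbon'), (2, 'Porto'), (3, 'Porto');
""")

def row_count():
    """Return the number of rows currently in patient."""
    return conn.execute("SELECT COUNT(*) FROM patient").fetchall()[0][0]

conn.execute("DELETE FROM patient WHERE city = 'Porto'")  # only matching rows
after_filtered_delete = row_count()

conn.execute("DELETE FROM patient")                       # no condition: all rows
after_full_delete = row_count()                           # the empty table still exists

conn.execute("DROP TABLE patient")                        # removes the definition too
table_exists = bool(conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'patient'"
).fetchall())
```
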
RELATIONAL DATABASE DESIGN AND NORMALIZATION

Normalization is a systematic process, defined by a set of well-defined rules, aimed at eliminating sources of redundancy in data and the problems that redundancy causes:
• Maintenance problems;
• Storage space costs;
• Performance problems.

1NF → 2NF → 3NF → BCNF → 4NF

Toward 1NF: fewer relations, more redundancy.
Toward 4NF: more relations, less redundancy.

CASE STUDY

For the next slides, the example used is a medical clinic. The following restrictions should be considered:
• A doctor consults in only one specialty.
• The price of the consultation depends on the specialty.
• During the consultation, treatments can be given to the patient, which are paid for together with the consultation.
• The clinic has 30 doctors covering 10 specialties, for a universe of 10000 patients, with about 50 consultations per day.

NORMALIZATION: 1ST NF

Delete duplicated groups of values for a particular attribute.

For the Doctor entity there may be several appointments.
The solution is to break down the initial relation into as many relations as there are repeating groups.
The conceptual schema of the database in 1NF is as follows:

Doctor(cod_med,name,address,contact,age,cod_esp,especiality,price)
Appointment(cod_med,date,hour,n_ben,patient,age,address,cod_post,city,price)

A table is in 1NF if, and only if, all the values in the table columns are atomic.

DEPENDENCIES

The whole normalization process is driven by the dependencies that exist between the data, and there are three types of dependencies:

Functional - There is a functional dependency between two groups of attributes X and Y if an instance of values of X uniquely determines or identifies an instance of values of the Y attributes (2nd, 3rd and Boyce-Codd NF).

Multi-Value - In a relation R it is said that there is a multi-value dependency X ->> Y if, for each pair of tuples of R containing the same values of X, there is also in R a pair of tuples corresponding to the exchange of the Y values in the original pair (4th NF).

Join - There is a join dependency if, given some projections of a relation, the initial relation can only be reconstructed through some very specific joins but not all (5th NF).

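The definition of a functional dependency translates directly into a check over the rows of a table. The following is a minimal sketch in plain Python; the function fd_holds and the sample doctor rows are hypothetical. Note that sample data can only refute a dependency: a dependency that holds in one instance is not thereby proven for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows: equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value; any later mismatch violates the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

doctors = [
    {"cod_med": 1, "name": "Ana", "cod_esp": 10, "price": 80},
    {"cod_med": 2, "name": "Rui", "cod_esp": 10, "price": 80},
    {"cod_med": 3, "name": "Eva", "cod_esp": 20, "price": 60},
]

esp_determines_price = fd_holds(doctors, ["cod_esp"], ["price"])
price_determines_doctor = fd_holds(doctors, ["price"], ["cod_med"])
```

In this sample cod_esp -> price holds, while price -> cod_med does not, because the price 80 appears with two different doctors.
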
NORMALIZATION: 2ND NF

Steps: analysis of functional dependencies; generation of new tables with complete functional dependencies.

Taking the example again, we see that in the Appointment relation there are two functional dependencies:

Appointment(cod_med,date,hour,n_ben,patient,age,address,cod_post,city,price)

1. cod_med,date,hour,n_ben -> price
2. n_ben -> patient,age,address,cod_post,city

The second dependency is at odds with the definition of the 2nd NF, because the patient attributes depend on only part of the key (n_ben).
The solution is to decompose the relation according to the previous functional dependencies:

Appointment(cod_med,date,hour,n_ben,price)
Patient(n_ben,patient,age,address,cod_post,city)

NORMALIZATION: 3RD NF

• Be in 2NF;
• Creation of new tables with direct functional dependencies;
• Check the exclusive dependency on the primary key;
• Analysis of functional dependencies between non-key attributes;
• Entities in 3NF also cannot contain attributes that are the result of some calculation on another attribute.

A relation in 3NF is a relation in which there are no functional dependencies between non-key attributes (transitive dependencies).

NORMALIZATION: 3RD NF

Taking the example again, we see that in the Doctor relation there are two functional dependencies:

Doctor(cod_med,name,address,contact,age,cod_esp,especiality,price)

1. cod_med -> name,address,contact,age,cod_esp
2. cod_esp -> especiality,price

The second dependency is at odds with the definition of the 3rd NF, because especiality and price depend on cod_esp, which is a non-key attribute.
The solution is to decompose the relation according to the previous functional dependencies. The result is:
• Doctor(cod_med,name,address,contact,age,cod_esp)
• Especiality(cod_esp,especiality,price)

NORMALIZATION: 3RD NF

In the Patient relation there are two functional dependencies:

Patient(n_ben,patient,age,address,cod_post,city)

1. n_ben -> patient,age,address,cod_post
2. cod_post -> city

The solution is to decompose the relation again according to the previous functional dependencies:

• Patient(n_ben,patient,age,address,cod_post)
• Cod_Postal(cod_post,city)

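A decomposition of this kind should lose no information: joining the new tables on cod_post must give back the original rows. The following is a minimal sketch with Python's sqlite3 module; the table name patient_2nf and the sample rows are hypothetical, and the age and address attributes are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_2nf (n_ben INTEGER PRIMARY KEY, patient TEXT, cod_post TEXT, city TEXT);
INSERT INTO patient_2nf VALUES
    (1, 'Ana', '1000', 'Lisbon'),
    (2, 'Rui', '4000', 'Porto'),
    (3, 'Eva', '1000', 'Lisbon');

-- Decompose along n_ben -> patient, cod_post and cod_post -> city.
CREATE TABLE patient AS SELECT n_ben, patient, cod_post FROM patient_2nf;
CREATE TABLE cod_postal AS SELECT DISTINCT cod_post, city FROM patient_2nf;
""")

original = conn.execute("SELECT * FROM patient_2nf ORDER BY n_ben").fetchall()

# Joining the two 3NF tables on cod_post reconstructs the original relation.
rejoined = conn.execute("""
    SELECT p.n_ben, p.patient, p.cod_post, c.city
    FROM patient AS p
    JOIN cod_postal AS c ON c.cod_post = p.cod_post
    ORDER BY p.n_ben
""").fetchall()

city_rows = conn.execute("SELECT COUNT(*) FROM cod_postal").fetchall()[0][0]
```

The city for postal code 1000 is now stored once instead of once per patient, which is the redundancy that the 3rd NF removes.
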
NORMALIZATION: BOYCE CODD NF

It is an advanced version of 3NF, which is why it is also referred to as 3.5NF.

Conditions:
• The relation should be in the Third Normal Form.
• For any dependency A → B, A should be a super key.

(patient,cod_serv) → doctor

(patient,doctor) (patient,cod_serv)

This solution introduces another problem: it is possible to register two doctors from the same service for the same patient!
(This dependency should be enforced at application level!)
Or keep R together with R2.

A relation is in BCNF if all attributes are functionally dependent on the key, the whole key and nothing but the key.

RELATIONAL DATABASE MANAGEMENT SYSTEMS
(RDBMS)

Database - A database is a set of data stored in a computer. This data is usually structured in a way that makes the data easily accessible.

Relational Database - A relational database is a type of database. It uses a structure that allows us to identify and access data in relation to another piece of data in the database. Often, data in a relational database is organized into tables.

Relational Database Management System - A relational database management system (RDBMS) is a program that allows you to create, update, and administer a relational database. Most relational database management systems use the SQL language to access the database.

RDBMS VS. DBMS

An RDBMS is a type of database management system (DBMS) that stores data in tables which connect related data elements.

RDBMS
• Includes functions that maintain data security, accuracy, integrity and consistency;
• The data is stored in table form;
• Supports multiple users;
• Supports client-server architecture;
• Has high software and hardware requirements;
• Keys and indexes do not allow data redundancy.

DBMS
• Does not apply any security protocol with respect to data manipulation;
• Stores the data as files;
• Supports only one user;
• Does not support client-server architecture;
• Has low software and hardware requirements;
• Does not take into account the concept of normalization, leading to redundancy in data.

ADVANTAGES OF RELATIONAL DATABASE
MANAGEMENT SYSTEMS

Data Structure
The table format is basic and easy to use and understand for database users. RDBMS allow data to be accessed via a
native structure and organization of the data. Database queries can search each column for corresponding entries.

Maintenance
RDBMS have maintenance programs that provide administrators with tools to easily maintain, test, repair and
back up the databases hosted on the system. Many of these tasks can be automated through the built-in
automation in the RDBMS.

Network Access
RDBMS provide access to the database through a server daemon, a dedicated software program that listens for
requests on a network and allows database clients to connect to and use the database.

ADVANTAGES OF RELATIONAL DATABASE
MANAGEMENT SYSTEMS
Multi-User Access
RDBMS allow several users to access a database simultaneously. Built-in locking and transaction management
features allow users to access data while it is being changed, prevent conflicts between two users who are updating
data, and prevent users from accessing partially updated records.

Privileges
Authorization and privilege control features in an RDBMS allow the database administrator to restrict access to
authorized users, and grant privileges to individual users based on the types of database tasks they need to
perform. Authorization can be defined based on the remote client IP address in combination with user
authorization, restricting access to specific external computer systems.

Language
RDBMSs support a generic language called "Structured Query Language" (SQL). The SQL syntax is simple, and the
language uses standard English language keywords and phrasing, making it fairly intuitive and easy to learn. Many
RDBMSs add non-SQL, database-specific keywords, functions and features to the SQL language.

POPULAR RELATIONAL DATABASE MANAGEMENT
SYSTEMS

• MySQL
• PostgreSQL
• SQLite
• Oracle
• SQL Server
