STAT 624 Computing Tools for Data Science

Module 1: Relational Databases

Instructor: Scott A. Bruce

© 2022 Scott A. Bruce. Do not distribute.

This module contains material from the LinkedIn Learning course Programming Foundations: Databases by Scott Simpson and is accessible via https://linkedinlearning.tamu.edu/

What are databases?

• A database is an organized collection of data stored and accessed electronically from a computer system.

• Can't we already do this using spreadsheets, files, folders and such?

• Data problems: 1) size, 2) ease of updating, 3) accuracy, 4) security, 5) redundancy, 6) importance.

• Database solutions: 1) scalable, 2) accessible, 3) accurate, 4) secure, 5) consistent, 6) permanent.

Databases give us structure.

Databases impose your rules on the data!

Database management systems

• A database is an organized collection of data stored and accessed electronically from a computer system.

• This is different from a database management system, or DBMS, which manages databases and guarantees your rules and structure are applied.

• A single DBMS usually manages many different databases.

• Most common: relational DBMS.

• Other DBMSs typically assume knowledge of relational DBMSs and use similar vocabulary.
Features of a relational database
• A database has one or more tables. Tables are the fundamental building blocks of a relational database.

• All data are stored in tables, which are often represented like a "spreadsheet".

• Each column (attribute) must be defined by the type of data it contains. Examples: strings, dates, integers, decimals.

• Each row is a "record" or "tuple": a single data item.

Unique values and primary keys

• Every table will have a key. It is a way to identify a particular row in a table.

• A key must have unique values for each record. No exceptions!

• Sometimes we have a natural key (e.g. UIN).
• Every table will have a primary key, and the DBMS enforces it.
• The DBMS can generate a primary key (a synthetic key) for your tables if you do not specify one.
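
To see key enforcement in action, here is a minimal sketch using Python's built-in sqlite3 module. The students and notes tables and their column names are hypothetical, and SQLite is only one possible DBMS; others generate keys through different mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: uin is a natural key declared as the primary key.
conn.execute("CREATE TABLE students (uin TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO students VALUES ('123004567', 'Ada')")

try:
    # A second row with the same key value is rejected by the DBMS.
    conn.execute("INSERT INTO students VALUES ('123004567', 'Grace')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# No primary key is declared here; SQLite still assigns each row a rowid.
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('first'), ('second')")
rowids = [r[0] for r in conn.execute("SELECT rowid FROM notes ORDER BY rowid")]

print(duplicate_rejected, rowids)
```

The duplicate insert fails with an integrity error instead of silently creating a second student with the same UIN.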
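Defining table relationships
• Keys are used to define relationships between tables: a row in one table refers to a row in another table by storing its key value.

• The most common kind is a one-to-many relationship (e.g. one customer has many orders).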
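
A one-to-many relationship can be sketched as follows with Python's sqlite3 module. The customers and orders tables, their columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only checks foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    -- Each order points at exactly one customer (the "many" side).
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# One customer row matches many order rows through the shared key.
orders_per_customer = conn.execute("""
    SELECT c.name, COUNT(o.order_id)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(orders_per_customer)
```

The key of the "one" side (customers) is stored as a column on the "many" side (orders), so no customer data is repeated in the orders table.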

Many-to-many relationships
• Most RDBMSs cannot create a direct many-to-many relationship.

• Example: orders can have many items (dishes), and dishes can be included in many orders.

• If multiple columns are used to characterize this relationship, 1) it is not clear how many columns should be used and 2) most fields will be blank.

• In such cases, we can create a linking (or junction) table.
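
The linking-table idea can be sketched with Python's sqlite3 module. The table name order_dishes, the column names, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
CREATE TABLE dishes (dish_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction table: one row per (order, dish) pair.
CREATE TABLE order_dishes (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    dish_id  INTEGER NOT NULL REFERENCES dishes(dish_id),
    PRIMARY KEY (order_id, dish_id)
);
INSERT INTO orders VALUES (1), (2);
INSERT INTO dishes VALUES (1, 'Soup'), (2, 'Salad'), (3, 'Pasta');
INSERT INTO order_dishes VALUES (1, 1), (1, 3), (2, 3);
""")

# An order can contain many dishes ...
dishes_in_order_1 = [r[0] for r in conn.execute("""
    SELECT d.name
    FROM order_dishes AS od
    JOIN dishes AS d ON d.dish_id = od.dish_id
    WHERE od.order_id = 1
    ORDER BY d.name
""")]

# ... and a dish can appear in many orders.
orders_with_pasta = [r[0] for r in conn.execute(
    "SELECT order_id FROM order_dishes WHERE dish_id = 3 ORDER BY order_id")]

print(dishes_in_order_1, orders_with_pasta)
```

The many-to-many relationship becomes two one-to-many relationships, each ending at the junction table, so there are no blank "dish 2", "dish 3", ... columns.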
One-to-one relationships
• Less common, since a one-to-one relationship usually implies that the two tables should just be one table.

• Some use cases exist (e.g. security).

Referential integrity constraints
• Databases that are aware of relationships won't allow a user to modify data in a way that violates those relationships.

• This helps maintain the consistency and accuracy of the database.

Referential integrity for deletions
• Example: with a cascading delete, deleting a customer automatically deletes the corresponding orders for that customer.
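
A cascading delete can be sketched with Python's sqlite3 module. The schema is hypothetical, and the cascade happens only because the foreign key is declared with ON DELETE CASCADE; note also that SQLite checks foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the rule
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customers(customer_id) ON DELETE CASCADE
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

try:
    # There is no customer 99, so this order would violate the relationship.
    conn.execute("INSERT INTO orders VALUES (13, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting customer 1 also removes orders 10 and 11.
conn.execute("DELETE FROM customers WHERE customer_id = 1")
remaining_orders = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders ORDER BY order_id")]

print(orphan_rejected, remaining_orders)
```

Without the ON DELETE CASCADE clause, the same DELETE would be refused while orders for that customer still exist.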
Transactions and the ACID test
• Transactions group queries or statements into a block of activities (e.g. sending money from account X to account Y).

• ACID:
    • Atomic: the transaction must happen completely or not at all. The reason for failure is irrelevant.
    • Consistent: the transaction must leave the DB in a consistent state (i.e. there can be no violation of rules/structure).
    • Isolated: only one transaction may act on a data element at a time. Data must be locked for the transaction.
    • Durable: the transaction must be robust. A "success" must guarantee that the transaction happened correctly and will not be lost due to service outages, crashes, or other causes.

• The DBMS enforces ACID. It is not the programmer's job!
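
Atomicity and consistency can be illustrated with the money-transfer example, sketched here with Python's sqlite3 module. The accounts table, the transfer helper, and the non-negative balance rule are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rule: a balance may never become negative.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('X', 100), ('Y', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "X", "Y", 30)       # both updates are applied
failed = transfer(conn, "X", "Y", 500)  # debit breaks the rule, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))

print(ok, failed, balances)
```

In the failed transfer the credit to Y has already run when the debit from X is rejected, yet Y's balance is unchanged afterwards because the whole transaction is rolled back.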

Structured Query Language (SQL)

• SQL is a language that has been around since the 1970s.

• SQL is a declarative query language, not a procedural or imperative language.
    • You tell the DB what you want and let the DBMS worry about how to do it.
    • You do not worry about the steps or algorithms needed to accomplish a task (as you would in an imperative language).

• "I want all the books that cost more than $40":

SELECT * FROM Books WHERE ListPrice > 40;

• SQL is used to create, read, update, and delete (CRUD) data.
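
The four CRUD operations can be tried out with Python's sqlite3 module. The Books table and ListPrice column come from the query above; the Title column and the sample rows are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (Title TEXT, ListPrice REAL)")

# Create: INSERT adds rows.
conn.executemany("INSERT INTO Books VALUES (?, ?)",
                 [("SQL Basics", 35.0), ("Data Modeling", 55.0), ("Statistics", 80.0)])

# Read: SELECT states which rows are wanted, not how to find them.
expensive = [r[0] for r in conn.execute(
    "SELECT Title FROM Books WHERE ListPrice > 40 ORDER BY Title")]

# Update: UPDATE changes existing rows.
conn.execute("UPDATE Books SET ListPrice = 45.0 WHERE Title = 'SQL Basics'")

# Delete: DELETE removes rows.
conn.execute("DELETE FROM Books WHERE Title = 'Statistics'")

after = [r[0] for r in conn.execute(
    "SELECT Title FROM Books WHERE ListPrice > 40 ORDER BY Title")]

print(expensive, after)
```

None of the statements says how to scan the table or in what order to compare rows; that is left to the DBMS.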

Introduction to database modeling
• Database modeling means creating the formal description of our database (i.e. the schema): the tables, columns, keys, relationships, etc.

• Database modeling requires planning. Agile (i.e. iterative) development is not well suited to database design.

• The point of database design is to impose structure on your data. This requires thought and planning.

• Adding features is easier than modifying or changing fundamental data structures and relationships.

• Methods for modeling databases have been tested since the 1970s (i.e. 45+ years of experience).
Planning your database
• What are you trying to accomplish?
    • Be careful about simple answers.

• What do you already have?
    • Review the existing data and structure.
    • Examine your existing databases.

• What are your entities?
    • Texas A&M: students, faculty, courses, buildings, departments, colleges, etc. (Should names be singular or plural?)

• What are the relationships between the entities?

• Entity relationship modeling (ERM) is used to describe the entities and their relationships.

[Figure: Entity Relationship Modeling Example]
Identifying columns and data types

• Entities -> Tables; Attributes -> Columns

• Be granular when breaking entities into attributes.
    • LastName, FirstName, Suffix, ZipCode, etc.
    • It is easier to get to the data: no need to "extract" it from a larger field.

• Avoid spaces in entity and attribute names.

• Specify what each column is (its data type).
    • Character + length (ASCII, Unicode), date, integer (size), decimal, binary, etc.
    • Other characteristics: Allow null? Pattern match (e.g. email, Social Security number, etc.)?
    • Precise data types allow the DBMS to enforce structure and be more efficient.
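
Column-level rules such as "allow null?" and pattern matching can be sketched with Python's sqlite3 module. The people table, the column names, and the specific patterns are hypothetical; SQLite itself is fairly permissive about declared types, so the sketch leans on NOT NULL and CHECK constraints, which other DBMSs also offer alongside stricter types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE people (
    person_id  INTEGER PRIMARY KEY,
    last_name  TEXT NOT NULL CHECK (length(last_name) <= 50),
    first_name TEXT NOT NULL,
    suffix     TEXT,  -- NULL allowed
    zip_code   TEXT CHECK (zip_code GLOB '[0-9][0-9][0-9][0-9][0-9]'),
    email      TEXT CHECK (email LIKE '%_@_%')
)""")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO people (last_name, first_name, suffix, zip_code, email) "
            "VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(("Lovelace", "Ada", None, "77843", "ada@example.com")),  # valid
    try_insert(("Doe", "Jane", None, "7784", "jane@example.com")),      # ZIP too short
    try_insert((None, "Jane", None, "77843", "jane@example.com")),      # NULL last name
    try_insert(("Doe", "Jane", "Jr.", "77843", "not-an-email")),        # no @ in email
]

print(results)
```

Only the first row is stored; the other three are rejected by the DBMS before they can reach the table.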
[Figure: Example: DB2 datatypes]
Choosing primary keys
• Choose a primary key (PK) for each entity (i.e. table).
    • If there is none, we must make one.
    • The DBMS will have some mechanism to create a column as a primary key (e.g. a generated ID column). Generally this is an incrementing number.

• Sometimes we can combine two or more columns and make them a composite primary key.
• It is often more useful to generate a synthetic primary key.
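
Both kinds of key can be compared in a short sketch using Python's sqlite3 module. The sections and buildings tables and their contents are hypothetical, and AUTOINCREMENT is SQLite's spelling of the incrementing-number mechanism; other DBMSs use different keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Composite primary key: the pair (course, section) identifies a row.
CREATE TABLE sections (
    course  TEXT NOT NULL,
    section INTEGER NOT NULL,
    room    TEXT,
    PRIMARY KEY (course, section)
);
-- Synthetic primary key: the DBMS assigns an incrementing integer.
CREATE TABLE buildings (
    building_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT NOT NULL
);
""")

conn.execute("INSERT INTO sections VALUES ('STAT 624', 1, 'A101')")
conn.execute("INSERT INTO sections VALUES ('STAT 624', 2, 'A102')")  # new pair: allowed
try:
    conn.execute("INSERT INTO sections VALUES ('STAT 624', 1, 'B200')")  # repeated pair
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

# The key value is never supplied; the DBMS generates it.
ids = [conn.execute("INSERT INTO buildings (name) VALUES (?)", (n,)).lastrowid
       for n in ("North Hall", "South Hall")]

print(pair_rejected, ids)
```

With the composite key, neither column has to be unique alone, only the combination. The synthetic key avoids having to carry two columns into every table that refers to a section.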
Database normalization

• Database normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.

• Edgar F. Codd introduced three rules for organizing data in a database (1NF, 2NF & 3NF) in the 1970s.

• Informally, a relational database table is often described as normalized if it meets Third Normal Form.

• Most 3NF tables are free of insertion, update, and deletion anomalies.
First normal form (1NF)
• 1NF: 1) values in each cell should be atomic (i.e. only one value) and 2) tables should have no repeating groups.

• When a table violates 1NF, remove the repeating groups and create another table that satisfies 1NF to hold the values.

• 1NF is often extended to include the idea that there are no duplicate rows in a table.

• It also suggests that the order of rows and columns is not important to the data.
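
Moving a repeating group into its own table can be sketched with Python's sqlite3 module. The raw rows, in which one cell holds several course codes, and the students and enrollments tables are invented for the illustration.

```python
import sqlite3

# Hypothetical un-normalized rows: the third cell holds several values.
raw = [(1, "Ada", "STAT 624, STAT 630"), (2, "Grace", "STAT 624")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (student, course): every cell is atomic.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course     TEXT NOT NULL,
    PRIMARY KEY (student_id, course)
);
""")

for student_id, name, courses in raw:
    conn.execute("INSERT INTO students VALUES (?, ?)", (student_id, name))
    for course in courses.split(","):
        conn.execute("INSERT INTO enrollments VALUES (?, ?)",
                     (student_id, course.strip()))

# With atomic values, "who takes STAT 624?" needs no string parsing.
in_624 = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM students AS s
    JOIN enrollments AS e ON e.student_id = s.student_id
    WHERE e.course = 'STAT 624'
    ORDER BY s.name
""")]
n_enrollments = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]

print(in_624, n_enrollments)
```

The composite primary key on enrollments also rules out duplicate rows, which matches the common extension of 1NF.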
Second normal form (2NF)
• 2NF: No value in a table should depend on only part of a key that can be used to uniquely identify a row.

• This is only an issue when using composite primary keys.

• Example: in a table of events, Location does not depend on the full candidate key (it depends only on the event name). Changing the event name could leave the DB in an inconsistent state, since there is no guarantee the location will also be changed.

• Solution: create a new table reflecting the fact that each event is held at just one place. Now both tables have values that depend on full keys.
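
The 2NF split can be sketched with Python's sqlite3 module. The original table is not reproduced in this text, so the composite key (name, event_date), the attendance column, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Location depends on the event name alone, so it lives with that key.
CREATE TABLE events (name TEXT PRIMARY KEY, location TEXT NOT NULL);
-- Values here depend on the full composite key (name, event_date).
CREATE TABLE event_sessions (
    name       TEXT NOT NULL REFERENCES events(name),
    event_date TEXT NOT NULL,
    attendance INTEGER,
    PRIMARY KEY (name, event_date)
);
INSERT INTO events VALUES ('Career Fair', 'Hall A');
INSERT INTO event_sessions VALUES
    ('Career Fair', '2022-09-01', 120),
    ('Career Fair', '2022-09-02', 95);
""")

# The location is stored once, so a single update reaches every session.
conn.execute("UPDATE events SET location = 'Hall B' WHERE name = 'Career Fair'")
locations = {r[0] for r in conn.execute("""
    SELECT e.location
    FROM event_sessions AS s
    JOIN events AS e ON e.name = s.name
""")}

print(locations)
```

Had the location been repeated on every session row, an update that missed one row would leave the same event with two different locations.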
Third normal form (3NF)
• 3NF: No non-key field is dependent on any other non-key field (e.g. "Can I figure out values of a row from any other value of that row?").

• Example: a table that stores both Price and Lunch Price, where the lunch price is always a 50% discount on the price, is in 1NF and 2NF but violates 3NF.

• Risk: someone could edit the Lunch Price and not the Price, which would violate the 50% discount rule.

• Solution: drop lunch prices from the table (they can be calculated) and possibly create a separate table containing lunch prices.
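
One way to calculate the lunch price instead of storing it is a view, sketched here with Python's sqlite3 module. The dishes table, the dish_prices view, and the sample prices are hypothetical; the 50% rule is the one stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dishes (
    dish_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    price   REAL NOT NULL
);
-- The lunch price is derived from price on every read, never stored.
CREATE VIEW dish_prices AS
    SELECT dish_id, name, price, price * 0.5 AS lunch_price
    FROM dishes;
INSERT INTO dishes VALUES (1, 'Pasta', 12.0), (2, 'Salad', 8.0);
""")

# Editing the price cannot break the discount rule.
conn.execute("UPDATE dishes SET price = 14.0 WHERE dish_id = 1")
lunch = dict(conn.execute("SELECT name, lunch_price FROM dish_prices"))

print(lunch)
```

Because there is no stored lunch price to fall out of date, the price and the lunch price always agree.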
Denormalization
• Sometimes we choose not to normalize for convenience or performance reasons.

• Risk: someone could update the quantity in the orders table without updating the values derived from it, and the data would be inconsistent.

• There is a trade-off between speed and the risk of inconsistency and inaccuracy.

• Sometimes data appear denormalized but actually are not.

• Example: ZIP code does NOT uniquely identify city and state, so storing city and state alongside the ZIP code is not redundant.
