Topic 5 - ERD & SQL 1
Key values/meanings
Consider Unit codes and descriptions for university units.
UnitCode Description
INS2055 Databases
INS3082 Database Systems
INS2061 Data mining and business analytics
Imagine that you see this data in the Enrolment table (similar to the Casting table in the Movie database):
UnitCode StudentID
INS2055 111222333
INS2061 111222333
INS2061 555666777
Surrogate Keys
However, so far in Access we have used tables that have surrogate keys.
A surrogate key:
• has no real-world / business meaning
• is usually numeric
• is often a sequential number supplied by the RDBMS
(e.g. the AutoNumber option in Access field settings)
Many databases around the world have been created with all their
tables using surrogate keys.
Surrogate Keys
However, if a Unit table uses a surrogate key (no business meaning), we may have this:
UnitID UnitCode Description
1 INS2055 Databases
2 INS3082 Database Systems
3 INS2061 Data mining and business analytics
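To make the idea concrete, here is a minimal sketch of the RDBMS supplying surrogate key values. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Access or MySQL (an assumption made so the example runs anywhere). In SQLite, declaring UnitID as INTEGER PRIMARY KEY plays the role of AutoNumber, so none of the INSERT statements supply a UnitID.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Access / MySQL (assumption)
conn = sqlite3.connect(":memory:")

# UnitID is a surrogate key: an INTEGER PRIMARY KEY column in SQLite
# receives the next sequential value when the INSERT does not supply one
conn.execute("""
    CREATE TABLE Unit (
        UnitID      INTEGER PRIMARY KEY,
        UnitCode    TEXT,
        Description TEXT
    )
""")

units = [
    ("INS2055", "Databases"),
    ("INS3082", "Database Systems"),
    ("INS2061", "Data mining and business analytics"),
]
# Only the business data is inserted; the DBMS chooses each UnitID
conn.executemany("INSERT INTO Unit (UnitCode, Description) VALUES (?, ?)", units)

rows = conn.execute("SELECT UnitID, UnitCode, Description FROM Unit ORDER BY UnitID").fetchall()
for row in rows:
    print(row)
```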
Key Wars
Arguments about Surrogate Keys:
Advantages
• The key has no business meaning
• The key should never have to change (e.g. even if the business rules change)
Disadvantages
• You will not 'know' from the key value alone whether invalid values (e.g. non-existent unit codes) have been entered.
• e.g. "Tutor 1 worked 8 hours this week on running tutorials in unit 74" (what is unit 74??)
http://stackoverflow.com/questions/63090/surrogate-vs-natural-business-keys
http://www.agiledata.org/essays/keys.html
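The "unit 74" problem can be sketched the same way (SQLite through Python's sqlite3 as an assumed stand-in; the TutorHours table and its rows are hypothetical). A row holding only a surrogate UnitID means nothing to a reader until it is looked up in the Unit table. An ID that matches no unit is simply stored, and it only shows up as a missing match (None) when the tables are joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for the course RDBMS (assumption)
conn.executescript("""
    CREATE TABLE Unit (
        UnitID   INTEGER PRIMARY KEY,   -- surrogate key
        UnitCode TEXT
    );
    -- Hypothetical table recording tutoring hours against a surrogate UnitID
    CREATE TABLE TutorHours (
        TutorID INTEGER,
        UnitID  INTEGER,
        Hours   INTEGER
    );
    INSERT INTO Unit (UnitID, UnitCode) VALUES (1, 'INS2055'), (2, 'INS3082');
    -- No unit has UnitID 74, but nothing stops this row being stored
    INSERT INTO TutorHours (TutorID, UnitID, Hours) VALUES (1, 1, 8), (1, 74, 8);
""")

# A lookup (join) is needed to see which unit each UnitID stands for
rows = conn.execute("""
    SELECT t.TutorID, t.UnitID, u.UnitCode, t.Hours
    FROM TutorHours t
    LEFT JOIN Unit u ON u.UnitID = t.UnitID
    ORDER BY t.UnitID
""").fetchall()
for row in rows:
    print(row)   # the UnitID 74 row has no matching UnitCode (None)
```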
Key Wars
Arguments about Natural Keys:
Advantages
• They have business meaning
• Non-IT people understand what the keys mean
• The key provides information to the user without having to perform lookups
• Fred Blogs worked 8 hours this week on running tutorials in unit INS2055
Disadvantages
• Natural Keys may change
• The university renames all INS units to ICT units (INS2055 becomes ICT2055)
• The value of a natural key may be long and cumbersome
• A key value containing text may not be processed as quickly as a numeric value
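The INS-to-ICT rename shows why changing a natural key is costly: every row that refers to the old value must change as well. Here is a minimal sketch in SQLite via Python's sqlite3, assuming the foreign key is declared with ON UPDATE CASCADE so the DBMS propagates the rename. Not every schema is set up this way, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. The Enrolment rows are the sample data from the start of this topic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # SQLite stand-in (assumption)
conn.execute("PRAGMA foreign_keys = ON")    # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Unit (
        UnitCode    TEXT PRIMARY KEY,   -- natural key
        Description TEXT
    );
    CREATE TABLE Enrolment (
        UnitCode  TEXT REFERENCES Unit (UnitCode) ON UPDATE CASCADE,
        StudentID TEXT,
        PRIMARY KEY (UnitCode, StudentID)
    );
    INSERT INTO Unit VALUES ('INS2055', 'Databases'),
                            ('INS2061', 'Data mining and business analytics');
    INSERT INTO Enrolment VALUES ('INS2055', '111222333'),
                                 ('INS2061', '111222333'),
                                 ('INS2061', '555666777');
""")

# The university renames INS units to ICT units
conn.execute("UPDATE Unit SET UnitCode = 'ICT2055' WHERE UnitCode = 'INS2055'")

# ON UPDATE CASCADE rewrote the referencing Enrolment row as well
enrolments = conn.execute(
    "SELECT UnitCode, StudentID FROM Enrolment ORDER BY UnitCode, StudentID"
).fetchall()
print(enrolments)
```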
Clever people from both sides of the fence argue about this topic.
https://en.wikipedia.org/wiki/Timeline_of_country_and_capital_changes
Modelling with Natural Keys
In the coming weeks, we commence modelling business requirements
using Entity Relationship Diagrams.
When modelling business requirements
• You speak to clients
• You use terms applicable to their business
• You do not invent terms / fields that do not match their business
• You use Natural Keys
E.g. if a small college teaches 20 units, that business may not have/use unit codes.
When identifying a unit, they simply use natural keys such as Unit Name.
While Unit Name may seem obviously inadequate for a large database, it may be
sufficient for our modelling requirements.
Forcing a term such as Unit Code into the conversation may confuse clients.
Modelling with Natural Keys
Modelling typically uses Natural Keys
If a Database is required:
• Database implementers can choose to add surrogate keys.
• When adding a surrogate key – you do not lose any data.
• The natural key data is not removed. It's just not the PK.
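Here is a sketch of that option in SQLite via Python's sqlite3 (an assumed stand-in). UnitID is added as a surrogate primary key while UnitCode stays in the table. Declaring the natural key UNIQUE NOT NULL, as assumed here, is one common way to keep the DBMS checking it even though it is no longer the PK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in (assumption)
conn.execute("""
    CREATE TABLE Unit (
        UnitID      INTEGER PRIMARY KEY,     -- surrogate key chosen by the implementer
        UnitCode    TEXT NOT NULL UNIQUE,    -- natural key kept, just not the PK
        Description TEXT
    )
""")
conn.execute("INSERT INTO Unit (UnitCode, Description) VALUES ('INS2055', 'Databases')")

# The UNIQUE constraint still stops a second unit using the same code
try:
    conn.execute("INSERT INTO Unit (UnitCode, Description) VALUES ('INS2055', 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```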
Composite Keys
Natural Keys are sometimes Composite Keys.
A composite key is a key made up of multiple values.
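The Enrolment data from the start of this topic is an example: neither UnitCode nor StudentID is unique on its own, but the pair is. Here is a minimal sketch of a composite primary key in SQLite via Python's sqlite3, used as a stand-in for the course RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in (assumption)
# The primary key is made up of two columns
conn.execute("""
    CREATE TABLE Enrolment (
        UnitCode  TEXT NOT NULL,
        StudentID TEXT NOT NULL,
        PRIMARY KEY (UnitCode, StudentID)
    )
""")
conn.executemany(
    "INSERT INTO Enrolment (UnitCode, StudentID) VALUES (?, ?)",
    [("INS2055", "111222333"), ("INS2061", "111222333"), ("INS2061", "555666777")],
)

# Enrolling the same student in the same unit twice violates the composite key
try:
    conn.execute("INSERT INTO Enrolment VALUES ('INS2055', '111222333')")
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Enrolment").fetchone()[0]
print(repeat_rejected, row_count)
```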
Entity Relationship Model & Diagram
Entity Relationship Model
• A logical representation of data required by an organisation
• Uses entities to represent people, objects, events…
• Identifies relationships between various entities
• Based around business rules of the organisation
ERDs vs Access Relationship Diagrams
An Entity-Relationship Diagram (ERD) uses natural keys
• No surrogate keys (only real-world keys)
E-R Diagrams
An Entity-Relationship Diagram (ERD)
• is a way to express the structure of information used by an organization or
business in the form of a diagram
• used to assist in database design
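(Figure: example ERD with one entity having the attributes MovieNo, Title and YearReleased, and another having the attributes RatingCode and Description)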
E-R Diagrams - Entities
The E-R Diagram has three major components:
Entities
This diagram has two entities: Subject and Lecturer
(Figure: ERD showing the entities SUBJECT and LECTURER)
E-R Diagrams - Attributes
The E-R Diagram has three major components:
Attributes
(Figure: SUBJECT with the attributes SubjectCode, Title, CreditPoints; LECTURER with the attributes LecId, LecName, Age)
E-R Diagrams - Relationships
The E-R Diagram has three major components:
Entities: Subject, Lecturer
Attributes: SubjectCode, Title, CreditPoints, LecId…
Relationships: This diagram shows that there is a relationship
between the entities Subject and Lecturer.
(Figure: SUBJECT (SubjectCode, Title, CreditPoints) linked by a relationship to LECTURER (LecId, LecName, Age))
E-R Diagrams & Sample Data
When building an ER Diagram, always consider examples of data that
would be stored in each entity.
After all, that's why we build a database – to store data
(Figure: SUBJECT (SubjectCode, Title, CreditPoints) and LECTURER (LecId, LecName, Age))
Subject Data
SubjectCode  Title                             CreditPoints
INS2055      Database                          4
INS1053      Intro to Business Data Analytics  3
Lecturer Data
LecId  LecName     Age
207    John Smith  37
119    Jane Pitt   26
E-R Diagrams & Business Rules
Consider some business rules:
• A student must only be enrolled in one course at any time
• A student may enrol in many subjects at one time
• A subject must only have one convenor
• An employee must only have one tax file number
E-R Diagrams & Business Rules
Why is discovering business rules so difficult?
• No single person knows all of the business rules of an organisation
• Individuals can't tell you every rule
• Individuals won't tell you every rule (some will tell you rules that don't exist)
• Fear, distrust, changing their job…
• Lack of existing documentation
This semester, you will be given all of the business rules.
Cardinality constraints
Cardinality constraints specify how many instances of one entity
are related to instances of another entity
The answer is always One or Many.
At our school:
• How many lecturers convene a single subject? ONE
• How many subjects can a lecturer convene? MANY
Always begin each sentence with the word ONE (never begin with "many")
The only difficulty with this diagram is that the relationship name
"Convened by" only reads left to right.
The reader must rephrase those words so that they make sense when
reading right to left.
Entity Instances
An entity instance is one set of values for the attributes of an Entity
LecId 119
LecName Jane Pitt
Age 26
E-R Diagrams – Identifiers
An identifier is an attribute (or set of attributes) that uniquely identifies an entity instance
➢ Same concept as a Primary Key
Rules
• Every entity must have an identifier
• Every instance of an entity must have a unique identifier
• No duplicates
• The value of an identifier cannot be empty / null
Entity identifiers are underlined on the ERD (e.g. SubjectCode in SUBJECT and LecId in LECTURER).
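When an entity becomes a table, the identifier rules are enforced by a primary key. Here is a sketch in SQLite via Python's sqlite3 (an assumption for runnability), using the SUBJECT entity: the DBMS rejects both a duplicate SubjectCode and a missing (NULL) one. The explicit NOT NULL is needed because SQLite, unlike the SQL standard, permits NULL in most non-integer PRIMARY KEY columns. The is_rejected helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in (assumption)
# SubjectCode is the identifier; NOT NULL is spelled out for SQLite's sake
conn.execute("""
    CREATE TABLE Subject (
        SubjectCode  TEXT NOT NULL PRIMARY KEY,
        Title        TEXT,
        CreditPoints INTEGER
    )
""")
conn.execute("INSERT INTO Subject VALUES ('INS2055', 'Database', 4)")

def is_rejected(sql):
    """Hypothetical helper: True if the DBMS refuses the INSERT with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_rejected = is_rejected("INSERT INTO Subject VALUES ('INS2055', 'Databases again', 4)")
null_rejected = is_rejected("INSERT INTO Subject VALUES (NULL, 'No code', 3)")
print(duplicate_rejected, null_rejected)
```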
ERD to Relational Model Conversion
What next?
Soon, we will begin to translate the ERD into a Relational Schema.
(Figure: e.g. the SUBJECT (SubjectCode, Title, CreditPoints) and LECTURER (LecId, LecName, Age) entities shown alongside their relational schema)
Nulls
A Null is a special value that may be assigned to an attribute when
a value is unknown or inapplicable.
Converting the M:1 relationship
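The final step in the conversion process is converting any M:1
relationships.
The primary key of the entity on the "one" side (LECTURER's LecId) is added as a
foreign key to the relation on the "many" side (SUBJECT):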
SUBJECT
SubjectCode  Title              CreditPoints  LecId
INS1004      Basic Informatics  3             207
INS2055      Databases          4             345
INS1053      Intro to BDA       2             207
INS4006      Thesis A           10            119
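Here is the same conversion written as SQL, as a sketch. SQLite via Python's sqlite3 stands in for MySQL, and the three lecturer rows reuse this topic's sample data. Lecturer is created first because Subject's REFERENCES clause points at it. Foreign keys must be switched on in SQLite with PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # SQLite stand-in (assumption)
conn.execute("PRAGMA foreign_keys = ON")    # needed for SQLite to enforce REFERENCES
conn.executescript("""
    CREATE TABLE Lecturer (
        LecId   INTEGER PRIMARY KEY,
        LecName TEXT,
        Age     INTEGER
    );
    -- The "many" side (Subject) receives the PK of the "one" side as a foreign key
    CREATE TABLE Subject (
        SubjectCode  TEXT NOT NULL PRIMARY KEY,
        Title        TEXT,
        CreditPoints INTEGER,
        LecId        INTEGER REFERENCES Lecturer (LecId)
    );
    INSERT INTO Lecturer VALUES (207, 'John Smith', 37), (119, 'Jane Pitt', 26), (345, 'Carol Kent', 34);
    INSERT INTO Subject VALUES
        ('INS1004', 'Basic Informatics', 3, 207),
        ('INS2055', 'Databases', 4, 345),
        ('INS1053', 'Intro to BDA', 2, 207),
        ('INS4006', 'Thesis A', 10, 119);
""")

# One lecturer convenes many subjects: the subjects convened by lecturer 207
convened_by_207 = conn.execute(
    "SELECT SubjectCode FROM Subject WHERE LecId = 207 ORDER BY SubjectCode"
).fetchall()
print(convened_by_207)

# A subject cannot reference a lecturer that does not exist
try:
    conn.execute("INSERT INTO Subject VALUES ('INS9999', 'Unknown', 4, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```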
Converting two entities in the RM
The relational schema for our model is finally written as this:
(Figure: relational schema for two entities; the attributes shown include DeptID, DeptName, Address, WID, Firstname, Surname and PhoneNo)
Relation vs Relationship
A confusing part of this process is that the terminology includes similar-sounding terms:
• A relation is a table in the relational model
• A relationship is an association between entities in an ERD
Relational Schema to RDBMS table
Previously we created a Relational schema.
Using MySQL Server
SQL: Executing the Movie Table script
Lab 1
Download the file CreateMovie01.TXT
Open the file using Notepad (or Notepad++ or any other text editor)
In Notepad, choose Select All, then Copy
CREATE TABLE MOVIE (
  MOVIENO NUMERIC(6, 0) PRIMARY KEY
, TITLE VARCHAR(100)
, RELYEAR NUMERIC(4)
, RUNTIME NUMERIC(4)
, RATINGCODE VARCHAR(2)
, COLOUR_CODE VARCHAR(1)
, TMDB_SCORE NUMERIC(3, 1)
, TMDB_VOTES NUMERIC
, TMDB_ID VARCHAR(12)
);
• VARCHAR() is used for text data; VARCHAR requires a maximum length value
• NUMERIC() is used for numeric data; it has an optional maximum number of digits and decimal places
• MySQL has no NUMBER type (that is Oracle's name for it); NUMERIC works in both MySQL and Oracle
INSERT INTO MOVIE (MOVIENO, TITLE, RELYEAR, RUNTIME, RATINGCODE, COLOUR_CODE, TMDB_SCORE,
TMDB_VOTES, TMDB_ID) VALUES(620, 'Ghostbusters', 1984, 107, 'PG', 'C',
6.8, 570, 'tt0087332' );
INSERT INTO MOVIE (MOVIENO, TITLE, RELYEAR, RUNTIME, RATINGCODE, COLOUR_CODE, TMDB_SCORE,
TMDB_VOTES, TMDB_ID) VALUES (324668,'Jason Bourne', 2016, 123, 'M', 'C',
7.2, 121, 'tt4196776' );
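If MySQL is not at hand, a copy of the Movie script, written with NUMERIC column types, can be tried in SQLite through Python's sqlite3. This is an assumed stand-in, not the lab setup. SQLite accepts these type names but does not enforce the declared lengths, so it only checks that the statements run and return the expected rows.

```python
import sqlite3

movie_script = """
CREATE TABLE MOVIE (
    MOVIENO     NUMERIC(6, 0) PRIMARY KEY
  , TITLE       VARCHAR(100)
  , RELYEAR     NUMERIC(4)
  , RUNTIME     NUMERIC(4)
  , RATINGCODE  VARCHAR(2)
  , COLOUR_CODE VARCHAR(1)
  , TMDB_SCORE  NUMERIC(3, 1)
  , TMDB_VOTES  NUMERIC
  , TMDB_ID     VARCHAR(12)
);
INSERT INTO MOVIE (MOVIENO, TITLE, RELYEAR, RUNTIME, RATINGCODE, COLOUR_CODE, TMDB_SCORE,
    TMDB_VOTES, TMDB_ID) VALUES (620, 'Ghostbusters', 1984, 107, 'PG', 'C', 6.8, 570, 'tt0087332');
INSERT INTO MOVIE (MOVIENO, TITLE, RELYEAR, RUNTIME, RATINGCODE, COLOUR_CODE, TMDB_SCORE,
    TMDB_VOTES, TMDB_ID) VALUES (324668, 'Jason Bourne', 2016, 123, 'M', 'C', 7.2, 121, 'tt4196776');
"""

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.executescript(movie_script)     # runs every statement in the script, in order

movies = conn.execute("SELECT MOVIENO, TITLE, RELYEAR FROM MOVIE ORDER BY RELYEAR").fetchall()
print(movies)
```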
SQL: Why learn SQL?
➢ Why learn SQL DML? Why not just use a GUI like Access?
➢ Some power end users use SQL statements when existing systems / reports are lacking.
SQL: DDL & DML
Structured Query Language (SQL) is the standard language used by relational DBMSs (RDBMSs).
SQL statements can generally be placed in one of two groups:
DDL - Data Definition Language – defines the structure of the database (e.g. CREATE TABLE)
DML - Data Manipulation Language – works with data within the database
– Inserting Data
– Querying Data (SELECT)
– Updating Data
– Deleting Data…
SQL DDL: CREATE TABLE
Syntax:
CREATE TABLE <table-name> (
<Column-name1> <data-type> [(<max length>)]
[ , <Column-name2> <data-type> [(<max length>)] ]…
[ , PRIMARY KEY (<column-name>) ]
[ , FOREIGN KEY (<column-name>) REFERENCES <table-name> (<column-name>) ] ) ;
SQL DDL: CREATE TABLE
CREATE TABLE Lecturer (
  LecId INT
, LecName VARCHAR(50)
, Age INT
, PRIMARY KEY (LecId)
);
• Commas and brackets are important. Errors occur if they are not used correctly.
• Short Text (in Access) is replaced by VARCHAR()
• VARCHAR requires a maximum length value
SQL DML: INSERT STATEMENT
Adding data to a table is done via Insert statements
You must specify the table name, the column names and the values to be inserted.
Syntax:
INSERT INTO <table-name> (<column-name1> , <column-name2> , … )
VALUES ( <value1> , <value2> , … ) ;
Text data must be surrounded by single quotes (not double quotes, which standard SQL reserves for identifiers such as column names).
Numeric data does not have quotes.
No formatting punctuation is used in numbers (no thousands-separator commas, dollar signs…).
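Here is a short sketch of these rules (SQLite via Python's sqlite3, assumed as a stand-in for MySQL). Text values are quoted and numbers are not. A comma typed inside a number turns one value into two, so the statement supplies more values than columns and fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE Lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")

# Text in single quotes; numbers with no quotes and no punctuation
conn.execute("INSERT INTO Lecturer (LecId, LecName, Age) VALUES (207, 'John Smith', 37)")

# A comma inside a number splits it: 1,207 is read as the two values 1 and 207,
# so four values are supplied for three columns and the DBMS rejects the statement
comma_error = None
try:
    conn.execute("INSERT INTO Lecturer (LecId, LecName, Age) VALUES (1,207, 'Jane Pitt', 26)")
except sqlite3.Error as exc:
    comma_error = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM Lecturer").fetchone()[0]
print(comma_error, row_count)
```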
BEWARE of Smart Quotes
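Microsoft's Notepad or Don Ho's Notepad++ are typically used as
SQL text editors (there are many others).
Word processors such as MS Word automatically replace straight quotes (') with smart (curly) quotes (‘ ’).
The problem is that RDBMS software such as MySQL does not recognize
smart quotes.
So MySQL does not view ‘John Smith’ as a text value.
Example error (this message is Oracle's; MySQL reports the problem with a different error):
Error at line 1:
ORA-00911: invalid character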
BEWARE of Smart Quotes
Solutions:
1. Don’t use MS Word (or Powerpoint or Onenote…)
2. If you are forced to, then disable the appropriate AutoCorrect options
within MS Word
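When SQL has already passed through a word processor, the curly quotes can also be converted back before the script is run. Below is a minimal Python sketch. straighten_quotes is a hypothetical helper, not part of any SQL tool, and it maps the Unicode left/right single and double quotation marks to their straight equivalents. It also straightens curly apostrophes inside text values (e.g. Don’t), and an apostrophe inside a SQL text value must then be written as two single quotes ('').

```python
# Hypothetical helper: map Unicode "smart" quotes to the straight quotes SQL expects
SMART_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (also used as a curly apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def straighten_quotes(sql: str) -> str:
    """Return sql with every curly quote replaced by its straight equivalent."""
    return sql.translate(SMART_TO_STRAIGHT)

pasted = "INSERT INTO Lecturer (LecId, LecName, Age) VALUES (207, \u2018John Smith\u2019, 37);"
fixed = straighten_quotes(pasted)
print(fixed)
```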
SQL DML: INSERT STATEMENT
Code required to add 4 rows to the existing tables (one INSERT statement per row; the first is shown):
INSERT INTO Lecturer ( LecId, LecName, Age )
VALUES (207, 'John Smith', 37) ;
SQL DML: INSERT STATEMENT
Nulls
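As a sketch (SQLite via Python's sqlite3, assumed as a stand-in for MySQL), a null can be inserted either with the NULL keyword, written without quotes ('NULL' in quotes is just text), or by leaving the column out of the column list. Nulls are then found with IS NULL; a comparison such as Age = NULL is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE Lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")

# Option 1: the NULL keyword, without quotes
conn.execute("INSERT INTO Lecturer (LecId, LecName, Age) VALUES (119, 'Jane Pitt', NULL)")
# Option 2: omit the column; with no DEFAULT declared it is set to NULL
conn.execute("INSERT INTO Lecturer (LecId, LecName) VALUES (345, 'Carol Kent')")

# Nulls are found with IS NULL
unknown_age = conn.execute("SELECT LecId FROM Lecturer WHERE Age IS NULL ORDER BY LecId").fetchall()
# ... whereas = NULL is never true, so this count is 0
equals_null = conn.execute("SELECT COUNT(*) FROM Lecturer WHERE Age = NULL").fetchone()[0]
print(unknown_age, equals_null)
```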
SQL DML: SELECT STATEMENT
The remainder of this lecture examines the SELECT statement.
The SELECT clause is how users extract / display data stored in
database tables.
➢ Syntax:
SELECT <item>
FROM <table name>
…
• Each column name in the SELECT clause causes a column to be included in the result set.
• The sequence of column names in the SELECT clause matches the sequence of columns in the result set.
Example:
SELECT LecId, LecName, Age
FROM Lecturer ;
Output / Result Set:
LecId  LecName     Age
207    John Smith  37
345    Carol Kent  34
SQL: SELECT columns
Only columns specified in the SELECT clause are returned.
The sequence of columns is based on the sequence of column names.
The * symbol can be used to indicate all columns are required.
SELECT lecid, lecname FROM lecturer;
LecID  LecName
207    John Smith
119    Jane Pitt
345    Carol Kent
SELECT age, lecname FROM lecturer;
Age  LecName
37   John Smith
26   Jane Pitt
34   Carol Kent
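Here is a sketch that makes these column rules visible (SQLite via Python's sqlite3 as an assumed stand-in). The cursor's description attribute lists the result-set columns in the order the SELECT clause names them, and * returns every column in table order. The exact capitalisation SQLite reports for unaliased column names is not something to rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE Lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")
conn.executemany(
    "INSERT INTO Lecturer VALUES (?, ?, ?)",
    [(207, "John Smith", 37), (119, "Jane Pitt", 26), (345, "Carol Kent", 34)],
)

def column_names(sql):
    """Return the result-set column names in the order the DBMS returns them."""
    cursor = conn.execute(sql)
    return [col[0] for col in cursor.description]

two_columns = column_names("SELECT lecid, lecname FROM lecturer")   # only the named columns
reordered = column_names("SELECT age, lecname FROM lecturer")       # order follows the SELECT list
every_column = column_names("SELECT * FROM lecturer")               # * returns all columns

print(two_columns, reordered, every_column)
```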
SQL: Update command
➢ UPDATE <table-name>
➢ SET <column-name> = <value> [ , … ]
➢ WHERE <condition>
Example:
UPDATE student
SET Fees_Paid = 'Y'
WHERE StuID = 1122334;
• Warning: if you do not include the WHERE clause, ALL rows in the table will be updated!
• Many DBMSs do not have an Undo or Oops feature.
• Suggestion: before updating, write a SELECT statement and check the rows it selects.
Then convert the SELECT statement to an UPDATE statement.
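The select-first suggestion can be sketched like this (SQLite via Python's sqlite3 as an assumed stand-in; the two-column student table and its rows are hypothetical). The WHERE clause is written once, checked with a SELECT, and only then reused in the UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for the course RDBMS (assumption)
# Hypothetical student table holding just the columns this example needs
conn.execute("CREATE TABLE student (StuID INTEGER PRIMARY KEY, Fees_Paid CHAR(1))")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1122334, "N"), (5566778, "N")])

# Write the WHERE clause once (a fixed string here, not user input)
where_clause = "WHERE StuID = 1122334"

# Step 1: preview the rows the UPDATE would touch
preview = conn.execute(f"SELECT StuID, Fees_Paid FROM student {where_clause}").fetchall()
print(preview)

# Step 2: only if the preview looks right, reuse the same WHERE clause in the UPDATE
updated = 0
if len(preview) == 1:
    updated = conn.execute(f"UPDATE student SET Fees_Paid = 'Y' {where_clause}").rowcount
conn.commit()
print(updated, "row(s) updated")
```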
SQL: Update command
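➢ This example adds 5000 to the salary of every employee in the Hawthorn branch or with 10 or more years of service. To update multiple columns in the same statement, separate each column = value pair in the SET clause with a comma.
➢ UPDATE <table-name>
➢ SET <column-name> = <value> [ , … ]
➢ WHERE <condition>
➢ UPDATE employee
➢ SET salary = salary + 5000
➢ WHERE UPPER(branch) = 'HAWTHORN'
➢ OR years_of_service >= 10;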
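Here is a sketch of a SET clause that assigns two columns (SQLite via Python's sqlite3 as an assumed stand-in). The raise_given column and all sample rows are hypothetical, invented to show the comma-separated assignments. There is no comma after the last assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in (assumption)
# Hypothetical employee table; raise_given is invented for this sketch
conn.execute("""
    CREATE TABLE employee (
        empno            INTEGER PRIMARY KEY,
        branch           TEXT,
        years_of_service INTEGER,
        salary           INTEGER,
        raise_given      CHAR(1)
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [(1, "Hawthorn", 2, 60000, None),
     (2, "Kew", 12, 70000, None),
     (3, "Kew", 3, 55000, None)],
)

# Two column = value pairs in one SET clause, separated by a comma
conn.execute("""
    UPDATE employee
    SET salary = salary + 5000,
        raise_given = 'Y'
    WHERE UPPER(branch) = 'HAWTHORN'
       OR years_of_service >= 10
""")

rows = conn.execute("SELECT empno, salary, raise_given FROM employee ORDER BY empno").fetchall()
print(rows)
```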
SQL: Order By – sequence of result set rows
The Order By clause specifies the sequence of rows in the result set.
• Use the column name or column number
• Multiple columns can be specified.
• Each column may be ordered in ASCending or DESCending sequence.
Ascending is the default.
SELECT LecName, Age FROM lecturer
ORDER BY LecName;
LecName     Age
Carol Kent  34
Jane Pitt   26
John Smith  37
SELECT LecName, Age FROM lecturer
ORDER BY LecName DESC;
LecName     Age
John Smith  37
Jane Pitt   26
Carol Kent  34
SELECT LecName, Age FROM lecturer
ORDER BY Age ASC;
LecName     Age
Jane Pitt   26
Carol Kent  34
John Smith  37
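The three queries above can be checked against a copy of the Lecturer table (SQLite via Python's sqlite3, an assumed stand-in for MySQL). Each result list matches the result set shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")
conn.executemany(
    "INSERT INTO lecturer VALUES (?, ?, ?)",
    [(207, "John Smith", 37), (119, "Jane Pitt", 26), (345, "Carol Kent", 34)],
)

by_name = conn.execute("SELECT LecName, Age FROM lecturer ORDER BY LecName").fetchall()            # ascending by default
by_name_desc = conn.execute("SELECT LecName, Age FROM lecturer ORDER BY LecName DESC").fetchall()
by_age = conn.execute("SELECT LecName, Age FROM lecturer ORDER BY Age ASC").fetchall()

for result in (by_name, by_name_desc, by_age):
    print(result)
```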
SQL: Order By – column number
A column number in the Order By clause
refers to one of the columns in the select clause.
This can be useful in more complex queries.
This example sequences the result set by the 3rd column listed in the
select clause:
SELECT LecId, LecName, Age
FROM lecturer
ORDER BY 3 DESC;
LecID  LecName     Age
207    John Smith  37
345    Carol Kent  34
119    Jane Pitt   26
So 3 means "Sort the displayed rows by the 3rd column of the result set".
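Here is a sketch confirming that ORDER BY 3 DESC and ORDER BY Age DESC give the same sequence here, because Age is the 3rd column in the SELECT clause (SQLite via Python's sqlite3 as an assumed stand-in).

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")
conn.executemany(
    "INSERT INTO lecturer VALUES (?, ?, ?)",
    [(207, "John Smith", 37), (119, "Jane Pitt", 26), (345, "Carol Kent", 34)],
)

# 3 refers to the 3rd item in the SELECT list (Age), not the 3rd table column
by_position = conn.execute("SELECT LecId, LecName, Age FROM lecturer ORDER BY 3 DESC").fetchall()
by_name = conn.execute("SELECT LecId, LecName, Age FROM lecturer ORDER BY Age DESC").fetchall()

print(by_position)
print(by_position == by_name)
```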
SQL: Order By– multi column
Multiple columns can be used in the Order By clause.
SELECT LecName, Age FROM Lecturer ORDER BY Age, LecName;
Rows in the result set are in ascending age sequence.
Then within each age group, the rows are in ascending name
sequence.
Original table:
LecID  LecName        Age
207    John Smith     37
345    Carol Kent     34
119    Jane Pitt      26
231    Jim Brady      26
118    Sanjay Parekh  34
521    Lisa Simmons   21
404    Bruce Lee      34
103    Aaron Peters   26
Result set:
LecName        Age
Lisa Simmons   21
Aaron Peters   26
Jane Pitt      26
Jim Brady      26
Bruce Lee      34
Carol Kent     34
Sanjay Parekh  34
John Smith     37
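The eight-row example can be reproduced as a sketch (SQLite via Python's sqlite3, an assumed stand-in for MySQL). Ties on Age are broken by LecName, which is why the three 26-year-olds appear in name order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE Lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")
conn.executemany(
    "INSERT INTO Lecturer VALUES (?, ?, ?)",
    [(207, "John Smith", 37), (345, "Carol Kent", 34), (119, "Jane Pitt", 26),
     (231, "Jim Brady", 26), (118, "Sanjay Parekh", 34), (521, "Lisa Simmons", 21),
     (404, "Bruce Lee", 34), (103, "Aaron Peters", 26)],
)

# Sort by Age first; LecName only decides the order among rows with the same Age
result = conn.execute("SELECT LecName, Age FROM Lecturer ORDER BY Age, LecName").fetchall()
for name, age in result:
    print(f"{name:<15}{age}")
```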
SQL: Coding Rules & Debugging
Your SQL statements will often cause errors. Most errors are caused by:
• misspelling column names
• misspelling table names
• misspelling keywords
• excluding commas
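Here is a sketch of what these mistakes look like (SQLite via Python's sqlite3 as an assumed stand-in; the try_query helper is hypothetical, and error wording differs between DBMSs). A missing comma does not always raise an error: SELECT LecId LecName is read as LecId with the alias LecName, so the query quietly returns the wrong columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE Lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")
conn.execute("INSERT INTO Lecturer VALUES (207, 'John Smith', 37)")

def try_query(sql):
    """Hypothetical helper: return ('ok', rows) or ('error', message)."""
    try:
        return ("ok", conn.execute(sql).fetchall())
    except sqlite3.Error as exc:
        return ("error", str(exc))

bad_column = try_query("SELECT LecNam FROM Lecturer")        # misspelled column name
bad_table = try_query("SELECT LecName FROM Lecturers")       # misspelled table name
bad_keyword = try_query("SELEC LecName FROM Lecturer")       # misspelled keyword
missing_comma = try_query("SELECT LecId LecName FROM Lecturer")  # LecName becomes an alias

for outcome in (bad_column, bad_table, bad_keyword, missing_comma):
    print(outcome)
```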
SQL: DML – SELECT statement clauses
A SELECT statement may have additional clauses
• Clauses must be in the correct sequence
• Most clauses are optional
• Each clause begins with a KEYWORD
• For easy reading the KEYWORD is written in UPPER CASE
SELECT empno, branch, salary
FROM employee
WHERE branch = 'KEW'
ORDER BY branch, empno;
• This statement has 4 clauses, each beginning with a keyword
– The SELECT clause
– The FROM clause
– The WHERE clause
– The ORDER BY clause
SQL: WHERE clause – restricting rows
Syntax: SELECT … FROM …
[ WHERE <search-condition> ]
[ ORDER BY …]
The WHERE clause specifies a condition(s) that each row must satisfy to be
included in the result set.
SELECT *
FROM Lecturer
WHERE Age < 35
Each row in the table is evaluated against the condition
If the condition is true (i.e. the age is < 35), the row is included in the result set.
The WHERE clause is often referred to as a Restriction
The clause reduces the number of rows in the result set.
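The restriction can be sketched as follows (SQLite via Python's sqlite3 as an assumed stand-in). Only Jane Pitt (26) and Carol Kent (34) satisfy Age < 35; John Smith (37) is excluded. Without an ORDER BY clause the row order is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE Lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")
conn.executemany(
    "INSERT INTO Lecturer VALUES (?, ?, ?)",
    [(207, "John Smith", 37), (119, "Jane Pitt", 26), (345, "Carol Kent", 34)],
)

# Each row is tested against the condition; only rows where it is true are returned
rows = conn.execute("SELECT * FROM Lecturer WHERE Age < 35").fetchall()
print(rows)
```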
SQL: WHERE clause – comparison operators
SQL has many operators that compare two values
= Equal to
<> Not equal to (can use != in most DBMSs)
< Less than
<= Less than or Equal to (two keystrokes < and = in that order)
> Greater than
>= Greater than or Equal to
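Here is a sketch that applies each operator to the Age column of the three sample lecturers (ages 37, 26 and 34), comparing against 34. SQLite via Python's sqlite3 is an assumed stand-in; SQLite accepts both <> and !=.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE Lecturer (LecId INTEGER PRIMARY KEY, LecName VARCHAR(50), Age INTEGER)")
conn.executemany(
    "INSERT INTO Lecturer VALUES (?, ?, ?)",
    [(207, "John Smith", 37), (119, "Jane Pitt", 26), (345, "Carol Kent", 34)],
)

matches = {}
for op in ["=", "<>", "!=", "<", "<=", ">", ">="]:
    # op comes from the fixed list above, never from user input
    sql = f"SELECT COUNT(*) FROM Lecturer WHERE Age {op} 34"
    matches[op] = conn.execute(sql).fetchone()[0]

print(matches)
```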
SQL: String literal values
When specifying string / text / character literals
– You must use single quotes, e.g. WHERE branch = 'KEW'
SQL: String values and case-sensitivity
The default setting for case-sensitivity of each DBMS may be different.
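For example, SQLite compares text case-sensitively by default, whereas MySQL's default collations are typically case-insensitive. Here is a sketch in SQLite via Python's sqlite3 (the employee rows are hypothetical). Wrapping the column in UPPER() makes the comparison behave the same way in either DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stand-in (assumption)
# Hypothetical rows with the same branch name typed in different cases
conn.execute("CREATE TABLE employee (empno INTEGER PRIMARY KEY, branch TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Kew"), (2, "KEW"), (3, "Hawthorn")])

# SQLite's default comparison is case-sensitive, so only the exact spelling matches
exact = conn.execute("SELECT empno FROM employee WHERE branch = 'KEW' ORDER BY empno").fetchall()

# Converting the column to upper case first makes the test case-insensitive
any_case = conn.execute(
    "SELECT empno FROM employee WHERE UPPER(branch) = 'KEW' ORDER BY empno"
).fetchall()

print(exact, any_case)
```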
SQL: Coding style
Can you guess which of the statements below is easier to read?
select rdate, temperature, rainfall from weather_readings where
temperature <= -6.5 or rainfall >= 1000 order by rdate desc
✓ (easier to read)
SELECT rdate, temperature, rainfall
FROM weather_readings
WHERE temperature <= -6.5
OR rainfall >= 1000
ORDER BY rdate DESC;
• Good style makes the job of code reading and maintenance easier.
• Real world queries are often 20 or 30 lines long. Good style helps.
• Queries are often viewed / modified by other users / programmers.
• Organisations often have a SQL style that you must adhere to.
• Tutors 'hate' debugging poorly styled queries.