10 - Data Modelling
Relational Data
Data Modelling
Tom Blount
Today
• Introduction to the Coursework
• Data Modelling
• From Normalised Relations to Tables
Your coursework
• 30% of the module marks
• 20 exercises split into
• The Relational Model (20%)
• Normalisation (25%)
• Modelling (20%)
• Querying (30%)
• Extension (5%)
• Because you’ve got plenty of spare time
• Deadline:
• As late as we can make it
• Friday 17th of May 16:00
• https://secure.ecs.soton.ac.uk/noteswiki/w/COMP1204/Coursework2
Your coursework: Corona Special
• Topical (and real)
• Working with the COVID-19 cases dataset
• Turning this into a relational model that we can query
• From CSV to SQLite
• Then we can answer questions about the current situation
The Data
Don’t Panic
• We have not covered every part of the coursework
yet
• We will be covering what you need for the final
parts of the coursework later this week
• But you can get started on the first parts which you
are now experts on!
Understanding the Data
• EX1: Write down the relation directly represented in the
dataset file. Assign relevant data types to each column.
• EX2: List the minimal set of Functional Dependencies (FDs)
• Every FD must have only one attribute on the RHS (right hand side)
• Every FD must be minimal on its LHS (left hand side)
• There must be no redundant FDs
• Tip: Explain any assumptions you make applying what you know of
the domain to the data and consider future data and the impact it
may have as well.
• EX3: List all potential candidate keys
• EX4: Identify a suitable primary key, and justify your
decision
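The reasoning behind EX2 to EX4 can be checked mechanically once the FDs are written down. The following is a minimal sketch, assuming a hypothetical toy relation (student_id, module, lecturer, mark) that has nothing to do with the coursework dataset. It computes the closure of a set of attributes and then searches for the minimal attribute sets whose closure covers the whole relation, which are the candidate keys.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute on the LHS, we also know the RHS.
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets that determine every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # a superset of a known key is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys


# Hypothetical toy relation and FDs (one attribute on each RHS).
ATTRS = {"student_id", "module", "lecturer", "mark"}
FDS = [
    (("student_id", "module"), "mark"),
    (("module",), "lecturer"),
]

print(closure({"module"}, FDS))
print(candidate_keys(ATTRS, FDS))
```

The search is exhaustive, so it only suits relations with a handful of attributes. The FDs themselves still have to come from your understanding of the domain.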
Normalising the Data
• EX5: List any partial-key dependencies in the relation as it
stands and any resulting additional relations you should
create as part of the decomposition.
• EX6: Convert the relation into 2nd Normal Form using your
answer to the above. List the new relations and their fields,
types and keys. Explain the process you took.
• EX7: List any transitive dependencies in your new relations
• EX8: Convert the relation into 3rd Normal Form using your
answers to the above. List the new relations and their fields,
types and keys. Explain the process you took.
• EX9: Is your relation in Boyce-Codd Normal Form? Justify
your answer.
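A relation is in Boyce-Codd Normal Form when the left-hand side of every non-trivial FD is a superkey. As a rough sketch of how that check could be automated, the code below uses a hypothetical toy relation (not the coursework data) and reports the FDs that violate BCNF before and after a hand-made decomposition. The FDs for each decomposed relation are projected by hand here, which is an assumption of the example.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if rhs not in lhs and closure(lhs, fds) != set(attributes)
    ]


# Hypothetical toy relation: module -> lecturer is a partial-key dependency.
ATTRS = {"student_id", "module", "lecturer", "mark"}
FDS = [
    (("student_id", "module"), "mark"),
    (("module",), "lecturer"),
]
before = bcnf_violations(ATTRS, FDS)

# After decomposition, each relation is checked against its own FDs.
enrolment = bcnf_violations(
    {"student_id", "module", "mark"}, [(("student_id", "module"), "mark")]
)
module = bcnf_violations({"module", "lecturer"}, [(("module",), "lecturer")])

print(before, enrolment, module)
```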
Modelling the Data
• EX10: Using the CSV import function, import the raw
dataset into SQLite into a single table called 'dataset' in an
SQLite database called coronavirus.db.
• EX11: Write the SQL to create the full normalised
representation, including all additional tables, with no data.
The SQL should contain CREATE statements to create any
new tables. You should include indexes where appropriate,
and list and justify these in your answer.
• EX12: Write INSERT statements using SELECT to populate
the new tables from the 'dataset' table
• EX13: Test and ensure that on a clean SQLite database, you
can execute dataset.sql followed by ex11.sql followed by
ex12.sql to successfully populate your database.
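The overall shape of EX10 to EX12 can be sketched end to end. The example below is only an illustration under stated assumptions: the column names (country, continent, day, cases) and the rows are made up, the database is in memory rather than coronavirus.db, and Python's csv module stands in for the sqlite3 shell's `.mode csv` and `.import` commands that you would normally use for the import step.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for the raw CSV file (hypothetical columns).
RAW_CSV = """country,continent,day,cases
Atlantis,Oceania,2020-03-01,5
Atlantis,Oceania,2020-03-02,8
Elbonia,Europe,2020-03-01,2
"""

conn = sqlite3.connect(":memory:")

# Step 1: load the raw data into a single flat table called 'dataset'.
conn.execute(
    "CREATE TABLE dataset (country TEXT, continent TEXT, day TEXT, cases INTEGER)"
)
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
conn.executemany(
    "INSERT INTO dataset VALUES (:country, :continent, :day, :cases)", rows
)

# Step 2: create the empty normalised tables, with keys and an index.
conn.executescript("""
CREATE TABLE country (
    name      TEXT PRIMARY KEY,
    continent TEXT NOT NULL
);
CREATE TABLE daily_cases (
    country TEXT NOT NULL REFERENCES country(name),
    day     TEXT NOT NULL,
    cases   INTEGER NOT NULL,
    PRIMARY KEY (country, day)
);
CREATE INDEX idx_daily_cases_day ON daily_cases(day);
""")

# Step 3: populate the new tables from 'dataset' using INSERT ... SELECT.
conn.executescript("""
INSERT INTO country (name, continent)
    SELECT DISTINCT country, continent FROM dataset;
INSERT INTO daily_cases (country, day, cases)
    SELECT country, day, cases FROM dataset;
""")

country_count = conn.execute("SELECT COUNT(*) FROM country").fetchone()[0]
daily_count = conn.execute("SELECT COUNT(*) FROM daily_cases").fetchone()[0]
print(country_count, daily_count)
```

SELECT DISTINCT is what removes the repetition: each country appears once in the new table, however many daily rows it had in 'dataset'.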
The questions
• EX14: The worldwide total number of cases and deaths
• EX15: The number of cases and the date, by increasing
date order, for the United Kingdom
• EX16: The number of cases, deaths and the date, by
increasing date order, for each continent
• EX17: The number of cases and deaths as a percentage
of the population, for each country
• EX18: A descending list of the top 10 countries, by
percentage deaths out of cases
• EX19: The date against a cumulative running total of
the number of deaths by day and cases by day for the
United Kingdom
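Two query patterns recur in these questions: a plain aggregate and a cumulative running total. The sketch below shows both against a hypothetical daily_cases table with made-up rows, not the coursework schema. The running total uses a correlated subquery, which is one of several possible approaches, and it assumes dates are stored as 'YYYY-MM-DD' text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_cases (
    country TEXT,
    day     TEXT,      -- assumed ISO format: 'YYYY-MM-DD'
    cases   INTEGER,
    PRIMARY KEY (country, day)
);
INSERT INTO daily_cases VALUES
    ('Atlantis', '2020-03-01', 5),
    ('Atlantis', '2020-03-02', 8),
    ('Atlantis', '2020-03-03', 3),
    ('Elbonia',  '2020-03-01', 2);
""")

# Plain aggregate: one total over every row.
total = conn.execute("SELECT SUM(cases) FROM daily_cases").fetchone()[0]

# Running total: for each day, sum all rows for that country up to that day.
running = conn.execute(
    """
    SELECT d.day,
           (SELECT SUM(p.cases)
              FROM daily_cases AS p
             WHERE p.country = d.country
               AND p.day <= d.day) AS cumulative_cases
      FROM daily_cases AS d
     WHERE d.country = ?
     ORDER BY d.day
    """,
    ("Atlantis",),
).fetchall()

print(total)
print(running)
```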
An extension
• EX20: Using GnuPlot, write a small script (plot.sh) which
will, using the data in the SQLite database (called
coronavirus.db in the same folder as the script), produce a
graph named graph.png with the date on the horizontal axis
and the cumulative number of deaths by country on the
vertical axis.
• You should represent the top 10 countries in terms of
cumulative deaths only.
• Include the full script in the report and the resulting graph
produced
When you submit to the handin system, it will do a check for you
It is your responsibility to make sure you’ve checked this in good time before you
make your final submission!
Modelling Concepts
• Data Modelling
• Conceptual
• Logical
• Physical
Modelling Concepts
• Entity
• A “real-world” concept
• Example: Planet
• Attribute
• A single property of an entity
• Example: Planet name
• Relation
• A model of an entity, represented as tuples with attributes
• Example: Planet(name,quadrant,sector_x,sector_y)
• Relationship
• A link between relations
• Example: Planet has one ruler. Ruler can have many planets
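To make "represented as tuples with attributes" concrete, here is a small sketch of the Planet relation as a set of named tuples. The attribute types and the sample values are assumptions made for illustration only.

```python
from typing import NamedTuple


class Planet(NamedTuple):
    """One tuple of the relation Planet(name, quadrant, sector_x, sector_y)."""
    name: str
    quadrant: str   # assumed to be a short label
    sector_x: int
    sector_y: int


# A relation is a set of tuples, so an exact duplicate cannot appear twice.
planets = {
    Planet("Alpha", "NE", 3, 7),
    Planet("Beta", "NE", 4, 1),
    Planet("Alpha", "NE", 3, 7),  # duplicate tuple: the set keeps one copy
}

names = sorted(p.name for p in planets)
print(len(planets), names)
```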
Three Levels of Modelling
• Conceptual Modelling
• Identification of key ideas and concepts
• Identifies entity names and relationships
• Logical Modelling
• High level design
• Introduces attributes and keys
• Physical Modelling
• Low level design for implementation
• Introduces table names, column names, data types
• Although it is a bit of a grey area what falls where!
Conceptual Model
• Directly from the requirements and domain
• No thought of database design
• Entity names and relationships and sometimes
attributes
Back to our game
• Players play as a ruler of a race, starting on their home
planet with one ship
• Rulers can explore the universe and visit other planets using
ships
• Planets belong to a ruler and have one race living on them
• Rulers can take over a planet and add it to their empire
• Every planet has its own set of resources
• Ships and items can be built on planets using the planet
resources
• A blueprint specifies which resources are needed
• Items can be attached to the planet or the ship
• Any other questions?
Activity 1: Concepts
• Based on the explanation of the game on the
previous slide, identify the entities
• Directly from the requirements
• Based on “the domain”
• Do not think about databases yet
• Remember these are “real-world” (at least as real-world
as you can get when talking about a space game)
• Your entities should be singular not plural
• (Planet not Planets)
• https://ofb-interactive.soton.ac.uk/space/conceptual
Activity 1 Ideas
• Planet
• Ruler
• Race
• Ship
• Resource
• Item
• Blueprint
• Command
Other thoughts
• Do we need a Player?
• Or can we just use a Ruler as a Player?
• Do we need a Universe?
• Might we want to expand…
• Or is one universe enough for anyone?
• Should Quadrant be represented as a Concept?
• You could!
• Or you could treat it as a co-ordinate and thus as an
attribute, like the x and y
• We will be treating it as a co-ordinate, and thus an
attribute
Relationships
• Link entities together
• One to one
• A planet has one ruler and one race
• One to many
• A ruler can rule many planets and many ships
• Many to many
• A planet can have many resources
• A resource can be found on many planets
• Can have attributes themselves
• A planet has 300 of the iron resource
• Remember, the relationship between A and B and B
and A may be different and both need to be specified
• (e.g. A planet has one ruler, but a ruler can have many
planets)
Relationships
(Crow’s Foot Notation)
Activity 2: Relationships
• Using the Concepts we have identified, try to identify
the relationships between them
• Set your concepts as follows:
• Planet, Ruler, Race, Ship, Resource, Item, Blueprint,
Command
• Identify every relationship you believe holds between
concepts
• Conceptually, we are thinking of relationships in terms
of
• Has-one
• Has-many (for a many to many, do it both ways)
• https://ofb-interactive.soton.ac.uk/space/relationships
Questions that arise
• This is why we model!
• Questions will arise that you might miss otherwise
• https://ofb-interactive.soton.ac.uk/space/attributes
Physical Model
• Consideration of database structure
• Actual tables and fields
• Implementation of relationships (e.g. keys, join tables), indexes etc.
• Diagram callouts: introduced foreign keys; introduced the join table
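As a sketch of how relationships might be implemented physically, the example below puts a foreign key on planet for the one-to-many "ruler has many planets" relationship and uses a join table for the many-to-many "planet has resources" relationship, with the quantity stored on the join table. The table names, column names and sample values are assumptions, and natural keys are used purely to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE ruler (name TEXT PRIMARY KEY);

-- One to many: each planet points at (at most) one ruler.
CREATE TABLE planet (
    name  TEXT PRIMARY KEY,
    ruler TEXT REFERENCES ruler(name)
);

CREATE TABLE resource (name TEXT PRIMARY KEY);

-- Many to many: the join table carries the relationship's own attribute.
CREATE TABLE planet_resource (
    planet   TEXT NOT NULL REFERENCES planet(name),
    resource TEXT NOT NULL REFERENCES resource(name),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (planet, resource)
);

INSERT INTO ruler VALUES ('Zog');
INSERT INTO planet VALUES ('Alpha', 'Zog'), ('Beta', 'Zog');
INSERT INTO resource VALUES ('iron'), ('water');
INSERT INTO planet_resource VALUES
    ('Alpha', 'iron', 300), ('Alpha', 'water', 50), ('Beta', 'iron', 10);
""")

planets_per_ruler = conn.execute("""
    SELECT r.name, COUNT(p.name)
      FROM ruler AS r LEFT JOIN planet AS p ON p.ruler = r.name
     GROUP BY r.name
""").fetchall()

iron_on_alpha = conn.execute(
    "SELECT quantity FROM planet_resource WHERE planet = 'Alpha' AND resource = 'iron'"
).fetchone()[0]

# A planet that refers to a ruler who does not exist should be rejected.
try:
    conn.execute("INSERT INTO planet VALUES ('Gamma', 'Nobody')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(planets_per_ruler, iron_on_alpha, fk_enforced)
```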
A Note on Data Types
• Integer types: INT or INTEGER, SMALLINT and BIGINT.
• Floating point types: FLOAT and DOUBLE.
• String types: The two varieties are CHAR(n) and VARCHAR(n):
• CHAR: strings of length < n are padded.
• VARCHAR: short strings have an end marker.
• Text types: TEXT, a larger text data type (not indexable)
• Date types: DATE, TIME and DATETIME.
• 'YYYY-MM-DD' ('2012-11-05').
• 'HH:MM:SS'.
• None type: BLOB does not specify a format; may be an image
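The types above are the general SQL ones. SQLite, which the coursework uses, is more relaxed: a declared column type gives the column a type affinity rather than a strict type. The sketch below, using a hypothetical sample table, shows what is actually stored when values are inserted into columns declared with some of these types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (n INTEGER, code VARCHAR(3), ratio FLOAT, day DATE)"
)

# Deliberately awkward values: a numeric string, an over-long string,
# an integer for a FLOAT column and a date written as text.
conn.execute("INSERT INTO sample VALUES ('42', 'ABCDEF', 1, '2012-11-05')")

stored = conn.execute(
    "SELECT n, typeof(n), code, typeof(ratio), typeof(day) FROM sample"
).fetchone()
print(stored)
```

In this sketch the numeric string is stored as an integer, the VARCHAR(3) length is not enforced, the integer becomes a real, and the date is kept as text. This is why a consistent 'YYYY-MM-DD' format matters if dates are to sort correctly.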
From Logical to Physical Design
• From the high-level design (concepts and ideas) to the low-level
design (implementation)
• Logical modelling is for attributes used in the real world
• You shouldn’t be making attributes up at the logical stage
• Your natural keys uniquely identify entities in the real world
• Your normalisation should be based on your logical modelling
Making ERDs
• https://www.lucidchart.com