Assignment # 1
Task: 1
ER Modelling (intro)
Introduction
This document gives a description of an event management web site like
EventBrite. You must design a data model to represent the information used
by such a site and describe your data model via a collection of ER diagrams.
Draw your ER model using the drawing tool draw.io or use a suitable drawing
tool of your choice.
open draw.io
create a new diagram
to display ER shapes: click on "More Shapes" at the bottom left of the
screen, then select "Entity Relation" and click "Apply".
The Problem Domain
Organising an event, especially if it involves purchase of tickets, is a task that
in the past has been the province of large online ticketing companies like
Ticketek, TicketMaster, etc. More recently, startups such as EventBrite,
EventBee, TicketBooth and TryBooking, have allowed small organisations and
not-for-profit organisations to set up their own events and ticketing online via
a simple-to-use Web interface. Our goal in this assignment is to develop a
data model that could be used to implement an online event site like this. To
simplify things, we won't consider the ticketing aspect of such a site; just the
organisation of events.
The aim of this assignment is to develop a database design that can support
some of the core functionality of a site like EventBrite (but without ticketing).
Note that we are dealing with data representation only; we are not actually
going to implement any of the active functionality. However, whatever data
structures you design must be rich enough to enable the functionality below
to be realised.
Let's call our website ev.org (events). The site has to deal primarily with
people and events. Some people (e.g. people who register for open events)
are not known in much detail. On the other hand, we need more detail about
users of the site who organise events.
The front page of the ev.org site gives a list of upcoming events, in reverse
chronological order of start time. At the top of the page, there is a scrolling
list of featured events. There is also a search box so that visitors can find
events that are relevant to them. There is a login link, but a casual customer
will not need to log in. Each event has an associated "RSVP" link; clicking on
this link takes the person to a page where they can register their intent to
attend the event. They will then be emailed a URL which will get them back
into the site to check the event details. A person can also choose to become
a user of the ev.org site, in which case they can subsequently log in, organise
events, and check the details of their events.
When a user logs in to ev.org, the first thing they see is a page that lists
upcoming events that they might be interested in. There are a number of
tabs at the top of the page: Events, Organiser, Account. The Events page is
the default page on login, and also provides a search mechanism. Search is
by keyword and searches on event titles, as well as event location and
category. Search results are listed in reverse chronological order of the event
starting times. When an event is displayed in a list on the Events page, it has
a link, which takes the user to a page to RSVP. If the user is the organiser of
the event, there would also be a link to view the RSVPs.
Here are more details on the data items that might need to be stored in the
back-end of the ev.org web site:
People
for every person associated with the site, we need to know at least
their email address and their name
we'll keep their given names and family names separate so that we can
sort lists of people by their family name
everyone must have at least one given name, but some people may
have no family name (e.g. "Prince")
Users
users are people who can log in to the site and organise events
each user needs to be registered with the system and must provide an
email address and a password
their email and password are used for authentication when they log in
to the system
once logged in, they can modify information related to themselves via
their "Account" page
users may provide additional information about themselves, including a
phone number, their personal website, and their birthday
they can also provide a name that's shown on the ev.org website,
instead of just their user- and family-names
in addition, they may want to provide a home address (a location)
users can also create contact lists, which are essentially just named
groups of people (note that the people in a contact list are not
necessarily users)
Events
events happen at a certain place at a certain time
people may attend an event if they register for the event
events may be public or private
- if public, anyone may attend by registering with their email address
- if private, only people who are invited may attend
so that people can decide whether or not to attend an event, the event
organiser should provide at least the following information:
- a title and description of the event
- starting date, starting time, ending date (generally the same as
starting date), ending time,
- location of the event (a place)
organisers may also specify a set of categories for the event (e.g.
festival, concert, lecture, party, food/wine, music, etc.) to help users
who are searching for events
events may be one-off or they may be repeated at various intervals
(daily, weekly, monthly)
- repeated events run from start-time until end-time on every day they
occur
- for daily events, the event runs every day from start-date until end-
date
- for weekly events, the event runs every 7 days from start-date until
end-date
- for monthly events, the event runs every 4 weeks from start-date
until end-date
a one-off event may span multiple days (e.g. a music festival)
Places
places have a name and address
the address should include: street, city, state, country and postal code
places may also have GPS coordinates (a la Google maps)
Contacts and Invitees Lists
a user can create a contact list (named list of people)
an event can have a list of people who are invited to the event
contact lists can be used to specify invitees to an event
(but invitees lists are always a list of people; any contact list would be
expanded to generate the invitees list)
we need to record whether an invitee has RSVP'd to attend the event
ER Modelling
(7 marks)
Task 1 Exercise
The information in the introduction should provide sufficient information to
design your data model. The important point about the data model you
produce is that it is rich enough to support all of the functionality mentioned
above. If you want any of the above to be further elaborated, then post a
message on the Forum.
Submission
Export your ER model diagram as a PDF file (ass2a.pdf) or PNG file
(ass2a.png), save it and then submit your file to the workspace using the
"Upload File" link. You can submit multiple times, your last submission will be
marked.
Marking Criteria
You will be assessed for using the provided requirements and drawing an ER
diagram to model this scenario as accurately as possible.
Note that you must:
1. underline all (primary) key attributes
2. clearly indicate relationship cardinalities
3. clearly indicate participation constraints
4. add your assumptions if the corresponding requirements are missing
or not clear
Task 2
ER to Relational Mapping (intro)
Introduction
This document gives an ER data model for a photo-sharing web site. You
must convert this design into a PostgreSQL relational schema (a collection of
create table statements). In performing the conversion from the ER design to
a relational schema, you should follow the approach given in the notes on ER
to Relational Mapping.
A template assignment.sql is available, to help you get started. You need to
implement an SQL schema based on the design below, and copy your final
schema to the file assignment.sql in the workspace. You can use your favourite
text/code editor to develop your schema. Eventually, you need to copy it to
the file assignment.sql in the workspace in ED; check that it is a valid
PostgreSQL schema, and then submit it.
The Problem Domain
Photo-sharing sites have proliferated on the Web over the last few years, and
have proved remarkably popular (Instagram has 1 billion users and Flickr has
over 100 million users). The sites allow users to register an account, and then
use that account to upload and manage a collection of their photos. There is
usually a limit on the storage allowed for each user; this can generally be
extended via a paid "premium" subscription. Management of photos is
typically by grouping them in various ways. And, of course, to flatter users'
inner Ansel Adams, most sites allow other users to rate your photos and
comment on how wonderful they are. Photo-sharing sites are typically backed
by massive storage for the photos themselves, along with a database to hold
all of the data about photos, users, ratings, etc.
The aim of this assignment is to develop a database schema that can support
the core functionality of a photo-sharing site like Flickr. Note that we are
dealing with data representation only; we are not actually going to implement
any of the functionality of such a site. However, whatever data structures you
design must be rich enough to enable the functionality below to be realised.
Let's call our website mypics.net (this domain is currently available for sale).
The site has to deal primarily with photos, people, collections of photos, and
groups of people. The public front page of mypics.net displays a collection of
random photos, drawn from all of the public and safe photos on the site.
When a user logs in to mypics.net, the first thing they see is a list of their
photo collections, with each collection appearing as a thumbnail of the "key
photo" for the collection. There will be a menu that allows them to upload
photos, create and organise photo collections, create groups, post comments,
and all of the other things that people like to do in a typical social networking
environment.
Here are some details on the kinds of data items that might need to be
stored in the back-end of the mypics.net web site:
Photos
photos can be uploaded onto the site and must be in JPEG format
each photo is stored in the file system and a record about the photo is
created
the filename is derived from the photo's ID and its upload date (but
does not need to be stored)
the photo record contains a title for the picture, descriptive text and
the file size in KB
the photo record also contains the date when the photo was taken and
when it was uploaded
there is also scope for an optional description of the technical details of
the photo
(e.g. what kind of camera, lens, exposure settings, etc.)
the system often displays thumbnails of photos (e.g. in photo lists);
each photo has a thumbnail,
also stored on the filesystem, created when the photo is uploaded
users can make comments on photos (see Comments below)
users can also rate photos (on a 1-5 star scale; 1=ordinary,
5=exceptional)
each photo must be assigned a "safety-level" when it is uploaded;
possible settings are:
Safe (suitable for children), Moderate (not suitable for children),
Restricted (R-rated)
the owner of a photo can also set its visibility; possible settings are
Private, Friends, Family, Friends+Family, Public
Photo Collections
a photo collection is simply a group of photos
each collection has an owner (a user), a title and some descriptive text
a collection may include photos from any user, as long as they are
public
photos within a collection appear in an order defined by the owner
one photo is defined to be a "key photo" for the collection and is
displayed
whenever the collection appears in a list of collections (along with the
title)
People
for every person in the site, we need to know at least their name
People are either users or people associated with users
Users
users are people who can log in to the site and upload pictures
each user needs to be registered with the system and must provide an
email address and a password
their email and password are used for authentication when they log in
to the system
we record the date when each user joins the site (registration date)
logging in takes them to their home-page, which displays a list of their
photo collections
once logged in, they can modify information related to themselves via
their "Account" page
users may provide additional information about themselves, including
their personal website, their gender and their birthday
they can also upload a small image of themselves (JPEG, smaller than
64KB)
Groups
users can form groups (e.g. special interest groups) consisting of a set
of users
each group is created and managed by some user (its owner)
a group can be set up in a number of modes: private, by-invitation, by-
request
- by-invitation groups allow the owner to invite people to join
- by-request groups allows users to ask to be added to a group
private and by-invitation groups are not visible when users
search/browse groups
each group has a title, a set of photo collections, and a list of
discussion threads
(a discussion thread is simply a title and a list of messages)
Friends
users can also make groups of people (who don't have to be users)
such groups are distinct from the Groups of users noted above
Tags
photos can be tagged by short phrases like "landscape", "street life",
etc.
tags are used as a mechanism for finding groups of similar photos
(they could also be viewed as defining informal collections)
when a user is tagging a photo, auto-completion will be used to
suggest tags,
to try to ensure consistent usage
Comments
a comment is a small piece of text written by a user
comments can be threaded (by one comment referring to an earlier
comment)
for each comment, we need to record the author and the time it was
posted
Requirements on your Submission
The SQL schema you submit must adhere to the following
requirements:
all tables must have an appropriate primary key defined; all foreign
keys must be identified
use appropriate domains for each attribute (e.g. a date-of-birth would
be done as an SQL date, a counter would be done as an SQL integer
constrained to be ≥ 0)
if an attribute is a string, and no maximum length is specified, use
PostgreSQL's (non-standard) text type; otherwise, use an appropriate
varchar(N) type or one of the supplied domain types
if an attribute is a boolean value, use the SQL boolean type
wherever possible, not-null, unique and domain constraints must be
used to enforce constraints implied by the ER design
derived (computed) attributes should not be included in the SQL
schema
wherever possible, participation constraints should be implemented
using the appropriate SQL constructs
map all of the entity class hierarchies in the ER design using the ER-
style mapping (i.e. one table for each entity class).
all relationships should be mapped using the approaches described in
the lecture notes; in particular, you should avoid over-generalising your
SQL schema relative to the ER design (e.g. a 1:n relationship should
not be mapped in such a way that it can actually be used to form an
n:m relationship)
Please follow as much as possible the following naming conventions:
each table that represents an entity should be given a name which is
the "pluralised" version of the entity name (e.g. entity Person is
mapped to a table called People and entity Photo is mapped to a table
called Photos)
each table that represents an n:m relationship R between tables S and
T should be called S_R_T (e.g. Photos_in_Collections)
each data attribute in the SQL schema should be given the same name
as the corresponding attributes in the ER
if an attribute in the SQL schema is derived from a relationship in the
ER diagram, name it after the relationship (suitably modified to make
sense, e.g. if the relationship is owns and the attribute is in the table
for the entity that is being owned, then you would changed the name
to ownedBy)
when mapping multi-valued attributes, name the new table by
concatenating the entity and attribute names
when mapping composite attributes, use the names of the "leaf"
attributes
if names in the ER diagram contain multiple words, separate the words
by underscores in the SQL schema (e.g. date_registered)
Note: if the name you want to use clashes with a PostgreSQL keyword (e.g.
user), you will need to write the name in double-quotes (i.e. "user") and in all
lower-case. An easy way to avoid the need to do this, for table names, is to
give your tables "pluralised" names (e.g. users) which will not clash with
PostgreSQL keywords. To avoid the problem for attributes, add an _id suffix
(e.g. user_id) if the file refers to another table.
To give you a head-start, a template for the schema is available in the file
assignment.sql, which has (parts of) some of the required tables already
defined. Note that you will need to add more tables, as well as filling out the
attributes in the supplied tables. Your submission must follow this format, so
save a copy of this and edit it to produce your submittable assignment.sql file.
You can use your favourite text/code editor to develop your schema.
Eventually, you need to copy it to Ed and check that it is a valid PostgreSQL
schema, and then submit it.
Please don't try to second-guess or improve the ER design below. Even if you
think it's complete rubbish, just translate it as given. If you think that it's
incorrect or that the information supplied isn't enough to do the mapping
unambiguously, post a message on the Ed Forum.
ER Model
This ER design gives one possible data model for the mypics.net application.
This isn't necessarily the design that would be used in practice and may not
even follow all of the requirements mentioned above. It has been designed to
make this assignment more interesting (i.e. to give you experience with a
range of modelling constructs and translation mechanisms).
To make the presentation clearer, the design is broken into a number of
sections. Note that an entity will have its attributes and class hierarchy
defined exactly once. If an entity is used in a later section of the design (e.g.
to show relationships), it will simply be shown as an unadorned entity box
(and you should assume all of the attributes and sub/super-classes from its
original definition).
The development of any significant design requires assumptions.
Assumptions specific to particular entities and relationships are presented
below each diagram.
A general strategy used in the design is to introduce a numeric primary key
called id into all major entities. This is despite the fact that we could have
made a primary key from existing attributes in many cases (e.g. email). The
reason for doing this is that primary keys typically end up as foreign keys in
other tables, and thus their values need to be copied to many places in the
database. "Natural" keys (such as email) are strings (typically 40-60 bytes),
whereas numeric keys are 4-byte integers, so there is a clear space saving in
maintaining copies of smaller keys. Also, natural keys have a habit of
changing (e.g. someone gets a new email account) and changing a primary
key value can have effects that propagate throughout the database. Using
numeric keys also makes indexing and various query processing techniques
faster. One disadvantage is that we add an extra attribute into each table.
Other notational conventions in the ER diagrams:
primary key attributes for entities are underlined
total participation in a relationship is indicated by a thick line
an arrow indicates that each entity at the non-arrow end is associated
with at most one entity at the arrow end
Note that the data here is sufficient to allow the mypics.net site to be built.
Some notions mentioned in the Stage 1 requirements are related to the
working of the application and do not need to be explicitly modelled here.
Actions (e.g. adding a person to a contact list) typically do not have a
presence in the data model either, although they clearly affect the data in the
database.
Data Types
To make your life simpler, we have defined some useful data types using the
create domain statement. Some of the create domain statements use
standard SQL patterns for specifying constraints, while others use
PostgreSQL-specific regular expressions for this purpose. The domain
definitions are given at the top of the template file provided ( assignment.sql)
You shouldn't need to use many varchar(N) types in this assignment. The
above types ought to be sufficient for most of the fields in the database. Use
them wherever you think it's appropriate.
Overview
The following diagram provides an overview of the major entities in
mypics.net and the major relationships between them. Other entities and
relationships and all attributes can be found in the diagrams below.
Comments:
every photo and group has an owner who is a user
users can rate many photos; each photo can be rated by many users
all comments occur in the context of a discussion
each discussion is created when its first message is posted
every comment has an author (i.e. the user who posted it)
a comment may be a reply to some other comment
Users and People
The following diagram shows the entities, attributes and relationships that
provide the information about people on the mypics.net site.
Comments on People:
we use a numeric ID as a primary key, since People and Users will be
extensively referenced in the database
for every person in the database, we need to know their name
note the use of is-a in a circle to indicate that the Person entity has
only a single sub-class (User); remember that you must implement this
(very small) class hierarchy via the ER-style mapping
Comments on Users:
every user is required to provide password and an email address
the system records the date when they registered
users may provide additional data such as their birthday (date) and
their gender ('male', 'female'); these fields can be NULL if users would
rather not specify
all people have a name for the system to display them as; users may
provide such a name for themselves; if no name is supplied, mypics.net
will form a name from the family- and given-names
we assume that names are no longer than 50 chars (for the given-
names and family-name components) and up to 100 chars for
displayed names
a user's website (if supplied) is simply a URL text string
users may supply a portrait to be displayed when they are referenced
on the website; this portrait is a photo of them in JPEG format
Friends and Groups
Friends and Groups both represent collections of people, but in the case of
Groups they are collections of people who are users. Since the membership
relations refer to different entities, we also treat Friends and Groups as
different entities. The following diagram shows entities, attributes and
relationships that deal with various groups of people on the mypics.net site.
Note that, to keep things simple, we are completely ignoring the process by
which users become members of groups; this would typically require
additional tables related to invitations and requests.
Comments on Friends:
users can build lists of people they know (e.g. friends or family)
every Friends list must have a title (e.g. "Family", "Workmates",
"Friends")
such lists cannot be empty; the list is created when the first member is
added
all members of a Friends list must be entered into the database as
people
Comments on Groups:
groups are also created by users; the creator becomes the owner
groups also have IDs and titles, and must have a mode to describe how
membership of the group is organised
any user may be a member of multiple groups; every group must have
at least one member (the owner is automatically a member)
Photos
The following diagram shows entities, attributes and relationships relevant to
photos on the mypics.net site. We assume that the photo files themselves are
stored in the file system with a name based on the photo ID and that the
application logic can retrieve a photo given this ID. Thus, no photo image file
data is actually stored in the database (except file size).
Comments on Photos:
each photo has an automatically defined unique numeric ID
a photo must have a title, a string up to 50 characters long
a photo may have a description, which is arbitrary text
the user may supply the date on which the photo was taken; the
system will automatically provide the date it was uploaded
the file size of a photo is stored as a whole number of KB
visibility is given by one of the values 'private', 'friends', 'family',
'friends+family', 'public'
safety level is given by one of the values 'safe', 'moderate', 'restricted'
since the description of the technical details of photos can vary widely,
such details are supplied, if the owner wants, simply as a text string
Tags and Ratings
Associated with each photo is user-supplied information on its content and
quality. Including it on the diagram above would have made things to messy,
so we show entities, attributes and relationships for photos and their ratings
and tags below:
Comments on Tags and Ratings
a photo may have an arbitrary number of tags associated with it
each tag is defined primarily by a descriptive name (up to 50
characters in length)
in order to efficiently accomodate interface notions like "tag clouds", a
count of the number of times each tag is used is stored in the tag
record and maintained by the application
a photo can be tagged by many users (and each user might tag it
differently); we keep a record of when they tag it (time-stamp)
each time a photo is tagged, the overall frequency counter for that tag
is incremented
a photo may also be rated on a 1..5 star scale by users
each time a rating is made, the system records the user who made the
rating, a time-stamp for when the rating was made, and the actual
rating value
Collections
The following diagram shows entities, attributes and relationships for the
photo collections that are managed in the mypics.net site.
Comments on Collections:
a collection is simply a group of photos, identified by a numeric ID
each collection must have a title (up to 50 chars) and may have an
arbitrary text description
each collection has a key photo which is used to represent the whole
collection in the interface
photos are marked with the rank order (a small positive integer) in
which they appear when the entire collection is displayed; this allows
users to choose how the collection is viewed
some collections are built by an individual user and owned by that
user, who is the only person who can modify which photos are in the
collection and the order that they appear
other collections are associated with a group, and any member of the
group can modify which photos are in the collection and the order that
they appear
Discussions and Comments
The following diagram shows entities, attributes and relationships for the
discussions and comments in the mypics.net site.
Comments on Discussions:
discussions are essentially holders for collections of comments
each discussion has an identifying numeric ID and may have a title (up
to 50 chars)
a discussion contains one or more comments and is created when the
first comment is posted
a collection of comments on one photo needs no title; the discussion is
associated with the photo being commented on
discussions under a group may have a title and will be associated the
group
Comments on Comments:
comments are messages posted by users as part of a discussion
each comment has an identifying numeric ID
the system time-stamps each comment with the precise posting date
and time
the content of a comment is an arbitrary text string
ER to Relational Mapping (exercise)
(8 marks)
Task 2 Exercise
Make sure you read the description in the introduction thoroughly, and
review the notes and exercises on ER-to-relational mapping. Read the
available assignment.sql template in the workspace and see what's
provided there. If any aspect of this design requires further
clarification, ask for it on the Ed Forum.
Reminder: you must ensure that your schema will load without error
into a newly created database.
Submission
You can use your favourite text/code editor to develop your schema.
Eventually, you need to copy it to the file assignment.sql in the
workspace in ED; check that it is a valid PostgreSQL schema, and then
submit it. You can submit multiple times, your last submission will be
marked. Click on the "Mark" button to submit your answers.
Marking Criteria
1. Defining domains 0.5
2. Defining tables from entities 1.5
3. Defining tables from sub-entities 0.5
4. Defining tables from relationships 1.0
5. Foreign keys 2.5
6. Constraints and data types 1.0
7. Primary keys 1.0
------------------------------------------
Total 8.0
(This is provided in the answer section on the website)
(-- ZZEN9311 Assignment 2b
-- Schema for the mypics.net photo-sharing site
--
-- Written by <<YOUR NAME GOES HERE>>
--
-- Conventions:
-- * all entity table names are plural
-- * most entities have an artifical primary key called "id"
-- * foreign keys are named after the relationship they represent
-- Domains (you may add more)
create domain URLValue as
varchar(100) check (value like 'https://%');
create domain EmailValue as
varchar(100) check (value like '%@%.%');
create domain GenderValue as
varchar(6) check (value in ('male','female'));
create domain GroupModeValue as
varchar(15) check (value in ('private','by-invitation','by-request'));
create domain NameValue as varchar(50);
create domain LongNameValue as varchar(100);
-- Tables (you must add more)
create table People (
id serial,
...
primary key (id)
);
create table Users (
...
primary key (...)
);
create table Groups (
...
primary key (...)
);
create table Photos (
...
primary key (...)
);