Advanced Database ICA
Contents
Tables For Press_IT's Database
· Customer
o (CustomerID, CompanyName, WebAddress, AddressLine1, AddressLine2,
Postcode, Phone, Fax, Country, RegisteredDate)
o Primary Key CustomerID
· Account
o (Username, Password, FirstName, LastName, Email, CreatedDate, Status,
CustomerID )
o Primary Key Username
o Foreign Key CustomerID references Customer(CustomerID)
· Job
o (JobID, Description, CreatedDate, CompletedDate)
o Primary Key JobID
· Consultation
o (ConsultationID, Description , Date, Duration, AmountForConsultation,
CustomerID, JobID)
o Primary Key ConsultationID
o Foreign Key CustomerID references Customer(CustomerID)
o Foreign Key JobID references Job(JobID)
· Invoice
o (InvoiceID, DateRaised, DatePaid, DaysSinceInvoiced, Amount,
CustomerID, JobID)
o Primary Key InvoiceID
o Foreign Key CustomerID references Customer(CustomerID)
o Foreign Key JobID references Job(JobID)
· Submission
o (SubmissionID, FlatRate, FreeStorageInMb, ExtraSpaceChargePerMb)
o Primary Key SubmissionID
· CoverageType
o (CoverageTypeID, CoverageType)
o Primary Key CoverageTypeID
· Based
o (BasedID, PlaceName)
o Primary Key BasedID
· Publisher
o (PublisherID, Name)
o Primary Key PublisherID
· Language
o (LanguageID, Language)
o Primary Key LanguageID
· Staff [Manager + Editor + Staff]
o (StaffID, Title, FirstName, LastName, MiddleName, Email, Phone,
AddressLine1, AddressLine2, Postcode, DOB, Position, Sex, Salary,
DateStarted )
o Primary Key StaffID
· Team
o (ManagedStaffID, ManagerStaffID, Name)
o Primary Key ManagedStaffID
o Foreign Key ManagedStaffID references Staff(StaffID)
o Foreign Key ManagerStaffID references Staff(StaffID)
· SiteTier
o (SiteTierID, SiteTierName)
o Primary Key SiteTierID
· Site
o (SiteID, SiteName, WebAddress, ManualSubmission, SiteScore,
ScoreUpdated, BasedID, PublisherID, SiteTierID)
o Primary Key SiteID
o Foreign Key BasedID references Based(BasedID)
o Foreign Key PublisherID references Publisher(PublisherID)
o Foreign Key SiteTierID references SiteTier(SiteTierID)
· SiteContact
o (SiteContactID, Title, FirstName, LastName, Email, Phone, Fax, SiteID)
o Primary Key SiteContactID
o Foreign Key SiteID references Site(SiteID)
· Publication
o (PublicationID, URL, RSS, DatePublicised, Bias, SiteID, LanguageID)
o Primary Key PublicationID
o Foreign Key SiteID references Site(SiteID)
o Foreign Key LanguageID references Language(LanguageID)
· AccessType
o (AccessType)
o Primary Key AccessType
· PDF
o (PDFID, FileName, URL, PublicationID, CustomerID, AccessType, SiteID)
o Primary Key PDFID
o Foreign Key PublicationID references Publication(PublicationID)
o Foreign Key CustomerID references Customer(CustomerID)
o Foreign Key AccessType references AccessType (AccessType)
o Foreign Key SiteID references Site(SiteID)
· FocusGroup
o (GroupName)
o Primary Key GroupName
· IndustryFocus
o (IndustryFocusID, FocusName,GroupName)
o Primary Key IndustryFocusID
o Foreign Key GroupName references FocusGroup(GroupName)
· EditorFocus
o (StaffID, IndustryFocusID)
o Primary Key StaffID, IndustryFocusID
o Foreign Key StaffID references Staff(StaffID)
o Foreign Key IndustryFocusID references IndustryFocus(IndustryFocusID)
· Release
o (ReleaseID, Headline, ReleasedDate, URL, URLForZippedFolder,
ActualSizeInMb, NoDaysToMonitor, JobID, CustomerID, StaffAllocated ,
AllocatedDate, DistributedDate, SubmissionID, AmountForSubmission,
AmountForMonitoring, IndustryFocusID)
o Primary Key ReleaseID
o Foreign Key JobID references Job(JobID)
o Foreign Key CustomerID references Customer(CustomerID)
o Foreign Key StaffAllocated references Staff(StaffID)
o Foreign Key SubmissionID references Submission(SubmissionID)
o Foreign Key IndustryFocusID references IndustryFocus(IndustryFocusID)
· Distribution
o (ReleaseID, SiteContactID, DistributionDate, ByEmail)
o Primary Key ReleaseID, SiteContactID
o Foreign Key ReleaseID references Release(ReleaseID)
o Foreign Key SiteContactID references SiteContact(SiteContactID)
· Coverage
o (PublicationID, ReleaseID, Headline, Bias , CoverageTypeID)
o Primary Key PublicationID, ReleaseID
o Foreign Key PublicationID references Publication(PublicationID)
o Foreign Key ReleaseID references Release(ReleaseID)
o Foreign Key CoverageTypeID references CoverageType (CoverageTypeID)
· MonitoringCategory
o (MonitoringCategoryID, MinimumDays, MaximumDays, ChargePerDay)
o Primary Key MonitoringCategoryID
· Monitoring
o (MonitoringID, StartDate, EndDate, GeneratedThreads, Coverage,
MonitoringCategoryID, ReleaseID)
o Primary Key MonitoringID
o Foreign Key MonitoringCategoryID references
MonitoringCategory(MonitoringCategoryID)
o Foreign Key ReleaseID references Release(ReleaseID)
· MainRegion
o (MainRegionName)
o Primary Key MainRegionName
· Region
o (RegionID, RegionName, MainRegionName)
o Primary Key RegionID
o Foreign Key MainRegionName references MainRegion(MainRegionName)
· FocusRegion
o (RegionID,SiteID)
o Primary Key RegionID, SiteID
o Foreign Key RegionID references Region (RegionID)
o Foreign Key SiteID references Site(SiteID)
· SiteLanguage
o (LanguageID, SiteID)
o Primary Key LanguageID, SiteID
o Foreign Key LanguageID references Language (LanguageID)
o Foreign Key SiteID references Site(SiteID)
· ContactFocus
o (SiteContactID, IndustryFocusID)
o Primary Key SiteContactID, IndustryFocusID
o Foreign Key SiteContactID references SiteContact (SiteContactID)
o Foreign Key IndustryFocusID references IndustryFocus(IndustryFocusID)
Assumptions:
General
1. This ICA not only creates a database but also proposes a solution to
Press_IT's current data management problems.
2. The number of records in a heavily used table will not exceed 2 billion.
Therefore this data model uses the INT data type for the primary key of many
entities.
3. If records are not added to a table by the system (that is, the number of
records is limited), one of the table's unique attributes is used as the
primary key. However, if the table is referenced by a heavily used table, an
automatically incrementing INT is used as the primary key to improve querying
and indexing performance.
4. A program or system will add records to many tables. In these scenarios,
automatically incrementing integer data types will be used as primary keys.
5. Considering the longest international telephone number, the data model
uses 15 digits wherever it is appropriate.
6. If the maximum number of records in a given table will not be greater than
255, this data model uses TINYINT data type.
7. Considering Globalisation, future expansion, Unicode and multi-language
usages, this data model uses NVARCHAR instead of VARCHAR.
8. If a field contains monetary values, this data model uses MONEY data
type.
9. If a field stores double or float values, this data model uses FLOAT data
type.
10. Reports resulting from monitoring can be created using views or SELECT
queries. Therefore there is no entity or table for storing reports.
Special
Customer:
* Stores customers' details.
* When a new customer registers with Press_IT, an application also creates a
new folder on Press_IT's web server, named after the customer's username. This
folder, and probably other folders under it, will be used to store files
uploaded by the customer.
* Customers may be based outside the United Kingdom, so they may not have a
postcode.
* A customer may not have a web address.
* No two customers have the same phone number.
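The general and Customer assumptions above could translate into a table definition along the following lines. This is only a sketch: the column lengths, the NOT NULL choices and the default on RegisteredDate are assumptions made for illustration and are not stated in the design.

```sql
-- Sketch only: lengths, NULL-ability and the default are assumed, not taken from the design.
CREATE TABLE Customer (
    CustomerID     INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- auto-incrementing INT (general assumptions 2 and 4)
    CompanyName    NVARCHAR(100) NOT NULL,                  -- NVARCHAR for Unicode (general assumption 7)
    WebAddress     NVARCHAR(255) NULL,                      -- a customer may not have a web address
    AddressLine1   NVARCHAR(100) NOT NULL,
    AddressLine2   NVARCHAR(100) NULL,
    Postcode       NVARCHAR(10)  NULL,                      -- customers outside the UK may have none
    Phone          NVARCHAR(15)  NOT NULL UNIQUE,           -- 15 digits (general assumption 5), unique per customer
    Fax            NVARCHAR(15)  NULL,
    Country        NVARCHAR(60)  NOT NULL,
    RegisteredDate DATETIME      NOT NULL DEFAULT GETDATE()
);
```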
Account:
* An account is used to log a customer in.
* If a customer fails to settle an outstanding payment within 30 days, all of
his or her accounts will be inactivated.
* The password length is set to 50 characters to allow for its length after
encryption.
Job:
· Stores data about one or more releases or consultations made by a
customer.
· A job will have a detailed description which does not exceed 500 characters
in length.
· When a job has been completed, the date of completion must be updated.
This event raises an invoice using a stored procedure or a trigger.
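One way the completion event could raise an invoice is an AFTER UPDATE trigger on Job. The sketch below is not the design's actual trigger. It assumes that InvoiceID is an identity column, that every release and consultation in a job belongs to the same customer, and that the invoice amount is the sum of the stored release and consultation amounts. The trigger name is hypothetical.

```sql
-- Sketch only: trigger name and the amount rule are assumptions.
CREATE TRIGGER trg_Job_RaiseInvoice
ON Job
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO Invoice (DateRaised, DaysSinceInvoiced, Amount, CustomerID, JobID)
    SELECT GETDATE(), 0, SUM(c.Amount), c.CustomerID, c.JobID
    FROM (
        -- Charges stored on each release of the job
        SELECT JobID, CustomerID,
               ISNULL(AmountForSubmission, 0) + ISNULL(AmountForMonitoring, 0) AS Amount
        FROM Release
        UNION ALL
        -- Charges stored on each consultation of the job
        SELECT JobID, CustomerID, ISNULL(AmountForConsultation, 0)
        FROM Consultation
    ) AS c
    JOIN inserted AS i ON i.JobID = c.JobID
    JOIN deleted  AS d ON d.JobID = i.JobID
    WHERE i.CompletedDate IS NOT NULL
      AND d.CompletedDate IS NULL          -- only when the job changes to completed
      AND NOT EXISTS (SELECT 1 FROM Invoice AS v WHERE v.JobID = i.JobID) -- a job raises only one invoice
    GROUP BY c.CustomerID, c.JobID;
END;
```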
Consultation:
· Holds details of consultation.
· Consultation will be held for full hours only. Therefore duration for
consultation will be stored as an integer.
· A consultation will have a detailed description which does not exceed 500
characters in length.
Invoice:
· An invoice is raised when a job is completed. This entity is kept separate
from Job because an invoice may gain other attributes in the future, such as
"payment method", which do not describe a Job.
· An invoice has a day-counter field that counts the number of days since the
invoice was raised. A stored procedure scheduled to run daily at a fixed time
uses this value to inactivate the accounts of customers who do not pay on
time.
· A job is unique to an invoice: a job raises only one invoice.
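The daily procedure described above might look like the following sketch. The procedure name and the 'Inactive' status value are hypothetical, and the paid-date column is written as DatePaid. Scheduling (for example through a SQL Server Agent job) is outside the sketch.

```sql
-- Sketch only: procedure name and status value are assumptions.
CREATE PROCEDURE usp_InactivateOverdueAccounts
AS
BEGIN
    SET NOCOUNT ON;

    -- Refresh the day counter of every unpaid invoice.
    UPDATE Invoice
    SET DaysSinceInvoiced = DATEDIFF(DAY, DateRaised, GETDATE())
    WHERE DatePaid IS NULL;

    -- Inactivate all accounts of customers with an invoice unpaid for more than 30 days.
    UPDATE a
    SET [Status] = N'Inactive'
    FROM Account AS a
    WHERE EXISTS (
        SELECT 1
        FROM Invoice AS i
        WHERE i.CustomerID = a.CustomerID
          AND i.DatePaid IS NULL
          AND i.DaysSinceInvoiced > 30
    );
END;
```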
Submission:
· Has a single record that holds the flat rate and the free storage size for a
release.
· Storage size is stored as an integer, not a decimal.
CoverageType:
· Stores coverage types such as selective, editorial and blog.
· The number of records will not exceed 255. Therefore TINYINT is found to be
ideal.
· Even if coverage type is unique by itself, using integers as a primary key
enhances the performance of indexing where this entity is referenced.
Based: [Source: Existing database]
· Holds names of places where a site is based.
· Although place names are unique, having an integer primary key is useful
for query optimisation and indexing performance.
Publisher: [Source: Excel file]
· Stores names of publishers.
· Press_IT is only interested in storing the names of publishers.
· Although publisher names are unique, having an integer primary key is
useful for query optimisation and indexing performance.
Language: [Source: Existing database]
· Stores languages used by publication websites.
· Although languages are unique, having an integer primary key is useful for
query optimisation and indexing performance.
Staff: [Source: Case study]
· Stores details of Press_IT's Employees.
· Manager, Editor and Staff are all included in one "Staff" table during
implementation.
· Employees will be assigned an automatic identity number.
· Contact's title is any word, abbreviation or acronym not greater than 10
characters (e.g. Architect, Chief, CBiol (Chartered Biologist), etc).
· Press_IT is based in the United Kingdom.
· An employee's title is either "MR", "MRS" or "MISS".
· An employee's gender can only be male or female, and values will be stored
as "M" or "F" respectively.
· A staff member can belong to only one team.
Team: [Source: Case Study]
· A team has only one manager.
· The managed staff member's identity is used as the primary key.
· A managed staff cannot be a member of more than one team.
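The Team assumptions above could be expressed roughly as follows. Using ManagedStaffID as the primary key means each staff member appears at most once, so nobody can be in more than one team. The length of Name and the CHECK constraint are assumptions added for illustration. Note that this sketch does not by itself enforce that all rows sharing a team Name have the same manager.

```sql
-- Sketch only: the Name length and the self-management check are assumptions.
CREATE TABLE Team (
    ManagedStaffID INT NOT NULL PRIMARY KEY      -- one row per managed staff member
        REFERENCES Staff (StaffID),
    ManagerStaffID INT NOT NULL
        REFERENCES Staff (StaffID),
    Name           NVARCHAR(50) NOT NULL,
    CONSTRAINT CK_Team_NotSelfManaged
        CHECK (ManagedStaffID <> ManagerStaffID) -- assumed: nobody manages themselves
);
```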
SiteTier: [Source: Existing database]
· Stores website tiers for publication websites.
· Although site tiers are unique, having an integer primary key is useful for
query optimisation and indexing performance.
Site: [Source: Existing database, Case Study and Excel file]
· Stores publication websites' details.
· A website name is unique.
· A web address is also unique.
· A manual submission happens occasionally.
SiteContact: [Source: Existing database and Case Study]
· Stores publication website contacts' details.
· Site contacts may be based in countries other than the United Kingdom.
· Contact's title is any word, abbreviation or acronym not greater than 10
characters (e.g. Architect, Chief, CBiol (Chartered Biologist), etc).
Publication: [Source: Existing database and Case Study]
· Holds information about one or more releases which are approved by the
site contact and successfully publicised.
· A publication may have a Uniform Resource Locator (URL), a Really Simple
Syndication (RSS) feed, or both; it must have at least one of them.
AccessType: [Source: Existing database and Case Study]
· Holds customers' permission type for copying Portable Document Format
(PDF) equivalents for publications.
· Does not use integer data type as a primary key because it will not be
queried frequently and the number of records to be inserted is known.
PDF: [Source: Existing database and Case Study]
· Holds information about PDF files for publications if they are available.
· A publication is unique to a PDF. A PDF represents only one publication.
FocusGroup: [Source: Existing database and Case Study]
· Holds collective names for groups of industry focuses. Example: IT could be
a collective name for SQL Server, Oracle, MySQL etc.
· Does not use integer data type as a primary key because it will not be
queried frequently and the number of records to be inserted is known.
IndustryFocus: [Source: Existing database and Case Study]
· Holds names for industry focuses.
EditorFocus: [Source: Existing database and Case Study]
· Stores information about industry focuses of assignment editors.
Release: [Source: Existing database and Case Study]
· Stores key information about releases.
· An application creates and uploads files (holding the contents of a release,
such as the introduction, body and boilerplate, and other media resources) to
a new folder under the appropriate username folder (see the assumptions made
for Customer). The database then stores the URL of the new folder.
o [Assumption Base 1: Press_IT has no interest in searching or querying the
introduction, content and boilerplate of a release, only in storing the
information somewhere safe. Example: Press_IT is not interested in running the
following type of query.
§ "SELECT * FROM Release WHERE Content LIKE '%celebrity girl%'"]
o [Assumption Base 2: Keeping information in a file format helps to achieve
easier and faster distribution]
o [Assumption Base 3: Keeping release details in a file format makes editing
simple because it is always possible to use the default application for the file]
· Allocated staff are responsible for pitching, distributing and monitoring a
release.
· A customer requests monitoring by filling in the number of days to monitor
(NoDaysToMonitor) field.
· ActualSizeInMb stores the actual size of a release. Only the assigned editor
updates this field; it is not expected to arrive with the release.
· After all the necessary pitching has been carried out, an assignment editor
creates a downloadable zipped folder which holds all release components; the
URL for this folder is stored in the URLForZippedFolder attribute.
· If a customer wants his or her release to be monitored, the final
distribution date is taken as the monitoring start date.
Distribution: [Source: Existing database and Case Study]
· Stores information about which release is being distributed to which site
contact.
· Email is the default means of distribution.
· The timestamp for every distribution is stored when an assignment editor
sends a release to a related site contact [this activity can be automated
using a stored procedure]; otherwise the timestamp stays NULL.
Coverage: [Source: Existing database and Case Study]
· Stores information about how a release proved newsworthy, drew the attention
of site contacts, and resulted in publications.
· A publication may contain more than one submitted release [Case Study
B4]. Table Coverage resolves the many-to-many relationship between
Release and Publication.
· A release may be published under a headline different from the original.
Therefore a release's headline after publication must be stored in this table.
· A release may acquire bias after publication (when it gets coverage). See:
http://en.wikipedia.org/wiki/Publication_bias
MonitoringCategory: [Source: Existing database and Case Study]
· Stores various monitoring types categorised by the duration of monitoring.
· The maximum monitoring period is one year.
Monitoring: [Source: Existing database and Case Study]
· Stores summarised information about releases encountered in monitoring.
· Press_IT uses a tool for updating some of the fields within this table, for
example "GeneratedThreads" and "Coverage".
· GeneratedThreads is stored as an integer.
· Monitoring takes place if and only if the customer requests it during the
submission of a release.
MainRegion: [Source: Existing database and Case Study]
· Stores main region names, such as Europe, Africa, and Middle East.
Region: [Source: Existing database and Case Study]
· Stores region (usually country) names, such as United Kingdom, Ireland,
and Israel.
FocusRegion: [Source: Existing database and Case Study]
· Stores information about which regions each website focuses on.
SiteLanguage: [Source: Existing database, Excel file and Case Study]
· Stores the list of languages supported by a website.
· Sometimes websites support more than one language (see the Excel file
provided with the ICA).
ContactFocus: [Source: Existing database and Case Study]
· Stores information about site contacts and their area of focus.
Supportive Document:
Denormalisation:
This data model uses denormalisation to optimise database performance by
storing derived values alongside the data they are computed from.
Attributes that result from denormalisation are normally not required when a
record is inserted; they are updated later to hold a value other than NULL.
Table: Consultation
Attribute: AmountForConsultation, Data type: Money, Value: [Duration * 100]
There are three ways to achieve the same result shown in the value. See the
following:
Method 1: Table 1 (Consultation):
Duration  AmountPerHour  TotalConsultation
2         100            200
3         100            300
Disadvantage: AmountPerHour becomes redundant.
Method 2: Table 1 (Consultation):
Duration
2
3
Method 2: Table 2 (AmountPerHour):
Amount
100
Disadvantage: Requires joining two tables and creating a view to show the
total amount.
Method 3: Table 1 (Consultation): Denormalisation
Duration  AmountForConsultation
2         200
3         300
Advantage: The total consultation amount [Duration * 100] is computed once and
stored, so no join or recalculation is needed when it is queried, which
improves the performance and efficiency of the database.
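To make the contrast between Method 2 and Method 3 concrete, the two approaches could be written as below. This is a sketch under the assumptions of the worked tables: the AmountPerHour table and its Amount column are the hypothetical rate table of Method 2, and the rate of 100 is the figure used in the examples.

```sql
-- Method 2: the total is derived at query time from a separate rate table.
SELECT c.ConsultationID,
       c.Duration * r.Amount AS TotalConsultation
FROM Consultation AS c
CROSS JOIN AmountPerHour AS r;   -- AmountPerHour is assumed to hold a single row (100)

-- Method 3: the total is stored once in the denormalised column.
UPDATE Consultation
SET AmountForConsultation = Duration * 100   -- rate of 100 per hour, as in the worked tables
WHERE AmountForConsultation IS NULL;
```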
Table: Invoice
Attribute: Amount, Data type: MONEY, Value: [SUM (Release) Where JobID =
@jobId Group By CustomerID, IndustryFocus]
A job may contain many releases or consultations, grouped according to the
industry focus of the customer's releases. Each release stores its own amounts
for submission and monitoring in table Release; the customer's aggregate
amount is stored in the Amount field of the Invoice table.
Table: Release
Attributes: DistributedDate, AmountForSubmission, AmountForMonitoring
DistributedDate: Data type: DATETIME, Value: [The last timestamp
(DistributedDate) from Distribution table]
Here it is important to look at the related table "Distribution", in which the
editor maps a release to many site contacts. Site contacts may receive a given
release at different times. The "DistributedDate" attribute in the Release
table stores the date by which all site contacts mapped to a release had
received their email. See the tables below.
Table Distribution
ReleaseID  SiteContactID  DistributionDate
1          A              11/01/2009
1          B              13/01/2009
1          F              14/01/2009
Table Release
ReleaseID  DistributedDate
1          14/01/2009
The "DistributedDate" within the Release table shows the final timestamp from
table Distribution where ReleaseID = 1 and DistributionDate is not null.
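The derivation of DistributedDate could be sketched as the update below. It follows one reading of the description: the value is filled in only once every site contact mapped to the release has a distribution timestamp. Whether this runs from a trigger, a stored procedure or a scheduled job is left open.

```sql
-- Sketch only: copies the latest distribution timestamp to each fully distributed release.
UPDATE r
SET r.DistributedDate = d.LastDistributionDate
FROM Release AS r
JOIN (
    SELECT ReleaseID,
           MAX(DistributionDate) AS LastDistributionDate
    FROM Distribution
    GROUP BY ReleaseID
    HAVING COUNT(*) = COUNT(DistributionDate)  -- no site contact is still waiting (no NULL dates)
) AS d ON d.ReleaseID = r.ReleaseID;
```

With the sample rows above, release 1 would receive 14/01/2009, the latest of its three distribution dates.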
AmountForSubmission: Data type: Money, Value: [results for the following
UPDATE Trigger or Stored Procedure]
[If a release size <= 10 Mb, Value = 95 (Submission.FlatRate) otherwise it will
be computed from (((Release.ActualSizeInMb -
Submission.FreeStorageInMb) * Submission.ExtraSpaceChargePerMb) +
Submission.FlatRate)]
Denormalisation saves the database from running a processor-intensive
query to compute an invoice for a job that contains many releases from one
customer with the same industry focus.
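The AmountForSubmission rule above could be applied with an update such as the following sketch. It assumes the 10 Mb threshold in the rule is the value held in Submission.FreeStorageInMb, and that ActualSizeInMb has already been filled in by the assigned editor.

```sql
-- Sketch only: flat rate up to the free storage size, extra charge per Mb beyond it.
UPDATE r
SET r.AmountForSubmission =
    CASE
        WHEN r.ActualSizeInMb <= s.FreeStorageInMb
            THEN s.FlatRate
        ELSE s.FlatRate
             + (r.ActualSizeInMb - s.FreeStorageInMb) * s.ExtraSpaceChargePerMb
    END
FROM Release AS r
JOIN Submission AS s ON s.SubmissionID = r.SubmissionID
WHERE r.ActualSizeInMb IS NOT NULL;   -- size is only known after the editor updates it
```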
AmountForMonitoring:
This attribute is also a result of denormalisation and works exactly like
"AmountForSubmission" explained above. The difference is that the tables
involved in computing "AmountForMonitoring" are Release, Monitoring and
MonitoringCategory.
Indexing
As a rule of thumb, all tables have clustered indexes that result from
their primary key constraints. This section focuses on creating nonclustered
indexes.
Nonclustered index suggestions:
1. Creating indexes on columns that are frequently involved in the search
conditions of a query (WHERE clause). For Example:
SELECT * FROM Consultation WHERE (CustomerID =1 AND JobID = 4)
2. Creating indexes on columns that contain a large number of distinct values,
such as combinations of last name and first name. For Example:
SELECT (LastName + FirstName) AS StaffFullName FROM Staff WHERE
(StaffID =3).
3. Creating indexes on columns that are involved in join and grouping
operations. Normally on any foreign keys. For Example:
SELECT Count (*) AS CountPerDateTime FROM Release JOIN Distribution
ON Release.ReleaseID = Distribution.ReleaseID GROUP BY
Distribution.DistributionDate.
4. Creating indexes for queries that do not return a large output. Normally on
unique columns. For Example:
This sort of query may be required by an application.
SELECT * FROM Staff WHERE Email LIKE '[email protected]'.
5. Covering queries. For Example:
All of the columns requested by the output are covered by an index.
SELECT Count (*) As Count, DistributedDate FROM Release GROUP BY
DistributedDate.
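The five suggestions above might be realised with index definitions like these. The index names are hypothetical, and the unique index on Staff.Email assumes that no two staff members share an email address, which the design does not state.

```sql
-- 1. Search conditions: supports WHERE CustomerID = ... AND JobID = ...
CREATE NONCLUSTERED INDEX IX_Consultation_CustomerID_JobID
    ON Consultation (CustomerID, JobID);

-- 2. Many distinct values: last name combined with first name
CREATE NONCLUSTERED INDEX IX_Staff_LastName_FirstName
    ON Staff (LastName, FirstName);

-- 3. Join and grouping columns: the GROUP BY column of the Distribution example
CREATE NONCLUSTERED INDEX IX_Distribution_DistributionDate
    ON Distribution (DistributionDate);

-- 4. Small result sets on unique columns (uniqueness of Email is an assumption)
CREATE UNIQUE NONCLUSTERED INDEX UX_Staff_Email
    ON Staff (Email);

-- 5. Covering index: the count grouped by DistributedDate reads only this index
CREATE NONCLUSTERED INDEX IX_Release_DistributedDate
    ON Release (DistributedDate);
```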