KIT712
Data Management Technology
LECTURE 1 BY
DR. SAURABH KUMAR GARG
About Me
Present
Lecturer at UTAS
Past
B.Tech/M.Tech from Indian Institute of Technology (IIT), Delhi
PhD from the University of Melbourne
Postdoctoral Fellow at IBM Research
Keywords: Distributed, Optimisation, Cloud Computing, Decision Making, BigData, Data Stream Analytics, Computing, MapReduce, Education Analytics, Acoustic Data Analytics, Privacy
Teaching Team
Unit Coordinator
Dr. Saurabh Garg
Lecturer in Hobart
Office: Cent 462 (ICT wing)
Email: [email protected]
Lecturer in Launceston
Dr Son Tran
Office:V112
Introduction to KIT712
Motivation
Unit Content
Learning Activities
Assessment
Resources
Tips for Success
Motivation
What is Data to You?
Our Data-driven World
Science
Databases from astronomy, genomics, environmental data, transportation data, …
Humanities and Social Sciences
Scanned books, historical documents, social interactions data, new technology like GPS …
Business & Commerce
Corporate sales, stock market transactions, census, airline traffic, …
Entertainment
Internet images, Hollywood movies, MP3 files, …
Medicine
MRI & CT scans, patient records, …
How to Store Data?
Do we want to store like this?
Need Organization????
Need Organization
Data models
Why are you doing this unit?
Let us be honest!!
It is a compulsory unit. I am here only for Permanent Residency (PR)
Consequence: Always distressed and may fail in the end
It is a compulsory unit. I am forced to do this unit.
Consequence: Hard struggle and may pass
I will learn industry-based technology and improve my skills
Consequence: Will enjoy the unit and will be eager for inputs
Unit Outline
Very Important
Soft copy available on
MyLO under Information
Online at
http://www.utas.edu.au/computing-information-systems/resources/unit-outlines/
Read It
Teaching Pattern
This unit has:
Lectures – 1 hour per week (Tuesday 9am-10am, except Lectures 1 and 13)
Tutorials – 2 hours per week
Online learning modules – up to 2 hours per week on average
Self-study – 3 hours per week
Work on assignments
Prepare for tutorials
Study/Revision/Self check quizzes
Prerequisites (Assumptions)
Basic SQL
Basic Programming
Hard Working
Textbooks (Reference Only)
Database Systems: Design, Implementation and Management, by Coronel and Morris,
Cengage Learning publisher.
Oracle 11g: SQL, by Joan Casteel, published 2010 by Course Technology, Cengage
Learning
Oracle 11g: PL/SQL, by Joan Casteel, published 2010 by Course Technology, Cengage
Learning
Learning Outcomes
LO1. Evaluate, critically analyse alternative techniques and data models for designing databases;
LO2. Adapt and apply techniques and processes for designing, implementing and administering an
enterprise level relational database;
LO3. Design sophisticated SQL queries to efficiently retrieve information from relational databases;
LO4. Understand and appreciate data storage and retrieval issues with current trends and advances in
database technologies.
Topics
Introduction to Systems and Databases;
LO1
Entity Relationship Model review and extension;
Conceptual, logical, physical Modelling;
LO2
SQL Review and advanced SQL;
LO3 LO4
SQL Query Optimisation;
Triggers, Procedures and Functions;
LO2
Database Administration.
Overview of NoSQL Databases. LO1
Tentative Tutorial Schedules
Week 2: ER Modelling and Relational Model LO1
Week 3: Relational Algebra
Week 4: SQL Revision using Oracle
LO2
Week 5: TEST (SQL Assignment due)
Week 6: SQL Query Optimisation I
Week 7: SQL Query Optimisation II
Week 8: Lab Test LO3 LO4
Week 9: PL/SQL I
Week 10: PL/SQL II LO2
Week 11: Lab Test
Week 12: Database Administration 1
Week 13: Lab Test
LO1
Online Modules
Data Modelling
SQL Review (Oracle)
Advanced SQL (PL/SQL, Triggers, Cursors)
Query Optimizations
Database Administration
Assessment - Overview
In-Semester Assessment 60%
Final Exam 40%
To pass this unit you need to at least:
Pass all the learning outcomes
50% of the overall mark
Assessment - In-Semester
60% of overall mark
MUST gain at least 45% of the total mark in this part to pass the unit
Assignments - tasks published on MyLO (10%)
Database Design
In-Semester Tests (conducted in the tutorial)
Database Implementation (15%)
Query Design and Optimization (15%)
Database Constraints Implementation (PL/SQL) (15%)
Database Administration (5%)
Assessment - Assignments
For Assignment 1, you are allowed to work in a group of up to two.
You have to find your group member yourself
Team members should be from the same tutorial
You may discuss the assignment specification with other students, and you
may ask for help with learning the material covered in the unit, but you must
not submit work which has been done by another person
If you give your work to another student and that student submits
that work, then s/he and you are both guilty of Academic Misconduct
Plagiarism…..
http://www.utas.edu.au/student-learning/for-students
Using words, ideas, computer code, or any work by someone else without giving proper credit is
academic dishonesty.
Academic dishonesty is often referred to as plagiarism.
While studying at University you are expected to submit work that is your own.
The intentional copying of someone else’s work as one’s own is a serious offence punishable by
penalties that may range from a fine or deduction/cancellation of marks and, in the most serious of
cases, to exclusion from a unit, a course or the University.
Assessment – Assignments
- Late Penalties
• From the CIS Late Assessment Policy:
<paste>
– Up to 24 hours after the due date. The assignment will be marked in the usual way and
the mark recorded will be 80% of the actual mark obtained.
– More than 24 hours and up to 7 days after the due date. The assignment will be marked
in the usual way and the mark recorded will be 50% of the actual mark obtained
– Later than 7 days after the due date – the assignment will not be marked.
</paste>
Resources
http://www.utas.edu.au/engineering-ict/current-student-resources
Assessment – Assignments
- Extensions
If you want / need an extension for an assignment, you must
complete the Extension Form available on the CIS Resources
webpage http://www.utas.edu.au/engineering-ict/current-student-resources, and provide
suitable supporting documentation
If possible, apply for an extension before the assignment is due
Assessment – Final Exam
40% of overall mark
At the end of semester during University Examination Period
Will cover whole semester's work
Resources – MyLO
• KIT712 Data Management Technology Unit Home Page
– Content
– Lecture Slides
– Online Modules
– Tutorial Sheets
– Assessment
– Information
• Unit Outline
• Academic Integrity
– Announcements
Resources - General
Virtual Machines on Each Desktop
School of Computing Resources
Help Desk
Consultation Times
University Resources
Services & Support
http://www.students.utas.edu.au/
Resources - Fellow Students
Discuss topics
Work together on tutorial exercises etc
Do not believe everything that other students tell you
Remember that assignment work must be the individual work of the
student who submits it
Do not email your assignment work to other students
Do not edit other students’ assignment work
Tips for Success - Overview
Actively participate in the unit
Do the work in the unit as it falls due
If you get into difficulties, seek help as soon as possible
Tips for Success - Tutorials
Try tutorial problems and study lecture slides
before coming to tutorials
Actively participate in set activities
Follow up afterwards (complete activities if necessary)
Tips for Success – Private Study
– Keep up to date
– Follow up on problems/ questions from lectures
– Complete tutorial activities
– Read assignment specifications as soon as they are issued
– Seek help as soon as possible
What do we expect from you?
Regular attendance of lectures:
Pay full attention, be enthusiastic, fully committed to learn new things, ask questions during the
class, participate in discussions
Study Lecture Slides before coming to tutorials
Start on assignments as soon as they are announced
If you have a problem with the lecturer, the lectures, or the unit, please discuss it with me early.
Don’t take out your frustrations on me during eValuate
Database Systems
Data vs. Information
Data
Raw facts
Raw data - Not yet been processed to reveal the meaning
Building blocks of information
Data management: Generation, storage, and retrieval of data
Information
Produced by processing data
Reveals the meaning of data
Enables knowledge creation
Should be accurate, relevant, and timely to enable good decision making
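As a minimal sketch of the distinction above, the Python snippet below treats a few invented sales records as raw data and derives information (total sales per region) by processing them. The records, the field layout and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Raw data: unprocessed facts (hypothetical sales records: region, amount).
raw_sales = [
    ("Hobart", 120.0),
    ("Launceston", 80.0),
    ("Hobart", 200.0),
    ("Launceston", 50.0),
]


def total_by_region(records):
    """Process raw records into information: total sales per region."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


# Information: reveals meaning that can support a decision
# (for example, which region sells more).
sales_by_region = total_by_region(raw_sales)
best_region = max(sales_by_region, key=sales_by_region.get)
print(sales_by_region)
print("Highest sales:", best_region)
```

The individual records say little on their own. The per-region totals are what a decision maker would actually read.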
What is a Database?
Shared, integrated computer structure that stores a collection
of:
End-user data - Raw facts of interest to end user
Metadata: Data about data, through which the end-user data are integrated
and managed
Describes data characteristics and relationships
Database management system (DBMS)
Manages the database structure
Collection of programs
Controls access to data stored in the database
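To make the split between end-user data and metadata concrete, here is a small sketch using Python's built-in sqlite3 module. SQLite is an assumption made so the example runs anywhere (the unit itself uses Oracle), and the student table and its columns are hypothetical.

```python
import sqlite3

# In-memory database; SQLite stands in for the DBMS in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO student (student_id, name) VALUES (1, 'Alice')")

# End-user data: the raw facts of interest to the end user.
end_user_data = conn.execute("SELECT student_id, name FROM student").fetchall()

# Metadata: data about the data, kept by the DBMS itself.
# sqlite_master lists the objects in the database; PRAGMA table_info
# describes each column (cid, name, type, notnull, dflt_value, pk).
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
column_info = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")
]

print(end_user_data)
print(table_names)
print(column_info)
```

Other DBMSs expose the same kind of information through their own catalog or data dictionary views.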
The DBMS Manages the Interaction between the End
User and the Database
Role of the DBMS
Intermediary between the user and the database
Enables data to be shared
Presents the end user with an integrated view of the data
Receives and translates application requests into operations required
to fulfill the requests
Hides database’s internal complexity from the application programs
and users
Advantages of the DBMS
Better data integration and less data inconsistency
Data inconsistency: Different versions of the same data appear in different places
Increased end-user productivity
Improved:
Data sharing
Data security
Data access
Decision making
Data quality: Promoting accuracy, validity, and timeliness of data
Types of Databases: User Count
Single-user database: Supports one user at a time
Desktop database: Runs on PC
Multiuser database: Supports multiple users at the same time
Workgroup database: Supports a small number of users or a specific department
Enterprise database: Supports many users across many departments
Types of Databases: Location
Centralized database: Data is located at a single site
Distributed database: Data is distributed across different sites
Cloud database: Created and maintained using cloud data services that provide
defined performance measures for the database
Types of Databases: Data Subject
General-purpose database: Contains a wide variety of data used in multiple
disciplines
Discipline-specific database: Contains data focused on specific subject areas
Types of Databases: Support
Operational database: Designed to support a company’s day-to-day
operations
Analytical database: Stores historical data and business metrics used
exclusively for tactical or strategic decision making
Data warehouse: Stores data in a format optimized for decision support
Types of Databases: Types of Data
Unstructured data: Exists in its original (raw) state
Structured data: Results from formatting
Structure is applied based on the type of processing to be performed
Semistructured data: Has been processed to some extent
Extensible Markup Language (XML)
Represents data elements in textual format
Database Life Cycle
Six Phases
Database initial study
Database design
Implementation and loading
Testing and evaluation
Operation
Maintenance and evolution
The Database Initial Study
Overall purpose:
Analyze company situation
Define problems and constraints
Define objectives
Define scope and boundaries
These steps help in getting Business Rules
Interactive and iterative processes required to complete first phase
of DBLC successfully
The Database Initial Study (cont’d.)
Analyze the company situation
General conditions in which company operates, its organizational
structure, and its mission
Discover what company’s operational components are, how they
function, how information flows between them and how they
interact
The Database Initial Study (cont’d.)
Define problems and constraints
Formal and informal information sources
Finding precise answers is important
Accurate problem definition does not always yield a
solution
The Database Initial Study (cont’d.)
Database system objectives must correspond to those envisioned by end users
What is proposed system’s initial objective?
Will system interface with other systems in the company?
Will system share data with other systems or users?
Scope: extent of design according to operational requirements
Boundaries: limits external to system
Database Design
Most critical phase
Necessary to concentrate on data characteristics required to build database model
Makes sure final product meets requirements
Two views of data within system:
Business view
Data as information source
Designer’s view
Data structure, access, and activities required to transform data into information
Database design process
Create conceptual design: analysis of business rules, entity relationship modeling (iterative process with verification)
Create logical design
Determine DBMS: cost, DBMS features and tools, underlying model, portability, DBMS hardware requirements
Physical design
Implementation and Loading
Install DBMS
Creating a Database
Load and convert the Data
Other issues
Performance
Security
Backup and recovery
Integrity
Company standards
Concurrency controls
Testing and Evaluation
Database is tested and fine-tuned for performance, integrity, concurrent
access, and security constraints
Occurs in parallel with applications programming
Database tools used to prototype applications
If implementation fails to meet some of system’s evaluation criteria:
Fine-tune specific system and DBMS configuration parameters
Modify physical or logical design
Upgrade software and/or hardware platform
Testing and Evaluation (cont’d.)
Integrity
Enforced via proper use of primary, foreign key rules
Backup and Recovery
Full backup
Differential backup
Transaction log backup
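The integrity point above (enforcement through primary and foreign key rules) can be sketched with Python's built-in sqlite3 module. SQLite is an assumption made for portability, and the department and employee tables are hypothetical. The DBMS rejects a duplicate primary key and a foreign key that points at no existing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    )
    """
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 10)")  # valid reference


def try_insert(sql):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Duplicate primary key: rejected (entity integrity).
duplicate_ok = try_insert("INSERT INTO employee VALUES (1, 'Bob', 10)")
# Reference to a department that does not exist: rejected
# (referential integrity).
orphan_ok = try_insert("INSERT INTO employee VALUES (2, 'Cy', 99)")
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_ok, orphan_ok, employee_count)
```

Testing a database for integrity largely means trying inserts like these and confirming that the invalid ones fail.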
Operation
Once database has passed evaluation stage, it is considered operational
Beginning of operational phase starts process of system evolution
Problems not foreseen during testing surface
Solutions may include:
Load-balancing software to distribute transactions among multiple computers
Increasing available cache
Maintenance and Evolution
Required periodic maintenance:
Preventive maintenance (backup)
Corrective maintenance (recovery)
Adaptive maintenance
Assignment of access permissions and their maintenance for new and old users
Generation of database access statistics
Periodic security audits
Periodic system-usage summaries
Summary: Creating a Database
Describes WHAT the system contains
Describes HOW the system will be implemented
Describes HOW the system will be implemented using a specific DBMS
Designing Business Rules
Business Rules
Are collected in the initial phase of database life cycle.
Key points for writing business rules:
Discover what company’s operational components are, how they function, how
information flows between them and how they interact
Formal and informal information sources
Describe characteristics of data as viewed by the company
Source: Database Systems , authors: Peter Rob and Carlos Coronel
Examples
A customer can make many payments into the account
Each payment should be a multiple of 100
Working hours of the organization are between 8am and 5pm
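Business rules are written in plain language, but many of them later become database constraints. The sketch below shows how the first two example rules might end up in a schema, using Python's built-in sqlite3 module. SQLite, the table names and the extra "amount is positive" condition are all assumptions. The working-hours rule is left out because the example does not say which data it constrains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE payment (
        payment_id  INTEGER PRIMARY KEY,
        -- Rule 1: a customer can make many payments (1:M via a foreign key).
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        -- Rule 2: each payment is a multiple of 100 (positive: assumption).
        amount      INTEGER NOT NULL CHECK (amount > 0 AND amount % 100 = 0)
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")


def record_payment(customer_id, amount):
    """Insert a payment; return False if a rule is violated."""
    try:
        conn.execute(
            "INSERT INTO payment (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Two valid payments by the same customer, then one that breaks rule 2.
results = [record_payment(1, 100), record_payment(1, 300), record_payment(1, 250)]
payment_count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(results, payment_count)
```

The rule itself should still be stated without keys or constraints. Those belong to the later design phases.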
Why do we need Business Rules?
Standardize company’s view of data
Communications tool between users and designers
Allow designer to understand the nature, role, and scope of data
Allow designer to understand business processes
Allow designer to develop appropriate relationship participation rules and
constraints
Source: Database Systems , authors: Peter Rob and Carlos Coronel
Activity: Write some Business Rules for the MyLO Website
Data Modelling
Conceptual Model
(Entity-Relationship (E-R) Models)
Graphical representation of entities and their relationships in a
database structure
Widely accepted and adapted graphical tool for data modeling
Introduced by Chen in 1976
Many extensions/variations exist
Basis for most other modeling approaches
ER Model - Basic Building Blocks
Entity - anything about which data are to be collected and stored
Attribute - a characteristic of an entity
Relationship - describes an association among entities
One-to-many (1:M) relationship
Many-to-many (M:N or M:M) relationship
One-to-one (1:1) relationship
Participation - a restriction placed on the data
Synonyms you should know…
Entity = class = relation = table
Attribute = column
Instance = row
(Figure: a table with its columns and rows labelled)
Many Conventions
We will use ONLY the CONVENTIONS given in the tutorial handout
Source: Database Systems , authors: Peter Rob and Carlos Coronel
Translating Business Rules into ERD Components
Nouns translate into entities
Verbs translate into relationships among entities
Relationships are bidirectional
Questions to identify the relationship type
How many instances of B are related to one instance of A?
How many instances of A are related to one instance of B?
Examples
How many classes can one student enroll in? Many
How many students can be enrolled in one class? Many
Relationship between Student and Class is: Many to Many, *-*
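When a many-to-many relationship like the Student and Class example is eventually implemented in a relational database, a common approach is a bridge table holding one row per pair. The sketch below uses Python's built-in sqlite3 module. SQLite, the table and column names, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class   (class_id   INTEGER PRIMARY KEY, title TEXT);
    -- Bridge table: one row per (student, class) pair.
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        class_id   INTEGER NOT NULL REFERENCES class (class_id),
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO class VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# How many classes can one student enrol in? Many.
classes_for_ann = [
    row[0]
    for row in conn.execute(
        "SELECT c.title FROM enrolment e JOIN class c ON c.class_id = e.class_id "
        "WHERE e.student_id = 1 ORDER BY c.title"
    )
]
# How many students can be enrolled in one class? Many.
students_in_databases = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM enrolment e JOIN student s ON s.student_id = e.student_id "
        "WHERE e.class_id = 10 ORDER BY s.name"
    )
]
print(classes_for_ann)
print(students_in_databases)
```

The two queries correspond to the two questions used to identify the relationship type, asked once from each direction.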
Naming Conventions
Entity names - Required to:
Be descriptive of the objects in the business environment
Use terminology that is familiar to the users
Attribute name - Required to be descriptive of the data represented by the attribute
Proper naming:
Facilitates communication between parties
Promotes self-documentation
CAR RENTAL ENTERPRISE
(Figure: annotated sketch of CUSTOMER, rental and CAR for EZ-Rent, with relationships c-makes-r, cu-rents-ca and r-is for-c)
customers are important to the enterprise
•we care about customers (people) who rent cars (not other people)
cars are important to the enterprise
•we care about cars that are rented by customers (not other cars)
a rental is for exactly one customer
a given car may be used for no rentals or for many rentals
what should happen to a car that’s never rented?
the characteristics of customer that are important:
•customer name
•customer address
•customer since
•customer telephone number
the characteristics of rental that are important:
•customer who rented
•which car was rented
•pick-up location
•date out
•time out
•mileage out
•return location
•date in
•time in
•mileage in
•miles driven
customer who rented must be known to EZ-Rent
car rented must belong to EZ-Rent
date/time in must be later than date/time out
the characteristics of car:
•car make
•car model
•car model year
•manufacturer
•vehicle identification no. (VIN)
•number of doors
•colour
•date purchased
•licence number
•licence state
•car style
EZ-Rent only rents Ford and GM cars
no car has more than 5 doors
licence state must be one of the states EZ-Rent operates in
car style must be one of:
•sedan
•coupe
•wagon
•minivan
•sport utility
•truck
•convertible
ER Conventions for KIT712
DIAGRAMMING CONVENTIONS
NAMING CONVENTIONS
ER Modeling in KIT712
In this unit, students will be asked to draw simple (conceptual) ER diagrams
to model given scenarios
Students should use the simple version of the Crows Feet conventions explained in
the following lecture slides and tutorials
Students should not use the conventions from the Rob and Coronel book
(or from any other source)
Diagramming Conventions
An Entity is represented by a rectangle
A Relationship is represented by a line joining two entities
Attributes are written in a list next to the entity or relationship to
which they belong
Identifiers are placed at the top of the list of attributes and are
underlined
Diagramming Conventions - Entity
An Entity is represented by a rectangle
The name of an entity should be a noun or a noun
phrase
The name of an entity is written in UPPERCASE
The name of an entity is written inside the rectangle
representing the entity
Diagramming Conventions - Attributes
Attributes are written in a list next to the entity or
the relationship to which they belong
The name of an attribute is usually a noun or a noun
phrase – sometimes the name is an adjective
An attribute is written in Lowercase With Initial
Capitals
Identifiers are placed at the top of the list of
attributes and are Underlined
Example of an Entity
Modelling units offered at the University of Woolloomooloo
Note: This initial model is not necessarily perfect
Diagramming Conventions - Relationship
A Relationship is represented by a line joining two entities
All relationships are binary (or recursive)
- no n-ary (e.g. ternary) relationships
The name of a relationship should be a verb or a verb phrase
The name of a relationship is written in lowercase – preferably also in
italics
The name of a relationship is near the line representing the relationship
Example of a Relationship
Modelling the association between students and the units that they are enrolled in
Note: Because we are thinking about the relationship
at this stage, I have not listed the attributes
for the entities
Another Example of a Relationship
Modelling the association between a hospital ward and the patients admitted
to the ward
Note: The attribute Date-admitted belongs to
the relationship
Diagramming Conventions – Relationship Cardinality
The cardinality of a relationship indicates the number of possible occurrences
of an entity participating in a given relationship
We will add crows feet to relationship lines to indicate cardinality
Diagramming Conventions – Crows Feet for Cardinality
Crows feet indicate that many (zero or more) instances of
the entity adjacent to the crows feet may be associated with
each instance of the entity at the other end of the relationship
line
An absence of crows feet indicates that zero or one instances of
the entity adjacent to the absence of crows feet may be
associated with each instance of the entity at the other end of
the relationship line
Example of Relationship Cardinality – one-to-many
Each patient is receiving, at most, one type of treatment
Each type of treatment may be received by many patients
Note: This initial model is not necessarily perfect
Example of Relationship Cardinality – many-to-many
Each patient is receiving many types of treatments
Each type of treatment is received by
many patients
Diagramming Conventions – Relationship Participation
Participation indicates whether all, or only some, entity
occurrences participate in a relationship
We use the | symbol on relationship lines to indicate
mandatory participation
We use the O symbol on relationship lines to indicate
optional participation
Another Example of Relationship Participation
If we wish to indicate that it is mandatory for a student to be enrolled in at least one
unit, we add a stroke ( | ) at the end of the relationship line near the entity UNIT
Example of Relationship Participation
If we wish to indicate that it is not necessary for a standardised treatment to be
received by any patients, we add an O (for Optional) at the end of the line near the
entity PATIENT
Updated Example of Relationship Participation
If we also wish to indicate that it is not necessary
for a patient to be receiving any standardised
treatments, we add an O (for Optional) at the end
of the line near the entity for the standardised treatment (the end opposite PATIENT)
When to Indicate Participation
We will add O or | to our relationship lines only if the
scenario specifically indicates that participation is optional
or mandatory
Some people choose to mark every end of every
relationship line with either | or O
Some people choose to use only | (mandatory) symbols
ER Diagram – Example 1
Finnegan’s Falderals Factory - Projects
Notes on ER Diagram – Example 1
ACTIVITY is a subordinate entity
(also called a weak entity)
Each instance of ACTIVITY must be associated with an instance of PROJECT
Note the mandatory symbol (|) near PROJECT
The identifier of ACTIVITY is derived from the identifier of PROJECT
The identifier of ACTIVITY is {Project-id, Activity-no}
The identifier of PROJECT is {Project-id}
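The notes above can be sketched as a relational schema in which the identifier of ACTIVITY is the composite {Project-id, Activity-no}. The example uses Python's built-in sqlite3 module. SQLite, the lowercase column names and the sample rows are assumptions, and the diagram's other attributes are not shown in the notes, so they are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE project (
        project_id INTEGER PRIMARY KEY
    );
    -- Weak entity: its identifier borrows project_id from PROJECT,
    -- and every activity must belong to an existing project.
    CREATE TABLE activity (
        project_id  INTEGER NOT NULL REFERENCES project (project_id),
        activity_no INTEGER NOT NULL,
        PRIMARY KEY (project_id, activity_no)
    );
    INSERT INTO project VALUES (1), (2);
    -- Activity numbers may repeat across projects, but not within one.
    INSERT INTO activity VALUES (1, 1), (1, 2), (2, 1);
    """
)


def try_insert(project_id, activity_no):
    """Insert an activity; return False if a key rule is violated."""
    try:
        conn.execute(
            "INSERT INTO activity (project_id, activity_no) VALUES (?, ?)",
            (project_id, activity_no),
        )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_ok = try_insert(99, 1)    # no such project: rejected
duplicate_ok = try_insert(1, 1)  # identifier already used: rejected
new_ok = try_insert(2, 2)        # new activity in project 2: accepted
print(orphan_ok, duplicate_ok, new_ok)
```

An activity number on its own does not identify an activity. Only the pair does.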
ER Diagram – Example 2
Fred Friendly’s Factory - Projects
Notes on ER Diagram – Example 2
"must precede" is a recursive relationship
(also called a unary relationship)
Each instance of TASK may be associated with many instances of TASK
"is part of" is a binary relationship
The two entities in the relationship are TASK and PROJECT
Each instance of PROJECT may be associated with many instances of TASK
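One possible relational sketch of these two relationships is shown below, using Python's built-in sqlite3 module. It assumes that the recursive relationship is many-to-many (a task may precede many tasks and follow many tasks), so it gets its own table in which both columns reference TASK. SQLite, the names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE project (project_id INTEGER PRIMARY KEY);
    CREATE TABLE task (
        task_id    INTEGER PRIMARY KEY,
        name       TEXT,
        -- is part of: binary relationship between TASK and PROJECT.
        project_id INTEGER REFERENCES project (project_id)
    );
    -- must precede: recursive relationship, both ends reference TASK.
    CREATE TABLE task_precedence (
        before_task_id INTEGER NOT NULL REFERENCES task (task_id),
        after_task_id  INTEGER NOT NULL REFERENCES task (task_id),
        PRIMARY KEY (before_task_id, after_task_id),
        CHECK (before_task_id <> after_task_id)
    );
    INSERT INTO project VALUES (1);
    INSERT INTO task VALUES (1, 'Design', 1), (2, 'Build', 1), (3, 'Test', 1);
    INSERT INTO task_precedence VALUES (1, 2), (1, 3), (2, 3);
    """
)

# Which tasks must precede the task named Test (task_id 3)?
must_precede_test = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.name
        FROM task_precedence p
        JOIN task t ON t.task_id = p.before_task_id
        WHERE p.after_task_id = 3
        ORDER BY t.task_id
        """
    )
]
print(must_precede_test)
```

The CHECK clause is an extra assumption that stops a task from preceding itself.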
Our ER Naming Conventions
for (Conceptual) ER Diagrams
The name of an entity is written in UPPERCASE (also known as
ALL CAPITALS)
The name of a relationship is written in lowercase – preferably also
in italics
The name of an attribute is written in Lowercase With Initial Capitals
Note that each type of name is written differently
Creating an ER Diagram
Use the ER Conventions when drawing ER diagrams from scenarios
Give each ER Diagram a title
eg
Finnegan’s Falderals Factory - Projects
University of Woolloomooloo
Unless explicitly told otherwise,
include attributes on the diagram
ER Conventions for KIT712 - Handout
Contains more information and more examples
Available on MyLO
Example Scenario
Canterbury Cat Club has decided to create a database to store information about
the cats that belong to its members.
Each cat is allocated a unique identifier. The Club also stores the following data
about each cat: name, sex, age, and spayed status.
Each cat may have one or more owners, and each owner may own one or more
cats. One owner is identified as the primary contact for each cat.
Each cat owner must be a member of the Club, therefore each owner has a unique
Member Number. The Club also stores the following data about each owner:
first name, surname, phone number, and address.
Each owner has only one address, but some owners share the same address.
The Club stores the following data about each address: street number, street name,
suburb, state, and postcode.
ER Diagram of Given Scenario
(Conceptual Model)
Canterbury Cat Club
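For readers who want to see where a conceptual model like this can lead, the sketch below is one possible relational translation of the scenario, written with Python's built-in sqlite3 module. It is not the model answer. SQLite, the surrogate address key, the ownership table with its primary-contact flag, and all sample rows are assumptions. The rule that each cat has at least one owner is not enforced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE address (
        address_id    INTEGER PRIMARY KEY,  -- surrogate key (assumption)
        street_number TEXT, street_name TEXT,
        suburb TEXT, state TEXT, postcode TEXT
    );
    CREATE TABLE owner (
        member_number INTEGER PRIMARY KEY,
        first_name TEXT, surname TEXT, phone_number TEXT,
        -- Each owner has only one address, but owners may share one.
        address_id INTEGER NOT NULL REFERENCES address (address_id)
    );
    CREATE TABLE cat (
        cat_id INTEGER PRIMARY KEY,
        name TEXT, sex TEXT, age INTEGER, spayed INTEGER
    );
    -- Many-to-many: a cat may have several owners and vice versa.
    CREATE TABLE ownership (
        cat_id             INTEGER NOT NULL REFERENCES cat (cat_id),
        member_number      INTEGER NOT NULL REFERENCES owner (member_number),
        is_primary_contact INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (cat_id, member_number)
    );
    -- At most one primary contact per cat.
    CREATE UNIQUE INDEX one_primary_contact_per_cat
        ON ownership (cat_id) WHERE is_primary_contact = 1;

    INSERT INTO address VALUES (1, '12', 'High St', 'Sandy Bay', 'TAS', '7005');
    INSERT INTO owner VALUES (100, 'Ann', 'Lee', '555-0100', 1);
    INSERT INTO owner VALUES (101, 'Bob', 'Lee', '555-0101', 1);
    INSERT INTO cat VALUES (1, 'Tiger', 'F', 3, 1);
    INSERT INTO ownership VALUES (1, 100, 1), (1, 101, 0);
    """
)

owners_of_tiger = [
    row[0]
    for row in conn.execute(
        "SELECT o.first_name FROM ownership x "
        "JOIN owner o ON o.member_number = x.member_number "
        "WHERE x.cat_id = 1 ORDER BY o.member_number"
    )
]

# Marking a second owner as primary contact for the same cat is rejected.
try:
    conn.execute(
        "UPDATE ownership SET is_primary_contact = 1 WHERE member_number = 101"
    )
    second_primary_ok = True
except sqlite3.IntegrityError:
    second_primary_ok = False
print(owners_of_tiger, second_primary_ok)
```

Two owners share one address row here, which is the reason address is kept as its own table rather than repeated per owner.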
Bad Business Rules
Discussing implementation details such as foreign keys and
primary keys
Discussing what is an entity or a relationship
Not giving details as required
Assignment 1 (more information announced this week)
Consider a scenario [details will be in assignment description] where you have to design a
database for an organisation
Part 1
Business Rules
Improve based on comments given by Lecturer on your submission
Part 2
ER Modelling based on updated business rules
Relational Model
Submission via MyLO
If done in a group of two, both names/IDs should be provided during submission