Databases-Draft 1
DATABASES
• Let’s examine some basic principles about how data are
stored in computer systems.
– An entity is anything about which the organization wishes to
store data. At your college or university, one entity would be the
student.
STUDENTS
Student ID   Last Name   First Name   Phone Number   Birth Date
FILE VS. DATABASES
– Information about the attributes of an entity (e.g., the
student’s ID number and birth date) is stored in
fields.
– All the fields containing data about one entity (e.g.,
one student) form a record.
– For example, all the fields holding data about the student
Artie Moore form one record.
– A set of all related records forms a file (e.g., the
student file).
– If this university had only three students and five
fields for each student, the entire file would consist of
three records of five fields each.
– A set of interrelated, centrally coordinated files forms
a database.
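The terms entity, field, record, file, and database can be sketched in a few lines of Python. This is only an illustration: the class name, the dictionary layout, and the birth date value are assumptions made for the example, not part of the slides.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:            # the entity: something we store data about
    student_id: str       # each attribute is stored in a field
    last_name: str
    first_name: str
    phone_number: str
    birth_date: str


# One record: all the field values for a single student.
# The birth date is a placeholder invented for this sketch.
artie = Student("123-45-6789", "Moore", "Artie", "555-5555", "2000-01-01")

# A file: the set of all related records.
student_file = [artie]

# A database: a set of interrelated, centrally coordinated files.
database = {"students": student_file, "classes": [], "advisors": []}

field_names = [f.name for f in fields(Student)]
print(field_names)
```

A real DBMS adds central coordination (shared access, integrity checks, security) on top of this bare structure.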
[Figure: a database made up of a Student File, a Class File, and an Advisor File]
• This proliferation of master files created problems:
– Often the same information was stored in multiple master files.
– Made it more difficult to effectively integrate data and
obtain an organization-wide view of the data.
– Also, the same information may not have been consistent
between files.
• If a student changed his phone number, it may have
been updated in one master file but not another.
[Figure: separate master files, each tied to one application program: Master File 1 (Facts A, B, C) with the Enrollment Program; Master File 2 (Facts A, D, F) with the Financial Aid Program; and a third master file (Facts A, B, F) with the Grades Program]
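The inconsistency problem can be shown with a small Python sketch. The two dictionaries stand in for two master files; their names, the helper function, and the new phone number are hypothetical and chosen only for illustration.

```python
# Each program keeps its own master file, so the phone number is stored twice.
enrollment_master = {"123-45-6789": {"name": "Artie Moore", "phone": "555-5555"}}
financial_aid_master = {"123-45-6789": {"name": "Artie Moore", "phone": "555-5555"}}

# The student reports a new number (made up here), but only one program records it.
enrollment_master["123-45-6789"]["phone"] = "555-0000"


def inconsistent_ids(file_a, file_b, field):
    """Return the IDs whose value for `field` differs between two master files."""
    return [
        key
        for key in file_a
        if key in file_b and file_a[key][field] != file_b[key][field]
    ]


conflicts = inconsistent_ids(enrollment_master, financial_aid_master, "phone")
print(conflicts)  # ['123-45-6789']
```

Storing the phone number once, in a shared database, removes the possibility of this mismatch.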
• A database is a set of interrelated, centrally
coordinated files.
[Figure: a database holding Facts A through F, managed by a database management system]
• The database approach treats data as an
organizational resource that should be used by
and managed for the entire organization, not
just a particular department.
• A database management system (DBMS) serves
as the interface between the database and the
various application programs.
[Figure: a database (Facts A through F) accessed through the database management system by the Enrollment, Financial Aid, and Grades programs]
• The combination of the database, the DBMS, and the
application programs that access the database is
referred to as the database system.
• The person responsible for the database is the
database administrator.
• As technology improves, many large companies are
developing very large databases called data
warehouses.
• Hewlett-Packard is replacing 784 databases with
a single, company-wide database.
IMPORTANCE AND ADVANTAGES OF
DATABASE SYSTEMS
• Database technology is everywhere.
– Virtually all mainframe computer sites use
database technology.
– Use of databases on PCs is also growing.
– Cloud storage also relies on databases.
• Database technology provides the
following benefits to organizations:
– Data integration
• Achieved by combining master files into larger
pools of data accessible by many programs.
– Data sharing
– Reporting flexibility
• Reports can be revised easily and
generated as needed.
• The database can easily be browsed to
research problems or obtain detailed
information.
– Minimal data redundancy and inconsistencies
• Because data items are usually stored only once.
– Data independence
• Data items are independent of the programs that
use them.
• Consequently, a data item can be changed
without changing the program and vice versa.
• Makes programming easier and simplifies data
management.
– Central management of data
• Data management is more efficient because the database
administrator is responsible for coordinating, controlling,
and managing data.
– Cross-functional analysis
• Relationships can be explicitly defined and used in the
preparation of management reports.
• EXAMPLE: Relationship between selling costs and
promotional campaigns.
• The importance of good data:
– Bad data leads to:
• Bad decisions
• Embarrassment
• Angry users
The Data Warehousing Institute estimates that
dirty data costs $600 billion per year in
unnecessary postage, marketing costs, and
lost customer credibility.
DATABASE SYSTEMS
[Figure: logical views for User A and User B (an Enrollment by Class chart showing Fr. 5%, Soph. 24%, Jr. 38%, Sr. 33%, and a Scholarship Distribution view), connected through the DBMS and the operating system to the database]
• The operating system translates DBMS requests into
instructions to physically retrieve data from various
disks.
• Schemas
– A schema describes the logical structure of a
database.
– There are three levels of schema.
• Conceptual level
• The organization-wide view of the entire
database—i.e., the big picture.
• Lists all data elements and the relationships
between them.
• External level
• A set of individual user views of portions of
the database, i.e., how each user sees the
portion of the system with which he or she
interacts.
• These individual views are referred to as
subschemas.
• Internal level
• A low-level view of the database.
• It describes how the data are actually
stored and accessed, including:
– Record layouts
– Definitions
– Addresses
– Indexes
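One rough way to picture the three schema levels is with SQL run through Python's built-in sqlite3 module. This is a sketch under assumptions: the table, view, and index names are invented, and equating a view with a subschema and an index with the internal level is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the organization-wide definition of the data.
conn.execute(
    """CREATE TABLE students (
           student_id TEXT PRIMARY KEY,
           last_name  TEXT,
           first_name TEXT,
           phone_no   TEXT)"""
)
conn.execute(
    "INSERT INTO students VALUES ('123-45-6789', 'Moore', 'Artie', '555-5555')"
)

# External level: a subschema exposing only the portion one user works with.
conn.execute(
    "CREATE VIEW student_names AS SELECT last_name, first_name FROM students"
)

# Internal level: a storage detail (an index) that users never see directly.
conn.execute("CREATE INDEX idx_students_last_name ON students (last_name)")

view_rows = conn.execute("SELECT * FROM student_names").fetchall()
print(view_rows)  # [('Moore', 'Artie')]
```

The user of `student_names` sees two columns and needs no knowledge of the index or of the other columns in the table.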
[Figure: subschemas for User A, User B, and User C mapped to a schema containing Classes, Enroll, Student, and Cash Receipt; one subschema lists Smith, Jones, and Arnold with the values A, B, and D]
• The bidirectional arrows represent mappings between the
schemas.
• The DBMS usually maintains the data dictionary.
– It is often one of the first applications of a newly
implemented database system.
– Inputs to the dictionary include:
• Records of new or deleted data elements.
• Changes in names, descriptions, or uses of existing
elements.
– Outputs include:
• Reports that are useful to programmers, database designers,
and IS users in:
– Designing and implementing the system.
– Documenting the system.
– Creating an audit trail.
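As a loose illustration of a dictionary report, SQLite keeps a catalog of its own tables that can be queried. The sketch below uses Python's sqlite3 module; the `dictionary_report` helper and the sample table are hypothetical, and a full data dictionary would hold far more (descriptions, owners, authorized users).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id TEXT PRIMARY KEY, last_name TEXT NOT NULL)"
)


def dictionary_report(connection):
    """List each table with the name and declared type of every data element."""
    report = {}
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
        columns = connection.execute(f"PRAGMA table_info({table})").fetchall()
        report[table] = [(col[1], col[2]) for col in columns]
    return report


report = dictionary_report(conn)
print(report)
```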
• DBMS Languages
– Every DBMS must provide a means of
performing the three basic functions of:
• Creating a database
• Changing a database
• Querying a database
• Creating a database:
– The set of commands used to create the
database is known as data definition
language (DDL). DDL is used to:
• Build the data dictionary
• Initialize or create the database
• Describe the logical views for each individual user
or programmer
• Specify any limitations or constraints on security
imposed on database records or fields
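A minimal DDL sketch, using SQL through Python's sqlite3 module, might look like the following. The table and view names are assumptions. SQLite has no user-level security commands, so the security constraints mentioned above are not shown; only field constraints (PRIMARY KEY, NOT NULL) appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create the database structure and record constraints on its fields.
conn.execute(
    """CREATE TABLE students (
           student_id TEXT PRIMARY KEY,
           last_name  TEXT NOT NULL,
           first_name TEXT NOT NULL,
           phone_no   TEXT)"""
)

# Describe a logical view for one class of user.
conn.execute("CREATE VIEW phone_list AS SELECT last_name, phone_no FROM students")

# List what the DDL statements created (internal sqlite_ objects are skipped).
objects = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
print(objects)  # [('view', 'phone_list'), ('table', 'students')]
```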
• Changing a database
– The set of commands used to change the
database is known as data manipulation
language (DML). DML is used for
maintaining the data including:
• Updating data
• Inserting data
• Deleting portions of the database
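The three DML operations can be sketched with SQL through Python's sqlite3 module. The table layout and the replacement phone number are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id TEXT PRIMARY KEY, last_name TEXT, phone_no TEXT)"
)

# Inserting data
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("333-33-3333", "Simpson", "333-3333"),
        ("111-11-1111", "Sanders", "444-4444"),
    ],
)

# Updating data (the new number is invented for the example)
conn.execute(
    "UPDATE students SET phone_no = ? WHERE student_id = ?",
    ("444-0000", "111-11-1111"),
)

# Deleting a portion of the database
conn.execute("DELETE FROM students WHERE student_id = ?", ("333-33-3333",))

remaining = conn.execute("SELECT * FROM students").fetchall()
print(remaining)  # [('111-11-1111', 'Sanders', '444-0000')]
```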
• Querying a database:
– The set of commands used to query the database is
known as data query language (DQL). DQL is used
to interrogate the database, including:
• Retrieving records
• Sorting records
• Ordering records
• Presenting subsets of the database
– The DQL usually contains easy-to-use, powerful
commands that enable users to satisfy their own
information needs.
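A query that retrieves, sorts, and presents a subset of records could look like the sketch below, again using SQL through Python's sqlite3 module. The column names are assumptions; the rows mirror the course data used elsewhere in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses "
    "(course_id INTEGER PRIMARY KEY, course TEXT, section INTEGER, days TEXT)"
)
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?, ?)",
    [
        (1234, "ACCT-3603", 1, "MWF"),
        (1235, "ACCT-3603", 2, "TR"),
        (1236, "MGMT-2103", 1, "MW"),
    ],
)

# Retrieve a subset of the records and sort them.
acct_sections = conn.execute(
    "SELECT course_id, section FROM courses WHERE course = ? ORDER BY section DESC",
    ("ACCT-3603",),
).fetchall()
print(acct_sections)  # [(1235, 2), (1234, 1)]
```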
• Report Writer
– Many DBMS packages also include a report writer, a
language that simplifies the creation of reports.
– Users typically specify:
• What elements they want printed
• How the report should be formatted
– The report writer then:
• Searches the database
• Extracts specified data
• Prints the data in the specified format
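The idea of a report writer can be imitated with a small Python function. This is a toy sketch: `write_report`, its parameters, and the sample records are all hypothetical, and commercial report writers offer far more (grouping, totals, page layout).

```python
def write_report(records, columns, widths):
    """Extract the requested columns and lay them out in fixed-width text."""
    header = "".join(name.ljust(width) for name, width in zip(columns, widths))
    lines = [header.rstrip()]
    for record in records:
        row = "".join(
            str(record[name]).ljust(width) for name, width in zip(columns, widths)
        )
        lines.append(row.rstrip())
    return "\n".join(lines)


students = [
    {"student_id": "333-33-3333", "last_name": "Simpson", "first_name": "Alice"},
    {"student_id": "111-11-1111", "last_name": "Sanders", "first_name": "Ned"},
]

# The user specifies which elements to print and how to format them.
report = write_report(students, columns=["last_name", "first_name"], widths=[12, 10])
print(report)
```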
RELATIONAL DATABASES
• Each table is a relation.
STUDENTS
Student ID    Last Name   First Name   Phone No.
333-33-3333   Simpson     Alice        333-3333
111-11-1111   Sanders     Ned          444-4444
123-45-6789   Moore       Artie        555-5555
COURSES
Course ID   Course      Section   Day   Time
1234        ACCT-3603   1         MWF   8:30
1235        ACCT-3603   2         TR    9:30
1236        MGMT-2103   1         MW    8:30
STUDENT x COURSE
SCID             Student ID    Course
333333333-1234   333-33-3333   1234
333333333-1236   333-33-3333   1236
111111111-1235   111-11-1111   1235
111111111-1236   111-11-1111   1236
• Each row is called a tuple.
• Each row contains data about a specific occurrence of
the type of entity in the table.
• Each column in a table contains information about a
specific attribute of the entity.
• A primary key is the attribute or combination of
attributes that uniquely identifies a specific row in a
table.
• In some tables, two or more attributes may be joined to
form the primary key.
STUDENTS
Student ID    Last Name   First Name   Phone No.   Advisor No.
333-33-3333   Simpson     Alice        333-3333    1418
111-11-1111   Sanders     Ned          444-4444    1418
123-45-6789   Moore       Artie        555-5555    1503
ADVISORS
Advisor No.   Last Name   First Name   Office No.
1418          Howard      Glen         420
1419          Melton      Amy          316
1503          Zhang       Xi           202
1506          Radowski    J.D.         203
Student ID   Last Name   First Name   Phone No.   Course No.   Section   Day   Time
333-33-3333 Simpson Alice 333-3333 ACCT-3603 1 M 9:00 AM
333-33-3333 Simpson Alice 333-3333 FIN-3213 3 Th 11:00 AM
333-33-3333 Simpson Alice 333-3333 MGMT-3021 11 Th 12:00 PM
111-11-1111 Sanders Ned 444-4444 ACCT-3433 2 T 10:00 AM
111-11-1111 Sanders Ned 444-4444 MGMT-3021 5 W 8:00 AM
111-11-1111 Sanders Ned 444-4444 ANSI-1422 7 F 9:00 AM
123-45-6789 Moore Artie 555-5555 ACCT-3433 2 T 10:00 AM
123-45-6789 Moore Artie 555-5555 FIN-3213 3 Th 11:00 AM
• If Ned withdraws from all his classes and you eliminate all three of
his rows from the table, then you will no longer have a record of
Ned. If Ned is planning to take classes next semester, then you
probably didn’t really want to delete all records of him.
• This problem is referred to as a delete anomaly.
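The delete anomaly can be reproduced with a short Python sketch. The list of dictionaries stands in for the single flat table; the variable names are illustrative and only a few columns are kept.

```python
# One flat table: student facts and enrollment facts share each row.
enrollments = [
    {"student_id": "333-33-3333", "first_name": "Alice", "course_no": "ACCT-3603"},
    {"student_id": "111-11-1111", "first_name": "Ned", "course_no": "ACCT-3433"},
    {"student_id": "111-11-1111", "first_name": "Ned", "course_no": "MGMT-3021"},
    {"student_id": "111-11-1111", "first_name": "Ned", "course_no": "ANSI-1422"},
]

# Ned withdraws from all his classes, so his three rows are removed.
enrollments = [row for row in enrollments if row["student_id"] != "111-11-1111"]

# Every fact about Ned lived in those rows, so he has vanished entirely.
known_students = {row["first_name"] for row in enrollments}
print("Ned" in known_students)  # False
```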
[Table: Student ID, Last Name, First Name, Phone No., Class 1, Class 2, Class 3, Class 4]
• The solution to the preceding problems is to use a set
of tables in a relational database.
• Each entity is stored in a separate table, and separate
tables or foreign keys can be used to link the entities
together.
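Linking separate tables through keys can be sketched with SQL via Python's sqlite3 module. Some assumptions apply: the column names are invented, and the linking table uses a two-column composite primary key rather than the concatenated SCID value shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE students (student_id TEXT PRIMARY KEY, last_name TEXT);
    CREATE TABLE courses  (course_id INTEGER PRIMARY KEY, course TEXT);
    CREATE TABLE student_course (
        student_id TEXT    REFERENCES students (student_id),
        course_id  INTEGER REFERENCES courses (course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES ('333-33-3333', 'Simpson'), ('111-11-1111', 'Sanders');
    INSERT INTO courses  VALUES (1234, 'ACCT-3603'), (1236, 'MGMT-2103');
    INSERT INTO student_course VALUES ('333-33-3333', 1234), ('333-33-3333', 1236);
    """
)

# Follow the foreign keys to reassemble each student's schedule.
schedule = conn.execute(
    """SELECT s.last_name, c.course
       FROM student_course AS sc
       JOIN students AS s ON s.student_id = sc.student_id
       JOIN courses  AS c ON c.course_id  = sc.course_id
       ORDER BY c.course"""
).fetchall()
print(schedule)  # [('Simpson', 'ACCT-3603'), ('Simpson', 'MGMT-2103')]
```

Sanders has no enrollments yet, but still has a row in `students`, which is the point of keeping each entity in its own table.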
• Note that within each table, there are no duplicate
primary keys and no null primary keys.
• Consistent with the entity integrity rule.
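A DBMS can enforce the entity integrity rule automatically. The sketch below uses SQL through Python's sqlite3 module; the table layout and the rejected sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite otherwise permits NULL
# in a non-integer primary key column.
conn.execute(
    "CREATE TABLE students (student_id TEXT PRIMARY KEY NOT NULL, last_name TEXT)"
)
conn.execute("INSERT INTO students VALUES ('123-45-6789', 'Moore')")

rejected = []
for bad_row in [("123-45-6789", "Duplicate"), (None, "Missing")]:
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError:
        # Raised for both the duplicate key and the null key.
        rejected.append(bad_row[1])

print(rejected)  # ['Duplicate', 'Missing']
```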
• An important feature is that data about various things of
interest (entities) are stored in separate tables.
– Makes it easier to add new data to the system.
• You add a new student by adding a row to the student
table.
• You add a new course by adding a row to the course
table.
• Means you can add a student even if he hasn’t signed
up for any courses.
• And you can add a class even if no students are yet
enrolled in it.
– Makes it easy to avoid the insert anomaly.
• Space is also used more efficiently than in the other
schemes. There should be no blank rows or attributes.
• In the STUDENTS table, a new student is added as a new
row, which leaves no blank spaces.
• In the COURSES table, a new course is added as a new
row, which leaves no blank spaces.
• When a particular student enrolls in a particular course,
that information is added to the STUDENT x COURSE table.
• If Ned Sanders drops ACCT-3603, remove that enrollment row
from the STUDENT x COURSE table.
• Ned still exists in the student table.
• Even if Ned was the only student in the class, ACCT-3603
still exists in the course table.
• Normalization
– Starts with the assumption that everything is
initially stored in one large table.
– A set of rules is followed to decompose that
initial table into a set of normalized tables.
– Objective is to produce a set of tables in third-
normal form (3NF) because such tables are
free of update, insert, and delete anomalies.
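The end result of decomposing one large table can be sketched in Python. This is not the formal normal-form procedure, only an illustration of its outcome; the variable names and the reduced set of columns are assumptions.

```python
flat_rows = [
    # (student_id, last_name, course_no, section)
    ("333-33-3333", "Simpson", "ACCT-3603", 1),
    ("333-33-3333", "Simpson", "FIN-3213", 3),
    ("111-11-1111", "Sanders", "ACCT-3433", 2),
    ("123-45-6789", "Moore", "ACCT-3433", 2),
]

students = {}      # student_id -> last_name (one row per student)
courses = set()    # (course_no, section): one row per course section
enrollments = []   # (student_id, course_no, section) links the two

for student_id, last_name, course_no, section in flat_rows:
    students[student_id] = last_name        # each student fact stored once
    courses.add((course_no, section))       # each course fact stored once
    enrollments.append((student_id, course_no, section))

print(len(students), len(courses), len(enrollments))  # 3 3 4
```

Deleting every enrollment for a student now leaves the `students` entry untouched, which is how the decomposed tables avoid the delete anomaly.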