0% found this document useful (0 votes)
92 views31 pages

Is

This document provides information about the CS 317 course including the instructor, prerequisites, grading, communication methods, and course topics such as databases, database management systems, data modeling, queries, and concurrency control.

Uploaded by

dhirendra
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
92 views31 pages

Is

This document provides information about the CS 317 course including the instructor, prerequisites, grading, communication methods, and course topics such as databases, database management systems, data modeling, queries, and concurrency control.

Uploaded by

dhirendra
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 31

CS 317 - Data Management

and Information Processing


Logistics
• Instructor
Yan Chen ([email protected]),
Office Hours: Th. 2-4pm or by appointment,
Rm 330, 1890 Maple Ave.
• TA
Yi Qiao ([email protected])
Office Hours: Fri. 2-4pm, Rm 246, 1890 Maple
Ave.
Prerequisites
• Required: CS110, CS 111 or programming
experience

Course Materials
• Required: A First Course in Database
Systems,  (2nd Edition), Jeffrey Ullman and
Jennifer Widom, Prentice Hall, 2002.
• Recommended: Database Management
Systems, Third Edition, Raghu Ramakrishnan
and Johannes Gehrke, McGraw-Hill, 2002.
Grading
• Homework (4-5 sets) 20%
• Projects 30%
– Use Microsoft Access to design a database in two
projects. 
– The first project is on the entity-relational (ER)
model,
– The second project is on relational algebra (RA)
and relational calculus (RC).
• Final 25%
– Exams in-class, closed-book, non-cumulative
• Late policy: 10% each day after the due date
• No cheating
Communication
• Web page:
http://www.cs.nwu.edu/~ychen/classes/cs317/
• Recitation: Tu, Th or Fri? 5-6pm, Room 381,
1890 Maple.
– TA lectures on the homework and projects, and help
to prepare the exams.
• Newsgroup are available
– cs.317 (course announcement, and posting Q
& A)
• Send emails to instructor and TA for questions
inappropriate in newsgroup
• Course outline (see it online)
What Is a Database System?

• Database:
a very large, integrated collection of
data.
• Models a real-world enterprise
– Entities (e.g., teams, games)
– Relationships
(e.g., The Forty-Niners are playing in The Superbowl)
– More recently, also includes active components ,
often called “business logic”. (e.g., the BCS ranking
system)

• A Database Management System (DBMS) is a


software system designed to store, manage, and
facilitate access to databases.
Database Systems: Then
Database Systems: Today

From Friendster.com on-line tour


Other Ways Databases Make Life
Better?
• “Players could finally
sign up for the Star
Wars Galaxies
game
last week as Sony
opened up registration
to the public.”

• “Once players got in to


the game they found
that the game
servers
were offline because of database
problems.”

• “Some players spent hours tuning their


Other databases you may use
= Is the WWW a DBMS?
• Fairly sophisticated search available
– crawler indexes pages on the web
– Keyword-based search for pages
• But, currently
– data is mostly unstructured and untyped
– search only:
• can’t modify the data
• can’t get summaries, complex combinations of data
– few guarantees provided for freshness of data,
consistency across data items, fault tolerance, …
– Web sites typically have a DBMS in the background to
provide these functions.
• The picture is changing
– New standards e.g., XML, Semantic Web can help data
modeling
– Research groups (e.g., at Berkeley) are working on
providing some of this functionality across multiple web
sites.
“Search” vs. Query

• What if you
wanted to find
out which actors
donated to John
Kerry’s
presidential
campaign?

• Try “actors
donated to john
kerry” in your
favorite search
engine.
A “Database Query” Approach
Is a File System a
= DBMS?
• Thought Experiment 1:
– You and your project partner are editing the
same file.
– You both save it at the same time.
– Whose changes survive?
A) Yours B) Partner’s C) Both D) Neither E) ???
•Thought Experiment 2: Q: How do you write
programs over a
–You’re updating a file.
subsystem when it
–The power goes out. promises you only “???” ?
–Which of your changes survive?
A: Very, very carefully!!
A) All B) None C) All Since Last Save D) ???
Current Commercial Outlook
• A major part of the software industry:
– Oracle, IBM, Microsoft, Sybase
– also Informix (now IBM), Teradata
– smaller players: java-based dbms, devices, OO, …
• Well-known benchmarks (esp. TPC)
• Lots of related industries
– data warehouse, document management, storage,
backup, reporting, business intelligence, app
integration
• Relational products dominant and
evolving
– adapting for extensibility (user-defined types),
adding native XML support.
• Open Source coming on strong
– MySQL, PostgreSQL, BerkeleyDB
Why Study Databases?? ?

• Shift from computation to information


– always true for corporate computing
– Web made this point for personal computing
– more and more true for scientific computing
• Need for DBMS has exploded in the last years
– Corporate: retail swipe/clickstreams, “customer
relationship mgmt”, “supply chain mgmt”, “data
warehouses”, etc.
– Scientific: digital libraries, Human Genome project,
NASA Mission to Planet Earth, physical sensors, grid
physics network
• DBMS encompasses much of CS in a practical
discipline
– OS, languages, theory, AI, multimedia, logic
– Yet traditional focus on real-world apps
What’s the intellectual content?
• representing information
– data modeling
• languages and systems for querying data
– complex queries with real semantics*
– over massive data sets
• concurrency control for data
manipulation
– controlling concurrent access
– ensuring transactional semantics
• reliable data storage
– maintain data semantics even if you
pull the plug
Describing Data: Data Models
• A data model is a collection of concepts
for describing data.

• A schema is a description of a particular


collection of data, using a given data
model.

• The relational model of data is the most


widely used model today.
– Main concept: relation, basically a table
with rows and columns.
– Every relation has a schema, which
describes the columns, or fields.
Levels of Abstraction
Users
• Views describe how
users see the data.

• Conceptual schema View 1 View 2 View 3


defines logical
structure Conceptual Schema

Physical Schema

• Physical schema
describes the files and DB
indexes used.

• (sometimes called the


Example: University Database
View 1 View 2 View 3
• Conceptual schema:
– Students(sid: string, name: string,
Conceptual Schema
login: string, age: integer, gpa:real)
– Courses(cid: string, cname:string, Physical Schema
credits:integer)
– Enrolled(sid:string, cid:string,
grade:string) DB
• External Schema (View):
– Course_info(cid:string,enrollment:integer)
• Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.
Data Independence
• Applications insulated
from how data is View 1 View 2 View 3
structured and stored.
• Logical data
independence: Protection Conceptual Schema
from changes in logical
structure of data. Physical Schema

• Physical data
independence: DB
Protection from changes
in physical structure of
data.
Queries, Query Plans, and Operators
Count
Having
distinct
SELECT eid,
SELECT E.loc,
ename,
AVG(E.sal)
title
π
π
COUNT DISTINCT (E.eid)
π
FROM Emp E
FROM Emp E,E.loc
Proj P, Asgn A
WHERE
GROUP BY
E.sal > $50K
WHERE E.eid = A.eid
Group(agg)
HAVING Count(*) > 5
AND P.pid = A.pid Join
Select
AND E.loc <> P.loc
Join
π Proj
Emp
Emp Emp
Asgn
• System handles query
plan generation &
optimization; ensures Employees
correct execution. Projects
Assignments

• Issues: view reconciliation, operator ordering, physical


operator choice, memory management, access path (index)
use, …
Concurrency Control
• Concurrent execution of user programs: key to
good DBMS performance.
– Disk accesses frequent, pretty slow
– Keep the CPU working on several programs
concurrently.
• Interleaving actions of different programs:
trouble!
– e.g., account-transfer & print statement at same
time
• DBMS ensures such problems don’t arise.
– Users/programmers can pretend they are using a
single-user system. (called “Isolation”)
– Thank goodness! Don’t have to program “very, very
carefully”.
Transactions: ACID Properties
• Key concept is a transaction: a sequence of database
actions (reads/writes).

• DBMS ensures atomicity (all-or-nothing property)


even if system crashes in the middle of a Xact.
• Each transaction, executed completely, must take the
DB between consistent states or must not run at all.
• DBMS ensures that concurrent transactions appear to
run in isolation.
• DBMS ensures durability of committed Xacts even if
system crashes.

• Note: can specify simple integrity constraints on the


data. The DBMS enforces these.
– Beyond this, the DBMS does not understand the
semantics of the data.
– Ensuring that a single transaction (run alone) preserves
consistency is largely the user’s responsibility!
Ensuring Transaction Properites
• DBMS ensures atomicity (all-or-nothing
property) even if system crashes in the middle
of a Xact.
• DBMS ensures durability of committed Xacts
even if system crashes.
• Idea: Keep a log (history) of all actions carried
out by the DBMS while executing a set of
Xacts:
– Before a change is made to the database, the
corresponding log entry is forced to a safe
location.
– After a crash, the effects of partially executed
transactions are undone using the log. Effects of
committed transactions are redone using the log.
The Log
• The following actions are recorded in the log:
– Ti writes an object: the old value and the new value.
• Log record must go to disk before the changed page!
– Ti commits/aborts: a log record indicating this
action.
• Log is often duplexed and archived on “stable”
storage.
• All log related activities (and in fact, all
concurrency control related activities such as
lock/unlock, dealing with deadlocks etc.) are
handled transparently by the DBMS.
These layers
Structure of a DBMS must consider
concurrency
control and
• A typical DBMS has a recovery
layered architecture. Query Optimization
• The figure does not and Execution
show the concurrency
control and recovery Relational Operators
components. Files and Access Methods
• Each database
system has its own Buffer Management
variations. Disk Space Management

DB
Advantages of a DBMS

• Data independence
• Efficient data access
• Data integrity & security
• Data administration
• Concurrent access, crash recovery
• Reduced application development time
• So why not use them always?
– Expensive/complicated to set up & maintain
– This cost & complexity must be offset by need
– General-purpose, not suited for special-purpose tasks (e.g.
text search!)
Databases make these folks
happy ...
• DBMS vendors, programmers
– Oracle, IBM, MS, Sybase, …
• End users in many fields
– Business, education, science, …
• DB application programmers
– Build enterprise applications on top of DBMSs
– Build web services that run off DBMSs
• Database administrators (DBAs)
– Design logical/physical schemas
– Handle security and authorization
– Data availability, crash recovery
– Database tuning as needs evolve

…must understand how a DBMS work


Summary (part 1)
• DBMS used to maintain, query large
datasets.
– can manipulate data and exploit semantics
• Other benefits include:
– recovery from system crashes,
– concurrent access,
– quick application development,
– data integrity and security.
• Levels of abstraction provide data
independence
– Key when dapp/dt << dplatform/dt
Summary, cont.

• DBAs, DB developers the


bedrock of the information
economy

• DBMS R&D represents a broad,


fundamental branch of the science
of computation

You might also like