Lecture1 Intro To DBMS

Uploaded by maxohm24
Copyright © All Rights Reserved
DBMS (15013)

Lecture 1
Introduction to Database Management System

These slides are primarily based on the slides for CS186 (EECS, UC Berkeley) by Prof. Michael
Franklin, and Intro. to Databases by J. Widom (https://cs.stanford.edu/people/widom/). They are
used for academic purposes only.
Prof. Anil Kumar Singh
“Knowledge is of two kinds: we know a
subject ourselves, or we know where
we can find information upon it.”

-- Samuel Johnson (1709-1784)


Database Motivation

• Information is the key


• Data
• Data repository: database
• When information is needed → process the data
• Storage / indexing / manipulation / querying / backup

• DBMS deals with Data Management


Database
• a collection of self-describing, integrated data
• represents some aspect of the real world
• a logically coherent collection
• designed, built, and populated with data for a specific purpose (users / applications)

• DBMS:
• A collection of programs that enables users to create and maintain a database
• a general purpose software system that facilitates the process of defining,
constructing, and manipulating databases for various applications
Database Systems Today
So…What Is a Database System?

Database Management System (DBMS)

provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.
Intro to Databases

▪ Massive (terabytes / petabytes / exabytes)
▪ Persistent (data survives beyond failures)
▪ Safe (from hardware, software, power, user failures, …)
▪ Multi-user (concurrency control)
▪ Convenient (physical data independence, high-level declarative query language)
▪ Efficient (thousands of queries/updates per second)
▪ Reliable (99.9999% uptime)
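To make “convenient” concrete: with a declarative query language you state what you want and the DBMS decides how to find it. The minimal sketch below assumes Python’s built-in sqlite3 module as a stand-in DBMS; the Students rows and function names are hypothetical. The procedural version hard-codes a scan over an in-memory list, while the SQL version keeps working unchanged after an index is added, which is physical data independence in miniature.

```python
import sqlite3

# Hypothetical sample rows (sid, name, gpa), for illustration only.
rows = [("s1", "Ana", 3.9), ("s2", "Bo", 2.8), ("s3", "Cy", 3.5)]


def honor_students_procedural(rows, cutoff):
    # Procedural: we spell out HOW to get the answer (scan, test, collect, sort).
    result = []
    for sid, name, gpa in rows:
        if gpa >= cutoff:
            result.append(name)
    return sorted(result)


def honor_students_declarative(conn, cutoff):
    # Declarative: we state WHAT we want; the DBMS chooses the access path.
    cur = conn.execute(
        "SELECT name FROM Students WHERE gpa >= ? ORDER BY name", (cutoff,)
    )
    return [name for (name,) in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", rows)

print(honor_students_procedural(rows, 3.5))   # ['Ana', 'Cy']
print(honor_students_declarative(conn, 3.5))  # ['Ana', 'Cy']

# Physical data independence: an index changes HOW rows are found,
# but the query text above stays exactly the same.
conn.execute("CREATE INDEX idx_students_gpa ON Students(gpa)")
print(honor_students_declarative(conn, 3.5))  # ['Ana', 'Cy']
```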
Is the WWW a DBMS?
• Fairly sophisticated search available
– crawler indexes pages on the web
– Keyword-based search for pages
• But, currently
– data is mostly unstructured and untyped
– search only:
• can’t modify the data
• can’t get summaries, complex combinations of data
– few guarantees provided for freshness of data, consistency
across data items, fault tolerance, …
– Web sites typically have a DBMS in the background to
provide these functions.
• The picture is changing
– New standards e.g., XML, Semantic Web can help data
modeling
– Research groups (e.g., at Berkeley) are working on
providing some of this functionality across multiple web
sites.
I know you think you understand what
you thought I said, but I'm not sure you
realize that what you heard is not what
I meant.

-- Robert McCloskey
“Search” vs. Query
What if you wanted to find out which actors donated to John Kerry’s presidential campaign?

• Try “hollywood kerry donations” in your favorite search engine.
“Search” vs. Query

• “Search” can return only what’s been previously “stored”.
• And it’s subject to the “spin” of whoever did the storing.
Also…
• What if I wanted to find out the average
donation of actors to each candidate?
• What if I wanted to compare actor donations this
campaign to the last one?
• What if I wanted to find out who gave the most
to each candidate?
• What if I wanted to know where the data came
from, and how old it was?
A “Database Query” Approach
“Yahoo Actors” JOIN “FECInfo”
(Courtesy of the Telegraph research group @Berkeley)
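The Telegraph system itself is not shown here. As a hedged illustration of the idea, the sketch below uses two small hypothetical tables standing in for an actor list and a donation record (all names and amounts are made up) and answers one of the questions above, the average actor donation per candidate, with a single JOIN and GROUP BY in SQLite via Python’s sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the two sources; every row is invented.
conn.executescript("""
CREATE TABLE Actors (name TEXT PRIMARY KEY);
CREATE TABLE Donations (donor TEXT, candidate TEXT, amount REAL);
INSERT INTO Actors VALUES ('Actor A'), ('Actor B');
INSERT INTO Donations VALUES
  ('Actor A', 'Candidate X', 1000),
  ('Actor B', 'Candidate X', 2000),
  ('Actor B', 'Candidate Y', 500),
  ('Non-actor C', 'Candidate Y', 9999);
""")

# The JOIN keeps only donations made by actors; GROUP BY summarizes per candidate.
query = """
SELECT d.candidate, COUNT(*) AS n_donations, AVG(d.amount) AS avg_amount
FROM Actors a JOIN Donations d ON a.name = d.donor
GROUP BY d.candidate
ORDER BY d.candidate
"""
for row in conn.execute(query):
    print(row)
# ('Candidate X', 2, 1500.0)
# ('Candidate Y', 1, 500.0)
```

Swapping AVG for MAX, or adding a column for the data source and load date, covers the other questions on the previous slide in the same style; a keyword search has no equivalent.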

Q: Did it Work?
What’s going on here?
• Unstructured Data
– Text-based search is based mostly on statistical models of similarity.
• no real “understanding” of the data
– Google’s big step forward was to exploit some of the structure in web
documents.
– Still, web search places a large burden on people to do the last stage of
filtering and interpretation.
• Structure gives computers the ability to manipulate and maintain the data.

• Traditional (relational) database systems are aimed at structured data.
Other Unstructured Data: Images
• Similarity search by “features”
[Figure: image similarity search; picture from Univ. of Konstanz]
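Similarity search treats each image as a feature vector and returns the nearest vectors. A minimal sketch, assuming a hypothetical 3-value feature per image (for example a coarse color histogram) and Euclidean distance; real systems use richer features and index structures rather than the linear scan used here.

```python
import math

# Hypothetical feature vectors, e.g. a coarse 3-bin color histogram per image.
features = {
    "beach.jpg": (0.1, 0.3, 0.6),
    "forest.jpg": (0.2, 0.7, 0.1),
    "ocean.jpg": (0.0, 0.2, 0.8),
    "sunset.jpg": (0.7, 0.2, 0.1),
}


def most_similar(query, features, k=2):
    """Return the k image names whose feature vectors are closest to `query`."""
    ranked = sorted(features, key=lambda name: math.dist(query, features[name]))
    return ranked[:k]


print(most_similar((0.05, 0.25, 0.75), features))  # ['ocean.jpg', 'beach.jpg']
```

Note that the answer is only “similar”, not “correct”: the result depends entirely on which features were extracted.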
What about structured data?
• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using a given data model.
• The relational model of data is the most widely used model today.
• Main concept: relation, basically a table with rows and columns.
• Every relation has a schema, which describes the columns, or fields.
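A toy Python model of “relation = schema + rows”, for illustration only: the schema plays the role of a type and each row must conform to it. In the pure relational model a relation is a set, so an identical row cannot appear twice (SQL tables, by contrast, may hold duplicates). The class name and row values are hypothetical.

```python
# A schema lists (field name, Python type) pairs, mirroring notation such as
# Students(sid: string, name: string, age: integer, gpa: real).
STUDENTS_SCHEMA = [("sid", str), ("name", str), ("age", int), ("gpa", float)]


class Relation:
    """Toy relation: a schema plus a set of rows that must conform to it."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = set()  # a set, so inserting an identical row adds nothing

    def insert(self, row):
        if len(row) != len(self.schema):
            raise ValueError(f"expected {len(self.schema)} fields, got {len(row)}")
        for (field, field_type), value in zip(self.schema, row):
            if not isinstance(value, field_type):
                raise TypeError(f"{field} must be of type {field_type.__name__}")
        self.rows.add(tuple(row))


students = Relation(STUDENTS_SCHEMA)
students.insert(("s1", "Ana", 20, 3.9))
students.insert(("s1", "Ana", 20, 3.9))  # duplicate row: no effect
print(len(students.rows))  # 1
```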
Example: University Database

• Conceptual schema:
– Students(sid: string, name: string, age: integer, gpa: real)
– Courses(cid: string, cname: string, credits: integer)
– Enrolled(sid: string, cid: string, grade: string)
FOREIGN KEY (sid) REFERENCES Students
FOREIGN KEY (cid) REFERENCES Courses
• External schema (view):
– Course_info(cid: string, enrollment: integer)
CREATE VIEW Course_info AS
SELECT cid, COUNT(*) AS enrollment
FROM Enrolled
GROUP BY cid
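A runnable sketch of this schema, assuming Python’s built-in sqlite3 module as the DBMS (SQLite enforces foreign keys only after PRAGMA foreign_keys = ON). The view counts Enrolled rows per course; grouping Courses by its key cid would instead give a count of 1 for every course. Table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (
    sid TEXT, cid TEXT, grade TEXT,
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Students(sid),
    FOREIGN KEY (cid) REFERENCES Courses(cid)
);
-- External schema: number of enrolled students per course.
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment
    FROM Enrolled
    GROUP BY cid;

INSERT INTO Students VALUES ('s1', 'Ana', 20, 3.9), ('s2', 'Bo', 21, 3.1);
INSERT INTO Courses  VALUES ('DB101', 'Databases', 4), ('OS101', 'Operating Systems', 4);
INSERT INTO Enrolled VALUES ('s1', 'DB101', 'A'), ('s2', 'DB101', 'B'), ('s2', 'OS101', 'A');
""")

print(conn.execute("SELECT * FROM Course_info ORDER BY cid").fetchall())
# [('DB101', 2), ('OS101', 1)]

# The foreign key rejects an enrollment for a student who does not exist.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('s9', 'DB101', 'C')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```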
So, don’t you need both?
• Good old text search
• Database query
Key concepts

▪ Data model (Set of records, XML, graph)

▪ Schema versus data (Types / Variables)

▪ Data definition language (DDL)

▪ Data manipulation or query language (DML)
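A short sketch of the DDL/DML split, assuming SQLite through Python’s sqlite3 module: CREATE TABLE (DDL) defines the schema, while INSERT, UPDATE, and SELECT (DML) work on the data. The schema is itself stored in the system catalog, which is what makes a database self-describing; SQLite exposes it through PRAGMA table_info. The Courses row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the schema (the "types").
conn.execute("CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER)")

# DML: manipulates and queries the data stored under that schema (the "variables").
conn.execute("INSERT INTO Courses VALUES ('DB101', 'Databases', 4)")
conn.execute("UPDATE Courses SET credits = 3 WHERE cid = 'DB101'")
print(conn.execute("SELECT cid, credits FROM Courses").fetchall())  # [('DB101', 3)]

# The schema is stored in the catalog; each table_info row is
# (position, name, declared type, notnull, default, pk).
print([(col[1], col[2]) for col in conn.execute("PRAGMA table_info(Courses)")])
# [('cid', 'TEXT'), ('cname', 'TEXT'), ('credits', 'INTEGER')]
```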


Key people

▪ DBMS implementer

▪ Database designer

▪ Database application developer

▪ Database administrator
Is a File System a DBMS?

• Thought Experiment 1:
– You and your project partner are editing the same file.
– You both save it at the same time.
– Whose changes survive?

A) Yours B) Partner’s C) Both D) Neither E) ???


• Thought Experiment 2:
– You’re updating a file.
– The power goes out.
– Which of your changes survive?

A) All B) None C) All since last save D) ???

Q: How do you write programs over a subsystem when it promises you only “???”?
A: Very, very carefully!!
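One common answer to “very, very carefully” for Thought Experiment 2 is the write-to-a-temporary-file-then-rename pattern, sketched below in Python (the function name atomic_write is hypothetical). Because os.replace is atomic within one file system, a reader or a restart sees either the whole old file or the whole new one. This is a sketch, not a guarantee: full durability on POSIX also requires fsyncing the containing directory, and it does nothing for Thought Experiment 1, where the last rename simply wins.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` so it holds either the old or the new contents.

    Writes to a temporary file in the same directory, forces it to disk,
    then renames it over the target in one step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the bytes to stable storage
        os.replace(tmp_path, path)  # atomic rename on the same file system
    except BaseException:
        os.unlink(tmp_path)  # drop the partial temp file; the old version is untouched
        raise


# Usage: each call either completes or leaves the previous version in place.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
atomic_write(path, "version 1")
atomic_write(path, "version 2")
with open(path) as f:
    print(f.read())  # version 2
```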
OS Support for Data Management

• Data can be stored in RAM
– this is what every programming language offers!
– RAM is fast, and random access
– Isn’t this heaven?
• Every OS includes a File System
– manages files on a magnetic disk
– allows open, read, seek, close on a file
– allows protections to be set on a file
– drawbacks relative to RAM?
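The file-system interface is just bytes plus open/read/seek/close. A sketch of what building record access on top of it looks like, assuming a hypothetical fixed-size record layout (8-byte id, 4-byte integer balance): fetching record i is one seek, but finding a record by value, sharing the file safely, or surviving a crash is left entirely to the programmer.

```python
import os
import struct
import tempfile

# Hypothetical record layout: 8-byte id (null-padded) + 4-byte little-endian int.
RECORD = struct.Struct("<8si")


def write_records(path, records):
    with open(path, "wb") as f:
        for rid, balance in records:
            f.write(RECORD.pack(rid.encode(), balance))


def read_record(path, i):
    """Fetch record i by seeking straight to its byte offset."""
    with open(path, "rb") as f:
        f.seek(i * RECORD.size)
        rid, balance = RECORD.unpack(f.read(RECORD.size))
        return rid.rstrip(b"\0").decode(), balance


path = os.path.join(tempfile.mkdtemp(), "accounts.dat")
write_records(path, [("A-1", 100), ("A-2", 250), ("A-3", 75)])
print(read_record(path, 1))  # ('A-2', 250)
```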
Database Management Systems
• What more could we want than a file system?
– Simple, efficient ad hoc¹ queries
– concurrency control
– recovery
– benefits of good data modeling

• S.M.O.P.²? Not really…
– as we’ll see this semester
– in fact, the OS often gets in the way!

¹ ad hoc: formed or used for specific or immediate problems or needs
² S.M.O.P.: Small Matter Of Programming
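Concurrency control and recovery are built around transactions. The sketch below, assuming SQLite via Python’s sqlite3 module and a hypothetical Accounts table, shows atomicity: the connection used as a context manager commits if the block finishes and rolls back if an exception escapes, so a failure between the debit and the credit leaves no half-done transfer. The raised exception only simulates a crash; real crash recovery relies on the DBMS’s own journal or log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


transfer(conn, "A", "B", 30)  # succeeds: A=70, B=80
try:
    transfer(conn, "A", "B", 30, fail_midway=True)  # debit is rolled back
except RuntimeError:
    pass

print(conn.execute("SELECT id, balance FROM Accounts ORDER BY id").fetchall())
# [('A', 70), ('B', 80)]
```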
Drawbacks of using file systems for data storage
• Data redundancy and inconsistency
– Multiple file formats, duplication of information in different files
• Difficulty in accessing data
– Need to write a new program to carry out each new task
• Data isolation: multiple files and formats
• Integrity problems
– Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
– Hard to add new constraints or change existing ones
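Compare the “buried” constraint with a declared one. A minimal sketch, assuming SQLite via Python’s sqlite3 module: the rule balance > 0 lives in the schema as a CHECK constraint, so every program that updates the hypothetical Accounts table is held to it, and a violating UPDATE is rejected with sqlite3.IntegrityError instead of silently storing a bad balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule is declared once, in the schema, not repeated in every program.
conn.execute("""
CREATE TABLE Accounts (
    id      TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance > 0)
)""")
conn.execute("INSERT INTO Accounts VALUES ('A', 100)")

try:
    # Would drive the balance negative, so the DBMS refuses the statement.
    conn.execute("UPDATE Accounts SET balance = balance - 500 WHERE id = 'A'")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

print(conn.execute("SELECT balance FROM Accounts WHERE id = 'A'").fetchone())  # (100,)
```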

Why take this class?

A. Database systems are the core of CS

• Shift from computation to information
– True in corporate computing for years
– Web, p2p made this clear for personal computing
– Increasingly true of scientific computing

• Need for DB technology has exploded in recent years
– Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc.
– Web: not just “documents”. Search engines, e-commerce, blogs, wikis, other “web services”.
– Scientific: digital libraries, genomics, satellite imagery, physical sensors, simulation data
– Personal: music, photo, & video libraries. Email archives. File contents (“desktop search”).
Why take this class?
B. DBs are incredibly important to society

• “Knowledge is power.” -- Sir Francis Bacon

• “With great power comes great responsibility.” -- Spider-Man’s Uncle Ben

Policy-makers should understand technological possibilities.
Informed technologists are needed in public discourse on usage.
Why take this class?
C. The topic is intellectually rich.

• representing information
– data modeling
• languages and systems for querying data
– complex queries & query semantics*
– over massive data sets
• concurrency control for data manipulation
– controlling concurrent access
– ensuring transactional semantics
• reliable data storage
– maintain data semantics even if you pull the plug
* semantics: the meaning or relationship of meanings of a sign or set of
signs
Syllabus

Course Outline (to be covered in 40 lectures; number of lectures per topic in parentheses)

• Database system concepts and architecture, Entity-Relationship and Enhanced E-R (5)
• Relational data model and relational algebra, SQL, indexing, query optimization (10)
• Relational database design, normalization principles and normal forms (8)
• Transaction concept and concurrency control (8)
• Web interface to DBMS, semi-structured databases, object-oriented databases (6)
• DBMS case studies (3)

Textbooks and other information
• Textbook
– “Database Management Systems”, Johannes Gehrke and Raghu
Ramakrishnan, 3rd Edition
• Other textbooks
– “Database System Concepts”, Korth, Silberschatz, and Sudarshan
– “Fundamentals of Database Systems”, Elmasri and Navathe
– “Database Systems: The Complete Book”, Garcia-Molina, Ullman, & Widom
• Course Portal
– All materials (slides, assignments, and any other study material) will be on the class page on the MS Teams portal.
– Follow the instructions and assignment schedules.
– Quizzes will be held and will count toward the TA component.
