IT5351.L001 Intro To DBMS

The document discusses the introduction to databases including what data is, the early history of data management using files, the purpose of database systems, database concepts, and database management systems. It provides examples and discusses different topics at a high level.

Uploaded by

srik2k5

IT5351.

Lecture 1
Introduction to Databases

Instructor : Dr. M. Deivamani


September 06, 2021

Reading
Abraham Silberschatz, Henry F. Korth, S. Sudarshan, “Database System Concepts”,
Sixth Edition, Tata McGraw Hill, 2014.
Sections to study

 Please go over the concepts in chapter 1


 1.1 Database-System Applications
 1.2 Purpose of Database Systems
 1.3 View of Data
 1.4 Database Languages
 1.5 Database Design

What is “Data”?

 ANSI definition of data:


 A representation of facts, concepts, or instructions in a formalized manner suitable for
communication, interpretation, or processing by humans or by automatic means.
 Any representation such as characters or analog quantities to which meaning is or might
be assigned. Generally, we perform operations on data or data items to supply some
information about an entity.
 Volatile vs persistent data
 Our concern is primarily with persistent data

Early Data Management – Ancient History

 Data are not stored on disk


 One data set per program. High data redundancy

File Processing – More Recent History

 Data are stored in files, with an interface between programs and files.
 Various access methods exist (e.g., sequential, indexed, random).
 One file corresponds to one or several programs.

Purpose of Database Systems

 As we discussed, in the early days, database applications were built directly on top of file systems, which led to:
 Data redundancy and inconsistency: data is stored in multiple file formats, resulting in duplication of information in different files
 Difficulty in accessing data: Need to write a new program to carry out each new task
 Data isolation: Multiple files and formats
 Integrity problems:
 Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
 Hard to add new constraints or change existing ones
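
The integrity problem above can be illustrated with a minimal sketch in which the constraint is stated once, in the schema, instead of being buried in program code. It assumes Python's built-in sqlite3 module and an in-memory database; the account table and its column names are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        account_no INTEGER PRIMARY KEY,
        balance    NUMERIC NOT NULL CHECK (balance > 0)  -- stated explicitly, once
    )
    """
)
conn.execute("INSERT INTO account VALUES (1, 100)")


def try_insert(account_no, balance):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO account VALUES (?, ?)", (account_no, balance))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(2, 50)   # satisfies balance > 0
rejected = try_insert(3, -10)  # violates the CHECK constraint
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Because the rule lives in the schema, every program that touches the table is subject to it, and changing the rule means changing one declaration rather than every application.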

Purpose of Database Systems

 Atomicity problems:
 Failures may leave database in an inconsistent state with partial updates carried out

 Example: Transfer of funds from one account to another should either complete or not happen at all

 Concurrent-access anomalies:
 Concurrent access needed for performance

 Uncontrolled concurrent accesses can lead to inconsistencies


 Ex: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time

 Security problems:
 Hard to provide user access to some, but not all, data
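
The funds-transfer example under atomicity can be sketched as follows. This is an illustration under stated assumptions, not a production design: it uses Python's sqlite3 module, a hypothetical account table, and a simulated failure (the fail_midway flag) between the two updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 100)])
conn.commit()


def transfer(src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one transaction (all or nothing)."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            if fail_midway:
                # Hypothetical failure injected between the two updates.
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances():
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(1, 2, 30)                        # completes
failed = transfer(1, 2, 30, fail_midway=True)  # rolled back, no partial update
final = balances()
```

After the failed transfer, the debit that had already been applied is undone, so the total across both accounts is unchanged.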

Database systems offer solutions to all the above problems
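
The concurrent-withdrawal anomaly mentioned above (two people reading a balance of 100 and each withdrawing 50) can be reproduced with a small simulation. The sketch below is plain Python, not a DBMS: the first function replays the bad interleaving deterministically, and the hypothetical Account class uses a lock to stand in for the concurrency control a DBMS would provide.

```python
import threading


def uncontrolled_withdrawals(balance, amounts):
    """Every client reads the same balance before any client writes back."""
    reads = [balance for _ in amounts]        # all reads happen first
    for seen, amount in zip(reads, amounts):  # each write overwrites the previous one
        balance = seen - amount
    return balance


class Account:
    """Hypothetical account whose read-modify-write is protected by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:  # read and write form one indivisible step
            current = self.balance
            self.balance = current - amount


def controlled_withdrawals(balance, amounts):
    """Run each withdrawal in its own thread against a locked account."""
    account = Account(balance)
    threads = [threading.Thread(target=account.withdraw, args=(a,)) for a in amounts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


lost = uncontrolled_withdrawals(100, [50, 50])   # 50: one withdrawal is lost
correct = controlled_withdrawals(100, [50, 50])  # 0: both withdrawals applied
```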
Database

 What is a database?
 Organized collection of related data
 A database may be as simple as a text file or a CSV file, or as complex as a large, integrated relational collection of data.
 Examples of databases
 Bank account database; payroll database; AU student database; Amazon’s product database; Hotel
reservation database; your notes for this class
 Why do we need databases (in general)?
 Contain details about the organization or domain application
 Manage large amounts of data - deal with “big data"

Types of Databases

Operational Databases
 Collect, modify, maintain data
 Backbone of companies
 Store dynamic data (i.e., change constantly, reflect up-to-the-minute info)

Analytical Databases
 Store and track historical and time-dependent data
 Asset for tracking trends, viewing statistical data over a long period, making strategic business projections
 Store static data (i.e., never or very rarely change, reflect a point-in-time snapshot of the data, not up to date)
Need for Data Management

 Describe real-world entities in terms of stored data


 Persist large datasets
 Efficiently query and update data
 Change structure of data stored (add, update, remove attributes)
 Simultaneous updates
 Recover from failures
 Ensure security and integrity

Think about the past

 Before DBMS, the typical file-processing systems were supported by conventional operating systems. The system stored permanent records in various files, and it needed different application programs to extract records from, and add records to, the appropriate files.
 Data redundancy and inconsistency
 Difficulty in accessing data
 Data isolation
 Integrity problems
 Atomicity problems
 Concurrent-access anomalies
 Security problems
Database Applications

 Database Applications:
 Banking: transactions
 Airlines: reservations, schedules
 Universities: registration, grades
 Sales: customers, products, purchases
 Online retailers: order tracking, customized recommendations
 Manufacturing: production, inventory, orders, supply chain
 Human resources: employee records, salaries, tax deductions
 …

 Databases can be very large


 Databases touch all aspects of our lives

University Database Example

 Application program examples


 Add new students, instructors, and courses
 Register students for courses, and generate class rosters
 Assign grades to students, compute grade point averages (GPA) and generate transcripts

 In the early days, database applications were built directly on top of file
systems

Database Approach

What’s a Database Management System?

 A database management system (DBMS) is a collection of programs to maintain a database, that is, for

 Definition of data and structure

 Physical construction

 Manipulation

 Sharing/Protecting

 Persistence/Recovery

Database Management Systems

 Software to create, manage, maintain, persist databases over long periods of time
 Provide concurrency, security, data integrity and uniform administration procedures.
 Examples:
 MySQL, SQLite, MongoDB, PostgreSQL, Oracle, DB2, MS-SQL, Derby
DBMS Properties

 Queryable: Provide a way to ask your DB questions and retrieve data
 Durable: Ensure the safety of information stored (data persists)
 In-memory DBs trade durability for speed?
 Have schema: Define structure for storage of information
 What about semi-structured DBs?
 No redundancy: Reduce space
 Indexes trade space for speed?
 Optimize queries: Make queries run faster
 What about complex queries? NoSQL DBs have a “WYSIWYG” flavor
 Handle concurrent transactions: Manage database engine
 Turn off serialization for speed?

Difficult to achieve all - balance and tradeoff
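
One of the tradeoffs above, indexes trading space for speed, can be observed directly. The sketch below assumes Python's sqlite3 module; the student table, its generated rows, and the index name idx_student_dept are hypothetical. It compares SQLite's query plan before and after an index is created, without claiming any particular speedup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO student (name, dept) VALUES (?, ?)",
    [(f"student{i}", f"dept{i % 10}") for i in range(1000)],  # made-up rows
)

QUERY = "SELECT name FROM student WHERE dept = ?"


def plan(sql, params):
    """Return SQLite's textual query plan for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description


plan_before = plan(QUERY, ("dept3",))  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_student_dept ON student (dept)")  # extra storage
plan_after = plan(QUERY, ("dept3",))   # the plan now refers to the index
result = conn.execute(QUERY, ("dept3",)).fetchall()
```

The query text and its result are the same in both cases; only the access path chosen by the DBMS changes, at the cost of storing and maintaining the index.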


History

“Data matures like wine, applications like fish" - Andy Todd.


 1950s: Storage on magnetic tapes
 Early 1960s: Hierarchical database systems
 Late 1960s: Network database systems
 1970s: Relational DBMS
 End of 1970s: SQL
 1980s: Object-oriented DBMS
 1990s: Parallel and distributed DBMS
 Early 2000s: XML, XQuery
 Late 2000s: Google BigTable, Yahoo PNuts
 2010s: NoSQL
View of Data

 A database system is a collection of interrelated data and a set of programs that allow users
to access and modify these data.

 A major purpose of a database system is to provide users with an abstract view of the data.

 Data models: A collection of conceptual tools for describing data, data relationships, data semantics,
and consistency constraints.

 Data Abstraction: Hide from users the complexity of the data structures used to represent data in the database, through several levels of data abstraction.

Levels of Abstraction

Schemas and Instances

 Similar to types and variables in programming languages


 Schema
 Logical Schema – the overall logical structure of the database
 Example: The database consists of information about a set of customers and accounts in a bank and the
relationship between them
 Customer Schema
Name | Customer ID | Account # | Aadhaar # | Mobile #
 Account Schema
Account # | Account Type | Interest Rate | Min. Bal. | Balance
 Physical schema – the overall physical structure of the database

Schemas and Instances

 Instance
 The actual content of the database at a particular point in time
 Analogous to the value of a variable
 Customer Instance
Name | Customer ID | Account # | Aadhaar # | Mobile #
Dinesh | 6728 | 917322 | 182719289372 | 9830100291
Kavitha | 8912 | 827183 | 918291204829 | 7189203928
Chandra Sekar | 6617 | 372912 | 127837291021 | 8892021892
 Account Instance
Account # | Account Type | Interest Rate | Min. Bal. | Balance
917322 | Savings | 4.0% | 5000 | 7812
372912 | Current | 0.0% | 0 | 291820
827183 | Term Deposit | 6.75% | 10000 | 100000
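
The distinction between a schema and an instance can be made concrete with a short sketch. It assumes Python's sqlite3 module and a simplified, hypothetical version of the account relation (three columns only); the two rows reuse account numbers and balances from the Account Instance above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_no INTEGER PRIMARY KEY, account_type TEXT, balance INTEGER)"
)


def schema():
    """Column names of the account relation: fixed until the design changes."""
    return [row[1] for row in conn.execute("PRAGMA table_info(account)")]


def instance():
    """Current rows of the account relation: changes with every update."""
    return conn.execute("SELECT * FROM account ORDER BY account_no").fetchall()


schema_before = schema()
instance_before = instance()  # empty: the schema exists but holds no rows yet

conn.execute("INSERT INTO account VALUES (917322, 'Savings', 7812)")
conn.execute("INSERT INTO account VALUES (372912, 'Current', 291820)")

schema_after = schema()       # unchanged by the inserts
instance_after = instance()   # now contains two rows
```

As with a type and a variable, the schema stays the same while the instance takes different values over time.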


Schemas and Instances

 Data independence - upper levels are unaffected by changes to lower levels

 Logical data independence – the ability to modify the logical schema without changing the
external models
 Physical Data Independence – the ability to modify the physical schema without changing
the logical schema
 Analogous to independence of ‘Interface’ and ‘Implementation’ in Object-Oriented Systems

 Applications depend on the logical schema

 In general, the interfaces between the various levels and components should be well defined so that
changes in some parts do not seriously influence others.
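
A minimal sketch of both kinds of data independence, assuming Python's sqlite3 module. The customer table is a simplified, hypothetical version of the customer relation (names and IDs taken from the Customer Instance slide); the view customer_directory plays the role of an external model, and the index name and email column are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(6728, "Dinesh"), (8912, "Kavitha"), (6617, "Chandra Sekar")],
)

# External view used by the application.
conn.execute("CREATE VIEW customer_directory AS SELECT customer_id, name FROM customer")


def app_query():
    """The application only ever queries the external view."""
    return conn.execute(
        "SELECT customer_id, name FROM customer_directory ORDER BY customer_id"
    ).fetchall()


before = app_query()

# Physical change: add an index. The logical schema is untouched.
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
after_physical = app_query()

# Logical change: add an attribute. The external view is untouched.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
after_logical = app_query()
```

The application query returns the same rows after each change, which is the sense in which upper levels are unaffected by changes to lower levels.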

Data Modelling and Data Models

 Data modelling: Iterative and progressive process of creating a specific data model for a
determined problem domain
 Data models: Simple representations of complex real-world data structures
 Useful for supporting a specific problem domain

 Model: Abstraction of a real-world object or event

Importance of Data models

 Data models are a communication tool
 Data models give an overall view of a database
 Data models organize data for various users
 Data models are an abstraction for the creation of a good database

Data Models

 A collection of tools for describing
 Data
 Data relationships
 Data semantics
 Data constraints
Three parts of a data model
• Structure - the definition of relations and contents
• Integrity - constraints on the database’s contents
• Manipulation - actions that can be done on the database’s contents
 Relational model (we discuss later)
 Entity-Relationship data model (mainly for database design)
 Object-based data models (Object-oriented and Object-relational)
 Semi-structured data model (XML)
 Other older models:
 Network model
 Hierarchical model
Evolution of Data Models

Hierarchical Model

Network Model

Hierarchical and Network Models

Hierarchical Models
 Manage large amounts of data for complex manufacturing projects
 Represented by an upside-down tree which contains segments
 Segments: Equivalent of a file system’s record type
 Depicts a set of one-to-many (1:M) relationships

Network Models
 Represent complex data relationships
 Improve database performance and impose a database standard
 Depicts both one-to-many (1:M) and many-to-many (M:N) relationships

Relational Model

 All the data is stored in various tables


 Example of tabular data in the relational model
[Figure: a sample table in which the columns are labeled attributes and the rows are labeled tuples]

A Sample Relational Database

Relational Model

Advantages
 Structural independence is promoted using independent tables
 Tabular view improves conceptual simplicity
 Ad hoc query capability is based on SQL
 Isolates the end user from physical-level details
 Improves implementation and management simplicity

Disadvantages
 Requires substantial hardware and system software overhead
 Conceptual simplicity gives untrained people the tools to use a good system poorly
 May promote information problems

Interfacing to the DBMS

 Data Definition Language (DDL): for specifying schemas


 may have different DDLs for external schema, conceptual schema, internal schema
 information is stored in the data dictionary, or catalog

 Data dictionary contains metadata (i.e., data about data)


 Database schema
 Integrity constraints
 Primary key (ID uniquely identifies instructors)
 Authorization
 Who can access what
Example:
create table instructor (
    ID char(5),
    name varchar(20),
    dept_name varchar(20),
    salary numeric(8,2));
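
The create table statement above can be run against a real DBMS to see where the schema ends up. The sketch below assumes Python's sqlite3 module and SQLite's own catalog (the sqlite_master table and PRAGMA table_info); other systems expose their data dictionary differently. The PRIMARY KEY clause is added here to express the slide's primary-key constraint, and the inserted rows are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL from the slide, plus an explicit primary-key constraint (an addition).
conn.execute(
    """
    CREATE TABLE instructor (
        ID        char(5),
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8,2),
        PRIMARY KEY (ID)
    )
    """
)

# SQLite's catalog stores the DDL text of every table: metadata, not data.
catalog_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'instructor'"
).fetchone()[0]

# Per-column metadata rows are (cid, name, type, notnull, default, pk).
columns = [(row[1], row[5]) for row in conn.execute("PRAGMA table_info(instructor)")]


def insert_instructor(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO instructor VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


first = insert_instructor(("00001", "Example Name", "Example Dept", 50000))
duplicate = insert_instructor(("00001", "Another Name", "Example Dept", 60000))
```

The second insert is rejected because it repeats an existing ID, which is the primary-key constraint being enforced by the DBMS rather than by application code.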

Interfacing to the DBMS

 Data Manipulation Language (DML): for accessing and manipulating the data organized by
the appropriate data model
 DML also known as query language

 Two classes of languages


 Pure – used for proving properties about computational power and for optimization
 Relational Algebra
 Tuple relational calculus
 Domain relational calculus

 Commercial – used in commercial systems


 SQL is the most widely used commercial language
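
To relate the two classes of languages, the sketch below evaluates the same query twice: once with hand-written relational-algebra operators (selection and projection) and once in SQL. It assumes Python's sqlite3 module; the helper functions select and project, and the three instructor rows, are hypothetical and made up for illustration.

```python
import sqlite3

# A relation as a list of tuples (dicts here); the rows are made-up sample data.
instructor = [
    {"ID": "00001", "name": "Asha", "dept_name": "CS", "salary": 95000},
    {"ID": "00002", "name": "Bala", "dept_name": "EEE", "salary": 60000},
    {"ID": "00003", "name": "Chitra", "dept_name": "CS", "salary": 85000},
]


def select(relation, predicate):
    """Relational-algebra selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Relational-algebra projection: keep the named attributes, drop duplicates."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in result:
            result.append(row)
    return result


# Names of instructors earning more than 80000, in relational algebra.
algebra_result = project(select(instructor, lambda t: t["salary"] > 80000), ["name"])

# The same query in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC)")
conn.executemany(
    "INSERT INTO instructor VALUES (:ID, :name, :dept_name, :salary)", instructor
)
sql_result = conn.execute(
    "SELECT DISTINCT name FROM instructor WHERE salary > 80000"
).fetchall()
```

The algebra version spells out which operators are applied and in what order, while the SQL version only states what data is wanted and leaves the evaluation strategy to the DBMS.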

Database Design

 The process of designing the general structure of the database:


 Logical Design – Deciding on the database schema. Database design requires that we find a
“good” collection of relation schemas.
 Business decision – What attributes should we record in the database?
 Computer Science decision – What relation schemas should we have and how should the attributes
be distributed among the various relation schemas?
 Physical Design – Deciding on the physical layout of the database

Database Design (Cont.)

 Is there any problem with this relation?

Design Approaches

 Need to come up with a methodology to ensure that each of the relations in the database is
“good”
 Two ways of doing so:
 Entity Relationship Model (will discuss later)
 Models an enterprise as a collection of entities and relationships
 Represented diagrammatically by an entity-relationship diagram:

 Normalization Theory (will discuss later)


 Formalize what designs are bad, and test for them
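
As a rough illustration of what "testing for bad designs" can mean, the sketch below checks whether a functional dependency holds in a relation, one of the tools normalization theory builds on. This is an assumption about the later material, not the lecture's own method; the function violates_fd and the sample rows (a hypothetical relation that repeats each department's building for every instructor) are made up.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on `lhs` but differ on `rhs`,
    i.e. the functional dependency lhs -> rhs does not hold."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return True
    return False


# Hypothetical relation: the building is stored redundantly per instructor.
rows = [
    {"name": "Asha", "dept_name": "CS", "building": "Block A"},
    {"name": "Bala", "dept_name": "CS", "building": "Block A"},
    {"name": "Chitra", "dept_name": "EEE", "building": "Block B"},
]

holds_initially = not violates_fd(rows, ["dept_name"], ["building"])

# Updating only one of the redundant copies makes the data inconsistent.
rows[1]["building"] = "Block C"
broken_after_update = violates_fd(rows, ["dept_name"], ["building"])
```

Storing the building once per department, in a separate relation, would make this kind of inconsistency impossible to express.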

Object-Relational Data Models

 Relational model: flat, “atomic” values


 Object Relational Data Models
 Extend the relational data model by including object orientation and constructs to deal with added
data types.
 Allow attributes of tuples to have complex types, including non-atomic values such as nested
relations.
 Preserve relational foundations, in particular the declarative access to data, while extending
modeling power.
 Provide upward compatibility with existing relational languages.

XML: Extensible Markup Language

 Defined by the WWW Consortium (W3C)


 Originally intended as a document markup language not a database language
 The ability to specify new tags, and to create nested tag structures made XML a great way to
exchange data, not just documents
 XML has become the basis for all new generation data interchange formats.
 A wide variety of tools is available for parsing, browsing and querying XML documents/data
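
The points about user-defined, nested tags and readily available parsers can be sketched with Python's standard xml.etree.ElementTree module. The tag names (bank, account, account_number, and so on) are hypothetical; the values reuse two rows from the Account Instance slide.

```python
import xml.etree.ElementTree as ET

# Nested, user-defined tags carrying data rather than document markup.
document = """
<bank>
  <account>
    <account_number>917322</account_number>
    <account_type>Savings</account_type>
    <balance>7812</balance>
  </account>
  <account>
    <account_number>372912</account_number>
    <account_type>Current</account_type>
    <balance>291820</balance>
  </account>
</bank>
""".strip()

root = ET.fromstring(document)

# Turn each <account> element into a dict of {tag: text}.
accounts = [
    {child.tag: child.text for child in account}
    for account in root.findall("account")
]

total_balance = sum(int(a["balance"]) for a in accounts)
```

The receiver needs no prior agreement on column order or delimiters, since each value is labeled by its tag. All values arrive as text, so types such as the integer balance must be converted explicitly.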

Summary

 Data: A representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means.
 Volatile vs persistent data
 A database is a large and persistent collection of (more or less similar) pieces of information organized in a way that facilitates efficient retrieval and modification.
 Structure of the database is determined by the abstract data model.
 A database-management system (DBMS) consists of a collection of interrelated data and a
collection of programs to access those data. The data describe one enterprise.
 The primary goal of a DBMS is to provide an environment that is both convenient and
efficient for people to use in retrieving and storing information.

Summary

 A major purpose of a database system is to provide users with an abstract view of the data.
That is, the system hides certain details of how the data are stored and maintained.
 A schema is a description of the data interface to the database (i.e., how the data is organized). A schema can have many instances.
 A database instance is a database (real data) that conforms to a given schema.
 Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and data constraints.
 The relational data model is the most widely deployed model for storing data in databases.
 A data-manipulation language (DML) is a language that enables users to access or manipulate data. Nonprocedural DMLs, which require a user to specify only what data are needed, without specifying exactly how to get those data, are widely used today.

Summary

 A data-definition language (DDL) is a language for specifying the database schema and other
properties of the data.
 The entity relationship (E-R) data model is a widely used model for database design. It provides a
convenient graphical representation to view data, relationships, and constraints.

