Module 1
CS 5200
Textbook
1. B1: Introduction to Database Systems by ITL Education
Solutions Limited, Pearson India, November 2008,
ISBN: 9788131731925
(https://books.google.com/books?id=y7P9sa2MeGIC&printsec=frontcover#v=onepage&q&f=false)
2. B2: Database Systems: A Practical Approach to Design,
Implementation, and Management (6th edition),
Thomas Connolly and Carolyn Begg
Logistics
• NEU GitHub:
• Lecture code for samples and exercise labs on NEU GitHub
https://github.khoury.northeastern.edu/jelee0408/CS5200_Lab
• SQL Workshops on NEU GitHub
https://github.khoury.northeastern.edu/jelee0408/CS5200_SQL_Workshop
You can write a program on your own and can also program as part of a small
team (with at least one other programmer) for a term project.
This course is going to show you what goes on “beneath the surface” of your
system when you create a project or a program that uses a database as a
back end.
Finally, you are going to learn and use the Structured Query Language (SQL) in
this course and eventually prepare for SQL-related technical interviews.
Grading for CS 5200
CS 5200 grades will consist of
Grading Policy for CS 5200
• Homework is due one week after it is assigned, at 11:59 PM PT on
the following Sunday for section 10 and Wednesday for section 4.
• Two late submissions (24 hours each) can be turned in without penalty
per semester.
• Any other late submission must be approved in advance by the
instructor assigning the homework and will receive no more than 70%
credit.
• No homework will be accepted after a solution has been discussed or
shared.
Introduction to Database Systems
Today’s Outline
• Introduction to Database Systems
• Database System Concepts and Architecture
• Exercise Lab 1: Install MySQL
• HW1: What is a database?
• SQL WS1: SELECT
Basic Definitions
• Database:
• An organized collection of logically related data.
• Data:
• Known facts that can be recorded and have an implicit meaning.
• Facts concerning objects and events that could be recorded and stored on
computer media.
• Database Management System (DBMS):
• Software that facilitates the creation and maintenance of a computerized
database.
• Database System:
• The DBMS software together with the data itself. Sometimes, the
applications are also included.
Examples of Database Applications from daily
life
• Purchases from the supermarket
• Purchases using your credit card
• Booking a holiday at the travel agents
• Using the local library
• Taking out insurance
• Renting a video
Types of Databases and Database Applications
• Traditional Applications:
• Numeric and Textual Databases
• More Recent Applications:
• Multimedia Databases
• Geographic Information Systems
• Biological and Genome Databases
• Data Warehouses
• Mobile databases
• Real-time and Active Databases
• First part of book focuses on traditional applications
Recent Developments
• Social networks capture information about people and about
communications among people (posts, tweets, photos, videos) in their systems.
• Search engines (Google, Bing, Yahoo) collect their own repositories of web
pages for searching purposes.
• New Technologies are emerging from the so-called non-database software
vendors to manage vast amounts of data generated on the web:
• Big Data storage systems involving large clusters of distributed computers.
• NOSQL (Not Only SQL) systems.
• A large amount of data now resides on the “cloud” which means it is in huge data
centers using thousands of machines.
Need for Databases
• File processing systems
• Meet the data processing needs of individual departments rather
than the overall information needs of the organization.
• Collection of application programs that perform services for the end
users (e.g. reports).
• Each program defines and manages its own data.
File processing systems
Limitations of File-Based Approach
• Separation and isolation of data
• Each program maintains its own set of data.
• Users of one program may be unaware of potentially useful data held by
other programs.
• Duplication of data
• Same data is held by different programs.
• Wasted space and potentially different values and/or different formats for the
same item.
Limitations of File-Based Approach
• Data dependence
• File structure is defined in the program code.
• Incompatible file formats
• Programs are written in different languages, and so cannot easily
access each other’s files.
• Fixed Queries/Proliferation of application programs
• Programs are written to satisfy particular functions.
• Any new requirement needs a new program.
Database
• Shared collection of logically related data (and a description of this
data), designed to meet the information needs of an organization.
Database Management System (DBMS)
• A software system that enables users to define, create, maintain, and
control access to the database.
• (Database) application program: a computer program that interacts
with database by issuing an appropriate request (SQL statement) to
the DBMS.
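As a rough illustration of an application program issuing requests to a DBMS, the sketch below uses Python's built-in sqlite3 module with an in-memory database. SQLite is only a stand-in for the MySQL server used in the labs, and the student table and its rows are made up for the example.

```python
import sqlite3

# Connect to the DBMS. ":memory:" asks SQLite for a temporary in-memory database.
conn = sqlite3.connect(":memory:")

# Define and create: the application asks the DBMS to create a table.
# (Table and column names are hypothetical.)
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Maintain: insert rows by sending SQL statements with bound parameters.
conn.executemany(
    "INSERT INTO student (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.commit()

# Retrieve: the program states what it wants; the DBMS decides how to find it.
names = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY id")]
print(names)
conn.close()
```

The program never opens or parses a data file itself; every interaction goes through SQL statements handled by the DBMS.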
Database Management System (DBMS)
Data versus Information
• The terms data and information are closely related and in fact are
often used interchangeably.
• Information: Data that have been processed in such a way as to
increase the knowledge of the person who uses the data.
Metadata
• Data that describe the properties or characteristics of end-user data
and the context of those data.
Database Approach
• Data definition language (DDL).
• Permits specification of data types, structures and any data
constraints.
• All specifications are stored in the database.
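A small sketch of DDL and stored metadata, assuming SQLite (through Python's sqlite3 module) in place of MySQL. The course table is a hypothetical example; sqlite_master and PRAGMA table_info are SQLite's own catalog facilities, and other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: declare structure, data types, and constraints (hypothetical table).
conn.execute(
    """
    CREATE TABLE course (
        course_number TEXT PRIMARY KEY,
        course_name   TEXT NOT NULL,
        credit_hours  INTEGER CHECK (credit_hours > 0)
    )
    """
)

# The specification itself is stored in the database catalog.
# In SQLite the catalog table is sqlite_master.
stored_ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'course'"
).fetchone()[0]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(course)")]
print(stored_ddl)
print(columns)
conn.close()
```

Because the description lives in the database, a program can ask the DBMS what a table looks like instead of hard-coding the file structure.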
Database Approach
• Controlled access to a database may include:
• a security system
• an integrity system
• a concurrency control system
• a recovery control system
• a user-accessible catalog.
Application Activities Against a Database
• Queries: that access different parts of data and formulate the result
of a request
• Transactions: that may read some data and “update” certain values or
generate new data and store that in the database
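The difference between a query and a transaction can be sketched as follows, again assuming SQLite through Python's sqlite3 module. The account table and the transfer amounts are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and sample data.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

# Query: reads data and formulates a result without changing anything.
total_before = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

# Transaction: reads and updates values as one unit of work.
# Used as a context manager, the connection commits if the block succeeds
# and rolls back if an exception escapes it.
with conn:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
total_after = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
print(balances, total_before, total_after)
conn.close()
```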
Components of DBMS Environment
• Hardware
• Can range from a PC to a network of computers.
• Software
• DBMS, operating system, network software (if necessary) and also the application
programs.
• Data
• Used by the organization and a description of this data called the schema.
• Procedures
• Instructions and rules that should be applied to the design and use of the
database and DBMS.
• People
Example of a Database
• Mini-world for the example:
• Part of a UNIVERSITY environment.
• Some mini-world entities:
• STUDENTs
• COURSEs
• SECTIONs (of COURSEs)
• (academic) DEPARTMENTs
• INSTRUCTORs
Example of a Database
• Some mini-world relationships:
• SECTIONs are of specific COURSEs
• STUDENTs take SECTIONs
• COURSEs have prerequisite COURSEs
• INSTRUCTORs teach SECTIONs
• COURSEs are offered by DEPARTMENTs
• STUDENTs major in DEPARTMENTs
• Note: The above entities and relationships are typically expressed in a conceptual
data model, such as the ENTITY-RELATIONSHIP data model (next week)
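One possible way to turn the mini-world above into relational tables is sketched below, using SQLite through Python's sqlite3 module. This is an assumption-laden illustration, not the schema used in the course: every table name, column name, and key choice is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# All names below are hypothetical; each comment names the relationship represented.
conn.executescript(
    """
    CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT NOT NULL);
    CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- COURSEs are offered by DEPARTMENTs
    CREATE TABLE course (
        course_number TEXT PRIMARY KEY,
        course_name   TEXT NOT NULL,
        dept_code     TEXT NOT NULL REFERENCES department(dept_code)
    );
    -- COURSEs have prerequisite COURSEs
    CREATE TABLE prerequisite (
        course_number TEXT NOT NULL REFERENCES course(course_number),
        prereq_number TEXT NOT NULL REFERENCES course(course_number),
        PRIMARY KEY (course_number, prereq_number)
    );
    -- SECTIONs are of specific COURSEs; INSTRUCTORs teach SECTIONs
    CREATE TABLE section (
        section_id    INTEGER PRIMARY KEY,
        course_number TEXT NOT NULL REFERENCES course(course_number),
        instructor_id INTEGER REFERENCES instructor(instructor_id)
    );
    -- STUDENTs major in DEPARTMENTs
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        major      TEXT REFERENCES department(dept_code)
    );
    -- STUDENTs take SECTIONs
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        section_id INTEGER NOT NULL REFERENCES section(section_id),
        PRIMARY KEY (student_id, section_id)
    );
    """
)

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
conn.close()
```

Many-to-many relationships (students taking sections, courses having prerequisites) get their own tables here, while one-to-many relationships become foreign-key columns.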
Example of a simple database
Main Characteristics of the Database Approach
• Self-describing nature of a database system:
• Metadata
• Insulation between programs and data:
• Called program-data independence.
• Allows changing data structures and storage organization without
having to change the DBMS access programs.
* Some newer systems, such as a few NOSQL systems, need no metadata: they store the data definition within
the data's own structure, making the data self-describing.
Main Characteristics of the Database Approach
• Data Abstraction:
• A data model is used to hide storage details and present the users
with a conceptual view of the database.
• Support of multiple views of the data:
• Each user may see a different view of the database, which
describes only the data of interest to that user.
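Multiple views of the same data can be sketched with SQL views. The example assumes SQLite through Python's sqlite3 module; the student table, the two views, and the users they are meant for are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table and sample rows.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ada", "CS", 3.9), (2, "Grace", "MATH", 3.7)],
)

# View for an advisor in the CS department: only CS majors, with grades.
conn.execute("CREATE VIEW cs_students AS SELECT id, name, gpa FROM student WHERE major = 'CS'")
# View for a directory application: names and majors, but no grades.
conn.execute("CREATE VIEW directory AS SELECT name, major FROM student")

advisor_view = conn.execute("SELECT * FROM cs_students").fetchall()
directory_view = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()
print(advisor_view)
print(directory_view)
conn.close()
```

Both views are defined over the same stored rows, so each user sees only the part of the database that matters to them without the data being duplicated.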
Main Characteristics of the Database Approach
• Sharing of data and multi-user transaction processing:
• Allowing a set of concurrent users to retrieve from and to update the
database.
• Concurrency control within the DBMS guarantees that each transaction
is correctly executed or aborted.
• Recovery subsystem ensures each completed transaction has its effect
permanently recorded in the database. Recovery is the rebuilding of a
database or table space after a problem such as media or storage failure,
power interruption, or application failure.
• OLTP (Online Transaction Processing) is a major part of database
applications. This allows hundreds of concurrent transactions to
execute per second.
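The "correctly executed or aborted" behavior can be sketched with a single connection, assuming SQLite through Python's sqlite3 module. The sketch shows only the all-or-nothing aspect of a transaction; it does not exercise concurrent users or crash recovery. The account table and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: balances may never go negative.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

aborted = False
try:
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE account SET balance = balance + 500 WHERE id = 2")
        # This debit would make the balance negative, so the CHECK constraint rejects it.
        conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
except sqlite3.IntegrityError:
    aborted = True

# The credit to account 2 was undone together with the failed debit.
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(aborted, balances)
conn.close()
```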
Database Users
• Users may be divided into
• Those who actually use and control the database content, and
those who design, develop and maintain database applications
(called “Actors on the Scene”), and
• Those who design and develop the DBMS software and related
tools, and the computer systems operators (called “Workers
Behind the Scene”).
Database Users – Actors on the Scene
• Actors on the scene
• Database administrators:
• Responsible for authorizing access to the database, for
coordinating and monitoring its use, acquiring software and
hardware resources, controlling its use and monitoring
efficiency of operations.
• Database Designers:
• Responsible for defining the content, the structure, the
constraints, and functions or transactions against the database.
They must communicate with the end-users and understand
their needs.
Database End Users
• End-users: They use the data for queries, reports and some of them
update the database content. End-users can be categorized into:
• Casual: access database occasionally when needed
• Naïve or Parametric: they make up a large section of the end-
user population.
• Users of Mobile Apps mostly fall in this category
• Bank-tellers or reservation clerks are parametric users who
do this activity for an entire shift of operations.
Database Users – Actors on the Scene
• System Analysts:
• They understand the user requirements of naïve and
sophisticated users and design applications including canned
transactions to meet those requirements.
• Business Analysts:
• There is an increasing need for such people who can analyze vast
amounts of business data and real-time data (“Big Data”) for
better decision making related to planning, advertising,
marketing etc.
Database Users – Actors behind the Scene
• System Designers and Implementors:
• Design and implement DBMS packages in the form of modules and
interfaces and test and debug them. The DBMS must interface with
applications, language compilers, operating system components, etc.
• Tool Developers:
• Design and implement software systems called tools for modeling
and designing databases, performance monitoring, prototyping, test
data generation, user interface creation, simulation, etc., which facilitate
building applications and allow the database to be used effectively.
• Operators and Maintenance Personnel:
• They manage the actual running and maintenance of the database
system hardware and software environment.
Advantages of Using the Database Approach
• Controlling redundancy in data storage and in development and
maintenance efforts.
• Sharing of data among multiple users.
• Restricting unauthorized access to data. Only the DBA staff uses
privileged commands and facilities.
• Providing Storage Structures (e.g. indexes) for efficient Query
Processing.
Advantages of Using the Database Approach
• Providing optimization of queries for efficient processing.
• Providing backup and recovery services.
• Providing multiple interfaces to different classes of users.
• Representing complex relationships among data.
• Enforcing integrity constraints on the database.
• Drawing inferences and actions from the stored data using deductive
and active rules and triggers.
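Integrity constraints and triggers can be sketched together, assuming SQLite through Python's sqlite3 module. The tables, the trigger name, and the logging rule are hypothetical; trigger syntax also differs somewhat between SQLite and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE department (dept_code TEXT PRIMARY KEY);
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        major TEXT REFERENCES department(dept_code)
    );
    CREATE TABLE major_change_log (student_id INTEGER, old_major TEXT, new_major TEXT);
    -- Active rule (hypothetical): record every change of major automatically.
    CREATE TRIGGER log_major_change AFTER UPDATE OF major ON student
    BEGIN
        INSERT INTO major_change_log VALUES (OLD.id, OLD.major, NEW.major);
    END;
    INSERT INTO department VALUES ('CS'), ('MATH');
    INSERT INTO student VALUES (1, 'CS');
    """
)

# Integrity constraint: a major must reference an existing department.
try:
    conn.execute("UPDATE student SET major = 'HIST' WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A valid update succeeds, and the trigger writes the log row on its own.
conn.execute("UPDATE student SET major = 'MATH' WHERE id = 1")
log = conn.execute("SELECT * FROM major_change_log").fetchall()
print(rejected, log)
conn.close()
```

The application code contains no checking or logging logic; both rules are declared once in the database and applied by the DBMS to every program that updates the table.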
Historical Development of Database
Technology
• Early Database Applications:
• The Hierarchical and Network Models were introduced in the mid-1960s and dominated
during the 1970s.
• A large share of worldwide database processing still occurs using these models,
particularly the hierarchical model using IBM’s IMS system.
• Relational Model based Systems:
• The relational model was introduced in 1970 and was heavily researched and
experimented with at IBM Research and several universities.
• Relational DBMS Products emerged in the early 1980s.
Historical Development of Database
Technology
• Object-oriented and emerging applications:
• Object-Oriented Database Management Systems (OODBMSs) were introduced in late
1980s and early 1990s to cater to the need of complex data processing in CAD and
other applications.
• Their use has not taken off much.
• Many relational DBMSs have incorporated object database concepts, leading to a
new category called object-relational DBMSs (ORDBMSs)
• Extended relational systems add further capabilities (e.g. for multimedia data, text,
XML, and other data types)
Historical Development of Database
Technology
• Data on the Web and E-commerce Applications:
• The Web contains data in HTML (Hypertext Markup Language) with links among
pages.
• This has given rise to a new set of applications, and e-commerce is using new
standards like XML (eXtensible Markup Language).
• Scripting languages such as PHP and JavaScript allow generation of
dynamic Web pages that are partially generated from a database.
• They also allow database updates through Web pages.
When not to use a DBMS
• Main costs of using a DBMS:
• High initial investment and possible need for additional hardware.
• Overhead for providing generality, security, concurrency control, recovery, and
integrity functions.
• When a DBMS may be unnecessary:
• If the database and applications are simple, well defined, and not expected to
change.
• If access to data by multiple users is not required.
• When a DBMS may be infeasible:
• In embedded systems where a general purpose DBMS may not fit in available storage
Data Models
• Data Model:
• A set of concepts to describe the structure of a database, the operations for
manipulating these structures, and certain constraints that the database should
obey.
• Data Model Structure and Constraints:
• Constructs are used to define the database structure
• Constructs typically include elements (and their data types) as well as groups of
elements (e.g. entity, record, table), and relationships among such groups
• Constraints specify some restrictions on valid data; these constraints must be
enforced at all times
Schemas versus Instances
• Database Schema:
• The description of a database.
• Includes descriptions of the database structure, data types, and the constraints on
the database.
• Schema Diagram:
• An illustrative display of (most aspects of) a database schema.
• Schema Construct:
• A component of the schema or an object within the schema, e.g., STUDENT, COURSE.
• Database State:
• The actual data stored in a database at a particular moment in time. This includes
the collection of all the data in the database.
• Also called database instance (or occurrence or snapshot).
• The term instance is also applied to individual database components, e.g. record instance,
table instance, entity instance
Database Schema vs. Database State
• Database State:
• Refers to the content of a database at a moment in time.
• Initial Database State:
• Refers to the database state when it is initially loaded into the system.
• Valid State:
• A state that satisfies the structure and constraints of the database.
• Distinction
• The database schema changes very infrequently.
• The database state changes every time the database is updated.
• Schema is also called intension.
• State is also called extension.
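The schema/state distinction can be sketched as follows, assuming SQLite through Python's sqlite3 module. The helper functions schema and state, and the student table, are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def schema(connection):
    """Return the stored table definitions (the intension)."""
    return connection.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()

def state(connection):
    """Return the current rows of student (the extension)."""
    return connection.execute("SELECT id, name FROM student ORDER BY id").fetchall()

schema_before = schema(conn)
initial_state = state(conn)  # empty right after the table is created

# Each update moves the database to a new state; the schema is untouched.
conn.execute("INSERT INTO student VALUES (1, 'Ada')")
state_after_insert = state(conn)
schema_after = schema(conn)

print(initial_state, state_after_insert, schema_before == schema_after)
conn.close()
```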
Example of a Database Schema
Example of a database state
Three-Schema Architecture
• ANSI-SPARC Three-Level Architecture
• Proposed to support DBMS characteristics of:
• Program-data independence.
• Support of multiple views of the data.
Three-Schema Architecture
• Defines DBMS schemas at three levels:
• Internal schema at the internal level to describe physical storage structures and
access paths (e.g., indexes).
• An internal schema consists of two separate schemas: a logical schema and a physical
schema. The logical schema is the representation of data for a type of data management
technology (e.g., relational). The physical schema describes how data are to be
represented and stored in secondary storage using a particular DBMS (e.g., Oracle).
• Conceptual schema at the conceptual level to describe the structure and constraints
for the whole database for a community of users.
• This schema combines the different external views into a single, coherent, and
comprehensive definition of the enterprise’s data. The conceptual schema represents
the view of the data architect or data administrator.
• External schemas at the external level to describe the various user views.
• This is the view (or views) of managers and other employees who are the database users.
The three-schema architecture
The three-schema architecture
Three-Schema Architecture
• Mappings among schema levels are needed to transform requests
and data.
• Programs refer to an external schema and are mapped by the DBMS to the
internal schema for execution.
• Data extracted from the internal DBMS level is reformatted to match the
user’s external view (e.g. formatting the results of an SQL query for display in
a Web page)
Data Independence
• Logical Data Independence:
• The capacity to change the conceptual schema without having to change the
external schemas and their associated application programs.
• Physical Data Independence:
• The capacity to change the internal schema without having to change the
conceptual schema.
• For example, the internal schema may be changed when certain file
structures are reorganized or new indexes are created to improve database
performance
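Physical data independence can be sketched by adding an index and re-running an unchanged query. The example assumes SQLite through Python's sqlite3 module; the table, the index name, and the query are hypothetical, and the exact text of the reported access plan depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and sample rows.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ada", "CS"), (2, "Grace", "MATH"), (3, "Edgar", "CS")],
)

QUERY = "SELECT name FROM student WHERE major = 'CS' ORDER BY id"

def plan(connection):
    # Each EXPLAIN QUERY PLAN row ends with a readable description of the access path.
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + QUERY)]

result_before = conn.execute(QUERY).fetchall()
plan_before = plan(conn)

# Change the internal level only: add an index on major.
conn.execute("CREATE INDEX idx_student_major ON student (major)")

result_after = conn.execute(QUERY).fetchall()
plan_after = plan(conn)

print(result_before == result_after)
print(plan_before, plan_after)
conn.close()
```

The query text and its result stay the same; only the way the DBMS chooses to reach the rows may change.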
DBMS Languages
• Data Definition Language (DDL):
• Used by the DBA and database designers to specify the conceptual schema of
a database.
• In many DBMSs, the DDL is also used to define internal and external schemas
(views).
• In some DBMSs, separate storage definition language (SDL) and view
definition language (VDL) are used to define internal and external schemas.
• SDL is typically realized via DBMS commands provided to the DBA and database
designers
DBMS Languages
• Data Manipulation Language (DML):
• Used to specify database retrievals and updates
• DML commands (data sublanguage) can be embedded in a general-purpose
programming language (host language), such as COBOL, C,
C++, or Java.
• A library of functions can also be provided to access the DBMS from a programming
language
• Alternatively, stand-alone DML commands can be applied directly (called a
query language).
Typical DBMS Component Modules
Centralized and Client-Server DBMS Architectures
• Centralized DBMS:
• Combines everything into a single system, including DBMS software, hardware,
application programs, and user interface processing software.
• User can still connect through a remote terminal – however, all processing is
done at centralized site.
A Physical Centralized Architecture
Basic 2-tier Client-Server Architectures
• Specialized Servers with Specialized functions
• Print server
• File server
• DBMS server
• Web server
• Email server
• Clients can access the specialized servers as needed
Logical two-tier client server architecture
Clients
• Provide appropriate interfaces through a client software module to
access and utilize the various server resources.
• Clients may be diskless machines or PCs or Workstations with disks
with only the client software installed.
• Connected to the servers via some form of a network.
• (LAN: local area network, wireless network, etc.)
DBMS Server
• Provides database query and transaction services to the clients
• Relational DBMS servers are often called SQL servers, query servers, or
transaction servers
• Applications running on clients utilize an Application Program Interface (API) to
access server databases via a standard interface such as:
• ODBC: Open Database Connectivity standard
• JDBC: for Java programming access
Three Tier Client-Server Architecture
• Common for Web applications
• Intermediate Layer called Application Server or Web Server:
• Stores the web connectivity software and the business logic part of the application
used to access the corresponding data from the database server
• Acts like a conduit for sending partially processed data between the database server
and the client.
• Three-tier Architecture Can Enhance Security:
• Database server only accessible via middle tier
• Clients cannot directly access database server
• Clients contain user interfaces and Web browsers
• The client is typically a PC or a mobile device connected to the Web
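The three tiers can be sketched inside one Python process, with plain functions standing in for separate machines. This is only a structural illustration under stated assumptions: SQLite in memory plays the database server, and the product table, the 10% tax rule, and the function names are all hypothetical.

```python
import sqlite3

# --- Database server tier (SQLite in memory stands in for a real database server) ---
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER)")
db.executemany("INSERT INTO product VALUES (?, ?, ?)", [(1, "pen", 150), (2, "notebook", 450)])

# --- Middle tier: business logic; the only code that touches the database ---
def get_price_with_tax(product_id, tax_rate_percent=10):
    row = db.execute(
        "SELECT name, price_cents FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return None
    name, price_cents = row
    # Integer arithmetic in cents avoids floating-point rounding of money.
    return {"name": name, "total_cents": price_cents + price_cents * tax_rate_percent // 100}

# --- Client tier: user interface only; it calls the middle tier, never the database ---
def render(product_id):
    info = get_price_with_tax(product_id)
    if info is None:
        return "Product not found"
    return f"{info['name']}: ${info['total_cents'] / 100:.2f}"

print(render(1))
print(render(99))
```

Because render never sees the connection object, the client cannot issue arbitrary SQL, which mirrors the security point above about the database server being reachable only through the middle tier.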
Three-tier client-server architecture
Classification of DBMSs
• Based on the data model used
• Legacy: Network, Hierarchical.
• Currently Used: Relational, Object-oriented, Object-relational
• Recent Technologies: NOSQL systems (document-based, column-based, graph-based,
and key-value storage systems) and native XML DBMSs.
• Other classifications
• Single-user (typically used with personal computers)
vs. multi-user (most DBMSs).
• Centralized (uses a single computer with one database) vs. distributed
(multiple computers, multiple DBs)
Variations of Distributed DBMSs (DDBMSs)
• Homogeneous DDBMS
• Heterogeneous DDBMS
• Federated or Multidatabase Systems
• Participating Databases are loosely coupled with high degree of autonomy.
History of Data Models (Additional Material)
• Network Model
• Hierarchical Model
• Relational Model
• Object-oriented Data Models
• Object-Relational Models
Homework
• Exercise Lab 1
• Submit a couple of screenshots that show your connection
1. One is from command line “mysql”
2. Another is from MySQL Workbench
• HW 1
• What is a database?
• SQL Workshop 1: SELECT
• Submit WS1-3.sql
• Read
• [Text1]: Ch 1
• [Text2]: Ch 1, 2
CS 5200 Fall 2023 by Lee
Resources for today’s lecture slides
• Database Systems, by Thomas Connolly and Carolyn Begg, Pearson Education © 2014
• Modern Database Management, 13th Edition by Jeff Hoffer
• Fundamentals of Database Systems, 7th Edition, by Elmasri
• Slides from Prof. Jeongkyu Lee, Northeastern University.
• Install or sign up for a data modeling tool
[Optional Assignments]
• WATCH: Database Basics
• https://northeastern.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=b276e489-2c6e-4a3f-a46a-abe700f259a4&start=0
• WATCH: Databases, Cloud Computing, and IoT
• https://northeastern.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=30150544-a427-4328-a324-abe900feb820&start=0
• WATCH: Multi-Tier Data Architectures
• https://northeastern.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=bafae0e5-5216-4ab2-9f2d-abf7013aaf81&start=0
• WATCH: SQL, ERDs, UML, Oh My...
• https://northeastern.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=28ad6c7b-2a66-494b-898e-ac52016700c4&start=0
• WATCH: Overview of Database Architectures
• https://northeastern.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=34a80023-027e-4436-a92b-abe701817dff&start=0
[Additional Resources]
• [Course] Programming Foundations: Databases by Scott Simpson on
LinkedIn Learning (free trial month or paid subscription)
• https://www.linkedin.com/learning/programming-foundations-databases-2/
• [Wiki Article] Database
• https://en.wikipedia.org/wiki/Database