
Introduction To Databases Part 1

Uploaded by

stefanrowlings
Copyright
© All Rights Reserved


April 14, 2024

DATABASE
MANAGEMENT
SYSTEMS
(DBMS)

Recommended Course Material


• Recommended textbooks:
• 'Database Systems: A Practical Approach to Design, Implementation, and Management' by Connolly and Begg
• 'Fundamentals of Database Systems, 2nd Edition' by Ramez Elmasri and Shamkant B. Navathe

Class Outline

Introduction to Database Concepts

Types of Databases

DBMS Architecture

Data Models

Data Model schema



INTRODUCTION TO DATABASE
CONCEPTS
Databases and database technology are having a major impact
on the growing use of computers.
• Library catalogues
• Train timetables
• Medical records
• Airline bookings
• Bank accounts
• Credit card details
• Stock control
• Student records
• Personnel systems
• Customer histories
• Product catalogues
• Stock market prices
• Telephone directories
• Discussion boards
• Web indexes

• The word database is in such common use that we must begin by defining what a database is.
Database (DB):
• A database is a shared collection of logically related data, together with a description of that data, designed to meet the information needs of an organization.

• It is a single repository of data that can be used simultaneously by many departments and users.
• All the data items are integrated with a minimum amount of
duplication.
• Because a database holds not only the organization’s operational data but also a description of that data, a database can be defined as a self-describing collection of integrated records.
• The description of the data is known as the system catalog (also called the data dictionary or metadata, i.e., data about data).
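The self-describing property can be seen in a small sketch using Python's built-in sqlite3 module. SQLite is only one possible DBMS, and the table name student and its columns are assumptions chosen for illustration. The same database returns both the operational data and its own description, which SQLite keeps in a catalog table named sqlite_master.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student (name) VALUES ('Ada')")

# Operational data: the rows the organization actually works with.
rows = conn.execute("SELECT id, name FROM student").fetchall()

# Description of the data: SQLite stores it in its catalog table, sqlite_master.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(rows)     # the data
print(catalog)  # the data about the data
```

Other DBMSs expose their catalog differently, so the query against sqlite_master is specific to SQLite.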
• A database has the following implicit properties:
i. A database represents some aspect of the real world,
sometimes called the mini-world or the universe of
discourse. Changes to the mini-world are reflected in the
database.
ii. A database is a logically coherent collection of data with
some inherent meaning. A random assortment of data
cannot correctly be referred to as a database.
iii. A database is designed, built, and populated with data for a
specific purpose. It has an intended group of users and
some preconceived applications in which these users are
interested.

Types of Databases: Centralized Database


• A centralized database stores all of its data in a single, central database system.
• It allows users to access the stored data from different locations through several applications.
• These applications include an authentication process so that users access data securely.
• An example is a central library that maintains one database covering every library in a college or university.

Types of Databases: Centralized Database- Advantages


1. It reduces the risk in data management, i.e., manipulating data does not affect the core data.
2. Data consistency is maintained as it manages data in a
central repository.
3. It provides better data quality, which enables organizations
to establish data standards.
4. It is less costly because fewer vendors are required to handle
the data sets.

Types of Databases: Centralized Database- Disadvantages


1. The size of the centralized database is large, which
increases the response time for fetching the data.
2. It is not easy to update such an extensive database
system.
3. If the central server fails, the entire data set may be lost, which could be a huge loss.

Types of Databases: Distributed Database


• Data is distributed among different database systems of
an organization. These database systems are connected
via communication links. Such links help the end-users
to access the data easily.
• Examples of the Distributed database are Apache
Cassandra, HBase, Ignite, etc.
Types of Databases: Distributed Database
• We can further divide distributed database systems into:
• Homogeneous DDB: database systems that run on the same operating system, use the same application processes, and run on the same hardware devices.
• Heterogeneous DDB: database systems that run on different operating systems, under different application procedures, and on different hardware devices.

Types of Databases: Relational Database


• This database is based on the relational data model, which stores data in rows (tuples) and columns (attributes) that together form a table (relation).
• A relational database uses SQL for storing, manipulating, and maintaining the data. E. F. Codd proposed the relational model in 1970. Each table has a key that uniquely identifies each of its rows.
• Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
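A minimal sketch of a relation, its key, and SQL access, using Python's built-in sqlite3 module. The account table, its columns, and the sample rows are assumptions made for illustration. The primary key rejects a second row with the same account number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relation (table) with three attributes (columns); account_no is the key.
conn.execute(
    "CREATE TABLE account (account_no INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
)

# Each inserted row is one tuple of the relation.
conn.executemany(
    "INSERT INTO account (account_no, owner, balance) VALUES (?, ?, ?)",
    [(1, "Alice", 100.0), (2, "Bob", 50.0)],
)

# The key keeps rows unique: reusing account_no 1 is rejected.
try:
    conn.execute("INSERT INTO account VALUES (1, 'Mallory', 0.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

owners = [row[0] for row in conn.execute("SELECT owner FROM account ORDER BY account_no")]
print(duplicate_rejected, owners)
```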

ACID Properties of Relational Databases


• A means Atomicity: A transaction either completes in full or has no effect at all. It follows the 'all or nothing' strategy. For example, a transaction will either be committed or will abort.
• C means Consistency: Every operation must take the data from one valid state to another, so the rules that hold before the operation still hold after it. For example, when money is transferred between accounts, the total balance before and after the transaction should remain conserved.

ACID Properties of Relational Databases


• I means Isolation: Many users can access data in the database at the same time, so concurrent transactions must remain isolated from one another. For example, when multiple transactions run at the same time, the effects of one transaction should not be visible to the others until it commits.
• D means Durability: Once an operation completes and its data is committed, the changes remain permanent.
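Atomicity and consistency can be sketched with a money transfer in Python's built-in sqlite3 module. The account table, the CHECK rule that balances may not go negative, and the transfer helper are all assumptions made for this example. A transfer that would overdraw the source account fails as a whole, so the credit that was already applied to the destination is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it aborts
print(ok, failed, balances(conn))
```

The total of both balances is the same before and after each call, which is the conservation described under Consistency.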

Types of Databases: NoSQL Database


• Non-SQL/Not Only SQL is a type of database that is used for
storing a wide range of data sets.
• It is not a relational database as it stores data not only in
tabular form but in several different ways.
• It came into existence when the demand for building modern
applications increased.
• Thus, NoSQL presented a wide variety of database
technologies in response to the demands.
Current Trends

Big Data
• Volume: terabytes → zettabytes
• Variety: structured → structured and unstructured
data
• Velocity: batch processing → streaming data
• …
Big users
• Population online, hours spent online, devices online,

• Rapidly growing companies / web applications
Even millions of users within a few months
Current Trends
Everything is in the cloud
• SaaS: Software as a Service
• PaaS: Platform as a Service
• IaaS: Infrastructure as a Service
Processing paradigms
• OLTP: Online Transaction Processing
• OLAP: Online Analytical Processing
• …but also…
• RTAP: Real-Time Analytic Processing
Current Trends

Data assumptions
• Data format is becoming unknown or inconsistent
• Data updates are no longer frequent
• Data is expected to be replaced
• Linear growth → unpredictable exponential
growth
• Strong consistency is no longer mission-critical
• Read requests often prevail over write requests
Current Trends

⇒ A new approach is required


• Relational databases simply do not follow the current
trends
• Key technologies
• Distributed file systems
• NoSQL databases
• MapReduce and other programming models
• Data warehouses
• Grid computing, cloud computing
• Large-scale machine learning
NoSQL Databases

What does NoSQL actually mean?


A bit of history …
• 1998
First used for a relational database that omitted
usage of SQL
• 2009
First used during a conference to advocate non-
relational databases
So?
• NoSQL is an accidental term with no precise definition

• NoSQL stands for:
1. No Relational
2. No RDBMS
3. Not Only SQL

• NoSQL is an umbrella term for all databases and data stores that
don’t follow the RDBMS principles
• A class of products
• A collection of several (related) concepts about data storage and
manipulation
• Often related to large data sets
NoSQL Databases
What does NoSQL actually mean?
NoSQL movement = The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for.
NoSQL databases = Next-generation databases mostly addressing: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent, a huge data amount, and more.
Types of Databases: NoSQL Database
NoSQL represents a new incarnation
• Due to massively scalable Internet applications
• Based on distributed and parallel computing
• Development
• Started with Google
• First research paper published in 2003
• Thanks to Lucene's developers/Apache (Hadoop) and Amazon (Dynamo)
• Then a lot of products and interest came from Facebook, Netflix, Yahoo, eBay, Hulu, IBM, and many more

Types of Databases: NoSQL Database


• NoSQL comes from the Internet, thus it is often related
to the “big data” concept
• How big is “big data”?
• Over a few terabytes
• Enough to start spanning multiple storage units
NoSQL - Market Share, Competitor Insights in NoSQL Databases
(6sense.com)
What is Big Data?

Buzzword?
Bubble?
Gold rush?
Revolution?

Dan Ariely:
Big Data is like teenage sex: everyone talks about it, nobody really
knows how to do it, and everyone thinks everyone else is doing it, so
everyone claims they are doing it.
Where is Big Data?
Sources of Big Data
• Social media and networks
…all of us are generating data
• Scientific instruments
…collecting all sorts of data
• Mobile devices
…tracking all objects all the time
• Sensor technology and networks
…measuring all kinds of data

Big Data Characteristics

[Figures: Volume (Scale), Variety (Complexity), Velocity (Speed), Veracity (Uncertainty). Source: http://www.ibmbigdatahub.com/]

Big Data Characteristics
Basic 4V

• Volume(Scale)
Data volume is increasing exponentially, not linearly
Even large amounts of small data can result in Big Data
• Variety(Complexity)
Various formats, types, and structures
(from semi-structured XML to unstructured multimedia)
• Velocity(Speed)
Data is being generated fast and needs to be processed fast
• Veracity (Uncertainty)
Uncertainty due to inconsistency, incompleteness, latency,
ambiguities, or approximations

Big Data Characteristics
Additional V

• Value
Business value of the data (needs to be
revealed)
• Validity
Data correctness and accuracy with respect to the intended use
• Volatility
Period of time the data is valid and should be
maintained

Types of Databases: NoSQL Database


• How did we get here?
1. Explosion of social media sites (Facebook, Twitter) with
large data needs
2. Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
3. Just as programming moved to dynamically typed languages (Python, Ruby, Groovy), data shifted to being dynamically typed, with frequent schema changes
4. Open-source community

Types of Databases: NoSQL Database


Why RDBMS are not suitable for large data
The context is the Internet
• RDBMSs assume that data are
• Dense
• Largely uniform (structured data)
• Data coming from the Internet is
• Massive and sparse
• Semi-structured or unstructured
• With massive sparse data sets, the typical storage mechanisms
and access methods get stretched

Types of Databases: NoSQL Database


NoSQL Characteristics
1. Large data volumes
• Google’s “big data”
2. Scalable replication and distribution
• Potentially thousands of machines
• Potentially distributed around the world
3. Queries need to return answers quickly
• Mostly query, few updates
41

Types of Databases: NoSQL Database


NoSQL Characteristics
4. Asynchronous Inserts & Updates
• Schema-less
5. ACID transaction properties are not needed –
BASE (Basically Available, Soft State, Eventually Consistent)
6. Open source development

Types of Databases: NoSQL Database


• We can further divide a NoSQL database into the following
four types:
1. Key-Value Stores
Data model
• The simplest NoSQL database type
Works as a simple hash table (mapping)
• Key-value pairs
Key (id, identifier, primary key)
Value: binary object, black box for the database system
Query patterns
• Create, update or remove value for a given key
• Get value for a given key
Characteristics
• Simple model ⇒ great performance, easily scaled, …
• Simple model ⇒ not for complex queries nor complex data
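The data model and query patterns above can be sketched as a thin wrapper around a hash table. The KeyValueStore class and its method names are hypothetical and do not correspond to the API of any real product. The store treats each value as an opaque block of bytes, so interpreting it is left to the application.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store: values are opaque bytes to the store."""

    def __init__(self):
        self._data = {}  # plain hash table: key -> value

    def put(self, key, value: bytes):
        # Create or update the value for a given key.
        self._data[key] = value

    def get(self, key, default=None):
        # Look up the value for a given key; the only way to reach a value.
        return self._data.get(key, default)

    def delete(self, key):
        # Remove the value for a given key (no error if it is absent).
        self._data.pop(key, None)


store = KeyValueStore()
profile = {"name": "Ada", "theme": "dark"}

# The application, not the store, decides how the value is encoded.
store.put("user:1", json.dumps(profile).encode("utf-8"))
loaded = json.loads(store.get("user:1").decode("utf-8"))

store.delete("user:1")
missing = store.get("user:1")
print(loaded, missing)
```

Because the store never looks inside a value, a request such as "all users with the dark theme" would require scanning and decoding every value in application code.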
1. Key-Value Stores

Suitable use cases


• Session data, user profiles, user preferences, shopping carts, …
I.e. when values are only accessed via keys
When not to use
• Relationships among entities
• Queries requiring access to the content of the value part
• Set operations involving multiple key-value pairs
Examples
• Redis, MemcachedDB, Riak KV, Hazelcast, Ehcache, Amazon
SimpleDB, Berkeley DB, Oracle NoSQL, Infinispan, LevelDB,
Ignite, Project Voldemort, OrientDB, ArangoDB
2. Document Stores

Data model
• Documents
Self-describing
Hierarchical tree structures (JSON, XML, …)
– Scalar (single) values, maps, lists, sets, nested
documents, …
Identified by a unique identifier (key, …)
• Documents are organized into collections
Query patterns
• Create, update or remove a document
• Retrieve documents according to complex query conditions
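A minimal sketch of the document model, using plain Python dictionaries in place of a real document store. The articles collection, its fields, and the helpers get_path and find are assumptions made for illustration. Unlike a key-value store, the query looks inside the documents, including nested fields.

```python
# Hypothetical collection: documents keyed by a unique identifier.
articles = {
    "a1": {
        "title": "Intro to NoSQL",
        "author": {"name": "Ada", "country": "UK"},
        "tags": ["nosql", "intro"],
    },
    "a2": {
        "title": "SQL Basics",
        "author": {"name": "Bob", "country": "US"},
        "tags": ["sql"],
    },
    # Documents in one collection need not share exactly the same fields.
    "a3": {"title": "Graph Stores", "author": {"name": "Ada", "country": "UK"}},
}


def get_path(document, path):
    """Follow a dotted path such as 'author.name' into nested maps."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, conditions):
    """Return the ids of documents whose fields equal every given condition."""
    return sorted(
        doc_id
        for doc_id, doc in collection.items()
        if all(get_path(doc, path) == expected for path, expected in conditions.items())
    )


by_ada = find(articles, {"author.name": "Ada"})
uk_intro = find(articles, {"author.country": "UK", "title": "Intro to NoSQL"})
print(by_ada, uk_intro)
```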
2. Document Stores

Suitable use cases


• Event logging, content management systems, blogs,
web analytics, e-commerce applications, …
I.e. for structured documents with similar schema
When not to use
• Set operations involving multiple documents
• The design of document structure is constantly
changing
3. Wide Column Stores
Data model
• Column family (table)
The table is a collection of similar rows (not necessarily identical)
• Row
A row is a collection of columns. It should encompass a group of data that is accessed together, and it is associated with a unique row key
• Column
A column consists of a column name and a column value (and possibly other metadata records)

3. Wide Column Stores

Query patterns
• Create, update or remove a row within a given
column family
• Select rows according to a row key or simple
conditions
Warning
• Wide column stores are not just a special kind
of RDBMSs with a variable set of columns!
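The column family, row, and column concepts can be sketched with nested dictionaries. The ColumnFamily class, the users family, and the choice of a timestamp as the extra metadata are assumptions made for this example. Each row carries its own set of columns, so two rows of the same family may have different column names.

```python
import time


class ColumnFamily:
    """Toy column family: row key -> {column name -> (value, timestamp)}."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def put(self, row_key, column, value):
        # Create or update one column of a row; the row is created on demand.
        row = self._rows.setdefault(row_key, {})
        row[column] = (value, time.time())

    def get_row(self, row_key):
        # Select a row by its row key and return its columns without metadata.
        row = self._rows.get(row_key, {})
        return {column: value for column, (value, _timestamp) in row.items()}

    def delete_row(self, row_key):
        self._rows.pop(row_key, None)


users = ColumnFamily("users")
users.put("u1", "name", "Ada")
users.put("u1", "email", "ada@example.org")
users.put("u2", "name", "Bob")
users.put("u2", "last_login", "2024-04-14")  # a column that row u1 does not have

print(users.get_row("u1"))
print(users.get_row("u2"))
```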
3. Wide Column Stores

Suitable use cases


• Event logging, content management systems, blogs, …
I.e. for structured flat data with similar schema
When not to use
• ACID transactions are required
• Complex queries: aggregation (SUM, AVG, …), joining,

• Early prototypes: i.e. when database design may
change
Examples
• Apache Cassandra, Apache HBase, Apache
Accumulo, Hypertable, Google Bigtable
4. Graph Databases

Data model
• Property graphs
Directed / undirected graphs, i.e. collections of …
– nodes (vertices) for real-world entities, and
– relationships (edges) between these nodes
Both the nodes and relationships can be
associated with additional properties
Types of databases
• Non-transactional = small number of very large
graphs
• Transactional = large number of small graphs
4. Graph Databases
Query patterns
• Create, update, or remove a node / relationship in a
graph
• Graph algorithms (shortest paths, spanning trees, …)
• General graph traversals
• Sub-graph queries or super-graph queries
• Similarity-based queries (approximate
matching)
• Examples
• Neo4j, Titan, Apache Giraph, InfiniteGraph, FlockDB,
OrientDB, OpenLink Virtuoso, ArangoDB
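One of the query patterns listed above, a shortest-path search, can be sketched on a small property graph. The nodes, the FOLLOWS relationships, and the shortest_path helper are assumptions made for illustration; the search is a plain breadth-first traversal, not the algorithm of any particular graph database.

```python
from collections import deque

# Nodes and relationships both carry properties (a property graph).
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "dave": {"label": "Person"},
    "erin": {"label": "Person"},
}
edges = [
    ("alice", "bob", {"type": "FOLLOWS"}),
    ("bob", "carol", {"type": "FOLLOWS"}),
    ("alice", "dave", {"type": "FOLLOWS"}),
    ("dave", "erin", {"type": "FOLLOWS"}),
    ("erin", "carol", {"type": "FOLLOWS"}),
]

# Adjacency list for the directed graph: source node -> list of target nodes.
adjacency = {name: [] for name in nodes}
for source, target, _properties in edges:
    adjacency[source].append(target)


def shortest_path(start, goal):
    """Breadth-first search: returns a path with the fewest relationships, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in adjacency[path[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(shortest_path("alice", "carol"))
print(shortest_path("carol", "alice"))
```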
4. Graph Databases
Suitable use cases
• Social networks, routing, dispatch, and location-based
services, recommendation engines, chemical
compounds, biological pathways, linguistic trees, …
I.e. simply for graph structures
When not to use
• Extensive batch operations are required
Multiple nodes/relationships are to be affected
Types of Databases: NoSQL Database

NoSQL Databases
• Document Stores
• Graph Databases
• Key-Value Stores
• Columnar Databases

Source: http://nosql-database.org/
Types of Databases: NoSQL Database
BASE Transactions
• Acronym contrived to be the opposite of ACID
• Basically Available,
• Soft state,
• Eventually Consistent
• Characteristics
• Weak consistency – stale data OK
• Availability first
• Best effort
• Approximate answers OK
• Aggressive (optimistic)
• Simpler and faster
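The "stale data OK" behaviour can be sketched with a toy store that has several replicas. The EventuallyConsistentStore class and its sync method are hypothetical; real systems replicate over a network in the background, which is modelled here by an explicit call. A write is acknowledged as soon as the first replica has it, so a read from another replica may briefly return old data.

```python
class EventuallyConsistentStore:
    """Toy model: writes reach replica 0 at once and the others only on sync()."""

    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]
        self._pending = []  # (replica index, key, value) updates not yet applied

    def write(self, key, value):
        # Availability first: accept the write after updating a single replica.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self._pending.append((index, key, value))

    def read(self, key, replica=0):
        # Reads are served by whichever replica is asked, fresh or stale.
        return self.replicas[replica].get(key)

    def sync(self):
        # Stand-in for background replication: apply all queued updates.
        for index, key, value in self._pending:
            self.replicas[index][key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("cart:42", ["book"])

stale = store.read("cart:42", replica=1)  # replica 1 has not seen the write yet
store.sync()
fresh = store.read("cart:42", replica=1)  # all replicas now agree
print(stale, fresh)
```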
Features of NoSQL Databases

1. Data model
• Traditional approach: relational model
• (New) possibilities:
• Key-value, document, wide column, graph
• Goal
Respect the real-world nature of data
(i.e. data structure and mutual relationships)
Features of NoSQL Databases
2. Aggregate structure
• Aggregate definition
Data unit with a complex structure
Collection of related data pieces we wish to treat as
a unit
• Examples
Value part of key-value pairs in key-value stores
Document in document stores
Row of a column family in wide column stores
Features of NoSQL Databases

3. Elastic scaling
• Traditional approach: scaling-up
Buying bigger servers as database load increases
• New approach: scaling-out
Distributing database data across multiple hosts
4. Data distribution
• Sharding
The way in which database data is split into separate groups (shards)
• Replication
Maintaining several data copies (performance, recovery)
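Sharding and replication can be sketched together. Hash-based placement is only one possible sharding strategy, and the functions shard_for and place, the shard count, and the rule of copying to the next shard are assumptions made for this example. Each key is hashed to pick its shard, and a second copy is kept on the following shard.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def place(key, num_shards, replicas=2):
    """Return the shards holding a key: its home shard plus the next ones."""
    first = shard_for(key, num_shards)
    return [(first + offset) % num_shards for offset in range(replicas)]


NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one host

user_ids = ["u1", "u2", "u3", "u4", "u5"]
for user_id in user_ids:
    for shard_index in place(user_id, NUM_SHARDS):
        shards[shard_index][user_id] = {"id": user_id}

for index, shard in enumerate(shards):
    print(index, sorted(shard))
```

With simple modulo placement, changing the number of shards moves most keys to a different shard, which is one reason real systems often use other partitioning schemes.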
Features of NoSQL Databases
5. Automated processes
• Traditional approach
Expensive and highly trained database administrators
• New approach: automatic recovery, distribution, tuning, …
6. Relaxed consistency
• Traditional approach
Strong consistency (ACID properties and transactions)
• New approach
Eventual consistency only (BASE properties)
I.e. we have to make trade-offs because of the data
distribution
Features of NoSQL Databases

7. Schemalessness
• Relational databases
Database schema present and strictly enforced
• NoSQL databases
Relaxed schema or completely missing
Consequences: higher flexibility
– Dealing with non-uniform data
– Structural changes cause no overhead
However: there is (usually) an implicit schema
– We must know the data structure at the application
level anyway
Features of NoSQL Databases

8. Open source
• Often community and enterprise versions
(with extended features or extent of support)
9. Simple APIs
• Often state-less application interfaces (HTTP)
Features of NoSQL Databases
Advantages
• Scaling
Horizontal distribution of data among hosts
• Volume
High volumes of data that cannot be handled by RDBMS
• Administrators
No longer needed because of the automated maintenance
• Economics
Usage of cheap commodity servers, lower overall costs
• Flexibility
Relaxed or missing data schema, easier design changes
Features of NoSQL Databases
Challenges / Disadvantages
• Maturity
• Often still in pre-production phase with key features missing
• Support
• Mostly open source, limited sources of credibility
• Administration
• Sometimes relatively difficult to install and maintain
• Analytics
• Missing support for business intelligence and ad-hoc
querying
• Expertise
• Still low number of NoSQL experts available in the market

Conclusion

The end of relational databases?


• Certainly not
They are still suitable for most projects
Familiarity, stability, feature set, available
support, …
• However, we should also consider different
database models and systems
Polyglot persistence = usage of different
data stores in different circumstances

Types of Databases: Cloud Database
• A type of database where data is stored in a virtual
environment and executes over the cloud computing
platform.
• It provides users with various cloud computing services
(SaaS, PaaS, IaaS, BPaaS, etc.) for accessing the database.
• There are numerous cloud platforms; commonly listed options include:
1. Amazon Web Services (AWS)
2. Microsoft Azure
3. Kamatera
4. phoenixNAP
5. ScienceSoft
6. Google Cloud SQL, etc.

Public Cloud Services Comparison | A simple cloud comparison chart of all the cloud services offered by the major public cloud vendors globally. (comparecloud.in)

Types of Databases: Object Oriented Databases


• The type of database that uses the object-based data
model approach for storing data in the database
system.
• The data is represented and stored as objects which
are similar to the objects used in the object-oriented
programming language.

Types of Databases: Hierarchical Databases


• A hierarchical database stores data as nodes in parent-child relationships.
• It organizes data in a tree-like structure.
• Data is stored as records that are connected via links.
• Each child record in the tree has exactly one parent.
• On the other hand, each parent record can have multiple child records.
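The one-parent rule can be sketched by storing, for each record, a single link to its parent. The department names and the helpers children_of and path_to_root are assumptions made for illustration. Because a dictionary holds exactly one value per key, a record cannot have two parents, while one parent may appear as the value for many children.

```python
# Each child record links to exactly one parent; the root has no entry.
parent_of = {
    "Engineering": "Company",
    "Sales": "Company",
    "Backend": "Engineering",
    "Frontend": "Engineering",
}


def children_of(record):
    """All child records of a parent; a parent may have many."""
    return sorted(child for child, parent in parent_of.items() if parent == record)


def path_to_root(record):
    """Follow the parent links upward until the root record is reached."""
    path = [record]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


print(children_of("Company"))
print(path_to_root("Backend"))
```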

Types of Databases: Network Databases


• It is the database that typically follows the network data
model.
• Here, the representation of data is in the form of nodes
connected via links between them.
• Unlike the hierarchical database, it allows each record to
have multiple children and parent nodes to form a
generalized graph structure.
Types of Databases: Personal Databases
• Collecting and storing data on the user's system
defines a Personal Database.
• This database is basically designed for a single
user.
• Advantage of Personal Database
• It is simple and easy to handle.
• It occupies less storage space as it is small in size.

Types of Databases: Operational Databases


• A type of database in which data is created and updated in real time.
• It is designed for executing and handling the daily data operations of a business.
• For example, an organization uses operational databases to manage its day-to-day transactions.
Types of Databases: Enterprise Databases

• Large organizations or enterprises use this type of database to manage massive amounts of data. It helps organizations increase and improve their efficiency. Such a database allows many users to access it simultaneously.
• Advantages of Enterprise Database:
• It supports multiple processes running against the database.
• It allows parallel queries to be executed on the system.