Introduction To Databases Part 1
April 14, 2024
DATABASE MANAGEMENT SYSTEMS (DBMS)
Class Outline
Types of Databases
DBMS Architecture
Data Models
INTRODUCTION TO DATABASE CONCEPTS
Databases and database technology are having a major impact
on the growing use of computers.
• Library catalogues
• Train timetables
• Medical records
• Airline bookings
• Bank accounts
• Credit card details
• Stock control
• Student records
• Personnel systems
• Customer histories
• Product catalogues
• Stock market prices
• Telephone directories
• Discussion boards
• Web indexes
Class Outline
Introduction
Types of Databases
DBMS Architecture
Data Models
Representatives
Big Data
• Volume: terabytes → zettabytes
• Variety: structured → structured and unstructured data
• Velocity: batch processing → streaming data
• …
Big users
• Population online, hours spent online, devices online,
…
• Rapidly growing companies / web applications
Even millions of users within a few months
Current Trends
Everything is in the cloud
• SaaS: Software as a Service
• PaaS: Platform as a Service
• IaaS: Infrastructure as a Service
Processing paradigms
• OLTP: Online Transaction Processing
• OLAP: Online Analytical Processing
• …but also…
• RTAP: Real-Time Analytic Processing
Current Trends
Data assumptions
• Data format is becoming unknown or inconsistent
• Data updates are no longer frequent
• Data is expected to be replaced
• Linear growth → unpredictable exponential growth
• Strong consistency is no longer mission-critical
• Read requests often prevail over write requests
Current Trends
2. No RDBMS
3. Not Only SQL
• NoSQL is an umbrella term for all databases and data stores that
don’t follow the RDBMS principles
• A class of products
• A collection of several (related) concepts about data storage and
manipulation
• Often related to large data sets
NoSQL Databases
What does NoSQL actually mean?
NoSQL movement = The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for.
NoSQL databases = Next-generation databases mostly addressing the following points: being non-relational, distributed, open-source, and horizontally scalable. The original intention was modern web-scale databases. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent, a huge amount of data, and more.
Types of Databases: NoSQL Database
NoSQL represents a new incarnation
• Due to massively scalable Internet applications
• Based on distributed and parallel computing
• Development
• Started with Google
• First research paper published in 2003
• Thanks to Lucene's developers/Apache (Hadoop) and Amazon (Dynamo)
Source: NoSQL - Market Share, Competitor Insights in NoSQL Databases (6sense.com)
• Then a lot of products and interest came from Facebook, Netflix, Yahoo, eBay, Hulu, IBM, and many more
Buzzword?
Bubble?
Gold rush?
Revolution?
Dan Ariely:
Big Data is like teenage sex: everyone talks about it, nobody really
knows how to do it, and everyone thinks everyone else is doing it, so
everyone claims they are doing it.
Where is Big Data?
Sources of Big Data
• Social media and networks
…all of us are generating data
• Scientific instruments
…collecting all sorts of data
• Mobile devices
…tracking all objects all the time
• Sensor technology and networks
…measuring all kinds of data
Big Data Characteristics
Volume
(Scale)
Source: http://www.ibmbigdatahub.com/
Big Data Characteristics
Variety
(Complexity)
Source: http://www.ibmbigdatahub.com/
Big Data Characteristics
Velocity
(Speed)
Source: http://www.ibmbigdatahub.com/
Big Data Characteristics
Veracity
(Uncertainty)
Source: http://www.ibmbigdatahub.com/
Big Data Characteristics
Basic 4V
• Volume (Scale)
Data volume is increasing exponentially, not linearly
Even large amounts of small data can result in Big Data
• Variety (Complexity)
Various formats, types, and structures
(from semi-structured XML to unstructured multimedia)
• Velocity (Speed)
Data is being generated fast and needs to be processed fast
• Veracity (Uncertainty)
Uncertainty due to inconsistency, incompleteness, latency, ambiguities, or approximations
Big Data Characteristics
Additional V
• Value
Business value of the data (needs to be
revealed)
• Validity
Representatives
2. Document Stores
Data model
• Documents
Self-describing
Hierarchical tree structures (JSON, XML, …)
– Scalar (single) values, maps, lists, sets, nested
documents, …
Identified by a unique identifier (key, …)
• Documents are organized into collections
Query patterns
• Create, update or remove a document
• Retrieve documents according to complex query conditions
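The document model and query patterns above can be illustrated with a minimal in-memory sketch. The class `DocumentCollection` and its method names are hypothetical and chosen only for illustration; they are not the API of any real document store.

```python
import uuid
from typing import Any, Callable, Dict, List, Optional

Document = Dict[str, Any]


class DocumentCollection:
    """Toy collection of self-describing documents (hypothetical, in-memory)."""

    def __init__(self) -> None:
        # Each document is stored under a unique identifier (the key).
        self._docs: Dict[str, Document] = {}

    def insert(self, doc: Document, doc_id: Optional[str] = None) -> str:
        """Create a document; generate an identifier if none is given."""
        doc_id = doc_id or uuid.uuid4().hex
        self._docs[doc_id] = doc
        return doc_id

    def update(self, doc_id: str, changes: Document) -> None:
        """Update top-level fields of an existing document."""
        self._docs[doc_id].update(changes)

    def remove(self, doc_id: str) -> None:
        """Remove a document if it exists."""
        self._docs.pop(doc_id, None)

    def find(self, predicate: Callable[[Document], bool]) -> List[Document]:
        """Retrieve all documents that satisfy an arbitrary condition."""
        return [doc for doc in self._docs.values() if predicate(doc)]


people = DocumentCollection()
# Documents in one collection may differ in structure (no "tags" for Lin).
people.insert(
    {"name": "Ada", "address": {"city": "London"}, "tags": ["math", "computing"]},
    doc_id="p1",
)
people.insert({"name": "Lin", "address": {"city": "Osaka"}}, doc_id="p2")
people.update("p2", {"tags": ["databases"]})

# A query condition that reaches into a nested document.
in_london = people.find(lambda d: d.get("address", {}).get("city") == "London")
```

Real document stores add indexes and a query language on top of this idea; the sketch only shows nested documents, unique identifiers, and condition-based retrieval.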
2. Document Stores
Examples
3. Wide Column Stores
Data model
• Column family (table)
The table is a collection of similar (not necessarily identical) rows
• Row
A row is a collection of columns. It should encompass a group of data that is accessed together, and is associated with a unique row key
• Column
A column consists of a column name and a column value (and possibly other metadata records)
3. Wide Column Stores
Query patterns
• Create, update or remove a row within a given
column family
• Select rows according to a row key or simple
conditions
Warning
• Wide column stores are not just a special kind
of RDBMSs with a variable set of columns!
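A minimal sketch of the wide column data model, assuming a plain nested dictionary as storage. `ColumnFamily` and its methods are hypothetical names used only for illustration; the write timestamp stands in for the "other metadata records" a column may carry.

```python
import time
from typing import Any, Dict, Tuple

# A column value together with its metadata (here: a write timestamp).
Cell = Tuple[Any, float]


class ColumnFamily:
    """Toy column family: row key -> {column name -> (value, timestamp)}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._rows: Dict[str, Dict[str, Cell]] = {}

    def put(self, row_key: str, column: str, value: Any) -> None:
        """Create or update one column of a row."""
        self._rows.setdefault(row_key, {})[column] = (value, time.time())

    def get_row(self, row_key: str) -> Dict[str, Any]:
        """Select a row by its key; returns only the column values."""
        row = self._rows.get(row_key, {})
        return {column: value for column, (value, _ts) in row.items()}

    def delete_row(self, row_key: str) -> None:
        """Remove a whole row if it exists."""
        self._rows.pop(row_key, None)


users = ColumnFamily("users")
users.put("u1", "name", "Ada")
users.put("u1", "email", "ada@example.org")
# Rows in the same column family need not share the same columns.
users.put("u2", "name", "Lin")
users.put("u2", "last_login", "2024-04-14")
```

Each row carries its own set of columns, which is why access is organized around the row key rather than around a fixed table schema.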
3. Wide Column Stores
Examples
4. Graph Databases
Data model
• Property graphs
Directed / undirected graphs, i.e. collections of …
– nodes (vertices) for real-world entities, and
– relationships (edges) between these nodes
Both the nodes and relationships can be
associated with additional properties
Types of databases
• Non-transactional = small number of very large
graphs
• Transactional = large number of small graphs
4. Graph Databases
Query patterns
• Create, update, or remove a node / relationship in a
graph
• Graph algorithms (shortest paths, spanning trees, …)
• General graph traversals
• Sub-graph queries or super-graph queries
• Similarity-based queries (approximate
matching)
• Examples
• Neo4j, Titan, Apache Giraph, InfiniteGraph, FlockDB,
OrientDB, OpenLink Virtuoso, ArangoDB
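The property graph model and one of the graph algorithms listed above (shortest paths) can be sketched as follows. `PropertyGraph` is a hypothetical in-memory class for illustration, not the interface of any of the products named; the shortest path is computed by breadth-first search over directed relationships, counting hops.

```python
from collections import deque
from typing import Any, Dict, List, Optional, Tuple


class PropertyGraph:
    """Toy directed property graph (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}  # node id -> properties
        self.edges: Dict[str, List[Tuple[str, Dict[str, Any]]]] = {}

    def add_node(self, node_id: str, **properties: Any) -> None:
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_relationship(self, source: str, target: str, **properties: Any) -> None:
        # Relationships carry their own properties, just like nodes.
        self.edges.setdefault(source, []).append((target, properties))

    def shortest_path(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search; returns the node ids on a fewest-hops path."""
        previous: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path: List[str] = []
                node: Optional[str] = current
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for target, _properties in self.edges.get(current, []):
                if target not in previous:
                    previous[target] = current
                    queue.append(target)
        return None


graph = PropertyGraph()
for person in ("alice", "bob", "carol"):
    graph.add_node(person, kind="person")
graph.add_relationship("alice", "bob", type="FOLLOWS", since=2020)
graph.add_relationship("bob", "carol", type="FOLLOWS", since=2022)

path = graph.shortest_path("alice", "carol")
```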
4. Graph Databases
Suitable use cases
• Social networks, routing, dispatch, and location-based
services, recommendation engines, chemical
compounds, biological pathways, linguistic trees, …
I.e. simply for graph structures
When not to use
• Extensive batch operations are required
Multiple nodes/relationships are to be affected
4. Graph Databases
Examples
Types of Databases: NoSQL Database
NoSQL Databases
Source: http://nosql-database.org/
Types of Databases: NoSQL Database
BASE Transactions
• Acronym contrived to be the opposite of ACID
• Basically Available,
• Soft state,
• Eventually Consistent
• Characteristics
• Weak consistency – stale data OK
• Availability first
• Best effort
• Approximate answers OK
• Aggressive (optimistic)
• Simpler and faster
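A minimal sketch of what "eventually consistent" and "stale data OK" mean in practice, assuming one primary copy and replicas that are updated later. `EventuallyConsistentStore` and its methods are hypothetical names for illustration; real systems propagate writes in the background over a network.

```python
from typing import Any, Dict, List, Tuple


class EventuallyConsistentStore:
    """Toy store: writes hit the primary first, replicas catch up later."""

    def __init__(self, replica_count: int = 2) -> None:
        self.primary: Dict[str, Any] = {}
        self.replicas: List[Dict[str, Any]] = [{} for _ in range(replica_count)]
        self._pending: List[Tuple[str, Any]] = []  # writes not yet propagated

    def write(self, key: str, value: Any) -> None:
        # The write is acknowledged immediately (availability first).
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, index: int, key: str) -> Any:
        # May return stale or missing data until propagation has run.
        return self.replicas[index].get(key)

    def propagate(self) -> None:
        # Apply pending writes in order, so replicas converge to the primary.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("stock:apple", 5)
stale = store.read_from_replica(0, "stock:apple")   # replica has not caught up
store.propagate()
fresh = store.read_from_replica(0, "stock:apple")   # replicas have converged
```

Between the write and the propagation step, a reader of the replica sees an outdated state; once propagation finishes, all copies agree again.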
Features of NoSQL Databases
1. Data model
• Traditional approach: relational model
• (New) possibilities:
• Key-value, document, wide column, graph
• Goal
Respect the real-world nature of data
(i.e. data structure and mutual relationships)
Features of NoSQL Databases
2. Aggregate structure
• Aggregate definition
Data unit with a complex structure
Collection of related data pieces we wish to treat as
a unit
• Examples
Value part of key-value pairs in key-value stores
Document in document stores
Row of a column family in wide column stores
Features of NoSQL Databases
3. Elastic scaling
• Traditional approach: scaling-up
Buying bigger servers as database load increases
• New approach: scaling-out
Distributing database data across multiple hosts
4. Data distribution
• Sharding
The way in which database data is split into separate groups (shards)
• Replication
Maintaining several data copies (performance, recovery)
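Sharding and replication can be sketched with a simple hash-based placement scheme. This is one possible strategy under stated assumptions, not the method of any particular product: the function names, the host names, and the choice of SHA-256 are illustrative.

```python
import hashlib
from typing import List


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def replica_hosts(key: str, hosts: List[str], replication_factor: int = 2) -> List[str]:
    """Pick the host owning the key's shard plus the next hosts as replicas.

    Assumes replication_factor <= len(hosts), so that all copies land on
    distinct hosts.
    """
    first = shard_for(key, len(hosts))
    return [hosts[(first + i) % len(hosts)] for i in range(replication_factor)]


# Hypothetical host names, used only for illustration.
hosts = ["db-host-0", "db-host-1", "db-host-2", "db-host-3"]
placement = replica_hosts("user:42", hosts)
```

The same key always maps to the same hosts, so any client can locate the data without a central lookup. A plain modulo scheme like this moves many keys when the number of hosts changes, which is one reason production systems often use more elaborate placement strategies.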
Features of NoSQL Databases
5. Automated processes
• Traditional approach
Expensive and highly trained database administrators
• New approach: automatic recovery, distribution, tuning, …
6. Relaxed consistency
• Traditional approach
Strong consistency (ACID properties and transactions)
• New approach
Eventual consistency only (BASE properties)
I.e. we have to make trade-offs because of the data
distribution
Features of NoSQL Databases
7. Schemalessness
• Relational databases
Database schema present and strictly enforced
• NoSQL databases
Relaxed schema or completely missing
Consequences: higher flexibility
– Dealing with non-uniform data
– Structural changes cause no overhead
However: there is (usually) an implicit schema
– We must know the data structure at the application
level anyway
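The point about the implicit schema can be sketched as follows: the database accepts non-uniform documents, so the application code has to know which shapes exist and how to interpret them. The `Customer` type, the field names, and the two document versions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Customer:
    name: str
    emails: List[str]


def parse_customer(doc: Dict[str, Any]) -> Customer:
    """Apply the application's implicit schema to a schemaless document."""
    if "name" not in doc:
        raise ValueError("document has no 'name' field")
    # Assumed history: older documents store a single 'email' string,
    # newer documents store a list under 'emails'.
    if "emails" in doc:
        emails = list(doc["emails"])
    elif "email" in doc:
        emails = [doc["email"]]
    else:
        emails = []
    return Customer(name=doc["name"], emails=emails)


old_style = {"name": "Ada", "email": "ada@example.org"}
new_style = {"name": "Lin", "emails": ["lin@example.org", "lin@example.com"]}
customers = [parse_customer(old_style), parse_customer(new_style)]
```

Changing the structure required no change in the database, but the knowledge about both versions now lives in the application.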
Features of NoSQL Databases
8. Open source
• Often community and enterprise versions
(with extended features or extent of support)
9. Simple APIs
• Often state-less application interfaces (HTTP)
Features of NoSQL Databases
Advantages
• Scaling
Horizontal distribution of data among hosts
• Volume
High volumes of data that cannot be handled by RDBMS
• Administrators
No longer needed because of the automated maintenance
• Economics
Usage of cheap commodity servers, lower overall costs
• Flexibility
Relaxed or missing data schema, easier design changes
Features of NoSQL Databases
Challenges / Disadvantages
• Maturity
• Often still in pre-production phase with key features missing
• Support
• Mostly open source, limited sources of credibility
• Administration
• Sometimes relatively difficult to install and maintain
• Analytics
• Missing support for business intelligence and ad-hoc
querying
• Expertise
• Still low number of NoSQL experts available in the market
Conclusion
Types of Databases: Cloud Database
• A type of database where data is stored in a virtual
environment and executes over the cloud computing
platform.
• It provides users with various cloud computing services
(SaaS, PaaS, IaaS, BPaaS, etc.) for accessing the database.
• There are numerous cloud platforms; widely used options include:
1. Amazon Web Services (AWS)
2. Microsoft Azure
3. Kamatera
4. phoenixNAP
5. ScienceSoft
6. Google Cloud SQL, etc.
Class Outline
Introduction
Types of Databases
DBMS Architecture
Data Models