Lecture: Database Course Introduction for Students

Uploaded by

Diệu Quách
Data Science for Economics and Business

DATABASE MANAGEMENT SYSTEM


Course Introduction

Section 1: Course objectives and overview


Course outline

❖ Module 1: Database concepts
❖ Module 2: Data modeling
❖ Module 3: Data Warehouse concept & Dimensional design technique
❖ Module 4: Structured Query Language (SQL) for Data Analytics

How will this course benefit your career?


Course objectives

After completing this course, you should be able to:

1. Understand the basic concepts of databases.
2. Use Structured Query Language (SQL) to retrieve data from databases in various business use cases.
3. Apply fundamental knowledge of relational databases and relational database design.
4. Understand the fundamentals of Data Warehouse concepts and dimensional database design.
What is not covered in this course

• Relational Algebra
• Data Warehouse technical architecture
• ETL tools
• Advanced SQL programming
Course work

▪ Class participation: 10%
▪ Mid-term exam / Group work: 30%
▪ Final exam: 60%
Textbook and references

Textbook
• [1] Phan Tấn Quốc, Nguyễn Thị Uyên Nhi, Giáo trình Cơ sở dữ liệu, NXB
ĐHQG TPHCM, ISBN: 978-604-73-7236-2, 2019

References
• [2] Malik, U., Goldwasser, M. and Johnston, B., 2019. SQL for Data
Analytics. Birmingham: Packt Publishing, Limited.

• [3] Stephens, R., 2009. Beginning Database Design Solutions. Wiley.

• [4] Kimball, R., Ross, M., 2013. The Data Warehouse Toolkit. 3rd ed.
Hoboken: John Wiley & Sons.
How to get the most out of this course

• Full attendance

• Pay attention

• Do the homework

• Don't Be Afraid to Ask Questions (!)


Basic concepts of database

Section 2: Database concepts


Data Lifecycle
What is a database and why do we need it?

How do we arrange warehouses?

What is a database and why do we need it?

What are the most important features?

▪ Basic functions of a warehouse
▪ Ease of use
▪ Persistence
▪ High availability
▪ Security
What is data?
A collection of facts, numbers, descriptions, and objects.
Types of data (by how it is stored):
• Structured
• Semi-structured
• Unstructured


Transactional vs Analytical data stores

Online Transactional Processing (OLTP)
Data is stored one transaction at a time.
Customer: CustomerID, CustomerName, CustomerPhone
Orders: OrderID, CustomerID, OrderDate

Online Analytical Processing (OLAP)
Data is periodically loaded, aggregated and stored in a cube.
Transactional workloads

Transactional data is information that tracks the interactions related to an organization's activities.
• Atomicity – each transaction is treated as a single unit, which succeeds completely or fails completely.
• Consistency – transactions can only take the data in the database from one valid state to another.
• Isolation – concurrent execution of transactions leaves the database in the same state as if the transactions had been executed one after another.
• Durability – once a transaction has been committed, it will remain committed.
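The atomicity property above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the `accounts` table, the account names, and the `transfer` helper are hypothetical and do not come from the course material.

```python
import sqlite3

# Hypothetical in-memory bank database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "  name TEXT PRIMARY KEY,"
    "  balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If the debit fails, the credit that ran just before it is rolled back as well, so the database never shows money that was added to one account without being taken from the other.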
Analytical Workloads
Analytical workloads are used for data analysis and decision making.

• Summaries
• Trends
• Business information

[Figure: 2020 transactions]
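As a small sketch of an analytical summary, the query below aggregates individual transactions into one row per year. The `sales` table and its values are invented for illustration only.

```python
import sqlite3

# Hypothetical transaction data; in practice this would come from an OLTP system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2019-11-03", 120.0),
        ("2020-01-15", 80.0),
        ("2020-06-30", 200.0),
        ("2020-12-24", 50.0),
    ],
)

# Summarize: number of transactions and total amount per year.
yearly_summary = conn.execute(
    "SELECT substr(order_date, 1, 4) AS year, COUNT(*), SUM(amount) "
    "FROM sales GROUP BY year ORDER BY year"
).fetchall()
```

Each row of `yearly_summary` holds a year, the number of transactions in it, and their total amount, which is the kind of summary a decision maker reads instead of individual transactions.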
Data Processing
Data processing is the conversion of raw data to meaningful information through a process.

Batch processing: data elements are collected into a group. The whole group is then processed at a future time as a batch.
[Figure: a daily batch job takes all input and produces all output]

Stream processing: each new piece of data is processed when it arrives.
[Figure: Input 1 and Input 2 are uploaded to the system as they arrive]
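The difference between the two processing styles can be sketched in a few lines of Python. The function names and the sample values are assumptions made for this example; a real pipeline would read from files, queues, or sensors.

```python
from typing import Iterable, Iterator, List


def process_batch(records: List[float]) -> float:
    """Batch: wait until the whole group is collected, then process it once."""
    return sum(records)


def process_stream(records: Iterable[float]) -> Iterator[float]:
    """Stream: update the result as soon as each new record arrives."""
    running_total = 0.0
    for record in records:
        running_total += record
        yield running_total


daily_sales = [10.0, 25.0, 5.0]
batch_result = process_batch(daily_sales)            # one result at the end
stream_results = list(process_stream(daily_sales))   # one result per record
```

The batch function produces a single answer after all input is available, while the stream function produces an up-to-date answer after every input.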
Basic concepts of database

Section 3: Introduction to Relational Database


Early history of databases

▪ Before databases existed, everything had to be recorded on paper. We had lists, journals, ledgers and endless archives containing hundreds of thousands or even millions of records kept in filing cabinets.
▪ 1960s: IBM developed the hierarchical model, one of the earliest database systems.
▪ 1969: CODASYL released a publication that described the network model.
▪ 1970: Codd introduced the term "relational database" in his research paper.
▪ 1976: A new database model called Entity-Relationship, or ER, was proposed by P. Chen.
▪ 1980s: Structured Query Language, or SQL, became the standard query language.
What is a relational database?

A database is a set of data stored in a computer. This data is usually structured in a way that makes it easily accessible.

A relational database is a type of database. It uses a structure that allows us to identify and access data in relation to another piece of data in the database. Often, data in a relational database is organized into tables.
Identify relational database use cases

Online transaction processing:
For example, order systems that perform many small transactional updates.

Data warehousing:
Large amounts of data can be imported from multiple sources and structured to enable high-performance queries.

IoT:
Although IoT is typically considered a non-relational use case, the data from IoT devices can be structured and consistent.
The characteristics of relational data
Tables
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX
107         Francis Ribeiro  XXX-XXX-XXXX

▪ Data is stored in a table
▪ A table consists of rows and columns
▪ All rows have the same number of columns
▪ Each column is defined by a datatype
The characteristics of relational data
Entities
Customers
CustomerID CustomerName CustomerPhone
100 Muisto Linna XXX-XXX-XXXX
101 Noam Maoz XXX-XXX-XXXX
102 Vanja Matkovic XXX-XXX-XXXX
103 Qamar Mounir XXX-XXX-XXXX
104 Zhenis Omar XXX-XXX-XXXX
105 Claude Paulet XXX-XXX-XXXX
106 Alex Pettersen XXX-XXX-XXXX

An entity is a representation of an item, which can be physical (such as a customer or a product) or virtual (such as an order).
Entities are connected by relations that enable interaction. For example, a customer can place an order for a product.
The characteristics of relational data
Normalization
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX

Orders
OrderID  CustomerName   CustomerPhone
AD100    Noam Maoz      XXX-XXX-XXXX
AD101    Noam Maoz      XXX-XXX-XXXX
AD102    Noam Maoz      XXX-XXX-XXXX
AX103    Qamar Mounir   XXX-XXX-XXXX
AS104    Qamar Mounir   XXX-XXX-XXXX
AR105    Claude Paulet  XXX-XXX-XXXX
MK106    Muisto Linna   XXX-XXX-XXXX

Data is normalized to:
▪ Reduce storage
▪ Avoid data duplication
▪ Improve data quality
The characteristics of relational data
Relations
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX

Orders
OrderID  CustomerID  SalesPersonID
AD100    101         200
AD101    101         200
AD102    101         200
AX103    103         201
AS104    103         201
AR105    105         200
MK106    105         201

In a normalized database schema:
▪ Primary keys and foreign keys are used to define relationships
▪ No data duplication exists (other than key values) in 3rd Normal Form (3NF)
▪ Data is retrieved by joining tables together in a query
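A minimal sketch of retrieving data by joining tables, using Python's built-in sqlite3 module. It loads a few of the rows shown on the slide; the phone column is left out for brevity, and the column types are assumptions, since the slides do not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY,   -- primary key
    CustomerName TEXT NOT NULL
);
CREATE TABLE Orders (
    OrderID       TEXT PRIMARY KEY,
    CustomerID    INTEGER NOT NULL REFERENCES Customers(CustomerID),  -- foreign key
    SalesPersonID INTEGER NOT NULL
);
INSERT INTO Customers VALUES (101, 'Noam Maoz'), (103, 'Qamar Mounir');
INSERT INTO Orders VALUES
    ('AD100', 101, 200), ('AD101', 101, 200), ('AX103', 103, 201);
""")

# The customer name is stored once in Customers and looked up through the key.
orders_with_names = conn.execute(
    "SELECT o.OrderID, c.CustomerName "
    "FROM Orders AS o JOIN Customers AS c ON c.CustomerID = o.CustomerID "
    "ORDER BY o.OrderID"
).fetchall()
```

The Orders table stores only the CustomerID, yet the query can still show each order with the customer's name by joining on the key.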
Relational Database Fundamentals

▪ Informally, you can think of a relational database as a collection of tables, each containing rows and columns. It looks like a workbook containing several spreadsheets, right?
▪ The formal term for a column is an attribute and the formal term for a row is a tuple.
▪ A primary key is a column (or collection of columns) whose value uniquely identifies each row, so the database can use it to locate rows quickly. For example, the ‘Student ID’ field in the ‘Student’ table is a primary key.
▪ A foreign key is a column (or collection of columns) in one table that refers to the primary key in another table. For example, the ‘Student ID’ column in the ‘Marks’ table points to the ‘Student ID’ column in the ‘Student’ table.
▪ An index is a database structure that makes it quicker and easier to find records. It’s like the table of contents of a book.
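The Student and Marks example above can be sketched with sqlite3. The column names, the sample student, and the `insert_mark` helper are assumptions for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other database systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Marks ("
    "  StudentID INTEGER NOT NULL REFERENCES Student(StudentID),"
    "  Subject TEXT,"
    "  Mark REAL)"
)
# An index on the foreign key column speeds up lookups by student.
conn.execute("CREATE INDEX idx_marks_student ON Marks (StudentID)")
conn.execute("INSERT INTO Student VALUES (1, 'An')")


def insert_mark(student_id, subject, mark):
    """Insert a mark; return False if the student does not exist."""
    try:
        conn.execute(
            "INSERT INTO Marks VALUES (?, ?, ?)", (student_id, subject, mark)
        )
        return True
    except sqlite3.IntegrityError:
        # The foreign key points at a StudentID that is not in Student.
        return False


# Ask SQLite how it would run a lookup by StudentID.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Marks WHERE StudentID = ?", (1,)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

A mark for an unknown student is rejected by the foreign key, and `plan_text` names `idx_marks_student`, showing that the lookup goes through the index rather than scanning every row.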
Basic concepts of database

Section 4: Introduction to Non-Relational Database


What is non-relational data?

Examples:

## Customer 1
ID: 1
Name: Mark Hanson
Telephone: [ Home: 1-999-9999999, Business: 1-888-8888888, Cell: 1-777-7777777 ]
Address: [ Home: 121 Main Street, Some City, NY, 10110,
Business: 87 Big Building, Some City, NY, 10111 ]
## Customer 2
ID: 2
Title: Mr
Name: Jeff Hay
Telephone: [ Home: 0044-1999-333333, Mobile: 0044-17545-444444 ]
Address: [ UK: 86 High Street, Some Town, A County, GL8888, UK,
US: 777 7th Street, Another City, CA, 90111 ]

Non-relational collections can:
▪ Have multiple entities in the same collection or container, with different fields
▪ Have a different, non-tabular schema
▪ Often be defined by labeling each field with the name it represents
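The two customer records above can be sketched as Python dictionaries stored in one collection. The lowercase field names and the helper functions are choices made for this example, not part of the slides.

```python
import json

# One collection holding documents whose fields differ from record to record.
customers = [
    {
        "id": 1,
        "name": "Mark Hanson",
        "telephone": {
            "home": "1-999-9999999",
            "business": "1-888-8888888",
            "cell": "1-777-7777777",
        },
        "address": {
            "home": "121 Main Street, Some City, NY, 10110",
            "business": "87 Big Building, Some City, NY, 10111",
        },
    },
    {
        "id": 2,
        "title": "Mr",  # only this customer has a title
        "name": "Jeff Hay",
        "telephone": {"home": "0044-1999-333333", "mobile": "0044-17545-444444"},
        "address": {
            "UK": "86 High Street, Some Town, A County, GL8888, UK",
            "US": "777 7th Street, Another City, CA, 90111",
        },
    },
]


def field_names(document):
    """Each document labels its own fields, so the schema is read from the data."""
    return sorted(document)


def mobile_number(document):
    """Code that reads the data must cope with fields named differently or missing."""
    phones = document.get("telephone", {})
    return phones.get("mobile") or phones.get("cell")


# The same structure serializes to JSON, a common semi-structured format.
as_json = json.dumps(customers)
```

Unlike rows in a relational table, the two documents do not share the same set of fields, and nothing in the collection forces them to.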
Identify non-relational database use cases

IoT and Telematics:
Often need to ingest large amounts of data in frequent bursts of activity. The data is either semi-structured or structured and often requires real-time processing.

Retail and Marketing:
Common scenarios for globally distributed data and document storage.

Gaming:
In-game stats, social media integration, leaderboards, low-latency applications.

Web and Mobile:
Commonly used with web click analytics and modern applications, including bots.
Types of non-relational data

What is semi-structured data?
The data structure is defined within the actual data by fields. An example is the JSON data format.
Types of non-relational data
What is unstructured data?

Does not naturally contain fields.
Examples: video, audio, media streams, documents

Often processed to extract data and to organize, categorize, or identify “structures”.

Frequently used in combination with Machine Learning capabilities to “extract data” by using:
▪ Text Analytics
▪ Sentiment Analysis
▪ Computer Vision
What is NoSQL?

A loose term used to describe non-relational databases.

Types of NoSQL: key-value store, graph, column family database, document database

[Figures: key-value store; graph database]
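Two of these NoSQL types can be sketched with plain Python data structures. This is only a toy model under stated assumptions: real key-value and graph databases add persistence, distribution, and query languages, and all names and values below are invented.

```python
# Key-value store: each value is stored and looked up by a unique key.
session_store = {}
session_store["session:42"] = {"user": "alice", "cart_items": 3}
cart_items = session_store["session:42"]["cart_items"]

# Graph: nodes connected by edges, here "who follows whom" as an adjacency list.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": set(),
}


def followers_of(graph, person):
    """Walk the edges to find every node that points at `person`."""
    return sorted(node for node, targets in graph.items() if person in targets)
```

The key-value lookup needs only the key, while the graph question is answered by following relationships between nodes.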


Section 5: Database design
How to design a good database

How can we design a good database?

1. Understand user needs: extract business rules.
2. Translate user needs into data models.

Process to design a good database

Extract Business Rules → Conceptual Data Model → Logical Data Model → Physical Data Model

Data Modeling

Process to design a good database

1. Extracting business rules

• Identify the database requirements: list the attributes that need to be managed and organize them by entity.
• Classify and describe the information about that entity.
• Determine the relationships between entities.
• Identify the transactions that will be performed on those data entities.
• Define business rules to ensure data integrity.
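As a sketch of how a business rule can protect data integrity, the example below turns a hypothetical rule ("every order item must name a product and have a quantity greater than zero") into table constraints. The table, the rule, and the `add_item` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items ("
    "  order_id TEXT NOT NULL,"
    "  product  TEXT NOT NULL,"                        # rule: a product is required
    "  quantity INTEGER NOT NULL CHECK (quantity > 0)"  # rule: quantity must be positive
    ")"
)


def add_item(order_id, product, quantity):
    """Insert an order item; return False if it breaks a business rule."""
    try:
        conn.execute(
            "INSERT INTO order_items VALUES (?, ?, ?)",
            (order_id, product, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def item_count():
    """Number of order items that passed the rules and were stored."""
    return conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
```

Because the rules live in the database itself, invalid rows are rejected no matter which application tries to insert them.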

Process to design a good database

2. Conceptual model design

- It is a conceptual-level data model that describes the structure and constraints of data in a database.
- It is represented as a diagram with three main components: entities, attributes, and relationships.

Process to design a good database

2. Conceptual model design

Three main steps to design a conceptual model:
- Practical survey
    Gathering information
    Presenting it systematically as a document flowchart
- Design the conceptual model
    Auditing data
    Identifying functional dependencies
    Constructing a conceptual data model
- Controlling and standardizing the model

Process to design a good database

3. Logical data model


• Identify the database requirements: list the attributes that need to be managed
and organize them by entity.
• Classify and describe the information about that entity.
• Determine the relationships between entities.
• Identify the transactions that will be performed on those data entities.
• Define business rules to ensure data integrity.

Three levels of Data Modeling

THANK YOU!
