
DATA MODELING

WHAT IS DATA MODELING?

“Data modeling is the process of creating a simplified diagram of a software system and the data
elements it contains, using text and symbols to represent the data and how it flows.”
– TECHTARGET

“Data modeling is the process of creating a visual representation that defines the information
collection and management systems of any organization. This blueprint or data model helps different
stakeholders to create a unified view of the organization’s data.”
– AWS

“Data modeling is the process of creating visual representations of whole systems or parts to
show connections between data points and structures. The goal is to show the types of data stored
in the system, their relationships, and the formats and attributes of the data.”
– DASCA

“Data modeling is the process of creating a visual representation of the structure and
relationships within a dataset, aiming to organize and standardize data for effective analysis,
interpretation, and communication.”
– ChatGPT

DATA MODELING TYPES

1. CONCEPTUAL DATA MODEL
Abstractly describes what is going on in the business. It shows the key entities
and relationships without delving into details, and it may use only business terms.
It is used to explain the environment to non-technical stakeholders.
2. LOGICAL DATA MODEL
It serves as a bridge between the conceptual and physical models and can remain
independent of physical data storage considerations. It includes entities,
relationships, and attributes, but not technical details such as data types.
3. PHYSICAL DATA MODEL
Includes all technical details such as tables, columns, data types, and constraints.
It also includes primary and foreign keys (some definitions put these in the logical
model), indexes, and other database features. It depends heavily on the chosen
technology, and building it also requires weighing factors such as performance,
efficiency, and storage size (a sketch follows this list).
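To make the difference between the levels concrete, here is a minimal sketch of a physical data model, assuming SQLite as the chosen DBMS and an invented Customer/Order domain (all table and column names are hypothetical). A conceptual model would only name Customer, Order, and the relationship between them; the logical model would add their attributes; the physical model below pins down data types, nullability, keys, and an index:

import sqlite3

# In-memory database: the physical model depends on the chosen technology.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,    -- surrogate primary key
    full_name   TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE    -- constraint: no duplicate emails
);

CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- FK
    order_date  TEXT NOT NULL,          -- ISO-8601 string: SQLite has no DATE type
    total       REAL NOT NULL CHECK (total >= 0)
);

-- Indexes belong to the physical level: chosen for performance, not meaning.
CREATE INDEX idx_order_customer ON customer_order(customer_id);
""")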
QUESTIONS TO ASK YOURSELF BEFORE/DURING DATA MODEL DESIGN

Conceptual Data Model


1. What is the purpose of the data structure?
2. Who are the stakeholders involved and what are their requirements?
3. What are the elements of the domain to represent?
4. What are the relationships between these elements?
5. What are the main activities or processes that interact with the data?
6. What are the possible issues in the domain?
Logical Data Model
7. What are the attributes of each entity?
8. Are there any business rules or constraints that need to be represented in the
model?
9. How do we uniquely identify each element, and what types of relationships
connect it to the others? (PK - FK)
10. How far do we want to abstract things? How much do we need to generalize
(Employees, Customers, Consultants, or Persons)?
11. Do we need a normalized or denormalized data structure? (see below)
12. Are validation rules needed to ensure data integrity?
Physical Data Model
13. Which DBMS will be used for the implementation?
14. What are the data types of each attribute of each entity, and which attributes
are nullable?
15. How much data do we need to retain and how much storage do we need to
guarantee this?
16. What kinds of indexes do we need to meet the performance requirements?
17. How can we guarantee scalability, security, and redundancy?
18. What kinds of maintenance operations will our data system need?

PURPOSE OF MODELING

OLTP (Online Transactional Processing): Designed primarily for day-to-day operations and
transactions, using highly normalized data structures with low data redundancy and high
integrity. It is focused on fast data insertion, modification, and small, targeted retrieval.

OLAP (Online Analytical Processing): Designed for analysis, decision making, and BI
applications, using denormalized schemas that allow complex queries and aggregations.
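As a rough sketch of the difference in workloads (all table and column names are hypothetical, anticipating the star-schema example later in these notes): an OLTP statement touches a handful of rows, while an OLAP query scans and aggregates many.

# Typical OLTP statement: a fast, targeted write against normalized tables.
oltp_update = """
UPDATE customer
SET email = ?
WHERE customer_id = ?;
"""

# Typical OLAP query: a scan-and-aggregate over a denormalized star schema,
# grouping and summarizing many rows for a report or dashboard.
olap_report = """
SELECT d.year, p.category, SUM(f.revenue) AS total_revenue
FROM sales_fact AS f
JOIN date_dim    AS d ON d.date_id    = f.date_id
JOIN product_dim AS p ON p.product_id = f.product_id
GROUP BY d.year, p.category;
"""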

NORMALIZATION VS DENORMALIZATION

Normalization: A process aimed at reducing data duplication and improving data integrity. It
involves structuring the database so that dependencies among the data are minimized,
dividing the database into multiple tables and defining relationships between them.
It is also focused on saving storage space. It is a common pattern in OLTP systems
and is achieved by applying the normal forms.
Normal forms are successive steps used to reduce duplication → the higher the normal
form, the less duplication and the greater the data integrity (a worked example
follows the list below).

● 1NF
○ No repeating groups or arrays in a single record
○ Each field must contain only a single value
○ The order of the data does not affect integrity
○ The attribute domain does not change
● 2NF
○ 1NF
○ Remove partial dependencies → every non-key attribute must depend on the
whole primary key
● 3NF
○ 2NF
○ Eliminate fields that do not depend directly on the primary key
○ Separate data into different tables, where each table should be about a
specific topic. All non-key attributes are independent of each other and
depend only on the primary key, not on any other field.
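As a worked sketch (with invented names), consider a flat order table and its 3NF decomposition, again assuming SQLite:

import sqlite3

# Flat design: order_line(order_id, product_id, product_name, quantity,
#                         customer_id, customer_city)
# with key (order_id, product_id). It violates:
#   2NF: product_name depends only on product_id (partial dependency),
#        and customer_id depends only on order_id;
#   3NF: customer_city depends on customer_id, a non-key field.
# The decomposition below makes every non-key attribute depend on the key,
# the whole key, and nothing but the key.

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_city TEXT NOT NULL            -- depends only on the customer
);

CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL             -- depends only on the product
);

CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

CREATE TABLE order_line (
    order_id   INTEGER REFERENCES customer_order(order_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,           -- depends on the whole key
    PRIMARY KEY (order_id, product_id)
);
""")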
Denormalization: A process aimed at increasing performance by adding redundancy. It involves
consolidating data from multiple tables into fewer tables, reducing the need for complex
joins and queries at the cost of duplication and potential inconsistency. It is a common
pattern in data warehousing, where data systems are very read-heavy and the focus is on
reducing the need for joins.

● OBT (One Big Table): Join all of the data necessary for analytics into wide,
denormalized tables

● Star Schema: A type of data model that represents data in a structured and
intuitive way. It is characterized by a structure of few tables, where one (or a few)
central tables, called fact tables, are surrounded by dimension tables that
describe their attributes (see the sketch after this list)
○ Fact Table: Primary table in a dimensional model where the numerical
performance measurements of the business are stored
– Ralph Kimball
○ Dimension Tables: Tables that contain the descriptive attributes of the
measures in the fact table and allow users to analyze data from different
perspectives (who, what, where, when, how, why)
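Here is a minimal star-schema sketch for an invented retail domain (all names are hypothetical), assuming SQLite; the final view also shows how an OBT can be derived from the same star by joining everything into one wide table:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_id   INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,               -- when
    year      INTEGER NOT NULL,
    month     INTEGER NOT NULL
);

CREATE TABLE product_dim (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,            -- what
    category     TEXT NOT NULL
);

CREATE TABLE store_dim (
    store_id INTEGER PRIMARY KEY,
    city     TEXT NOT NULL                 -- where
);

-- Fact table: numeric measures at the grain of one sale line,
-- with foreign keys pointing out to each dimension.
CREATE TABLE sales_fact (
    date_id    INTEGER NOT NULL REFERENCES date_dim(date_id),
    product_id INTEGER NOT NULL REFERENCES product_dim(product_id),
    store_id   INTEGER NOT NULL REFERENCES store_dim(store_id),
    quantity   INTEGER NOT NULL,
    revenue    REAL    NOT NULL
);

-- OBT-style view: one wide, denormalized table built from the star.
CREATE VIEW sales_obt AS
SELECT d.full_date, d.year, p.product_name, p.category, s.city,
       f.quantity, f.revenue
FROM sales_fact AS f
JOIN date_dim    AS d ON d.date_id    = f.date_id
JOIN product_dim AS p ON p.product_id = f.product_id
JOIN store_dim   AS s ON s.store_id   = f.store_id;
""")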
TOOLS FOR DATA MODELING (not sponsored)

ER/Studio: A data modeling tool that lets you identify your data assets
and sources and construct and share data models. It supports
logical and physical models.

DbSchema: A universal database designer and GUI tool that
provides a diagram-oriented database designer for relational and
NoSQL databases.

Erwin: A widely used tool that supports the creation of conceptual,
logical, and physical data models. It provides forward and
reverse engineering features for various database platforms.

Archi: A cost-effective, cross-platform modeling solution.
