Data Modeling
“Data modeling is the process of creating a simplified diagram of a software system and the data
elements it contains, using text and symbols to represent the data and how it flows.”
– TECHTARGET
“Data modeling is the process of creating a visual representation that defines the information
collection and management systems of any organization. This blueprint or data model helps different
stakeholders to create a unified view of the organization’s data.”
– AWS
“Data modeling is the process of creating visual representations of whole systems or parts to
show connections between data points and structures. The goal is to show the types of data stored
in the system, their relationships, and the formats and attributes of the data.”
– DASCA
“Data modeling is the process of creating a visual representation of the structure and
relationships within a dataset, aiming to organize and standardize data for effective analysis,
interpretation, and communication.”
– ChatGPT
PURPOSE OF MODELING
OLTP (Online Transactional Processing): Designed primarily for day-to-day operations and
transactions, using highly normalized data structures with low data redundancy and high
integrity. It is focused on fast data insertion, modification, and small, targeted retrievals.
OLAP (Online Analytical Processing): Designed for analysis, decision making, and BI
applications, using denormalized schemas that allow complex queries and aggregations.
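A minimal sketch of the two access patterns, using Python’s sqlite3 with an invented orders table (table, column, and value names are illustrative only, not from any specific system):

```python
import sqlite3

# Illustrative only: the table and values below are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, order_date TEXT)"
)

# OLTP-style workload: many small writes and targeted single-row reads.
conn.execute("INSERT INTO orders VALUES (1, 42, 99.90, '2024-03-01')")
conn.execute("UPDATE orders SET amount = 89.90 WHERE order_id = 1")
one_order = conn.execute("SELECT * FROM orders WHERE order_id = 1").fetchone()

# OLAP-style workload: scans and aggregations over many rows at once.
report = conn.execute("""
    SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
""").fetchall()

print(one_order, report)
```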
NORMALIZATION VS DENORMALIZATION
Normalization : Process that reduces data duplication and improves data integrity. It
involves structuring the database so that dependencies among the data are minimized,
dividing the database into multiple tables and defining relationships between them.
It is also focused on saving storage space. It is a common pattern in OLTP systems
and is achieved by applying the normal forms.
Normal forms are successive steps used to reduce duplication → the higher the normal
form, the less duplication and the greater the data integrity (a normalization sketch
follows the list below).
● 1NF
○ No repeating groups or arrays in a single record
○ Each field must contain only a single value
○ Data order doesn’t affect the integrity
○ Attribute domain does not change
● 2NF
○ 1NF
○ Remove partial dependencies → every non-key attribute must depend on the
whole primary key
● 3NF
○ 2NF
○ Eliminate fields that do not directly depend on the primary key
○ Separate data into different tables, each about a specific topic. All non-key
attributes are independent of each other and relate only to the primary key,
not to other fields.
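A small sketch of what normalization looks like in practice, assuming a hypothetical flat orders table whose customer attributes depend only on customer_id (all names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: customer_name and customer_city depend only on customer_id,
# not on the whole key, so they are repeated on every order row.
conn.execute("""
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        customer_name TEXT,
        customer_city TEXT,
        amount REAL
    )
""")

# Normalized (3NF-style): each non-key attribute depends only on the key of
# its own table, and customer details are stored exactly once.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        city TEXT
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    )
""")
```

Splitting customers out removes the dependency of customer attributes on anything other than their own key, so each customer detail is stored once and referenced by foreign key.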
Denormalization: Process that adds redundancy to increase read performance. It involves
consolidating data from multiple tables into fewer tables, reducing the need for complex
joins and queries at the cost of duplication and possible inconsistency. It is a common
pattern in Data Warehousing, where systems are very read-heavy and the focus is on
reducing the need for joins.
● OBT (One Big Table) : Join all of the data necessary for analytics into wide
denormalized tables
● Star Schema : Type of data modeling used to represent data in a structured and
intuitive way. It is characterized by one central table (or a few), called the fact
table, surrounded by dimension tables that describe its attributes (see the sketch
after this list)
○ Fact Table : Primary table in a dimensional model where the numerical
performance measurements of the business are stored
– Ralph Kimball
○ Dimension Tables : Tables that contain the descriptive attributes of the
measures in the fact table and allow users to analyze data from different
perspectives (who, what, where, when, how, why)
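A minimal star-schema sketch in Python’s sqlite3; the sales/product/customer/date tables and their columns are assumptions made for the example, not a prescribed design:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables: descriptive attributes (the who, what, when).
conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER)")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)")
conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, country TEXT)")

# Fact table: numeric measurements keyed by the surrounding dimensions.
conn.execute("""
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        quantity INTEGER,
        revenue REAL
    )
""")

# Typical analytical query: join the fact table to its dimensions and aggregate.
query = """
    SELECT d.year, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, p.category
"""
print(conn.execute(query).fetchall())

# An OBT approach would instead materialize this join once into a single wide table.
```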
TOOLS FOR DATA MODELING (NoAdv)
ER/Studio : Data modeling tool that lets you identify your data assets
and sources and construct and share data models. It supports building
both logical and physical models.