Data Modeling
Data modeling in software engineering is the process of creating a simplified representation, or data model, of a software system and the data it contains, by applying formal techniques. It expresses data and information through text and symbols. The data model provides the blueprint for building a new database or reengineering legacy applications. It is therefore the first critical step in defining the structure of available data.
Data modeling is the process of creating data models in which data associations and constraints are described and eventually coded for reuse. It conceptually represents data with diagrams, symbols, or text to visualize the interrelationships.
Data modeling thus helps to increase consistency in naming, rules, semantics, and security, which in turn improves data analytics. The emphasis is on making data available and well organized, independent of how it is ultimately applied.
More concretely, data modeling creates a conceptual representation of data objects and their relationships to one another. The process typically involves several steps: requirements gathering, conceptual design, logical design, physical design, and implementation. During each step, data modelers work with stakeholders to
understand the data requirements, define the entities and attributes, establish the relationships
between the data objects, and create a model that accurately represents the data in a way that
can be used by application developers, database administrators, and other stakeholders.
Data models are commonly developed at three levels of abstraction, as sketched in the example below:
• Conceptual level: defining the high-level entities and relationships in the data model, often using diagrams or other visual representations.
• Logical level: defining the entities, attributes, relationships, and constraints in more detail, typically with notations such as ER diagrams, independent of any specific database technology.
• Physical level: defining the specific details of how the data will be stored in a particular database, including data types, indexes, and other technical details, often expressed in SQL data definition language.
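As a rough illustration, here is a minimal Python sketch of the three levels for a hypothetical customer-and-order model; all names, types, and the use of SQLite are illustrative assumptions, not prescriptions.

from dataclasses import dataclass
import sqlite3

# Conceptual level: high-level entities and relationships, usually a
# diagram. In prose: "a Customer places Orders" (one-to-many).

# Logical level: entities, attributes and relationships in more detail,
# still independent of any particular database engine.
@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # reference back to the owning Customer
    total: float

# Physical level: how the data is actually stored -- concrete data
# types, keys and indexes in a specific engine (SQLite here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
    CREATE INDEX idx_order_customer ON customer_order(customer_id);
""")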
The best way to picture a data model is to think of an architect's building plan. Just as a building plan guides all subsequent construction work, a data model guides the design of the databases and applications built on it.
The following data modeling examples clarify how data models, and the process of data modeling, highlight essential data and the ways to arrange it.
1. ER (Entity-Relationship) Model
This model is based on the notion of real-world entities and the relationships among them. It defines entity sets, relationship sets, attributes, and constraints, as in the sketch below.
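As a minimal sketch of these concepts in plain Python, the following defines entity sets, a relationship set, and one explicit constraint check; the Student/Course example and all names are illustrative assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class Student:  # an entity type with its attributes
    student_id: int
    name: str

@dataclass(frozen=True)
class Course:   # a second entity type
    course_id: str
    title: str

# Entity sets: collections of entity instances.
students = {Student(1, "Ada"), Student(2, "Grace")}
courses = {Course("CS101", "Databases")}

# Relationship set: "enrolls" links students to courses.
enrolls = {(1, "CS101"), (2, "CS101")}

# A constraint, checked explicitly here: every enrollment must
# reference an existing student and an existing course.
assert all(
    any(s.student_id == sid for s in students)
    and any(c.course_id == cid for c in courses)
    for sid, cid in enrolls
)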
2. Hierarchical Model
This data model arranges the data in the form of a tree with a single root, to which all other data is connected. The hierarchy begins at the root and branches out like a tree. The model captures many real-world relationships through a single pattern: a one-to-many relationship between two different kinds of data.
For example, a supermarket contains many departments. The ‘root’ node, supermarket, might have two ‘child’ nodes, (1) Pantry and (2) Packaged Food, each of which can have children of its own, as in the sketch below.
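A minimal Python sketch of this tree, extending the supermarket example; the extra ‘Cereal’ node is an illustrative assumption.

class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent    # each record has at most one parent
        self.children = []      # a parent may have many children
        if parent is not None:
            parent.children.append(self)

supermarket = Node("Supermarket")                # the root
pantry = Node("Pantry", parent=supermarket)      # child records
packaged = Node("Packaged Food", parent=supermarket)
cereal = Node("Cereal", parent=packaged)         # a grandchild

def walk(node, depth=0):
    # Walking the tree from the root reproduces the hierarchy.
    print("  " * depth + node.name)
    for child in node.children:
        walk(child, depth + 1)

walk(supermarket)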
3. Network Model
This database model enables many-to-many relationships among the connected nodes. The data is arranged in a graph-like structure, and here ‘child’ nodes can have multiple ‘parent’ nodes. The parent nodes are known as owners, and the child nodes are called members, as the sketch below illustrates.
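A minimal Python sketch of the owner/member idea; the grocery-style records are illustrative assumptions. The key difference from the tree above is that a member may have several owners.

from collections import defaultdict

owners_of = defaultdict(set)   # member -> its owner records
members_of = defaultdict(set)  # owner -> its member records

def link(owner, member):
    owners_of[member].add(owner)
    members_of[owner].add(member)

# 'Snacks' is a member of two owners, which a strict tree forbids.
link("Pantry", "Snacks")
link("Packaged Food", "Snacks")

print(owners_of["Snacks"])  # {'Pantry', 'Packaged Food'}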
4. Relational Model
This popular data model arranges the data into tables made up of columns and rows, with each column cataloging an attribute of the entity. It makes relationships between data points easy to identify.
For example, e-commerce websites can process purchases and track inventory using the relational model, as in the sketch below.
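A minimal sketch of that example, using Python's built-in SQLite module; the table and column names are illustrative assumptions.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        stock      INTEGER NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        quantity    INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO product VALUES (1, 'Cereal', 10)")
conn.execute("INSERT INTO purchase (product_id, quantity) VALUES (1, 2)")

# The row-and-column structure makes relationships between data
# points easy to query, e.g. units sold per product:
for row in conn.execute("""
    SELECT p.name, SUM(pu.quantity)
    FROM purchase pu JOIN product p USING (product_id)
    GROUP BY p.name
"""):
    print(row)  # ('Cereal', 2)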
5. Object-Oriented Model
This data model defines a database as a collection of objects, i.e., reusable software components with associated methods and features.
For instance, real-time architectural and engineering systems used in 3D modeling rely on this data modeling process.
6. Object-Relational Model
This hybrid model combines the simplicity of the relational model with some of the advanced functionality of the object-oriented model, allowing designers to incorporate objects into a familiar tabular structure.
The Evolution of Data Models
To better understand today's popular data modeling techniques, it helps to review how modeling has evolved. Of the seven data models described below, the first four were used in the early days of databases and are still options today; the last three are the most widely used now.
Hierarchical Model
Data is stored in a tree-like structure with parent and child records that comprise a collection of data fields. A parent can have one or more children, but a child record can have only one parent. The hierarchical model is also composed of links, which are the connections between records, and types, which specify the kind of data contained in a field. It originated in mainframe databases in the 1960s.
Network Model
This model extended the hierarchical model by allowing a child record to have one or more parents. A standard specification of the network model was adopted in 1969 by the Conference on Data Systems Languages, a now-defunct group better known as CODASYL; as a result, it's also referred to as the CODASYL model. The network technique is the precursor to a graph data structure, with a data object represented inside a node and the relationship between two nodes called an edge. Although popular on mainframes, it was largely replaced by relational databases after they emerged in the late 1970s.
Relational Model
In this model, data is stored in tables made up of rows and columns, and the relationships between the data elements in them are identified. It also incorporates database management features such as constraints and triggers (see the sketch below). The relational approach became the dominant data modeling technique during the 1980s. The entity-relationship and dimensional data models, currently the most prevalent techniques, are variations of the relational model but can also be used with non-relational databases.
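A minimal sketch of constraints and triggers in SQLite, reusing the hypothetical e-commerce schema from earlier; a CHECK constraint rejects invalid rows and a trigger keeps stock in sync.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        stock      INTEGER NOT NULL CHECK (stock >= 0)
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
    -- Trigger: decrement stock whenever a purchase is recorded.
    CREATE TRIGGER on_purchase AFTER INSERT ON purchase
    BEGIN
        UPDATE product SET stock = stock - NEW.quantity
        WHERE product_id = NEW.product_id;
    END;
""")
conn.execute("INSERT INTO product VALUES (1, 5)")
conn.execute("INSERT INTO purchase (product_id, quantity) VALUES (1, 2)")
print(conn.execute("SELECT stock FROM product").fetchone())  # (3,)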
Object-Oriented Model
This model combines aspects of object-oriented programming and the relational data model. An object represents data and its relationships in a single structure, along with attributes that specify the object's properties and methods that define its behavior. Objects may have multiple relationships between them. The object-oriented model is also composed of the following:
• Classes, which are collections of similar objects with shared attributes and behaviors.
• Inheritance, which enables a new class to inherit the attributes and behaviors of an existing class.
It was created for use with object databases, which emerged in the late 1980s and early 1990s
as an alternative to relational software. But they didn't make a big dent in relational
technology's dominance.
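Although object databases themselves faded, the class and inheritance ideas are easy to see in code. A minimal Python sketch, with a hypothetical Product/Book example:

class Product:
    def __init__(self, name, price):
        self.name = name    # attributes specify the object's properties
        self.price = price

    def discounted(self, pct):  # methods define the object's behavior
        return self.price * (1 - pct / 100)

class Book(Product):  # inheritance: Book takes on Product's
    def __init__(self, name, price, author):  # attributes and behaviors
        super().__init__(name, price)
        self.author = author

b = Book("Database Design", 40.0, "A. Author")
print(b.discounted(10))  # 36.0 -- behavior inherited from Product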
Entity-Relationship Model
This model has been widely adopted for relational databases in enterprise applications, particularly for transaction processing. With minimal redundancy and well-defined relationships, it's very efficient for data capture and update processes. The model consists of the following:
• Entities, which represent people, places, things, events or concepts on which data is processed and stored as tables.
• Relationships, which define logical links between two entities that represent business rules or constraints.
• Attributes, which are the distinct characteristics or properties of an entity.
[Figure: An entity-relationship data model created from Microsoft's AdventureWorks sample database. Source: Rick Sherman, Athena IT Solutions.]
The model's design is characterized by its degree of normalization -- the extent to which data redundancy is reduced -- as defined by Edgar F. Codd, who created the relational model. The most common forms are third normal form (3NF) and Boyce-Codd normal form, a slightly stronger version also known as 3.5NF. The sketch below shows the basic idea of 3NF.
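A minimal sketch of the 3NF idea in SQLite; the customer/city example is an illustrative assumption. In the commented-out version, city_population depends on city_name rather than on the key, so 3NF splits it into its own table.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: city_population depends on city_name, a non-key
    -- attribute (a transitive dependency):
    --   customer(customer_id, name, city_name, city_population)

    -- In 3NF, every non-key attribute depends only on the key.
    CREATE TABLE city (
        city_name  TEXT PRIMARY KEY,
        population INTEGER
    );
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city_name   TEXT REFERENCES city(city_name)
    );
""")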
Dimensional Model
Like the entity-relationship model, the dimensional model includes attributes and relationships. But it is built around two core components:
• Facts, which are the numeric measurements or metrics of a business process, such as sales amounts, stored in fact tables.
• Dimensions, which are tables that contain the business context of the facts and define their who, what, where and why attributes. Dimensions are typically descriptive rather than numeric.
[Figure: A dimensional data model built from Microsoft's AdventureWorks sample database. Source: Rick Sherman, Athena IT Solutions.]
The dimensional model has been widely adopted for BI and analytics applications. It's often referred to as a star schema -- a central fact table surrounded by and connected to multiple dimension tables -- though that oversimplifies the model structure. Most dimensional models have many fact tables linked to many dimensions; dimensions shared by more than one fact table are referred to as conformed dimensions. The sketch below shows a minimal star schema.
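A minimal sketch of such a star schema in SQLite; the sales example and all table names are illustrative assumptions, not taken from the AdventureWorks model.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (        -- when context
        date_key INTEGER PRIMARY KEY,
        year     INTEGER, month INTEGER, day INTEGER
    );
    CREATE TABLE dim_product (     -- what context
        product_key INTEGER PRIMARY KEY,
        name        TEXT, category TEXT
    );
    CREATE TABLE fact_sales (      -- numeric measures plus dimension keys
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    );
""")

# A typical analytics query slices the facts by dimension attributes:
# SELECT d.year, p.category, SUM(f.amount)
# FROM fact_sales f JOIN dim_date d USING (date_key)
#                   JOIN dim_product p USING (product_key)
# GROUP BY d.year, p.category;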
Graph Model
Graph data modeling has its roots in the network modeling technique. It's primarily used to model complex relationships in graph databases, but it can also be used for other NoSQL databases such as key-value and document types. The graph model consists of two components:
• Nodes, which represent entities with a unique identity. Each instance of an entity is
a different node that's akin to a row in a table in the relational model.
• Edges, also known as links or relationships. They connect nodes and define how
the nodes are related. All nodes must have at least one edge connected to them.
Edges can be undirected, with a bidirectional relationship between nodes, or
directed, in which the relationship goes in a specified direction.
One of the more popular graph formats is the property graph model. In this model, the properties of nodes or edges are represented by name-value pairs. Labels can also be used to group nodes together for easier querying. Each node with the same label becomes a member of the set, and nodes can be assigned as many labels as apply, as in the sketch below.
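A minimal sketch of a property graph in plain Python dictionaries; the people-and-city data is an illustrative assumption.

nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Ada"}},
    2: {"labels": {"Person"}, "props": {"name": "Grace"}},
    3: {"labels": {"City"},   "props": {"name": "London"}},
}

# Directed edges: (from_node, relationship type, to_node, properties).
edges = [
    (1, "KNOWS", 2, {"since": 2019}),
    (1, "LIVES_IN", 3, {}),
]

# 'Query': every node carrying the Person label is a member of that set.
people = [n for n, data in nodes.items() if "Person" in data["labels"]]
print(people)  # [1, 2]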
Data Modeling Best Practices
Treat the data model as a blueprint and specification. Data models should be a useful guide
for the people who design the database schema and those who create, update, manage, govern
and analyze the data. Follow the progression from conceptual to logical to physical models if
a new data model is being created in a greenfield environment with no existing models or
physical schemas.
Gather both business and data requirements upfront. Get input from business stakeholders
to design conceptual and logical data models based on business needs. Also, collect data
requirements from business analysts and other subject matter experts to help derive more
detailed logical and physical models from the business requirements and higher-level models.
Data models need to evolve with the business and technology.
Develop models iteratively and incrementally. A data model may include hundreds or even thousands of entities and relationships, which would be extremely difficult to design all at once.
Don't try to boil the ocean. The best approach is to segment the model into the subject areas
identified in a conceptual data model and design those subject areas one by one. After doing
that, tackle the interconnections between them.
Use a data modeling tool to design and maintain the data models. Data modeling tools
provide visual models, data structure documentation, a data dictionary and the data definition
language code needed to create physical data models. They also can often interchange metadata
with database, data integration, BI, data catalog and data governance tools. And if there are no data models for existing databases, leverage a tool's reverse-engineering functions to jump-start the process, as in the sketch below.
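As a rough illustration of what reverse engineering does, the sketch below recovers table and column metadata from an existing SQLite database via its catalog; the sample table is an illustrative assumption.

import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for an existing database
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    print(table)
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk).
    for cid, name, ctype, notnull, default, pk in conn.execute(
            f"PRAGMA table_info({table})"):
        print(f"  {name} {ctype}{' PRIMARY KEY' if pk else ''}")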
Determine the level of granularity that's needed in data models. In general, maintain the
lowest level of data granularity -- in other words, the most detailed data that's captured. Only
aggregate data when necessary and only as a derivative data model, while still retaining the
lowest-grain data in the primary model.
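A minimal sketch of that advice in SQLite: the base table keeps every detail row, and the aggregate exists only as a derived view. Names are illustrative assumptions.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Primary model: every individual sale (the lowest grain).
    CREATE TABLE sale (
        sale_id  INTEGER PRIMARY KEY,
        sale_day TEXT,
        amount   REAL
    );
    -- Derivative model: a daily rollup computed from the detail rows,
    -- so the lowest-grain data is never lost.
    CREATE VIEW daily_sales AS
        SELECT sale_day, SUM(amount) AS total
        FROM sale GROUP BY sale_day;
""")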
Use data models as a communication tool with business users. A 10,000-table entity-
relationship model can make anyone's head spin. But a data model, or a portion of one, focused
on a specific business process or data analysis offers the perfect opportunity to discuss and
verify the schema with business users. The assumption that business users can't grasp a data
model is a fatal mistake in modeling efforts.
Manage data models just like any other application code. Enterprise applications, data
integration processes and analytics applications all use data structures, whether they're
designed and documented or not. Rather than allow an unplanned "accidental architecture" to
develop and crush any chance of obtaining a solid ROI from their data, organizations need to
get serious about data modeling.