DB Design & Other Learning
1. Requirement Analysis
It is crucial to understand the requirements of the application so that you can think in
productive terms and apply appropriate integrity constraints to maintain data integrity &
consistency.
2. Logical & Physical Design
This is the actual design phase, which covers the steps taken while designing a database.
It is further divided into two stages:
Logical Data Model Design: This stage produces a high-level design of the database, based
on the initially gathered requirements, to structure and organize the data. A high-level
overview of the database is drawn on paper without considering the physical-level design;
the stage proceeds by identifying the kinds of data to be stored and the relationships that
will exist among them.
Identifying entities, key attributes, and the constraints to be implemented is the core of
this stage. It uses techniques such as data modeling to visualize the data and
normalization to prevent redundancy.
Physical Design of Data Model: This phase involves the implementation of the logical
design made in the previous stage. All the relationships among data and integrity
constraints are implemented to maintain consistency & generate the actual database.
3. Data Insertion and Testing for Integrity Constraints
Finally, after implementing the physical design of the database, we're ready to insert the
data and test its integrity. This phase tests the database against its integrity constraints
to see whether something was left out or something new needs to be added, and then
integrates the database with the target application.
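A minimal sketch of this testing phase, using Python's built-in sqlite3 module. The `department` and `employee` tables, their columns, and the `is_rejected` helper are hypothetical: valid rows are inserted first, then each test statement breaks exactly one constraint and should be refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary > 0),
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (10, 'Asha', 50000, 1);
""")

def is_rejected(sql):
    """Return True if SQLite refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Each statement below breaks exactly one constraint and should be rejected.
tests = {
    "duplicate primary key":   "INSERT INTO employee VALUES (10, 'Ravi', 40000, 1)",
    "missing mandatory value": "INSERT INTO employee VALUES (11, NULL, 40000, 1)",
    "CHECK violation":         "INSERT INTO employee VALUES (12, 'Mina', -5, 1)",
    "unknown department (FK)": "INSERT INTO employee VALUES (13, 'Omar', 30000, 99)",
    "duplicate unique value":  "INSERT INTO department VALUES (2, 'Sales')",
}
results = {label: is_rejected(sql) for label, sql in tests.items()}
for label, rejected in results.items():
    print(f"{label}: {'rejected' if rejected else 'ACCEPTED, constraint missing?'}")
```

Any line reporting ACCEPTED points at a constraint that was left out of the physical design.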
Logical Data Model Design
The logical data model design defines the structure of the data and the relationships that
exist among the data. The following are the major components of the logical design:
1. Data Models: Data modeling is a visual modeling technique used to get a high-level
overview of the database. Data models help us understand the needs and requirements of
the database by defining its design through a diagrammatic representation.
Ex: ER model, network model, relational model, object-oriented data model.
[Figure: Data Models]
2. Entity: Entities are real-world objects that have certain properties; these properties are
called the attributes of the entity. There are two types of entities: strong and weak.
A weak entity does not have a key attribute of its own to identify it; its existence depends
on one specific strong entity, and it has total (full) participation in the relationship with
that entity. A strong entity, by contrast, has a key attribute that uniquely identifies it.
Weak entity example: Loan. A loan is given to a customer (a customer need not have one),
and the loan is identified by the customer_id of the customer to whom it is granted.
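A minimal sketch of how the Loan weak entity might map to tables, using Python's sqlite3 module. The `loan_no` partial key (discriminator) and all other column names are assumptions added for illustration: a loan is identified by its owner's `customer_id` together with `loan_no`, and it cannot exist without that customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,        -- strong entity: has its own key
    name        TEXT NOT NULL
);
CREATE TABLE loan (
    customer_id INTEGER NOT NULL
        REFERENCES customer(customer_id) ON DELETE CASCADE,  -- owner's key
    loan_no     INTEGER NOT NULL,           -- hypothetical partial key
    amount      REAL NOT NULL,
    PRIMARY KEY (customer_id, loan_no)      -- identified through the owner
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO customer VALUES (2, 'Ravi')")  # no loan: optional for customers
conn.execute("INSERT INTO loan VALUES (1, 1, 25000)")
conn.execute("INSERT INTO loan VALUES (1, 2, 5000)")

# A loan cannot exist without its owning customer (total participation).
try:
    conn.execute("INSERT INTO loan VALUES (99, 1, 1000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the owner removes its dependent loans as well.
conn.execute("DELETE FROM customer WHERE customer_id = 1")
remaining_loans = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
print(orphan_rejected, remaining_loans)  # True 0
```

ON DELETE CASCADE is one way to express the existence dependency; a stricter design could use ON DELETE RESTRICT and require loans to be closed first.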
3. Relationships: A relationship defines how the data of one entity is logically related to
the data of another. In simple words, it is the association of one entity with another.
A relationship can be further categorized by its degree into unary, binary, and ternary
relationships.
Unary: The associating entity and the associated entity are the same. Ex: an employee
manages other employees; similarly, a student can be given the post of monitor, so the
monitor is itself a student.
Binary: This is a very common relationship that you will come across while designing a
database.
Ex: a student is enrolled in courses, an employee is managed by a manager, and one
student can be taught by many professors.
Ternary: Three entities are involved in a single relationship. Ex: an employee
works on a project for a client. Note that here we have three entities: Employee, Project &
Client.
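A minimal sketch of how each relationship degree might be expressed as tables with Python's sqlite3 module; all table and column names are hypothetical. A unary relationship is a foreign key back to the same table, a many-to-many binary relationship uses a junction table, and a ternary relationship uses one table that references all three entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Unary: an employee's manager is another row of the same table.
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee(emp_id)
);
-- Binary (many-to-many): students enrolled in courses.
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
-- Ternary: an employee works on a project for a client.
CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE client  (client_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE works_on (
    emp_id     INTEGER REFERENCES employee(emp_id),
    project_id INTEGER REFERENCES project(project_id),
    client_id  INTEGER REFERENCES client(client_id),
    PRIMARY KEY (emp_id, project_id, client_id)
);

INSERT INTO employee VALUES (1, 'Priya', NULL), (2, 'Karan', 1);
INSERT INTO student VALUES (1, 'Asha');
INSERT INTO course VALUES (1, 'DBMS'), (2, 'Networks');
INSERT INTO enrollment VALUES (1, 1), (1, 2);
INSERT INTO project VALUES (1, 'Billing');
INSERT INTO client VALUES (1, 'Acme');
INSERT INTO works_on VALUES (2, 1, 1);
""")

# Unary: a self-join finds each employee's manager.
managed = conn.execute("""
    SELECT e.name, m.name FROM employee e JOIN employee m ON e.manager_id = m.emp_id
""").fetchall()
# Ternary: one row ties an employee, a project, and a client together.
assignment = conn.execute("""
    SELECT e.name, p.name, c.name
    FROM works_on w
    JOIN employee e ON e.emp_id = w.emp_id
    JOIN project  p ON p.project_id = w.project_id
    JOIN client   c ON c.client_id = w.client_id
""").fetchall()
print(managed)     # [('Karan', 'Priya')]
print(assignment)  # [('Karan', 'Billing', 'Acme')]
```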
4. Attributes: Attributes are the properties of a specific entity that describe it. For
example, an employee can have unique_id, name, age, date of birth (DOB), salary,
department, manager, project id, etc.
5. Normalization: After all the entities are in place and the relationships among the data
are defined, we need to look for loopholes or ambiguities that may arise as a result of
CRUD operations, in order to prevent insertion, update, and deletion anomalies.
Data normalization is a standard procedure for databases that eliminates such anomalies
& prevents redundancy.
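A minimal sketch in plain Python of the update anomaly that normalization removes; the employee and department data are invented for illustration. In the unnormalized relation the department name is repeated on every row, so a rename must touch all of them; after splitting it into two relations, the name is stored once.

```python
import copy

# One unnormalized relation: dept_name depends on dept_id, not on the employee.
rows = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "name": "Ravi", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "name": "Mina", "dept_id": 20, "dept_name": "HR"},
]

# Update anomaly: renaming the department on only one of its rows leaves
# two different names for department 10.
broken = copy.deepcopy(rows)
broken[0]["dept_name"] = "Global Sales"
inconsistent = len({r["dept_name"] for r in broken if r["dept_id"] == 10}) > 1

# Normalized: employee(emp_id, name, dept_id) and department(dept_id, dept_name).
employees = [{k: r[k] for k in ("emp_id", "name", "dept_id")} for r in rows]
departments = {r["dept_id"]: r["dept_name"] for r in rows}

# The same rename is now a single update that every employee sees through dept_id.
departments[10] = "Global Sales"
dept_names = [departments[e["dept_id"]] for e in employees]
print(inconsistent, dept_names)  # True ['Global Sales', 'Global Sales', 'HR']
```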
An Example of Logical Design
[Figure: Logical Design Example]
Physical Design
The main purpose of the physical design is to implement the logical design: that is, to
show the structure of the database along with all the columns & their data types, rows,
relations, and relationships among the data, and to define clearly how relations are related
to each other.
Following are the steps taken in physical design:
Step 1: Entities are converted into tables (relations) made up of their properties
(attributes).
Step 2: Apply integrity constraints: establish foreign key, unique key, and composite key
relationships among the data, and apply other constraints as needed.
Step 3: Entity names are converted into table names, property names are translated into
column (attribute) names, and so on.
Step 4: Apply normalization & modify the design as per the requirements.
Step 5: Final schemas are defined based on the entities & attributes derived in the logical
design.
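A minimal sketch of steps 1 to 3 with Python's sqlite3 module: hypothetical Student and Course entities become tables whose columns carry data types, a primary key, a unique key, a composite key, and foreign keys. All names, types, and constraints are assumptions for illustration, and the DDL is SQLite-specific; other database systems use different type names.

```python
import sqlite3

ddl = """
CREATE TABLE student (                          -- entity Student -> table student
    student_id INTEGER PRIMARY KEY,             -- key attribute -> primary key
    email      TEXT NOT NULL UNIQUE,            -- unique key
    full_name  TEXT NOT NULL,
    birth_date TEXT                             -- SQLite has no DATE type; ISO-8601 text
);
CREATE TABLE course (
    course_code TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    credits     INTEGER NOT NULL CHECK (credits BETWEEN 1 AND 6)
);
CREATE TABLE registration (                     -- relationship -> table
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    course_code TEXT    NOT NULL REFERENCES course(course_code),
    semester    TEXT    NOT NULL,
    PRIMARY KEY (student_id, course_code, semester)   -- composite key
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(ddl)

# Inspect the resulting schema: each table name with its column names.
table_names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
schema = {
    name: [col[1] for col in conn.execute(f"PRAGMA table_info({name})")]
    for name in table_names
}
print(schema)
```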
[Figure: Physical Design]
Conclusion
In conclusion, a good database design is an essential part of a strong database
management system (DBMS). It provides the basis for data governance, data storage, and
data retrieval. The quality of a database's design has a direct impact on a system's overall
performance and dependability. It is important to consider data organization,
standardization, performance, integrity, and more when designing a database to meet the
needs of your organization and your users.
A quick review of today's need to store massive amounts of data belonging to
multiple related or unrelated categories reveals that databases must be highly
effective at what they are designed to do.
This is not only because of the sheer amount of data that is continuously revised
or modified; the dynamics of that data are no longer the sole concern. It is also
because of the value that every individual assigns to their data: databases are the
backbone of a client's lifestyle and of a business's worth.
Designing different types of databases lies at the core of the functionality that
they provide to the users. Since data is a dynamic entity, the way it is stored
varies a lot. It is also the reason behind companies designing their own types of
databases that comply with their needs. In this article, we will be discussing the
types of Databases in detail.
Types of Databases
There are several types of databases, which are briefly explained below.
Hierarchical databases
Network databases
Object-oriented databases
Relational databases
Cloud databases
Centralized databases
Personal databases
Operational databases
NoSQL databases
Hierarchical Databases
Just as in any hierarchy, this database organizes data into ranks or levels,
grouping records under a common point of linkage. As a result, the individual
data entities sit at a lower rank, and what they have in common sits at a higher
rank. Refer to the diagram below:
[Figure: Hierarchical Database Example]
Do note how Departments and Administration are entirely unlike each other and
yet fall under the domain of a University. They are elements that form this
hierarchy.
Another perspective is to visualize the data as organized in parent-child
relationships, which, as more data elements are added, come to resemble a
tree. Child records are linked to their parent record using a field, so a parent
record can have multiple child records. The reverse is not possible: a child
record has exactly one parent.
Notice that because of this structure, hierarchical databases are not easily
scalable; adding data elements requires a lengthy traversal through the database.
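A minimal sketch in Python of this parent-child structure, reusing the University, Departments, and Administration names from the example; the lower-level records (Computer Science, Students, Finance) and the helper functions are hypothetical. Each record holds a single parent reference, so a child can never have two parents, and reaching a record means walking the tree from the root.

```python
class Record:
    """A record in a hierarchical database: one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # exactly one parent (None for the root)
        self.children = []

    def add_child(self, name):
        child = Record(name, parent=self)
        self.children.append(child)
        return child

def find(node, name):
    """Depth-first search starting at the root; returns the record or None."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None

def path_to_root(node):
    """Follow the single parent link of each record up to the root."""
    path = []
    while node is not None:
        path.append(node.name)
        node = node.parent
    return list(reversed(path))

university = Record("University")
departments = university.add_child("Departments")
administration = university.add_child("Administration")
computer_science = departments.add_child("Computer Science")
computer_science.add_child("Students")
administration.add_child("Finance")

path = path_to_root(find(university, "Students"))
print(path)  # ['University', 'Departments', 'Computer Science', 'Students']
```

The `find` call visits records one by one from the root, which is the traversal cost mentioned above.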
Network Databases
In layman's terms, a network database is a hierarchical database with one
major tweak: child records are free to associate with multiple parent records.
As a result, a network (or net) of database records linked by multiple threads
is observed. Notice how the Student, Faculty, and Resources elements each have
two parent records, which are Departments and Clubs.
The disadvantages are that the structure is difficult to alter because of its
complexity, and that the database is highly structurally dependent.
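A minimal sketch in Python of the network model, reusing the Student, Faculty, Resources, Departments, and Clubs names from the description; the `parents` mapping and the reverse `children` index are a hypothetical representation. Unlike a strict hierarchy, a child record may list several parent records, so the data forms a graph rather than a tree.

```python
from collections import defaultdict

# Each child record lists all of its parent (owner) records.
parents = {
    "Student":   ["Departments", "Clubs"],
    "Faculty":   ["Departments", "Clubs"],
    "Resources": ["Departments", "Clubs"],
}

# Reverse links so that a parent record can reach all of its children.
children = defaultdict(list)
for child, owners in parents.items():
    for owner in owners:
        children[owner].append(child)

# In a strict hierarchy every child would have at most one parent.
is_hierarchy = all(len(owners) <= 1 for owners in parents.values())

print(children["Clubs"])  # ['Student', 'Faculty', 'Resources']
print(is_hierarchy)       # False
```

Every link has to be maintained in both directions, which hints at why restructuring a network database is hard.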
Object-Oriented Databases
Those familiar with the object-oriented programming paradigm will be able to
relate to this model of databases easily. Information stored in the database is
represented as objects, each of which serves as an instance of the database
model. Objects can therefore be referenced and called directly, which can
substantially reduce the workload on the database.
[Figure: Object-Oriented Example]
In the chart above, we have different objects linked to one another using
methods; one can get the address of the Person (represented by the Person
Object) using the livesAt() method. Furthermore, these objects have attributes
which are in fact the data elements that need to be defined in the database.
An example of such a model is the Berkeley DB software library which uses the
same conceptual background to deliver quick and highly efficient responses to
database queries from the embedded database.
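A minimal sketch in Python of the Person and Address objects from the chart. Only the `livesAt()` method name comes from the text (kept in its original camelCase); the attributes and the in-memory `store` dictionary standing in for an object database are assumptions. Because the Person object holds a direct reference to its Address object, following `livesAt()` needs no join.

```python
class Address:
    def __init__(self, street, city):
        self.street = street
        self.city = city

class Person:
    def __init__(self, name, address):
        self.name = name
        self._address = address       # a direct reference to another object

    def livesAt(self):                # method name taken from the chart
        return self._address

# Hypothetical object store: objects are saved and fetched by an identifier.
store = {}
home = Address("12 Park Lane", "Pune")
store["person:1"] = Person("Asha", home)

person = store["person:1"]
print(person.livesAt().city)  # Pune
```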
Relational Databases
Considered the most mature of all database types, relational databases, together
with their management systems (RDBMSs), lead in production use. In this model,
data items are related to one another through the records (rows) that hold them,
and every record has a unique identity.
Note that all data is tabulated in this model. Every row in a table is uniquely
identified by a primary key, and a row in one table is linked to a row in another
table through a foreign key that references that table's primary key.
Refer to the diagram below and notice how the concept of ‘Keys’ is used to link
two tables.
[Figure: Relational Database Example]
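A minimal sketch with Python's sqlite3 module of linking two tables through keys; the `customer` and `orders` tables are hypothetical. Each row is identified by its primary key, and the foreign key `orders.customer_id` points at a row in `customer`, which the join follows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,      -- uniquely identifies each customer row
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
    total       REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO orders VALUES (100, 1, 250.0), (101, 1, 75.5), (102, 2, 40.0);
""")

# Follow the foreign key from each order back to its customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.order_id), SUM(o.total)
    FROM customer c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Asha', 2, 325.5), ('Ravi', 1, 40.0)]
```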
Cloud Databases
A cloud database is used where data needs a virtual environment for storage and
execution on cloud platforms; many cloud computing services (such as SaaS and
PaaS) provide access to the data in such databases.
Some cloud platforms are:
Amazon Web Services (AWS)
Google Cloud Platform (GCP)
Microsoft Azure
ScienceSoft, etc.
Centralized Databases
A centralized database is stored, located, and maintained at a single location,
which makes it more secure when users fetch data from it.
Advantages
Data Security
Reduced Redundancy
Consistency
Disadvantages
The size of a centralized database is large, which increases response and
retrieval time.
It is not easy to modify, delete and update.
Personal Databases
A personal database collects and stores data on the user's own system; this
type of database is designed for a single user.
Advantages
It is easy to handle
It occupies less space
Operational Databases
It is used for creating, updating, and deleting data in real time, and it is
designed for executing and handling the daily data operations of organizations
and businesses.
Advantages
Easy to fetch
Structured data
Real-time processing
NoSQL Databases
A NoSQL database (the name originally referred to "non-SQL" or "non-relational")
provides a mechanism for storing and retrieving data that is modeled by means
other than the tabular relations used in relational databases.
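A minimal sketch of one NoSQL style, the document model, using Python's standard json module; the document shape and the `find_orders_by_city` helper are hypothetical, and real document stores add indexing, persistence, and a query language on top. A customer and their orders are kept together in one nested document instead of being spread across tables linked by keys.

```python
import json

# One self-contained document per customer; no fixed table schema.
documents = [
    {"_id": 1, "name": "Asha", "city": "Pune",
     "orders": [{"order_id": 100, "total": 250.0}, {"order_id": 101, "total": 75.5}]},
    {"_id": 2, "name": "Ravi", "city": "Delhi", "orders": []},
    {"_id": 3, "name": "Mina", "city": "Pune", "phone": "555-0100",  # extra field is allowed
     "orders": [{"order_id": 102, "total": 40.0}]},
]

def find_orders_by_city(docs, city):
    """Return the order ids of all customers living in the given city."""
    return [o["order_id"] for d in docs if d.get("city") == city for o in d["orders"]]

serialized = json.dumps(documents[0])  # documents are stored and exchanged as JSON-like text
print(find_orders_by_city(documents, "Pune"))  # [100, 101, 102]
```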
Conclusion
The output of a database design exercise is a data model. A data model represents all the
objects, entities, attributes, relationships, and constraints in the system. Broadly speaking,
data models can be of two types: logical or physical. The representation of the data model
is done by creating an ER diagram, also known as an entity relationship diagram, ERD, or
database diagram.
The physical data model relates to the actual implementation details in the database. The
logical data model, on the other hand, abstracts away the implementation technicalities.
This makes the logical data model consumable for the business. One key difference
between the two models is that the logical model is database-agnostic while the physical
model has to be specific to the database in use.
As shown in this example, based on inputs from the business, you would end up choosing
one option over the other. It would result in changing the concerned entities and their
relationships.
For instance, if a company has plans to target a new geographical region, the model
would have to cater to language support, currencies, time zones, and so on. The benefits
of understanding the long-term business roadmap often show up in a smoother transition
to new business processes.
Let me share one more example, which is more about continuously evolving business
priorities. The taxi business was impacted badly at the beginning of COVID-19. As a cab
company, you want to act preemptively to assure people that you're doing everything to
make sure that their travel in the cab is as safe as possible: that the vehicle is disinfected
every day, that the driver wears a mask at all times, and that there's hand sanitizer
available in the cab. Now, to capture all this information, changes to two
entities, drivers and vehicles, would be required. Several Boolean flag fields need to be
added to these entities to cater to this business use case.
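A minimal sketch with Python's sqlite3 module of adding such Boolean flag fields to an existing `vehicles` table; the flag column names and sample registration numbers are hypothetical. SQLite has no separate Boolean type, so each flag is stored as an INTEGER 0/1 with a default, which lets the change apply to rows that already exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicles (
    vehicle_id INTEGER PRIMARY KEY,
    vehicle_registration_number TEXT NOT NULL
);
INSERT INTO vehicles VALUES (1, 'MH12AB1234'), (2, 'MH14CD5678');
""")

# Hypothetical safety flags added to the existing entity; existing rows get 0.
for column in ("is_disinfected_today", "has_hand_sanitizer"):
    conn.execute(f"ALTER TABLE vehicles ADD COLUMN {column} INTEGER NOT NULL DEFAULT 0")

conn.execute("UPDATE vehicles SET is_disinfected_today = 1 WHERE vehicle_id = 1")
safe = conn.execute(
    "SELECT vehicle_id FROM vehicles WHERE is_disinfected_today = 1"
).fetchall()
print(safe)  # [(1,)]
```

A flag such as a driver's mask compliance would be added to the drivers table in the same way.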
Step 3: Identify Entities and Attributes
Once the business requirements are gathered, the information can be used to identify
entities along with the essential set of attributes. One or more entities generally map
directly to business processes, and the relationship between those entities also mimics
the business process workflow.
This step is also used to identify which attributes will act as identifiers in the entities.
Identifiers translate to primary keys in the physical model. In addition, it is also common to
specify data types for all the attributes in this step.
For instance, in the taxicab booking model, you would have to identify the attributes which
will act as the mandatory fields for the registration of users and drivers from the booking
app. User registration would be done using user_phone and driver registration would be
done using driver_phone.
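A minimal sketch with Python's sqlite3 module of how these identifiers might become constraints in the physical model. The table and column names follow the text; the `register_user` helper is hypothetical, and the phone numbers are stored as TEXT here (an assumption, so that leading zeros and "+" survive). Making `user_phone` NOT NULL and UNIQUE means a phone number is mandatory and can register only one user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    user_name  TEXT NOT NULL,
    user_phone TEXT NOT NULL UNIQUE      -- mandatory registration field
);
CREATE TABLE drivers (
    driver_id    INTEGER PRIMARY KEY,
    driver_name  TEXT NOT NULL,
    driver_phone TEXT NOT NULL UNIQUE    -- mandatory registration field
);
""")

def register_user(name, phone):
    """Insert a user; return the new user_id, or None if the phone is missing or taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO users (user_name, user_phone) VALUES (?, ?)", (name, phone)
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None

print(register_user("Asha", "+91-9000000001"))  # 1
print(register_user("Ravi", "+91-9000000001"))  # None: phone already registered
print(register_user("Mina", None))              # None: phone is mandatory
```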
Similarly, other entities and attributes are identified during this step, after having been
mapped to the business process workflows.
In the taxicab booking model, only one type of relationship has been used, i.e., one-to-many.
Take the relationship between users and trips as an example. In the model,
there's an assumption that only one user can be related to a trip, which implies that there
are no shared or pooled cabs. But if there were shared or pooled cabs, there would
possibly have been a many-to-many relationship between users and trips, if many
users shared the same trip_id.
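A minimal sketch with Python's sqlite3 module contrasting the two designs; the `pooled_trips` table and the `trip_passengers` junction table are hypothetical additions for pooled cabs. In the one-to-many model, `trips.user_id` holds exactly one user per trip; for shared trips, the user reference moves into a junction table keyed by (`trip_id`, `user_id`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);

-- One-to-many: each trip belongs to exactly one user.
CREATE TABLE trips (
    trip_id     INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    trip_status TEXT NOT NULL
);

-- Many-to-many (pooled cabs): a trip can have several users and vice versa.
CREATE TABLE pooled_trips (trip_id INTEGER PRIMARY KEY, trip_status TEXT NOT NULL);
CREATE TABLE trip_passengers (
    trip_id INTEGER REFERENCES pooled_trips(trip_id),
    user_id INTEGER REFERENCES users(user_id),
    PRIMARY KEY (trip_id, user_id)
);

INSERT INTO users VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO trips VALUES (10, 1, 'completed'), (11, 1, 'booked');
INSERT INTO pooled_trips VALUES (20, 'completed');
INSERT INTO trip_passengers VALUES (20, 1), (20, 2);
""")

trips_per_user = conn.execute(
    "SELECT user_id, COUNT(*) FROM trips GROUP BY user_id"
).fetchall()
riders_on_trip_20 = conn.execute(
    "SELECT u.user_name FROM trip_passengers tp JOIN users u ON u.user_id = tp.user_id "
    "WHERE tp.trip_id = 20 ORDER BY u.user_name"
).fetchall()
print(trips_per_user)     # [(1, 2)]
print(riders_on_trip_20)  # [('Asha',), ('Ravi',)]
```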
Step 5: Create a Logical ER Diagram
With entities, attributes, and entity relationships defined, the immediate next step is to
draw the ER diagram. All of the steps listed above can be done within Vertabelo. There
are no hard and fast rules for the way logical modeling is supposed to be done, with the
possible exception of the reference notation.
For instance, take a look at the following example of a logical ER diagram. It captures a
simple business workflow of a cab company, where a user can book a ride with the ability
to track the vehicle.
Cab Bookings (logical ER diagram; M = mandatory, PI = primary identifier):
vehicles: vehicle_id Integer (M, PI); vehicle_registration_number Varchar(255) (M)
drivers: driver_id Integer (M, PI); driver_name Varchar(255) (M); driver_phone Integer (M)
vehicle_trip_tracking: vehicle_trip_tracking_id Integer (M, PI); vehicle_trip_id Integer (M);
gps_time Timestamp (M); gps_latitude LongFloat (M); gps_longitude LongFloat (M)
vehicle_trip: vehicle_trip_id Integer (M, PI); vehicle_id Integer (M); trip_id Integer (M)
users: user_id Integer (M, PI); user_name Varchar(255) (M); user_phone Integer (M)
trips: trip_id Integer (M, PI); driver_id Integer (M); vehicle_id Integer (M); user_id Integer (M);
trip_status Varchar(255) (M)
To take care of this, Vertabelo has a thorough list of checks that can be performed on a
logical model. Checks can be performed at all granularities, from the model as a whole to
individual attributes, and everything in between. Some of the simple checks are:
All of these checks are optional and can be configured to be skipped if there's another
validation framework in place. Proper validation from Vertabelo helps you move to the
next step with the minimum amount of friction possible.
Step 7: Create a Physical ER diagram
Once the logical ER diagram is created, the next step is to create a physical data model.
The physical data model will be specific to the database where you want to deploy the
data model. All databases have their unique implementation of nomenclature rules, data
types, and constraints. Due to this, the Data Definition Language (DDL) often differs from
one database to another.
1. Find the closest data type in the target database to replace the generic data type
selected in the logical data model.
2. Follow the nomenclature rules for tables, columns, and constraints as prescribed by
the target database.
3. Modify the model to align with predefined query workflows. This generally results in
increasing redundancy to save joins.
4. Finally, you can create indexes, partitions, distribution keys, and sort keys. This is
when you can create any performance-enhancing modifications to the model.
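A minimal sketch in Python of steps 1, 2, and 4 for the cab booking `trips` entity, targeting SQLite. The generic-to-SQLite type mapping, the `to_sqlite_ddl` helper, and the chosen index are assumptions for illustration, not Vertabelo's output; step 3 (adding redundancy to save joins) is left out.

```python
import sqlite3

# Logical model of the trips entity: attribute -> generic type (all mandatory).
logical_trips = {
    "trip_id": "Integer",
    "driver_id": "Integer",
    "vehicle_id": "Integer",
    "user_id": "Integer",
    "trip_status": "Varchar(255)",
}

# Step 1: closest SQLite type for each generic type (an assumed mapping).
SQLITE_TYPES = {
    "Integer": "INTEGER",
    "Varchar(255)": "TEXT",   # SQLite does not enforce VARCHAR lengths
    "Timestamp": "TEXT",      # e.g. ISO-8601 strings
}

def to_sqlite_ddl(table, attributes, primary_key):
    """Build a CREATE TABLE statement from a logical entity definition."""
    # Step 2: the snake_case names are already valid SQLite identifiers, so they are kept.
    columns = []
    for name, generic_type in attributes.items():
        column = f"    {name} {SQLITE_TYPES[generic_type]} NOT NULL"
        if name == primary_key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"

ddl = to_sqlite_ddl("trips", logical_trips, primary_key="trip_id")
# Step 4: a performance-oriented addition, e.g. an index for "all trips of a user".
index_ddl = "CREATE INDEX idx_trips_user_id ON trips (user_id);"

conn = sqlite3.connect(":memory:")
conn.executescript(ddl + "\n" + index_ddl)
print(ddl)
```

Targeting another database would mean swapping the type mapping and naming rules, which is why the physical model is database-specific.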
These steps can be performed using any data modeling tool you can use to create a data
model from scratch. However, Vertabelo has an option to convert a logical data model to
a full-fledged physical data model for all the major database systems like MySQL,
PostgreSQL, Oracle, Microsoft SQL Server, Amazon Redshift, Google BigQuery, and
more. Once the logical data model is converted to a physical data model, you can carry on
with the four steps we discussed.
Vertabelo also has an option to add pre- and post-deployment scripts at the table level,
along with comments at a very granular level of the model. The comments come in
handy when the documentation generation feature offered by Vertabelo is used. The
database document can be exported in any of the following three formats: HTML, PDF, or
DOCX.
Continuing with the cab booking example, let's take a look at the physical data model
generated by Vertabelo.
Once these and other similar issues are resolved, the model is ready to be exported to a
deployable SQL script for the selected database management system.
Step 10: Generate the DDL Scripts for Deploying the Model
Database design is not just about creating ER diagrams. A data modeling exercise
using ER diagrams is successful only if it results in something deployable. Vertabelo has
a convenient option to export the physical model to a ready-to-deploy SQL script. Once it
is generated, any pending issues can be resolved directly in the SQL script.
Now that we've reached the end of the database design process, let's have a look at
the SQL code generated by Vertabelo.
Primary Terminologies Used in Database Design
Following are the terminologies that a person should be familiar with before designing a
database:
Redundancy: Redundancy refers to the duplication of data. There are specific use
cases where we need redundancy in our database and cases where we don't. For example,
in a banking system application we may need to strictly prevent redundancy in the
database.
Schema: A schema is a logical container that defines the structure & manages the
organization of the data stored in it. It describes the tables, their columns, and the data
type of each column.
Records/Tuples: A record and a tuple are the same thing: a single row of data stored
inside a table.
Indexing: Indexing is a data structure technique to promote efficient retrieval of the data
stored in our database.
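A minimal sketch with Python's sqlite3 module; the `users` table, sample phone numbers, `idx_users_phone` index, and `plan` helper are hypothetical. SQLite's EXPLAIN QUERY PLAN reports whether a lookup scans the whole table or searches through an index; the exact wording of the plan text varies between SQLite versions, so the code only checks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_phone TEXT)")
conn.executemany(
    "INSERT INTO users (user_phone) VALUES (?)",
    [(f"+91-90000{i:05d}",) for i in range(1000)],
)

def plan(sql, params=()):
    """Return SQLite's query-plan description (last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

query = "SELECT user_id FROM users WHERE user_phone = ?"
before = plan(query, ("+91-9000000042",))   # typically reports a full-table SCAN
conn.execute("CREATE INDEX idx_users_phone ON users (user_phone)")
after = plan(query, ("+91-9000000042",))    # typically a SEARCH using idx_users_phone

print(before)
print(after)
print("uses index:", "idx_users_phone" in after)
```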
Data Integrity & Consistency: Data integrity refers to the quality of the information
stored in our database and consistency refers to the correctness of the data stored.
Data Models: Data models provide visual modeling techniques to visualize the
data & the relationships that exist among the data. Ex: ER model, network model, object-oriented
model, hierarchical model, etc.
Functional Dependency: A functional dependency is a relationship between two
attributes (or sets of attributes) of a table, stating that the value of one attribute is
determined by the other. Ex: {A -> B}, where A & B are attributes and A uniquely
determines the value of B.
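A minimal sketch in Python of checking whether a functional dependency {A -> B} holds in a given set of rows; the `holds` helper and sample data are hypothetical. The dependency holds when no two rows agree on A but disagree on B. Note that sample rows can disprove a dependency but cannot prove it for all future data; in design, a dependency comes from the meaning of the attributes.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    lhs and rhs are tuples of column names, so composite determinants work too.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant, different dependent value
    return True

employees = [
    {"emp_id": 1, "dept_id": 10, "dept_name": "Sales", "city": "Pune"},
    {"emp_id": 2, "dept_id": 10, "dept_name": "Sales", "city": "Delhi"},
    {"emp_id": 3, "dept_id": 20, "dept_name": "HR",    "city": "Pune"},
]

print(holds(employees, ("emp_id",), ("dept_id", "city")))  # True
print(holds(employees, ("dept_id",), ("dept_name",)))     # True
print(holds(employees, ("dept_id",), ("city",)))          # False
```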
Transaction: A transaction is a single logical unit of work, typically one that makes
changes to the database. A transaction must satisfy the ACID or BASE properties
(depending on the type of database).
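A minimal sketch of a transaction as one logical unit of work, using Python's sqlite3 module and a hypothetical `accounts` table: a transfer credits one account and debits another, and if any step fails, both changes are rolled back (the atomicity in ACID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    balance    INTEGER NOT NULL CHECK (balance >= 0)
);
INSERT INTO accounts VALUES (1, 100), (2, 50);
""")

def transfer(src, dst, amount):
    """Move money between accounts atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                         (amount, src))  # violates the CHECK if src would go negative
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT account_id, balance FROM accounts ORDER BY account_id"))

print(transfer(1, 2, 30), balances())   # True {1: 70, 2: 80}
print(transfer(1, 2, 500), balances())  # False {1: 70, 2: 80}: the credit was undone too
```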
Schedule: A schedule defines the order in which the operations of one or more
transactions, issued by one or multiple users, are executed.
Concurrency: Concurrency refers to allowing multiple transactions to operate
simultaneously without interfering with one another.
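A minimal sketch in plain Python of why the schedule matters for concurrency; the two withdrawal transactions T1 and T2 and the `run` simulator are hypothetical. Each transaction reads a shared balance, subtracts its amount, and writes the result back. Run serially, both withdrawals survive; interleaved so that both read before either writes, one update is lost, which is the kind of interference concurrency control prevents.

```python
AMOUNTS = {"T1": 30, "T2": 50}  # how much each transaction withdraws

def run(schedule, initial_balance):
    """Execute a schedule: a list of (transaction, operation) steps on one shared balance."""
    db = {"balance": initial_balance}
    local = {}  # the value each transaction has read privately
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["balance"]
        elif op == "write":
            db["balance"] = local[txn] - AMOUNTS[txn]
    return db["balance"]

serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial, 100))       # 20: both withdrawals applied
print(run(interleaved, 100))  # 50: T1's withdrawal is lost
```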
DB Life cycle