12 Rules & Databases

Uploaded by

Aslam Khan
Copyright
© © All Rights Reserved

(Type-1)

Types of Databases
There are various types of databases used for storing different varieties of data:

1) Centralized Database
It is the type of database that stores data in a single, central database system. Users can
access the stored data from different locations through several applications. These applications
include an authentication process so that users access the data securely. An example of a
centralized database is a Central Library that maintains a central database for every library in a
college/university.
Advantages of Centralized Database
o It reduces the risk involved in data management, i.e., manipulating data does not affect the
core data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.
Disadvantages of Centralized Database
o The size of the centralized database is large, which increases the response time for
fetching the data.
o It is not easy to update such an extensive database system.
o If the central server fails and no backup exists, the entire data may be lost, which could be a
huge loss.
2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among different
database systems of an organization. These database systems are connected via communication
links. Such links help the end-users to access the data easily. Examples of the Distributed
database are Apache Cassandra, HBase, Ignite, etc.
We can further divide a distributed database system into:

o Homogeneous DDB: Database systems that run on the same operating system, use the same
application process, and run on the same hardware devices.
o Heterogeneous DDB: Database systems that run on different operating systems, under
different application procedures, and on different hardware devices.
Advantages of Distributed Database
o Modular development is possible in a distributed database, i.e., the system can be
expanded by including new computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.

3) Relational Database
This database is based on the relational data model, which stores data in the form of rows (tuples)
and columns (attributes) that together form a table (relation). A relational database uses SQL for
storing, manipulating, and maintaining the data. E.F. Codd proposed the relational model in 1970.
Each table in the database has a key that uniquely identifies each of its rows. Examples of
relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
Properties of Relational Database
Transactions in a relational database have four commonly known properties, known as the ACID
properties, where:
A means Atomicity: A transaction either completes in full or has no effect at all. It follows
the 'all or nothing' strategy. For example, a transaction will either be committed or will abort.
C means Consistency: A transaction takes the database from one valid state to another, so every
rule defined on the data holds before and after it. For example, in a transfer between two accounts,
the combined balance before and after the transaction should remain conserved.
I means Isolation: Several users can access data in the database at the same time, so concurrent
transactions should not interfere with one another. For example, when multiple transactions occur
at the same time, the intermediate effects of one transaction should not be visible to the other
transactions in the database.
D means Durability: Once a transaction completes and commits, its changes should remain
permanent, even after a system failure.
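The 'all or nothing' behaviour of atomicity can be sketched with Python's built-in sqlite3 module and an in-memory database. The accounts table, the account names, and the transfer helper are hypothetical and chosen only for illustration. The CHECK constraint stands in for a business rule that balances may not go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit leaves work that must be undone.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account returns False and leaves both balances exactly as they were, because the credit that had already been applied is rolled back together with the failed debit.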

4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets. It
is not a relational database as it stores data not only in tabular form but in several different ways.
It came into existence when the demand for building modern applications increased. Thus,
NoSQL presented a wide variety of database technologies in response to the demands. We can
further divide a NoSQL database into the following four types:
a. Key-value storage: It is the simplest type of database storage, where every item is stored
as a key (or attribute name) together with its value.
b. Document-oriented Database: A type of database used to store data as JSON-like
documents. It helps developers store data in the same document-model format
as used in the application code.
c. Graph Databases: It is used for storing vast amounts of data in a graph-like structure.
Most commonly, social networking websites use graph databases.
d. Wide-column stores: It resembles a relational table in that data is organized into rows and
columns. Here, columns are grouped into column families, and the set of columns can differ
from row to row.
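The four NoSQL data models above can be pictured with plain Python data structures. This is only a rough sketch of the shape of the data, not of how any particular product stores it; every key, field name, and the find_documents helper are made up for illustration.

```python
# Key-value: an opaque value looked up by a unique key.
key_value = {"session:42": "user=alice;theme=dark"}

# Document: self-describing, JSON-like records; fields may differ per document.
documents = [
    {"_id": 1, "name": "Alice", "emails": ["alice@example.com"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Pune"}},
]

# Graph: nodes plus the edges (relationships) between them.
graph = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": {"alice"}}

# Wide-column: each row key holds its own set of columns, grouped into families.
wide_column = {
    "user:1": {"profile": {"name": "Alice"}, "activity": {"last_login": "2024-05-01"}},
    "user:2": {"profile": {"name": "Bob"}},  # no "activity" family for this row
}


def find_documents(docs, field, value):
    """Return the documents whose top-level field equals value."""
    return [doc for doc in docs if doc.get(field) == value]
```

Note how the second document and the second wide-column row simply omit fields that the first one has; none of these models requires a fixed schema declared in advance.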
Advantages of NoSQL Database
o It enables good productivity in the application development as it is not required to store
data in a structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.
5) Cloud Database
A type of database where data is stored in a virtual environment and runs on a cloud
computing platform. Cloud providers offer it through cloud computing service models (SaaS, PaaS,
IaaS, etc.). There are numerous cloud platforms; well-known options include:
o Amazon Web Services(AWS)
o Microsoft Azure
o Kamatera
o phoenixNAP
o ScienceSoft
o Google Cloud SQL, etc.

6) Object-oriented Databases
It is the type of database that uses the object-based data model approach for storing data in the
database system. The data is represented and stored as objects, similar to the objects
used in object-oriented programming languages.

7) Hierarchical Databases
It is the type of database that stores data in the form of parent-child relationship nodes. Here, it
organizes data in a tree-like structure.
Data is stored in the form of records that are connected via links. Each child record in the tree
has exactly one parent. On the other hand, each parent record can have multiple child
records.
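The one-parent, many-children structure can be sketched as a small tree in Python. The Record class and the university example are hypothetical; a real hierarchical DBMS manages these links internally.

```python
class Record:
    """A record that has at most one parent and any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Walk up the single chain of parents to build the path from the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


university = Record("university")
science = Record("science", parent=university)
arts = Record("arts", parent=university)
physics = Record("physics", parent=science)
```

Because every record has only one parent, there is exactly one path from the root to any record, which is what makes lookups along the tree simple and anything that cuts across branches awkward.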

8) Network Databases
It is the database that follows the network data model. Here, data is represented as nodes
connected via links. Unlike the hierarchical database, it allows each record to have multiple
parent nodes as well as multiple children, forming a generalized graph structure.

9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This database is
basically designed for a single user.
Advantage of Personal Database
o It is simple and easy to handle.
o It occupies less storage space as it is small in size.

10) Operational Database


It is the type of database in which data is created and updated in real time. It is designed
for executing and handling the daily data operations of a business. For example, an
organization uses operational databases for managing its day-to-day transactions.

11) Enterprise Database


Large organizations or enterprises use this database for managing a massive amount of data. It
helps organizations to increase and improve their efficiency. Such a database allows simultaneous
access to users.
Advantages of Enterprise Database:
o The enterprise database supports multiple processes.
o It allows executing parallel queries on the system.
(Type-02)
Types of Databases: Relational, NoSQL, Cloud, Vector
The main types of databases include relational databases for structured data, NoSQL databases for
flexibility, cloud databases for remote access, and vector databases for machine learning
applications.
Contents
• The Importance of Databases
• Types of Databases: A Quick Overview
• Popular Databases Management Systems
• Database Popularity Trends
• Relational Databases
• NoSQL Databases
• Cloud Databases
• Vector Database
• Other Types of Databases
• Time-series databases
• Object-oriented databases
• Graph databases
• Hierarchical databases
• Network databases
• Conclusion
• Databases FAQs
In today's data-driven world, we face a significant challenge: how do we efficiently store,
manage, and extract meaningful insights from data? Databases offer a solution, providing
structured repositories for organizing and accessing information.
However, to address the unique requirements of various data structures and use cases, different
types of databases have emerged.
In this article, we'll explore the four main types you'll encounter in the data science
world: relational databases, NoSQL databases, cloud databases, and vector databases.
If you want to learn about database design, check out this course on Database Design.
The Importance of Databases
Databases are essential tools in the digital world. They are organized collections of data that
facilitate the storage, retrieval, management, and manipulation of information.
At their core, databases are designed to hold data in a structured format, allowing users and
applications to efficiently access and update the information as needed.
The importance of databases extends across nearly all fields but is particularly critical in data
science. Data science projects often involve analyzing large volumes of data to derive insights,
make predictions, or inform decision-making.
Without databases, managing this data—especially as it grows in size and complexity—would be
cumbersome and error-prone. Databases provide a systematic way to store data and ensure its
integrity, security, and accessibility.
Consider, for example, a retail company that tracks sales, customer interactions, inventory, and
supplier information. A database serves as the backbone of the company’s operations, enabling
them to analyze trends, forecast demand, optimize inventory levels, and enhance customer
experiences.
Without a database, the company would struggle to handle the vast amounts of data generated
daily, let alone use this data to make informed business decisions.
Types of Databases: A Quick Overview
The different types of databases reflect the varied needs of use cases and the complexity of the
data they handle. Each type was developed to optimize performance and enhance functionality
for a specific class of workloads.
This variety is not just a matter of technological abundance but a necessity. The need for
different types of databases stems from differences in data structures, access patterns,
scalability demands, and consistency requirements.
For instance, traditional business applications often rely on structured data that fits well into
tables with predefined schemas, making relational databases an ideal choice.
However, with the rise of big data, social networks, and real-time analytics, the limitations of
relational databases in handling unstructured data, scaling horizontally, or managing highly
connected data became evident.
This led to the emergence of NoSQL databases, designed to offer flexibility, scalability, and
performance advantages for certain types of data that do not conform to the rigid structure of
traditional databases. If you want to learn more about how SQL and NoSQL databases compare, check
out this tutorial on SQL vs NoSQL Databases.
Similarly, the advent of IoT and time-sensitive applications necessitated the development of time-
series databases optimized for efficiently handling temporal data.
Cloud databases have also gained prominence, offering scalability and accessibility by hosting
data on remote servers.
Additionally, vector databases have emerged to cater to the specific needs of machine learning
applications, efficiently storing and querying high-dimensional vectors.

Popular Databases Management Systems


The DB-Engines Ranking for May 2024 lists the top database management systems (DBMS) based on
their popularity. This ranking is updated monthly and includes 420 systems. As of May 2024, the
top four databases are all relational: Oracle, MySQL, Microsoft SQL Server, and PostgreSQL.

Source: db-engines
It's worth noting that NoSQL databases like MongoDB and Redis also hold strong positions in the
ranking, reflecting the growing demand for flexible and scalable solutions capable of handling
unstructured data and high-traffic applications. These NoSQL systems have seen significant year-
over-year growth, indicating a shift towards more diverse database architectures.
The ranking also reveals the rising popularity of cloud-based databases like Snowflake, which
offers a fully managed, scalable data warehouse solution. Elasticsearch, a powerful search engine
and analytics platform, has also climbed in the rankings, underscoring the importance of search
and analytics capabilities in modern data management.
Database Popularity Trends
Let’s now look at the line graph below, which illustrates the dynamic landscape of database
popularity from 2014 to 2024.

Source: db-engines
A key takeaway is the enduring dominance of relational database management systems (RDBMS)
like Oracle, MySQL, Microsoft SQL Server, and PostgreSQL. These have consistently maintained
their top positions throughout the decade, highlighting their importance in handling structured
data and supporting complex queries across various applications.
However, the graph also reveals a notable shift in recent years. While RDBMS systems have seen
a gradual decline in popularity, NoSQL databases like MongoDB and Redis have experienced
significant growth. This upward trajectory reflects the increasing adoption of these flexible and
scalable solutions for managing unstructured data and accommodating high-traffic applications.
Another interesting trend is the rise of cloud-based databases. Databricks, a cloud-based data
engineering and machine learning platform, has skyrocketed in popularity, showcasing the
growing demand for cloud-based solutions that offer scalability, ease of use, and powerful
analytics capabilities.
Similarly, Snowflake, a fully managed cloud data warehouse, has seen significant growth,
highlighting the appeal of its scalable and easy-to-use architecture.

Relational Databases
Relational databases store data in tables structured into rows and columns. Imagine them as
meticulously organized spreadsheets: each row (record) represents a distinct entity, like a
customer or a product, while each column (attribute) captures a specific characteristic of that
entity, such as a name, address, or price.
The real power of relational databases lies in their ability to link these tables together using
relationships. These relationships, established through foreign keys, allow us to connect data from
different tables, creating a unified view of information.
For example, in a customer relationship management (CRM) system, a customer table might be
linked to an orders table, enabling us to track a customer's purchase history.
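As a small sketch of the customer and orders example, the following uses Python's built-in sqlite3 module. The table layouts, the sample rows, and the purchase_history helper are assumptions made for illustration; they are not taken from any particular CRM system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- the foreign key
        item        TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 2, 'desk');
""")


def purchase_history(conn, customer_name):
    """Join orders to customers through the foreign key and list the items bought."""
    rows = conn.execute(
        "SELECT o.item FROM orders AS o"
        " JOIN customers AS c ON c.id = o.customer_id"
        " WHERE c.name = ? ORDER BY o.id",
        (customer_name,),
    )
    return [item for (item,) in rows]
```

The orders table stores only the customer's id; the customer's name lives in one place and is brought in by the join when it is needed.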

Structured query language (SQL)


To interact with relational databases, we use Structured Query Language (SQL). This powerful
language enables us to query, insert, update, and delete data, as well as perform complex
operations like joining data from multiple tables. Relational database systems group SQL
statements into transactions and protect data integrity and consistency through the ACID
properties:
• Atomicity: All operations within a transaction are treated as a single unit, ensuring that
either all changes are committed or none are.
• Consistency: Data remains in a valid state throughout a transaction, adhering to predefined
constraints and rules.
• Isolation: Transactions are executed independently as if they were the only operation
happening on the database.
• Durability: Once a transaction is committed, its changes are permanent, even in the event
of system failures.
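Isolation can be observed with two connections to the same SQLite database file. This is a minimal sketch using Python's built-in sqlite3 module with its default settings; the stock table, the widget row, and the qty helper are hypothetical. Other database systems offer several isolation levels and may behave differently in detail.

```python
import os
import sqlite3
import tempfile

# Two separate connections stand in for two concurrent users.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
writer.execute("INSERT INTO stock VALUES ('widget', 5)")
writer.commit()


def qty(conn):
    """Read the widget quantity as seen through the given connection."""
    rows = conn.execute("SELECT qty FROM stock WHERE item = 'widget'").fetchall()
    return rows[0][0]


# The writer changes the row inside a transaction that is not committed yet.
writer.execute("UPDATE stock SET qty = 4 WHERE item = 'widget'")
seen_before_commit = qty(reader)  # the reader still sees the committed value

writer.commit()
seen_after_commit = qty(reader)   # the change becomes visible only after commit

writer.close()
reader.close()
```

The reader never observes the half-finished state of the writer's transaction: it sees the old value until the commit and the new value afterwards.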
If you want to learn more about SQL, check out this seven-course skill track on SQL
Fundamentals.
When to use relational databases
Relational databases are great when we need:
• Strong consistency: Ensuring all users see the same data simultaneously.
• Complex queries: Joining data from multiple tables to gain insights.
• ACID compliance: Guaranteeing reliable transaction processing for critical applications.
However, they might not be the best fit for:
• Unstructured data: Handling data that doesn't fit neatly into a tabular format (e.g., social
media posts, sensor data).
• Massive scalability: When your application needs to scale horizontally across numerous
servers.
Popular relational databases
Some popular RDBMS options include:
• MySQL: Open-source and known for its ease of use, speed, and reliability, often used in
web applications.
• PostgreSQL: Open-source and highly extensible, offering advanced features and strong
compliance with SQL standards.
• Oracle Database: A comprehensive, enterprise-grade solution known for its performance,
scalability, and security.
• Microsoft SQL Server: Tightly integrated with the Microsoft ecosystem, offering a wide
range of tools for business intelligence and analytics.
If you want to learn how to use relational databases in Python, check out this free course
on Introduction to Databases in Python.

NoSQL Databases
NoSQL databases, short for "not only SQL," have emerged as a powerful alternative to relational
databases, particularly in scenarios where flexibility, scalability, and high performance are
paramount.
Unlike their relational counterparts, NoSQL databases can handle unstructured or semi-structured
data without the constraints of a fixed schema. This means we can store data in various formats,
such as JSON documents, key-value pairs, or graph structures, without having to define a rigid
structure upfront.
These databases often provide features to scale out across multiple servers and clusters, making
them suitable for distributed data environments.
Querying NoSQL databases
Unlike relational databases, which use Structured Query Language (SQL), NoSQL databases
don't have a universal query language. Instead, each type of NoSQL database typically has its
unique query language or API tailored to its specific data model and structure.
While NoSQL databases prioritize flexibility and scalability, they often relax some of the ACID
properties found in relational databases. For example, some NoSQL databases prioritize eventual
consistency over immediate consistency, meaning that changes might not be reflected across all
nodes instantly. This trade-off allows for better performance and scalability but requires careful
consideration when designing applications that rely on strict data consistency.
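Eventual consistency can be illustrated with a toy model in plain Python. This is a deliberately simplified sketch, not the replication protocol of any real NoSQL system; the class names, the single secondary replica, and the explicit replicate step are all assumptions made for illustration.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy model: writes land on a primary and reach the secondary later."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self._pending = deque()  # changes not yet shipped to the secondary

    def write(self, key, value):
        self.primary.data[key] = value
        self._pending.append((key, value))  # acknowledged before replication

    def read_from_secondary(self, key):
        return self.secondary.data.get(key)

    def replicate(self):
        """Apply the queued changes to the secondary, in write order."""
        while self._pending:
            key, value = self._pending.popleft()
            self.secondary.data[key] = value


store = EventuallyConsistentStore()
store.write("likes:post1", 10)
stale = store.read_from_secondary("likes:post1")  # replica has not caught up yet
store.replicate()
fresh = store.read_from_secondary("likes:post1")  # replicas have now converged
```

Between the write and the replication step, a read served by the secondary returns stale data. An application that cannot tolerate this window needs either stronger consistency settings or a database that provides them.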
If you want to learn how to query NoSQL databases, check out this Introduction to
NoSQL course.
When to use NoSQL databases
NoSQL databases are particularly well-suited for scenarios where:
• Agility is key: Rapid development cycles and evolving data models.
• Scale is a priority: Applications with exponential data growth or high traffic.
• Performance matters: Real-time applications requiring fast read/write operations.
• Variety is the norm: Diverse data types (e.g., social media posts, sensor data).
Common use cases include:
• Big data analytics: Processing massive datasets.
• Real-time applications: Delivering up-to-the-minute information.
• Content management systems: Storing and managing diverse content.
• Internet of Things (IoT): Handling continuous data streams.
• Personalization engines: Tailoring user experience.
While offering significant advantages, NoSQL databases might not be ideal for applications
requiring strong transactional guarantees or complex relational queries. Many organizations adopt
a hybrid approach, using both relational and NoSQL databases to leverage their respective
strengths.
Popular NoSQL databases
Some of the most popular NoSQL databases include:
• MongoDB: A document-oriented database that is great for storing JSON-like documents
with dynamic schemas.
• Redis: A key-value store often used for caching and as a fast in-memory datastore.
• Cassandra: A column-family store known for its scalability and fault tolerance.
• Neo4j: A graph database that excels in managing and querying highly connected data.
If you want to learn more about the four major NoSQL databases, check out this course
on NoSQL Concepts.

Cloud Databases
Cloud databases have revolutionized data management by leveraging the vast resources and
scalability of cloud computing platforms. These databases reside on remote servers and are
accessed over the internet, eliminating the need for organizations to invest in and maintain their
own hardware and infrastructure.
Cloud databases operate on a pay-as-you-go model, where we only pay for the resources we
actually use. This eliminates the upfront costs and ongoing maintenance expenses associated with
traditional on-premises databases. Cloud providers handle the underlying infrastructure, including
servers, storage, and networking, while you focus on building and managing your applications.
If you want to learn about cloud computing, check out this course on Understanding Cloud
Computing.
Querying cloud databases
Querying cloud databases typically involves using the same tools and languages we’d use with
on-premises databases. For relational databases in the cloud, we'd use SQL to interact with the
data. NoSQL databases in the cloud typically have their own query languages or APIs, similar to
their on-premises counterparts.
Cloud providers often offer additional tools and services to simplify database management and
querying. These might include web-based consoles, command-line interfaces, and SDKs for
various programming languages.
When to use cloud databases
Cloud databases are an excellent choice when:
• Scalability is crucial: Easily adapt to changing demands.
• Flexibility is a priority: Wide range of database options available.
• Global accessibility is important: Low-latency access for users worldwide.
• Cost-effectiveness is a concern: Pay-as-you-go model and scalable resources.
Popular cloud databases
The leading cloud providers offer a range of database services, each with its own strengths and
specialties:
• Amazon RDS: Supports multiple database engines like MySQL, PostgreSQL, and Oracle,
offering managed relational database services.
• Google Cloud SQL: A fully-managed service that allows running MySQL, PostgreSQL,
and SQL Server databases in the cloud.
• Azure SQL Database: Provides scalable, intelligent, and fully-managed database services
in the Microsoft Azure cloud.
You can learn more about cloud databases in this course on AWS Cloud Technology and Services.

Vector Database
Vector databases have emerged as a specialized tool for handling the unique demands of artificial
intelligence and machine learning applications.
Vector databases are designed to store, index, and manage vector embeddings, which are high-
dimensional data representations often used in machine learning models. This enables efficient
similarity search, where the database can quickly identify vectors that are "close" to a given query
vector based on distance metrics like cosine similarity or Euclidean distance.
These features make them suitable for applications like image recognition, recommendation
systems, and natural language processing. They utilize indexing structures that optimize the
retrieval of similar vectors based on distance metrics.
If you want to learn more about vector databases, you can read this article: An Introduction to
Vector Databases for Machine Learning.
Querying vector databases
Querying a vector database typically involves the following steps:
1. Embedding the query: The input query (e.g., an image, a piece of text) is converted into a
vector embedding using an appropriate embedding model.
2. Similarity search: The vector database performs a similarity search to find the nearest
neighbors of the query embedding in the vector space. This is often done using
approximate nearest neighbor (ANN) algorithms to ensure efficiency at scale.
3. Returning results: The database returns the identified nearest neighbors along with their
associated metadata or original data objects.
Different vector databases may offer various query options and parameters, such as specifying the
number of nearest neighbors to return or setting a distance threshold. Some databases also support
filtering based on metadata or combining vector search with traditional scalar filtering.
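The query steps above can be sketched in plain Python with an exact, brute-force search. This is an illustration of the idea only: real vector databases use approximate nearest neighbor indexes instead of scanning every vector, and the query embedding would come from an embedding model rather than being written by hand. The index contents, ids, and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=2):
    """Score every stored vector against the query and return the top k with metadata."""
    scored = [
        (cosine_similarity(query, vector), item_id)
        for item_id, (vector, _metadata) in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(item_id, index[item_id][1]) for _score, item_id in scored[:k]]


# Each entry: id -> (embedding, metadata). Three dimensions keep the example readable.
index = {
    "doc1": ([1.0, 0.0, 0.0], {"title": "cats"}),
    "doc2": ([0.9, 0.1, 0.0], {"title": "kittens"}),
    "doc3": ([0.0, 0.0, 1.0], {"title": "stock prices"}),
}

query_embedding = [1.0, 0.0, 0.0]  # step 1 (embedding the query) is assumed done
results = nearest_neighbors(query_embedding, index, k=2)  # steps 2 and 3
```

The parameter k corresponds to the "number of nearest neighbors to return" mentioned above; a distance threshold or a metadata filter could be added as an extra condition when building the scored list.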
When to use vector databases
Vector databases are particularly well-suited for scenarios where:
• Similarity search is critical: Applications like image recognition or recommendation
systems.
• High-dimensional data is involved: Traditional databases are inefficient at indexing and
searching such data.
• Real-time performance is required: AI applications like recommender systems.
Popular vector databases
• Faiss: An open-source library developed by Facebook AI Research that provides efficient
similarity search and clustering of dense vectors.
• Milvus: An open-source vector database that supports scalable similarity search and AI
applications.
• Pinecone: A vector database service that simplifies the deployment and scaling of
similarity search in production environments.
If you want to learn more about what the popular databases are, check out this article on Top 5
Best Vector Databases.
Other Types of Databases
While relational, NoSQL, cloud, and vector databases cover a wide range of use cases, several
other database types exist, each tailored to specific data models and access patterns. Let's briefly
explore some of these specialized solutions.
Time-series databases
Time-series databases are optimized for storing and analyzing time-stamped data, such as sensor
readings, stock prices, or server logs. They excel at handling high-volume data ingestion and
efficiently querying data points based on time ranges. Popular options include InfluxDB,
TimescaleDB, and Prometheus.

Object-oriented databases
Object-oriented databases (OODBs) store data as objects, similar to object-oriented programming.
This can simplify modeling complex data structures and relationships. However, OODBs have not
gained widespread adoption due to challenges with standardization and query optimization.
Popular options include ObjectDB and Versant Object Database.

Graph databases
Graph databases excel at representing and querying relationships between entities. They store data
as nodes (entities) and edges (relationships), making them well-suited for social networks,
recommendation engines, fraud detection systems, and knowledge graphs. Popular options
include Neo4j, Amazon Neptune, and JanusGraph.
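A typical graph query, such as "who is within two hops of this person", can be sketched with an adjacency map and a breadth-first search in plain Python. The people, the edges, and the within_hops function are made up for illustration; graph databases expose this kind of traversal through their own query languages.

```python
from collections import deque

# Undirected "knows" relationships between people (nodes).
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]

adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def within_hops(start, max_hops):
    """Breadth-first search: map each node reachable in <= max_hops edges to its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the start node is not its own neighbor
    return distance
```

Calling within_hops("alice", 2) returns alice's direct contacts and their contacts, which is the basis of "friends of friends" style recommendations.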

Hierarchical databases
Hierarchical databases organize data in a tree-like structure, with parent-child relationships
between records. This structure is suitable for some specialized applications but can be inflexible
for complex data models. While historically significant, hierarchical databases are less common in
modern applications.

Network databases
Network databases are similar to hierarchical databases but allow for more complex relationships
between records. While they offer flexibility, they can also be more challenging to manage and
query. Network databases have largely been replaced by relational and graph databases in most
applications.

Conclusion
In this overview, we've explored the diverse landscape of databases, each type tailored to address
specific data challenges. From structured data in relational databases to the flexibility of NoSQL,
the scalability of cloud solutions, and the specialized capabilities of vector databases, we've seen
how these tools underpin modern data management.
Choosing the right database is a critical decision, one that depends on understanding the unique
strengths and tradeoffs of each type. By carefully evaluating your specific needs and constraints,
you can select the database that best empowers your data-driven applications and initiatives.

12 Codd's Rules
Not every database that has tables and constraints can be called a relational database system,
and a database that merely uses a relational data model is not necessarily a Relational Database
System (RDBMS). So, some rules define what a database must satisfy to be a correct RDBMS. These
rules were developed in 1985 by Dr. Edgar F. Codd (E.F. Codd), who did extensive research on the
Relational Model of database systems. Codd presented 13 rules for testing a DBMS against his
relational model, and if a database follows the rules, it is called a true relational database
(RDBMS). Because they are numbered from 0 to 12, these 13 rules are popularly known as Codd's
12 rules.
Rule 0: The Foundation Rule
The database must be in relational form, so that the system can manage the database entirely
through its relational capabilities.
Rule 1: Information Rule
All information in a database must be represented as values stored in the cells of tables,
organized in rows and columns.
Rule 2: Guaranteed Access Rule
Every single data item (atomic value) must be logically accessible from a relational database
using the combination of table name, primary key value, and column name.
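As a small illustration of guaranteed access, the query below reaches one atomic value using nothing but a table name, a primary key value, and a column name. It uses Python's built-in sqlite3 module; the employees table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Asha", "HR"), (2, "Ravi", "IT")],
)

# table name = employees, primary key value = 2, column name = dept
row = conn.execute("SELECT dept FROM employees WHERE emp_id = ?", (2,)).fetchone()
dept = row[0]
```

No pointer, file offset, or row position is needed to find the value; the three logical names are enough.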
Rule 3: Systematic Treatment of Null Values
This rule defines the systematic treatment of Null values in database records. A null value can
have various meanings in the database, such as missing data, no value in a cell, inapplicable
information, or unknown data, and the system must treat nulls consistently regardless of data
type. A primary key should not be null.
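How SQL treats nulls can be seen in a short sketch with Python's built-in sqlite3 module. The contacts table and its rows are hypothetical. In Python, a SQL NULL is returned as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [(1, "Asha", "555-0100"), (2, "Ravi", None)],  # Ravi's phone number is unknown
)

# Comparing NULL with anything, even NULL, yields NULL (unknown), not true or false.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the dedicated test for missing values.
missing = [name for (name,) in conn.execute("SELECT name FROM contacts WHERE phone IS NULL")]

# "= NULL" never matches, because the comparison is never true.
matched_by_equals = conn.execute("SELECT name FROM contacts WHERE phone = NULL").fetchall()
```

A null is therefore distinct from an empty string or zero: it marks the absence of a value and has its own comparison rules.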
Rule 4: Active/Dynamic Online Catalog based on the relational model
The description of the entire logical structure of the database, known as the database
dictionary (catalog), must be stored online in the database itself. Authorized users can access
the catalog using the same query language that they use to access the database.
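SQLite offers a convenient illustration of an online catalog: its schema is itself exposed as a table named sqlite_master that can be queried with ordinary SQL. The sketch below uses Python's built-in sqlite3 module; the books table and titles view are hypothetical. Other systems expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT)")
conn.execute("CREATE VIEW titles AS SELECT title FROM books")

# The catalog is queried with the same SELECT syntax as user data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master"
    " WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

# The stored definition of the table is also available as data.
books_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'books'"
).fetchone()[0]
```

Because the description of the database is stored as rows in a table, tools and users can inspect the structure without any special-purpose interface.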
Rule 5: Comprehensive Data SubLanguage Rule
The relational database may support several languages, but at least one language must have a
linear, well-defined syntax, be expressible as character strings, and comprehensively support
data definition, view definition, data manipulation, integrity constraints, authorization, and
transaction management operations. If the database allows access to the data without any such
language, it is considered a violation of this rule.
Rule 6: View Updating Rule
All views that are theoretically updatable must also be updatable in practice by the database
system.
Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
A database system must support high-level relational operations such as insert, update, and
delete on a whole set of rows at once, not just on a single row. It also supports union,
intersection, and minus operations in the database system.
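Set-level operations can be sketched with Python's built-in sqlite3 module. The two staff tables and their rows are hypothetical. Note that SQLite spells the minus operation as EXCEPT; some other systems, such as Oracle, use the keyword MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_2023 (name TEXT PRIMARY KEY, salary INTEGER);
    CREATE TABLE staff_2024 (name TEXT PRIMARY KEY, salary INTEGER);
    INSERT INTO staff_2023 VALUES ('Asha', 100), ('Ravi', 200);
    INSERT INTO staff_2024 VALUES ('Ravi', 200), ('Meera', 300);
""")

# One statement updates every qualifying row; no row-by-row loop is needed.
updated = conn.execute(
    "UPDATE staff_2024 SET salary = salary + 10 WHERE salary >= 200"
).rowcount


def names(sql):
    """Run a query that returns one column and collect it into a list."""
    return [name for (name,) in conn.execute(sql)]


union = names("SELECT name FROM staff_2023 UNION SELECT name FROM staff_2024 ORDER BY name")
both = names("SELECT name FROM staff_2023 INTERSECT SELECT name FROM staff_2024 ORDER BY name")
left_only = names("SELECT name FROM staff_2023 EXCEPT SELECT name FROM staff_2024 ORDER BY name")
```

Each statement describes the set of rows to work on and leaves it to the database system to find and process them.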
Rule 8: Physical Data Independence Rule
Applications that access the database must be independent of how the data is physically
stored. If the physical structure of the database or the way the data is stored is changed, it
should have no effect on external applications that are accessing the data from the database.
Rule 9: Logical Data Independence Rule
It is similar to physical data independence. It means that if any change occurs at the logical
level (table structures), it should not affect the user's view (application). For example, if a
table is split into two tables, or two tables are joined to create a single table, these changes
should not impact the user view or application.
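One common way to approach logical data independence is to hide a restructured table behind a view. The sketch below uses Python's built-in sqlite3 module; the person tables, the rows, and APP_QUERY are hypothetical. It covers reads only, since keeping writes unchanged would additionally require an updatable view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO person VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi');
""")

# The query embedded in the application; it must keep working unchanged.
APP_QUERY = "SELECT name, city FROM person ORDER BY id"
before = conn.execute(APP_QUERY).fetchall()

# Logical change: split the table in two, then expose the old shape as a view.
conn.executescript("""
    CREATE TABLE person_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_city (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO person_name SELECT id, name FROM person;
    INSERT INTO person_city SELECT id, city FROM person;
    DROP TABLE person;
    CREATE VIEW person AS
        SELECT n.id, n.name, c.city
        FROM person_name AS n JOIN person_city AS c ON c.id = n.id;
""")
after = conn.execute(APP_QUERY).fetchall()
```

The application's query text is identical before and after the split and returns the same rows, even though the underlying table no longer exists in its original form.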
Rule 10: Integrity Independence Rule
A database must maintain integrity independence: integrity constraints are defined using the
SQL query language and stored in the database, not in application programs. The database should
not rely on any external factor or application to maintain integrity. This also helps make the
database independent of each front-end application.
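The idea that integrity belongs to the database rather than to each application can be sketched with Python's built-in sqlite3 module. The departments and employees tables and the try_insert helper are hypothetical. SQLite enforces foreign keys only after the PRAGMA shown below has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for foreign key enforcement in SQLite
conn.executescript("""
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments(dept_id)
    );
    INSERT INTO departments VALUES (1, 'HR');
""")


def try_insert(emp_id, name, dept_id):
    """Attempt an insert; the database, not this function, decides whether it is valid."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO employees VALUES (?, ?, ?)", (emp_id, name, dept_id))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
```

The helper contains no validation logic of its own. An insert that names a department that does not exist, or that leaves the name empty, is rejected by the constraints declared in the schema, so every front-end application gets the same protection.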
Rule 11: Distribution Independence Rule
The distribution independence rule states that a database must work properly even if it is
stored in different locations and used by different end-users. When a user accesses the database
through an application, they should not need to be aware that the data is distributed across
sites; it should always appear to them as if the data were located at only one site. End users
should be able to run the same SQL queries regardless of where the data is stored.
Rule 12: Non-Subversion Rule
The non-subversion rule concerns systems that use SQL to store and manipulate the data in the
database. If a system has a low-level or separate language other than SQL to access the
database system, that language must not be able to subvert or bypass the integrity rules when
transforming data.
