
Row-Based Storage Vs Column-Based Storage - A Beginner's Guide - by Santosh Beora - Medium

The document compares row-based and column-based storage formats, highlighting their advantages and disadvantages. Row-based storage is efficient for transactional operations but less so for analytical queries, while column-based storage excels in analytical workloads but can be slower for transactional operations. The choice between the two depends on specific use cases, with row-based being ideal for databases and column-based for data warehouses.

Uploaded by

Yến Lê

11/6/24, 9:28 AM Row-Based Storage vs Column-Based Storage: A Beginner’s Guide | by Santosh Beora | Medium

Row-Based Storage vs Column-Based Storage: A Beginner’s Guide
Santosh Beora · Follow
5 min read · Jun 11, 2024

Image credit: GitHub Pages

Introduction
When it comes to storing and managing data, two primary storage
formats are commonly used: row-based storage and column-based
storage. Understanding the differences between these formats can help
you make better decisions when designing databases or data warehouses.

https://medium.com/@santosh_beora/row-based-storage-vs-column-based-storage-a-beginners-guide-6e91dbadb181#:~:text=Row-Based %3A Can … 1/11



Let’s dive into these storage formats, their advantages and disadvantages,
and real-life examples to make these concepts clear.

What is Row-Based Storage?


Row-based storage (also known as row-oriented storage) organizes data
by rows. Each row stores a complete record, and each record includes all
the fields (or columns) of that record. This format is typical in traditional
relational databases like MySQL and PostgreSQL.

How Row-Based Storage Works

Imagine a table of customer data:

CustomerID | Name    | Age | Country
1          | Alice   | 30  | USA
2          | Bob     | 25  | Canada
3          | Charlie | 35  | UK

In row-based storage, each record (or row) is stored together, so when you read or write a record, you access all the columns of that row at once.
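To make the row layout concrete, here is a minimal Python sketch that models a row-oriented table as a list of tuples, one tuple per record. It is an in-memory illustration only, not how MySQL or PostgreSQL lay out pages on disk; the helper names `read_record` and `insert_record` and the fourth customer ("Dana") are hypothetical.

```python
# Hypothetical in-memory model of a row-oriented table: one tuple per record.
# Column order: CustomerID, Name, Age, Country.
rows = [
    (1, "Alice", 30, "USA"),
    (2, "Bob", 25, "Canada"),
    (3, "Charlie", 35, "UK"),
]


def read_record(table, customer_id):
    """Return the complete record for one customer (all columns at once)."""
    for record in table:
        if record[0] == customer_id:
            return record
    return None


def insert_record(table, record):
    """Adding a record touches a single place: the end of the row list."""
    table.append(record)


# "Dana" is made-up sample data for illustration.
insert_record(rows, (4, "Dana", 28, "India"))
print(read_record(rows, 2))
```

Reading or writing one customer touches exactly one tuple, which is why this layout suits record-at-a-time work.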

Advantages of Row-Based Storage:

1. Efficient for Transactional Operations: Row-based storage is optimized for transaction-oriented operations where you often need to access and modify entire records, such as inserting new customer details or updating an order.


2. Simple Data Access: Accessing a complete record is straightforward, making it easy to handle operations like adding or updating rows.

Disadvantages of Row-Based Storage:

1. Inefficient for Analytical Queries: When performing analytical queries that only require specific columns, row-based storage can be inefficient since it reads entire rows, including unnecessary data.

2. Storage Space: Row-based storage can take more space, because every row is stored in full, including unused or redundant fields, and because a row that mixes values of different types tends to compress less well than a column of a single type.

Example file formats: CSV, JSON, Avro, etc.
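The cost of an analytical query on a row-based format can be sketched with CSV and Python's standard `csv` module. The example assumes the article's three-customer table serialized as CSV text; the function name `average_age` and the `fields_parsed` counter are hypothetical, added only to show how much data is touched.

```python
import csv
import io

# The article's customer data, serialized as CSV (a row-based format).
CSV_TEXT = """CustomerID,Name,Age,Country
1,Alice,30,USA
2,Bob,25,Canada
3,Charlie,35,UK
"""


def average_age(csv_text):
    """Compute the mean of one column; every field of every row is still parsed."""
    ages = []
    fields_parsed = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        fields_parsed += len(row)  # all four fields are read, only one is used
        ages.append(int(row["Age"]))
    return sum(ages) / len(ages), fields_parsed


mean_age, fields_parsed = average_age(CSV_TEXT)
print(mean_age, fields_parsed)
```

Only the Age values are needed, yet the reader has to parse all twelve fields to find them.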

What is Column-Based Storage?


Column-based storage (also known as columnar storage) organizes data
by columns. Each column is stored separately, allowing the system to
read or write specific columns independently. This format is common in
data warehouses like Google BigQuery and Amazon Redshift.

How Column-Based Storage Works


Using the same customer data example, column-based storage would store each column separately:

- CustomerID: [1, 2, 3]

- Name: [Alice, Bob, Charlie]

- Age: [30, 25, 35]

- Country: [USA, Canada, UK]

This structure allows the system to access only the relevant columns
needed for a query.
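As a rough sketch of this layout, the Python below pivots row tuples into one list per column and then answers a query by reading a single list. It is an in-memory illustration under simple assumptions, not the on-disk format of BigQuery or Redshift; `to_columns` is a hypothetical helper name.

```python
# The same customer records, first as rows.
rows = [
    (1, "Alice", 30, "USA"),
    (2, "Bob", 25, "Canada"),
    (3, "Charlie", 35, "UK"),
]
column_names = ("CustomerID", "Name", "Age", "Country")


def to_columns(records, names):
    """Pivot a list of row tuples into a dict holding one list per column."""
    return {name: [record[i] for record in records] for i, name in enumerate(names)}


columns = to_columns(rows, column_names)

# An average-age query reads only the Age list; the other columns are untouched.
ages = columns["Age"]
print(sum(ages) / len(ages))
```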

Advantages of Column-Based Storage:

1. Efficient for Analytical Queries: Column-based storage is optimized for read-heavy operations and analytical queries, where you typically need to scan and aggregate data across many rows but only a few columns.

2. Data Compression: Each column holds values of a single data type, often with many repeated values, so columns can be highly compressed, reducing storage costs and improving read performance.

3. Faster Aggregations: Aggregations and calculations are faster since only the required columns are read, and the data is already organized in a columnar format.
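One way to see why columns compress well is run-length encoding, which stores each run of repeated values once along with its length. The sketch below is a simplified illustration, not the exact scheme of any particular warehouse; the `country` column is made-up sample data, assumed to be sorted so that equal values sit next to each other.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical Country column, sorted so that repeats are adjacent.
country = ["USA", "USA", "USA", "Canada", "Canada", "UK"]

encoded = rle_encode(country)
print(encoded)
print(rle_decode(encoded) == country)
```

Six stored values shrink to three pairs here. A row layout would interleave these values with names and ages, so the runs would never form.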

Disadvantages of Column-Based Storage:

1. Inefficient for Transactional Operations: Writing new records or updating existing ones can be slower since it involves accessing multiple columns scattered across different storage locations.

2. Complexity: Implementing and managing columnar storage can be more complex compared to row-based storage.

Example file formats: Parquet, ORC, etc.
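The write cost described above can be sketched in a few lines of Python. In this simplified in-memory model, inserting one record means appending to every column list, where a row layout would append a single tuple. The function name `insert_record` and the new customer ("Dana") are hypothetical.

```python
# Column-oriented model of the customer table: one list per column.
columns = {
    "CustomerID": [1, 2, 3],
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [30, 25, 35],
    "Country": ["USA", "Canada", "UK"],
}


def insert_record(cols, record):
    """Append one record and return how many separate column lists were touched."""
    if set(record) != set(cols):
        raise ValueError("record must provide exactly one value per column")
    touched = 0
    for name, value in record.items():
        cols[name].append(value)  # one write per column
        touched += 1
    return touched


# "Dana" is made-up sample data for illustration.
touched = insert_record(
    columns, {"CustomerID": 4, "Name": "Dana", "Age": 28, "Country": "India"}
)
print(touched)
```

With four columns, a single insert becomes four separate writes, and the count grows with the width of the table.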

Real-Life Example: Database vs. Data Warehouse


To illustrate the differences, let’s consider a real-life IT project scenario:

Scenario: An E-commerce Company


Transactional Database (Row-Based Storage) :

The company uses a relational database like MySQL to manage its daily
transactions. This database stores data about customers, orders, and
inventory in a row-based format. Each transaction, such as adding a new
order or updating customer details, requires accessing complete records,
making row-based storage ideal.

Example Query : “Add a new order for customer ID 1 with product ID 123.”

Performance: Efficient because the entire order record is accessed and modified as a single unit.
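The example query can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a convenient row-oriented stand-in for MySQL, and the `orders` table, its columns, and the `add_order` helper are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the transactional store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        product_id  INTEGER NOT NULL
    )
    """
)


def add_order(connection, customer_id, product_id):
    """Insert one complete order record and return its generated id."""
    with connection:  # commits on success, rolls back on error
        cursor = connection.execute(
            "INSERT INTO orders (customer_id, product_id) VALUES (?, ?)",
            (customer_id, product_id),
        )
    return cursor.lastrowid


# "Add a new order for customer ID 1 with product ID 123."
order_id = add_order(conn, 1, 123)
row = conn.execute(
    "SELECT order_id, customer_id, product_id FROM orders WHERE order_id = ?",
    (order_id,),
).fetchone()
print(row)
```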

Data Warehouse (Column-Based Storage) :


For business analytics, the company uses a data warehouse like Google
BigQuery. This warehouse stores historical data for sales analysis,
customer behavior, and inventory trends in a columnar format. Analytical
queries often involve scanning large datasets but only a few columns.


Example Query : “Calculate the total sales for the last quarter by country.”

Performance : Fast because only the relevant columns (sales amount and
country) are read and aggregated.
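A rough Python sketch of that aggregation over a columnar layout follows. The `sales` data, its column names, and the `total_by` helper are hypothetical, and the "last quarter" date filter is left out to keep the example short.

```python
from collections import defaultdict

# Hypothetical sales data in columnar form: one list per column.
sales = {
    "order_id": [1, 2, 3, 4],
    "customer_id": [1, 2, 1, 3],
    "country": ["USA", "Canada", "USA", "UK"],
    "amount": [120.0, 80.0, 45.5, 60.0],
}


def total_by(group_column, value_column):
    """Sum value_column grouped by the matching entries of group_column."""
    totals = defaultdict(float)
    for key, value in zip(group_column, value_column):
        totals[key] += value
    return dict(totals)


# Only two of the four columns are read: country and amount.
totals = total_by(sales["country"], sales["amount"])
print(totals)
```

The `order_id` and `customer_id` lists are never touched, which is the saving a columnar warehouse gets on a much larger scale.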

Comparison:

1. Data Organization:

Row-Based: Data is stored by rows (records). Each row contains all columns for that record.

Column-Based: Data is stored by columns. Each column contains all values for that column across different records.

2. Use Case Suitability:

Row-Based: Best for transactional databases where operations involve complete records.

Column-Based: Best for data warehouses where operations involve scanning and aggregating large datasets across a few columns.

3. Query Performance:

Row-Based: Optimized for INSERT, UPDATE, DELETE operations.

Column-Based: Optimized for SELECT queries with aggregations and filters.


4. Storage Efficiency:

Row-Based: Can be less efficient for large-scale read operations and might require more storage space.

Column-Based: Highly efficient for read-heavy operations and supports high compression ratios.

Conclusion:

Choosing between row-based and column-based storage depends on your specific use case. Row-based storage is ideal for transactional databases with frequent insertions and updates, while column-based storage excels in analytical workloads involving large-scale data scans and aggregations. Understanding these differences will help you design more efficient and effective data storage solutions.

Note
If this article helped you gain some knowledge, please clap and comment.
Don’t forget to follow me on Medium and on LinkedIn. Your support
helps me create more content like this and keeps us connected in the
data engineering community. Thank you!
