0% found this document useful (0 votes)

14 views5 pages

A Study On Databases: Columnar

Uploaded by

Uday Reddy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views5 pages

A Study On Databases: Columnar

Uploaded by

Uday Reddy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072

A Study on Columnar Databases

Adithya Viswanathan1, Devanshu Mishra1, Srividya MS3
1B.E. Department of Computer Science and Engineering, R.V. College of Engineering
1B.E. Department of Computer Science and Engineering, R.V. College of Engineering
2 Assistant Professor, Department of Computer Science and Engineering, R.V. College of Engineering

---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Over the last few decades, the amount of data in an enterprise of an organization is growing exponentially. Along
with that, there is an increase in the number of people who are trying to access and analyze such data and interpret results. More
tools that offer quick and efficient methods for collection, storage, and querying for analysis of such data are the need of the hour.
Column-oriented storage structures have a growing market share in the last few years. This architecture is preferred when there
are usually copious dynamic queries and in high frequency. This survey paper details the architecture of column-store, the
advantages, and limitations of what column-store can offer us, and how it fares when looked up against the traditional row-store.

Key Words: Column-Store, Databases, Row-Store, Relational database management, ERP, Querying

1. INTRODUCTION

One of the largest hurdles faced by enterprises today is to provide an amalgamation of analysis features and essential reporting
within services and applications that they provide on their platform. Adding to that, users need to adequately operate on the
ever-increasing velocity of data without delays caused by re-optimization causing time delays related to evolving query
patterns or data. It becomes necessary to support analytically intensive database workloads, the data mining processes, and
inquisitive ad-hoc query statements, which were not expected during initial phases of planning. With the world rapidly
changing, data is the purest form of currency. It is been created faster than ever, and in more volume than ever. Multiple
problems develop due to this, one being storing such large volumes with variety. But the more important problem is accessing
this data when the time comes efficiently in an eloquent manner for the true purpose it serves.
Databases primarily store data in two forms, row-store databases and column store. For the former, the data is stored in a
sequence where each data the field values of each tuple is stored successively, while in the databases present using column-
store, data exists according to the values of the column. [1]. Both are necessary for they serve different purposes. Row Store is
optimized for writes and quite popular and the norm. Column Store is optimized for reading the data. In this paper, the column
store database approach will be analyzed to see if it does help enterprises in their database requirements.
2. ARCHITECTURE OF COLUMNNAR DATABASES

Different workloads in both systems have explained the conventional business division into online analytical processing (OLAP)
and online transaction processing (OLTP). While online transaction processing (OLTP) computations are characterized by a mix
of writes and reads from a few rows at a time, online analytical processing (OLAP) applications are defined by complex read
operations including joins and large longitudinal scans spanning only a few columns but several database rows. Usually, different
systems handle these two workloads: data warehousing systems or business intelligence (BI) and TPS transaction processing
systems [3].

© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 6448
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072

Fig -1: Architecture of In-Memory Columnar Database [3]

The above figure 1, [3] is an example of column-store database system architecture. The insert-only method involves change of
code, and updates are designed as inserts and invalidate the changed line without attempting to remove it. Delete simply
invalidates the removed rows, too. The tuple injection is kept in order, and the variant that was last inserted is correct. The
insert-only method combined with multiple variants [10] allows the maintenance of past changes of the tables [6] and ability for
retention of history and its entirety for administrative necessities. In addition, HS tables are often stored in the actual memory
using an attribute along with metadata sets, with each of two divisions: the primary (main) and the delta division. The key
division uses an ordered dictionary, which replaces values into an encoded format. In a write-optimized delta partition, incoming
updates are collected to reduce the computation add-on of keeping the order sorted [7, 8]. Data is collected using a dictionary
which is unsorted in the delta division. Additionally, a tree-based data structure is maintained per column, with all the exclusive
uncondensed delta partition values [5]. Using bit-packing mechanisms the condensation of both division vectors takes places [9].
At one's discretion, the columns can be having an inverted index to allow the rapid recovery of single tuples [4].

A database table has two dimensions - rows and columns, and the intersection between them is represented by cells. The
computer memory in contrast can be represented like a row with many cells, but without columns. This difference brings the
question of how to store two-dimensional data in a single-dimensional form (a linear form). Figure 1 shows the answer - there
are (at least) two ways to achieve this. The most common in the RDBMS world is to store table data row by row - a row-based
data storage. The other way is to store data column-wise - a sequence of all cells of the first column followed by all cells of the
second column etc.

© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 6449
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072

Fig -2: Example for Row Store vs Column Store

Overall performance is affected undoubtedly by type of the datastore. Depending on the type, some database operations'
performance can improve, and others can decrease.

The quick sequential memory search speed of modern systems facilitates a modern enterprise system database architecture that
incorporates different database techniques such as dictionary compression, columnar table layouts, and insert-only approach
and multi-version competition control. Furthermore, the designed architecture of the database enables the reconstruction of
applications, as easy and fast aggregations are possible and remove the overhead of maintenance of complex aggregates at the
middle layer.

3. ADVANTAGES

The main advantage of column-store is ability for queries to become agile and rapid. If we need the experience average of all of
our end-users, we should easily leap into the place where all “experience” values are saved and procure only values needed over
actually checking the experience field for each sequentially. Instead of reading the entire rows like in traditional databases when
only some fields are queried, through the columnar database, we can read only those queried columns skipping the columns we
don’t need, thus increasing the speed of operation and retrieving faster results when the databases are large (billions of records).
The columnar database provides faster data access as only selected columns must be read. There’s a higher rated condensation
whilst performing algorithms for compression on columns, which will speed up the operations in the columnar databases as
most hold merely a lean set of exclusive terms. In columnar databases, by default, the table is vertically portioned, which means
simultaneously various columns can be operated on by assigning a processor core, thus providing better parallel processing. The
columnar store table is more optimized to read the data than a row store table. Columnar databases are more suitable for
analytical applications like SAP HANA. For Online analytical processing for example data warehousing which mostly involves
highly complex queries overall data, columnar databases are a better option. However, data must be written in a columnar
database which requires some pre-processing. On aggressive operations like AVG, COUNT, MAX, and MIN, the columnar
databases provide high performance. It offers exponentially scalable features, rapid loading for variety, velocity and volumes of
data along with high efficiency on condensation and partitioning. Columnar - oriented databases are efficient in accessing hard -
disks. In systems like CRM that is Customer Relationship Management, Ad-hoc query systems, Business Intelligence (BI) column-
oriented databases prove advantageous aiding to capabilities to process vast non-exclusive values through aggregation [1].
Business intelligence tends to concentrate on the analysis of columns – having eagle eyes on an exclusive column, operating on
the values in that attribute, finding distinct values and holding an eye out for location where an update takes place. Those are
some things performed better by a column-store than by a RDBMS using row-store. Some ledgers can contain more than one
hundred thousand of documents, and the former database enables end-users the essential stored values, or abstract by only
providing the needful data points noted for an operation. Columnar databases get results quicker because of the way the data is
structured and allow for more effective data analysis.

© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 6450
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072

4. LIMITATIONS

If performing a row operation, it is very fast in a row store table than a column-store table. Row store tables are more optimized
to write the data than the column store table. Performing operations like update or delete data are costlier and less efficient for
columnar databases are, as multiple locations on the disk, where multiple separate columns are stored, are needed for deleting
or updating a single row of data. Columnar Databases are not Suitable for typical transactional applications unlike row-oriented
databases like RDBMS or NoSQL. As multiple columns are read parallelly at once, there is a disk seek time is required between
each block, thus increasing the disk seek time. But this can be reduced if a large disk is pre-fetched. The column-store
performance depletes and thus the cost of inserts increases due to multiple distinct locations on disk which are updated for each
inserted tuple (one for each attribute), but it can be reduced if inserts are done in bulk [1]. For online transaction processing
workloads columnar databases are costly since INSERT transactions must be separated into columns and compressed while
storing. For online transaction processing, row-oriented databases perform better - like workloads they are more heavily loaded
with interactive transactions. Writing new data in column-oriented data could make more time since each column needs to be
written one by one. Just a single operation is needed to insert a record into a row-oriented database. Therefore, it could cost
much time updating many values or loading new data. This is why, a row-oriented database is more suited for running the
backend of the web application, etc. Columnar database is suitable to be considered like Amazon Redshift or for BI analytics
queries once the app becomes huge. If only a single row needs to be processed at a time in the application, then row-based store
is suitable .It is also preferred in the scenario when the table has a small number of rows and neither aggregation nor fast
searching is required or the application typically needs to access complete records..
5. CONCLUSION

Dealing with the variety, velocity and volume of data has become one of the game changers for ERPs and other applications.
Column-oriented databases along with in-memory features were put forth to bring out a change in the way we visualize how
data must be handled. Different systems have different requirements, but none deny the plethora of advantages column store
databases provide. Every coin has two sides, and so column-store with its limitations. Column-oriented databases have quicker
responses since they interpret only the columns required by users' questions, while row-oriented databases will interpret all
rows and columns in the list. For applications that write and update a lot of data (OLTP systems), a row-oriented approach is
the best solution. On the other hand, OLAP systems primarily deal with extensive dynamic querying on large volumes of data
and hence find it feasible to implement Columnar databases. The market is moving towards these new innovations, in order to
sustain longevity game. Only time will tell if column-store will one day, completely dominate with technological advancement
over the traditional architectures.

ACKNOWLEDGEMENT

We would like to extend our gratitude towards my guide Prof. Srividya M S. for her continued support and guidance on our
work in this project and paper. We would also like to show our gratitude to Department of Computer Science and Engineering,
RV College of Engineering for giving us the opportunity to write this paper with complete support.

REFERENCES

[1] Vibha Shukla, Dr. Rajdev Tiwari, “Column Oriented Database: Implementation and Performance analysis” International
Journal of Science and Research (IJSR), Volume 4 Issue 9, September 2015
[2] Jhonatan W. Durán-Cazar, Eduardo J. Tandazo-Gaona, Mario R. Morales-Morales, Santiago Morales Cardoso, “Performance of
Columnar Database”, INGENIUS N.◦ 22, July- December 2019
[3] Hasso Plattner, Martin Faust, Stephan Müller, David Schwalb, Matthias, Uflacker, Johannes Wust, “The Impact of Columnar
In-Memory Databases on Enterprise Systems”, Proceedings of the VLDB Endowment, 2014.
[4] M. Faust, D. Schwalb, J. Krüger, and H. Plattner. Fast lookups for in-memory column stores: Group-key indices, lookup and
maintenance. In ADMS@VLDB, 2012.
[5] T. Karnagel, R. Dementiev, R. Rajwar, K. Lai, T. Legler, B. Schlegel, and W. Lehner. Improving in-memory database index
performance with intel transactional synchronization extensions. HPCA, 2014.
[6] M. Kaufmann, P. Vagenas, P. M. Fischer,D. Kossmann, and F. Färber. Comprehensive and
interactive temporal query processing with SAP HANA. VLDB, 2013.
[7] J. Krüger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, P. Dubey, H. Plattner, and A. Zeier. Fast updates on read-
optimized databases using multi-core cpus. VLDB, 2011.
[8] M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O’Neil, P. O’Neil, A.
Rasin, N. Tran, and S. Zdonik. C -store: A column-oriented dbms. VLDB, 2005.

© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 6451
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072

[9] T. Willhalm, N. Popovici, Y. Boshmaf, H. Plattner, A. Zeier, and J. Schaffner. SIMD-Scan: Ultra-Fast in-Memory Table Scan
Using on-Chip Vector Processing Units. VLDB, 2009.
[10] P. A. Bernstein, V. Hadzilacos, and N. Goodman, “Concurrency control and recovery in database”, systems. Boston, MA, USA,
1986.

Database Design For Mere Mortals
33% (6)
Database Design For Mere Mortals
30 pages
9781119823414
No ratings yet
9781119823414
53 pages
5 - Characteristics of A Columnar Database
No ratings yet
5 - Characteristics of A Columnar Database
3 pages
SQL Server
0% (1)
SQL Server
158 pages
Optimization of Database Management System
No ratings yet
Optimization of Database Management System
3 pages
Storing and Using Objects in A Relational Database
No ratings yet
Storing and Using Objects in A Relational Database
21 pages
(Davoudian Et Al., 2018) A Survey On NoSQL Stores
No ratings yet
(Davoudian Et Al., 2018) A Survey On NoSQL Stores
43 pages
BR Columndb
No ratings yet
BR Columndb
18 pages
Unit 2
No ratings yet
Unit 2
34 pages
A Hybrid Filtering Approach For Storage Optimization - 2015 - Egyptian Informat
No ratings yet
A Hybrid Filtering Approach For Storage Optimization - 2015 - Egyptian Informat
9 pages
RDBMS
No ratings yet
RDBMS
12 pages
Column Vs Row
No ratings yet
Column Vs Row
64 pages
Columnar Objects
No ratings yet
Columnar Objects
14 pages
Artículo Sistemas de BD Activas
No ratings yet
Artículo Sistemas de BD Activas
42 pages
Cca498 - Final - Review - Jiajia
No ratings yet
Cca498 - Final - Review - Jiajia
86 pages
Information 14 00563
No ratings yet
Information 14 00563
24 pages
NoSQL Paper 2
No ratings yet
NoSQL Paper 2
18 pages
Duda
No ratings yet
Duda
13 pages
Advanced Database Indexing
No ratings yet
Advanced Database Indexing
17 pages
Materialization Strategies in A Column-Oriented DBMS
No ratings yet
Materialization Strategies in A Column-Oriented DBMS
15 pages
Lec 10 - Column DB
No ratings yet
Lec 10 - Column DB
34 pages
Column-vs-Row Databases
No ratings yet
Column-vs-Row Databases
12 pages
Cs614-Mid Term Solved MCQs With References by Moaaz PDF
No ratings yet
Cs614-Mid Term Solved MCQs With References by Moaaz PDF
30 pages
Column Vs Row
No ratings yet
Column Vs Row
64 pages
NoSQL Data Models
No ratings yet
NoSQL Data Models
32 pages
Query Execution in Column-Oriented Database Systems: Dna@csail - Mit.edu
No ratings yet
Query Execution in Column-Oriented Database Systems: Dna@csail - Mit.edu
148 pages
Types of Database Managment System
No ratings yet
Types of Database Managment System
9 pages
Lecture 3: Business Intelligence: OLAP, Data Warehouse, and Column Store
No ratings yet
Lecture 3: Business Intelligence: OLAP, Data Warehouse, and Column Store
119 pages
Lecture 9 Chapter 5 Part 5 Big Data Storage Concepts
No ratings yet
Lecture 9 Chapter 5 Part 5 Big Data Storage Concepts
15 pages
DBMS
No ratings yet
DBMS
72 pages
Dundas BI Features - A Single Business Intelligence (BI) Platform - Dashboards, Reports, Scorecards, Charts and Analytics - Dundas Data Visualization
No ratings yet
Dundas BI Features - A Single Business Intelligence (BI) Platform - Dashboards, Reports, Scorecards, Charts and Analytics - Dundas Data Visualization
32 pages
Materialization Strategies in A Column-Oriented DBMS
No ratings yet
Materialization Strategies in A Column-Oriented DBMS
12 pages
Nosql Datawarehouse
No ratings yet
Nosql Datawarehouse
11 pages
Cost Center and Internal Order Accounting in SAP S/4HANA
No ratings yet
Cost Center and Internal Order Accounting in SAP S/4HANA
20 pages
Unit 4
No ratings yet
Unit 4
7 pages
Vertica Column-vs-Row
No ratings yet
Vertica Column-vs-Row
64 pages
Teradata Columnar
No ratings yet
Teradata Columnar
10 pages
Cheat Sheet v4
No ratings yet
Cheat Sheet v4
3 pages
Memsql
No ratings yet
Memsql
23 pages
AX2009 Introduction (80020)
No ratings yet
AX2009 Introduction (80020)
322 pages
Column Store Databases
No ratings yet
Column Store Databases
7 pages
DP900 Chapter1 Notes
No ratings yet
DP900 Chapter1 Notes
10 pages
A Course in In-Memory Data Management: Prof. Hasso Plattner
No ratings yet
A Course in In-Memory Data Management: Prof. Hasso Plattner
6 pages
Database Design Assesment
No ratings yet
Database Design Assesment
26 pages
Analytical CRM - 1
100% (1)
Analytical CRM - 1
29 pages
A Course in In-Memory Data Management: Prof. Hasso Plattner
No ratings yet
A Course in In-Memory Data Management: Prof. Hasso Plattner
6 pages
Installation and Configuration
No ratings yet
Installation and Configuration
104 pages
SAP HANA Sizing Simplified Level 2 Quiz PDF
100% (1)
SAP HANA Sizing Simplified Level 2 Quiz PDF
8 pages
BDM Review Session
No ratings yet
BDM Review Session
179 pages
Overview: High Performance Scalable Data Stores
No ratings yet
Overview: High Performance Scalable Data Stores
19 pages
Database Management and Relational Database Management System
No ratings yet
Database Management and Relational Database Management System
11 pages
AX2012 SQL Optimization - All Chapters PDF
No ratings yet
AX2012 SQL Optimization - All Chapters PDF
274 pages
Kabul University: Computer Science Faculty
No ratings yet
Kabul University: Computer Science Faculty
27 pages
Business Inteligence
No ratings yet
Business Inteligence
74 pages
Hyperion Essbase
No ratings yet
Hyperion Essbase
16 pages
Data Warehousing and Mining Complete Notes
No ratings yet
Data Warehousing and Mining Complete Notes
495 pages
BO User Manual V 1.0
No ratings yet
BO User Manual V 1.0
38 pages
Chapter 1: Introduction To Decision Support Systems
No ratings yet
Chapter 1: Introduction To Decision Support Systems
6 pages
Ax2012 Enus WN Dev 08
No ratings yet
Ax2012 Enus WN Dev 08
16 pages
Data Mining Sem
No ratings yet
Data Mining Sem
3 pages
SQ L Interview
No ratings yet
SQ L Interview
19 pages
Syllabus 1
No ratings yet
Syllabus 1
13 pages
ETL Interview V - 07
No ratings yet
ETL Interview V - 07
158 pages
CV Erkan Ciftci 4page 201902
No ratings yet
CV Erkan Ciftci 4page 201902
4 pages
SQL Interview Questions
No ratings yet
SQL Interview Questions
50 pages
IBM Cognos TM1 Whitepaper
No ratings yet
IBM Cognos TM1 Whitepaper
7 pages
7th Sem Syllabus
No ratings yet
7th Sem Syllabus
15 pages
Konstantin - Mikhaylov v1
No ratings yet
Konstantin - Mikhaylov v1
9 pages
Subject Question Bank-1
No ratings yet
Subject Question Bank-1
6 pages
Exercise Loans With Solutions
No ratings yet
Exercise Loans With Solutions
9 pages
OLAP Function and Tools. OLAP Servers, ROLAP, MOLAP, HOLAP
No ratings yet
OLAP Function and Tools. OLAP Servers, ROLAP, MOLAP, HOLAP
2 pages
Commerce Bba CA Semester 6 2023 November Recent Trends in Information Technology 2019 Pattern
No ratings yet
Commerce Bba CA Semester 6 2023 November Recent Trends in Information Technology 2019 Pattern
2 pages
Mastering Database Design
From Everand
Mastering Database Design
Ted Noreux
No ratings yet
Citus for Scalable PostgreSQL Systems: The Complete Guide for Developers and Engineers
From Everand
Citus for Scalable PostgreSQL Systems: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
SQL Demystified: A Beginner's Roadmap to Data Retrieval and Management
From Everand
SQL Demystified: A Beginner's Roadmap to Data Retrieval and Management
Kaushal Mehta
No ratings yet
The Study of Building the Data Warehouse
From Everand
The Study of Building the Data Warehouse
venkateswara Rao
No ratings yet
Database And Computer Management: SERIES 1, #3
From Everand
Database And Computer Management: SERIES 1, #3
Elias Mutegi
No ratings yet
THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE: "THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE"
From Everand
THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE: "THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE"
AJIT DASH
2/5 (2)
Database Management System
From Everand
Database Management System
Manish Soni
No ratings yet
Data Structures I Essentials
From Everand
Data Structures I Essentials
Dennis Smolarski
No ratings yet
THE SQL LANGUAGE: Master Database Management and Unlock the Power of Data (2024 Beginner's Guide)
From Everand
THE SQL LANGUAGE: Master Database Management and Unlock the Power of Data (2024 Beginner's Guide)
JAMIE POWERS
No ratings yet
Data Structures Explained: A Practical Guide with Examples
From Everand
Data Structures Explained: A Practical Guide with Examples
William E. Clark
No ratings yet
Practical TimescaleDB Solutions: Definitive Reference for Developers and Engineers
From Everand
Practical TimescaleDB Solutions: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Redshift Essentials: Definitive Reference for Developers and Engineers
From Everand
Redshift Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Basic Concepts in Data Structures
From Everand
Basic Concepts in Data Structures
K.Meenendranath Reddy
No ratings yet
Databases: System Concepts, Designs, Management, and Implementation
From Everand
Databases: System Concepts, Designs, Management, and Implementation
Jonathan Rigdon
No ratings yet
Introduction to Microsoft SQL Server
From Everand
Introduction to Microsoft SQL Server
Eric Frick
No ratings yet
Learn Data Warehousing in 24 Hours
From Everand
Learn Data Warehousing in 24 Hours
Alex Nordeen
No ratings yet
Learn SAP BI in 24 Hours
From Everand
Learn SAP BI in 24 Hours
Alex Nordeen
3/5 (1)
Learn Hadoop in 24 Hours
From Everand
Learn Hadoop in 24 Hours
Alex Nordeen
No ratings yet

A Study On Databases: Columnar

Uploaded by

A Study On Databases: Columnar

Uploaded by

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072

A Study on Columnar Databases

Fig -1: Architecture of In-Memory Columnar Database [3]

Fig -2: Example for Row Store vs Column Store

You might also like