Relational Databases Versus HBase
Article in Advances in Science Technology and Engineering Systems Journal · April 2019
DOI: 10.25046/aj040249
3 authors:
Ilias Cherti
Université Hassan 1er
All content following this page was uploaded by Inssaf El Guabassi on 02 May 2019.
CPU                       Intel® Xeon(R) CPU E5504 @ 2.00GHz × 8
Memory                    16 GB
Hard disk                 237 GB SSD
Operating system          Ubuntu 14.04 (64-bit)
Java version              1.8.0_16
YCSB version              0.14
CDH version               5.14.1
Cloudera HBase version    1.2.0
Cloudera Hadoop version   2.6.0
MySQL                     5.6.26
Workload A: A mixed workload with 50% of reads and 50% of writes.
Workload B: A mixed workload with 95% of reads and 5% of writes.
Workload C: A workload of 100% read.
Workload D: A mixed workload with 95% of reads and 5% of inserts.
Workload E: A mixed workload with 95% of scans and 5% of inserts.
Workload F: Read-modify-write: A mixed workload with 50% of reads and 50% of read-modify-writes.
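The six core workload mixes listed above can be written down as operation proportions and sampled from. The following is a minimal sketch in Python, not part of the benchmark itself: the dictionary keys and the function name `sample_operations` are illustrative assumptions, not YCSB property names, and the sampling only mimics the stated read/write/scan/insert ratios.

```python
import random

# Operation mixes for workloads A to F, as fractions that sum to 1.
# The keys are illustrative labels (assumption), not YCSB property names.
WORKLOADS = {
    "A": {"read": 0.50, "write": 0.50},
    "B": {"read": 0.95, "write": 0.05},
    "C": {"read": 1.00},
    "D": {"read": 0.95, "insert": 0.05},
    "E": {"scan": 0.95, "insert": 0.05},
    "F": {"read": 0.50, "read-modify-write": 0.50},
}


def sample_operations(workload, count, seed=0):
    """Draw `count` operation names according to the workload's mix."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so that a run can be repeated
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


if __name__ == "__main__":
    ops = sample_operations("B", 1000)
    # Counts are close to 950 and 50, but not exactly, because sampling is random.
    print(ops.count("read"), ops.count("write"))
```

With a large enough operation count, the observed share of each operation approaches the proportion given in the workload definition.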
3. Experimental strategy
Much work on comparing database performance with YCSB has been carried out [24]–[26]. Abramova et al. [24] compare five NoSQL databases (Redis, Cassandra, HBase, MongoDB, and OrientDB) in terms of their capabilities, based on read and update operations. They affirm that MongoDB, Redis, and OrientDB are better for reads, while Cassandra and HBase are optimized for updates. Yassien and Desouky [26] compare MySQL, MongoDB, and HBase using YCSB in order to study the effect of varying the operation and thread count on runtime, throughput, and latency. The authors state that each database performs at its best in different circumstances. They recommend HBase for applications that require a high volume of update and insert operations, MySQL for applications that perform mostly read operations, and MongoDB for applications that require both adequate read and write performance. Matallah et al. [25] compare MongoDB and HBase in order to evaluate loading and
The main focus of this study is to evaluate read and update operations, since they are the most used operations [24]. Therefore, this comparison consists mainly of three workloads: A and C, which are included in the YCSB project, and a new workload G, proposed by [24], which we create to evaluate the Update Only case. Table 2 shows the tested workloads:
Table 2 Used workloads
Workload      Operations
Workload A    50% of reads and 50% of writes
Workload C    100% read: Read Only
Workload G    100% update: Update Only
The dataset used in this database benchmarking is generated by the YCSB data generator, which is part of the YCSB client. The dataset records are composed of 10 fields. Each field is filled with a random string of 100 bytes, which gives 1 KB per record. The ‘YCSB_KEY’ is the primary key for each row [22]. Table 3 shows the YCSB dataset structure.
www.astesj.com 397
Z. Bousalem et al. / Advances in Science, Technology and Engineering Systems Journal Vol. 4, No. 2, 395-401 (2019)
Table 3 YCSB Dataset structure
YCSB_KEY FIELD1 FIELD2 FIELD3 FIELD4 FIELD5 FIELD6 FIELD7 FIELD8 FIELD9
Row 1
Row 2
......
Row N
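The record layout of Table 3 (a key plus ten fields of 100 bytes each) can be illustrated with a short Python sketch. This is not the YCSB data generator: the key format `user<number>`, the field numbering starting at 0, the restriction to ASCII letters, and the helper names `make_record` and `payload_size` are all assumptions made for illustration. It only shows why ten 100-byte fields give roughly 1 KB of payload per record.

```python
import random
import string

FIELD_COUNT = 10    # fields per record, as stated in the text
FIELD_LENGTH = 100  # bytes per field, as stated in the text


def make_record(key_number, rng):
    """Build one row: a primary key plus FIELD_COUNT random strings."""
    # Key format and field numbering are hypothetical, chosen for illustration.
    key = f"user{key_number}"
    fields = {
        f"FIELD{i}": "".join(rng.choices(string.ascii_letters, k=FIELD_LENGTH))
        for i in range(FIELD_COUNT)
    }
    return key, fields


def payload_size(fields):
    """Total size in bytes of all field values (ASCII, one byte per character)."""
    return sum(len(value.encode("ascii")) for value in fields.values())


if __name__ == "__main__":
    key, fields = make_record(1, random.Random(0))
    print(key, len(fields), payload_size(fields))
```

The payload is 10 × 100 = 1000 bytes per record, excluding the key and any storage overhead added by the database.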
Update latency (less is better): As shown in Figure 8, update latency follows the same pattern as runtime. MySQL exhibits a slight, steady increase, whereas HBase shows a slight decline followed by an increase. HBase has the lowest update latency.
5.2.2. Workload C
As illustrated in Figure 9 and Figure 10, HBase exhibits a slight decline initially and then alternates between increases and declines in both runtime and read latency. MySQL, by contrast, is steady initially and then shows a slight increase after reaching 100000 records. MySQL has the shortest runtime and read latency.
5.2.3. Workload G
Figure 12: Workload G Latency
5.3. Increasing the number of operations
5.3.1. Workload A
6. General observations