
Big Data

NoSQL database and HBase tutorial


Trong-Hop Do

S3Lab
Smart Software System Laboratory

1
“Without big data, you are blind and deaf
and in the middle of a freeway.”
– Geoffrey Moore

Big Data 2
NoSQL

3
Background
● Relational databases have been the mainstay of business
● Web-based applications caused spikes in load
● Explosion of social media sites (Facebook, Twitter) with large data needs
● Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
● Hooking an RDBMS to a web-based application becomes troublesome

4
Big Data
What is NoSQL?
● This name stands for Not Only SQL
● The term NoSQL was introduced by Carlo Strozzi in 1998 to name his file-
based database
● It was again re-introduced by Eric Evans when an event was organized to
discuss open source distributed databases
○ Eric states that “… but the whole point of seeking alternatives is that you need to solve a
problem that relational databases are a bad fit for. …”

5
Big Data
What is NoSQL?
Key features (Advantages)

● non-relational
● don’t require a schema
● data are replicated to multiple nodes (so, identical & fault-tolerant)
and can be partitioned:
○ down nodes easily replaced
○ no single point of failure

● horizontally scalable

6
Big Data
What is NoSQL?
Key features (Advantages)

● cheap, easy to implement (open-source)


● massive write performance
● fast key-value access

7
Big Data
What is NoSQL?
Disadvantages

● Don’t fully support relational features


○ no join, group by, order by operations (except within partitions)
○ no referential integrity constraints across partitions

● No declarative query language (e.g., SQL) → more programming


● Relaxed ACID (see CAP theorem) → fewer guarantees
● No easy integration with other applications that support SQL

8
Big Data
Who is using them?

9
Big Data
3 major papers for NoSQL

● Three major papers were the “seeds” of the NOSQL movement:


○ BigTable (Google)
○ Dynamo (Amazon)
■ Ring partition and replication
■ Gossip protocol (discovery and error detection)
■ Distributed key-value data stores
■ Eventual consistency
○ CAP Theorem

10
Big Data
The perfect storm

● Large datasets, acceptance of alternatives, and dynamically-typed data
have come together in a “perfect storm”
● Not a backlash against RDBMS
● SQL is a rich query language that cannot be rivaled by the current list of
NOSQL offerings

11
Big Data
CAP Theorem
● Suppose three properties of a distributed system (sharing data)
○ Consistency:
■ Reads and writes are always executed atomically and are strictly consistent
(linearizable). Put differently, all clients have the same view on the data at all times.
○ Availability:
■ Every non-failing node in the system can always accept read and write requests by
clients and will eventually return with a meaningful response, i.e. not with an error
message.
○ Partition-tolerance:
■ system properties (consistency and/or availability) hold even when network failures
prevent some machines from communicating with others, i.e. the system can continue
to operate in the presence of network partitions

12
Big Data


CAP Theorem

● Brewer’s CAP Theorem:


○ For any system sharing data, it is “impossible” to guarantee simultaneously all of these
three properties
○ You can have at most two of these three properties for any shared-data system

● Very large systems will “partition” at some point:


○ That leaves either C or A to choose from (traditional DBMSs prefer C over A and P)
○ In almost all cases, you would choose A over C (except in specific applications such as
order processing)

13
Big Data
CAP Theorem
Consistency

14
Big Data
CAP Theorem
Consistency

● Have 2 types of consistency:


○ Strong consistency – ACID (Atomicity, Consistency, Isolation, Durability)
○ Weak consistency – BASE (Basically Available, Soft-state, Eventual consistency)

15
Big Data
CAP Theorem
Consistency
● A consistency model determines rules for visibility and apparent order of
updates
● Example:
○ Row X is replicated on nodes M and N
○ Client A writes row X to node N
○ Some period of time t elapses
○ Client B reads row X from node M
○ Does client B see the write from client A?
○ Consistency is a continuum with tradeoffs
○ For NOSQL, the answer would be: “maybe”
○ CAP theorem states: “strong consistency can't be achieved at the same time as
availability and partition-tolerance”
16
Big Data
NoSQL

● “No-schema” is a common characteristic of most NOSQL storage systems


● Provide “flexible” data types
● Query languages other than, or in addition to, SQL
● Distributed – horizontal scaling

● Less structured data


● Supports big data

17
Big Data
NoSQL Categories

18
Big Data
NoSQL Categories
Key-value
● Focus on scaling to huge amounts of data
● Designed to handle massive load
● Based on Amazon’s Dynamo paper
● Data model: (global) collection of Key-value pairs
● Dynamo ring partitioning and replication
● Example: (DynamoDB)
○ Items have one or more attributes (name, value)
○ An attribute can be single-valued or multi-valued (like a set)
○ Items are combined into a table

19
Big Data
NoSQL Categories
Key-value
● Basic API access:
○ get(key): extract the value given a key
○ put(key, value): create or update the value given its key
○ delete(key): remove the key and its associated value
○ execute(key, operation, parameters): invoke an operation on the value (given its key),
which is a special data structure (e.g. List, Set, Map, etc.)
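As an illustration, these four operations can be captured in a small Java interface (a hypothetical sketch, not the API of any particular store):

import java.util.List;

// Hypothetical generic key-value store interface covering the four basic operations.
public interface KeyValueStore<K, V> {
    V get(K key);              // extract the value given a key
    void put(K key, V value);  // create or update the value given its key
    void delete(K key);        // remove the key and its associated value
    // invoke an operation on the value, e.g. append to a value that is a List
    Object execute(K key, String operation, List<Object> parameters);
}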

20
Big Data
NoSQL Categories
Key-value
● Pros:
○ very fast
○ very scalable (horizontally distributed to nodes based on key)
○ simple data model
○ eventual consistency
○ fault-tolerance
● Cons:
○ Can’t model more complex data structures such as objects

21
Big Data
NoSQL Categories
Key-value
Name | Producer | Data model | Querying
SimpleDB | Amazon | set of couples (key, {attribute}), where attribute is a couple (name, value) | restricted SQL; Select, Delete, GetAttributes, and PutAttributes operations
Redis | Salvatore Sanfilippo | set of couples (key, value), where value is a simple typed value, a list, an ordered (according to ranking) or unordered set, or a hash value | primitive operations for each value type
Dynamo | Amazon | like SimpleDB | simple get operation and put in a context
Voldemort | LinkedIn | like SimpleDB | similar to Dynamo

22
Big Data
NoSQL Categories
Key-value

23
Big Data
NoSQL Categories
Column-based

● Based on Google’s BigTable paper


● Like column oriented relational databases (store data in column order) but
with a twist
● Tables are similar to RDBMS tables, but handle semi-structured data
● Data model:
○ Collection of Column Families
○ Column family = (key, value) where value = set of related columns (standard, super)
○ indexed by row key, column key and timestamp

24
Big Data
NoSQL Categories
Column-based

25
Big Data
NoSQL Categories
Column-based: Keyspace ~ Schema, Column Family ~ Table

26
Big Data
NoSQL Categories
Column-based: Row structure

27
Big Data
NoSQL Categories
Column-based

● One column family can have variable numbers of columns


● Cells within a column family are sorted “physically”
● Very sparse, most cells have null values
● Comparison: RDBMS vs column-based NOSQL
○ Query on multiple tables
■ RDBMS: must fetch data from several places on disk and glue together
■ Column-based NOSQL: only fetch the column families of those columns that are required
by a query (all columns in a column family are stored together on disk, so multiple
rows can be retrieved in one read operation → data locality)
28
Big Data
NoSQL Categories
Column-based

29
Big Data
NoSQL Categories
Column-based
● Example: (Cassandra column family--timestamps removed for simplicity)

UserProfile = {
  Cassandra = {
    emailAddress: "[email protected]",
    age: "20"
  },
  TerryCho = {
    emailAddress: "[email protected]",
    gender: "male"
  },
  Cath = {
    emailAddress: "[email protected]",
    age: "20", gender: "female", address: "Seoul"
  }
}

30
Big Data
NoSQL Categories
Column-based
Name | Producer | Data model | Querying
BigTable | Google | set of couples (key, {value}) | selection (by combination of row, column, and timestamp ranges)
HBase | Apache | groups of columns (a BigTable clone) | JRuby IRB-based shell (similar to SQL)
Hypertable | Hypertable | like BigTable | HQL (Hypertable Query Language)
CASSANDRA | Apache (originally Facebook) | columns, groups of columns corresponding to a key (supercolumns) | simple selections on key, range queries, column or column ranges
PNUTS | Yahoo | (hashed or ordered) tables, typed arrays, flexible schema | selection and projection from a single table (retrieve an arbitrary single record by primary key, range queries, complex predicates, ordering, top-k)
31
Big Data
NoSQL Categories
Document-based

● Can model more complex objects


● Inspired by Lotus Notes
● Data model: collection of documents
● Document: JSON (JavaScript Object Notation: a data model of key-value pairs that
supports objects, records, structs, lists, arrays, maps, dates, and Booleans, with
nesting), XML, and other semi-structured formats.

32
Big Data
NoSQL Categories
Document-based

● Example: (MongoDB) document


{
  Name: "Jaroslav",
  Address: "Malostranske nám. 25, 118 00 Praha 1",
  Grandchildren: {Claire: "7", Barbara: "6", Magda: "3", Kirsten: "1", Otis: "3", Richard: "1"},
  Phones: ["123-456-7890", "234-567-8963"]
}

33
Big Data
NoSQL Categories
Document-based

34
Big Data
NoSQL Categories
Document-based

35
Big Data
NoSQL Categories
Document-based
Name | Producer | Data model | Querying
MongoDB | 10gen | object-structured documents stored in collections; each object has a primary key called ObjectId | manipulation of objects in collections (find an object or objects via simple selections and logical expressions, delete, update)
Couchbase | Couchbase | document as a list of named (structured) items (JSON document) | by key and key range, views via JavaScript and MapReduce

36
Big Data
NoSQL Categories
Graph-based

● Focus on modeling the structure of data (interconnectivity)


● A graph is composed of two elements: a node and a relationship.
● Scales to the complexity of data
● Inspired by mathematical graph theory (G = (V, E))
● Data model:
○ (Property Graph) nodes and edges
■ Nodes may have properties (including ID)
■ Edges may have labels or roles
○ Key-value pairs on both (see the sketch below)

37
Big Data
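A property graph can be sketched as two small Java classes (hypothetical, for illustration only):

import java.util.HashMap;
import java.util.Map;

// Minimal property-graph sketch: nodes and edges both carry key-value properties.
class Node {
    final long id;                                           // nodes may have an ID
    final Map<String, Object> properties = new HashMap<>();  // key-value pairs
    Node(long id) { this.id = id; }
}

class Edge {
    final Node from, to;
    final String label;                                      // edges may have labels or roles
    final Map<String, Object> properties = new HashMap<>();
    Edge(Node from, Node to, String label) {
        this.from = from; this.to = to; this.label = label;
    }
}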
NoSQL Categories
Graph-based
● Interfaces and query languages vary
● Single-step vs path expressions vs full recursion
● Example:
○ Neo4j, FlockDB, Pregel, InfoGrid …

38
Big Data
NoSQL Categories
Graph-based

39
Big Data
NoSQL Categories
Graph-based

40
Big Data
NoSQL Categories
Comparison

41
Big Data
Conclusion
● NOSQL databases cover only a part of data-intensive cloud applications
(mainly Web applications)
● Problems with cloud computing:
○ SaaS (Software as a Service, or on-demand software) applications require enterprise-
level functionality, including ACID transactions, security, and other features associated
with commercial RDBMS technology, i.e. NOSQL should not be the only option in the
cloud
○ Hybrid solutions:
■ Voldemort with MySQL as one of the storage backends
■ deal with NOSQL data as semi-structured data → integrating RDBMS and NOSQL
via SQL/XML

42
Big Data
Conclusion
● next generation of highly scalable and elastic RDBMS: NewSQL
databases (from April 2011)
○ they are designed to scale out horizontally on shared nothing machines,
○ still provide ACID guarantees,
○ applications interact with the database primarily using SQL,
○ the system employs a lock-free concurrency control scheme so that reads do not block writes,
○ the system provides higher performance than available from the traditional systems.

● Examples: MySQL Cluster (most mature solution), VoltDB, Clustrix, ScalArc, etc.

43
Big Data
Hadoop Ecosystem

44
Big Data
HBase tutorial

45
Hbase tutorial
What is HBase?
• HBase is a distributed column-oriented database built on top of the Hadoop file system.
• HBase has a data model similar to Google’s BigTable, designed to provide quick random
access to huge amounts of structured data. It leverages the fault tolerance provided by the
Hadoop File System (HDFS).
• It is a part of the Hadoop ecosystem that provides random real-time read/write access to
data in the Hadoop File System.
• One can store data in HDFS either directly or through HBase. Data consumers
read/access the data in HDFS randomly using HBase. HBase sits on top of the Hadoop
File System and provides read and write access.

46
Big Data
Hbase tutorial
What is HBase?

47
Big Data
Hbase tutorial
HDFS vs HBase

48
Big Data
Hbase tutorial
What is HBase?
• HBase is a column-oriented database and the tables in it are sorted by row. The table
schema defines only column families, which are the key value pairs.

• Table is a collection of rows.


• Row is a collection of column families.
• Column family is a collection of columns.
• Column is a collection of key value pairs.
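This layered model is often pictured as nested sorted maps. A sketch of that mental model in Java (illustrative only, not an HBase API):

import java.util.NavigableMap;

// HBase's logical data model viewed as nested sorted maps:
// row key -> column family -> column qualifier -> timestamp -> cell value
public class HBaseLogicalModel {
    NavigableMap<String,                              // row key
        NavigableMap<String,                          // column family
            NavigableMap<String,                      // column (qualifier)
                NavigableMap<Long, byte[]>>>> table;  // timestamp -> value
}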

49
Big Data
Hbase tutorial
Column Oriented and Row Oriented

50
Big Data
Hbase tutorial
HBase and RDBMS

51
Big Data
Hbase tutorial
Features of HBase
• HBase is linearly scalable.
• It has automatic failure support.
• It provides consistent reads and writes.
• It integrates with Hadoop, both as a source and a destination.
• It has an easy Java API for clients.
• It provides data replication across clusters.

52
Big Data
Hbase tutorial
Where to Use HBase
• Apache HBase is used when you need random, real-time read/write access to Big Data.

• It hosts very large tables on top of clusters of commodity hardware.

• Apache HBase is a non-relational database modeled after Google's Bigtable. Bigtable acts
upon the Google File System; likewise, Apache HBase works on top of Hadoop and HDFS.

53
Big Data
Hbase tutorial

54
Hbase tutorial

● Accessing HBase by using the HBase Shell


● Command: hbase shell

55
Big Data
Hbase tutorial

• Check that the shell is functioning before proceeding further. Use the list
command for this purpose. list is the command used to get the list of all
the tables in HBase.

56
Big Data
Hbase tutorial

• Command: status
This command returns the status of the system including the details of the servers
running on the system. Its syntax is as follows:

• Command: table_help
This command provides guidance on what table-referenced commands exist and how to
use them. Given below is the syntax to use this command.
57
Big Data
Hbase tutorial
Creating a Table using HBase Shell
Command: create '<table name>', '<column family>'

Verify that the table was created using the list command

58
Big Data
Hbase tutorial
Creating a Table using HBase Shell
Check the table

59
Big Data
Hbase tutorial
Creating a Table Using java API

● Create Java Project

61
Big Data
Hbase tutorial
Creating a Table Using java API

● Add External JARs

62
Big Data
Hbase tutorial
Creating a Table Using java API

● Add all .jar files in /usr/lib/hbase

63
Big Data
Hbase tutorial
Creating a Table Using java API

● Add all .jar files in /usr/lib/hbase/lib

64
Big Data
Hbase tutorial
Creating a Table Using java API

● Add all .jar files in /usr/lib/hadoop

65
Big Data
Hbase tutorial

● Add all .jar files in /usr/lib/hadoop/client

66
Big Data
Hbase tutorial
Creating a Table Using java API

● Create new .java file and run it
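The code itself appears only as a screenshot in the slides. A minimal version of such a file, written against the classic HBaseAdmin API of that era, might look like the sketch below; the table name employee matches the later slides, while the column family name personal is an assumption:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTable {
    public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath (hence the JARs added above)
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("employee"));
        tableDescriptor.addFamily(new HColumnDescriptor("personal")); // column family (assumed name)
        admin.createTable(tableDescriptor);
        System.out.println("Table created");
        admin.close();
    }
}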

67
Big Data
Hbase tutorial

● The console output should be like this

68
Big Data
Hbase tutorial
Creating a Table Using java API

● Check if the table employee has been created

69
Big Data
Hbase tutorial
Creating a Table Using java API

70
Hbase tutorial
Listing Tables Using Java API

● Create new Java file in the same project

71
Big Data
Hbase tutorial
Listing Tables Using Java API

● Paste the code and execute it
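Again the code is shown only as a screenshot; a minimal listing program in the same style might look like this (a sketch, using the classic HBaseAdmin API):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ListTables {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // listTables() returns the descriptors of all user tables
        for (HTableDescriptor td : admin.listTables()) {
            System.out.println(td.getNameAsString());
        }
        admin.close();
    }
}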

72
Big Data
Hbase tutorial
Listing Tables Using Java API

● The console output should be like this

73
Big Data
Hbase tutorial
Writing Data to HBase

name | city | education
Le | hanoi | MBA
Phạm | tp hcm | bachelor
Tran | vung tau | bachelor

74
Big Data
Hbase tutorial
Writing Data to HBase

● Let us insert the first row values into the emp table as shown below

75
Big Data
Hbase tutorial
Writing Data to HBase

● Check the content of the table

76
Big Data
Hbase tutorial
Writing Data to HBase

● Copy these lines and paste them into the HBase shell (you can type them if you want)

77
Big Data
Hbase tutorial
Writing Data to HBase

● Check the content of the table

78
Big Data
Hbase tutorial
Writing Data to HBase Using Java API
Step 1: Instantiate the Configuration class
Configuration conf = HBaseConfiguration.create();
Step 2: Instantiate the HTable class
HTable hTable = new HTable(conf, tableName);
Step 3: Instantiate the Put class
// requires the row id you want to insert the data into, in string format
Put p = new Put(Bytes.toBytes("row id"));
Step 4: Insert data
p.add(Bytes.toBytes("column family"), Bytes.toBytes("column name"), Bytes.toBytes("value"));
Step 5: Save the data in the table
hTable.put(p);
Step 6: Close the HTable instance
hTable.close();
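Put together, the six steps form a small program like the following sketch (the table name emp comes from the shell example earlier; the column family personal and the column/value are assumptions for illustration):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InsertData {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();   // step 1
        HTable hTable = new HTable(conf, "emp");            // step 2
        Put p = new Put(Bytes.toBytes("row4"));             // step 3: row id
        p.add(Bytes.toBytes("personal"),                    // step 4: family (assumed)
              Bytes.toBytes("name"), Bytes.toBytes("Nguyen"));
        hTable.put(p);                                      // step 5
        hTable.close();                                     // step 6
    }
}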

79
Big Data
Hbase tutorial
Writing Data to HBase Using Java API

● Create new .java file

80
Big Data
Hbase tutorial
Writing Data to HBase Using Java API

● Paste the code, save, and run it

81
Big Data
Hbase tutorial

82
Big Data
Hbase tutorial

● Check the result

83
Big Data
Hbase tutorial

● Edit the code

Before: Put p = new Put(Bytes.toBytes("row4"));
After:  Put p = new Put(Bytes.toBytes("4"));

● Then run the code

84
Big Data
Hbase tutorial
Writing Data to HBase Using Java API

● Scan the table to see the result.


● In the table, "4" and "row4" are different rows

85
Big Data
Hbase tutorial
Reading Data using HBase Shell

● Command: get '<table name>', '<row id>'

86
Big Data
Hbase tutorial
Reading a Specific Column using HBase Shell
● Command: get '<table name>', '<row id>', {COLUMN => '<column family:column name>'}

87
Big Data
Hbase tutorial
Updating Data using HBase Shell

● Command: put '<table name>', '<row id>', '<column family:column name>', '<new value>'

88
Big Data
Hbase tutorial
Deleting a Specific Cell in a Table

● Command: delete '<table name>', '<row id>', '<column name>', '<time stamp>'

89
Big Data
Hbase tutorial
Deleting all the cells in a row

● Command: deleteall '<table name>', '<row id>'

90
Big Data
Hbase tutorial
Deleting a Column Family

● Command: alter '<table name>', 'delete' => '<column family>'
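The same reads and deletes are available through the Java client. A sketch in the style of the earlier examples (the table emp, family personal, column name, and row id are assumptions):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadAndDelete {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable hTable = new HTable(conf, "emp");
        // Read one cell: equivalent of get 'emp', '1', {COLUMN => 'personal:name'}
        Get g = new Get(Bytes.toBytes("1"));
        g.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        Result result = hTable.get(g);
        byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        System.out.println("name = " + Bytes.toString(value));
        // Delete a whole row: equivalent of deleteall 'emp', '1'
        Delete d = new Delete(Bytes.toBytes("1"));
        hTable.delete(d);
        hTable.close();
    }
}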

91
Big Data
Hbase tutorial
VERSION
● When you put data into HBase, a timestamp is required.

● The timestamp can be generated automatically by the RegionServer or can be supplied by you.
● The timestamp must be unique per version of a given cell, because the timestamp identifies
the version.
● To modify a previous version of a cell, for instance, you would issue a Put with a different
value for the data itself, but the same timestamp.

Command: put '<table name>', '<row id>', '<column family:column name>', '<new value>', timestamp

92
Big Data
Hbase tutorial
VERSION
● Doing a put always creates a new version of a cell, at a certain timestamp.

● Default update

93
Big Data
Hbase tutorial
Change the maximum number of versions
● Get 2 versions of that cell

● We receive only the latest version of that cell (which is ‘CEO’)


● The reason is that the maximum number of versions defaults to 1
● Use the alter command to change the maximum number of versions of that column family
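After raising the limit (e.g. something like alter 'emp', NAME => 'personal', VERSIONS => 5 in the shell), multiple versions can also be fetched from the Java client. A sketch in the style of the earlier examples; the table, family, and column names are assumptions:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GetVersions {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable hTable = new HTable(conf, "emp");
        Get g = new Get(Bytes.toBytes("1"));
        g.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("designation"));
        g.setMaxVersions(2);                    // ask for up to 2 versions of the cell
        Result result = hTable.get(g);
        for (Cell cell : result.listCells()) {  // newest version first
            System.out.println(cell.getTimestamp() + " -> "
                    + Bytes.toString(CellUtil.cloneValue(cell)));
        }
        hTable.close();
    }
}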

94
Big Data
Hbase tutorial
Get all versions of a cell

● Let’s try again to get 2 versions of that cell

95
Big Data
Hbase tutorial
Update a specific version
Let’s get all versions of the cell

Command: put '<table name>', '<row id>', '<column family:column name>', '<new value>', timestamp

96
Big Data
Hbase tutorial
Load CSV file from HDFS to HBase

● Create a csv file (e.g. using gedit)

● Put the file to HDFS

97
Big Data
Hbase tutorial
Load CSV file from HDFS to HBase
● Navigate to the HBase directory

● Command: hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns= …

Important: no space before or after ',' when listing the columns; with a space, the command won't run
98
Big Data
Hbase tutorial
Load CSV file from HDFS to HBase

● Scan the table in hbase shell

99
Big Data
Hbase tutorial
Load data from Hive to HBase

● Check the Hive table student

100
Big Data
Hbase tutorial
Create HBase-Hive Mapping table

• Create another Hive table which actually points to an HBase table

• hbase_student is the name of the Hive table
• studen_hbase is the name of the HBase table linked to the Hive table above

• Command: create table hbase_student (id int,name string,course string,age int) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key,personal:name,personal:course,additional:age") TBLPROPERTIES
("hbase.table.name" = "studen_hbase");

101
Big Data
Hbase tutorial
Load data from Hive to HBase

● Check if the new table has been created in HBase, then check its schema

102
Big Data
Hbase tutorial
Load data from Hive to HBase

● Migrate Hive table data to HBase
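The migration itself is typically a plain HiveQL insert from the source table into the mapping table; assuming the student table from the earlier slide, something like: insert overwrite table hbase_student select id, name, course, age from student;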

103
Big Data
Hbase tutorial
Load data from Hive to HBase

● Check the table in Hive

104
Big Data
Hbase tutorial
Load data from Hive to HBase

● Check the Hbase table

105
Big Data
Hbase tutorial
Dropping a Table using HBase Shell
● Using the drop command, you can delete a table. Before dropping a table, you have to
disable it.

106
Big Data
Hbase tutorial
Hfile stored in HDFS

● Open HUE and use the File Browser

107
Big Data
Hbase tutorial
Hfile stored in HDFS

● Navigate to /hbase/data/default

108
Big Data
Hbase tutorial
Hfile stored in HDFS

109
Big Data
Hbase tutorial
Hfile stored in HDFS

Important note: different column families are stored separately. When you query a row, the region server will
have to grab data from multiple places (which will slow down your system).
110
Big Data
Hbase tutorial
Hfile stored in HDFS

● Check the content of the Hfile (shown in binary and text format)

111
Big Data
