
MongoDB, HBase and Neo4j

La Hoàng Lộc -
Nguyễn Danh Khôi

1
Outline
I. MongoDB
II. HBase
III. Neo4j

2
I. MongoDB
• 1: Introduction & Basics
• 2: CRUD
• 3: Schema Design
• 4: Indexes
• 5: Aggregation
• 6: Replication & Sharding

3
I.1 Introduction & Basics
History
• mongoDB = “Humongous DB”
• Open-source
• Document-based
• “High performance, high availability”
• Automatic scaling
• CP (consistency and partition tolerance) on the CAP theorem

5
Other NoSQL Types

Key/value (Dynamo)

Columnar/tabular (HBase)

Document (mongoDB)

6
Motivations
Problems with SQL
Rigid schema
Not easily scalable (designed for 90’s technology or worse)
Requires unintuitive joins

Perks of mongoDB
Easy interface with common languages (Java, JavaScript, PHP, etc.)
DB tech should run anywhere (VMs, cloud, etc.)
Keeps essential features of RDBMSs while learning from key-value NoSQL systems

7
Data Model
Document-based (max 16 MB per document)
Documents are in BSON format, consisting of field-value pairs
Each document is stored in a collection
Collections:
Have an index set in common
Are like the tables of relational DBs
Documents in a collection do not have to have uniform structure

8
BSON
• “Binary JSON”
• Binary-encoded serialization of JSON-like docs
• Also allows “referencing”
• Embedded structure reduces need for joins
• Goals
– Lightweight
– Traversable
– Efficient (decoding and encoding)

9
BSON Example
{
  "_id": "37010",
  "city": "ADAMS",
  "pop": 2660,
  "state": "TN",
  "councilman": {
    "name": "John Smith",
    "address": "13 Scenic Way"
  }
}
10
The _id Field
• By default, each document contains an _id
field. This field has a number of special
characteristics:
– Value serves as primary key for collection.
– Value is unique, immutable, and may be any
non-array type.
– Default data type is ObjectId, which is “small,
likely unique, fast to generate, and ordered.”
Sorting on an ObjectId value is roughly
equivalent to sorting on creation time.

11
mongoDB vs. SQL
mongoDB ➜ SQL
Document ➜ Tuple
Collection ➜ Table/View
PK: _id Field ➜ PK: Any Attribute(s)
Uniformity not Required ➜ Uniform Relation Schema
Index ➜ Index
Embedded Structure ➜ Joins
Shard ➜ Partition

12
I.2 CRUD
Create, Read, Update, Delete

13
CRUD: Using the Shell
To insert documents into a collection (creating the collection if it does not exist):

db.<collection>.insert(<document>)

SQL equivalent:

INSERT INTO <table>
VALUES (<attribute values>);

14
CRUD: Inserting Data
Insert one document:
db.<collection>.insert({<field>:<value>})

Inserting a document with a field name new to the collection is inherently supported by the BSON model.

To insert multiple documents, use an array.
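For instance (a sketch; the users collection and the values are illustrative):

db.users.insert({ name: "Ann", score: 42 })

db.users.insert([
  { name: "Bob", score: 17 },
  { name: "Carol", score: 99, city: "ADAMS" }  // a field new to the collection is fine
])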

15
CRUD: Querying
Done on collections.
Get all docs: db.<collection>.find()
Returns a cursor, which the shell iterates to display the first 20 results.
Add .limit(<number>) to limit the results.
SELECT * FROM <table>;

Get one doc: db.<collection>.findOne()

16
CRUD: Querying
To match a specific value:
db.<collection>.find({<field>:<value>})
“AND”
db.<collection>.find({<field1>:<value1>,
<field2>:<value2>
})
SELECT *
FROM <table>
WHERE <field1> = <value1> AND <field2> =
<value2>;

17
CRUD: Querying

OR
db.<collection>.find({ $or: [
  {<field>: <value1>},
  {<field>: <value2>}
]})
SELECT *
FROM <table>
WHERE <field> = <value1> OR <field> = <value2>;

Checking for multiple values of the same field:

db.<collection>.find({<field>: {$in: [<value1>, <value2>]}})
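A concrete sketch, assuming the zips collection from the earlier BSON example:

// Tennessee cities OR cities with population above 100,000
db.zips.find({ $or: [ { state: "TN" }, { pop: { $gt: 100000 } } ] })

// cities in any of several states
db.zips.find({ state: { $in: ["TN", "AL"] } })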

18
CRUD: Querying
Including/excluding document fields:
db.<collection>.find({<field1>:<value>}, {<field2>: 1})

SELECT field2
FROM <table>
WHERE <field1> = <value>;

To exclude a field instead, use {<field2>: 0}.

Find documents with or without a field:
db.<collection>.find({<field>: { $exists: true }})

19
CRUD: Updating
db.<collection>.update(
{<field1>:<value1>}, //all docs in which field = value
{$set: {<field2>:<value2>}}, //set field to value
{multi:true} ) //update multiple docs

upsert: if true, creates a new doc when none matches the search criteria.

UPDATE <table>
SET <field2> = <value2>
WHERE <field1> = <value1>;
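A hedged sketch combining $set, multi and upsert (collection and field names are illustrative):

db.users.update(
  { name: "Ann" },               // all docs in which name = "Ann"
  { $set: { score: 50 } },       // set score to 50
  { multi: true, upsert: true }  // update all matches; insert if none
)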

20
CRUD: Updating
To remove a field:
db.<collection>.update({<field>:<value>},
  { $unset: { <field>: 1 } })

To replace all field-value pairs:
db.<collection>.update({<field>:<value>},
  { <field>:<value>,
    <field>:<value> })

*NOTE: This overwrites ALL the contents of a document, even removing fields.

21
CRUD: Removal
Remove all records where field = value:
db.<collection>.remove({<field>:<value>})

DELETE FROM <table>
WHERE <field> = <value>;

As above, but only remove the first matching document:
db.<collection>.remove({<field>:<value>}, true)

22
Schema Design

23
RDBMS ➜ MongoDB
Database ➜ Database
Table ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
24
Intuition – why do databases exist in the first place?

Why can't we just write programs that operate on objects?
Memory limit: data can outgrow RAM, and we cannot rely on the OS's page-based memory management alone to swap it back from disk.

Why can't we have the database operate on the same data structures as the program?
That is where mongoDB comes in.

25
Mongo is basically schema-free
The purpose of a schema in SQL is to meet the requirements of tables and of quirky SQL implementations.

Every "row" in a database "table" is a data structure, much like a "struct" in C or a "class" in Java. A table is then an array (or list) of such data structures.

So what we design in mongoDB is basically the same way we design a compound data type in JSON.

26
There are some patterns
Embedding

Linking

27
Embedding & Linking

28
One to One relationship

Linking (two documents):
zip = {
  _id: 35004,
  city: "ACMAR",
  loc: [-86, 33],
  pop: 6065,
  state: "AL"
}

council_person = {
  zip_id: 35004,
  name: "John Doe",
  address: "123 Fake St.",
  phone: 123456
}

Embedding (one document):
zip = {
  _id: 35004,
  city: "ACMAR",
  loc: [-86, 33],
  pop: 6065,
  state: "AL",
  council_person: {
    name: "John Doe",
    address: "123 Fake St.",
    phone: 123456
  }
}

29
Example 2

MongoDB: The Definitive Guide,
by Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O'Reilly Media, CA

30
One to many relationship -
Embedding
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
31
One to many relationship – Linking

publisher = {
  _id: "oreilly",
  name: "O'Reilly Media",
  founded: "1980",
  location: "CA"
}

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}
32
Linking vs. Embedding
• Embedding is a bit like pre-joining data
• Document level operations are easy for the server
to handle
• Embed when the “many” objects always appear
with (viewed in the context of) their parents.
• Link when you need more flexibility

33
Many to many relationship
The relation can be put in either one of the documents (embedding it in one of them).

Focus on how the data is accessed and queried.

34
Example
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}

db.books.find( { "authors.name": "Kristina Chodorow" } )

35
What is good about mongoDB?

• find() is semantically clearer for programming than the equivalent functional code:
(map (lambda (b) b.title)
     (filter (lambda (b) (> b.pages 100)) Book))

• De-normalization provides data locality, and data locality provides speed.

36
I.4: Index in MongoDB

37
Before Index
• What does the database normally do when we query?
• Without an index, MongoDB must scan every document in the collection.
• This is inefficient, because it processes a large volume of data.
db.users.find( { score: { $lt: 30 } } )

38
Index in MongoDB
Types:
• Single Field Indexes
• Compound Indexes
• Multikey Indexes
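A sketch of creating each type in the mongo shell (field names are illustrative; older shells use ensureIndex instead of createIndex):

db.users.createIndex({ score: 1 })           // single field index
db.users.createIndex({ state: 1, city: 1 })  // compound index
db.users.createIndex({ tags: 1 })            // multikey if tags is an array field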

39
40
Aggregation

Operations that process data records and return computed results.

MongoDB provides aggregation operations.

Running data aggregation on the mongod instance simplifies application code and limits resource requirements.

41
Pipelines
Modeled on the concept of data processing pipelines.
Provides:
• filters that operate like queries
• document transformations that modify the form of the output document
Provides tools for:
• grouping and sorting by field
• aggregating the contents of arrays, including arrays of documents
Operators can be used for tasks such as calculating an average or concatenating a string.
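A small sketch of such a pipeline, assuming the zips collection shown earlier:

db.zips.aggregate([
  { $match: { state: "TN" } },                              // filter, like a query
  { $group: { _id: "$city", totalPop: { $sum: "$pop" } } }, // group and aggregate
  { $sort: { totalPop: -1 } },                              // sort by computed field
  { $limit: 5 }
])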

42
43
Pipelines

• $limit
• $skip
• $sort

44
Map-Reduce
Has two phases:
• a map stage that processes each document and emits one or more objects for each input document
• a reduce phase that combines the output of the map operation
• an optional finalize stage for final modifications to the result
Uses custom JavaScript functions.
Provides greater flexibility, but is less efficient and more complex than the aggregation pipeline.
Can have output sets that exceed the 16 MB output limitation of the aggregation pipeline.
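A hedged sketch of the same kind of computation with map-reduce, again assuming the zips collection:

var map = function () { emit(this.state, this.pop); };              // emit per document
var reduce = function (key, values) { return Array.sum(values); }; // combine emitted values
db.zips.mapReduce(map, reduce, { out: "pop_by_state" });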

45
46
Single Purpose Aggregation Operations

Special-purpose database commands:
• returning a count of matching documents
• returning the distinct values for a field
• grouping data based on the values of a field
They aggregate documents from a single collection, but lack the flexibility and capabilities of the aggregation pipeline and map-reduce.
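For example (a sketch over the zips collection):

db.zips.count({ state: "TN" })  // count of matching documents
db.zips.distinct("state")       // distinct values for a field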

47
48
49
50
Replication & Sharding

51
Replication
What is replication?
Purpose of replication/redundancy:
• Fault tolerance
• Availability
• Increased read capacity

52
Replication in MongoDB
Replica Set Members
• Primary
  - Handles read and write operations
• Secondary
  - Asynchronous replication
  - Can become primary
• Arbiter
  - Votes in elections
  - Can't become primary
• Delayed Secondary
  - Can't become primary
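A minimal sketch of initiating such a replica set in the mongo shell (hostnames are placeholders):

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "host1:27017" },                    // primary candidate
    { _id: 1, host: "host2:27017" },                    // secondary
    { _id: 2, host: "host3:27017", arbiterOnly: true }  // arbiter: votes, never primary
  ]
})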

53
Replication in MongoDB
• Automatic Failover
• Heartbeats
• Elections
• The Standard Replica Set Deployment
• Deploy an Odd Number of Members
• Rollback
• Security
• SSL/TLS

54
Sharding
• What is sharding?
• Purpose of sharding
  - Horizontal scaling out
• Query Routers
  - mongos
• Shard keys
  - Range-based sharding
  - Cardinality
  - Avoid hotspotting
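A sketch of the corresponding shell commands (database, collection and shard key are illustrative):

sh.enableSharding("mydb")                        // enable sharding for the database
sh.shardCollection("mydb.users", { userId: 1 })  // range-based shard key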

55
HBase (Hadoop database)
• is an open-source, multidimensional, distributed, scalable NoSQL database written in Java.
• runs on top of HDFS (Hadoop Distributed File System).
• is designed to provide a fault-tolerant way of storing large collections of sparse data sets.
• achieves high throughput and low latency by providing faster read/write access on huge data sets. -> HBase is the choice for applications that require fast, random access to large amounts of data.

56
Features of HBase
• Atomic reads and writes: all row-level operations within a table are atomic -> consistent reads and writes
• Linear and modular scalability
• Automatic and configurable sharding of tables: tables are split into regions, and regions are distributed across the servers of the cluster; regions split and are redistributed as the data grows.
• Support for Block Cache and Bloom Filters: for high-volume query optimization.
• Automatic failure detection support
• Sorted rowkeys: HBase stores rowkeys in lexicographical order
57
HBase data model
• Components of the Data Model:
• NameSpace: a collection of tables.
• Table: data is stored in a table format in HBase, but here tables are in a column-oriented format.
• Row Key: row keys are used to search records, which makes searches fast. It acts as a primary key.
• Column Family: various columns are combined into a column family. These column families are stored together, which makes the searching process faster, because data belonging to the same column family can be accessed together in a single seek.
• Column Qualifier: each column's name is known as its column qualifier.
• Cell: the data is dumped into cells, which are uniquely identified by rowkey and column qualifier.
• Timestamp: a timestamp is a combination of date and time. Whenever data is stored, it is stored with its timestamp. This makes it easy to search for a particular version of the data.
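A sketch of this model in the HBase shell (table, family and qualifier names are illustrative):

create 'employee', 'personal', 'office'         # table with two column families
put 'employee', 'row1', 'personal:name', 'Ann'  # cell = rowkey + family:qualifier (+ timestamp)
put 'employee', 'row1', 'office:phone', '123'
get 'employee', 'row1'                          # fetch the row by its row key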

58
HBase data model (cont.)

59
HBase Architecture
• Major Components of HBase:
• HMaster Server: is responsible for monitoring all RegionServer
instances in the cluster, runs on the NameNode
• HBase Region Server: is responsible for serving and managing
regions. RegionServer runs on a DataNode
• Regions: are the basic element of availability and distribution for tables; each region consists of a Store per Column Family.
• Zookeeper: acts like a coordinator inside HBase distributed
environment.

60
HBase Architecture (cont.)

61
Region
• contains all the rows between the start key and the end key assigned to that region.
• A table can be divided into a number of regions in such a way that all the columns of a column family are stored in one region.
• Each region contains its rows in sorted order.
• A group of regions is served to the clients by a Region Server.

62
Region Server
• Many regions are assigned to a Region Server, which is responsible for handling, managing, and executing read and write operations on that set of regions.
• A Region Server can serve approximately 1000 regions to the client.
• Components of Region Server: WAL, Block Cache, MemStore, HFile.

63
Region Server (cont.)
• WAL (Write-ahead-Log): is a file attached to every Region
Server inside the distributed environment. The WAL stores
the new data that hasn’t been persisted or committed to the
permanent storage. It is used in case of failure to recover the
data sets.
• Block Cache: stores the frequently read data in memory. When the Block Cache is full, the least recently used data is evicted.

64
Region Server (cont.)
• MemStore: is the write cache. It stores all the incoming data
before committing it to the disk or permanent memory
(HFile). There is one MemStore for each column family in a
region. The data is sorted in lexicographical order before
committing it to the disk.
• HFile: is stored on HDFS. It stores the actual cells on disk. The MemStore commits its data to an HFile when its size exceeds a configured threshold.

65
HMaster Server
• HMaster creates and deletes tables and assigns regions to the Region Servers.
• It coordinates and manages the Region Server.
• It assigns regions to the Region Servers on startup and re-assigns
regions to Region Servers during recovery and load balancing.
• It monitors all the Region Server’s instances in the cluster (with the
help of Zookeeper) and performs recovery activities whenever any
Region Server is down.
• It provides an interface for creating, deleting and updating tables.

66
ZooKeeper
• ZooKeeper acts like a coordinator inside HBase distributed
environment.
• Every Region Server, along with the HMaster Server, sends a continuous heartbeat at regular intervals to Zookeeper, which checks which servers are alive and available.
• It also provides server failure notifications so that recovery measures can be executed.
• It also maintains the .META server's path, which helps any client in searching for a region. The client first asks the .META server which Region Server a region belongs to, and gets the path of that Region Server.
67
ZooKeeper (cont.)
• How does Zookeeper support failure detection?

68
Meta Table
• is a special HBase catalog table. It maintains a list of all the Region Servers in the HBase storage system.
• The .META table stores its entries as keys and values: the key is the start key of a region and its id, while the value contains the path of the Region Server.

69
Meta Table (cont.)

70
Write Operation

71
Write Operation (cont.)
• Step 1: Whenever the client has a write request, it writes the data to the WAL. A WAL file is maintained on every Region Server, and the Region Server uses it to recover data that has not been committed to disk (fault tolerance).
• Step 2: The data to be written is then forwarded to the MemStore, which is effectively the RAM of the data node (Region Server).
• Step 3: An ACK signal is sent to the client as confirmation that the task is completed.
• Step 4: When the MemStore reaches its threshold, it dumps the data into an HFile.
72
HFile
• There is one MemStore per column family. When a MemStore reaches its threshold, it dumps all its data into a new HFile -> HBase accumulates multiple HFiles per column family.
• Over time, the number of HFiles grows as MemStores flush their data -> compaction is needed.
• Writes are placed sequentially on disk, so the movement of the disk's read-write head is minimal. This makes the write and search mechanisms very fast.
• The HFile indexes are loaded into memory whenever an HFile is opened. This helps in finding a record in a single seek.
• An HFile also holds information about Bloom filters.
• A Bloom filter helps in searching key-value pairs: it skips files that definitely do not contain the required rowkey.
73
Read Operation
• Search by rowkey:
• Step 1: Zookeeper holds the location of the META table, which lives on a Region Server. When a client makes a request, Zookeeper returns the address of that server.
• Step 2: The client then goes to that Region Server and reads the META table, where it finds the address of the region (and Region Server) that holds the data to be read.
• The client caches this information along with the location of the META table.

74
Read Operation (cont.)
• After the client retrieves the location of the Region Server (the search above, needed only when the answer is not already in its cache):
• Step 1: The scanner first looks for the row cell in the Block Cache, where all the recently read key-value pairs are stored.
• Step 2: If the scanner fails to find the required result, it moves to the MemStore, the write cache, and searches the most recently written data that has not yet been dumped to an HFile.
• Step 3: It then uses Bloom filters to decide which HFiles to load the data from.
• Step 4: The data taken from an HFile is the most recently read data, so it is written into the Block Cache; the next time it is requested, the client can access it instantly -> only one step is needed.
• Step 5: Once the data is written to the Block Cache and the search is complete, the read result is returned to the client along with an ACK.
75
Compaction Operation
• HBase combines HFiles to reduce storage and to reduce the number of disk seeks needed for a read. Compaction chooses some HFiles from a region and combines them. There are two types of compaction:
• Minor Compaction: HBase automatically picks smaller HFiles and recommits them to bigger HFiles, using a merge sort. This helps in storage space optimization.
• Major Compaction: HBase merges and recommits the smaller HFiles of a region into a new HFile. In this process, the same column families are placed together in the new HFile, and deleted and expired cells are dropped. It increases read performance.
• Write amplification: during this process, disk I/O and network traffic might get congested, so major compaction is generally scheduled during low-peak-load timings.
76
Compaction Operation (cont.)

77
Region Split
• Whenever a region becomes too large, it is divided into two child regions.
• Each child region represents exactly half of the parent region. The split is then reported to the HMaster. The same Region Server handles both children until the HMaster allocates them to new Region Servers for load balancing.

78
Failure Recovery
• Whenever a Region Server fails, ZooKeeper notifies the HMaster about the failure.
• The HMaster then distributes and allocates the regions of the crashed Region Server to active Region Servers. To recover the MemStore data of the failed Region Server, the HMaster distributes its WAL to all the Region Servers.
• Each Region Server re-executes the WAL to rebuild the MemStore for the failed region's column families.
• The data was written to the WAL in chronological order, so re-executing the WAL re-applies all the changes that were made and stored in the MemStore.
• So, after all the Region Servers have executed the WAL, the MemStore data for all column families is recovered.
79
CRUD operations
• HBase only has low-level CRUD (create, read, update, delete) operations. It is the responsibility of the application programs to implement more complex operations, such as joins between rows in different tables.

80
CRUD operations (cont.)
• The create operation creates a new table and specifies one or more
column families associated with that table.
• The put operation is used for inserting new data or new versions of
existing data items.
• The get operation is for retrieving the data associated with a single row
in a table.
• The scan operation retrieves all the rows.
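A sketch of these operations in the HBase shell (names are illustrative):

create 'books', 'info'                          # create: table with one column family
put 'books', 'b1', 'info:title', 'HBase Guide'  # put: insert new data or a new version
get 'books', 'b1'                               # get: retrieve a single row
scan 'books'                                    # scan: retrieve all rows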

81
Cons of HBase
• Single point of failure: when only one HMaster is used, there is a possibility of failure.
• No handling of JOINs in the database: instead of the database itself, JOINs are handled in the MapReduce layer.
• Sorted only on key: HBase is indexed and sorted only on the row key.
• Not a perfect replacement: since HBase does not support some features of the traditional model, we cannot expect to use it as a complete replacement for traditional databases.
• No support for SQL: as there is no support for SQL, HBase cannot contain a query optimizer.
• Unpredictable latencies: integrating HBase with MapReduce jobs can result in unpredictable latencies.
• Memory issues on the cluster: integrating HBase with Pig and Hive jobs can sometimes lead to memory issues on the cluster.

82
Introduction to Neo4j
• is a NoSQL database: highly scalable, schema-free, and the world's most popular graph database management system.
• Its architecture is designed for optimal management, storage and traversal of nodes and relationships.
• Neo4j is best for storing data that has many interconnecting relationships.
• The graph model doesn't usually require a predefined schema, so there is no need to create the database structure before you load the data (as you do in a relational database). In Neo4j, the data is the structure. Neo4j is a "schema-optional" DBMS.
• In Neo4j, there is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, or to which data. You just have to define the relationships between the nodes you need.
83
Features of Neo4j
• ACID Property: Neo4j supports full ACID properties.
• Scalability: Neo4j lets you scale the database by increasing the number of reads/writes and the data volume without affecting data integrity or the speed of query processing.
• Reliability: Neo4j provides replication for data safety and reliability.
• Cypher Query Language: Neo4j provides a powerful declarative query
language called Cypher Query language.
• GraphDB: Neo4j follows Property Graph Data Model.
• Built-in web application support
• Flexible schema

84
Cypher
• Neo4j has a high-level query language, Cypher.
• There are declarative commands for creating nodes and
relationships as well as for finding nodes and relationships based on
specifying patterns.
• Deletion and modification of data is also possible in Cypher.
• A Cypher query is made up of clauses. When a query has several
clauses, the result from one clause can be the input to the next
clause in the query.

85
Cypher (cont.)
Example 1: an EMPLOYEE / DEPARTMENT / PROJECT graph (the figure from the original slide is not reproduced here).

86
Cypher (cont.)
Example 1 (cont.):
- Create node:
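(The original slide showed a screenshot; a minimal sketch of what the commands could look like, assuming the EMPLOYEE/PROJECT schema of example 2; names and values are illustrative:)

CREATE (e:EMPLOYEE {Empid: '2', Ename: 'John Smith'})
CREATE (p:PROJECT {Pno: '1', Pname: 'ProductX'})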

87
Cypher (cont.)
Example 1 (cont.):
- Create relationships:
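(Again a hedged sketch in place of the original screenshot, using the WorksOn relationship from example 2; the Hours value is illustrative:)

MATCH (e:EMPLOYEE {Empid: '2'}), (p:PROJECT {Pno: '1'})
CREATE (e)-[:WorksOn {Hours: 32.5}]->(p)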

88
Cypher (cont.)

89
Cypher (cont.)
Example 2:
- Returns the projects and hours per week that the employee with Empid = 2 works on.
MATCH (e:EMPLOYEE {Empid: '2'})-[w:WorksOn]->(p)
RETURN e.Ename, w.Hours, p.Pname

- Returns all employees and the projects they work on, with hours per week, sorted by Ename.
MATCH (e)-[w:WorksOn]->(p)
RETURN e.Ename, w.Hours, p.Pname
ORDER BY e.Ename

90
The Property Graph Model
• Components of the Graph Model:
• Example 3: consider a graph of Person and Movie nodes (e.g. Tom Hanks, Forrest Gump) connected by ACTED_IN relationships.

91
The Property Graph Model
• Components of Graph Model:
• Nodes are often used to represent entities. The simplest possible
graph is a single node. A node can have one or more labels (that
describe its role) and properties (i.e. attributes). The nodes that
have the same label are grouped into a collection that identifies
a subset of the nodes in the database graph for querying
purposes.

92
The Property Graph Model
• Labels: are used to shape the domain by grouping nodes into sets: all nodes that have a certain label belong to the same set. A node can have zero to many labels.
In example 3, assume that we want to express different dimensions of the data. One way of doing that is to add more labels.
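A hedged sketch (the Actor label is illustrative):

CREATE (p:Person:Actor {name: 'Tom Hanks', born: 1956})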

93
The Property Graph Model
• Relationships: A relationship connects two nodes. Relationships
organize nodes into structures, allowing a graph to resemble a
list, a tree, a map, or a compound entity. Relationships can have
one or more properties.

94
The Property Graph Model
• Relationship types: A relationship must have exactly one relationship type, and relationships always have a direction.
The example below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node. The roles property on the ACTED_IN relationship has an array value with a single item in it.
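A sketch of that relationship in Cypher, with the property values described above:

CREATE (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN {roles: ['Forrest']}]->(m:Movie {title: 'Forrest Gump'})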

95
The Property Graph Model
• Properties: Properties are name-value pairs that are used to add
qualities to nodes and relationships. The value part of the
property can hold different data types such as number, string or
boolean
• In example 3, we have name and born on Person nodes, title and released on Movie nodes, and the property roles on the ACTED_IN relationship.

96
The Property Graph Model
• Traversals and Paths: A traversal is how you query a graph in
order to find answers to questions. Traversing a graph means
visiting nodes by following relationships according to some rules.
• Example: Find out which movies Tom Hanks acted in according to
example 3.
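A sketch of that traversal in Cypher:

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie:Movie)
RETURN movie.title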

97
The Property Graph Model
• Indexing and node identifiers: When a node is created, the
Neo4j system creates an internal unique system-defined
identifier for each node. To retrieve individual nodes using other
properties of the nodes efficiently, the user can create indexes
for the collection of nodes that have a particular label. Typically,
one or more of the properties of the nodes in that collection can
be indexed.
• Example: in example 1, Empid can be used to index nodes with the EMPLOYEE label, Dno to index the nodes with the DEPARTMENT label, and Pno to index the nodes with the PROJECT label.
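A sketch using the legacy index syntax (newer Neo4j versions use CREATE INDEX FOR (n:EMPLOYEE) ON (n.Empid) instead):

CREATE INDEX ON :EMPLOYEE(Empid)
CREATE INDEX ON :DEPARTMENT(Dno)
CREATE INDEX ON :PROJECT(Pno)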

98
Neo4j with chatbot

99