
MongoDB, HBase and Neo4j

La Hoàng Lộc -
Nguyễn Danh Khôi

1
Outline
I. MongoDB
II. HBase
III. Neo4j

2
I. MongoDB
• 1: Introduction & Basics
• 2: CRUD
• 3: Schema Design
• 4: Indexes
• 5: Aggregation
• 6: Replication & Sharding

3
I.1 Introduction & Basics
History
• mongoDB = “Humongous DB”
• Open-source
• Document-based
• “High performance, high availability”
• Automatic scaling
• CP (consistency and partition tolerance) on the CAP theorem

5
Other NoSQL Types

Key/value (Dynamo)

Columnar/tabular (HBase)

Document (mongoDB)

6
Motivations
Problems with SQL
Rigid schema
Not easily scalable (designed for 90’s technology or worse)
Requires unintuitive joins

Perks of mongoDB
Easy interface with common languages (Java, JavaScript, PHP, etc.)
DB tech should run anywhere (VMs, cloud, etc.)
Keeps essential features of RDBMSs while learning from key-value NoSQL systems

7
Data Model
Document-based (max 16 MB per document)
Documents are in BSON format, consisting of field-value pairs
Each document is stored in a collection
Collections:
Have an index set in common
Are like the tables of relational DBs
Documents in a collection do not have to have uniform structure

8
BSON
• “Binary JSON”
• Binary-encoded serialization of JSON-like docs
• Also allows “referencing”
• Embedded structure reduces need for joins
• Goals
– Lightweight
– Traversable
– Efficient (decoding and encoding)

9
BSON Example
{
  "_id": "37010",
  "city": "ADAMS",
  "pop": 2660,
  "state": "TN",
  "councilman": {
    "name": "John Smith",
    "address": "13 Scenic Way"
  }
}
10
The _id Field
• By default, each document contains an _id
field. This field has a number of special
characteristics:
– Value serves as primary key for collection.
– Value is unique, immutable, and may be any
non-array type.
– Default data type is ObjectId, which is “small,
likely unique, fast to generate, and ordered.”
Sorting on an ObjectId value is roughly
equivalent to sorting on creation time.

11
mongoDB vs. SQL
mongoDB ➜ SQL
Document ➜ Tuple
Collection ➜ Table/View
PK: _id Field ➜ PK: Any Attribute(s)
Uniformity not Required ➜ Uniform Relation Schema
Index ➜ Index
Embedded Structure ➜ Joins
Shard ➜ Partition

12
I.2 CRUD
Create, Read, Update, Delete

13
CRUD: Using the Shell
To insert documents into a collection (creating the collection if it does not exist):

db.<collection>.insert(<document>)

SQL equivalent:

INSERT INTO <table>
VALUES (<attribute values>);

14
CRUD: Inserting Data
Insert one document:
db.<collection>.insert({<field>:<value>})

Inserting a document with a field name new to the collection is inherently supported by the BSON model.

To insert multiple documents, use an array.
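For instance (a sketch; the users collection and the values are illustrative):

db.users.insert({ name: "Ann", score: 42 })

db.users.insert([
  { name: "Bob", score: 17 },
  { name: "Carol", score: 99, city: "ADAMS" }  // a field new to the collection is fine
])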

15
CRUD: Querying
Done on collections.
Get all docs: db.<collection>.find()
Returns a cursor, which the shell iterates to display the first 20 results.
Add .limit(<number>) to limit the results.
SELECT * FROM <table>;

Get one doc: db.<collection>.findOne()

16
CRUD: Querying
To match a specific value:
db.<collection>.find({<field>:<value>})
“AND”
db.<collection>.find({<field1>:<value1>,
<field2>:<value2>
})
SELECT *
FROM <table>
WHERE <field1> = <value1> AND <field2> =
<value2>;

17
CRUD: Querying

OR
db.<collection>.find({ $or: [
  {<field>: <value1>},
  {<field>: <value2>}
]})
SELECT *
FROM <table>
WHERE <field> = <value1> OR <field> = <value2>;

Checking for multiple values of the same field:

db.<collection>.find({<field>: {$in: [<value1>, <value2>]}})
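A concrete sketch, assuming the zips collection from the earlier BSON example:

// Tennessee cities OR cities with population above 100,000
db.zips.find({ $or: [ { state: "TN" }, { pop: { $gt: 100000 } } ] })

// cities in any of several states
db.zips.find({ state: { $in: ["TN", "AL"] } })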

18
CRUD: Querying
Including/excluding document fields:
db.<collection>.find({<field1>:<value>}, {<field2>: 1})

SELECT field2
FROM <table>
WHERE <field1> = <value>;

To exclude a field instead, use {<field2>: 0}.

Find documents with or without a field:
db.<collection>.find({<field>: { $exists: true }})

19
CRUD: Updating
db.<collection>.update(
{<field1>:<value1>}, //all docs in which field = value
{$set: {<field2>:<value2>}}, //set field to value
{multi:true} ) //update multiple docs

upsert: if true, creates a new doc when none matches the search criteria.

UPDATE <table>
SET <field2> = <value2>
WHERE <field1> = <value1>;
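A hedged sketch combining $set, multi and upsert (collection and field names are illustrative):

db.users.update(
  { name: "Ann" },               // all docs in which name = "Ann"
  { $set: { score: 50 } },       // set score to 50
  { multi: true, upsert: true }  // update all matches; insert if none
)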

20
CRUD: Updating
To remove a field:
db.<collection>.update({<field>:<value>},
  { $unset: { <field>: 1 } })

To replace all field-value pairs:
db.<collection>.update({<field>:<value>},
  { <field>:<value>,
    <field>:<value> })

*NOTE: This overwrites ALL the contents of a document, even removing fields.

21
CRUD: Removal
Remove all records where field = value:
db.<collection>.remove({<field>:<value>})

DELETE FROM <table>
WHERE <field> = <value>;

As above, but only remove the first matching document:
db.<collection>.remove({<field>:<value>}, true)

22
Schema Design

23
RDBMS ➜ MongoDB
Database ➜ Database
Table ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
24
Intuition – why do databases exist in the first place?

Why can't we just write programs that operate on objects?
Memory limit: data can outgrow RAM, and we cannot rely on the OS's page-based memory management alone to swap it back from disk.

Why can't we have the database operate on the same data structures as the program?
That is where mongoDB comes in.

25
Mongo is basically schema-free
The purpose of a schema in SQL is to meet the requirements of tables and of quirky SQL implementations.

Every "row" in a database "table" is a data structure, much like a "struct" in C or a "class" in Java. A table is then an array (or list) of such data structures.

So what we design in mongoDB is basically the same way we design a compound data type in JSON.

26
There are some patterns
Embedding

Linking

27
Embedding & Linking

28
One to One relationship

Linking (two documents):
zip = {
  _id: 35004,
  city: "ACMAR",
  loc: [-86, 33],
  pop: 6065,
  state: "AL"
}

council_person = {
  zip_id: 35004,
  name: "John Doe",
  address: "123 Fake St.",
  phone: 123456
}

Embedding (one document):
zip = {
  _id: 35004,
  city: "ACMAR",
  loc: [-86, 33],
  pop: 6065,
  state: "AL",
  council_person: {
    name: "John Doe",
    address: "123 Fake St.",
    phone: 123456
  }
}

29
Example 2

MongoDB: The Definitive Guide,
by Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O'Reilly Media, CA

30
One to many relationship -
Embedding
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
31
One to many relationship – Linking

publisher = {
  _id: "oreilly",
  name: "O'Reilly Media",
  founded: "1980",
  location: "CA"
}

book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}
32
Linking vs. Embedding
• Embedding is a bit like pre-joining data
• Document level operations are easy for the server
to handle
• Embed when the “many” objects always appear
with (viewed in the context of) their parents.
• Link when you need more flexibility

33
Many to many relationship
The relation can be put in either one of the documents (embedding it in one of them).

Focus on how the data is accessed and queried.

34
Example
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}

author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}

db.books.find( { "authors.name": "Kristina Chodorow" } )

35
What is good about mongoDB?

• find() is semantically clearer for programming than the equivalent functional code:
(map (lambda (b) b.title)
     (filter (lambda (b) (> b.pages 100)) Book))

• De-normalization provides data locality, and data locality provides speed.

36
I.4: Index in MongoDB

37
Before Index
• What does the database normally do when we query?
• Without an index, MongoDB must scan every document in the collection.
• This is inefficient, because it processes a large volume of data.
db.users.find( { score: { $lt: 30 } } )

38
Index in MongoDB
Types:
• Single Field Indexes
• Compound Indexes
• Multikey Indexes
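A sketch of creating each type in the mongo shell (field names are illustrative; older shells use ensureIndex instead of createIndex):

db.users.createIndex({ score: 1 })           // single field index
db.users.createIndex({ state: 1, city: 1 })  // compound index
db.users.createIndex({ tags: 1 })            // multikey if tags is an array field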

39
40
Aggregation

Operations that process data records and return computed results.

MongoDB provides aggregation operations.

Running data aggregation on the mongod instance simplifies application code and limits resource requirements.

41
Pipelines
Modeled on the concept of data processing pipelines.
Provides:
• filters that operate like queries
• document transformations that modify the form of the output document
Provides tools for:
• grouping and sorting by field
• aggregating the contents of arrays, including arrays of documents
Operators can be used for tasks such as calculating an average or concatenating a string.
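A small sketch of such a pipeline, assuming the zips collection shown earlier:

db.zips.aggregate([
  { $match: { state: "TN" } },                              // filter, like a query
  { $group: { _id: "$city", totalPop: { $sum: "$pop" } } }, // group and aggregate
  { $sort: { totalPop: -1 } },                              // sort by computed field
  { $limit: 5 }
])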

42
43
Pipelines

• $limit
• $skip
• $sort

44
Map-Reduce
Has two phases:
• a map stage that processes each document and emits one or more objects for each input document
• a reduce phase that combines the output of the map operation
• an optional finalize stage for final modifications to the result
Uses custom JavaScript functions.
Provides greater flexibility, but is less efficient and more complex than the aggregation pipeline.
Can have output sets that exceed the 16 MB output limitation of the aggregation pipeline.
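A hedged sketch of the same kind of computation with map-reduce, again assuming the zips collection:

var map = function () { emit(this.state, this.pop); };              // emit per document
var reduce = function (key, values) { return Array.sum(values); }; // combine emitted values
db.zips.mapReduce(map, reduce, { out: "pop_by_state" });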

45
46
Single Purpose Aggregation Operations

Special-purpose database commands:
• returning a count of matching documents
• returning the distinct values for a field
• grouping data based on the values of a field
They aggregate documents from a single collection, but lack the flexibility and capabilities of the aggregation pipeline and map-reduce.
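For example (a sketch over the zips collection):

db.zips.count({ state: "TN" })  // count of matching documents
db.zips.distinct("state")       // distinct values for a field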

47
48
49
50
Replication & Sharding

51
Replication
What is replication?
Purpose of replication/redundancy:
• Fault tolerance
• Availability
• Increased read capacity

52
Replication in MongoDB
Replica Set Members
• Primary
  - Handles read and write operations
• Secondary
  - Asynchronous replication
  - Can become primary
• Arbiter
  - Votes in elections
  - Can't become primary
• Delayed Secondary
  - Can't become primary
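A minimal sketch of initiating such a replica set in the mongo shell (hostnames are placeholders):

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "host1:27017" },                    // primary candidate
    { _id: 1, host: "host2:27017" },                    // secondary
    { _id: 2, host: "host3:27017", arbiterOnly: true }  // arbiter: votes, never primary
  ]
})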

53
Replication in MongoDB
• Automatic Failover
• Heartbeats
• Elections
• The Standard Replica Set Deployment
• Deploy an Odd Number of Members
• Rollback
• Security
• SSL/TLS

54
Sharding
• What is sharding?
• Purpose of sharding
  - Horizontal scaling out
• Query Routers
  - mongos
• Shard keys
  - Range-based sharding
  - Cardinality
  - Avoid hotspotting
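A sketch of the corresponding shell commands (database, collection and shard key are illustrative):

sh.enableSharding("mydb")                        // enable sharding for the database
sh.shardCollection("mydb.users", { userId: 1 })  // range-based shard key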

55
HBase (Hadoop database)
• is an open-source, multidimensional, distributed, scalable NoSQL database written in Java.
• runs on top of HDFS (Hadoop Distributed File System).
• is designed to provide a fault-tolerant way of storing large collections of sparse data sets.
• achieves high throughput and low latency by providing faster read/write access on huge data sets. -> HBase is the choice for applications that require fast, random access to large amounts of data.

56
Features of HBase
• Atomic reads and writes: all row-level operations within a table are atomic -> consistent reads and writes
• Linear and modular scalability
• Automatic and configurable sharding of tables: tables are split into regions, and regions are distributed across the servers of the cluster; regions split and are redistributed as the data grows.
• Support for Block Cache and Bloom Filters: for high-volume query optimization.
• Automatic failure detection support
• Sorted rowkeys: HBase stores rowkeys in lexicographical order
57
HBase data model
• Components of the Data Model:
• NameSpace: a collection of tables.
• Table: data is stored in a table format in HBase, but here tables are in a column-oriented format.
• Row Key: row keys are used to search records, which makes searches fast. It acts as a primary key.
• Column Family: various columns are combined into a column family. These column families are stored together, which makes the searching process faster, because data belonging to the same column family can be accessed together in a single seek.
• Column Qualifier: each column's name is known as its column qualifier.
• Cell: the data is dumped into cells, which are uniquely identified by rowkey and column qualifier.
• Timestamp: a timestamp is a combination of date and time. Whenever data is stored, it is stored with its timestamp. This makes it easy to search for a particular version of the data.
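A sketch of this model in the HBase shell (table, family and qualifier names are illustrative):

create 'employee', 'personal', 'office'         # table with two column families
put 'employee', 'row1', 'personal:name', 'Ann'  # cell = rowkey + family:qualifier (+ timestamp)
put 'employee', 'row1', 'office:phone', '123'
get 'employee', 'row1'                          # fetch the row by its row key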

58
HBase data model (cont.)

59
HBase Architecture
• Major Components of HBase:
• HMaster Server: is responsible for monitoring all RegionServer
instances in the cluster, runs on the NameNode
• HBase Region Server: is responsible for serving and managing
regions. RegionServer runs on a DataNode
• Regions: are the basic element of availability and distribution for tables; each region consists of a Store per Column Family.
• Zookeeper: acts like a coordinator inside HBase distributed
environment.

60
HBase Architecture (cont.)

61
Region
• contains all the rows between the start key and the end key assigned to that region.
• A table can be divided into a number of regions in such a way that all the columns of a column family are stored in one region.
• Each region contains its rows in sorted order.
• A group of regions is served to the clients by a Region Server.

62
Region Server
• Many regions are assigned to a Region Server, which is responsible for handling, managing, and executing read and write operations on that set of regions.
• A Region Server can serve approximately 1000 regions to the client.
• Components of Region Server: WAL, Block Cache, MemStore, HFile.

63
Region Server (cont.)
• WAL (Write-ahead-Log): is a file attached to every Region
Server inside the distributed environment. The WAL stores
the new data that hasn’t been persisted or committed to the
permanent storage. It is used in case of failure to recover the
data sets.
• Block Cache: stores the frequently read data in memory. When the Block Cache is full, the least recently used data is evicted.

64
Region Server (cont.)
• MemStore: is the write cache. It stores all the incoming data
before committing it to the disk or permanent memory
(HFile). There is one MemStore for each column family in a
region. The data is sorted in lexicographical order before
committing it to the disk.
• HFile: is stored on HDFS. It stores the actual cells on disk. The MemStore commits its data to an HFile when its size exceeds a configured threshold.

65
HMaster Server
• HMaster creates and deletes tables and assigns regions to the Region Servers.
• It coordinates and manages the Region Server.
• It assigns regions to the Region Servers on startup and re-assigns
regions to Region Servers during recovery and load balancing.
• It monitors all the Region Server’s instances in the cluster (with the
help of Zookeeper) and performs recovery activities whenever any
Region Server is down.
• It provides an interface for creating, deleting and updating tables.

66
ZooKeeper
• ZooKeeper acts like a coordinator inside HBase distributed
environment.
• Every Region Server, along with the HMaster Server, sends a continuous heartbeat at regular intervals to Zookeeper, which checks which servers are alive and available.
• It also provides server failure notifications so that recovery measures can be executed.
• It also maintains the .META server's path, which helps any client in searching for a region. The client first asks the .META server which Region Server a region belongs to, and gets the path of that Region Server.
67
ZooKeeper (cont.)
• How does Zookeeper support failure detection?

68
Meta Table
• is a special HBase catalog table. It maintains a list of all the Region Servers in the HBase storage system.
• The .META table stores its entries as keys and values: the key is the start key of a region and its id, while the value contains the path of the Region Server.

69
Meta Table (cont.)

70
Write Operation

71
Write Operation (cont.)
• Step 1: Whenever the client has a write request, it writes the data to the WAL. A WAL file is maintained on every Region Server, and the Region Server uses it to recover data that has not been committed to disk (fault tolerance).
• Step 2: The data to be written is then forwarded to the MemStore, which is effectively the RAM of the data node (Region Server).
• Step 3: An ACK signal is sent to the client as confirmation that the task is completed.
• Step 4: When the MemStore reaches its threshold, it dumps the data into an HFile.
72
HFile
• There is one MemStore per column family. When a MemStore reaches its threshold, it dumps all its data into a new HFile -> HBase accumulates multiple HFiles per column family.
• Over time, the number of HFiles grows as MemStores flush their data -> compaction is needed.
• Writes are placed sequentially on disk, so the movement of the disk's read-write head is minimal. This makes the write and search mechanisms very fast.
• The HFile indexes are loaded into memory whenever an HFile is opened. This helps in finding a record in a single seek.
• An HFile also holds information about Bloom filters.
• A Bloom filter helps in searching key-value pairs: it skips files that definitely do not contain the required rowkey.
73
Read Operation
• Search by rowkey:
• Step 1: Zookeeper holds the location of the META table, which lives on a Region Server. When a client makes a request, Zookeeper returns the address of that server.
• Step 2: The client then goes to that Region Server and reads the META table, where it finds the address of the region (and Region Server) that holds the data to be read.
• The client caches this information along with the location of the META table.

74
Read Operation (cont.)
• After the client retrieves the location of the Region Server (the search above, needed only when the answer is not already in its cache):
• Step 1: The scanner first looks for the row cell in the Block Cache, where all the recently read key-value pairs are stored.
• Step 2: If the scanner fails to find the required result, it moves to the MemStore, the write cache, and searches the most recently written data that has not yet been dumped to an HFile.
• Step 3: It then uses Bloom filters to decide which HFiles to load the data from.
• Step 4: The data taken from an HFile is the most recently read data, so it is written into the Block Cache; the next time it is requested, the client can access it instantly -> only one step is needed.
• Step 5: Once the data is written to the Block Cache and the search is complete, the read result is returned to the client along with an ACK.
75
Compaction Operation
• HBase combines HFiles to reduce storage and to reduce the number of disk seeks needed for a read. Compaction chooses some HFiles from a region and combines them. There are two types of compaction:
• Minor Compaction: HBase automatically picks smaller HFiles and recommits them to bigger HFiles, using a merge sort. This helps in storage space optimization.
• Major Compaction: HBase merges and recommits the smaller HFiles of a region into a new HFile. In this process, the same column families are placed together in the new HFile, and deleted and expired cells are dropped. It increases read performance.
• Write amplification: during this process, disk I/O and network traffic might get congested, so major compaction is generally scheduled during low-peak-load timings.
76
Compaction Operation (cont.)

77
Region Split
• Whenever a region becomes too large, it is divided into two child regions.
• Each child region represents exactly half of the parent region. The split is then reported to the HMaster. The same Region Server handles both children until the HMaster allocates them to new Region Servers for load balancing.

78
Failure Recovery
• Whenever a Region Server fails, ZooKeeper notifies the HMaster about the failure.
• The HMaster then distributes and allocates the regions of the crashed Region Server to active Region Servers. To recover the MemStore data of the failed Region Server, the HMaster distributes its WAL to all the Region Servers.
• Each Region Server re-executes the WAL to rebuild the MemStore for the failed region's column families.
• The data was written to the WAL in chronological order, so re-executing the WAL re-applies all the changes that were made and stored in the MemStore.
• So, after all the Region Servers have executed the WAL, the MemStore data for all column families is recovered.
79
CRUD operations
• HBase only has low-level CRUD (create, read, update, delete) operations. It is the responsibility of the application programs to implement more complex operations, such as joins between rows in different tables.

80
CRUD operations (cont.)
• The create operation creates a new table and specifies one or more
column families associated with that table.
• The put operation is used for inserting new data or new versions of
existing data items.
• The get operation is for retrieving the data associated with a single row
in a table.
• The scan operation retrieves all the rows.
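A sketch of these operations in the HBase shell (names are illustrative):

create 'books', 'info'                          # create: table with one column family
put 'books', 'b1', 'info:title', 'HBase Guide'  # put: insert new data or a new version
get 'books', 'b1'                               # get: retrieve a single row
scan 'books'                                    # scan: retrieve all rows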

81
Cons of HBase
• Single point of failure: when only one HMaster is used, there is a possibility of failure.
• No handling of JOINs in the database: instead of the database itself, JOINs are handled in the MapReduce layer.
• Sorted only on key: HBase is indexed and sorted only on the row key.
• Not a perfect replacement: since HBase does not support some features of the traditional model, we cannot expect to use it as a complete replacement for traditional databases.
• No support for SQL: as there is no support for SQL, HBase cannot contain a query optimizer.
• Unpredictable latencies: integrating HBase with MapReduce jobs can result in unpredictable latencies.
• Memory issues on the cluster: integrating HBase with Pig and Hive jobs can sometimes lead to memory issues on the cluster.

82
Introduction to Neo4j
• is a NoSQL database: highly scalable, schema-free, and the world's most popular graph database management system.
• Its architecture is designed for optimal management, storage and traversal of nodes and relationships.
• Neo4j is best for storing data that has many interconnecting relationships.
• The graph model doesn't usually require a predefined schema, so there is no need to create the database structure before you load the data (as you do in a relational database). In Neo4j, the data is the structure. Neo4j is a "schema-optional" DBMS.
• In Neo4j, there is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, or to which data. You just have to define the relationships between the nodes you need.
83
Features of Neo4j
• ACID Property: Neo4j supports full ACID properties.
• Scalability: Neo4j lets you scale the database by increasing the number of reads/writes and the data volume without affecting data integrity or the speed of query processing.
• Reliability: Neo4j provides replication for data safety and reliability.
• Cypher Query Language: Neo4j provides a powerful declarative query
language called Cypher Query language.
• GraphDB: Neo4j follows Property Graph Data Model.
• Built-in web application support
• Flexible schema

84
Cypher
• Neo4j has a high-level query language, Cypher.
• There are declarative commands for creating nodes and
relationships as well as for finding nodes and relationships based on
specifying patterns.
• Deletion and modification of data is also possible in Cypher.
• A Cypher query is made up of clauses. When a query has several
clauses, the result from one clause can be the input to the next
clause in the query.

85
Cypher (cont.)
Example 1: an EMPLOYEE / DEPARTMENT / PROJECT graph (the figure from the original slide is not reproduced here).

86
Cypher (cont.)
Example 1 (cont.):
- Create node:
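(The original slide showed a screenshot; a minimal sketch of what the commands could look like, assuming the EMPLOYEE/PROJECT schema of example 2; names and values are illustrative:)

CREATE (e:EMPLOYEE {Empid: '2', Ename: 'John Smith'})
CREATE (p:PROJECT {Pno: '1', Pname: 'ProductX'})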

87
Cypher (cont.)
Example 1 (cont.):
- Create relationships:
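(Again a hedged sketch in place of the original screenshot, using the WorksOn relationship from example 2; the Hours value is illustrative:)

MATCH (e:EMPLOYEE {Empid: '2'}), (p:PROJECT {Pno: '1'})
CREATE (e)-[:WorksOn {Hours: 32.5}]->(p)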

88
Cypher (cont.)

89
Cypher (cont.)
Example 2:
- Returns the projects and hours per week that the employee with Empid = 2 works on.
MATCH (e:EMPLOYEE {Empid: '2'})-[w:WorksOn]->(p)
RETURN e.Ename, w.Hours, p.Pname

- Returns all employees and the projects they work on, with hours per week, sorted by Ename.
MATCH (e)-[w:WorksOn]->(p)
RETURN e.Ename, w.Hours, p.Pname
ORDER BY e.Ename

90
The Property Graph Model
• Components of the Graph Model:
• Example 3: consider a graph of Person and Movie nodes (e.g. Tom Hanks, Forrest Gump) connected by ACTED_IN relationships.

91
The Property Graph Model
• Components of Graph Model:
• Nodes are often used to represent entities. The simplest possible
graph is a single node. A node can have one or more labels (that
describe its role) and properties (i.e. attributes). The nodes that
have the same label are grouped into a collection that identifies
a subset of the nodes in the database graph for querying
purposes.

92
The Property Graph Model
• Labels: are used to shape the domain by grouping nodes into sets: all nodes that have a certain label belong to the same set. A node can have zero to many labels.
In example 3, assume that we want to express different dimensions of the data. One way of doing that is to add more labels.
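A hedged sketch (the Actor label is illustrative):

CREATE (p:Person:Actor {name: 'Tom Hanks', born: 1956})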

93
The Property Graph Model
• Relationships: A relationship connects two nodes. Relationships
organize nodes into structures, allowing a graph to resemble a
list, a tree, a map, or a compound entity. Relationships can have
one or more properties.

94
The Property Graph Model
• Relationship types: A relationship must have exactly one relationship type, and relationships always have a direction.
The example below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node. The roles property on the ACTED_IN relationship has an array value with a single item in it.
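A sketch of that relationship in Cypher, with the property values described above:

CREATE (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN {roles: ['Forrest']}]->(m:Movie {title: 'Forrest Gump'})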

95
The Property Graph Model
• Properties: Properties are name-value pairs that are used to add
qualities to nodes and relationships. The value part of the
property can hold different data types such as number, string or
boolean
• In example 3, we have name and born on Person nodes, title and released on Movie nodes, and the property roles on the ACTED_IN relationship.

96
The Property Graph Model
• Traversals and Paths: A traversal is how you query a graph in
order to find answers to questions. Traversing a graph means
visiting nodes by following relationships according to some rules.
• Example: Find out which movies Tom Hanks acted in according to
example 3.
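A sketch of that traversal in Cypher:

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie:Movie)
RETURN movie.title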

97
The Property Graph Model
• Indexing and node identifiers: When a node is created, the
Neo4j system creates an internal unique system-defined
identifier for each node. To retrieve individual nodes using other
properties of the nodes efficiently, the user can create indexes
for the collection of nodes that have a particular label. Typically,
one or more of the properties of the nodes in that collection can
be indexed.
• Example: in example 1, Empid can be used to index nodes with the EMPLOYEE label, Dno to index the nodes with the DEPARTMENT label, and Pno to index the nodes with the PROJECT label.
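A sketch using the legacy index syntax (newer Neo4j versions use CREATE INDEX FOR (n:EMPLOYEE) ON (n.Empid) instead):

CREATE INDEX ON :EMPLOYEE(Empid)
CREATE INDEX ON :DEPARTMENT(Dno)
CREATE INDEX ON :PROJECT(Pno)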

98
Neo4j with chatbot

99