Unit 5 - NoSQL DB
INTRODUCTION TO NOSQL:
NoSQL is a type of database management system (DBMS) that is designed to
handle and store large volumes of unstructured and semi-structured data. Unlike traditional
relational databases that use tables with pre-defined schemas to store data, NoSQL
databases use flexible data models that can adapt to changes in data structures and are
capable of scaling horizontally to handle growing amounts of data.
The term NoSQL originally referred to “non-SQL” or “non-relational” databases,
but the term has since evolved to mean “not only SQL,” as NoSQL databases have
expanded to include a wide range of different database architectures and data models.
MONGODB – INTRODUCTION:
MongoDB is an open-source document database that provides high performance,
high availability, and automatic scaling.
In simple terms, MongoDB is a document-oriented database. It is an open-source product,
developed and supported by a company named 10gen (renamed MongoDB Inc. in 2013).
| SQL | NoSQL |
|---|---|
| SQL databases are categorized as Relational Database Management Systems (RDBMS). | NoSQL databases are categorized as non-relational or distributed database systems. |
| SQL databases have a fixed, static, or predefined schema. | NoSQL databases have a dynamic schema. |
| SQL databases display data in the form of tables, so they are known as table-based databases. | NoSQL databases display data as collections of key-value pairs, documents, graphs, or wide-column stores. |
| SQL databases are vertically scalable. | NoSQL databases are horizontally scalable. |
| SQL databases use a powerful language, "Structured Query Language", to define and manipulate the data. | In NoSQL databases, collections of documents are used to query the data. This is also called unstructured query language. It varies from database to database. |
| SQL databases are best suited for complex queries. | NoSQL databases are not so good for complex queries because their query languages are not as powerful as SQL. |
| SQL databases are not best suited for hierarchical data storage. | NoSQL databases are best suited for hierarchical data storage. |
| MySQL, Oracle, SQLite, PostgreSQL, and MS-SQL are examples of SQL databases. | MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB are examples of NoSQL databases. |
DATA TYPES:
| Data Types | Description |
|---|---|
| String | String is the most commonly used data type. It is used to store text. A string must be valid UTF-8 in MongoDB. |
| Integer | Integer is used to store numeric values. It can be 32-bit or 64-bit depending on the server you are using. |
| Boolean | This data type is used to store boolean (true/false) values. |
| Double | The Double data type stores floating-point values. |
| Min/Max Keys | This data type compares a value against the lowest and highest BSON elements. |
| Arrays | This data type is used to store a list of multiple values under a single key. |
| Object | The Object data type is used for embedded documents. |
| Null | It is used to store null values. |
| Symbol | It is used identically to a string, but is generally reserved for languages that use a specific symbol type. |
| Date | This data type stores the current date or time in Unix time format. You can specify your own date and time by creating a Date object and passing the day, month, and year to it. |
>use javatpointdb
switched to db javatpointdb
To check the currently selected database, use the command db:
>db
javatpointdb
To check the database list, use the command show dbs:
>show dbs
local 0.078GB
Here, the newly created database "javatpointdb" is not present in the list. A database is
listed only after it contains at least one document, so insert one:
>db.movie.insert({"name":"javatpoint"})
WriteResult({ "nInserted": 1})
>show dbs
javatpointdb 0.078GB
local 0.078GB
Update database:
In MongoDB, the update() method is used to update or modify the existing documents of
a collection.
Syntax:
db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)
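The effect of update() with a selection criteria and a $set modifier can be imitated on plain Python dictionaries. The sketch below is not MongoDB code: update_one_set is a hypothetical helper, it supports only exact-match criteria, and, like the shell's update() by default, it changes only the first matching document.

```python
def update_one_set(collection, criteria, new_values):
    """Apply new_values to the first document matching criteria.

    Hypothetical helper: `collection` is a plain list of dicts, and
    `criteria` supports only exact field matches.
    Returns the number of documents modified (0 or 1).
    """
    for doc in collection:
        if all(doc.get(key) == value for key, value in criteria.items()):
            doc.update(new_values)  # overwrite or add only the given fields
            return 1
    return 0


# Illustrative data, not taken from a real database.
courses = [
    {"course": "java", "category": "Programming language"},
    {"course": "python", "category": "Programming language"},
]

modified = update_one_set(courses, {"course": "java"}, {"course": "android"})
print(modified, courses[0])
```

Fields that are not named in the new values, such as category here, are left untouched, which is the point of using a modifier like $set rather than replacing the whole document.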
Example:
Consider a collection named javatpoint. Insert the following document into the
collection:
db.javatpoint.insert(
  {
    course: "java",
    details: {
      duration: "6 months",
      Trainer: "Sonoo jaiswal"
    },
    Batch: [ { size: "Small", qty: 15 }, { size: "Medium", qty: 25 } ],
    category: "Programming language"
  }
)
Output:
{ "_id" : ObjectId("56482d3e27e53d2dbc93cef8"), "course" : "java", "details" :
{ "duration" : "6 months", "Trainer" : "Sonoo jaiswal" }, "Batch" :
[ {"size" : "Small", "qty" : 15 }, { "size" : "Medium", "qty" : 25 } ],
"category" : "Programming language" }
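After the course field is updated from "java" to "android", for example with
db.javatpoint.update({course: "java"}, {$set: {course: "android"}}), the output is: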
{ "_id" : ObjectId("56482d3e27e53d2dbc93cef8"), "course" : "android", "details" :
{ "duration" : "6 months", "Trainer" : "Sonoo jaiswal" }, "Batch" :
[ {"size" : "Small", "qty" : 15 }, { "size" : "Medium", "qty" : 25 } ],
"category" : "Programming language" }
Delete database:
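In MongoDB, the remove() method deletes documents from a collection. It accepts two
parameters:
Deletion criteria: A query document that selects the documents to remove from the
collection.
JustOne: An optional flag. When set to true or 1, only one matching document is removed.
Syntax:
db.collection_name.remove(DELETION_CRITERIA, JustOne)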
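The difference the JustOne flag makes can be seen in a small Python imitation. This is only a sketch on a list of dictionaries: the remove function and its just_one parameter are hypothetical stand-ins, and only exact field matches are supported.

```python
def remove(collection, criteria, just_one=False):
    """Delete matching documents from a list of dicts; return the count.

    Hypothetical helper: only exact field matches are supported.
    """
    removed = 0
    kept = []
    for doc in collection:
        matches = all(doc.get(k) == v for k, v in criteria.items())
        if matches and not (just_one and removed >= 1):
            removed += 1          # drop this document
        else:
            kept.append(doc)      # keep non-matches and extra matches
    collection[:] = kept  # mutate in place so the caller sees the change
    return removed


batch = [{"size": "Small"}, {"size": "Small"}, {"size": "Medium"}]

removed_one = remove(batch, {"size": "Small"}, just_one=True)
remaining_after_one = len(batch)

removed_rest = remove(batch, {"size": "Small"})
print(removed_one, remaining_after_one, removed_rest, batch)
```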
QUERYING:
In MongoDB, the db.collection.find() method is used to retrieve documents from a
collection. This method returns a cursor to the retrieved documents.
The db.collection.find() method is a read operation. Unless a projection is given, it
returns the matching documents with all their fields.
Syntax:
db.COLLECTION_NAME.find({})
Select all documents in a collection:
To retrieve all documents from a collection, pass an empty query document ({}) or omit
it altogether:
db.COLLECTION_NAME.find()
For example, if your database has a collection named "canteen" with fields such as foods,
snacks, beverages, and price, use the following query to select all documents in the
collection "canteen".
db.canteen.find()
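To make the idea of a cursor and an empty query concrete, here is a minimal Python sketch. It is an analogy rather than MongoDB code: the find function is hypothetical, a generator plays the role of the cursor, and the canteen documents are invented for illustration.

```python
def find(collection, query=None):
    """Lazily yield documents whose fields equal those in `query`.

    Hypothetical helper: a generator stands in for the cursor, and an
    empty or missing query matches every document.
    """
    query = query or {}
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            yield doc


# Illustrative documents; note that they need not share the same fields.
canteen = [
    {"foods": "rice", "price": 40},
    {"snacks": "chips", "price": 10},
    {"beverages": "tea", "price": 10},
]

all_items = list(find(canteen))                   # like db.canteen.find()
cheap_items = list(find(canteen, {"price": 10}))  # like find({price: 10})
print(len(all_items), len(cheap_items))
```

Like a cursor, the generator produces documents only as the caller iterates over it, which is why the results are wrapped in list() here.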
INTRODUCTION TO INDEXING
MongoDB uses indexes to make query processing more efficient. Without an index, MongoDB
must scan every document in the collection and return only those documents that match
the query. Indexes are special data structures that store a small portion of each
document's data so that MongoDB can locate the matching documents quickly. The entries in
an index are ordered by the value of the field specified in the index.
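The benefit of keeping index entries ordered can be sketched in Python. This is a toy model under stated assumptions: a sorted list searched with bisect stands in for MongoDB's B-tree index, and the people collection and function names are hypothetical.

```python
import bisect

# Illustrative collection; positions in the list act as document locations.
people = [
    {"name": "A", "age": 31},
    {"name": "B", "age": 25},
    {"name": "C", "age": 40},
    {"name": "D", "age": 25},
]


def scan(collection, age):
    """Without an index: examine every document."""
    return [doc for doc in collection if doc["age"] == age]


# A toy ascending index on "age": sorted (value, position) pairs.
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(people))
index_keys = [key for key, _ in age_index]


def indexed_lookup(collection, age):
    """With the index: binary-search the sorted keys, then fetch matches."""
    lo = bisect.bisect_left(index_keys, age)
    hi = bisect.bisect_right(index_keys, age)
    return [collection[pos] for _, pos in age_index[lo:hi]]


print(indexed_lookup(people, 25) == scan(people, 25))
```

Both functions return the same documents, but the indexed version inspects only the entries in the matching key range instead of the whole collection.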
Creating an Index:
MongoDB provides a method called createIndex() that allows the user to create an index.
Syntax:
db.COLLECTION_NAME.createIndex({KEY:1})
KEY is the field on which the index is created, and 1 (or -1) determines the order in
which the index entries are arranged (ascending or descending).
Example:
db.mycol.createIndex({"age":1})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}
The createIndex() method also has a number of optional parameters.
These include:
background (Boolean)
unique (Boolean)
name (string)
sparse (Boolean)
expireAfterSeconds (integer)
hidden (Boolean)
storageEngine (Document)
Drop an index:
In order to drop an index, MongoDB provides the dropIndex() method.
Syntax:
db.NAME_OF_COLLECTION.dropIndex({KEY:1})
The dropIndex() method can delete only one index at a time. To drop multiple indexes
from a collection, MongoDB provides the dropIndexes() method, which accepts an array of
index names (in MongoDB 4.2 and later). Called without arguments, it drops every index
except the one on _id.
Syntax:
db.NAME_OF_COLLECTION.dropIndexes(["INDEX_NAME_1", "INDEX_NAME_2"])
CAPPED COLLECTIONS
Fixed-size collections are called capped collections in MongoDB. When creating a capped
collection, the user must specify its maximum size in bytes and may also specify the
maximum number of documents it can store. If more documents are added than the specified
capacity allows, the oldest ones are overwritten.
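The overwrite behaviour resembles a fixed-length queue. As a rough illustration in Python (not MongoDB code), collections.deque with maxlen discards its oldest entry once it is full. The three-document limit and the capped_log name are assumptions made for the sketch, and real capped collections are sized primarily in bytes.

```python
from collections import deque

# A capped "collection" limited to 3 documents (assumption: the limit is
# a document count; real capped collections are primarily sized in bytes).
capped_log = deque(maxlen=3)

for n in range(1, 6):
    capped_log.append({"seq": n})  # appending to a full deque drops the oldest

insertion_order = [doc["seq"] for doc in capped_log]
newest_first = [doc["seq"] for doc in reversed(capped_log)]
print(insertion_order, newest_first)
```

Five documents were inserted, but only the last three survive, and they can be read either in insertion order or newest first.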
Capped collections support high-throughput insertion and retrieval of documents in
insertion order. They work like circular buffers: a fixed amount of space is allocated to
the collection, and once that space is exhausted, the oldest documents are overwritten to
make room for new ones.
By default, find() on a capped collection returns documents in insertion order. To
retrieve them in reverse order, sort on $natural with a value of -1:
>db.cappedLogCollection.find().sort({$natural:-1})
How to Check Whether a Collection is Capped or Not?
You can check whether a collection is capped using the isCapped() method. It returns
true if the specified collection is capped and false otherwise:
>db.cappedLogCollection.isCapped()
HBase
HBase is an open-source, sorted-map data store built on Hadoop. It is column-oriented and
horizontally scalable.
It is modeled on Google's Bigtable. It has a set of tables that keep data in key-value
format. HBase is well suited to sparse data sets, which are very common in big data use
cases. HBase provides APIs that enable development in practically any programming
language. It is a part of the Hadoop ecosystem and provides random, real-time read/write
access to data in the Hadoop Distributed File System (HDFS).
Why HBase
An RDBMS gets exponentially slower as the data grows large.
An RDBMS expects data to be highly structured, i.e. able to fit a well-defined schema.
Any change in the schema might require downtime.
For sparse datasets, maintaining NULL values adds too much overhead.
Features of HBase
Horizontally scalable: Capacity grows by adding servers to the cluster, and you can add
any number of columns at any time.
Automatic failover: Automatic failover lets data handling switch automatically to a
standby system when the active system fails or is compromised.
Integration with the MapReduce framework: HBase integrates with MapReduce for batch
processing and is built on top of the Hadoop Distributed File System (HDFS).
It is a sparse, distributed, persistent, multidimensional sorted map, indexed by
row key, column key, and timestamp.
It is often referred to as a key-value store, a column-family-oriented database, or a
store of versioned maps of maps.
Fundamentally, it is a platform for storing and retrieving data with random access.
It doesn't care about data types (you can store an integer in one row and a string in
another for the same column).
It doesn't enforce relationships within your data.
It is designed to run on a cluster of computers built from commodity hardware.
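The description of HBase as a sorted map indexed by row key, column key, and timestamp can be sketched with nested Python dictionaries. This is a toy in-memory model, not the HBase API: the TinyTable class, its method names, and the sample rows are hypothetical, and real HBase stores everything as bytes across a cluster.

```python
import bisect


class TinyTable:
    """Toy model of an HBase table: row key -> column -> {timestamp: value}."""

    def __init__(self):
        self._rows = {}       # row key -> {column: {timestamp: value}}
        self._row_keys = []   # row keys kept sorted, as HBase orders rows

    def put(self, row, column, value, timestamp):
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        # Each write adds a version; absent cells simply take no space.
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Yield (row, column, latest value) in sorted row and column order."""
        for row in self._row_keys:
            for column in sorted(self._rows[row]):
                yield row, column, self.get(row, column)


table = TinyTable()
table.put("row2", "personal:name", "ravi", timestamp=1)
table.put("row1", "personal:city", "Hyderabad", timestamp=1)
table.put("row1", "personal:city", "Delhi", timestamp=2)  # newer version
print(list(table.scan()))
```

The second put on row1 does not erase the first value; it adds a newer version, and reads return the version with the latest timestamp. The two rows also hold different columns, which is what makes the map sparse.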
Hbase Vs RDBMS
| S. No. | Parameters | RDBMS | HBase |
|---|---|---|---|
| 1. | SQL | It requires SQL (Structured Query Language). | SQL is not required in HBase. |
| 2. | Schema | It has a fixed schema. | It does not have a fixed schema and allows for the addition of columns on the fly. |
| 3. | Database Type | It is a row-oriented database. | It is a column-oriented database. |
| 4. | Scalability | RDBMS allows for scaling up. That is, rather than adding new servers, we upgrade the current server to a more capable one whenever more memory, processing power, or disk space is required. | Scale-out is possible using HBase. That is, when we require extra memory and disk space, we add new servers to the cluster rather than upgrade the existing ones. |
| 5. | Nature | It is static in nature. | It is dynamic in nature. |
| 6. | Data retrieval | In RDBMS, slower retrieval of data. | In HBase, faster retrieval of data. |
| 7. | Rule | It follows the ACID (Atomicity, Consistency, Isolation, and Durability) properties. | It follows the CAP (Consistency, Availability, Partition-tolerance) theorem. |
| 8. | Type of data | It can handle structured data. | It can handle structured, unstructured, and semi-structured data. |
| 9. | Sparse data | It cannot handle sparse data. | It can handle sparse data. |
| 10. | Volume of data | The amount of data in RDBMS is determined by the server's configuration. | In HBase, the amount of data depends on the number of machines deployed rather than on a single machine. |
| 11. | Transaction Integrity | In RDBMS, there is mostly a guarantee associated with transaction integrity. | In HBase, there is no such guarantee associated with transaction integrity. |
| 12. | Referential Integrity | Referential integrity is supported by RDBMS. | When it comes to referential integrity, no built-in support is available. |
| 13. | Normalize | In RDBMS, you can normalize the data. | The data in HBase is not normalized, which means there is no logical relationship or connection between distinct tables of data. |
| 14. | Table size | It is designed to accommodate small tables. Scaling is difficult. | It is designed to accommodate large tables. HBase may scale horizontally. |
HBase also uses a create command to create a table. The difference is that the command
takes column family names: you must specify the table name and at least one column
family name when creating an HBase table.
HBase is a NoSQL database and works on (key, value) pairs. Both the key and the value are
stored as byte arrays.
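Because keys and values are plain byte arrays, the application decides how to encode and decode them. The Python sketch below illustrates one possible convention; the encode_cell helper is hypothetical, and the choice of UTF-8 for text and 8-byte big-endian for integers is an assumption made for the example.

```python
def encode_cell(value):
    """Encode a str or int as bytes (assumed convention: UTF-8 text,
    8-byte big-endian signed integers)."""
    if isinstance(value, int):
        return value.to_bytes(8, byteorder="big", signed=True)
    return str(value).encode("utf-8")


row_key = encode_cell("row1")
salary = encode_cell(50000)
city = encode_cell("Hyderabad")

# The store sees only bytes; the reader must know how to decode each cell.
print(row_key, len(salary), city.decode("utf-8"))
print(int.from_bytes(salary, byteorder="big", signed=True))
```

Nothing in the stored bytes records whether a cell held a number or a string, which is why the same column can hold an integer in one row and text in another.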
Syntax:
create '<Table Name>', '<Column Family Name>'
Example:
create 'Employee', 'Personal Data', 'Career Data'
Once your table is created, you can verify it with the list command.
Insert the remaining rows using the put command in the same way. If you insert the whole
table, you will get the following output.
hbase(main):022:0> scan 'emp'
ROW COLUMN+CELL
1 column=personal data:city, timestamp=1417524216501, value=hyderabad
1 column=personal data:name, timestamp=1417524185058, value=ramu
Example
Suppose there is a table in HBase called emp with the following data.
hbase(main):003:0> scan 'emp'
ROW COLUMN + CELL
row1 column = personal:name, timestamp = 1418051555, value = raju
row1 column = personal:city, timestamp = 1418275907, value = Hyderabad
row1 column = professional:designation, timestamp = 14180555,value = manager
row1 column = professional:salary, timestamp = 1418035791555,value = 50000
1 row(s) in 0.0100 seconds
The following command will update the city value of the employee named ‘Raju’ to Delhi.
hbase(main):002:0> put 'emp','row1','personal:city','Delhi'
0 row(s) in 0.0400 seconds
The updated table looks as follows where you can observe the city of Raju has been changed
to ‘Delhi’.
hbase(main):003:0> scan 'emp'
ROW COLUMN + CELL
row1 column = personal:name, timestamp = 1418035791555, value = raju
row1 column = personal:city, timestamp = 1418274645907, value = Delhi
row1 column = professional:designation, timestamp = 141857555,value = manager
row1 column = professional:salary, timestamp = 1418039555, value = 50000
1 row(s) in 0.0100 seconds
Example
The following example shows how to use the get command. Let us read the first row of the
emp table.
hbase(main):012:0> get 'emp', '1'
COLUMN CELL
personal : city timestamp = 1417521848375, value = hyderabad
personal : name timestamp = 1417521785385, value = ramu
professional: designation timestamp = 1417521885277, value = manager
professional: salary timestamp = 1417521903862, value = 50000
Given below is the syntax to read a specific column using the get method.
hbase> get 'table name', 'rowid', {COLUMN => 'column family:column name'}
Example
Given below is the example to read a specific column in an HBase table.
hbase(main):015:0> get 'emp', 'row1', {COLUMN => 'personal:name'}
COLUMN CELL
personal:name timestamp = 1418035791555, value = raju
1 row(s) in 0.0080 seconds
Example
Here is an example to delete a specific cell. Here we are deleting the city.
hbase(main):006:0> delete 'emp', '1', 'personal data:city', 1417521848375
0 row(s) in 0.0060 seconds
Example
Here is an example of the deleteall command, where we are deleting all the cells of row 1
of the emp table.
hbase(main):007:0> deleteall 'emp','1'
0 row(s) in 0.0240 seconds
Verify the table using the scan command. A snapshot of the table after deleting the row is
given below.
hbase(main):022:0> scan 'emp'
USE CASES