Unit 5 - Nosql DB

Uploaded by

aiphoneix70
UNIT V - NoSQL DATABASE

Introduction to NoSQL - MongoDB: Introduction - Data types - Creating,
updating and deleting documents - Querying - Introduction to indexing -
Capped collections - HBase: Concepts - HBase vs RDBMS - Creating records -
Accessing data - Updating and deleting data - Modifying data - Exporting and
importing data. USE CASES: Call detail log analysis - Credit fraud alert -
Weather forecast.

INTRODUCTION TO NOSQL:
NoSQL is a type of database management system (DBMS) that is designed to
handle and store large volumes of unstructured and semi-structured data. Unlike traditional
relational databases that use tables with pre-defined schemas to store data, NoSQL
databases use flexible data models that can adapt to changes in data structures and are
capable of scaling horizontally to handle growing amounts of data.
The term NoSQL originally referred to “non-SQL” or “non-relational” databases,
but the term has since evolved to mean “not only SQL,” as NoSQL databases have
expanded to include a wide range of different database architectures and data models.

MONGODB – INTRODUCTION:
MongoDB is an open-source, document-oriented database that provides high performance,
high availability, and automatic scaling.
MongoDB was developed by a New York based organization named 10gen, which is
now known as MongoDB Inc. It was initially developed as part of a PaaS (Platform as a Service)
product. Later, in 2009, it was introduced in the market as an open-source database server that is
maintained and supported by MongoDB Inc.

Difference between SQL and NoSQL:

SQL | NoSQL
SQL databases are categorized as Relational Database Management Systems (RDBMS). | NoSQL databases are categorized as non-relational or distributed database systems.
SQL databases have a fixed (static, predefined) schema. | NoSQL databases have a dynamic schema.
SQL databases store data in tables, so they are known as table-based databases. | NoSQL databases store data as key-value pairs, documents, graphs, or wide-column stores.
SQL databases are vertically scalable. | NoSQL databases are horizontally scalable.
SQL databases use a powerful language, Structured Query Language (SQL), to define and manipulate the data. | NoSQL databases are queried through database-specific interfaces over collections of documents, sometimes called an unstructured query language. The syntax varies from database to database.
SQL databases are best suited for complex queries. | NoSQL databases are less suited for complex queries because their query interfaces are not as powerful as SQL.
SQL databases are not best suited for hierarchical data storage. | NoSQL databases are best suited for hierarchical data storage.
Examples: MySQL, Oracle, SQLite, PostgreSQL, and MS-SQL. | Examples: MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB.

DATA TYPES:
Data type | Description
String | The most commonly used data type. A string must be valid UTF-8 in MongoDB.
Integer | Stores a numeric value. It can be 32-bit or 64-bit depending on the server you are using.
Boolean | Stores boolean values (true/false).
Double | Stores floating-point values.
Min/Max Keys | Compare a value against the lowest and highest BSON elements.
Arrays | Stores a list of values under a single key.
Object | Used for embedded documents.
Null | Stores null values.
Symbol | Generally reserved for languages that use a specific symbol type.
Date | Stores the current date or time in Unix time format. You can specify your own date and time by creating a Date object and passing the day, month, and year to it.

CREATING - UPDATING AND DELETING DOCUMENTS:

Creating - Use Database method:

MongoDB does not provide a command to create a database.
This may look odd if you come from a traditional SQL background, where you
create a database and a table manually before inserting values.
In MongoDB you do not need to create a database manually, because MongoDB
creates it automatically the first time you save a value into a collection of that database.

How and when to create a database?

The use command selects a database. If the named database does not exist yet, MongoDB
creates it when data is first stored in it.
Syntax:
use DATABASE_NAME
If the database already exists, the command switches to the existing database.
Let's take an example to demonstrate how a database is created in MongoDB. In the following
example, we are going to create a database "javatpointdb".

>use javatpointdb
switched to db javatpointdb
To check the currently selected database, use the command db:
>db
javatpointdb
To check the database list, use the command show dbs:
>show dbs
local 0.078GB

Here, the newly created database "javatpointdb" is not present in the list. Insert at least one
document into it to make the database appear:
>db.movie.insert({"name":"javatpoint"})
WriteResult({ "nInserted": 1})

>show dbs
javatpointdb 0.078GB
local 0.078GB
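
The behaviour shown in this session, where a database is listed only after its first insert, can be modelled with a small in-memory sketch. The ToyServer class and its method names are hypothetical, written in Python purely for illustration; they are not part of MongoDB or of any driver.

```python
class ToyServer:
    """In-memory model of lazy database creation (not a MongoDB API)."""

    def __init__(self):
        # database name -> collection name -> list of documents
        self.databases = {"local": {}}
        self.current = "test"

    def use(self, name):
        # Switching never creates anything; it only records the selection.
        self.current = name
        return f"switched to db {name}"

    def insert(self, collection, document):
        # The database and the collection are materialised on the first write.
        db = self.databases.setdefault(self.current, {})
        db.setdefault(collection, []).append(document)
        return {"nInserted": 1}

    def show_dbs(self):
        return sorted(self.databases)


server = ToyServer()
server.use("javatpointdb")
before = server.show_dbs()   # the new database is not listed yet
server.insert("movie", {"name": "javatpoint"})
after = server.show_dbs()    # listed once it holds a document
print(before, after)
```

The design point is that use() has no side effect on storage, so an empty database costs nothing until something is written to it.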

Updating documents:
In MongoDB, the update() method is used to update or modify the existing documents of
a collection. In the current MongoDB shell (mongosh), update() is deprecated in favor of
updateOne() and updateMany().

Syntax:
db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)

Example:
Consider a collection named javatpoint. Insert the following
document into the collection:
db.javatpoint.insert(
  {
    course: "java",
    details: {
      duration: "6 months",
      Trainer: "Sonoo jaiswal"
    },
    Batch: [ { size: "Small", qty: 15 }, { size: "Medium", qty: 25 } ],
    category: "Programming language"
  }
)

After successful insertion, check the document with the following query:

>db.javatpoint.find()

Output:
{ "_id" : ObjectId("56482d3e27e53d2dbc93cef8"), "course" : "java", "details" :
{ "duration" : "6 months", "Trainer" : "Sonoo jaiswal" }, "Batch" :
[ {"size" : "Small", "qty" : 15 }, { "size" : "Medium", "qty" : 25 } ],
"category" : "Programming language" }

Update the existing course "java" to "android":

>db.javatpoint.update({'course':'java'},{$set:{'course':'android'}})

Check the updated document in the collection:


>db.javatpoint.find()

Output:
{ "_id" : ObjectId("56482d3e27e53d2dbc93cef8"), "course" : "android", "details" :
{ "duration" : "6 months", "Trainer" : "Sonoo jaiswal" }, "Batch" :
[ {"size" : "Small", "qty" : 15 }, { "size" : "Medium", "qty" : 25 } ],
"category" : "Programming language" }
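
To make the effect of $set concrete, here is a minimal Python sketch of the same update applied to a plain dictionary. The helper names apply_set and update_first are hypothetical and only imitate the behaviour described above (replace the named fields, leave everything else untouched, change only the first matching document); they are not MongoDB functions.

```python
import copy


def apply_set(document, changes):
    """Return a copy of `document` with each field in `changes` replaced.

    Dotted keys such as "details.duration" address fields of embedded
    documents, mirroring how $set paths are written.
    """
    updated = copy.deepcopy(document)
    for path, value in changes.items():
        *parents, leaf = path.split(".")
        target = updated
        for key in parents:
            # Create intermediate embedded documents when they are missing.
            target = target.setdefault(key, {})
        target[leaf] = value
    return updated


def update_first(collection, criteria, changes):
    """Apply `changes` to the first document whose fields equal `criteria`."""
    for i, doc in enumerate(collection):
        if all(doc.get(k) == v for k, v in criteria.items()):
            collection[i] = apply_set(doc, changes)
            return 1
    return 0


course = {
    "course": "java",
    "details": {"duration": "6 months", "Trainer": "Sonoo jaiswal"},
    "category": "Programming language",
}
collection = [course]
modified = update_first(collection, {"course": "java"}, {"course": "android"})
print(modified, collection[0]["course"])
```

Only the course field changes; details and category are carried over unchanged, which is the difference between $set and replacing the whole document.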

Deleting documents:

In MongoDB, the db.collection.remove() method is used to delete documents from a
collection. In mongosh, remove() is deprecated in favor of deleteOne() and deleteMany().
The remove() method takes two parameters:

Deletion criteria: A query document that selects the documents to remove from the
collection.
JustOne: Optional. When set to true or 1, only one matching document is removed.

Syntax:
db.collection_name.remove(DELETION_CRITERIA, JUSTONE)

Remove all documents

If you want to remove all documents from a collection, pass an empty query
document {} to the remove() method. The remove() method does not remove the indexes.
Let's take an example to demonstrate the remove() method. In this example, we
remove all documents from the "javatpoint" collection.
db.javatpoint.remove({})

Remove all documents that match a condition

If you want to remove the documents that match a specific condition, call the remove() method
with the <query> parameter.
The following example removes all documents from the javatpoint collection where the
type field is equal to "programming language".
db.javatpoint.remove( { type : "programming language" } )

Remove a single document that matches a condition

If you want to remove a single document that matches a specific condition, call the remove()
method with the justOne parameter set to true or 1.
The following example removes a single document from the javatpoint collection where
the type field is equal to "programming language".
db.javatpoint.remove( { type : "programming language" }, 1 )
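
The three cases above differ only in the criteria and the justOne flag. The following Python sketch imitates that behaviour on a list of dictionaries; the function names and the sample documents are assumptions made for illustration, not MongoDB code.

```python
def matches(document, criteria):
    # An empty criteria document matches every document.
    return all(document.get(field) == value for field, value in criteria.items())


def remove(collection, criteria, just_one=False):
    """Delete matching documents in place and return how many were removed."""
    removed = 0
    kept = []
    for doc in collection:
        if matches(doc, criteria) and not (just_one and removed >= 1):
            removed += 1
        else:
            kept.append(doc)
    collection[:] = kept
    return removed


def sample():
    return [
        {"name": "java", "type": "programming language"},
        {"name": "python", "type": "programming language"},
        {"name": "mongodb", "type": "database"},
    ]


docs_all = sample()
n_all = remove(docs_all, {})                                   # empties the list
docs_many = sample()
n_many = remove(docs_many, {"type": "programming language"})   # removes two
docs_one = sample()
n_one = remove(docs_one, {"type": "programming language"}, just_one=True)
print(n_all, n_many, n_one)
```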

QUERYING:
In MongoDB, the db.collection.find() method is used to retrieve documents from a
collection. This method returns a cursor to the retrieved documents.
In the mongo shell, db.collection.find() is a read operation. Unless a projection is given,
the documents it returns contain all their fields.

Syntax:
db.COLLECTION_NAME.find({})

Select all documents in a collection:
To retrieve all documents from a collection, pass an empty query document ({}) or omit
the argument:
db.COLLECTION_NAME.find()
For example, if your database has a collection named "canteen" with
fields such as foods, snacks, beverages, and price, use the following query to
select all documents in the collection "canteen".
db.canteen.find()
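
A cursor hands documents back one at a time instead of building the whole result up front. A rough Python analogy is a generator, sketched below. The find function and the canteen sample data are hypothetical and only model equality matching on top-level fields.

```python
def find(collection, query=None):
    """Yield documents matching `query` one at a time, like iterating a cursor."""
    query = query or {}
    for doc in collection:
        if all(doc.get(k) == v for k, v in query.items()):
            yield doc


canteen = [
    {"food": "dosa", "price": 40},
    {"snack": "samosa", "price": 15},
    {"beverage": "tea", "price": 10},
]

cursor = find(canteen)      # nothing has been scanned yet
first = next(cursor)        # pulls exactly one document
rest = list(cursor)         # drains the remaining documents
cheap = list(find(canteen, {"price": 10}))
print(first, len(rest), cheap)
```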

INTRODUCTION TO INDEXING
MongoDB uses indexes to make query processing more efficient. Without an index,
MongoDB must scan every document in the collection and
return only those documents that match the query. Indexes are special data structures that
store a small part of each document's data (the indexed field values) so that MongoDB can
locate the matching documents quickly. The entries of an index are ordered by the value of the
field specified in the index.
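
The difference between a collection scan and an index lookup can be sketched in Python. This is an illustration under stated assumptions: the index here is a sorted list searched with the standard bisect module, whereas MongoDB itself uses B-tree indexes, and the function names and sample data are hypothetical.

```python
import bisect


def collection_scan(collection, field, value):
    """Check every document: the cost grows linearly with the collection size."""
    return [doc for doc in collection if doc.get(field) == value]


def build_index(collection, field):
    """Return (field value, position) pairs ordered by the field value."""
    return sorted((doc[field], pos) for pos, doc in enumerate(collection) if field in doc)


def index_lookup(collection, index, value):
    """Binary-search the ordered index, then fetch only the matching documents."""
    i = bisect.bisect_left(index, (value, -1))
    results = []
    while i < len(index) and index[i][0] == value:
        results.append(collection[index[i][1]])
        i += 1
    return results


people = [
    {"name": "a", "age": 31},
    {"name": "b", "age": 25},
    {"name": "c", "age": 31},
    {"name": "d", "age": 19},
]
age_index = build_index(people, "age")
by_index = index_lookup(people, age_index, 31)
by_scan = collection_scan(people, "age", 31)
print(by_index == by_scan)
```

Both functions return the same documents; the index version touches only the matching entries once the binary search has found the first one.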

Creating an Index:
MongoDB provides a method called createIndex() that allows the user to create an index.

Syntax:
db.COLLECTION_NAME.createIndex({KEY:1})
The key is the field on which you want to create the index, and 1 (or -1)
determines whether the index entries are arranged in ascending (or descending) order.

Example:
db.mycol.createIndex({"age":1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
The createIndex() method also accepts a number of optional parameters.
These include:
• background (Boolean)
• unique (Boolean)
• name (string)
• sparse (Boolean)
• expireAfterSeconds (integer)
• hidden (Boolean)
• storageEngine (Document)

Drop an index:
To drop an index, MongoDB provides the dropIndex() method.

Syntax:
db.NAME_OF_COLLECTION.dropIndex({KEY:1})
The dropIndex() method can delete only one index at a time. To drop
multiple indexes from a collection, MongoDB provides the dropIndexes() method.
Called without arguments, it drops every index except the one on _id. From MongoDB 4.2
onward it also accepts an array of index names (by default, an ascending index on KEY is
named KEY_1). A single document such as {KEY1:1, KEY2:1} is read as the key pattern of
one compound index, not as a list of indexes.

Syntax:
db.NAME_OF_COLLECTION.dropIndexes()
db.NAME_OF_COLLECTION.dropIndexes(["KEY1_1", "KEY2_1"])

Get description of all indexes:

The getIndexes() method in MongoDB returns a description of all the indexes that
exist in the given collection.
Syntax:
db.NAME_OF_COLLECTION.getIndexes()

CAPPED COLLECTIONS
Fixed-size collections are called capped collections in MongoDB. When creating a
capped collection, the user must specify the collection's maximum size in bytes and may also
specify the maximum number of documents it can store. If more documents are added than the
specified capacity allows, the oldest ones are overwritten.
Capped collections support high-throughput operations that insert and retrieve
documents in insertion order. A capped collection works like a circular buffer: a fixed
amount of space is allocated for it, and once that space is exhausted the oldest documents
are overwritten to make room for new ones.
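
The circular-buffer behaviour can be imitated in Python with collections.deque, which discards its oldest item when a new one is appended to a full buffer. This is a sketch under a simplifying assumption: it caps only the number of documents, whereas a real capped collection is also bounded by its size in bytes. The class name CappedCollection is hypothetical.

```python
from collections import deque


class CappedCollection:
    """Toy capped collection limited by document count only."""

    def __init__(self, max_docs):
        # A deque with maxlen drops the oldest entry once it is full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, document):
        self._docs.append(document)

    def find(self, natural=1):
        # natural=1 gives insertion order, natural=-1 the reverse order.
        docs = list(self._docs)
        return docs if natural == 1 else docs[::-1]


log = CappedCollection(max_docs=3)
for n in range(1, 6):
    log.insert({"event": n})

oldest_first = [d["event"] for d in log.find()]
newest_first = [d["event"] for d in log.find(natural=-1)]
print(oldest_first, newest_first)
```

After five inserts into a buffer of three, events 1 and 2 have been overwritten and only the three most recent remain.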

Creating Capped Collection in MongoDB

Use the createCollection command along with the capped option to create a capped
collection. Specify the collection's maximum size in bytes as follows:
>db.createCollection("cappedLogCollection",{capped:true,size:10000})
To limit the number of documents that can be included in the capped collection, use the max
parameter as shown below:
>db.createCollection("cappedLogCollection",{capped:true,size:10000,max:1000})

Querying Capped Collection in MongoDB
In a capped collection, the find query displays results in insertion order by default. To
retrieve the documents from a capped collection in reverse order, use the sort() method as
follows:
>db.cappedLogCollection.find().sort({$natural:-1})
How to Check Whether a Collection is Capped or Not?

You can check whether a collection is capped using the isCapped() method. It returns
true if the specified collection is capped and false otherwise:
>db.cappedLogCollection.isCapped()

How to Convert a Collection to a Capped Collection?

You can change a normal collection to a capped one by using the
convertToCapped command. To convert an existing collection to capped, use the following
code:
>db.runCommand({"convertToCapped":"posts",size:10000})

Advantages of Capped Collections in MongoDB

• They return documents in insertion order without the need for an index and provide greater
insertion throughput.
• Updates that keep a document at its original size are allowed, so the
position of the document on disk does not change.
• They are well suited to holding logs.

Disadvantages of Capped Collections

• They cannot be sharded.
• An update operation fails when the updated document would exceed its original size.
• It is not possible to delete individual documents from a capped collection. You can delete all records
using the following command: { emptycapped: Collection_name }

HBase
HBase is an open-source, sorted-map data store built on Hadoop. It is column-oriented and
horizontally scalable.
It is based on Google's Bigtable. It has a set of tables which keep data in key-value format.
HBase is well suited for sparse data sets, which are very common in big data use cases. HBase
provides APIs enabling development in practically any programming language. It is a part of
the Hadoop ecosystem that provides random real-time read/write access to data in the
Hadoop Distributed File System (HDFS).
Why HBase
• An RDBMS becomes much slower as the data grows large.
• An RDBMS expects data to be highly structured, i.e. able to fit in a well-defined schema.
• Any change in schema might require downtime.
• For sparse datasets, maintaining NULL values adds too much overhead.
Features of HBase
• Horizontally scalable: Capacity grows by adding servers to the cluster, and you can add any
number of columns at any time.
• Automatic failover: Automatic failover is a resource that allows a system
administrator to have data handling switched automatically to a standby system in the event
of system compromise.
• Integration with the MapReduce framework: HBase tables can be used as the source and
sink of MapReduce jobs, and HBase stores its data on the Hadoop Distributed File
System.
• It is a sparse, distributed, persistent, multidimensional sorted map, which is indexed by
row key, column key, and timestamp.
• It is often referred to as a key-value store, a column-family-oriented database, or a store of
versioned maps of maps.
• Fundamentally, it's a platform for storing and retrieving data with random access.
• It doesn't care about data types (you can store an integer in one row and a string in another
for the same column).
• It doesn't enforce relationships within your data.
• It is designed to run on a cluster of computers built from commodity hardware.
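
The phrase "sorted map indexed by row key, column key, and timestamp" can be made concrete with nested dictionaries. The ToyTable class below is a hypothetical Python sketch, not the HBase API. It keeps every version it is given (real HBase limits how many versions are retained) and stores plain strings where HBase stores byte arrays.

```python
import time


class ToyTable:
    """Map of row key -> column ("family:qualifier") -> {timestamp: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row, column, value, timestamp=None):
        ts = timestamp if timestamp is not None else time.time_ns() // 1_000_000
        # Each put adds a version; absent columns simply take no space (sparse).
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, column):
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]  # the newest timestamp wins

    def scan(self):
        # Rows and columns come back in sorted key order.
        for row in sorted(self.rows):
            for column in sorted(self.rows[row]):
                yield row, column, self.get(row, column)


emp = ToyTable()
emp.put("1", "personal:name", "raju", timestamp=1)
emp.put("1", "personal:city", "hyderabad", timestamp=1)
emp.put("2", "personal:name", "ravi", timestamp=1)
emp.put("1", "personal:city", "Delhi", timestamp=2)   # a newer version of the cell

cells = list(emp.scan())
print(cells)
```

Row "2" has no city column at all, which is how a sparse table avoids storing NULL placeholders.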

HBase vs RDBMS
S. No. | Parameter | RDBMS | HBase
1 | SQL | It requires SQL (Structured Query Language). | SQL is not required in HBase.
2 | Schema | It has a fixed schema. | It does not have a fixed schema and allows for the addition of columns on the fly.
3 | Database type | It is a row-oriented database. | It is a column-oriented database.
4 | Scalability | RDBMS allows for scaling up. That is, rather than adding new servers, we upgrade the current server to a more capable one whenever more memory, processing power, or disk space is required. | Scale-out is possible using HBase. That is, when we require extra memory and disk space, we add new servers to the cluster rather than upgrade the existing ones.
5 | Nature | It is static in nature. | It is dynamic in nature.
6 | Data retrieval | In RDBMS, slower retrieval of data. | In HBase, faster retrieval of data.
7 | Rule | It follows the ACID (Atomicity, Consistency, Isolation, and Durability) properties. | It follows the CAP (Consistency, Availability, Partition-tolerance) theorem.
8 | Type of data | It can handle structured data. | It can handle structured, unstructured, as well as semi-structured data.
9 | Sparse data | It cannot handle sparse data. | It can handle sparse data.
10 | Volume of data | The amount of data in RDBMS is determined by the server's configuration. | In HBase, the amount of data depends on the number of machines deployed rather than on a single machine.
11 | Transaction integrity | In RDBMS, there is mostly a guarantee associated with transaction integrity. | In HBase, there is no such guarantee associated with transaction integrity.
12 | Referential integrity | Referential integrity is supported by RDBMS. | When it comes to referential integrity, no built-in support is available.
13 | Normalize | In RDBMS, you can normalize the data. | The data in HBase is not normalized, which means there is no logical relationship or connection between distinct tables of data.
14 | Table size | It is designed to accommodate small tables. Scaling is difficult. | It is designed to accommodate large tables. HBase may scale horizontally.

CREATING RECORDS, ACCESSING DATA, UPDATING, DELETING DATA - MODIFYING DATA – EXPORTING AND IMPORTING DATA

As in SQL, a create command is used to create a table in HBase. The difference is that
HBase needs column family names rather than column definitions.
We should specify the table name and at least one column family name while creating an
HBase table.
HBase is a NoSQL database and works on (key, value) pairs. Both the key and the value are of
type byte array.
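
Because keys and values are plain byte arrays, it is the client that decides how to encode and decode them. The Python sketch below shows one possible convention, an 8-byte big-endian integer and a UTF-8 string; this encoding choice and the helper names are assumptions made for illustration.

```python
import struct


def encode_int(n):
    # 8-byte big-endian signed integer.
    return struct.pack(">q", n)


def decode_int(raw):
    return struct.unpack(">q", raw)[0]


def encode_str(s):
    return s.encode("utf-8")


salary_as_int = encode_int(50000)     # 8 bytes
salary_as_text = encode_str("50000")  # 5 bytes, a different cell value
print(len(salary_as_int), len(salary_as_text), decode_int(salary_as_int))
```

The store cannot tell these two cells apart by type, which is why the same column can hold an integer in one row and a string in another. With a big-endian encoding, non-negative integers also sort in numeric order when compared as bytes.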

HBase Create Table

The create command is used to create a table in HBase. It is followed by the table name and
one or more column family names, as below.

Syntax:
create '<Table Name>', '<Column Family Name>'

Example:
create 'Employee', 'Personal Data', 'Career Data'
Once your table is created, you can check the table with the help of the list command.

Example: hbase(main):002:0> list

Output: TABLE
Employee

Inserting Data using HBase Shell

This section demonstrates how to create data in an HBase table. To create data in an
HBase table, the following commands and methods are used:
• put command,
• add() method of Put class, and
• put() method of HTable class.
As an example, we are going to fill an emp table that has the column families
'personal data' and 'professional data'.
Using the put command, you can insert rows into a table, one cell per command. Its syntax is as follows:
put '<table name>','row1','<colfamily:colname>','<value>'

Inserting the First Row

Let us insert the first row values into the emp table as shown below.
hbase(main):005:0> put 'emp','1','personal data:name','raju'
0 row(s) in 0.6600 seconds
hbase(main):006:0> put 'emp','1','personal data:city','hyderabad'
0 row(s) in 0.0410 seconds
hbase(main):007:0> put 'emp','1','professional data:designation','manager'
0 row(s) in 0.0240 seconds
hbase(main):008:0> put 'emp','1','professional data:salary','50000'
0 row(s) in 0.0240 seconds

Insert the remaining rows using the put command in the same way. After inserting the whole
table, a scan gives the following output.
hbase(main):022:0> scan 'emp'

ROW COLUMN+CELL
1 column=personal data:city, timestamp=1417524216501, value=hyderabad
1 column=personal data:name, timestamp=1417524185058, value=raju
1 column=professional data:designation, timestamp=1417524232601, value=manager
1 column=professional data:salary, timestamp=1417524244109, value=50000
2 column=personal data:city, timestamp=1417524574905, value=chennai
2 column=personal data:name, timestamp=1417524556125, value=ravi
2 column=professional data:designation, timestamp=1417524592204, value=sr:engg
2 column=professional data:salary, timestamp=1417524604221, value=30000
3 column=personal data:city, timestamp=1417524681780, value=delhi
3 column=personal data:name, timestamp=1417524672067, value=rajesh
3 column=professional data:designation, timestamp=1417524693187, value=jr:engg
3 column=professional data:salary, timestamp=1417524702514, value=25000

Updating Data using HBase Shell

You can update an existing cell value using the put command. To do so, follow
the same syntax and supply the new value as shown below.
put '<table name>','<row>','<column family:column name>','<new value>'
The newly given value is stored as a newer version of the cell and is returned by
subsequent reads, which updates the row.

Example
Suppose there is a table in HBase called emp with the following data.
hbase(main):003:0> scan 'emp'
ROW COLUMN + CELL
row1 column = personal:name, timestamp = 1418051555, value = raju
row1 column = personal:city, timestamp = 1418275907, value = Hyderabad
row1 column = professional:designation, timestamp = 14180555, value = manager
row1 column = professional:salary, timestamp = 1418035791555, value = 50000
1 row(s) in 0.0100 seconds
The following command will update the city value of the employee named ‘Raju’ to Delhi.
hbase(main):002:0> put 'emp','row1','personal:city','Delhi'
0 row(s) in 0.0400 seconds
The updated table looks as follows where you can observe the city of Raju has been changed
to ‘Delhi’.
hbase(main):003:0> scan 'emp'
ROW COLUMN + CELL
row1 column = personal:name, timestamp = 1418035791555, value = raju
row1 column = personal:city, timestamp = 1418274645907, value = Delhi
row1 column = professional:designation, timestamp = 141857555,value = manager
row1 column = professional:salary, timestamp = 1418039555, value = 50000
1 row(s) in 0.0100 seconds

Reading Data using HBase Shell

The get command and the get() method of the HTable class are used to read data from a
table in HBase. Using the get command, you can get a single row of data at a time. Its syntax is
as follows:
get '<table name>','row1'

Example
The following example shows how to use the get command. Let us read the first row of the
emp table.
hbase(main):012:0> get 'emp', '1'

COLUMN CELL
personal : city timestamp = 1417521848375, value = hyderabad
personal : name timestamp = 1417521785385, value = raju
professional: designation timestamp = 1417521885277, value = manager
professional: salary timestamp = 1417521903862, value = 50000

4 row(s) in 0.0270 seconds

Reading a Specific Column

Given below is the syntax to read a specific column using the get command.
hbase> get '<table name>', '<rowid>', {COLUMN => '<column family:column name>'}

Example

Given below is the example to read a specific column in an HBase table.
hbase(main):015:0> get 'emp', 'row1', {COLUMN => 'personal:name'}
COLUMN CELL
personal:name timestamp = 1418035791555, value = raju
1 row(s) in 0.0080 seconds

Deleting a Specific Cell in a Table

Using the delete command, you can delete a specific cell in a table. The syntax of the
delete command is as follows:
delete '<table name>', '<row>', '<column name>', <time stamp>

Example
Here is an example to delete a specific cell. Here we are deleting the city cell of row 1.
hbase(main):006:0> delete 'emp', '1', 'personal data:city', 1417521848375
0 row(s) in 0.0060 seconds

Deleting All Cells in a Row

Using the deleteall command, you can delete all the cells in a row. Given below is
the syntax of the deleteall command.
deleteall '<table name>', '<row>'

Example
Here is an example of the deleteall command, where we are deleting all the cells of row 1 of
the emp table.
hbase(main):007:0> deleteall 'emp','1'
0 row(s) in 0.0240 seconds
Verify the table using the scan command. A snapshot of the table after deleting the row is
given below.
hbase(main):022:0> scan 'emp'

ROW COLUMN + CELL
2 column = personal data:city, timestamp = 1417524574905, value = chennai
2 column = personal data:name, timestamp = 1417524556125, value = ravi
2 column = professional data:designation, timestamp = 1417524204, value = sr:engg
2 column = professional data:salary, timestamp = 1417524604221, value = 30000
3 column = personal data:city, timestamp = 1417524681780, value = delhi
3 column = personal data:name, timestamp = 1417524672067, value = rajesh
3 column = professional data:designation, timestamp = 1417523187, value = jr:engg
3 column = professional data:salary, timestamp = 1417524702514, value = 25000
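
The difference between delete (one cell) and deleteall (a whole row) can be sketched in Python on a nested dictionary. This is only a model with hypothetical function names: it removes data immediately and ignores timestamps, whereas HBase records a delete marker and physically removes the data later, during compaction.

```python
def delete_cell(table, row, column):
    """Remove one cell; the row disappears once its last cell is gone."""
    cells = table.get(row, {})
    existed = column in cells
    cells.pop(column, None)
    if row in table and not cells:
        del table[row]
    return existed


def delete_all(table, row):
    """Remove every cell of a row in one step."""
    return table.pop(row, None) is not None


emp = {
    "1": {"personal data:city": "hyderabad", "personal data:name": "raju"},
    "2": {"personal data:city": "chennai", "personal data:name": "ravi"},
}

cell_deleted = delete_cell(emp, "1", "personal data:city")
row_after_cell_delete = dict(emp["1"])
row_deleted = delete_all(emp, "1")
print(cell_deleted, row_after_cell_delete, row_deleted, sorted(emp))
```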

Exporting and Importing Data

During HBase development you often need to move data around. The easiest way to
import and export data is via the command line.

Export a table to the local filesystem or HDFS

bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export [tbl_name] [/local/export/path | hdfs:/node/path]
Importing is just as easy, but the target table must already exist.
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import [tbl_name] [/local/export/path]
Export
Export is a utility that will dump the contents of a table to HDFS in a sequence file. The Export
can be run via a Coprocessor Endpoint or MapReduce. Invoke via:
mapreduce-based Export
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
endpoint-based Export
Make sure the Export coprocessor is enabled by adding
org.apache.hadoop.hbase.coprocessor.Export to hbase.coprocessor.region.classes.
$ bin/hbase org.apache.hadoop.hbase.coprocessor.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
The outputdir is an HDFS directory that does not exist prior to the export. When done, the
exported files will be owned by the user invoking the export command.

USE CASES
