
Unit 2 - Bda Notes

MongoDB is an open-source, high-performance document database known for its scalability, automatic load balancing, and support for various data types. It offers features like ad hoc queries, indexing, replication, and a flexible query language for CRUD operations. It also supports aggregation and map-reduce operations and provides tools for importing and exporting data in JSON and CSV formats. Apache Cassandra is presented as a highly scalable NoSQL database with features such as elastic scalability and an always-on architecture.

Uploaded by aburoobhastudy


UNIT 2 - BIG DATA PATTERNS AND NOSQL

MONGODB

MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.

"MongoDB is a scalable, open source, high performance, document-oriented database." - 10gen

1.FEATURES:

These are some important features of MongoDB:

1. Supports ad hoc queries:

In MongoDB, you can search by field or by range, and regular expression searches are also supported.

2. Indexing:

You can index any field in a document.

3. Replication:

MongoDB supports replication. Older versions offered master-slave replication; current versions use replica sets. The primary (master) performs reads and writes, while secondaries (slaves) copy data from the primary and can only be used for reads or backup (not writes).

4. Duplication of data:

MongoDB can run over multiple servers. Data is duplicated across them so that the system stays up and keeps running in case of hardware failure.

5. Load balancing:

Because data is distributed across shards, MongoDB can balance the load automatically.

6. Supports map reduce and aggregation tools.

7. Uses JavaScript instead of stored procedures.

8. It is a schema-less database written in C++.

9. Provides high performance.

10. Stores files of any size easily without complicating your stack.

11. Easy to administer in the case of failures.

2.DATATYPES:

MongoDB supports many datatypes. Some of them are −

String − This is the most commonly used datatype to store the data. String in
MongoDB must be UTF-8 valid.
Integer − This type is used to store a numerical value. An integer can be stored as a 32-bit or a 64-bit value.

Boolean − This type is used to store a boolean (true/ false) value.

Double − This type is used to store floating point values.
Min/Max keys − This type is used to compare a value against the lowest and highest BSON elements.

Arrays − This type is used to store arrays, lists, or multiple values under one key.
Timestamp − This type is used to store a timestamp. It can be handy for recording when a document has been modified or added.

Object − This datatype is used for embedded documents.

Null − This type is used to store a Null value.
Symbol − This datatype is used identically to a string; however, it's generally reserved
for languages that use a specific symbol type.

Date − This datatype is used to store the current date or time in UNIX time format. You can specify your own date and time by creating a Date object and passing the year, month, and day to it.

Object ID − This datatype is used to store the document’s ID.

Binary data − This datatype is used to store binary data.
Code − This datatype is used to store JavaScript code into the document.
Regular expression − This datatype is used to store regular expression.

3.MONGODB QUERY LANGUAGE:

CRUD operations create, read, update, and delete documents.

Create Operations
Create or insert operations add new documents to a collection. If the collection does not currently
exist, insert operations will create the collection.

MongoDB provides the following methods to insert documents into a collection:

● db.collection.insertOne() New in version 3.2

● db.collection.insertMany() New in version 3.2

Read Operations
Read operations retrieve documents from a collection; i.e. query a collection for documents.
MongoDB provides the following methods to read documents from a collection:

● db.collection.find()

You can specify query filters or criteria that identify the documents to return.
Update Operations
Update operations modify existing documents in a collection. MongoDB provides the following
methods to update documents of a collection:

● db.collection.updateOne() New in version 3.2

● db.collection.updateMany() New in version 3.2
● db.collection.replaceOne() New in version 3.2

Delete Operations
Delete operations remove documents from a collection. MongoDB provides the following methods to delete documents of a collection:

● db.collection.deleteOne() New in version 3.2

● db.collection.deleteMany() New in version 3.2
4.ARRAY:

Arrays in MongoDB allow users to store data in an ordered form. Querying array elements efficiently is crucial for developers who need to extract meaningful information from the database.

Different Methods to Query Array Elements:

MongoDB provides a variety of methods to access and query array elements within the
documents.

1. Query using dot notation: In MongoDB, we can use dot notation to access an array element by its index.

Syntax: db.collection.find({"arrayName.index": "value"})

2. Query using $elemMatch: The $elemMatch operator matches documents that contain an array with at least one element that matches all the specified query criteria.

Syntax: db.collection.find({ <arrayField>: { $elemMatch: { <query> } } })

3. Query using $slice: $slice is a projection operator in MongoDB that limits the number of array elements returned in the results.

Syntax: db.collection.find( {}, {arrayName: { $slice: 5 }})

4. Unwinding: Unwinding ($unwind) outputs one document for each element of the array. This makes it easier for developers to run aggregation queries on array data.

Syntax: db.collection.aggregate([{$unwind: "$arrayName"}])
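The four techniques above can be tried without a server. The following Python sketch imitates their behaviour on plain lists of dictionaries; it illustrates the semantics under simplifying assumptions and is not MongoDB's implementation or a driver API. The sample documents and the helper names (match_index, elem_match, slice_projection, unwind) are hypothetical.

```python
# Hypothetical in-memory imitation of MongoDB array queries (not a driver API).

docs = [
    {"_id": 1, "name": "asha", "scores": [72, 88, 95], "tags": ["a", "b"]},
    {"_id": 2, "name": "ravi", "scores": [60, 65], "tags": ["b"]},
]


def match_index(docs, field, index, value):
    """Like find({"field.<index>": value}): test the element at one array position."""
    return [d for d in docs
            if len(d.get(field, [])) > index and d[field][index] == value]


def elem_match(docs, field, predicate):
    """Like $elemMatch: keep a document if ONE element satisfies the whole predicate."""
    return [d for d in docs if any(predicate(e) for e in d.get(field, []))]


def slice_projection(docs, field, n):
    """Like the projection {field: {$slice: n}} for positive n: keep the first n elements."""
    return [{**d, field: d[field][:n]} for d in docs]


def unwind(docs, field):
    """Like {$unwind: "$field"}: one output document per array element.

    Documents whose array is missing or empty produce no output, as with
    $unwind's default behaviour. The field is assumed to hold a list.
    """
    return [{**d, field: e} for d in docs for e in d.get(field, [])]


# Scores between 80 (inclusive) and 90 (exclusive) in the SAME element.
print([d["_id"] for d in elem_match(docs, "scores", lambda s: 80 <= s < 90)])  # [1]
```

The predicate passed to elem_match checks both bounds against one element, which is the point of $elemMatch: a plain query with several conditions on an array field may be satisfied by different elements.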

5.FUNCTIONS:

MongoDB count() Method – db.Collection.count()

The count() method counts the documents that match the selection criteria and returns that number. (In recent versions of MongoDB, count() is deprecated in favour of countDocuments() and estimatedDocumentCount().)

Syntax:
db.Collection_Name.count(
    Selection_criteria,
    {
        limit: <integer>,
        skip: <integer>,
        hint: <string or document>,
        maxTimeMS: <integer>,
        readConcern: <string>,
        collation: <document>
    })

MongoDB – sort() Method

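The sort() method specifies the order in which the query returns the matching documents from the given collection. It is called on the cursor returned by find(); use 1 for ascending and -1 for descending order.

Syntax: db.Collection_Name.find().sort({field_name: 1 or -1})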

MongoDB – limit() Method

In MongoDB, the limit() method limits the number of documents returned by a query; it defines the maximum number of documents you want.

Syntax: cursor.limit(<number>)

MongoDB – skip() Method

In MongoDB, the skip() method skips the first n documents of the query result; you just pass the number of documents to skip.

Syntax : cursor.skip(<offset>)
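A detail worth knowing about these methods: however you chain them, MongoDB applies sort() first, then skip(), then limit(). The Python sketch below imitates that order on a list of dictionaries. It is a hypothetical helper (run_query, count, and the sample data are made up for illustration), supports only equality filters, and is not the MongoDB API.

```python
# Hypothetical imitation of find(criteria).sort(spec).skip(n).limit(m) and count().

def run_query(docs, criteria=None, sort=None, skip=0, limit=0):
    criteria = criteria or {}
    # Equality-only filter, like find({field: value, ...}).
    result = [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
    if sort:
        # Python's sort is stable, so sorting by the keys from last to first
        # gives a multi-key sort in which the first key has priority.
        for field, direction in reversed(list(sort.items())):
            result.sort(key=lambda d: d[field], reverse=(direction == -1))
    result = result[skip:]   # skip() is applied after sort()
    if limit > 0:            # limit(0) means "no limit"
        result = result[:limit]
    return result


def count(docs, criteria=None):
    """Number of documents matching the criteria."""
    return len(run_query(docs, criteria))


students = [
    {"name": "a", "dept": "cs", "marks": 70},
    {"name": "b", "dept": "cs", "marks": 90},
    {"name": "c", "dept": "ee", "marks": 80},
    {"name": "d", "dept": "cs", "marks": 85},
]

# Second-highest marks in "cs": sort descending, skip 1, limit 1.
print([s["name"] for s in run_query(students, {"dept": "cs"},
                                    sort={"marks": -1}, skip=1, limit=1)])  # ['d']
print(count(students, {"dept": "cs"}))  # 3
```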
6.AGGREGATION OPERATIONS:
Aggregation operations process multiple documents and return computed results.

An aggregation pipeline consists of one or more stages that process documents:

● Each stage performs an operation on the input documents. For example, a stage can filter
documents, group documents, and calculate values.
● The documents that are output from a stage are passed to the next stage.
● An aggregation pipeline can return results for groups of documents. For example, return
the total, average, maximum, and minimum values.

Aggregation Pipeline Example

The following aggregation pipeline example contains two stages and returns the total order
quantity of medium size pizzas grouped by pizza name:
db.orders.aggregate( [

// Stage 1: Filter pizza order documents by pizza size

{
$match: { size: "medium" }
},

// Stage 2: Group remaining documents by pizza name and calculate total quantity
{
$group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
}

])
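To see what each stage hands to the next, the pipeline above can be imitated in a few lines of Python. This is only a sketch of the $match and $group semantics on a hypothetical in-memory orders list (the sample documents and helper names are assumptions), not how MongoDB executes a pipeline.

```python
from collections import defaultdict

orders = [  # hypothetical sample data
    {"name": "Pepperoni", "size": "small", "quantity": 20},
    {"name": "Pepperoni", "size": "medium", "quantity": 20},
    {"name": "Pepperoni", "size": "medium", "quantity": 10},
    {"name": "Cheese", "size": "medium", "quantity": 50},
    {"name": "Vegan", "size": "large", "quantity": 15},
]


def match_stage(criteria):
    """Stage like {$match: criteria} (equality conditions only)."""
    def run(docs):
        return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
    return run


def group_sum_stage(key_field, sum_field, out_field):
    """Stage like {$group: {_id: "$key_field", out_field: {$sum: "$sum_field"}}}."""
    def run(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key_field]] += d[sum_field]
        return [{"_id": k, out_field: v} for k, v in totals.items()]
    return run


def aggregate(docs, pipeline):
    # The output of each stage becomes the input of the next one.
    for stage in pipeline:
        docs = stage(docs)
    return docs


result = aggregate(orders, [
    match_stage({"size": "medium"}),
    group_sum_stage("name", "quantity", "totalQuantity"),
])
print(result)
```

MongoDB does not guarantee the order of $group output documents, so the test below compares the results after sorting them by _id.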
7.MAP REDUCE:
Map-reduce is a data processing paradigm for condensing large volumes of data into useful
aggregated results. To perform map-reduce operations, MongoDB provides the mapReduce
database command.

In a map-reduce operation, MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition). The map function emits key-value pairs. For keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data. MongoDB then stores the results in a collection. Optionally, the output of the reduce function may pass through a finalize function to further condense or process the results of the aggregation.

Starting in MongoDB 5.0, map-reduce is deprecated; an aggregation pipeline is the recommended alternative.
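The three phases described above (map, reduce, optional finalize) can be sketched in plain Python. This shows the data flow only and is not MongoDB's mapReduce command. The function and the sample orders are hypothetical, and the result is returned as a dictionary instead of being written to a collection. As in MongoDB, reduce is only called for keys that received more than one value.

```python
from collections import defaultdict


def map_reduce(docs, query, map_fn, reduce_fn, finalize_fn=None):
    # Map phase: every document that matches the query may emit (key, value) pairs.
    emitted = defaultdict(list)
    for doc in docs:
        if query(doc):
            for key, value in map_fn(doc):
                emitted[key].append(value)

    results = {}
    for key, values in emitted.items():
        # Reduce phase: condense the values of keys that have more than one.
        value = reduce_fn(key, values) if len(values) > 1 else values[0]
        # Optional finalize phase: post-process each reduced value.
        if finalize_fn is not None:
            value = finalize_fn(key, value)
        results[key] = value
    return results


orders = [  # hypothetical sample data
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]

# Total amount per customer for orders with status "A".
totals = map_reduce(
    orders,
    query=lambda d: d["status"] == "A",
    map_fn=lambda d: [(d["cust_id"], d["amount"])],
    reduce_fn=lambda key, values: sum(values),
)
print(totals)  # {'A123': 750, 'B212': 200}
```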

8.MONGODB CURSOR METHODS:
The MongoDB cursor methods modify the way the specified query is executed. The following list gives the cursor methods with descriptions, syntax, and examples.

#1. cursor.addOption(flag)
This method adds OP_QUERY wire protocol flags, such as the tailable flag, to change the behaviour of a query.

Example
var t = db.myCappedCollection;
var cursor = t.find()
              .addOption(DBQuery.Option.tailable)
              .addOption(DBQuery.Option.awaitData);

#2. cursor.batchSize(size)

This method specifies the number of documents to return in each batch of the response from the MongoDB server. In most cases, modifying the batch size does not affect the user or the application.

Example

db.inventory.find().batchSize(10)

#3. cursor.close()

This method closes the cursor and releases the associated server resources. The server closes a cursor automatically when it has no remaining results or has been idle for a specified period of time.

Example

db.collection.find(<query>).close()

#4. cursor.forEach(function)

The forEach() method applies a JavaScript function to each document returned by the cursor.

Syntax:

db.collection.find().forEach(<function>)
#5. cursor.hint(index)

This method is called on a query to override MongoDB's default index selection and query optimization process.

Examples:

The following query returns all documents in the users collection, using the index on the age field.

db.users.find().hint( { age: 1 } )

#6. cursor.limit()

This method specifies the maximum number of documents returned by the cursor. It is comparable to the LIMIT clause in a SQL database.

Example:

db.collection.find(<query>).limit(<number>)

#7. cursor.map(function)

The map() method applies a function to each document visited by the cursor and collects the return values from successive applications into an array.

Example:

db.users.find().map( function(u) { return u.name; } );
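A small model shows why batchSize() usually does not matter to the application: the batch size only changes how many documents arrive per round trip, not which documents the cursor yields. The Python class below is a hypothetical stand-in for a cursor (SketchCursor, its method names, and the default batch size of 100 are assumptions made for the sketch). It also shows forEach()-style and map()-style iteration.

```python
class SketchCursor:
    """Hypothetical in-memory cursor that hands out documents batch by batch.

    Unlike a real cursor, this one can be iterated more than once.
    """

    def __init__(self, docs, batch_size=100):  # default chosen arbitrarily
        self._docs = list(docs)
        self._batch_size = batch_size
        self.round_trips = 0  # how many batches have been fetched

    def batch_size(self, n):
        self._batch_size = n
        return self  # chainable, like cursor.batchSize(n)

    def __iter__(self):
        for start in range(0, len(self._docs), self._batch_size):
            self.round_trips += 1  # one simulated server round trip per batch
            yield from self._docs[start:start + self._batch_size]

    def for_each(self, fn):
        """Like cursor.forEach(fn): call fn on every document."""
        for doc in self:
            fn(doc)

    def map(self, fn):
        """Like cursor.map(fn): collect fn's return values into a list."""
        return [fn(doc) for doc in self]


users = [{"name": "u%d" % i} for i in range(5)]
cursor = SketchCursor(users).batch_size(2)
print(cursor.map(lambda u: u["name"]))  # ['u0', 'u1', 'u2', 'u3', 'u4']
print(cursor.round_trips)               # 3 batches: 2 + 2 + 1 documents
```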

9.INDEXING IN MONGODB:

Indexes are special data structures that store a small portion of a collection's data in a form that is easy to traverse, so that MongoDB can find the matching documents without scanning every document in the collection. An index is ordered by the value of the field specified in the index.

Creating an Index :

MongoDB provides a method called createIndex() that allows users to create an index. A value of 1 creates an ascending index and -1 a descending one.
Syntax – db.COLLECTION_NAME.createIndex({KEY:1})

Example –

db.mycol.createIndex({"age": 1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

Drop an index:

In order to drop an index, MongoDB provides the dropIndex() method.

Syntax – db.NAME_OF_COLLECTION.dropIndex({KEY:1})

The dropIndex() method can delete only one index at a time. To delete (or drop) multiple indexes from a collection, MongoDB provides the dropIndexes() method, which can take several indexes as its parameter.
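The idea behind a single-field index such as createIndex({"age": 1}) can be sketched with a sorted list. Because the entries are ordered by the indexed value, equality and range lookups become binary searches instead of a scan over every document. MongoDB's real indexes are B-trees; the SortedFieldIndex class and the sample documents below are hypothetical and only illustrate the principle.

```python
import bisect


class SortedFieldIndex:
    """Hypothetical ascending index on one field (the field must exist in every document)."""

    def __init__(self, docs, field):
        self.docs = docs
        # (value, position) pairs ordered by value, like index keys pointing at documents.
        self.entries = sorted((d[field], i) for i, d in enumerate(docs))
        self.keys = [value for value, _ in self.entries]

    def find_eq(self, value):
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [self.docs[i] for _, i in self.entries[lo:hi]]

    def find_range(self, low, high):
        """Documents with low <= value < high."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_left(self.keys, high)
        return [self.docs[i] for _, i in self.entries[lo:hi]]


people = [
    {"name": "a", "age": 30},
    {"name": "b", "age": 22},
    {"name": "c", "age": 30},
    {"name": "d", "age": 41},
]
age_index = SortedFieldIndex(people, "age")
print([p["name"] for p in age_index.find_eq(30)])         # ['a', 'c']
print([p["name"] for p in age_index.find_range(20, 31)])  # ['b', 'a', 'c']
```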

10.IMPORT DATA IN MONGODB USING MONGOIMPORT:

Here you will learn how to import JSON or CSV data into a collection in MongoDB with the mongoimport command. To use mongoimport, you must have installed the MongoDB Database Tools. Extract and copy all the .exe files and paste them into the MongoDB bin folder; on Windows, this is the C:\Program Files\MongoDB\Server\<version>\bin folder. Then open the terminal or command prompt and navigate to the location of the JSON file you want to import, so that you don't need to specify the whole path. The mongoimport command has the following form:

mongoimport --db database_name --collection collection_name ^
    --authenticationDatabase admin --username <user> --password <password> ^
    --file file_path

(In the Windows command prompt, ^ at the end of a line continues the command on the next line.)

Now, execute the following command to import data from the D:\MyData\employeesdata.json file into the employees collection. The --jsonArray option is needed because the file contains a JSON array of documents.

D:\MyData> mongoimport --db test --collection employees --file employeesdata.json --jsonArray

Import Data from CSV File

Consider that you have a D:\MyData\employeesdata.csv file that you want to import into the employees collection. Execute the following command to import data from the CSV file.

D:\MyData> mongoimport --db test --collection employees --type csv --file employeesdata.csv --fields _id,firstName,lastName

The --fields option gives the field name to use for each column in the CSV file. If the file contains a header row that should be used for the field names, use the --headerline option instead of --fields. The above command inserts all the data into the employees collection, as shown below.

test> db.employees.find()
[
  { _id: 2, firstName: 'bill', lastName: 'gates' },
  { _id: 1, firstName: 'steve', lastName: 'jobs' },
  { _id: 3, firstName: 'james', lastName: 'bond' }
]
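The difference between --fields and --headerline lies only in where the field names come from. The Python sketch below turns CSV text into documents both ways. It is a hypothetical helper, not mongoimport itself; rows_to_documents and the sample CSV (reconstructed from the output above) are assumptions. To reproduce the numeric _id values shown above, it converts whole-number strings to integers, which only approximates mongoimport's type detection.

```python
import csv
import io


def _to_value(raw):
    # Whole numbers become ints (an approximation of mongoimport's type
    # detection); everything else is kept as a string.
    try:
        return int(raw)
    except ValueError:
        return raw


def rows_to_documents(csv_text, fields=None, headerline=False):
    """Hypothetical sketch: map each CSV column to a document field."""
    reader = csv.reader(io.StringIO(csv_text))
    if headerline:
        fields = next(reader)  # like --headerline: names come from the first row
    if not fields:
        raise ValueError("give field names (like --fields) or set headerline=True")
    return [{name: _to_value(raw) for name, raw in zip(fields, row)} for row in reader]


without_header = "2,bill,gates\n1,steve,jobs\n3,james,bond\n"
with_header = "_id,firstName,lastName\n" + without_header

a = rows_to_documents(without_header, fields=["_id", "firstName", "lastName"])
b = rows_to_documents(with_header, headerline=True)
print(a == b)  # True
print(a[0])    # {'_id': 2, 'firstName': 'bill', 'lastName': 'gates'}
```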
11.EXPORTING DATA FROM MONGODB

MongoDB provides a utility called mongoexport, with which users can export data from MongoDB databases to a JSON or CSV file. The utility is found in MongoDB's bin subfolder (for example, /mongodb/bin). When you run it, you supply the database name, the collection name, and the file you want to export to, and it performs the export.

EXPORTING A MONGODB COLLECTION TO A JSON FILE:

To export data, first open a new terminal or command prompt (cmd) window, then run a command like this:

Example:

mongoexport --db techwriter --collection techwriter --out /data/dump/tw/tw.json

If the mongoexport command does not run, make sure that you have exited the mongo shell, or opened a new terminal or command prompt window, before executing it: mongoexport is a separate utility, not a command inside the shell. The command above also assumes that the MongoDB bin folder is in your PATH. If it is not, specify the full path to the mongoexport executable, for example /mongodb/bin/mongoexport, or wherever MongoDB is installed on your system.

If you do not give a path for the exported file, it is created in the directory you are in when you run the command. Giving the full path, or first navigating to the directory where you want the file, keeps the exported data neat and easy to find.

Exporting a MongoDB Collection to a CSV File

So far you have seen how to export data in JSON format. Another common format for representing data is CSV (comma-separated values). To export your data to a CSV file, add --type=csv to your command.
Example:

mongoexport --db techwriter --collection writers --type=csv --fields _id,writername --out /data/dump/tw/techwriter.csv

You can also choose which fields of the documents to export with --fields (for CSV output, mongoexport requires the list of fields). In this example, mongoexport exports the writers collection of the techwriter database to a CSV file, including only the _id and writername fields. Also note that the output file has a .csv extension.
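What --type=csv with --fields produces can be sketched with Python's csv module: a header row with the chosen field names, then one row per document that keeps only those fields. export_csv and the sample writers documents are hypothetical; the sketch illustrates the shape of the output, not mongoexport's code.

```python
import csv
import io


def export_csv(docs, fields):
    """Hypothetical sketch of a CSV export restricted to the given fields."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields,
                            extrasaction="ignore",  # drop fields that were not requested
                            restval="",             # missing fields become empty cells
                            lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        writer.writerow(doc)
    return out.getvalue()


writers = [
    {"_id": 1, "writername": "Ann", "topic": "databases"},
    {"_id": 2, "writername": "Raj"},
    {"_id": 3, "topic": "networks"},
]
print(export_csv(writers, ["_id", "writername"]), end="")
# _id,writername
# 1,Ann
# 2,Raj
# 3,
```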

CASSANDRA
Apache Cassandra is a highly scalable, high-performance distributed database designed to handle
large amounts of data across many commodity servers, providing high availability with no single
point of failure. It is a type of NoSQL database.

1.FEATURES OF CASSANDRA:
Cassandra has become so popular because of its outstanding technical features. Given below are
some of the features of Cassandra:

Elastic scalability − Cassandra is highly scalable; it allows you to add more hardware to accommodate more customers and more data as required.
Always on architecture − Cassandra has no single point of failure and it is continuously
available for business-critical applications that cannot afford a failure.
Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your
throughput as you increase the number of nodes in the cluster. Therefore it maintains a
quick response time.
Flexible data storage − Cassandra accommodates all possible data formats including:
structured, semi-structured, and unstructured. It can dynamically accommodate changes
to your data structures according to your need.
Easy data distribution − Cassandra provides the flexibility to distribute data where you
need by replicating data across multiple data centers.
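Transaction support − Cassandra provides limited transaction support rather than full RDBMS-style ACID transactions: writes to a single partition are atomic and isolated, writes are durable, consistency is tunable per query, and lightweight transactions (compare-and-set) are available.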
Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs
blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the
read efficiency.

2.CASSANDRA DATA TYPES:

Cassandra supports many data types. Let's see them in the following table:

| CQL Type  | Constants                               | Description                                             |
|-----------|-----------------------------------------|---------------------------------------------------------|
| ascii     | strings                                 | US-ASCII character string                               |
| bigint    | integers                                | 64-bit signed long                                      |
| blob      | blobs                                   | Arbitrary bytes (written in hexadecimal)                |
| boolean   | booleans                                | true or false                                           |
| counter   | integers                                | Distributed 64-bit counter value                        |
| decimal   | integers, floats                        | Variable-precision decimal                              |
| double    | integers, floats                        | 64-bit floating point                                   |
| float     | integers, floats                        | 32-bit floating point                                   |
| frozen    | tuples, collections, user-defined types | Stores the given type as a single, frozen value         |
| inet      | strings                                 | IP address in IPv4 or IPv6 format                       |
| int       | integers                                | 32-bit signed integer                                   |
| list      |                                         | Ordered collection of elements                          |
| map       |                                         | JSON-style collection of key-value pairs                |
| set       |                                         | Collection of unique elements                           |
| text      | strings                                 | UTF-8 encoded string                                    |
| timestamp | integers, strings                       | Date and time, stored as milliseconds since the epoch   |
| timeuuid  | uuids                                   | Type 1 UUID                                             |
| tuple     |                                         | A fixed-length group of typed fields                    |
| uuid      | uuids                                   | Standard UUID                                           |
| varchar   | strings                                 | UTF-8 encoded string                                    |
| varint    | integers                                | Arbitrary-precision integer                             |

3.CASSANDRA CQLsh
CQLsh stands for Cassandra Query Language shell. After installation, Cassandra provides this shell (cqlsh) as a prompt through which users communicate with Cassandra, and Cassandra commands are executed in it. To start it, run the cqlsh command.

CQLsh provides many options, listed in the following table:

| Option             | Usage                                                                     |
|--------------------|---------------------------------------------------------------------------|
| --help             | Shows help topics about the options of CQLsh commands.                    |
| --version          | Shows the version of CQLsh you are using.                                 |
| --color            | Directs the shell to use colored output.                                  |
| --debug            | Shows additional debugging information.                                   |
| --execute          | Directs the shell to accept and execute a CQL command.                    |
| --file="file name" | Executes the commands in the given file, then exits.                      |
| --no-color         | Directs Cassandra not to use colored output.                              |
| -u "username"      | Authenticates a user. The default user name is cassandra.                 |
| -p "password"      | Authenticates a user with a password. The default password is cassandra.  |

4.CASSANDRA KEYSPACE:
A keyspace is an object that holds column families (tables) and user-defined types. A keyspace is like a database in an RDBMS: it contains column families, indexes, and user-defined types, and it defines data center awareness, the replication strategy, and the replication factor.

In Cassandra, the CREATE KEYSPACE command is used to create a keyspace.

Syntax:

CREATE KEYSPACE <identifier> WITH <properties>

CREATE KEYSPACE KeyspaceName WITH replication = {'class': 'strategy name',
    'replication_factor': No of replications on different nodes};

Different components of Cassandra Keyspace

Strategy: There are two types of strategy declaration in Cassandra syntax:

○ Simple Strategy: SimpleStrategy is used when there is one data center. The first replica is placed on the node selected by the partitioner, and the remaining replicas are placed on the next nodes clockwise around the ring, without considering rack or data center location.

○ Network Topology Strategy: This strategy is used when there is more than one data center. In this strategy, you have to provide the replication factor for each data center separately.

Replication Factor: The replication factor is the number of replicas of the data, each placed on a different node. A replication factor greater than two helps avoid a single point of failure, so 3 is a good replication factor.
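SimpleStrategy's placement rule can be sketched in a few lines. In this hypothetical Python model, the ring is a small list of (token, node) pairs with made-up tokens from 0 to 99 (real clusters use a much larger token range). The first replica goes to the first node whose token is greater than or equal to the data's token, wrapping around the ring, and the remaining replicas go to the next nodes clockwise, ignoring racks.

```python
import bisect


def simple_strategy_replicas(ring, token, replication_factor):
    """Hypothetical sketch of SimpleStrategy replica placement."""
    ring = sorted(ring)                                    # order nodes by token (clockwise)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)  # wrap past the last node
    count = min(replication_factor, len(ring))             # at most one replica per node
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(simple_strategy_replicas(ring, token=30, replication_factor=3))  # ['C', 'D', 'A']
print(simple_strategy_replicas(ring, token=80, replication_factor=3))  # ['A', 'B', 'C']
```

With a replication factor of 3, losing any single node still leaves two copies of every row, which is why 3 is a common choice.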

Example:

Let's take an example to create a keyspace named "javatpoint".

CREATE KEYSPACE javatpoint
WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};

Verification:

To check whether the keyspace has been created, use the DESCRIBE KEYSPACES command, which lists all the keyspaces that have been created.
Durable_writes

By default, the durable_writes property of a keyspace is set to true. You can also set this property to false, but this should not be done for a keyspace that uses SimpleStrategy.

Example:

Let's take an example to see the usage of the durable_writes property.

CREATE KEYSPACE sssit
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 }
AND DURABLE_WRITES = false;

Using a Keyspace

To use the created keyspace, you have to use the USE command.

Syntax:

USE <identifier>;

5.CRUD OPERATION:

Cassandra CRUD Operation stands for Create, Update, Read and Delete or Drop. These
operations are used to manipulate data in Cassandra.

a. Create Operation

The Create operation lets a user insert data into a table. The data is stored in the columns of a row in the table. A user performs this operation with the INSERT command and the proper syntax.

A Syntax of Create Operation-

INSERT INTO <table name>
(<column1>,<column2>....)
VALUES (<value1>,<value2>...)
USING <option>;

EXAMPLE :
cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
VALUES(001, 'Ayush', 'Electrical Engineering', 9999999999, 'Boston');
cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
VALUES(002, 'Aarav', 'Computer Engineering', 8888888888, 'New York City');
cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
VALUES(003, 'Kabir', 'Applied Physics', 7777777777, 'Philadelphia');

b.Update Operation

The second Cassandra CRUD operation is the UPDATE operation. A user can use the UPDATE command for this operation. The following keywords and rules apply when updating the table:

● Where: This keyword specifies the row where data is to be updated.
● Set: This keyword specifies the updated value.
● The WHERE clause must include all the columns composing the primary key.

A Syntax of Update Operation-

UPDATE <table name>
SET <column name> = <new value>,
    <column name> = <value>...
WHERE <condition>;

EXAMPLE:

cqlsh:keyspace1> UPDATE student SET city='San Francisco' WHERE en=002;

c. Read Operation

This is the third Cassandra CRUD operation – the Read operation. A user can choose to read either the whole table or only some columns. To read data from a table, a user uses the SELECT statement. This command is also used to verify the table after every operation.
SYNTAX to read the whole table-

SELECT * FROM <table name>;

EXAMPLE:
cqlsh:keyspace1> SELECT name, city FROM student;

d. Delete Operation

The Delete operation, the last Cassandra CRUD operation, allows a user to delete data from a table. The user can use the DELETE command for this operation.
A Syntax of Delete Operation-

DELETE <identifier> FROM <table name> WHERE <condition>;

EXAMPLE:
cqlsh:keyspace1> DELETE phone FROM student WHERE en=003;
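One behaviour these commands share is easiest to see in a model: in Cassandra both INSERT and UPDATE are upserts. An INSERT with an existing primary key overwrites the given columns, an UPDATE on a missing key creates the row, and DELETE of a single column removes only that cell. The Python class below is a hypothetical in-memory model of the student example (SketchTable and its methods are made up for illustration). It sorts rows by key only to make the output deterministic, whereas Cassandra returns rows in token order.

```python
class SketchTable:
    """Hypothetical in-memory model of a CQL table with a single-column primary key."""

    def __init__(self, key_column):
        self.key_column = key_column
        self.rows = {}

    def insert(self, **values):
        # INSERT is an upsert: columns of an existing row with the same key are overwritten.
        key = values[self.key_column]
        self.rows.setdefault(key, {}).update(values)

    def update(self, key, **changes):
        # UPDATE is also an upsert: the row is created if the key does not exist yet.
        self.rows.setdefault(key, {self.key_column: key}).update(changes)

    def delete_column(self, key, column):
        # DELETE <column> FROM ... WHERE key = ... removes only that cell.
        self.rows.get(key, {}).pop(column, None)

    def select(self, *columns):
        # Missing cells read as None, like null in cqlsh.
        return [{c: row.get(c) for c in columns}
                for _, row in sorted(self.rows.items())]


student = SketchTable("en")
student.insert(en=1, name="Ayush", branch="Electrical Engineering", phone=9999999999, city="Boston")
student.insert(en=2, name="Aarav", branch="Computer Engineering", phone=8888888888, city="New York City")
student.insert(en=3, name="Kabir", branch="Applied Physics", phone=7777777777, city="Philadelphia")
student.update(2, city="San Francisco")
student.delete_column(3, "phone")
print(student.select("name", "city"))
```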

6.CASSANDRA COLLECTIONS:

Cassandra collections let you store multiple elements in a single column.
There are three types of collection supported by Cassandra:

○ Set

○ List

○ Map

Set Collection

A set collection stores a group of unique elements and returns them in sorted order when queried.

Syntax:

CREATE TABLE table_name (
id int,
Name text,
Email set<text>,
PRIMARY KEY (id)
);

Example:

INSERT INTO employee (id, email, name)

VALUES(1, {'[email protected]'}, 'Ajeet');

INSERT INTO employee (id, email, name)

VALUES(2,{'[email protected]'}, 'Kanchan');

INSERT INTO employee (id, email, name)

VALUES(3, {'[email protected]'}, 'Kunwar');

List Collection

The list collection is used when the order of elements matters; unlike a set, a list can contain duplicate elements.

Map Collection

The map collection is used to store key-value pairs; it maps one thing to another. For example, if you want to save a course name together with the name of its prerequisite course, you can use a map collection in a table named "course".
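The ordering and uniqueness rules of the three collection types can be summarised in a short Python sketch. The helpers model what a read returns after an append-style update (such as SET email = email + {...}). Their names and the sample values (example.com addresses, course names) are hypothetical, and the sketch models only these rules, not Cassandra's storage.

```python
def set_add(current, *elements):
    """set<T>: duplicates collapse and elements are returned in sorted order."""
    return sorted(set(current) | set(elements))


def list_append(current, *elements):
    """list<T>: insertion order is kept and duplicates are allowed."""
    return list(current) + list(elements)


def map_put(current, key, value):
    """map<K, V>: one value per key; entries are returned sorted by key."""
    updated = dict(current)
    updated[key] = value
    return dict(sorted(updated.items()))


print(set_add(["b@example.com"], "a@example.com", "b@example.com"))
# ['a@example.com', 'b@example.com']
print(list_append(["java"], "python", "java"))
# ['java', 'python', 'java']
print(map_put({"Networks": "Operating Systems"}, "DBMS", "Data Structures"))
# {'DBMS': 'Data Structures', 'Networks': 'Operating Systems'}
```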

7.COUNTER TYPE IN CASSANDRA:

For storing counter values, Cassandra has a special data type called counter. Counters are used to keep track of activities such as likes, upvotes, downvotes, and page visits. A counter value in Cassandra can only be incremented or decremented; it cannot be set to a particular value. Counters are implemented as a column family (table) that contains one or more counter columns. In such a table, every column that is not part of the primary key must be a counter, so non-counter columns such as name and age in the example tables below would have to be stored in a separate table.

Operations of a Counter Nature in Cassandra

Cassandra provides special operations for counter columns, including increment, decrement, and read.

Increment − By performing this action, the value of a counter column is increased. The
following syntax is used to increase a counter column −

UPDATE <table_name> SET <counter_column_name> = <counter_column_name> + <value>

WHERE <row_key> = '<key>';

Example
Input Table

| user_id | name    | likes |
|---------|---------|-------|
| user123 | John    | 5     |
| user456 | Jane    | 10    |
| user789 | Michael | 2     |

For instance, you can use the following statement to increase a counter value for a user's likes −

UPDATE users SET likes = likes + 1 WHERE user_id = 'user123';

Output Table

| user_id | name    | likes |
|---------|---------|-------|
| user123 | John    | 6     |
| user456 | Jane    | 10    |
| user789 | Michael | 2     |

Decrement − To decrease the value of a counter column, apply this procedure. The following
syntax is used to decrement a counter column −
UPDATE <table_name> SET <counter_column_name> = <counter_column_name> - <value>
WHERE <row_key> = '<key>';

Example

Input Table
| user_id | name | age | likes | dislikes |
|----------|------------|-----|-------|----------|
| user123 | John Smith | 30 | 5 | 3 |
| user456 | Jane Doe | 25 | 7 | 2 |
| user789 | Bob Johnson| 40 | 2 | 8 |

For instance, you can use the following statement to decrease a counter value for a user's dislikes −

UPDATE users SET dislikes = dislikes - 1 WHERE user_id = 'user123';

Output Table

| user_id | name        | age | likes | dislikes |
|---------|-------------|-----|-------|----------|
| user123 | John Smith  | 30  | 5     | 2        |
| user456 | Jane Doe    | 25  | 7     | 2        |
| user789 | Bob Johnson | 40  | 2     | 8        |

Read − This procedure is used to read a counter column's value. The following syntax is used to
read a counter column −

SELECT <counter_column_name> FROM <table_name> WHERE <row_key> = '<key>';

Example

Input Table

users table:
| user_id | likes |
|---------|-------|
| user123 | 10    |
| user456 | 5     |
| user789 | 20    |

For instance, you can use the query below to read the value of a user's likes −

SELECT likes FROM users WHERE user_id = 'user123';

Output Table

| likes |
|-------|
| 10    |

Batch − Several counter columns can be updated together in a single batch. Counter updates must be grouped in a counter batch (BEGIN COUNTER BATCH); a regular batch cannot contain counter updates. The following syntax is used to update several counter columns −

BEGIN COUNTER BATCH
UPDATE <table_name> SET <counter_column_name1> = <counter_column_name1> + <value1> WHERE <row_key> = '<key1>';
UPDATE <table_name> SET <counter_column_name2> = <counter_column_name2> + <value2> WHERE <row_key> = '<key2>';
APPLY BATCH;

Example
Input Table

+---------+-------+
| user_id | likes |
+---------+-------+
| user123 | 10    |
| user456 | 20    |
+---------+-------+

For instance, you may use the following command to increase the likes of two users at once −

BEGIN COUNTER BATCH
UPDATE users SET likes = likes + 1 WHERE user_id = 'user123';
UPDATE users SET likes = likes + 1 WHERE user_id = 'user456';
APPLY BATCH;

Output Table

+---------+-------+
| user_id | likes |
+---------+-------+
| user123 | 11    |
| user456 | 21    |
+---------+-------+
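The counter rules above can be modelled in Python: values only move up or down, a counter that has never been updated is treated as zero when it is first changed, and there is no way to assign an absolute value. CounterTable and its methods are hypothetical. The sketch reproduces the likes and dislikes numbers from the examples but says nothing about how Cassandra distributes counters across nodes.

```python
from collections import defaultdict


class CounterTable:
    """Hypothetical model of counter columns: only +n / -n updates and reads."""

    def __init__(self):
        self._rows = defaultdict(dict)  # row key -> {counter column: value}

    def add(self, key, column, delta):
        # Like UPDATE t SET c = c + delta WHERE k = key (use a negative delta for "- n").
        # A counter that was never updated is treated as 0 before the change.
        row = self._rows[key]
        row[column] = row.get(column, 0) + delta

    def read(self, key, column):
        # Like SELECT c FROM t WHERE k = key; None stands for null (never updated).
        return self._rows.get(key, {}).get(column)

    def apply_counter_batch(self, updates):
        # Like BEGIN COUNTER BATCH ... APPLY BATCH with several increments.
        for key, column, delta in updates:
            self.add(key, column, delta)


users = CounterTable()
users.add("user123", "likes", 10)
users.add("user456", "likes", 20)
users.apply_counter_batch([("user123", "likes", 1), ("user456", "likes", 1)])
users.add("user123", "dislikes", 3)
users.add("user123", "dislikes", -1)
print(users.read("user123", "likes"), users.read("user456", "likes"))  # 11 21
print(users.read("user123", "dislikes"))                                # 2
print(users.read("user789", "likes"))                                   # None
```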
8.TIME TO LIVE (TTL) IN CASSANDRA:

In Cassandra, Time to Live (TTL) lets you set a time limit on a column's data so that the data is deleted automatically once that time has passed. The TTL keyword defines this time limit for a particular column.

In Cassandra, both the INSERT and UPDATE commands support setting a time after which the data in a column expires. With the USING TTL clause you set the TTL value at the time of insertion, and the TTL() function returns the time remaining before a selected column expires.

TTL values are given in seconds. For example, to make inserted data expire after two days, set its TTL to 2 × 24 × 60 × 60 = 172800 seconds.

Table : student_Registration

To create the table, use the following CQL query.

CREATE TABLE student_Registration(

Id int PRIMARY KEY,

Name text,

Event text

);

Insertion using TTL :

To insert data with a TTL, use the following CQL queries.

INSERT INTO student_Registration (Id, Name, Event)

VALUES (101, 'Ashish', 'Ninza') USING TTL 172800;

INSERT INTO student_Registration (Id, Name, Event)

VALUES (102, 'Ashish', 'Code') USING TTL 172800;

INSERT INTO student_Registration (Id, Name, Event)

VALUES (103, 'Aksh', 'Ninza') USING TTL 172800;

Now, to determine the remaining time before a specific column expires, use the following CQL query.

SELECT TTL (Name)

from student_Registration

WHERE Id = 101;

The returned value decreases each time you check it, because the TTL keeps counting down. Use the following CQL query to check it again.

SELECT TTL (Name)

from student_Registration

WHERE Id = 101;

Updating using TTL:

To extend the time limit, use the UPDATE command with the USING TTL keyword. For example, to set a time limit of 3 days (259200 seconds) and also update the name to 'Rana', use the following CQL query, then check the new TTL:

UPDATE student_Registration

USING TTL 259200

SET Name = 'Rana'

WHERE Id = 102;

SELECT TTL (Name)

from student_Registration

WHERE Id = 102;

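Removing the TTL from a column:

A TTL of 0 means that the data never expires. The following CQL query therefore sets Name to 'Ashish' and removes its time limit; it does not delete the column.

UPDATE student_Registration

USING TTL 0

SET Name = 'Ashish'

WHERE Id = 102;

To delete the value immediately instead, use a DELETE statement such as DELETE Name FROM student_Registration WHERE Id = 102;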
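The TTL behaviour in this section can be modelled with a simulated clock: the TTL is a countdown in seconds, expired data reads as if deleted, and in CQL a TTL of 0 means no expiry. TtlStore and its methods are hypothetical, and time is an explicit number of seconds rather than the wall clock. The numbers reuse the TTL values from the queries above.

```python
class TtlStore:
    """Hypothetical per-cell TTL model driven by a simulated clock (in seconds)."""

    def __init__(self):
        self.now = 0
        self._cells = {}  # (row key, column) -> (value, expiry time or None)

    def write(self, key, column, value, ttl=0):
        # Like INSERT/UPDATE ... USING TTL ttl; ttl = 0 means the value never expires.
        expires_at = self.now + ttl if ttl > 0 else None
        self._cells[(key, column)] = (value, expires_at)

    def _live_cell(self, key, column):
        cell = self._cells.get((key, column))
        if cell is not None and cell[1] is not None and self.now >= cell[1]:
            del self._cells[(key, column)]  # expired data behaves as if deleted
            return None
        return cell

    def read(self, key, column):
        cell = self._live_cell(key, column)
        return None if cell is None else cell[0]

    def ttl(self, key, column):
        # Like SELECT TTL(column): seconds left, or None when there is no TTL.
        cell = self._live_cell(key, column)
        if cell is None or cell[1] is None:
            return None
        return cell[1] - self.now


TWO_DAYS = 2 * 24 * 60 * 60    # 172800 seconds
THREE_DAYS = 3 * 24 * 60 * 60  # 259200 seconds

store = TtlStore()
store.write(101, "Name", "Ashish", ttl=TWO_DAYS)
store.write(102, "Name", "Ashish", ttl=TWO_DAYS)
store.now = 3600                                  # one hour later
print(store.ttl(101, "Name"))                     # 169200
store.write(102, "Name", "Rana", ttl=THREE_DAYS)  # UPDATE ... USING TTL 259200
store.now = TWO_DAYS                              # two days after the inserts
print(store.read(101, "Name"), store.read(102, "Name"))  # None Rana
```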

9.CASSANDRA ALTER TABLE:

ALTER TABLE command is used to alter the table after creating it. You can use the ALTER
command to perform two types of operations:

○ Add a column

○ Drop a column

Syntax:

ALTER (TABLE | COLUMNFAMILY) <tablename> <instruction>

Adding a Column

You can add a column to a table by using the ALTER command. While adding a column, make sure that the new column name does not conflict with existing column names and that the table is not defined with the compact storage option.
Syntax:

ALTER TABLE <table name>
ADD <new column> <datatype>;

Example:

The following example adds a column named student_email, of type text, to the already created table named student.

ALTER TABLE student
ADD student_email text;

Dropping a Column

You can also drop an existing column from a table with the ALTER command. Before dropping a column, check that the table is not defined with the COMPACT STORAGE option.

Syntax:

ALTER TABLE <tablename>
DROP <column_name>;

Example:

The following example drops the column student_email from the table student.

ALTER TABLE student
DROP student_email;

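
The column-name checks described above can be done in code before a schema change is run. The sketch below is a hypothetical helper (not part of cqlsh or any driver) that builds an ALTER TABLE ... ADD or DROP statement only after checking the column against a list of existing column names, which could be read from system_schema.columns. It assumes plain unquoted identifiers, which CQL folds to lower case, and it does not check the COMPACT STORAGE condition.

```python
import re
from typing import Iterable, Optional

# Unquoted CQL identifiers: a letter, then letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def alter_table_cql(table: str, existing: Iterable[str], action: str,
                    column: str, cql_type: Optional[str] = None) -> str:
    """Build an ALTER TABLE ... ADD/DROP statement after basic sanity checks.

    `existing` holds the table's current column names. COMPACT STORAGE is not checked.
    """
    for name in (table, column):
        if not IDENTIFIER.match(name):
            raise ValueError(f"not a plain CQL identifier: {name!r}")
    # CQL folds unquoted identifiers to lower case, so compare in lower case.
    known = {c.lower() for c in existing}
    action = action.upper()
    if action == "ADD":
        if cql_type is None:
            raise ValueError("ADD needs a data type, e.g. 'text'")
        if column.lower() in known:
            raise ValueError(f"column {column!r} already exists")
        return f"ALTER TABLE {table} ADD {column} {cql_type};"
    if action == "DROP":
        if column.lower() not in known:
            raise ValueError(f"column {column!r} does not exist")
        return f"ALTER TABLE {table} DROP {column};"
    raise ValueError("action must be 'ADD' or 'DROP'")


if __name__ == "__main__":
    # Hypothetical existing columns of the student table.
    columns = ["id", "name"]
    print(alter_table_cql("student", columns, "ADD", "student_email", "text"))
    print(alter_table_cql("student", columns + ["student_email"], "DROP", "student_email"))
```

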
10.EXPORT AND IMPORT DATA IN CASSANDRA:

First, create a sample table named Data with the fields id, firstname and lastname.

Table name: Data

CREATE TABLE Data (
    id UUID PRIMARY KEY,
    firstname text,
    lastname text
);

Next, insert some sample rows to export and import:

INSERT INTO Data (id, firstname, lastname)
VALUES (3b6441dd-3f90-4c93-8f61-abcfa3a510e1, 'Ashish', 'Rana');

INSERT INTO Data (id, firstname, lastname)
VALUES (3b6442dd-bc0d-4157-a80f-abcfa3a510e2, 'Amit', 'Gupta');

INSERT INTO Data (id, firstname, lastname)
VALUES (3b6443dd-d358-4d99-b900-abcfa3a510e3, 'Ashish', 'Gupta');

INSERT INTO Data (id, firstname, lastname)
VALUES (3b6444dd-4860-49d6-9a4b-abcfa3a510e4, 'Dhruv', 'Gupta');

INSERT INTO Data (id, firstname, lastname)
VALUES (3b6445dd-e68e-48d9-a5f8-abcfa3a510e5, 'Harsh', 'Vardhan');

INSERT INTO Data (id, firstname, lastname)
VALUES (3b6446dd-eb95-4bb4-8685-abcfa3a510e6, 'Shivang', 'Rana');

To export the data, use the following cqlsh command:

cqlsh> COPY Data (id, firstname, lastname)
TO 'AshishRana\Desktop\Data.csv' WITH HEADER = TRUE;

The CSV file is created:

Using 7 child processes

Starting copy of Data with columns [id, firstname, lastname].

Processed: 6 rows; Rate: 20 rows/s; Avg. rate: 30 rows/s

6 rows exported to 1 files in 0.213 seconds.
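
Before truncating the table, it may be worth checking that the exported file is complete. The following sketch uses Python's standard csv and uuid modules to read a file written with HEADER = TRUE, confirm that the header matches the three columns of Data, and confirm that every id parses as a UUID. check_export is a hypothetical helper; with the real file you would pass it something like open(r'AshishRana\Desktop\Data.csv', newline='').

```python
import csv
import io
import uuid
from typing import List, TextIO

EXPECTED_HEADER: List[str] = ["id", "firstname", "lastname"]


def check_export(f: TextIO) -> int:
    """Validate a CSV written by COPY Data ... TO ... WITH HEADER = TRUE.

    Returns the number of data rows; raises ValueError on the first problem found.
    """
    reader = csv.DictReader(f)
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    count = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            uuid.UUID(row["id"])  # fails on a malformed or missing id
        except (ValueError, TypeError):
            raise ValueError(f"line {line_no}: invalid uuid {row['id']!r}") from None
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO(
        "id,firstname,lastname\n"
        "3b6441dd-3f90-4c93-8f61-abcfa3a510e1,Ashish,Rana\n"
        "3b6442dd-bc0d-4157-a80f-abcfa3a510e2,Amit,Gupta\n"
    )
    print(check_export(sample))  # 2
```
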

Next, delete all rows from the table Data so that they can be imported again from the CSV file created above:

truncate Data;

To import the data again, use the following cqlsh command:

COPY Data (id, firstname, lastname)
FROM 'AshishRana\Desktop\Data.csv'
WITH HEADER = TRUE;

The rows are imported:

Using 7 child processes

Starting copy of Data with columns [id, firstname, lastname].

Processed: 6 rows; Rate: 10 rows/s; Avg. rate: 14 rows/s

To verify that the rows were imported successfully, query the table:

SELECT * FROM Data;

To import only specific rows by typing them in, use COPY ... FROM STDIN. First export the table and truncate it as shown above, then run:

COPY Data FROM STDIN;

cqlsh then changes the line prompt to [copy]:

Using 7 child processes

Starting copy of cluster1.Data with columns [id, firstname, lastname].

[Use . on a line by itself to end input]

[copy]

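Now enter the values of each row you want to import, one row per line, with the values separated by commas. End the input by entering a period (.) on a line by itself.

[copy] 3b6441dd-3f90-4c93-8f61-abcfa3a510e1,Ashish,Rana

[copy] .

Text values are not wrapped in single quotes here: COPY reads the input as CSV, whose default quote character is the double quote, so single quotes would be stored as part of the value.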

If the import succeeds, cqlsh prints:

Processed: 1 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s

1 rows imported from 1 files in 36.991 seconds (0 skipped).

Now verify the result:

SELECT * FROM Data;
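
Rows typed at the [copy] prompt are parsed as CSV, with a comma as the default delimiter and the double quote as the default quote character, so a value that itself contains a comma must be quoted. The hypothetical helper below uses Python's csv module, whose default dialect uses the same delimiter and quote character, to format a row for pasting. It is a convenience sketch, not part of cqlsh, and it deliberately rejects values containing a double quote or backslash because their escaping depends on the COPY QUOTE and ESCAPE options in use.

```python
import csv
import io


def copy_line(*values: object) -> str:
    """Format one row for the [copy] prompt of COPY ... FROM STDIN (hypothetical helper).

    Fields are comma-separated and wrapped in double quotes only when needed.
    """
    fields = [str(v) for v in values]
    for field in fields:
        if '"' in field or "\\" in field:
            raise ValueError(f"value needs escaping, enter it manually: {field!r}")
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)  # default dialect: ',' delimiter, '"' quote, minimal quoting
    return buf.getvalue().rstrip("\r\n")


if __name__ == "__main__":
    print(copy_line("3b6441dd-3f90-4c93-8f61-abcfa3a510e1", "Ashish", "Rana"))
    # "Rana, Jr." is a made-up value that contains the delimiter, so it gets quoted.
    print(copy_line("3b6446dd-eb95-4bb4-8685-abcfa3a510e6", "Shivang", "Rana, Jr."))
```
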

11.QUERYING SYSTEM TABLES:

Cassandra databases contain a special system keyspace whose tables hold metadata about objects such as keyspaces, tables, views, indexes, triggers and table columns. In newer versions of Cassandra (3.0 and later), this keyspace is named system_schema.

Older versions of Cassandra kept similar metadata tables in a keyspace named "system". The examples below use the system tables defined in version 3 of Cassandra.

Below are example queries that show how to get information on the following Cassandra objects:
keyspaces, tables, views, indexes, triggers, and table columns.

Keyspaces

Keyspaces in Cassandra are similar to schemas in databases such as PostgreSQL or Oracle, or to databases in MySQL. The following query retrieves keyspace information from Cassandra:

select * from system_schema.keyspaces;

Tables

The query below will return information about all tables in a Cassandra database. The
keyspace_name column can be used to filter the results by keyspace.

select * from system_schema.tables;

Table Columns

The query below will return column information for a table named employee in the sample
keyspace.

select * from system_schema.columns where table_name = 'employee' and keyspace_name = 'sample' allow filtering;
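
The rows from system_schema.columns are returned sorted by column name, not in key order, so it can help to reorder them. The sketch below sorts such rows so that partition key and clustering columns come first, in key position order. The ColumnRow named tuple is a stand-in for real result rows and is an assumption of this example: it only requires rows with column_name, kind, position and type attributes (the DataStax Python driver returns rows as named tuples by default, which should fit, but that is untested here). The employee metadata in the usage lines is hypothetical.

```python
from collections import namedtuple
from typing import Iterable, List

# Stand-in for rows returned by the system_schema.columns query above.
ColumnRow = namedtuple("ColumnRow", "column_name kind position type")

# Primary-key columns first, then static, then regular columns.
KIND_ORDER = {"partition_key": 0, "clustering": 1, "static": 2, "regular": 3}


def describe_columns(rows: Iterable[ColumnRow]) -> List[str]:
    """Return 'name type (kind)' strings with primary-key columns first, in key order."""
    ordered = sorted(rows, key=lambda r: (KIND_ORDER.get(r.kind, 4), r.position, r.column_name))
    return [f"{r.column_name} {r.type} ({r.kind})" for r in ordered]


if __name__ == "__main__":
    # Hypothetical metadata for sample.employee (regular columns have position -1).
    rows = [
        ColumnRow("name", "regular", -1, "text"),
        ColumnRow("emp_id", "partition_key", 0, "int"),
        ColumnRow("dept", "regular", -1, "text"),
    ]
    for line in describe_columns(rows):
        print(line)
```
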

Views
The query below will return information about views defined in a Cassandra database.

select * from system_schema.views;

Indexes

The query below will return information about indexes defined in a Cassandra database. The
target column includes information about the columns defined in the index.

select * from system_schema.indexes;

Triggers

The query below will return information about triggers defined in a Cassandra database.

select * from system_schema.triggers;

