Unit 2 - Bda Notes
MONGODB
1.FEATURES:
MongoDB supports searching by field, range queries, and regular expression searches.
2. Indexing:
3. Replication:
A master (primary) can perform reads and writes, while a slave (secondary) copies data from the
master and can be used only for reads or backup, not for writes.
4. Duplication of data:
MongoDB can run over multiple servers. The data is duplicated across them to keep the system up
and running in case of hardware failure.
5. Load balancing:
10. Stores files of any size easily without complicating your stack.
2.DATATYPES:
String − This is the most commonly used datatype to store the data. String in
MongoDB must be UTF-8 valid.
Integer − This type is used to store a numerical value. Integer can be 32 bit or 64 bit
depending upon your server.
Arrays − This type is used to store arrays, lists, or multiple values in one key.
Timestamp − Stores a timestamp. This can be handy for recording when a document has been
modified or added.
Date − This datatype is used to store the current date or time in UNIX time format. You
can specify your own date and time by creating a Date object and passing the year, month, and
day to it.
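As a rough illustration of the Date type, the sketch below uses plain JavaScript (the language of the MongoDB shell) to build a specific date and show the UNIX-time value behind it. The chosen date is only an example, and the variable names are hypothetical. Note that the JavaScript Date constructor counts months from zero.

```javascript
// Build a specific date: year, zero-based month (0 = January), day.
// Date.UTC is used so the result does not depend on the local time zone.
const created = new Date(Date.UTC(2024, 0, 15));

// A date is held as milliseconds since the UNIX epoch (1 January 1970 UTC).
const millis = created.getTime();

// The same instant in ISO 8601 form.
const iso = created.toISOString();

console.log(millis, iso);
```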
3.CRUD OPERATIONS:
Create Operations
Create or insert operations add new documents to a collection. If the collection does not currently
exist, insert operations will create the collection.
Read Operations
Read operations retrieve documents from a collection; i.e. query a collection for documents.
MongoDB provides the following methods to read documents from a collection:
● db.collection.find()
You can specify query filters or criteria that identify the documents to return.
Update Operations
Update operations modify existing documents in a collection. MongoDB provides the following
methods to update documents of a collection:
● db.collection.updateOne()
● db.collection.updateMany()
● db.collection.replaceOne()
Delete Operations
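Delete operations remove documents from a collection. MongoDB provides the following
methods to delete documents of a collection:
● db.collection.deleteOne()
● db.collection.deleteMany()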
4.QUERYING ARRAYS:
Arrays in MongoDB allow the users to store data in an ordered form. Efficiently querying array
elements is crucial for developers who need to extract meaningful information from a database.
MongoDB provides a variety of methods to access and query array elements within
documents.
1. Query using dot notation: In MongoDB, we can use dot notation to access an element by
its index in the array.
2. Query using $elemMatch: The $elemMatch operator matches documents that contain an
array with at least one element that matches all the specified query criteria.
3. Query using $slice: $slice is a projection operator in MongoDB that limits the number
of array elements returned in the results.
4. Unwinding: Unwinding ($unwind) outputs one document for each element in
the array. This makes it easier for developers to run aggregation queries on the array data.
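The four techniques above can be modelled in plain JavaScript on in-memory data. This is only a sketch of the matching semantics, not MongoDB code: the `students` documents and all variable names are hypothetical, and the comments show the shell expression each line is meant to imitate.

```javascript
// In-memory stand-ins for documents in a hypothetical "students" collection.
const students = [
  { name: "Asha", scores: [82, 91, 67] },
  { name: "Ravi", scores: [55, 60] },
];

// 1. Dot notation, e.g. { "scores.0": 82 }: match on the element at index 0.
const firstScoreIs82 = students.filter((d) => d.scores[0] === 82);

// 2. $elemMatch, e.g. { scores: { $elemMatch: { $gt: 80, $lt: 95 } } }:
//    at least one element must satisfy every condition.
const hasScoreBetween = students.filter((d) =>
  d.scores.some((s) => s > 80 && s < 95)
);

// 3. $slice projection, e.g. { scores: { $slice: 2 } }: keep the first two elements.
const firstTwoScores = students.map((d) => ({ ...d, scores: d.scores.slice(0, 2) }));

// 4. $unwind: one output document per array element.
const unwound = students.flatMap((d) =>
  d.scores.map((s) => ({ name: d.name, scores: s }))
);

console.log(firstScoreIs82, hasScoreBetween, firstTwoScores, unwound);
```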
5.FUNCTIONS:
Syntax:
db.Collection_Name.count(
Selection_criteria,
{
limit: <integer>,
skip: <integer>,
maxTimeMS : <integer>,
readConcern: <string>,
collation: <document>
})
Syntax: cursor.limit()
Syntax : cursor.skip(<offset>)
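To make the effect of count, limit and skip concrete, here is a small plain-JavaScript model on an in-memory array. It is a sketch of the behaviour only; the `docs` array and the `page` helper are hypothetical and not part of MongoDB.

```javascript
// Ten in-memory documents standing in for a hypothetical collection.
const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i + 1 }));

// Like db.collection.count(criteria): number of documents matching the criteria.
const countOver4 = docs.filter((d) => d._id > 4).length;

// Like cursor.skip(offset).limit(n): drop `offset` documents, then keep at most `n`.
function page(all, offset, n) {
  return all.slice(offset, offset + n);
}

const secondPage = page(docs, 3, 3);
console.log(countOver4, secondPage);
```

Combining skip and limit in this way is the usual basis for paging through results.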
6.AGGREGATION OPERATIONS:
Aggregation operations process multiple documents and return computed results.
● Each stage performs an operation on the input documents. For example, a stage can filter
documents, group documents, and calculate values.
● The documents that are output from a stage are passed to the next stage.
● An aggregation pipeline can return results for groups of documents. For example, return
the total, average, maximum, and minimum values.
// Stage 2: Group remaining documents by pizza name and calculate total quantity
{
$group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
}
])
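The idea of stages feeding one another can be sketched in plain JavaScript. The model below is not MongoDB code: the `orders` data is invented, and the first stage (a filter on a `size` field) is an assumption, since only the grouping stage appears in the fragment above.

```javascript
// Hypothetical pizza orders standing in for a collection.
const orders = [
  { name: "Pepperoni", size: "medium", quantity: 20 },
  { name: "Cheese", size: "medium", quantity: 50 },
  { name: "Pepperoni", size: "large", quantity: 30 },
  { name: "Cheese", size: "medium", quantity: 10 },
];

// Stage 1 (assumed for illustration): keep only medium pizzas, like a $match stage.
const matched = orders.filter((o) => o.size === "medium");

// Stage 2: group the remaining documents by name and sum the quantity,
// like $group with { _id: "$name", totalQuantity: { $sum: "$quantity" } }.
const totals = new Map();
for (const o of matched) {
  totals.set(o.name, (totals.get(o.name) || 0) + o.quantity);
}
const result = [...totals].map(([name, totalQuantity]) => ({ _id: name, totalQuantity }));

console.log(result);
```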
7.MAP REDUCE:
Map-reduce is a data processing paradigm for condensing large volumes of data into useful
aggregated results. To perform map-reduce operations, MongoDB provides the mapReduce
database command.
In this map-reduce operation, MongoDB applies the map phase to each input document (i.e. the
documents in the collection that match the query condition). The map function emits key-value
pairs. For those keys that have multiple values, MongoDB applies the reduce phase, which
collects and condenses the aggregated data. MongoDB then stores the results in a collection.
Optionally, the output of the reduce function may pass through a finalize function to further
condense or process the results of the aggregation.
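The map and reduce phases described above can be imitated in plain JavaScript. This is a simplified sketch, not the mapReduce command itself: the `mapReduce` helper, its option names and the `orders` data are all hypothetical, and the map function receives the document and `emit` as arguments rather than using `this`.

```javascript
// A tiny in-memory model of map-reduce over an array of documents.
function mapReduce(docs, { query, map, reduce }) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: runs only on documents that match the query condition.
  docs.filter(query).forEach((doc) => map(doc, emit));
  // Reduce phase: applied only to keys that have multiple values.
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

// Hypothetical orders; only those with status "A" are processed.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const totals = mapReduce(orders, {
  query: (doc) => doc.status === "A",
  map: (doc, emit) => emit(doc.cust_id, doc.amount),
  reduce: (key, values) => values.reduce((sum, v) => sum + v, 0),
});

console.log(totals);
```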
8.CURSOR METHODS:
#1. cursor.addOption(flag)
This method adds OP_QUERY wire protocol flags, such as the tailable flag, to change the
behaviour of queries.
Example
var t = db.myCappedCollection;
var cursor = t.find()
.addOption(DBQuery.Option.tailable)
.addOption(DBQuery.Option.awaitData);
#2. cursor.batchSize(size)
This method specifies the number of documents that MongoDB returns in each batch of the result.
In most cases, modifying the batch size does not affect the user or the application.
Example
db.inventory.find().batchSize(10)
#3. cursor.close()
This method closes the cursor and releases the associated server resources. The server
automatically closes a cursor that has no remaining results or that has been idle for a
specified period of time.
Example
db.collection.find(<query>).close()
#4. cursor.forEach(function)
The forEach method applies a JavaScript function to every document visited by the cursor.
Syntax:
db.collection.find().forEach(<function>)
#5. cursor.hint(index)
This method is called on a query to override MongoDB's default index selection and
query optimization process.
Examples:
The query below returns all documents in the users collection, using the index on the age
field.
db.users.find().hint( { age: 1 } )
#6. cursor.limit()
This method specifies the maximum number of documents that the cursor returns. It is
comparable to the LIMIT statement in a SQL database.
Example:
db.collection.find(<query>).limit(<number>)
#7. cursor.map(function)
The map method applies a function to each document visited by the cursor and collects the
return values from successive applications into an array.
Example:
9.INDEXING IN MONGODB:
Indexes are special data structures that store a small portion of the information in the
documents so that MongoDB can find the right data without scanning every document. The entries
of an index are ordered by the value of the field specified in the index.
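Why an ordered index speeds up lookups can be sketched in plain JavaScript. The model below uses a sorted array with binary search as a simplified stand-in for a real index structure; the `docs` data, `ageIndex` and `findByAge` are hypothetical names introduced for illustration.

```javascript
// Hypothetical documents; the index will be built on the "age" field.
const docs = [
  { _id: 1, age: 31 },
  { _id: 2, age: 24 },
  { _id: 3, age: 45 },
  { _id: 4, age: 24 },
];

// The index holds only the key and the document position, ordered by key.
const ageIndex = docs
  .map((d, pos) => ({ key: d.age, pos }))
  .sort((a, b) => a.key - b.key);

// Binary search for the first entry with the given key, then read forward.
function findByAge(age) {
  let lo = 0;
  let hi = ageIndex.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (ageIndex[mid].key < age) lo = mid + 1;
    else hi = mid;
  }
  const found = [];
  for (let i = lo; i < ageIndex.length && ageIndex[i].key === age; i++) {
    found.push(docs[ageIndex[i].pos]);
  }
  return found;
}

console.log(findByAge(24));
```

Without the index, every document would have to be inspected; with it, the search touches only a few index entries before reading the matching documents.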
Creating an Index :
MongoDB provides a method called createIndex() that allows a user to create an index.
Syntax – db.COLLECTION_NAME.createIndex({KEY:1})
Example –
db.mycol.createIndex({"age":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
Drop an index:
Syntax – db.NAME_OF_COLLECTION.dropIndex({KEY:1})
The dropIndex() method can delete only one index at a time. To delete (or drop)
multiple indexes from a collection, MongoDB provides the dropIndexes() method, which takes
multiple indexes as its parameter.
10.IMPORTING DATA INTO MONGODB
This section explains how to import JSON data or a CSV file into a collection in MongoDB.
Use the mongoimport command to import data into a collection. You must have installed the
MongoDB database tools to use the mongoimport command.
Extract the tools and copy all the .exe files into the MongoDB bin folder. On Windows, it is
the C:\Program Files\MongoDB\Server\<version>\bin folder.
Now, open the terminal or command prompt and navigate to the location of the JSON file to
import so that you don't need to specify the whole path.
The following is the mongoimport command.
Now, execute the following command to import data from the D:\MyData\employeesdata.json file
into the employees collection.
Suppose that you have an employees.csv file in D:\MyData that you want to import into a new
employees collection. Execute the following command to import data from the CSV file.
D:\MyData> mongoimport --db test --collection employees --type csv --file employees.csv
--fields _id,firstName,lastName
The --fields option specifies the field names to use for each column in the CSV file. If the
file contains a header row that should be used for the field names, use the --headerline option
instead of --fields. The above command inserts all the data into the employees collection, as
shown below.
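The difference between --fields and --headerline can be sketched in plain JavaScript. The `csvToDocs` helper below is hypothetical and only mimics how column values are paired with field names; it assumes simple CSV with no quoted commas and keeps every value as a string, which the real tool does not necessarily do.

```javascript
// Turn CSV text into documents, taking field names either from an explicit
// list (like --fields) or from the first row (like --headerline).
function csvToDocs(text, { fields = null, headerline = false } = {}) {
  const rows = text.trim().split(/\r?\n/).map((line) => line.split(","));
  const names = headerline ? rows.shift() : fields;
  return rows.map((cols) =>
    Object.fromEntries(names.map((name, i) => [name, cols[i]]))
  );
}

// Sample data invented for the example.
const withFields = csvToDocs("1,Amit,Shah\n2,Neha,Rao", {
  fields: ["_id", "firstName", "lastName"],
});
const withHeader = csvToDocs("_id,firstName,lastName\n1,Amit,Shah", {
  headerline: true,
});

console.log(withFields, withHeader);
```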
test> db.employees.find()
11.EXPORTING DATA FROM MONGODB
MongoDB provides a utility called mongoexport, with which users can export data from MongoDB
databases to a JSON or CSV file. This utility is found in MongoDB's bin subfolder (for example,
/mongodb/bin). When you run it, supply the database name, the name of the collection, and the
file you wish to export to, and it performs the export.
Example:
If the mongoexport command does not run, make sure that you have exited the mongo shell or
opened a new terminal or command prompt (cmd) window before executing it, because mongoexport
is a separate program and not a shell command. The mongoexport statement assumes that your
MongoDB bin folder is in your PATH; if it is not, you have to specify the full path of the
mongoexport file, for example /mongodb/bin/mongoexport or whatever path your MongoDB
installation uses.
If you do not give a path for the exported file, the file is created in the directory you are
in when you run the command. Giving the full path, or navigating first to where you want the
exported data file to be, keeps the output easy to find.
mongoexport --db techwriter --collection writers --type=csv --fields _id,writername --out
/data/dump/tw/techwriter.csv
You can also specify which fields of the documents to export. In this example, the mongoexport
utility exports the writers collection of the techwriter database to a CSV file. Only the _id
and writername fields are exported. Also, note that the file name has a .csv extension.
CASSANDRA
Apache Cassandra is a highly scalable, high-performance distributed database designed to handle
large amounts of data across many commodity servers, providing high availability with no single
point of failure. It is a type of NoSQL database.
1.FEATURES OF CASSANDRA:
Cassandra has become so popular because of its outstanding technical features. Given below are
some of the features of Cassandra:
3.CASSANDRA CQLsh
Cassandra CQLsh stands for Cassandra CQL shell. After installation, Cassandra provides the
Cassandra Query Language shell (cqlsh) prompt, through which users enter CQL commands to
communicate with Cassandra.
Start CQLsh:
CQLsh provides many options, which are listed in the following table:
| Option | Usage |
|--------|-------|
| help | Shows help topics about the options of CQLsh commands. |
| version | Shows the version of the CQLsh you are using. |
| execute | Directs the shell to accept and execute a CQL command. |
| file= "file name" | Executes the commands in the given file and exits. |
| u "username" | Authenticates a user. The default user name is: cassandra. |
| p "password" | Authenticates a user with a password. The default password is: cassandra. |
4.CASSANDRA KEYSPACE:
A keyspace is an object that holds column families and user-defined types. It is comparable to
a database in an RDBMS and contains column families, indexes, user-defined types, data center
awareness, the replication strategy used in the keyspace, the replication factor, and so on.
Or
○ Simple Strategy: Simple strategy is used in the case of one data center. In this strategy,
the first replica is placed on the selected node and the remaining replicas are placed on the
following nodes, moving clockwise around the ring, without considering rack or node location.
○ Network Topology Strategy: This strategy is used when there is more than one data
center. In this strategy, you have to provide a replication factor for each data center
separately.
Replication Factor: The replication factor is the number of replicas of the data placed on
different nodes. A replication factor greater than two is good for avoiding a single point of
failure, so 3 is a good replication factor.
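The clockwise placement used by the simple strategy can be sketched in plain JavaScript. This is an illustrative model only: the ring is a plain array of hypothetical node names, and `simpleStrategyReplicas` is an invented helper, not a Cassandra API.

```javascript
// Place `replicationFactor` replicas starting at the selected node and
// walking clockwise around the ring, ignoring rack and data center.
function simpleStrategyReplicas(ring, firstIndex, replicationFactor) {
  const count = Math.min(replicationFactor, ring.length);
  const replicas = [];
  for (let i = 0; i < count; i++) {
    // The modulo wraps around from the last node back to the first.
    replicas.push(ring[(firstIndex + i) % ring.length]);
  }
  return replicas;
}

// Hypothetical four-node ring; the first replica lands on the last node.
const ring = ["node1", "node2", "node3", "node4"];
const replicas = simpleStrategyReplicas(ring, 3, 3);

console.log(replicas);
```

With a replication factor of 3, the data stays available even if one of the three replica nodes fails.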
Example:
Verification:
To check whether the keyspace is created or not, use the "DESCRIBE" command. By using this
command you can see all the keyspaces that are created.
Durable_writes
By default, the durable_writes property of a keyspace is set to true; you can also set this
property to false. However, it should not be set to false for a keyspace that uses the simple
strategy.
Example:
Using a Keyspace
To use the created keyspace, you have to use the USE command.
Syntax:
USE <identifier>
5.CRUD OPERATION:
Cassandra CRUD Operation stands for Create, Update, Read and Delete or Drop. These
operations are used to manipulate data in Cassandra.
a. Create Operation
A user can insert data into a table using the Create operation. The data is stored in the
columns of a row in the table. Using the INSERT command with the proper syntax, a user can
perform this operation.
EXAMPLE :
cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
VALUES(001, 'Ayush', 'Electrical Engineering', 9999999999, 'Boston');
cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
VALUES(002, 'Aarav', 'Computer Engineering', 8888888888, 'New York City');
cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
VALUES(003, 'Kabir', 'Applied Physics', 7777777777, 'Philadelphia');
b.Update Operation
The second operation in the Cassandra CRUD operation is the UPDATE operation. A user can
use the UPDATE command for this operation. The following terms apply while updating the
table.
● Where: This keyword specifies which rows are to be updated.
● Set: This keyword specifies the updated value.
● Must: This is a requirement rather than a CQL keyword: the WHERE clause must include all the
columns composing the primary key.
Syntax:
UPDATE <table name>
SET <column name>=<new value>,
<column name>=<value>...
WHERE <condition>
EXAMPLE:
c. Read Operation
The third Cassandra CRUD operation is the Read operation. A user can read either the whole
table or selected columns. To read data from a table, use the SELECT command. This command is
also used for verifying the table after every operation.
SYNTAX to read the whole table-
SELECT * FROM <table name>;
EXAMPLE:
cqlsh:keyspace1> SELECT name, city FROM student;
d. Delete Operation
The Delete operation is the last Cassandra CRUD operation; it allows a user to delete data from
a table. The user can use the DELETE command for this operation.
Syntax of the Delete operation-
DELETE <column name> FROM <table name> WHERE <condition>;
EXAMPLE:
cqlsh:keyspace1> DELETE phone FROM student WHERE en=003;
6.CASSANDRA COLLECTIONS:
Cassandra collections are used to store multiple elements in a single column.
There are three types of collection supported by Cassandra:
○ Set
○ List
○ Map
Set Collection
A set collection stores a group of elements and returns them in sorted order when queried.
Syntax:
CREATE TABLE table_name (
id int,
Name text,
Email set<text>,
Primary key(id)
);
Example:
VALUES(2,{'[email protected]'}, 'Kanchan');
Example:
Map Collection
The map collection is used to store key value pairs. It maps one thing to another. For example, if
you want to save course name with its prerequisite course name, you can use map collection.
See this example:
7.COUNTER TYPE:
Cassandra has a specific data type, the Counter type, for storing counter values. Counters are
used to keep track of activities such as likes, upvotes, downvotes, and page visits. A counter
value in Cassandra can only be increased or decreased; it cannot be set directly to a
particular number. The Counter type is implemented as a column family that contains one or more
counter columns.
Cassandra offers dedicated operations for updating counter values, including read, decrement,
and increment.
Increment − This operation increases the value of a counter column. The
following syntax is used to increment a counter column −
UPDATE <table_name> SET <counter_column_name> = <counter_column_name> + <value>
WHERE <row_key> = '<key>';
Example
Input Table
| user_id | name | likes |
|-----------|-----------|-------|
| user123 | John | 5 |
| user456 | Jane | 10 |
| user789 | Michael | 2 |
For instance, you can use the following statement to increase a counter value for a user's likes −
Output Table
| user_id | name | likes |
|-----------|-----------|-------|
| user123 | John | 6 |
| user456 | Jane | 10 |
| user789 | Michael | 2 |
Decrement − To decrease the value of a counter column, apply this procedure. The following
syntax is used to decrement a counter column −
UPDATE <table_name> SET <counter_column_name> = <counter_column_name> - <value>
WHERE <row_key> = '<key>';
Example
Input Table
| user_id | name | age | likes | dislikes |
|----------|------------|-----|-------|----------|
| user123 | John Smith | 30 | 5 | 3 |
| user456 | Jane Doe | 25 | 7 | 2 |
| user789 | Bob Johnson| 40 | 2 | 8 |
For instance, you can use the following statement to lower a counter value for a user's dislikes −
Output Table
|----------|------------|-----|-------|----------|
Read − This procedure is used to read a counter column's value. The following syntax is used to
read a counter column −
Example
Input Table
users table:
| user_id | likes |
|-----------|-----------|
| user123 | 10 |
| user456 | 5 |
| user789 | 20 |
For instance, you can use the query below to determine the value of a user's likes −
Output Table
| likes |
|-----------|
| 10 |
Batch − Several counter updates can be applied together in a single batch.
The following syntax is used to update several counter columns −
BEGIN COUNTER BATCH
APPLY BATCH;
Example
Input Table
+---------+-------+
| user_id | likes |
+---------+-------+
| user123 | 10 |
| user456 | 20 |
+---------+-------+
For instance, you can use the following command to increase the likes of two users at once −
BEGIN COUNTER BATCH
APPLY BATCH;
Output Table
+---------+-------+
| user_id | likes |
+---------+-------+
| user123 | 11 |
| user456 | 21 |
+---------+-------+
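The four counter operations above can be modelled in plain JavaScript. This is a sketch of the semantics only, not Cassandra code: `CounterTable` and its method names are hypothetical, and the model deliberately offers no way to set a counter to a fixed value.

```javascript
// In-memory model of a counter table: values change only by a delta.
class CounterTable {
  constructor() {
    this.rows = new Map(); // row key -> { column: value }
  }
  // A counter that has never been updated is treated as starting at 0.
  increment(key, column, by = 1) {
    const row = this.rows.get(key) || {};
    row[column] = (row[column] || 0) + by;
    this.rows.set(key, row);
  }
  decrement(key, column, by = 1) {
    this.increment(key, column, -by);
  }
  read(key, column) {
    const row = this.rows.get(key);
    return row ? row[column] : undefined;
  }
  // Apply several [key, column, delta] updates together.
  batch(updates) {
    for (const [key, column, delta] of updates) {
      this.increment(key, column, delta);
    }
  }
}

// Reproduce the batch example: both users gain one like.
const likes = new CounterTable();
likes.increment("user123", "likes", 10);
likes.increment("user456", "likes", 20);
likes.batch([
  ["user123", "likes", 1],
  ["user456", "likes", 1],
]);

console.log(likes.read("user123", "likes"), likes.read("user456", "likes"));
```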
8.TIME TO LIVE (TTL) IN CASSANDRA:
In Cassandra, Time to Live (TTL) sets a time limit on the data in a column, after which the
data is deleted automatically. The TTL keyword defines that time limit for a particular column.
Both the INSERT and UPDATE commands support setting a time for data in a column to expire.
The USING TTL clause sets the TTL value, in seconds, at the time of insertion.
The TTL function returns the time remaining for a selected column.
At the point of insertion, we can set the expiry limit of the inserted data with the USING TTL
clause. For example, to set the expiry limit to two days, we define a TTL value of 172800
seconds.
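The arithmetic behind a TTL value can be sketched in plain JavaScript. The helper below is a hypothetical model of the countdown, not Cassandra code; it assumes TTL values are expressed in whole seconds and that an expired value is simply no longer returned.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// A two-day expiry limit expressed in seconds.
const twoDaysTtl = 2 * SECONDS_PER_DAY;

// Remaining time to live, given when the value was written and the current time
// (both in milliseconds). Returns null once the value has expired.
function remainingTtl(ttlSeconds, writtenAtMs, nowMs) {
  const elapsedSeconds = Math.floor((nowMs - writtenAtMs) / 1000);
  const remaining = ttlSeconds - elapsedSeconds;
  return remaining > 0 ? remaining : null;
}

// One minute after the write, sixty seconds of the limit have been used up.
console.log(twoDaysTtl, remainingTtl(twoDaysTtl, 0, 60 * 1000));
```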
Table : student_Registration
CREATE TABLE student_Registration(
Id int PRIMARY KEY,
Name text,
Event text
);
To insert data with a TTL, use the following CQL query.
To determine the remaining time before a specific column expires, use the following CQL
query.
SELECT TTL(Name)
from student_Registration
WHERE Id = 101;
The TTL value decreases each time you check it, because the time limit is counting down. Use
the following CQL query to check again.
SELECT TTL(Name)
from student_Registration
WHERE Id = 101;
To extend the time limit, use the UPDATE command with the USING TTL keyword. To extend the
time limit to 3 days (259200 seconds) and also update the name to 'rana', use the following
CQL query.
UPDATE student_Registration
USING TTL 259200
SET Name = 'rana'
WHERE Id = 102;
SELECT TTL(Name)
from student_Registration
WHERE Id = 102;
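To remove the time limit from an existing column value, set its TTL to 0 with the following
CQL query; a TTL of 0 means that the value does not expire.
UPDATE student_Registration
USING TTL 0
SET Name = 'rana'
WHERE Id = 102;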
9.ALTER TABLE:
The ALTER TABLE command is used to alter a table after creating it. You can use the ALTER
command to perform two types of operations:
○ Add a column
○ Drop a column
Syntax:
Adding a Column
You can add a column to a table by using the ALTER command. When adding a column, make sure
that the column name does not conflict with the existing column names and that the table is
not defined with the compact storage option.
Syntax:
ALTER TABLE <table name> ADD <new column> <datatype>;
Example:
Let's take an example to demonstrate the ALTER command on the already created table named
"student". Here we are adding a column called student_email of text datatype to the table named
student.
ALTER TABLE student ADD student_email text;
Dropping a Column
You can also drop an existing column from a table by using the ALTER command. Check that the
table is not defined with the compact storage option before dropping a column from it.
Syntax:
ALTER TABLE <table name> DROP <column name>;
Example:
Let's take an example to drop a column named student_email from a table named student.
ALTER TABLE student DROP student_email;
10.EXPORTING AND IMPORTING DATA IN CASSANDRA:
First, we create a table named Data, with the fields id, firstname, and lastname, for a sample
exercise.
firstname text,
lastname text
);
Now, we insert some data to use in the export and import exercise.
Now, we export the data using the following cqlsh query.
Next, we delete the data from the table 'Data' so that it can be imported again from the CSV
file that has already been created.
truncate Data;
Now, we import the data again, using the following cqlsh query.
FROM 'AshishRana\Desktop\Data.csv'
To verify whether the data was imported successfully, run the following query.
SELECT *
FROM Data;
To copy specific rows of a table, use the following cqlsh query.
First, export the data from the table and truncate it; after these two steps, follow the steps
given below.
After executing the above cqlsh query, the line prompt changes to [copy].
[copy]
Now, insert the row values of the table which you want to import.
After the above cqlsh query executes successfully, it gives the following results.
11.CASSANDRA SYSTEM QUERIES:
Cassandra databases contain a special system schema that has tables that hold metadata
information on objects such as keyspaces, tables, views, indexes, triggers, and table columns. On
newer versions of Cassandra, the name of the system schema holding these tables is
system_schema.
Older versions of Cassandra had a system schema named "system" that included similar tables
for getting object meta data. This article will be using the system tables defined in version 3 of
Cassandra.
Below are example queries that show how to get information on the following Cassandra objects:
keyspaces, tables, views, indexes, triggers, and table columns.
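Keyspaces
The query below will return information about all keyspaces in a Cassandra database.
SELECT * FROM system_schema.keyspaces;
Tables
The query below will return information about all tables in a Cassandra database. The
keyspace_name column can be used to filter the results by keyspace.
SELECT * FROM system_schema.tables;
Table Columns
The query below will return column information for a table named employee in the sample
keyspace.
SELECT * FROM system_schema.columns
WHERE keyspace_name = 'sample' AND table_name = 'employee';
Views
The query below will return information about views defined in a Cassandra database.
SELECT * FROM system_schema.views;
Indexes
The query below will return information about indexes defined in a Cassandra database. The
target column includes information about the columns defined in the index.
SELECT * FROM system_schema.indexes;
Triggers
The query below will return information about triggers defined in a Cassandra database.
SELECT * FROM system_schema.triggers;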