Big Data Lab Manual

The document outlines practical exercises for downloading and installing Hadoop, understanding its architecture and ecosystem, and implementing various tasks such as file management, matrix multiplication, and weather data mining using MapReduce. It describes Hadoop's components including HDFS, MapReduce, and Hive, and provides detailed steps for installation and implementation of specific algorithms. The document emphasizes Hadoop's capability to handle big data efficiently across distributed systems.


Practical-1

Aim: Downloading and installing Hadoop; understanding different Hadoop modes, startup scripts, and configuration files.
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Hadoop Architecture:

The Apache Hadoop framework includes the following four modules:

Hadoop Common: Contains the Java libraries and utilities needed by other Hadoop modules. These libraries provide file system and OS-level abstractions and comprise the essential Java files and scripts required to start Hadoop.

Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data on commodity machines, thus providing very high aggregate bandwidth across the cluster.

Hadoop YARN: A resource-management framework responsible for job scheduling and cluster
resource management.

Hadoop MapReduce: A YARN-based programming model for parallel processing of large data sets.


Hadoop Ecosystem:

Hadoop has gained its popularity due to its ability to store, analyze and access large amounts of data quickly and cost-effectively across clusters of commodity hardware. It would not be wrong to say that Apache Hadoop is actually a collection of several components rather than a single product.

The Hadoop ecosystem includes several commercial as well as open-source products that are broadly used to make Hadoop more accessible and usable.

MapReduce

Hadoop MapReduce is a software framework for easily writing applications which process large amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. In terms of programming, there are two functions which are most common in MapReduce.
• The Map Task: The master node takes the input, divides it into smaller parts, and distributes them to the worker nodes. Each worker node solves its own small problem and returns the answer to the master node.

• The Reduce Task: The master node combines all the answers coming from the worker nodes and forms them into output, which is the answer to our big distributed problem.

Generally both the input and the output are stored in a file system. The framework is responsible for scheduling tasks, monitoring them, and re-executing failed tasks.

Hadoop Distributed File System (HDFS)

HDFS is a distributed file system that provides high-throughput access to data. When data is pushed to HDFS, it is automatically split into multiple blocks which are stored and replicated, thus ensuring high availability and fault tolerance.

Note: A file consists of many blocks (large blocks of 64 MB and above).

Here are the main components of HDFS:

• Name Node: It acts as the master of the system. It maintains the name system, i.e., directories and files, and manages the blocks which are present on the Data Nodes.

• Data Nodes: They are the slaves which are deployed on each machine and
provide the actual storage. They are responsible for serving read and write requests for
the clients.

• Secondary Name Node: It is responsible for performing periodic checkpoints. In the event of Name Node failure, you can restart the Name Node using the checkpoint.
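To see how HDFS has split a file into blocks and where the replicas live, you can inspect it with the fsck tool. This is a minimal illustration; the path is an assumed example file:

hdfs fsck /user/chuck/example.txt -files -blocks -locations

The report lists each block of the file together with the Data Nodes holding its replicas.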

Hive

Hive is part of the Hadoop ecosystem and provides an SQL-like interface to Hadoop. It is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

HBase (Hadoop Database)

HBase is a distributed, column-oriented database that uses HDFS for the underlying storage. As said earlier, HDFS works on a write-once, read-many-times pattern, but this is not always the case: we may require real-time read/write random access to a huge dataset, and this is where HBase comes into the picture. HBase is a column-oriented database built on top of HDFS and distributed across the cluster.

Installation Steps

The following are the steps to install Apache Hadoop on Ubuntu 14.04.

Step 1: Check whether Java is installed:

$ java -version

If it returns "The program java can be found in the following packages", Java has not been installed yet, so execute the following command:

$ sudo apt-get install default-jdk

Step 2: Set environment variables:

$ sudo gedit ~/.bashrc

• Open the .bashrc file in gedit
• Set the Java environment variable:

export JAVA_HOME=/usr/jdk1.7.0_45/

• Set the Hadoop environment variable:

export HADOOP_HOME=/usr/Hadoop2.6/

• Apply the environment variables:

$ source ~/.bashrc
Step 3: Install Eclipse.

Step 4: Copy Hadoop plug-ins such as

• hadoop-eclipse-kepler-plugin-2.2.0.jar
• hadoop-eclipse-kepler-plugin-2.4.1.jar
• hadoop-eclipse-plugin-2.6.0.jar

from the release folder of hadoop2x-eclipse-plugin-master to the Eclipse plugins directory.

Step 5: In Eclipse, start a new MapReduce project:

File -> New -> Other -> MapReduce Project

Step 7: Create the Mapper, Reducer, and Driver:

Inside a project -> src -> File -> New -> Other -> Mapper/Reducer/Driver
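The aim also mentions configuration files and startup scripts. As a minimal sketch of a pseudo-distributed setup (the port number and replication factor below are common defaults and may differ on your installation; the files live under $HADOOP_HOME/etc/hadoop):

core-site.xml:

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

hdfs-site.xml:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

Format the NameNode once and then use the startup scripts from $HADOOP_HOME/sbin:

$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
$ jps

jps should now list the NameNode, DataNode and YARN daemons, confirming that the single-node cluster is running.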
Hadoop is powerful because it is extensible and easy to integrate with any component. Its popularity is due in part to its ability to store, analyze and access large amounts of data quickly and cost-effectively across clusters of commodity hardware. Apache Hadoop is not actually a single product but a collection of several components. When all these components are merged, Hadoop becomes very user friendly.
Practical-2
AIM :- Implement the following file management tasks in Hadoop

• Adding files and directories

• Retrieving files

• Deleting Files

ALGORITHM:-

SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS

Step-1 Adding Files and Directories to HDFS

Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of /user/$USER, where $USER is your login user name. This directory isn't automatically created for you, though, so let's create it with the mkdir command. For the purpose of illustration, we use chuck. You should substitute your user name in the example commands.

hadoop fs -mkdir /user/chuck

hadoop fs -put example.txt .

hadoop fs -put example.txt /user/chuck

Step-2 Retrieving Files from HDFS

The Hadoop command get copies files from HDFS back to the local filesystem, and cat displays a file's contents on the console. To retrieve example.txt, we can run the following commands:

hadoop fs -get example.txt .

hadoop fs -cat example.txt

Step-3 Deleting Files from HDFS

hadoop fs -rm example.txt

• Command for creating a directory in HDFS: hdfs dfs -mkdir /lendicse

• Adding a directory is done through the command: hdfs dfs -put lendi_english /

Step-4 Copying Data from NFS to HDFS

• Copying from a local directory is done with: hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/

• View the file by using the command: hdfs dfs -cat /lendi_english/glossary

• Command for listing items in HDFS: hdfs dfs -ls hdfs://localhost:9000/

• Command for deleting files: hdfs dfs -rm -r /kartheek
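The same file management tasks can also be performed programmatically. Below is a minimal sketch using the Hadoop FileSystem Java API; the NameNode URI and file paths are illustrative assumptions and should be replaced with your own:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("/user/chuck"));                     // adding a directory
        fs.copyFromLocalFile(new Path("example.txt"),
                new Path("/user/chuck/example.txt"));           // adding a file
        fs.copyToLocalFile(new Path("/user/chuck/example.txt"),
                new Path("example_copy.txt"));                  // retrieving a file
        fs.delete(new Path("/user/chuck/example.txt"), false);  // deleting a file
        fs.close();
    }
}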

SAMPLE INPUT: Input can be any structured, unstructured, or semi-structured data.

EXPECTED OUTPUT:
Practical-3

Aim: Implementation of Matrix Multiplication with Hadoop Map Reduce.

DESCRIPTION:

We can represent a matrix as a relation (table) in an RDBMS where each cell of the matrix is represented as a record (i, j, value). For example, a 2x2 matrix A with A(0,0)=1, A(0,1)=2, A(1,0)=0, A(1,1)=3 can be stored as the records (0,0,1), (0,1,2), (1,1,3), omitting the zero cell. It is important to understand that this relation is very inefficient if the matrix is dense. Say we have 5 rows and 6 columns; then we need to store only 30 values, but in the relation above we store 30 row ids, 30 column ids and 30 values, in other words tripling the data. So a natural question arises: why do we need to store data in this format? In practice most matrices are sparse. In sparse matrices not all cells have values, so we do not have to store those cells in the database, which makes this a very efficient way of storing such matrices.

MapReduce Logic

The logic is to send the calculation of each output cell of the result matrix to a reducer. In matrix multiplication, the first cell of the output, (0,0), is the multiplication and summation of the elements from row 0 of matrix A and the elements from column 0 of matrix B. To compute the value of output cell (0,0) of the resultant matrix in a separate reducer, we need to use (0,0) as the output key of the map phase, and the value should carry the values from row 0 of matrix A and column 0 of matrix B. So in this algorithm the output of the map phase should be a <key, value> pair, where the key represents the output cell location, (0,0), (0,1), etc., and the value is the list of all values required for the reducer to do the computation. For example, to calculate the value at output cell (0,0) we need to collect the values from row 0 of matrix A and column 0 of matrix B in the map phase and pass (0,0) as the key, so a single reducer can do the calculation.

ALGORITHM

We assume that the input files for A and B are streams of (key,value) pairs in sparse matrix
format, where each key is a pair of indices (i,j) and each value is the corresponding matrix
element value. The output files for matrix C=A*B are in the same format.

We have the following input parameters:

The path of the input file or directory for matrix A.

The path of the input file or directory for matrix B.

The path of the directory for the output files for matrix C.

strategy = 1, 2, 3 or 4.

R = the number of reducers.


I = the number of rows in A and C.

K = the number of columns in A and rows in B.

J = the number of columns in B and C.

IB = the number of rows per A block and C block.

KB = the number of columns per A block and rows per B block.

JB = the number of columns per B block and C block.

In the pseudo-code for the individual strategies below, we have intentionally avoided factoring out common code for the purposes of clarity.

Note that in all the strategies the memory footprint of both the mappers and the reducers is flat at scale.

Note that the strategies all work reasonably well with both dense and sparse matrices. For sparse matrices we do not emit zero elements.

Steps

1. setup ()

2. var NIB = (I-1)/IB+1

3. var NKB = (K-1)/KB+1

4. var NJB = (J-1)/JB+1

5. map (key, value)

6. if from matrix A with key=(i,k) and value=a(i,k)

7. for 0 <= jb < NJB

8. emit (i/IB, k/KB, jb, 0), (i mod IB, k mod KB, a(i,k))

9. if from matrix B with key=(k,j) and value=b(k,j)

10. for 0 <= ib < NIB

emit (ib, k/KB, j/JB, 1), (k mod KB, j mod JB, b(k,j))

Intermediate keys (ib, kb, jb, m) sort in increasing order first by ib, then by kb, then by jb, then
by m. Note that m = 0 for A data and m = 1 for B data.

The partitioner maps intermediate key (ib, kb, jb, m) to a reducer r as follows:
11. r = ((ib*JB + jb)*KB + kb) mod R

12. These definitions for the sorting order and partitioner guarantee that each reducer R[ib,kb,jb]
receives the data it needs for blocks A[ib,kb] and B[kb,jb], with the data for the A block
immediately preceding the data for the B block.

13. var A = new matrix of dimension IBxKB

14. var B = new matrix of dimension KBxJB

15. var sib = -1

16. var skb = -1

Reduce (key, valueList)

17. if key is (ib, kb, jb, 0)

18. // Save the A block.

19. sib = ib

20. skb = kb

21. Zero matrix A

22. for each value = (i, k, v) in valueList
        A(i,k) = v

23. if key is (ib, kb, jb, 1)

24. if ib != sib or kb != skb return // A[ib,kb] must be zero!

25. // Build the B block.

26. Zero matrix B

27. for each value = (k, j, v) in valueList
        B(k,j) = v

28. // Multiply the blocks and emit the result.

29. ibase = ib*IB

30. jbase = jb*JB

31. for 0 <= i < row dimension of A

32. for 0 <= j < column dimension of B

33. sum = 0
34. for 0 <= k < column dimension of A = row dimension of B
        sum += A(i,k)*B(k,j)

35. if sum != 0 emit (ibase+i, jbase+j), sum
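As a concrete illustration of steps 5-10 above, the following is a minimal Hadoop Java sketch of the map phase only. The block sizes, block counts and the comma-separated input format ("A,i,k,value" or "B,k,j,value") are assumptions made for illustration, not part of the algorithm itself:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BlockMatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final int IB = 2, KB = 2, JB = 2;   // assumed block sizes
    private final int NIB = 3, NJB = 3;         // assumed number of row/column blocks

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] p = line.toString().split(",");
        int r = Integer.parseInt(p[1]), c = Integer.parseInt(p[2]);
        String v = p[3];
        if (p[0].equals("A")) {
            // steps 6-8: replicate A(i,k) to every column block jb
            for (int jb = 0; jb < NJB; jb++)
                ctx.write(new Text((r / IB) + "," + (c / KB) + "," + jb + ",0"),
                          new Text((r % IB) + "," + (c % KB) + "," + v));
        } else {
            // steps 9-10: replicate B(k,j) to every row block ib
            for (int ib = 0; ib < NIB; ib++)
                ctx.write(new Text(ib + "," + (r / KB) + "," + (c / JB) + ",1"),
                          new Text((r % KB) + "," + (c % JB) + "," + v));
        }
    }
}

The reducer then rebuilds the A and B blocks from these values and multiplies them as in steps 17-35.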

INPUT:-

Sets of data distributed over different clusters, taken as the rows and columns of the matrices.

OUTPUT:

Practical-4

AIM:- Write a Map Reduce Program that mines Weather Data

DESCRIPTION:

Climate change has been attracting a lot of attention for a long time, and its adverse effects are being felt in every part of the earth; examples include rising sea levels, less rainfall, and increasing humidity. The proposed system overcomes some of the issues that occur with other techniques. In this project we use the concept of big data and Hadoop. In the proposed architecture we are able to process offline data stored by the National Climatic Data Centre (NCDC). Through this we are able to find the maximum and minimum temperature of a year and to predict the future weather. Finally, we plot a graph of the obtained MAX and MIN temperatures for each month of the particular year to visualize the temperature. Based on the previous years' data, the weather for the coming year is predicted.

ALGORITHM:-

MAP REDUCE PROGRAM

The job follows the same structure as Word Count, a simple program which counts the number of occurrences of each word in a given text input dataset. That structure fits the MapReduce programming model very well, making it a good template for the Hadoop Map/Reduce programming style. Our implementation consists of three main parts:

• Mapper

• Reducer

• Main program

Step-1. Write a Mapper

A Mapper overrides the "map" function from the class org.apache.hadoop.mapreduce.Mapper, which provides <key, value> pairs as the input. A Mapper implementation may output <key, value> pairs using the provided Context.

The input value of the Map task is a line of text from the input data file, and the key is the line number: <line_number, line_of_text>. The Map task outputs <max_temp, one> or <min_temp, one> for each temperature value in the line of text.

• Pseudo-code

void map(key, value) {
    for each max_temp x in value:
        output.collect(x, 1);
}

void map(key, value) {
    for each min_temp x in value:
        output.collect(x, 1);
}

• Step-2: Write a Reducer

A Reducer collects the intermediate <key, value> output from multiple map tasks and assembles a single result. Here, the program will sum up the occurrences of each temperature key, producing pairs such as <max_temp, occurrence>.

Pseudo-code

void reduce(max_temp, <list of values>) {
    sum = 0;
    for each x in <list of values>:
        sum += x;
    final_output.collect(max_temp, sum);
}

void reduce(min_temp, <list of values>) {
    sum = 0;
    for each x in <list of values>:
        sum += x;
    final_output.collect(min_temp, sum);
}

• Write a Driver

The Driver program configures and runs the MapReduce job. We use the main program to perform basic configuration such as:

• Job Name: name of this job

• Executable (Jar) Class: the main executable class; here, WordCount

• Mapper Class: the class which overrides the "map" function; here, Map

• Reducer Class: the class which overrides the "reduce" function; here, Reduce

• Output Key: the type of the output key; here, Text

• Output Value: the type of the output value; here, IntWritable

• File Input Path

• File Output Path
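As a concrete illustration of the Mapper and Reducer described above, here is a minimal Java sketch that emits (year, temperature) pairs and reduces them to the maximum temperature per year. The simple "year,temperature" line format and the class names are assumptions for illustration; real NCDC records require parsing the documented fixed-width fields:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperature {

    public static class MaxTempMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split(",");   // assumed "year,temperature"
            ctx.write(new Text(parts[0]),
                      new IntWritable(Integer.parseInt(parts[1].trim())));
        }
    }

    public static class MaxTempReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text year, Iterable<IntWritable> temps, Context ctx)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable t : temps)
                max = Math.max(max, t.get());                // keep the maximum reading
            ctx.write(year, new IntWritable(max));
        }
    }
}

A minimum-temperature variant is identical except that the reducer keeps Math.min instead of Math.max.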

INPUT:-

Set of Weather Data over the years

OUTPUT:
Practical-5

AIM:- Run a basic Word Count Map Reduce Program to understand Map Reduce Paradigm

DESCRIPTION:--

Map Reduce is the heart of Hadoop. It is this programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster. The MapReduce concept is fairly simple to understand for those who are familiar with clustered scale-out data processing solutions. The term Map Reduce actually refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name Map Reduce implies, the reduce job is always performed after the map job.

ALGORITHM

MAP REDUCE PROGRAM

Word Count is a simple program which counts the number of occurrences of each word in a
given text input dataset. Word Count fits very well with the Map Reduce programming model
making it a great example to understand the Hadoop Map/Reduce programming style. Our
implementation consists of three main parts:

• Mapper

• Reducer

• Driver

Step-1. Write a Mapper

A Mapper overrides the "map" function from the class org.apache.hadoop.mapreduce.Mapper, which provides <key, value> pairs as the input. A Mapper implementation may output <key, value> pairs using the provided Context.

The input value of the Word Count Map task is a line of text from the input data file, and the key is the line number: <line_number, line_of_text>. The Map task outputs <word, one> for each word in the line of text.

Pseudo-code

void map(key, value) {
    for each word x in value:
        output.collect(x, 1);
}

Step-2. Write a Reducer

A Reducer collects the intermediate <key, value> output from multiple map tasks and assembles a single result. Here, the Word Count program will sum up the occurrences of each word, producing pairs of the form <word, occurrence>.

Pseudo-code

void reduce(keyword, <list of values>) {
    sum = 0;
    for each x in <list of values>:
        sum += x;
    final_output.collect(keyword, sum);
}

Step-3. Write a Driver

The Driver program configures and runs the MapReduce job. We use the main program to perform basic configuration such as:

• Job Name: name of this job

• Executable (Jar) Class: the main executable class; here, WordCount

• Mapper Class: the class which overrides the "map" function; here, Map

• Reducer Class: the class which overrides the "reduce" function; here, Reduce

• Output Key: the type of the output key; here, Text

• Output Value: the type of the output value; here, IntWritable

• File Input Path

• File Output Path
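Putting the three parts together, the following is a minimal, self-contained Java sketch of the Word Count job. It follows the configuration bullets above; the class names mirror the ones used in this manual (Map, Reduce, WordCount) and are otherwise an assumption:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {              // emit <word, one> for each word
                word.set(itr.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();   // sum the occurrences
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count"); // Job Name
        job.setJarByClass(WordCount.class);            // Executable (Jar) class
        job.setMapperClass(Map.class);                 // Mapper class
        job.setReducerClass(Reduce.class);             // Reducer class
        job.setOutputKeyClass(Text.class);             // Output key type
        job.setOutputValueClass(IntWritable.class);    // Output value type
        FileInputFormat.addInputPath(job, new Path(args[0]));    // File input path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // File output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Once packaged into a jar, the job can be run with hadoop jar <jarfile> WordCount <input path> <output path>.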

INPUT:-

Set of data related to Shakespeare comedies, glossary, and poems

Output:
Practical-6

AIM:-

Install and Run Hive then use Hive to Create, alter and drop databases, tables, views, functions
and Indexes.

DESCRIPTION

Hive allows SQL developers to write Hive Query Language (HQL) statements that are similar to standard SQL statements. You should be aware that HQL is limited in the commands it understands, but it is still quite useful. HQL statements are broken down by the Hive service into MapReduce jobs and executed across a Hadoop cluster. Hive looks very much like a traditional database with SQL access. However, because Hive is based on Hadoop and MapReduce operations, there are several key differences. The first is that Hadoop is intended for long sequential scans, and because Hive is based on Hadoop, you can expect queries to have very high latency (many minutes). This means that Hive is not appropriate for applications that need very fast response times, as you would expect with a database such as DB2. Finally, Hive is read-based and therefore not appropriate for transaction processing that typically involves a high percentage of write operations.

ALGORITHM:

Apache HIVE INSTALLATION STEPS

• Install MySQL server:

sudo apt-get install mysql-server

• Configure the MySQL user name and password.

• Create a user and grant all privileges:

mysql -uroot -proot

CREATE USER <USER_NAME> IDENTIFIED BY <PASSWORD>;

• Extract and configure Apache Hive:

tar xvfz apache-hive-1.0.1.bin.tar.gz

• Move Apache Hive from the local directory to the home directory.

• Set the CLASSPATH in .bashrc:

export HIVE_HOME=/home/apache-hive
export PATH=$PATH:$HIVE_HOME/bin

• Configure hive-default.xml by adding the MySQL server credentials:

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value> jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true

</value>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>hadoop</value>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>hadoop</value>

</property>

• Copy mysql-java-connector.jar to the hive/lib directory.
SYNTAX for HIVE Database Operations

DATABASE Creation

CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database_name>;

Drop Database Statement

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

Creating and Dropping a Table in HIVE

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment] [ROW FORMAT row_format] [STORED AS file_format]

Loading data into the table log_data. Syntax:

LOAD DATA LOCAL INPATH '<path>/u.data' OVERWRITE INTO TABLE u_data;

Alter Table in HIVE

Syntax

ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])

Creating and Dropping a View

CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)]
[COMMENT table_comment] AS SELECT ...

Dropping a View. Syntax:

DROP VIEW view_name;

Functions in HIVE

String functions: substr(), upper(), regexp-based functions, etc.; mathematical functions: round(), ceil(), etc.
Date and time functions: year(), month(), day(), to_date(), etc.
Aggregate functions: sum(), min(), max(), count(), avg(), etc.

INDEXES

CREATE INDEX index_name ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[[ROW FORMAT ...] STORED AS ... | STORED BY ...]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]

Creating an Index

CREATE INDEX index_ip ON TABLE log_data (ip_address)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;

Altering and Inserting Index

ALTER INDEX index_ip_address ON log_data REBUILD;

Storing Index Data in the Metastore

SET hive.index.compact.file=/home/administrator/Desktop/big/metastore_db/tmp/index_ipaddress_result;

SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

Dropping Index

DROP INDEX index_name ON table_name;
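Putting the statements above together, here is a minimal worked HQL example on an assumed web server log table; the database, table, column names and file path are illustrative assumptions:

-- create and use a database
CREATE DATABASE IF NOT EXISTS weblogs;
USE weblogs;

-- create a table for the log data
CREATE TABLE IF NOT EXISTS log_data (
ip_address STRING,
request STRING,
status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- load a local file into the table
LOAD DATA LOCAL INPATH '/home/hadoop/access_log.csv' OVERWRITE INTO TABLE log_data;

-- a view and a simple aggregate query
CREATE VIEW IF NOT EXISTS error_requests AS
SELECT ip_address, request FROM log_data WHERE status >= 400;

SELECT ip_address, count(*) AS hits FROM log_data GROUP BY ip_address;

-- alter the table, then clean up
ALTER TABLE log_data ADD COLUMNS (referrer STRING);
DROP VIEW error_requests;
DROP DATABASE weblogs CASCADE;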

INPUT

Input as Web Server Log Data

OUTPUT
Practical-7

AIM:- Install Apache Pig and execute Pig Latin operations on data.

DESCRIPTION

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language.

Pig Latin is procedural and fits very naturally in the pipeline paradigm, while SQL is instead declarative. In SQL users can specify that data from two tables must be joined, but not what join implementation to use (you can specify the implementation of JOIN in SQL, thus "... for many SQL applications the query writer may not have enough knowledge of the data or enough expertise to specify an appropriate join algorithm."). Pig Latin allows users to specify an implementation or aspects of an implementation to be used in executing a script in several ways. In effect, Pig Latin programming is similar to specifying a query execution plan, making it easier for programmers to explicitly control the flow of their data processing task.

SQL is oriented around queries that produce a single result. SQL handles trees naturally, but has no built-in mechanism for splitting a data processing stream and applying different operators to each sub-stream. A Pig Latin script describes a directed acyclic graph (DAG) rather than a pipeline.

Pig Latin's ability to include user code at any point in the pipeline is useful for pipeline
development. If SQL is used, data must first be imported into the database, and then the
cleansing and transformation process can begin.

ALGORITHM

STEPS FOR INSTALLING APACHE PIG

• Extract pig-0.15.0.tar.gz and move it to the home directory.

• Set the environment of PIG in the .bashrc file.

• Pig can run in two modes, local mode and Hadoop mode:

pig -x local and pig

• Grunt shell:

grunt>

• LOADING data into the Grunt shell

DATA = LOAD <CLASSPATH> USING PigStorage(DELIMITER) AS (ATTRIBUTE1:DataType1, ATTRIBUTE2:DataType2, ...);

• Describe data

DESCRIBE DATA;

• DUMP data

DUMP DATA;

• FILTER data

FDATA = FILTER DATA BY ATTRIBUTE = VALUE;

• GROUP data

GDATA = GROUP DATA BY ATTRIBUTE;

• Iterating over data

FOR_DATA = FOREACH DATA GENERATE GROUP AS GROUP_FUN, ATTRIBUTE = <VALUE>;

• Sorting data

SORT_DATA = ORDER DATA BY ATTRIBUTE WITH CONDITION;

• LIMIT data

LIMIT_DATA = LIMIT DATA COUNT;

• JOIN data

JOIN DATA1 BY (ATTRIBUTE1, ATTRIBUTE2, ...), DATA2 BY (ATTRIBUTE3, ..., ATTRIBUTEN);
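Putting these operators together, here is a minimal Pig Latin sketch for the website click count input; the file path, delimiter and field names are assumptions for illustration:

-- load the click log (assumed comma-separated: ip, url, timestamp)
clicks = LOAD '/user/hadoop/clicks.csv' USING PigStorage(',')
         AS (ip:chararray, url:chararray, ts:long);

-- count clicks per URL
by_url = GROUP clicks BY url;
counts = FOREACH by_url GENERATE group AS url, COUNT(clicks) AS hits;

-- show the ten most-clicked URLs
sorted = ORDER counts BY hits DESC;
top10 = LIMIT sorted 10;
DUMP top10;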

INPUT:

Input as Website Click Count Data

OUTPUT:
Practical-8

Aim: Install and Configure MongoDB to execute NoSQL Commands.

Hardware/Software Required: MongoDB

A NoSQL (originally referring to "non SQL" or "non-relational") database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases. Relational databases were not designed to cope with the scale and agility challenges that face modern applications, nor were they built to take advantage of the commodity storage and processing power available today.

The Benefits of NoSQL

When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:

• Large volumes of rapidly changing structured, semi-structured, and unstructured data.

• Agile sprints, quick schema iteration, and frequent code pushes.

• Object-oriented programming that is easy to use and flexible.

• Geographically distributed scale-out architecture instead of expensive, monolithic architecture.

MongoDB

MongoDB is an open-source document database and a leading NoSQL database. MongoDB is written in C++. It is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. MongoDB works on the concept of collections and documents.

MongoDB Installation Steps (Ubuntu 18.04):

Step 1 — Update the system:

sudo apt-get update

Step 2 — Installing and Verifying MongoDB. Now we can install the MongoDB package itself:

sudo apt-get install -y mongodb

This command will install several packages containing the latest stable version of MongoDB along with helpful management tools for the MongoDB server.

After package installation MongoDB will be started automatically. You can check this by running the following command:

sudo service mongod status

If MongoDB is running, you'll see an output like this (with a different process ID):

mongod start/running, process 1611

You can also stop, start, and restart MongoDB using the service command (e.g. service mongod stop, service mongod start).

Commands

In MongoDB, use DATABASE_NAME is used to create a database. The command will create a new database if it doesn't exist; otherwise it will return the existing database.

Syntax:

The basic syntax of the use DATABASE statement is as follows:

use DATABASE_NAME

The dropDatabase() Method

The MongoDB db.dropDatabase() command is used to drop an existing database.

Syntax:
The basic syntax of the dropDatabase() command is as follows:

db.dropDatabase()

This will delete the selected database. If you have not selected any database, it will delete the default 'test' database.

The createCollection() Method

MongoDB's db.createCollection(name, options) is used to create a collection.

Syntax:

The basic syntax of the createCollection() command is as follows:

db.createCollection(name, options)

In the command, name is the name of the collection to be created. options is a document used to specify the configuration of the collection.

The find() Method

To query data from a MongoDB collection, you need to use MongoDB's find() method.

Syntax

The basic syntax of the find() method is as follows:

> db.COLLECTION_NAME.find()

MongoDB's update() and save() methods are used to update documents in a collection. The update() method updates values in the existing document, while the save() method replaces the existing document with the document passed to save().

The MongoDB update() Method

The update() method updates values in the existing document.

Syntax

The basic syntax of the update() method is as follows:

> db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)
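A minimal mongo shell session tying these commands together; the database, collection and field names are assumptions for illustration:

use studentdb
db.createCollection("students")
db.students.insert({ name: "Lakshay", marks: 85 })
db.students.find({ marks: { $gt: 80 } })
db.students.update({ name: "Lakshay" }, { $set: { marks: 90 } })
db.students.drop()
db.dropDatabase()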
