Big Data Lab Manual
Hadoop Architecture:
Hadoop Common: Contains Java libraries and utilities needed by other Hadoop modules. These libraries provide file system and OS level abstractions and comprise the essential Java files and scripts required to start Hadoop.
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data on the commodity machines, thus providing very high aggregate bandwidth across the cluster.
Hadoop YARN: A resource-management framework responsible for job scheduling and cluster
resource management.
Hadoop MapReduce: A YARN-based programming model for the parallel processing of large data sets.
Hadoop has gained its popularity due to its ability to store, analyze and access large amounts of data quickly and cost-effectively across clusters of commodity hardware. It would not be wrong to say that Apache Hadoop is actually a collection of several components and not just a single product.
The Hadoop ecosystem includes several commercial as well as open-source products that are broadly used to make Hadoop accessible to laymen and more usable.
MapReduce
Hadoop MapReduce is a software framework for easily writing applications which process large amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. In terms of programming, there are two functions which are most common in MapReduce.
• The Map Task: The master node takes the input, divides it into smaller parts and distributes them to the worker nodes. Each worker node solves its own small problem and returns the answer to the master node.
• The Reduce Task: The master node combines all the answers coming from the worker nodes into a single output, which is the answer to our big distributed problem.
Generally both the input and the output are stored in a file system. The framework is responsible for scheduling tasks, monitoring them and re-executing failed tasks.
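In Hadoop's Java API these two tasks correspond to subclasses of Mapper and Reducer. The minimal sketch below is an illustration only: the class names MyMapper and MyReducer and the simple counting logic are placeholders, not part of this manual.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The Map Task: runs on worker nodes and turns one piece of the input into
// intermediate <key, value> pairs.
class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // emit one intermediate pair derived from this line of input
        context.write(new Text("some-key"), new IntWritable(1));
    }
}

// The Reduce Task: receives all values for one key and combines them into the
// final answer for that key.
class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable v : values) {
            total += v.get();   // combine the partial answers
        }
        context.write(key, new IntWritable(total));
    }
}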
HDFS is a distributed file system that provides high-throughput access to data. When data is pushed to HDFS, it is automatically split into multiple blocks and stored/replicated, thus ensuring high availability and fault tolerance.
Note: A file consists of many blocks (large blocks of 64 MB and above).
• Name Node: It acts as the master of the system. It maintains the name system, i.e. the directories and files, and manages the blocks which are present on the Data Nodes.
• Data Nodes: They are the slaves which are deployed on each machine and
provide the actual storage. They are responsible for serving read and write requests for
the clients.
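The block and replica placement described above can be inspected from the command line; the file path below is only an illustration.

$ hdfs fsck /user/hadoop/example.txt -files -blocks -locations
$ hdfs dfsadmin -report

The first command lists the blocks of the file and the Data Nodes holding each replica; the second prints a report of the Data Nodes known to the Name Node.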
Hive
Hive is part of the Hadoop ecosystem and provides an SQL-like interface to Hadoop. It is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
HBase(Hadoop DataBase)
HBase is a distributed, column-oriented database that uses HDFS for its underlying storage. As said earlier, HDFS works on a write-once, read-many-times pattern, but this is not always the case. We may require real-time read/write random access to a huge dataset; this is where HBase comes into the picture. HBase is built on top of HDFS as a distributed, column-oriented database.
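As a quick illustration of this random read/write access, a minimal HBase shell session might look like the following (the table name, column family and values are illustrative only):

hbase shell
create 'weblog', 'stats'                # table 'weblog' with column family 'stats'
put 'weblog', 'row1', 'stats:hits', '42'
get 'weblog', 'row1'
scan 'weblog'
disable 'weblog'
drop 'weblog'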
Installation Steps –
$ java -version
If it returns "The program java can be found in the following packages", Java has not been installed yet, so execute the following command:
$ sudo apt-get install default-jdk
• Open the .bashrc file in gedit:
$ sudo gedit ~/.bashrc
• Set java environment variable
export JAVA_HOME=/usr/jdk1.7.0_45/
• Set Hadoop environment variable
export HADOOP_HOME=/usr/Hadoop2.6/
$ source ~/.bashrc
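For reference, the complete set of lines added to ~/.bashrc typically looks like the following. The JAVA_HOME and HADOOP_HOME paths are the ones used above; the PATH line is a common addition and not part of the original steps, so adjust all paths to your own installation:

export JAVA_HOME=/usr/jdk1.7.0_45/
export HADOOP_HOME=/usr/Hadoop2.6/
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

After sourcing the file, the installation can be checked with:
$ java -version
$ hadoop version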
Step 3: Install Eclipse
Step 4: Copy a Hadoop Eclipse plug-in such as
• hadoop-eclipse-kepler-plugin-2.2.0.jar
• hadoop-eclipse-kepler-plugin-2.4.1.jar
• hadoop-eclipse-plugin-2.6.0.jar
from the release folder of hadoop2x-eclipse-plugin-master to the Eclipse plugins directory.
File -> New -> Other -> MapReduce Project
Step 7: Create the Mapper, Reducer, and Driver:
Inside a project -> src -> File -> New -> Other -> Mapper/Reducer/Driver
Hadoop is powerful because it is extensible and easy to integrate with any component. Its popularity is due in part to its ability to store, analyze and access large amounts of data, quickly and cost-effectively, across clusters of commodity hardware. Apache Hadoop is not actually a single product but instead a collection of several components. When all these components are combined, they make Hadoop very user friendly.
Practical-2
AIM :- Implement the following file management tasks in Hadoop
• Adding files and directories
• Retrieving files
• Deleting files
ALGORITHM:-
SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of /user/$USER, where $USER is your login user name. This directory isn't automatically created for you, though, so let's create it with the mkdir command. For the purpose of illustration, we use chuck. You should substitute your user name in the example commands.
The Hadoop command get copies files from HDFS back to the local filesystem. To retrieve
example.txt, we can run the following command:
/home/lendi/Desktop/shakes/glossary /lendicse/
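For reference, the basic add, retrieve and delete commands look like the following. The file example.txt and the directory /user/chuck are only illustrations; substitute your own names:

$ hadoop fs -mkdir /user/chuck                # create the working directory in HDFS
$ hadoop fs -put example.txt /user/chuck      # add: copy a local file into HDFS
$ hadoop fs -ls /user/chuck                   # list the directory contents
$ hadoop fs -get /user/chuck/example.txt .    # retrieve: copy the file back to the local filesystem
$ hadoop fs -rm /user/chuck/example.txt       # delete the file from HDFS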
SAMPLE INPUT: Input can be any data in a structured, unstructured or semi-structured format.
EXPECTED OUTPUT:
Practical-3
DESCRIPTION:
We can represent a matrix as a relation (table) in an RDBMS, where each cell in the matrix is represented as a record (i, j, value). As an example, consider such a matrix and its relational representation. It is important to understand that this relation is a very inefficient one if the matrix is dense. Let us say we have 5 rows and 6 columns; then we need to store only 30 values. But in the relation above we are storing 30 row ids, 30 column ids and 30 values, in other words we are tripling the data. So a natural question arises: why do we need to store data in this format? In practice most matrices are sparse. In a sparse matrix not all cells hold values, so we do not have to store those cells in the database, and the relational format turns out to be very efficient for storing such matrices.
MapReduce Logic
The logic is to send the calculation of each output cell of the result matrix to a reducer. In matrix multiplication, the first cell of the output, (0,0), is the multiplication and summation of the elements from row 0 of matrix A and the elements from column 0 of matrix B. To do the computation of the value in output cell (0,0) of the resultant matrix in a separate reducer, we need to use (0,0) as the output key of the map phase, and the value should hold the array of values from row 0 of matrix A and column 0 of matrix B. So in this algorithm the output of the map phase should be a <key, value> pair, where the key represents the output cell location, (0,0), (0,1), etc., and the value is the list of all values required for the reducer to do the computation. Let us take the example of calculating the value at output cell (0,0). Here we need to collect the values from row 0 of matrix A and column 0 of matrix B in the map phase and pass (0,0) as the key, so that a single reducer can do the calculation.
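A simplified, non-blocked sketch of this logic in Java is given below. It is an illustration only and is not the blocked algorithm described in the steps that follow: it assumes each input line carries a matrix tag and indices such as "A,i,k,value" or "B,k,j,value", and that the dimensions are passed in the job configuration under the keys "I", "K" and "J" (all of these conventions are assumptions).

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: each element of A is sent to every output cell of its row i,
// each element of B is sent to every output cell of its column j.
class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        int I = conf.getInt("I", 0);   // number of rows of A
        int J = conf.getInt("J", 0);   // number of columns of B
        String[] t = line.toString().split(",");   // e.g. "A,0,3,4.5"
        if (t[0].equals("A")) {
            int i = Integer.parseInt(t[1]), k = Integer.parseInt(t[2]);
            for (int j = 0; j < J; j++)
                context.write(new Text(i + "," + j), new Text("A," + k + "," + t[3]));
        } else {
            int k = Integer.parseInt(t[1]), j = Integer.parseInt(t[2]);
            for (int i = 0; i < I; i++)
                context.write(new Text(i + "," + j), new Text("B," + k + "," + t[3]));
        }
    }
}

// Reducer: for one output cell (i,j) it receives row i of A and column j of B,
// matches them on k and sums the products.
class MatrixReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text cell, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int K = context.getConfiguration().getInt("K", 0);   // shared dimension
        double[] a = new double[K], b = new double[K];
        for (Text v : values) {
            String[] t = v.toString().split(",");
            int k = Integer.parseInt(t[1]);
            if (t[0].equals("A")) a[k] = Double.parseDouble(t[2]);
            else                  b[k] = Double.parseDouble(t[2]);
        }
        double sum = 0;
        for (int k = 0; k < K; k++) sum += a[k] * b[k];
        context.write(cell, new Text(Double.toString(sum)));
    }
}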
ALGORITHM
We assume that the input files for A and B are streams of (key,value) pairs in sparse matrix
format, where each key is a pair of indices (i,j) and each value is the corresponding matrix
element value. The output files for matrix C=A*B are in the same format.
The path of the directory for the output files for matrix C.
strategy = 1, 2, 3 or 4.
In the pseudo-code for the individual strategies below, we have intentionally avoided factoring
common code for the purposes of clarity.
Note that in all the strategies the memory footprint of both the mappers and the reducers is flat at scale.
Note that the strategies all work reasonably well with both dense and sparse matrices. For sparse matrices we do not emit zero elements.
Steps
1. setup ()
8. emit (i/IB, k/KB, jb, 0), (i mod IB, k mod KB, a(i,k))
emit (ib, k/KB, j/JB, 1), (k mod KB, j mod JB, b(k,j))
Intermediate keys (ib, kb, jb, m) sort in increasing order first by ib, then by kb, then by jb, then
by m. Note that m = 0 for A data and m = 1 for B data.
The partitioner maps intermediate key (ib, kb, jb, m) to a reducer r as follows:
11. r = ((ib*JB + jb)*KB + kb) mod R
12. These definitions for the sorting order and partitioner guarantee that each reducer R[ib,kb,jb]
receives the data it needs for blocks A[ib,kb] and B[kb,jb], with the data for the A block
immediately preceding the data for the B block.
19. sib = ib
20. skb = kb
33. sum = 0
34. for 0 <= k < column dimension of A = row dimension of B
    a. sum += A(i,k)*B(k,j)
INPUT:-
Set of Data sets over different Clusters are taken as Rows and Columns
OUTPUT:
Practical-4
DESCRIPTION:
Climate change has been attracting a lot of attention for a long time. The adverse effects of the changing climate are being felt in every part of the earth. There are many examples of this, such as rising sea levels, less rainfall and increasing humidity. The proposed system overcomes some of the issues faced by other techniques. In this project we use the concept of Big Data and Hadoop. In the proposed architecture we are able to process offline data stored by the National Climatic Data Centre (NCDC). Through this we are able to find the maximum and minimum temperature of a year and to predict the future weather. Finally, we plot a graph of the obtained MAX and MIN temperature for each month of the particular year to visualize the temperature. Based on the data of previous years, the weather of the coming year is predicted.
ALGORITHM:-
The program follows the same structure as Word Count, a simple program which counts the number of occurrences of each word in a given text input dataset and which fits very well with the MapReduce programming model, making it a great example for understanding the Hadoop Map/Reduce programming style. Our implementation consists of three main parts:
• Mapper
• Reducer
• Main program
Step-1. Write a Mapper
• Pseudo-code
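The mapper pseudo-code is sketched below. It is an illustration only: purely for the example it assumes that each input line has the simple form "YYYY-MM-DD,temperature" (real NCDC records need their own parsing), and it emits each reading under both a max_temp and a min_temp key for the month so that the reducer can aggregate them.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // assumed record layout, e.g. "2013-07-14,36"
        String[] parts = line.toString().split(",");
        String month = parts[0].substring(5, 7);
        int temperature = Integer.parseInt(parts[1].trim());
        // emit the reading under both keys; the reducer keeps the maximum
        // for max_temp_* keys and the minimum for min_temp_* keys
        context.write(new Text("max_temp_" + month), new IntWritable(temperature));
        context.write(new Text("min_temp_" + month), new IntWritable(temperature));
    }
}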
Step-2. Write a Reducer
A Reducer collects the intermediate <key, value> output from multiple map tasks and assembles a single result. Here, the program aggregates the values for each key and emits pairs such as <max_temp, value> and <min_temp, value>.
Pseudo-code
sum += x; final_output.collect(max_temp, sum);
sum += x; final_output.collect(min_temp, sum);
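The fragments above accumulate with sum += x. A fuller sketch of a reducer that instead keeps the largest value for max_temp keys and the smallest for min_temp keys (matching the mapper sketch in Step-1 above) could look like this; it is only one possible realisation, not the manual's reference code.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        boolean wantMax = key.toString().startsWith("max_temp");
        Integer best = null;
        for (IntWritable x : values) {
            int t = x.get();
            if (best == null || (wantMax ? t > best : t < best)) {
                best = t;   // keep the running maximum or minimum
            }
        }
        context.write(key, new IntWritable(best));
    }
}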
• Write a Driver
The Driver program configures and runs the MapReduce job. We use the main program to perform basic configurations such as:
• Executable (Jar) Class: the main executable class. Here, Word Count.
• Mapper Class: the class which overrides the "map" function. Here, Map.
• Reducer Class: the class which overrides the "reduce" function. Here, Reduce.
INPUT:-
OUTPUT:
Practical-5
AIM:- Run a basic Word Count Map Reduce Program to understand Map Reduce Paradigm
DESCRIPTION:
Map Reduce is the heart of Hadoop. It is this programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster. The MapReduce concept is fairly simple to understand for those who are familiar with clustered scale-out data processing
solutions. The term Map Reduce actually refers to two separate and distinct tasks that Hadoop
programs perform. The first is the map job, which takes a set of data and converts it into another
set of data, where individual elements are broken down into tuples (key/value pairs). The reduce
job takes the output from a map as input and combines those data tuples into a smaller set of
tuples. As the sequence of the name Map Reduce implies, the reduce job is always performed
after the map job.
ALGORITHM
Word Count is a simple program which counts the number of occurrences of each word in a
given text input dataset. Word Count fits very well with the Map Reduce programming model
making it a great example to understand the Hadoop Map/Reduce programming style. Our
implementation consists of three main parts:
• Mapper
• Reducer
• Driver
Step-1. Write a Mapper
Pseudo-code
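A sketch of the mapper in Java is given below; the class name and the whitespace tokenization are illustrative choices, not the manual's reference code.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // split the line into words and emit <word, 1> for each one
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}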
Step-2. Write a Reducer
A Reducer collects the intermediate <key, value> output from multiple map tasks and assembles a single result. Here, the Word Count program will sum up the occurrences of each word into pairs of the form <word, occurrence>.
Pseudo-code
sum+=x;
final_output.collect(keyword, sum);
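A sketch of the corresponding reducer, expanding the two lines above, might look like this (the class name is illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text keyword, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();                             // sum += x;
        }
        context.write(keyword, new IntWritable(sum));   // final_output.collect(keyword, sum);
    }
}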
Step-3. Write a Driver
The Driver program configures and runs the MapReduce job. We use the main program to perform basic configurations such as:
• Executable (Jar) Class: the main executable class. Here, Word Count.
• Mapper Class: the class which overrides the "map" function. Here, Map.
• Reducer Class: the class which overrides the "reduce" function. Here, Reduce.
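A sketch of such a driver, assuming the mapper and reducer classes sketched above and taking the input and output paths as command-line arguments, might look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);                 // Executable (Jar) Class
        job.setMapperClass(WordCountMapper.class);                // Mapper Class
        job.setReducerClass(WordCountReducer.class);              // Reducer Class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}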
INPUT:-
Output:
Practical-6
AIM:-
Install and Run Hive then use Hive to Create, alter and drop databases, tables, views, functions
and Indexes.
DESCRIPTION
Hive allows SQL developers to write Hive Query Language (HQL) statements that are similar to
standard SQL statements; now you should be aware that HQL is limited in the commands it
understands, but it is still pretty useful. HQL statements are broken down by the Hive service
into Map Reduce jobs and executed across a Hadoop cluster. Hive looks very much like
traditional database code with SQL access. However, because Hive is based on Hadoop and Map
Reduce operations, there are several key differences. The first is that Hadoop is intended for long
sequential scans, and because Hive is based on Hadoop, you can expect queries to have a very
high latency (many minutes). This means that Hive would not be appropriate for applications that
need very fast response times, as you would expect with a database such as DB2. Finally, Hive is
read-based and therefore not appropriate for transaction processing that typically involves a high
percentage of write operations.
ALGORITHM:
• Install MySQL-Server
• Configure the Hive metastore to use MySQL by adding the following properties to hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value> jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hadoop</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value>
</property>
• Copy mysql-java-connector.jar to the hive/lib directory.
SYNTAX for HIVE Database Operations
DATABASE Creation:
CREATE DATABASE [IF NOT EXISTS] database_name;
Dropping a Database:
DROP DATABASE [IF EXISTS] database_name [RESTRICT|CASCADE];
TABLE Creation:
CREATE TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
Syntax:
DROP VIEW view_name;
Functions in HIVE
INDEXES
[PARTITIONED BY (col_name, ...)]
[[ROW FORMAT ...] STORED AS ... | STORED BY ...]
[LOCATION hdfs_path] [TBLPROPERTIES (...)]
Creating Index
CREATE INDEX index_ip ON TABLE log_data(ip_address) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
SET hive.index.compact.file=/home/administrator/Desktop/big/metastore_db/tmp/index_ipaddress_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
Dropping Index
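Putting the pieces above together, a minimal HQL session covering create, alter and drop for databases, tables, views and indexes might look like the following. The database, view and column names are illustrative; the index handler and the log_data(ip_address) example come from the commands above.

CREATE DATABASE IF NOT EXISTS weblogs;
USE weblogs;
CREATE TABLE log_data (ip_address STRING, request STRING, status INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
ALTER TABLE log_data ADD COLUMNS (referrer STRING);
CREATE VIEW error_logs AS SELECT ip_address, request FROM log_data WHERE status >= 500;
CREATE INDEX index_ip ON TABLE log_data (ip_address)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
ALTER INDEX index_ip ON log_data REBUILD;
DROP INDEX IF EXISTS index_ip ON log_data;
DROP VIEW IF EXISTS error_logs;
DROP TABLE IF EXISTS log_data;
DROP DATABASE IF EXISTS weblogs CASCADE;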
INPUT
OUTPUT
Practical-7
DESCRIPTION
Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The
language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in
MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs) which the
user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the
language.
Pig Latin is procedural and fits very naturally in the pipeline paradigm while SQL is instead
declarative. In SQL users can specify that data from two tables must be joined, but not what join
implementation to use (You can specify the implementation of JOIN in SQL, thus "... for many
SQL applications the query writer may not have enough knowledge of the data or enough
expertise to specify an appropriate join algorithm."). Pig Latin allows users to specify an
implementation or aspects of an implementation to be used in executing a script in several ways. In effect,
Pig Latin programming is similar to specifying a query execution plan, making it easier for
programmers to explicitly control the flow of their data processing task.
SQL is oriented around queries that produce a single result. SQL handles trees naturally, but has no built-in mechanism for splitting a data processing stream and applying different operators to each sub-stream. A Pig Latin script describes a directed acyclic graph (DAG) rather than a pipeline.
Pig Latin's ability to include user code at any point in the pipeline is useful for pipeline
development. If SQL is used, data must first be imported into the database, and then the
cleansing and transformation process can begin.
ALGORITHM
STEPS FOR INSTALLING APACHE PIG
• Grunt Shell
Grunt>
DATA = LOAD <CLASSPATH> USING PigStorage(DELIMITER) AS (ATTRIBUTE: DataType1, ATTRIBUTE: DataType2, ...)
• Describe Data
Describe DATA;
• DUMP Data
Dump DATA;
• FILTER Data
• GROUP Data
• Iterating Data
FOR_DATA = FOREACH DATA GENERATE GROUP AS GROUP_FUN, ATTRIBUTE = <VALUE>
• Sorting Data
• LIMIT Data
• JOIN Data
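A short Grunt shell session exercising the operations listed above might look like the following. The file paths, relation names and field names are illustrative only:

students = LOAD '/pig_data/students.txt' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray, marks:int);
DESCRIBE students;
DUMP students;
passed   = FILTER students BY marks >= 40;               -- FILTER
by_city  = GROUP passed BY city;                         -- GROUP
avgmarks = FOREACH by_city GENERATE group AS city,       -- FOREACH (iterating)
                                    AVG(passed.marks) AS avg_marks;
ranked   = ORDER avgmarks BY avg_marks DESC;             -- sorting
top3     = LIMIT ranked 3;                               -- LIMIT
cities   = LOAD '/pig_data/cities.txt' USING PigStorage(',')
           AS (city:chararray, state:chararray);
joined   = JOIN avgmarks BY city, cities BY city;        -- JOIN
DUMP top3;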
INPUT:
OUTPUT:
Practical-8
MongoDB
Step 1 — Update the system
Step 2 — Installing and Verifying MongoDB
Now we can install the MongoDB package itself.
sudo apt-get install -y mongodb
This command will install several packages containing the latest stable version of MongoDB, along with helpful management tools for the MongoDB server.
After package installation, MongoDB will be started automatically. You can check this by running the following command:
service mongod status
If MongoDB is running, you'll see an output like this (with a different process ID).
Output
mongod start/running, process 1611
You can also stop, start, and restart MongoDB using the service command (e.g. service mongod stop, service mongod start).
Commands
In MongoDB, use DATABASE_NAME is used to create a database. The command will create a new database if it does not exist; otherwise it will return the existing database.
Syntax:
use DATABASE_NAME
The dropDatabase() Method
Basic syntax of the dropDatabase() command is as follows:
db.dropDatabase()
This will delete the selected database. If you have not selected any database, it will delete the default 'test' database.
MongoDB's db.createCollection(name, options) method is used to create a collection.
Syntax:
db.createCollection(name, options)
In the command, name is the name of the collection to be created. options is a document used to specify the configuration of the collection.
The find() Method
To query data from a MongoDB collection, you need to use MongoDB's find() method.
Syntax
>db.COLLECTION_NAME.find()
MongoDB's update() and save() methods are used to update documents in a collection. The update() method updates values in the existing document, while the save() method replaces the existing document with the document passed to save().
MongoDB update() Method
Syntax
>db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)
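A short mongo shell session that ties these commands together might look like the following. The database, collection and field names are illustrative only:

> use studentdb
> db.createCollection("students")
> db.students.insert({ name: "Ravi", marks: 78 })
> db.students.find()
> db.students.update({ name: "Ravi" }, { $set: { marks: 82 } })   // update one field
> db.students.save({ _id: ObjectId("507f191e810c19729de860ea"), name: "Ravi", marks: 85 })  // replace whole document
> db.students.find().pretty()
> db.dropDatabase()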