Lab Manual Big Data
1. To draw and explain Hadoop architecture and ecosystem with the help of a case
study.
2. Perform setting up and installing single node Hadoop in a Windows
environment.
3. To implement the following file management tasks in Hadoop System (HDFS):
Adding files and directories, retrieving files, Deleting files
4. Create a database ‘STD’ and make a collection (e.g. "student" with fields 'No.,
Stu_Name, Enroll., Branch, Contact, e-mail, Score') using MongoDB. Perform
various operations in the following experiments.
5. Insert multiple records (at least 10) into the created student collection.
6. Execute the following queries on the collection created.
a. Display data in proper format.
b. Update the contact information of a specific student.
c. Add a new field remark to the document with the name 'REM'.
d. Add a new record (no 11, stu_name XYZ, Enroll 00101, branch VB, e-mail
[email protected], Contact 098675345) without using an insert statement.
7. Create an employee table in MongoDB with 4 departments and 25 employees
equally divided along with one manager. The following fields should be added:
Employee_ID, Dept_ID, First_Name, Last_Name, Salary (range between 20K-60K).
Now run the following queries:
a. Find all the employees of a particular department where salary lies < 40K.
b. Find the highest salary for each department and fetch the name of such
employees.
c. Find all the employees who are on a lesser salary than 30k; increase their salary
by 10% and display the results.
8. To design and implement a social network graph of 50 nodes and edges
between nodes using networkx library in Python.
9. Design and plot an asymmetric social network (socio graph) of 5 nodes (A,
B, C, D, and E) such that A is directed to B, B is directed to D, D is directed to A,
and D is directed to C.
10. Consider the above scenario (No. 09) and plot a weighted asymmetric graph,
the weight range is between 20 to 50.
11. Implement betweenness measure between nodes across the social network.
(Assume the social network of 10 nodes)
1. To draw and explain Hadoop architecture and ecosystem with the help of a case
study.
Creating a visual representation of Hadoop architecture and explaining the Hadoop ecosystem
can be quite complex, but I'll provide a simplified textual explanation of Hadoop's architecture
and ecosystem, followed by a hypothetical case study.
Hadoop Architecture:
Hadoop is designed to process and store large volumes of data in a distributed and fault-tolerant
manner. Its core components include:
1. HDFS (Hadoop Distributed File System): HDFS is the storage component of Hadoop. It
divides data into blocks (typically 128MB or 256MB each) and stores multiple copies of
these blocks across different nodes in a cluster for redundancy. HDFS ensures data
reliability and fault tolerance.
2. YARN (Yet Another Resource Negotiator): YARN is Hadoop's resource management and
job scheduling system. It manages resources and schedules tasks for data processing.
3. MapReduce: MapReduce is Hadoop's batch processing framework. It processes data in
parallel across the cluster in two phases, map and reduce.
4. Hadoop Common: This contains the shared utilities, libraries, and APIs needed by the
other Hadoop modules.
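For illustration, the block size and replication factor described above are set in HDFS's
configuration file hdfs-site.xml; a fragment showing the common defaults (128 MB blocks,
three replicas) looks like this:
<configuration>
<property>
<name>dfs.blocksize</name>
<value>134217728</value> <!-- 128 MB -->
</property>
<property>
<name>dfs.replication</name>
<value>3</value> <!-- three copies of each block -->
</property>
</configuration>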
Hadoop Ecosystem:
Hadoop's ecosystem consists of various tools and frameworks that extend Hadoop's capabilities,
making it suitable for a wide range of data processing tasks. Some key components of the
Hadoop ecosystem include:
1. Hive: Hive is a data warehousing and SQL-like query language for Hadoop. It allows
users to query and analyze data using HiveQL.
2. Pig: Pig is a platform for analyzing large data sets. It provides a high-level scripting
language, Pig Latin, for data analysis.
3. HBase: HBase is a NoSQL database that provides real-time read and write access to large
datasets. It's suitable for applications requiring low-latency data access.
4. Sqoop: Sqoop is used for transferring data between Hadoop and relational databases. It
simplifies the data import/export process.
5. Flume: Flume is a service for collecting, aggregating, and moving large amounts of log
data to HDFS. It's commonly used for ingesting data from various sources.
6. Oozie: Oozie is a workflow scheduler for Hadoop jobs. It allows you to define, schedule,
and manage data workflows in Hadoop.
7. ZooKeeper: ZooKeeper is a distributed coordination service used for maintaining
configuration information, naming, providing distributed synchronization, and providing
group services.
Case Study: XYZ Retail. XYZ Retail is a hypothetical retailer that struggled to store and
analyze fast-growing data from its web logs, point-of-sale systems, and social media channels.
Hadoop Solution: XYZ Retail decided to implement Hadoop as a solution to these data
challenges. Here's how they used Hadoop:
1. Data Ingestion: They used Apache Flume to collect data from various sources, including
web logs, point-of-sale systems, and social media.
2. Data Storage: Data was stored in HDFS, which provided a reliable and scalable storage
solution.
3. Data Processing: They used Hadoop MapReduce and Hive to process and analyze the
data. MapReduce helped extract and transform data, while Hive allowed analysts to run
SQL-like queries.
4. Real-time Data: To handle real-time data, they used HBase, which provided low-latency
access to data.
5. Data Integration: Sqoop was used to move data between Hadoop and their existing
relational databases.
6. Workflow Automation: Oozie was employed to schedule and manage data workflows,
ensuring that jobs ran at the right time and in the correct sequence.
7. Data Visualization: For data visualization and reporting, they integrated Hadoop with a
BI tool like Tableau or Power BI.
Results: XYZ Retail was able to gain valuable insights into customer behavior, optimize
inventory management, and make data-driven decisions. Their data processing became more
efficient, and they were able to handle both batch and real-time data effectively.
This hypothetical case study illustrates how a company like XYZ Retail can leverage the Hadoop
ecosystem to address data challenges and drive business improvements.
2. Perform setting up and installing single node Hadoop in a Windows
environment.
Prerequisites:
Before setting up and installing single node Hadoop in a Windows environment, ensure
you have the following prerequisites:
Java 8: Download and install the Java 8 Development Kit (JDK) from
https://www.oracle.com/java/technologies/javase/javase8-archive-downloads.html.
7-Zip: Download and install 7-Zip, a file archiver, from https://www.7-zip.org/a/.
Steps:
1. Download Hadoop: Download the latest stable version of Hadoop from the Apache Hadoop
website: https://hadoop.apache.org/releases.html.
2. Extract Hadoop: Extract the downloaded Hadoop archive to a suitable location, for example,
C:\hadoop.
3. Configure Environment Variables:
I. Open System Properties by searching for it in the Start menu.
II. On the Advanced tab, click Environment Variables.
III. Create the system variables JAVA_HOME (pointing to the JDK directory, e.g.
C:\Java\jdk1.8.0_261) and HADOOP_HOME (pointing to C:\hadoop).
IV. Under System variables, select the Path variable and click Edit.
V. Add the following paths to the Variable value field, separated by semicolons:
C:\hadoop\bin
C:\Java\jdk1.8.0_261\bin
VI. Click OK to save the changes.
4. Configure Hadoop Configuration Files:
1. Open the following configuration files in a text editor:
C:\hadoop\etc\hadoop\core-site.xml
C:\hadoop\etc\hadoop\hdfs-site.xml
C:\hadoop\etc\hadoop\mapred-site.xml
C:\hadoop\etc\hadoop\yarn-site.xml
2. Edit the configuration properties as needed. For a single-node cluster, a minimal setup
points the default filesystem at localhost and sets the replication factor to 1, as sketched
below.
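A minimal sketch of core-site.xml and hdfs-site.xml for a single-node cluster
(hdfs://localhost:9000 is the conventional default filesystem address; adjust if yours differs):
<!-- core-site.xml: point the default filesystem at the local NameNode -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
<!-- hdfs-site.xml: a single node can only hold one replica of each block -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>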
5. Format the NameNode:
1. Open a command prompt and navigate to the Hadoop bin directory:
cd C:\hadoop\bin
2. Execute the following command to format the NameNode:
hdfs namenode -format
6. Start the Hadoop Cluster:
1. Execute the following commands from the C:\hadoop\sbin directory to start HDFS and
YARN:
start-dfs.cmd
start-yarn.cmd
7. Verify the Hadoop Cluster:
1. Execute the following command to verify the status of the Hadoop cluster:
jps
2. You should see processes such as the following running:
NameNode
DataNode
ResourceManager
NodeManager
Additional Notes:
To stop the Hadoop cluster, execute the following commands from C:\hadoop\sbin:
stop-dfs.cmd
stop-yarn.cmd
To view Hadoop logs, navigate to the Hadoop logs directory:
C:\hadoop\logs
3. To implement the following file management tasks in Hadoop System (HDFS):
Adding files and directories, retrieving files, Deleting files
To create a directory in HDFS, use the hadoop fs -mkdir command. For example, to create the
directory /user/hadoop/data, use the following command:
hadoop fs -mkdir -p /user/hadoop/data
To add a file to HDFS, use the hadoop fs -put command. For example, to add the file myfile.txt
to the directory /user/hadoop/data, use the following command:
hadoop fs -put myfile.txt /user/hadoop/data
To retrieve a file from HDFS, use the hadoop fs -get command. For example, to retrieve the
file /user/hadoop/data/myfile.txt to the local filesystem, use the following command:
hadoop fs -get /user/hadoop/data/myfile.txt
To delete a file from HDFS, use the hadoop fs -rm command. For example, to delete the file
/user/hadoop/data/myfile.txt, use the following command:
hadoop fs -rm /user/hadoop/data/myfile.txt
4. Create a database ‘STD’ and make a collection (e.g. "student" with fields 'No.,
Stu_Name, Enroll., Branch, Contact, e-mail, Score') using MongoDB. Perform
various operations in the following experiments.
To create a database named "STD" and a collection named "student" with the specified fields in
MongoDB, you can follow these steps. MongoDB is a NoSQL database, and you can interact
with it using a MongoDB client or command-line tools. Below, I'll provide instructions for
creating the database and collection using the MongoDB shell, which is a command-line
interface for MongoDB.
Here are the steps to create the database 'STD', create the collection 'student' with the fields
'No., Stu_Name, Enroll., Branch, Contact, e-mail, Score', and insert a sample document:
use STD
db.createCollection("student")
db.student.insertOne({
"No": 1,
"Stu_Name": “rahul shrivastava",
"Enroll": "0827CS201194",
"Branch": "Computer Science",
"Contact": "1234567890",
"e-mail": "[email protected]",
"Score": 95
})
5. Insert multiple records (at least 10) into the created student collection.
db.student.insertMany([
{
"No": 1,
"Stu_Name": “rahul shrivastava",
"Enroll": "0827CS201194",
"Branch": "Computer Science",
"Contact": "1234567890",
"e-mail": "[email protected]",
"Score": 95
},
{
"No": 2,
"Stu_Name": "Akshay Keswani",
"Enroll": "0827CS201022",
"e-mail":
"[email protected]",
"Score": 88
},
{
"No": 3,
"Stu_Name": "Alokit Sharma",
"Enroll": "0827CS201023",
"Branch": "Computer Science",
"Contact": "1234567891",
"e-mail": "[email protected]",
"Score": 75
},
{
"No": 4,
"Stu_Name": "Aditya Sharma",
"Enroll": "0827CS201016",
"Branch": "Computer Science",
"Contact": "1234567891",
"e-mail":
"[email protected]",
pg. 10
"Score": 92 },
{
"No": 5,
"Stu_Name": "Akshat Singh Gour",
"Enroll": "0827CS201020",
"Branch": "Computer Science",
"Contact": "1234567891",
"e-mail": "[email protected]",
"Score": 80
},
{
"No": 6,
"Stu_Name": "Aayush Gupta",
"Enroll": "0827CS201006",
"e-mail": "[email protected]",
"Score": 87
},
{
"No": 7,
"Stu_Name": "Amit Kumar Yadav",
"Enroll": "0827CS201031",
"Branch": "Computer Science",
"Contact": "1234567891",
"e-mail": "[email protected]",
"Score": 78
},
{
"No": 8,
"Stu_Name": "Aryan Tapkire",
"Enroll": "0827CS201044",
"Branch": "Computer Science",
"Contact": "1234567891",
"e-mail": "[email protected]",
"Score": 91
},
{
"No": 9,
"Stu_Name": "Devesh Sharma",
"Enroll": "0827CS201068",
"Branch": "Computer Science",
"Contact": "1234567891",
"e-mail": "[email protected]",
"Score": 85
},
{
"No": 10,
"Stu_Name": "Asit Joshi",
"Enroll": "0827CS201042",
"Contact": "1234567891",
"e-mail": "[email protected]",
"Score": 89
}
])
6. Execute the following queries on the collection created.
a. Display data in proper format.
b. Update the contact information of a specific student.
c. Add a new field remark to the document with the name 'REM'.
d. Add a new record (no 11, stu_name XYZ, Enroll 00101, branch VB, e-mail
[email protected], Contact 098675345) without using an insert statement.
a. Display data in proper format:
db.student.find().pretty()
This query will retrieve all documents from the 'student' collection and display them in a
formatted manner.
b. Update the contact information of a specific student:
db.student.updateOne(
{ "Stu_Name": "Rahul Shrivastava" },
{ $set: { "Contact": "9999999999" } }
)
This query will update the contact information for the student named 'Rahul Shrivastava', who
exists in the collection. It uses the $set operator to change the value of the 'Contact' field to
'9999999999'.
c. Add a new field remark to the document with the name 'REM':
db.student.updateOne(
{ "Stu_Name": "REM" },
{$set: { "remark": "This is a remark" }}
)
This query will add a new field named 'remark' to the document for the student named 'REM'. It
uses the $set operator to set the value of the 'remark' field to 'This is a remark'.
d. Add a new record with no 11, stu_name XYZ, Enroll 00101, branch VB, e-mail xyz@xyz,
Contact 098675345 without using an insert statement:
db.student.updateOne(
{ "No": 11 },
{ $set: {
"Stu_Name": "XYZ",
"Enroll": "00101",
"Branch": "VB",
"e-mail": "xyz@xyz",
"Contact": "098675345"
}},
{ upsert: true }
)
This query adds the new student without an insert statement by using updateOne with the
upsert option: because no document with "No": 11 exists, MongoDB creates one and applies
the fields given in $set. (The $push operator would not work here, since it only appends
values to arrays.)
7. Create an employee table in MongoDB with 4 departments and 25 employees
equally divided along with one manager. The following fields should be added:
Employee_ID, Dept_ID, First_Name, Last_Name, Salary (range between
20K-60K). Now run the following queries:
a. Find all the employees of a particular department where salary lies < 40K.
b. Find the highest salary for each department and fetch the name of such
employees.
c. Find all the employees who are on a lesser salary than 30k; increase their salary
by 10% and display the results.
db.employee.insertMany([
{ Employee_ID: 1, Dept_ID: 1, First_Name: "John", Last_Name: "Doe", Salary: 50000 },
{ Employee_ID: 2, Dept_ID: 1, First_Name: "Jane", Last_Name: "Smith", Salary: 48000 },
{ Employee_ID: 3, Dept_ID: 1, First_Name: "Bob", Last_Name: "Johnson", Salary: 35000 },
// ... employees 4 through 25, divided equally across Dept_ID 1 to 4, follow the same pattern ...
{ Employee_ID: 26, Dept_ID: 4, First_Name: "Manager", Last_Name: "Smith", Salary: 60000 }
])
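Typing all 26 documents by hand is tedious. As a sketch for the mongo/mongosh shell (the
names and the random salary spread are illustrative placeholders), the collection can instead be
generated with a loop:
// Generate employees 1-25, spread across departments 1-4,
// with random salaries between 20,000 and 60,000
var docs = [];
for (var i = 1; i <= 25; i++) {
docs.push({
Employee_ID: i,
Dept_ID: ((i - 1) % 4) + 1,
First_Name: "First" + i,
Last_Name: "Last" + i,
Salary: 20000 + Math.floor(Math.random() * 40001)
});
}
// Add one manager as the 26th document
docs.push({ Employee_ID: 26, Dept_ID: 4, First_Name: "Manager", Last_Name: "Smith", Salary: 60000 });
db.employee.insertMany(docs);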
2. Run the Specified Queries:
a. Find all employees of a particular department where salary is less than 40K:
For example, to find employees in Department 1 with a salary less than 40K:
db.employee.find({ Dept_ID: 1, Salary: { $lt: 40000 } })
b. Find the highest salary for each department and fetch the names of such employees:
db.employee.aggregate([
{ $group: {
_id: "$Dept_ID",
maxSalary: { $max: "$Salary" }
}}
])
To fetch the names of employees with the highest salary in each department, you'll need to use a
more complex aggregation query.
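One way to write that query is sketched below; it sorts by salary first, so $first picks the top
earner in each department (if two employees tie, only one of them is returned):
db.employee.aggregate([
// Sort by salary descending so the highest-paid employee comes first
{ $sort: { Salary: -1 } },
// Group by department, keeping the first (highest) salary and that employee's name
{ $group: {
_id: "$Dept_ID",
maxSalary: { $first: "$Salary" },
First_Name: { $first: "$First_Name" },
Last_Name: { $first: "$Last_Name" }
}}
])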
c. Find all employees who are on a salary less than 30K, increase their salary by 10%, and
display the results:
To find employees with a salary less than 30K and increase their salary by 10%:
db.employee.find({ Salary: { $lt: 30000 } }).forEach(function(employee) {
employee.Salary *= 1.10; // increase the salary by 10%
db.employee.updateOne(
{ _id: employee._id },
{ $set: { Salary: employee.Salary } }
);
printjson(employee); // display the updated document
})
This code finds every employee with a salary below 30K, raises the salary by 10% with
updateOne (the older db.employee.save() helper is deprecated in current shells), and prints
each updated document.
8. To design and implement a social network graph of 50 nodes and edges between
nodes using the networkx library in Python.
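A minimal sketch using the networkx and matplotlib libraries (node styling and the choice of
random edge generation are illustrative):
import random
import networkx as nx
import matplotlib.pyplot as plt

# Build an undirected graph with 50 nodes numbered 0 to 49
G = nx.Graph()
G.add_nodes_from(range(50))

# Add 50 edges between randomly selected pairs of distinct nodes
while G.number_of_edges() < 50:
    u, v = random.sample(range(50), 2)
    G.add_edge(u, v)

# Draw the graph with node labels and display it
nx.draw(G, with_labels=True, node_color="skyblue", node_size=300, font_size=8)
plt.show()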
This code will create a graph with 50 nodes and 50 edges. The nodes will be numbered from 0 to
49, and the edges will be between randomly selected pairs of nodes. The graph will be drawn to
the screen using the NetworkX draw function.
Figure: Social network graph of 50 nodes and edges
As you can see, the code successfully creates a social network graph of 50 nodes and edges. The
graph is drawn to the screen using the NetworkX draw function.
9. Design and plot an asymmetric social network (socio graph) of 5 nodes (A, B,
C, D, and E) such that A is directed to B, B is directed to D, D is directed to A, and
D is directed to C.
Here's the code for the asymmetric social network you described:
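A minimal sketch using networkx and matplotlib (styling choices are illustrative):
import networkx as nx
import matplotlib.pyplot as plt

# Directed graph with 5 nodes; E remains isolated
G = nx.DiGraph()
G.add_nodes_from(["A", "B", "C", "D", "E"])

# Directed edges: A->B, B->D, D->A, D->C
G.add_edges_from([("A", "B"), ("B", "D"), ("D", "A"), ("D", "C")])

# Draw with arrowheads to show edge direction
nx.draw(G, with_labels=True, node_color="lightgreen", node_size=800, arrows=True)
plt.show()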
This code will create a directed graph with 5 nodes and 4 directed edges. The nodes will be
labeled A, B, C, D, and E, and the directed edges will be from A to B, B to D, D to A, and D to C.
The graph will be drawn to the screen using the NetworkX draw function.
In this sociograph:
Node A is connected to B, forming a directed edge from A to B.
Node B is connected to D, forming a directed edge from B to D.
Node D is connected to A and C, forming directed edges from D to A and from D to C.
Node E is not connected to any of the other nodes in the network.
This visual representation should help you understand the asymmetric social network with the
specified connections.
10. Consider the above scenario (No. 09) and plot a weighted asymmetric graph,
the weight range is between 20 to 50.
To create a weighted asymmetric graph with the specified connections and edge weights
between 20 and 50, you can use NetworkX in Python. Here's how you can design and plot the
weighted graph:
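A minimal sketch extending the experiment 9 graph (weights are drawn uniformly at random
from 20 to 50; layout and styling are illustrative):
import random
import networkx as nx
import matplotlib.pyplot as plt

# Directed graph with the edges from experiment 9
G = nx.DiGraph()
for u, v in [("A", "B"), ("B", "D"), ("D", "A"), ("D", "C")]:
    G.add_edge(u, v, weight=random.randint(20, 50))
G.add_node("E")  # E remains isolated

# Draw the graph and label each edge with its weight
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=800, arrows=True)
labels = nx.get_edge_attributes(G, "weight")
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()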
This code creates a directed graph with 5 nodes (A, B, C, D, and E) and the specified directed
edges with random edge weights between 20 and 50. The resulting plot will visualize the
weighted asymmetric social network graph with labeled edge weights.
11. Implement betweenness measure between nodes across the social network.
(Assume the social network of 10 nodes)
To calculate the betweenness centrality measure between nodes in a social network using Python,
you can utilize the NetworkX library. Below is an example that builds a small social network of
10 nodes and calculates the betweenness centrality of each node.
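A minimal sketch (the edge list below is an assumed example network; any connected 10-node
graph works):
import networkx as nx

# Build an example social network of 10 nodes
G = nx.Graph()
G.add_edges_from([
    (1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
    (5, 6), (6, 7), (7, 8), (8, 9), (9, 10), (4, 7)
])

# Compute betweenness centrality for every node
centrality = nx.betweenness_centrality(G)
for node, value in sorted(centrality.items()):
    print("Node", node, ":", round(value, 4))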
In this code, we create a small example social network with 10 nodes and edges. The
nx.betweenness_centrality function is used to calculate the betweenness centrality for each node.
The result is printed, showing the betweenness centrality values for each node.
In this output, each node is listed, and its corresponding betweenness centrality value is
displayed. The betweenness centrality values provide information about the importance of each
node in the network regarding the flow of information or interactions. Nodes with higher
betweenness centrality values can be considered more influential in connecting different parts of
the network.