BDA Lab Manual
Course Objectives
1. The purpose of this course is to provide students with knowledge of Big Data Analytics principles and techniques.
2. This course is also designed to give exposure to the frontiers of Big Data Analytics.
Course Outcomes
1. Use Excel as an analytical and visualization tool.
2. Ability to program using Hadoop and MapReduce.
3. Ability to perform data analytics using ML in R.
4. Use Cassandra to perform social media analytics.
List of Experiments
1. Implement a simple map-reduce job that builds an inverted index on the set of input
documents (Hadoop)
2. Process big data in HBase
3. Store and retrieve data in Pig
4. Perform social media analysis using Cassandra
5. Buyer event analytics using Cassandra on suitable product sales data.
6. Using Power Pivot (Excel), perform the following on any dataset
a) Big Data Analytics
b) Big Data Charting
7. Use R-Project to carry out statistical analysis of big data
8. Use R-Project for data visualization of social media data
TEXT BOOKS:
1. Big Data Analytics, Seema Acharya, Subhashini Chellappan, Wiley 2015.
2. Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Business, Michael Minelli, Michele Chambers, Ambiga Dhiraj, 1st Edition, Wiley CIO Series, 2013.
3. Hadoop: The Definitive Guide, Tom White, 3rd Edition, O'Reilly Media, 2012.
4. Big Data Analytics: Disruptive Technologies for Changing the Game, Arvind Sathi, 1st Edition, IBM Corporation, 2012.
REFERENCES:
1. Big Data and Business Analytics, Jay Liebowitz, Auerbach Publications, CRC press (2013).
2. Using R to Unlock the Value of Big Data: Big Data Analytics with Oracle R Enterprise and
Oracle R Connector for Hadoop, Tom Plunkett, Mark Hornick, McGraw-Hill/Osborne Media
(2013), Oracle press.
3. Professional Hadoop Solutions, Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Wiley, ISBN: 9788126551071, 2015.
4. Understanding Big Data, Chris Eaton, Dirk deRoos et al., McGraw Hill, 2012.
5. Intelligent Data Analysis, Michael Berthold, David J. Hand, Springer, 2007.
6. Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams
with Advanced Analytics, Bill Franks, 1st Edition, Wiley and SAS Business Series,
2012.
Experiment-01
Implement Hadoop step-by-step
Preparations
A. Make sure that you are using Windows 10 and are logged in as an administrator.
G. Run the Java installation file jdk-8u191-windows-x64. Install it directly into the folder C:\Java, or move the items from the folder jdk1.8.0 to the folder C:\Java. It should look like this:
A new window will open with two tables and buttons. The upper table is for User
variables and the lower for System variables.
B. Make a New User variable [1]. Name it JAVA_HOME and set it to the Java bin-folder [2].
Click OK [3].
C. Make another New User variable [1]. Name it HADOOP_HOME and set it to the
hadoop-2.8.0 bin-folder [2]. Click OK [3].
D. Now add Java and Hadoop to the System variables Path: go to Path [1] and click Edit [2]. The editor window opens. Choose New [3] and add the address C:\Java\bin [4]. Choose New again [5] and add the address C:\hadoop-2.8.0\bin [6]. Click OK [7] in the editor window and OK [8] to save the System variables.
1.1 Configuration
A. Go to the file C:\hadoop-2.8.0\etc\hadoop\core-site.xml [1]. Right-click on the file and edit it with Notepad++ [2]. Add the following property inside the <configuration> tags:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
Edit mapred-site.xml in the same folder and add:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
D. Under C:\hadoop-2.8.0 create a folder named data [1] with two subfolders, “datanode” and “namenode” [2]. Then edit hdfs-site.xml and add:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-2.8.0\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-2.8.0\data\datanode</value>
</property>
Finally, edit yarn-site.xml and add:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
B. Delete the bin folder C:\hadoop-2.8.0\bin [1, 2] and replace it with the new bin folder from Hadoop Configuration.zip [3].
1.2 Testing
A. Search for cmd [1] and open the Command Prompt [2]. Type
hdfs namenode -format [3] and press Enter.
If this first test works, the Command Prompt will print a lot of information. That is a good sign!
B. Now change directory in the Command Prompt. Type cd C:\hadoop-2.8.0\sbin
and press Enter. In the sbin folder, type start-all.cmd and press Enter.
If the configuration is right, four windows will start running (NameNode, DataNode, ResourceManager and NodeManager) and it will look something like this:
C. Now open a browser, type localhost:8088 in the address field and press Enter. Can you see the little Hadoop elephant? Then you have done a really good job!
D. Last test - try localhost:50070 instead.
If you can see the overview page, you have implemented Hadoop on your PC.
Congratulations, you did it!
***********************
To close the running programs, run stop-all.cmd in the Command Prompt.
Experiment-02
Hadoop Shell Commands
1. DFShell
The HDFS shell is invoked by bin/hadoop dfs <args>. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional; if not specified, the default scheme given in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in the HDFS shell behave like the corresponding Unix commands; differences are described with each command. Error information is sent to stderr and the output is sent to stdout.
2. cat
Usage: hadoop dfs -cat URI [URI …]
Copies source paths to stdout. Example:
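The paths below are illustrative, following the sample paths used for the other commands in this section:
• hadoop dfs -cat /user/hadoop/file1 /user/hadoop/file2
• hadoop dfs -cat hdfs://host:port/user/hadoop/file3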
Exit Code:
Returns 0 on success and -1 on error.
3. chgrp
Usage: hadoop dfs -chgrp [-R] GROUP URI [URI …]
Change group association of files. With -R, make the change recursively through the directory
structure. The user must be the owner of files, or else a super-user. Additional information is in
the Permissions User Guide.
4. chmod
Usage: hadoop dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI …]
Change the permissions of files. With -R, make the change recursively through the directory structure.
The user must be the owner of the file, or else a super-user. Additional information
is in the Permissions User Guide.
5. chown
Usage: hadoop dfs -chown [-R] [OWNER][:[GROUP]] URI [URI …]
Change the owner of files. With -R, make the change recursively through the directory structure.
The user must be a super-user. Additional information is in the Permissions User Guide.
6. copyFromLocal
Usage: hadoop dfs -copyFromLocal <localsrc> URI
Similar to put command, except that the source is restricted to a local file reference.
7. copyToLocal
Usage: hadoop dfs -copyToLocal [-ignorecrc] [-crc] URI
<localdst>
Similar to get command, except that the destination is restricted to a local file reference.
8. cp
Usage: hadoop dfs -cp URI [URI …] <dest>
Copy files from source to destination. This command allows multiple sources as well in which
case the destination must be a directory.
Example:
• hadoop dfs -cp /user/hadoop/file1 /user/hadoop/file2
• hadoop dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.
9. du
Usage: hadoop dfs -du URI [URI …]
Displays the aggregate length of files contained in the directory, or the length of a file in case it's just a file.
Example:
hadoop dfs -du /user/hadoop/dir1 /user/hadoop/file1
hdfs://host:port/user/hadoop/dir1
Exit Code:
Returns 0 on success and -1 on error.
10. dus
Usage: hadoop dfs -dus <args>
Displays a summary of file lengths.
11. expunge
Usage: hadoop dfs -expunge
Empty the Trash. Refer to HDFS Design for more information on Trash feature.
12. get
Usage: hadoop dfs -get [-ignorecrc] [-crc] <src> <localdst>
Copy files to the local file system. Files that fail the CRC check may be copied with the
-ignorecrc option. Files and CRCs may be copied using the -crc option. Example:
• hadoop dfs -get /user/hadoop/file localfile
• hadoop dfs -get hdfs://host:port/user/hadoop/file localfile
Exit Code:
Returns 0 on success and -1 on error.
13. getmerge
Usage: hadoop dfs -getmerge <src> <localdst> [addnl]
Takes a source directory and a destination file as input and concatenates files in src into the destination
local file. Optionally addnl can be set to enable adding a newline character at the end of each file.
14. ls
Usage: hadoop dfs -ls <args>
For a file returns stat on the file with the following format:
filename <number of replicas> filesize modification_date
modification_time permissions userid groupid
For a directory it returns a list of its direct children, as in Unix. A directory is listed as:
dirname <dir> modification_date modification_time permissions userid groupid
Example:
hadoop dfs -ls /user/hadoop/file1 /user/hadoop/file2 hdfs://host:port/user/hadoop/dir1 /nonexistentfile
Exit Code:
Returns 0 on success and -1 on error.
15. lsr
Usage: hadoop dfs -lsr <args>
Recursive version of ls. Similar to Unix ls -R.
16. mkdir
Usage: hadoop dfs -mkdir <paths>
Takes path uri's as argument and creates directories. The behavior is much like unix mkdir -p creating
parent directories along the path.
Example:
• hadoop dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
• hadoop dfs -mkdir hdfs://host1:port1/user/hadoop/dir
hdfs://host2:port2/user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.
17. moveFromLocal
Usage: dfs -moveFromLocal <src> <dst>
Displays a "not implemented" message.
18. mv
Usage: hadoop dfs -mv URI [URI …] <dest>
Moves files from source to destination. This command allows multiple sources as well in which case
the destination needs to be a directory. Moving files across filesystems is not permitted.
Example:
• hadoop dfs -mv /user/hadoop/file1 /user/hadoop/file2
• hadoop dfs -mv hdfs://host:port/file1 hdfs://host:port/file2 hdfs://host:port/file3
hdfs://host:port/dir1
Exit Code:
Returns 0 on success and -1 on error.
19. put
Usage: hadoop dfs -put <localsrc> ... <dst>
Copy single src, or multiple srcs from local file system to the destination filesystem. Also reads input
from stdin and writes to destination filesystem.
• hadoop dfs -put localfile /user/hadoop/hadoopfile
• hadoop dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
• hadoop dfs -put localfile hdfs://host:port/hadoop/hadoopfile
• hadoop dfs -put - hdfs://host:port/hadoop/hadoopfile
Reads the input from stdin.
Exit Code:
Returns 0 on success and -1 on error.
20. rm
Usage: hadoop dfs -rm URI [URI …]
Delete files specified as args. Only deletes files and empty directories; refer to rmr for recursive deletes.
Example:
• hadoop dfs -rm hdfs://host:port/file /user/hadoop/emptydir
Exit Code:
Returns 0 on success and -1 on error.
21. rmr
Usage: hadoop dfs -rmr URI [URI …]
Recursive version of delete. Example:
• hadoop dfs -rmr /user/hadoop/dir
• hadoop dfs -rmr hdfs://host:port/user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.
22. setrep
Usage: hadoop dfs -setrep [-R] [-w] <rep> <path>
Changes the replication factor of a file. -R option is for recursively increasing the replication factor of
files within a directory.
Example:
• hadoop dfs -setrep -w 3 -R /user/hadoop/dir1
Exit Code:
Returns 0 on success and -1 on error.
23. stat
Usage: hadoop dfs -stat URI [URI …]
Returns the stat information on the path. Example:
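With an illustrative path, following the other examples in this section:
• hadoop dfs -stat /user/hadoop/dir1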
24. tail
Usage: hadoop dfs -tail [-f] URI
Displays last kilobyte of the file to stdout. -f option can be used as in Unix. Example:
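With an illustrative path:
• hadoop dfs -tail /user/hadoop/file1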
25. test
Usage: hadoop dfs -test -[ezd] URI
Options:
-e check to see if the file exists. Return 0 if true.
-z check to see if the file is zero length. Return 0 if true.
-d check whether the path is a directory; returns 1 if it is, else 0.
Example:
• hadoop dfs -test -e filename
26. text
Usage: hadoop dfs -text <src>
Takes a source file and outputs the file in text format. The allowed formats are zip and
TextRecordInputStream.
27. touchz
Usage: hadoop dfs -touchz URI [URI …]
Create a file of zero length. Example:
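An illustrative invocation:
• hadoop dfs -touchz /user/hadoop/emptyfile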
Experiment-03
Implement a simple MapReduce job that builds an inverted index on the set of input documents (Hadoop)
Theory:
Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Hadoop Architecture
As the sequence of the name MapReduce implies, the reduce task is always
performed after the map job.
The major advantage of MapReduce is that it is easy to scale data processing over
multiple computing nodes. Under the MapReduce model, the data processing
primitives are called mappers and reducers.
Decomposing a data processing application into mappers and reducers is
sometimes nontrivial. But, once we write an application in the MapReduce form,
scaling the application to run over hundreds, thousands, or even tens of thousands
of machines in a cluster is merely a configuration change. This simple scalability
is what has attracted many programmers to use the MapReduce model.
The Algorithm
The MapReduce framework operates on <key, value> pairs, that is, the framework
views the input to the job as a set of <key, value> pairs and produces a set of <key,
value> pairs as the output of the job, conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework. The input and output types of a MapReduce job are:
(Input) <k1, v1> → map → <k2, v2> → reduce → <k3, v3> (Output)
cd Desktop
mkdir Lab
mkdir Lab/Input
mkdir Lab/tutorial_classes
4. Add the file attached with this document “input.txt” in the directory Lab/Input.
5. Type the following command to export the hadoop classpath into bash.
export HADOOP_CLASSPATH=$(hadoop classpath)
Make sure it is now exported.
echo $HADOOP_CLASSPATH
6. It is time to create these directories on HDFS rather than locally. Type the following commands.
hadoop fs -mkdir /WordCountTutorial
hadoop fs -mkdir /WordCountTutorial/Input
hadoop fs -put Lab/Input/input.txt /WordCountTutorial/Input
7. Go to localhost:9870 in the browser, open “Utilities → Browse File System” and you should see the directories and files we placed in the file system.
8. Then, go back to the local machine, where we will compile the WordCount.java file. Assume we are currently in the Desktop directory.
cd Lab
javac -classpath $HADOOP_CLASSPATH -d tutorial_classes WordCount.java
Put the output files in one jar file (note the dot at the end):
jar -cvf WordCount.jar -C tutorial_classes .
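Note: the WordCount.java file compiled above is not reproduced in this manual. A minimal word-count job consistent with those commands might look like the sketch below (it follows the standard Hadoop example; class and variable names are assumptions):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token of every input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With that class (kept in the default package), the job can typically be run as:
hadoop jar WordCount.jar WordCount /WordCountTutorial/Input /WordCountTutorial/Output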
Program:
First create the IndexMapper.java class
package mr03.inverted_index;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class IndexMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Text wordAtFileNameKey = new Text();
    private static final Text ONE_STRING = new Text("1");

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The input split tells us which document this line came from
        FileSplit split = (FileSplit) context.getInputSplit();
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            String fileName = split.getPath().getName().split("\\.")[0];
            // Optionally remove special characters using
            // tokenizer.nextToken().replaceAll("[^a-zA-Z]", "").toLowerCase()
            // and check for empty words
            wordAtFileNameKey.set(tokenizer.nextToken() + "@" + fileName);
            context.write(wordAtFileNameKey, ONE_STRING);
        }
    }
}
IndexReducer.java
package mr03.inverted_index;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
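The reducer body itself is not reproduced in the manual. A complete minimal sketch, consistent with the mapper and combiner in this program (the output separator is an assumption), is:

package mr03.inverted_index;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IndexReducer extends Reducer<Text, Text, Text, Text> {

    private final Text result = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Concatenate the "fileName:count" postings produced by the combiner
        StringBuilder postings = new StringBuilder();
        for (Text value : values) {
            if (postings.length() > 0) {
                postings.append(" | ");  // separator between documents (assumed)
            }
            postings.append(value.toString());
        }
        result.set(postings.toString());
        context.write(key, result);
    }
}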
IndexDriver.java
package mr03.inverted_index;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IndexDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Input and output paths are taken from the command line
        String input = args[0];
        String output = args[1];

        // Delete the output directory if it already exists
        FileSystem fs = FileSystem.get(conf);
        boolean exists = fs.exists(new Path(output));
        if (exists) {
            fs.delete(new Path(output), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(IndexDriver.class);
        job.setMapperClass(IndexMapper.class);
        job.setCombinerClass(IndexCombiner.class);
        job.setReducerClass(IndexReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
IndexCombiner.java
package mr03.inverted_index;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IndexCombiner extends Reducer<Text, Text, Text, Text> {

    private final Text fileAtWordFreqValue = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum the "1" counts emitted by the mapper for this word@fileName key
        int sum = 0;
        for (Text value : values) {
            sum += Integer.parseInt(value.toString());
        }
        // Split "word@fileName" back apart and emit (word, "fileName:count")
        int splitIndex = key.toString().indexOf("@");
        fileAtWordFreqValue.set(key.toString().substring(splitIndex + 1) + ":" + sum);
        key.set(key.toString().substring(0, splitIndex));
        context.write(key, fileAtWordFreqValue);
    }
}
Output:
Experiment 4. Process big data in HBase
Aim: To create a table and process big data in HBase
Resources: Hadoop, Oracle VirtualBox, HBase
Theory:
HBase is an open-source, sorted-map data store built on top of Hadoop. It is column-oriented and horizontally scalable.
It is based on Google's Bigtable. It has a set of tables which keep data in key-value format. HBase is well suited for sparse data sets, which are very common in big data use cases. HBase provides APIs enabling development in practically any programming language. It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in the Hadoop File System.
Why HBase rather than a traditional RDBMS:
● RDBMSs get exponentially slower as the data becomes large
● They expect data to be highly structured, i.e. to fit a well-defined schema
● Any change in schema might require downtime
● For sparse datasets, there is too much overhead in maintaining NULL values
Features of HBase
● Horizontally scalable: data is spread across region servers, and you can add nodes (and any number of columns) at any time.
● Automatic failover: automatic failover allows the system to switch data handling to a standby server in the event of a node failure.
● Integration with the Map/Reduce framework: all the commands and Java APIs internally use Map/Reduce to do their work, and HBase is built over the Hadoop Distributed File System.
● It is a sparse, distributed, persistent, multidimensional sorted map, indexed by row key, column key and timestamp.
● It is often referred to as a key-value store, a column-family-oriented database, or as storing versioned maps of maps.
● Fundamentally, it is a platform for storing and retrieving data with random access.
● It doesn't care about datatypes (you can store an integer in one row and a string in another for the same column).
● It doesn't enforce relationships within your data.
● It is designed to run on a cluster of computers, built using commodity hardware.
The Cloudera VM is recommended as it has HBase pre-installed.
Starting HBase: type hbase shell in the terminal to start the HBase shell.
hbase(main):001:0> version
version gives you the version of HBase.
Create Table Syntax
hbase(main):011:0> create 'newtbl','knowledge'
hbase(main):011:0> describe 'newtbl'
hbase(main):011:0> status
1 servers, 0 dead, 15.0000 average load
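Before working with the disable/enable commands below, you can insert and read a few rows. A short illustrative sequence on the table created above (the row key and column names are made up for this example):
hbase(main):012:0> put 'newtbl','r1','knowledge:sports','cricket'
hbase(main):013:0> put 'newtbl','r1','knowledge:science','physics'
hbase(main):014:0> scan 'newtbl'
hbase(main):015:0> get 'newtbl','r1'
hbase(main):016:0> disable 'newtbl'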
Verification
After disabling the table, you can still sense its existence through the list and exists commands, but you cannot scan it. It will give you the following error.
hbase(main):028:0> scan 'newtbl'
ROW COLUMN + CELL
ERROR: newtbl is disabled.
is_disabled
This command is used to find whether a table is disabled. Its syntax is as follows.
hbase> is_disabled 'table name'
disable_all
This command is used to disable all the tables matching the given regex. The syntax for
disable_all command is given below.
hbase> disable_all 'r.*'
Suppose there are 5 tables in HBase, namely raja, rajani, rajendra, rajesh, and raju. The following command will disable all the tables starting with raj.
hbase> disable_all 'raj.*'
raja
rajani
rajendra
rajesh
raju
Disable the above 5 tables (y/n)?
y
5 tables successfully disabled
Verification
After enabling the table (hbase> enable 'newtbl'), scan it. If you can see the schema, your table has been successfully enabled.
is_enabled
This command is used to find whether a table is enabled. Its syntax is as follows:
hbase> is_enabled 'table name'
The following code verifies whether the table named emp is enabled. If it is enabled, it
will return true and if not, it will return false.
hbase(main):031:0> is_enabled 'newtbl'
true
0 row(s) in 0.0440 seconds
describe
This command returns the description of the table. Its syntax is as follows:
hbase> describe 'table name'
Experiment-05: Store and retrieve data in Pig
Theory:
At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties:
● Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
Pig Latin provides relational operators for, among other things, loading and storing data, filtering, sorting, and diagnostics.
Employee table:
ID   Name      Age  City
001  Angelina  22   LosAngeles
002  Jackie    23   Beijing
003  Deepika   22   Mumbai
004  Pawan     24   Hyderabad
005  Rajani    21   Chennai
006  Amitabh   22   Mumbai
Step-1: Create a directory in HDFS with the name pigdir in the required path using mkdir:
$ hdfs dfs -mkdir /bdalab/pigdir
Step-2: The input file of Pig contains each tuple/record on an individual line, with the entities separated by a delimiter (",").
In the local file system, create an input file student_data.txt containing the data shown below:
001,Jagruthi,21,Hyderabad,9.1
002,Praneeth,22,Chennai,8.6
003,Sujith,22,Mumbai,7.8
004,Sreeja,21,Bengaluru,9.2
005,Mahesh,24,Hyderabad,8.8
006,Rohit,22,Chennai,7.8
007,Sindhu,23,Mumbai,8.3
Similarly, create an input file employee_data.txt containing the data shown below:
001,Angelina,22,LosAngeles
002,Jackie,23,Beijing
003,Deepika,22,Mumbai
004,Pawan,24,Hyderabad
005,Rajani,21,Chennai
006,Amitabh,22,Mumbai
Step-3: Move the file from the local file system to HDFS using put (or copyFromLocal) and verify using the -cat command.
To get the path of the file student_data.txt, type: readlink -f student_data.txt
$ hdfs dfs -put /home/hadoop/Desktop/student_data.txt /bdalab/pigdir/
$ hdfs dfs -cat /bdalab/pigdir/student_data.txt
$ hdfs dfs -put /home/hadoop/Desktop/employee_data.txt /bdalab/pigdir/
Step-4: Apply the relational operator LOAD to load the data from the file student_data.txt into Pig by executing the following Pig Latin statements in the Grunt shell. Relational operators are NOT case sensitive.
$ pig   => will take you to the grunt> shell
grunt> student = LOAD '/bdalab/pigdir/student_data.txt' USING PigStorage(',') as ( id:int, name:chararray, age:int, city:chararray, cgpa:double );
grunt> employee = LOAD '/bdalab/pigdir/employee_data.txt' USING PigStorage(',') as ( id:int, name:chararray, age:int, city:chararray );
Step-5: Apply the relational operator STORE to store each relation in the HDFS directory "/pig_output/" as shown below (use a separate output sub-directory per relation, since STORE fails if the target directory already exists).
grunt> STORE student INTO '/bdalab/pigdir/pig_output/student' USING PigStorage(',');
grunt> STORE employee INTO '/bdalab/pigdir/pig_output/employee' USING PigStorage(',');
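To retrieve the data back (the second half of this experiment), the relations can be dumped or filtered in the Grunt shell. A short sketch using the relations defined above:
grunt> DUMP student;
grunt> mumbai_students = FILTER student BY city == 'Mumbai';
grunt> DUMP mumbai_students;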
Experiment-06: Perform social media analysis using Cassandra
Aim: To perform social media analysis using Cassandra
Resources: Cassandra
Procedure:
Cassandra is a distributed database for low-latency, high-throughput services that handle real-time workloads comprising hundreds of updates per second and tens of thousands of reads per second.
If one is coming from a relational database background with strong ACID semantics, then one
must take the time to understand the eventual consistency model.
Understand Cassandra’s architecture very well and what it does under the hood. With
Cassandra 2.0 you get lightweight transaction and triggers, but they are not the same as the
traditional database transactions one might be familiar with. For example, there are no foreign
key constraints available – it has to be handled by one’s own application. Understanding one’s
use cases and data access patterns clearly before modeling data with Cassandra and to read all
the available documentation is a must.
Capture
This command captures the output of a command and adds it to a file. For example, take a look
at the following code that captures the output to a file named Outputfile.
cqlsh> CAPTURE '/home/hadoop/CassandraProgs/Outputfile'
When we type any command in the terminal, the output will be captured by the file
given. Given below is the command used and the snapshot of the output file.
cqlsh:tutorialspoint> select * from emp;
Consistency
This command shows the current consistency level, or sets a new consistency level.
cqlsh:tutorialspoint> CONSISTENCY
Current consistency level is 1.
Copy
This command copies data to and from Cassandra to a file. Given below is an
example to copy the table named emp to the file myfile.
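A typical invocation for the emp table used in these examples (the column list is assumed from the rows shown later in this section):
cqlsh:tutorialspoint> COPY emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) TO 'myfile';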
Describe
This command describes the current cluster of Cassandra and its objects. The variants of this command are explained below.
Describe cluster − This command provides information about the cluster, for example the ring's range ownership:
Range ownership:
-658380912249644557 [127.0.0.1]
-2833890865268921414 [127.0.0.1]
-6792159006375935836 [127.0.0.1]
Describe keyspaces − This command lists all the keyspaces in a cluster. Given below is the usage of this command.
Describe type − This command is used to describe a user-defined data type. Given below is the usage of this command.
Describe types − This command lists all the user-defined data types. Given below is the usage of this command. Assume there are two user-defined data types: card and card_details.
card_details card
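Typical invocations of these variants (the type name is taken from the example above):
cqlsh> DESCRIBE cluster;
cqlsh> DESCRIBE keyspaces;
cqlsh> DESCRIBE TYPE card_details;
cqlsh> DESCRIBE TYPES;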
Expand
This command is used to expand the output. Before using this command, you have to turn the
expand command on. Given below is the usage of this command.
@ Row 1
-----------+------------
emp_id | 1
emp_city | Hyderabad
emp_name | ram
emp_phone | 9848022338
emp_sal | 50000
@ Row 2
-----------+------------
emp_id | 2
emp_city | Delhi
emp_name | robin
emp_phone | 9848022339
emp_sal | 50000
@ Row 3
-----------+------------
emp_id | 4
emp_city | Pune
emp_name | rajeev
emp_phone | 9848022331
emp_sal | 30000
@ Row 4
-----------+------------
emp_id | 3
emp_city | Chennai
emp_name | rahman
emp_phone | 9848022330
emp_sal | 50000
(4 rows)
Note − You can turn the expand option off using the following command.
cqlsh:tutorialspoint> expand off;
Disabled Expanded output.
Exit
This command is used to terminate the cql shell.
Show
This command displays the details of the current cqlsh session, such as the Cassandra version, host, or data type assumptions. Given below is the usage of this command.
Source
Using this command, you can execute the commands in a file. Suppose our input file is as follows −
Then you can execute the file containing the commands as shown below.
Experiment-07: Buyer event analytics using Cassandra on suitable product sales data
Aim: To perform buyer event analytics using Cassandra on sales data
Theory:
Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL
treats the database (Keyspace) as a container of tables. Programmers use cqlsh: a prompt to
work with CQL or separate application language drivers.
Clients approach any of the nodes for their read-write operations. That node (coordinator) plays
a proxy between the client and the nodes holding the data.
Write Operations
Every write activity of nodes is captured by the commit logs written in the nodes. Later the
data will be captured and stored in the mem-table. Whenever the mem-table is full, data will
be written into the SStable data file. All writes are automatically partitioned and replicated
throughout the cluster. Cassandra periodically consolidates the SSTables, discarding
unnecessary data.
Read Operations
During read operations, Cassandra gets values from the mem-table and checks the bloom filter
to find the appropriate SSTable that holds the required data.
Apache is an open-source software foundation, best known for its web server that delivers web-related content over the internet and has gained huge popularity over the last few years as the most used web server software. Cassandra is an open-source database management system with the capacity to handle large amounts of data across servers. It was first developed by Facebook for the inbox search feature and was released as an open-source project back in 2008.
The following year, Cassandra became a part of the Apache incubator, and combined with Apache it has reached new heights. To put it in simple terms, Apache Cassandra is a powerful open-source distributed database system that can work efficiently to handle massive amounts of data.
DATA MODELLING
The way data is modeled is a major difference between Cassandra and MySQL.
Let us consider a platform where users can post, and where you have commented on a post of another user. In these two databases, the information will be stored differently. In Cassandra, you can store the data in a single table: the comments for each user are stored in the form of a collection (such as a list or map) in the user's row.
In MySQL, you have to make two tables with a one-to-many relationship between them, because MySQL does not permit unstructured data such as a list or a map.
READ PERFORMANCE
The query to retrieve the comments made by a user (for example, user 5) in MySQL filters or joins the comments table by the user id. When you utilize indexing in MySQL, the data is kept in a binary tree, so finding the matching rows costs extra lookups. In Cassandra, all the comments of a user are stored together, so they can be fetched with one lookup.
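A sketch of the two retrieval queries (table and column names here are only illustrative, not from the manual):
-- MySQL: comments sit in their own table and are filtered by the user id
SELECT * FROM comments WHERE user_id = 5;
-- Cassandra (CQL): the comments of a user are clustered in one partition
SELECT * FROM comments_by_user WHERE user_id = 5;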
WRITE PERFORMANCE
In MySQL, every update must 1. find the existing row and 2. then update it.
Cassandra leverages an append-only model: insert and update have no fundamental difference. If you insert a row that has the same primary key as an existing row, the row is replaced; if you update a row with a non-existent primary key, Cassandra creates the row. Cassandra is very fast and stores large swathes of data on commodity hardware.
TRANSACTIONS
MySQL facilitates ACID transactions like any other Relational Database Management System:
● Atomicity
● Consistency
● Isolation
● Durability
On the other hand, Cassandra has certain limitations to provide ACID transactions. Cassandra
can achieve consistency if data duplication is not allowed. But, that will kill Cassandra’s
availability. So, the systems that require ACID transactions must avoid NoSQL databases.
Procedure:
cqlsh> INSERT INTO employee (empid, firstname, lastname, gender)
VALUES ('1', 'FN', 'LN', 'M');
cqlsh>
SELECT TTL(name) FROM learn_cassandra.todo_by_user_email WHERE
user_email='[email protected]';
ttl(name)
43
(1 rows)
cqlsh>
SELECT * FROM learn_cassandra.todo_by_user_email WHERE
user_email='[email protected]';
(0 rows)
cqlsh>
INSERT INTO learn_cassandra.todo_by_user_email (user_email, creation_date, name)
VALUES('[email protected]', '2021-03-14 16:07:19.622+0000', 'Insert query');
cqlsh>
UPDATE learn_cassandra.todo_by_user_email SET
name = 'Update query'
WHERE user_email = '[email protected]' AND creation_date = '2021-03-
14 16:10:19.622+0000';
(2 rows)
Let’s only update if an entry already exists, by using IF EXISTS:
cqlsh>
UPDATE learn_cassandra.todo_by_user_email SET
name = 'Update query with LWT'
WHERE user_email = '[email protected]' AND creation_date = '2021-03-
14 16:07:19.622+0000' IF EXISTS;
[applied]
True
Experiment-08 (a): Using Power Pivot (Excel), perform Big Data Analytics on any data set
Aim: To perform big data analytics using Power Pivot in Excel
Theory: Power Pivot is an Excel add-in you can use to perform powerful data analysis and create sophisticated
data models. With Power Pivot, you can mash up large volumes of data from various sources, perform information
analysis rapidly, and share insights easily.
In both Excel and in Power Pivot, you can create a Data Model, a collection of tables with relationships. The data
model you see in a workbook in Excel is the same data model you see in the Power Pivot window. Any data you
import into Excel is available in Power Pivot, and vice versa.
Procedure:
Open Microsoft Excel, go to the Data menu and click Get Data.
Import the Twitter data set and click the Load To button.
Now open the Power Pivot window, click Diagram View and define the relationships between the tables.
Go to the Insert menu and click PivotTable.
Select the columns; you can then perform drill-down and roll-up operations using the pivot table.
(b) Big Data Charting
Aim: To create a variety of charts using Excel for the given data
Resources: Microsoft Excel
Theory:
When your data sets are big, you can use Excel Power Pivot, which can handle hundreds of millions of rows of data. The data can be in external data sources, and Excel Power Pivot builds a Data Model that works in a memory-optimized mode. You can perform calculations, analyze the data and arrive at a report to draw conclusions and decisions. The report can be either a Power PivotTable or a Power PivotChart, or a combination of both.
You can utilize Power Pivot as an ad hoc reporting and analytics solution. Thus, a person with hands-on Excel experience can perform high-end data analysis and decision making in a matter of minutes, which is a great asset for building dashboards.
Click the OK button. New worksheet gets created in Excel window and an empty Power
PivotTable appears.
As you can observe, the layout of the Power PivotTable is similar to that of PivotTable.
The PivotTable Fields list appears on the right side of the worksheet. Here, you will find some differences from an ordinary PivotTable. The Power PivotTable Fields list has two tabs − ACTIVE and ALL − that appear below the title and above the fields list. The ALL tab is highlighted; it displays all the data tables in the Data Model, while the ACTIVE tab displays only the data tables chosen for the Power PivotTable at hand.
● Click the table names in the PivotTable Fields list under ALL. The corresponding fields with check boxes will appear.
● Each table name will have a symbol on its left side.
● If you place the cursor on this symbol, the Data Source and the Model Table Name of that data table will be displayed.
As you can observe, all the tables in the data model are displayed in the PivotChart Fields list. The Field Buttons and the Legend can be hidden if they are not needed; note that the display of Field Buttons and/or Legend depends on the context of the PivotChart, and you need to decide what is required to be displayed.
As in the case of Power PivotTable, Power PivotChart Fields list also contains two tabs − ACTIVE and
ALL. Further, there are 4 areas −
● AXIS (Categories)
● LEGEND (Series)
● ∑ VALUES
● FILTERS
As you can observe, the Legend gets populated with the ∑ Values. Further, Field Buttons get added to the PivotChart for ease of filtering the data that is being displayed. You can click the arrow on a Field Button and select/deselect values to be displayed in the Power PivotChart.
You can have the following Table and Chart Combinations in Power Pivot.
● Chart and Table (Horizontal) − you can create a Power PivotChart and a Power PivotTable, one next to the other horizontally in the same worksheet.
● Chart and Table (Vertical) − you can create a Power PivotChart and a Power PivotTable, one below the other vertically in the same worksheet.
These combinations and some more are available in the dropdown list that appears when you click PivotTable on the Ribbon in the Power Pivot window.
Click on the PivotChart and you can develop a wide variety of charts.
Output:
Experiment-09: Use R-Project to carry out statistical analysis of big data
Procedure:
Step 2: wget -c https://download1.rstudio.org/desktop/jammy/amd64/rstudio-2022.07.2-576-amd64.deb
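Step 3: install the downloaded package, typically with a command along these lines (the exact install tool is an assumption):
sudo apt install ./rstudio-2022.07.2-576-amd64.deb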
Step 4: rstudio
This launches RStudio.
Procedure:
-->install.packages("gapminder")
-->library(gapminder)
-->data(gapminder)
output:
A tibble: 1,704 × 6
-->summary(gapminder)
summary(gapminder)
output:
(Other) 1632
-->x<-mean(gapminder$gdpPercap)
-->x
output:[1] 7215.327
-->attach(gapminder)
-->median(pop)
output:[1] 7023596
-->hist(lifeExp)
-->boxplot(lifeExp)
will plot the images shown below
-->plot(lifeExp ~ gdpPercap)
-->install.packages("dplyr")
-->gapminder %>%
+ filter(year == 2007) %>%
+ group_by(continent) %>%
+ summarise(lifeExp = median(lifeExp))
output:
# A tibble: 5 × 2
continent lifeExp
<fct> <dbl>
1 Africa 52.9
2 Americas 72.9
3 Asia 72.4
4 Europe 78.6
5 Oceania 80.7
-->install.packages("ggplot2")
--> library("ggplot2")
-->ggplot(gapminder, aes(x = continent, y = lifeExp))
+ geom_boxplot(outlier.colour = "hotpink") +
geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 1/4)
output:
-->head(country_colors, 4)
output:
         Nigeria            Egypt         Ethiopia Congo, Dem. Rep.
       "#7F3B08"        "#833D07"        "#873F07"        "#8B4107"
-->head(continent_colors)
mtcars
summary(mtcars) prints the Min., 1st Qu., Median, Mean, 3rd Qu. and Max. for every column (mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb); for example, the carb column ranges from a minimum of 1.000 to a maximum of 8.000, with a median of 2.000, a mean of 2.812 and a third quartile of 4.000.
> Data_Cars <- mtcars
>
> max(Data_Cars$hp)
[1] 335
> min(Data_Cars$hp)
[1] 52
> Data_Cars <- mtcars
>
> which.max(Data_Cars$hp)
[1] 31
> which.min(Data_Cars$hp)
[1] 19
> Data_Cars <- mtcars
> rownames(Data_Cars)[which.max(Data_Cars$hp)]
[1] "Maserati Bora"
> rownames(Data_Cars)[which.min(Data_Cars$hp)]
[1] "Honda Civic"
median(Data_Cars$wt)
[1] 3.325
Data_Cars <- mtcars
names(sort(-table(Data_Cars$wt)))[1]
# Values of weight (in kg) used in the regression example below:
63, 81, 56, 91, 47, 57, 76, 72, 62, 48
lm() Function
This function creates the relationship model between the predictor and the response vari-
able.
Syntax
The basic syntax for lm() function in linear regression is −
lm(formula,data)
Following is the description of the parameters used −
● formula is a symbol presenting the relation between x and y.
● data is the vector (or data frame) on which the formula will be applied.
Create the Relationship Model and get the Coefficients
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)
print(relation)
Result:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-38.4551 0.6746
print(summary(relation))
Result:
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-6.3002 -1.6629 0.0412 1.8944 3.9775
Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept)
-38.45509 8.04901 -4.778 0.00139 **
x 0.67461 0.05191 12.997 1.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
predict() Function
Syntax
The basic syntax for predict() in linear regression is −
predict(object, newdata)
Following is the description of the parameters used −
● object is the formula which is already created using the lm() function.
● newdata is the vector containing the new value for predictor variable.
# Apply the lm() function.
relation <- lm(y~x)

# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)

Result:
       1
76.22869

Visualize the Regression Graphically

# Create the predictor and response variable.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)

# Give the chart file a name.
png(file = "linearregression.png")

# Plot the chart.
plot(y, x, col = "blue", main = "Height & Weight Regression",
     abline(lm(x~y)), cex = 1.3, pch = 16,
     xlab = "Weight in Kg", ylab = "Height in cm")

# Close the graphics device so the chart is written to the file.
dev.off()
Bar Plot
There are two types of bar plots- horizontal and vertical which represent data points as
horizontal or vertical bars of certain lengths proportional to the value of the data item. They
are generally used for continuous and categorical variable plotting. By setting the
horiz parameter to true and false, we can get horizontal and vertical bar plots respectively.
Histogram
A histogram is like a bar chart as it uses bars of varying height to represent data distribution.
However, in a histogram values are grouped into consecutive intervals called bins. In a
Histogram, continuous values are grouped and displayed in these bins whose size can be varied.
For a histogram, the parameter xlim can be used to specify the interval within which all
values are to be displayed.
Another parameter, freq, when set to TRUE denotes the frequency (counts) of the various values in the histogram; when set to FALSE, probability densities are represented on the y-axis, such that the total area of the histogram adds up to one.
Histograms are used in the following scenarios:
● To verify an equal and symmetric distribution of the data.
Box Plot
The statistical summary of the given data is presented graphically using a boxplot. A boxplot
depicts information like the minimum and maximum data point, the median value, first and
third quartile, and interquartile range.
Box plots are used:
● To give a comprehensive statistical description of the data through a visual cue.
● To identify the outlier points that do not lie in the inter-quartile range of data.
Scatter Plot
A scatter plot is composed of many points on a Cartesian plane. Each point denotes the value
taken by two parameters and helps us easily identify the relationship between them.
Scatter Plots are used in the following scenarios:
● To show whether an association exists between bivariate data.
● To measure the strength and direction of such a relationship.
Heat Map
Heatmap is defined as a graphical representation of data using colors to visualize the value of
the matrix. heatmap() function is used to plot heatmap.
Syntax: heatmap(data)
Parameters: data: It represent matrix data, such as values of rows and columns
Return: This function draws a heatmap.
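A brief sketch showing each of these plot types in base R (the mtcars columns used here are chosen only for illustration):
# Bar plot: counts of cars per cylinder class; horiz = TRUE would flip it
barplot(table(mtcars$cyl), horiz = FALSE)
# Histogram: mileage grouped into bins, counts on the y-axis (freq = TRUE)
hist(mtcars$mpg, freq = TRUE, xlim = c(10, 35))
# Box plot: median, quartiles and outliers of mileage
boxplot(mtcars$mpg)
# Scatter plot: relationship between weight and mileage
plot(mtcars$wt, mtcars$mpg)
# Heat map: colour-codes the (column-scaled) data matrix
heatmap(as.matrix(mtcars), scale = "column")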
Experiment-10: Use R-Project for data visualization of social media data
Procedure:
Step 1: Facebook Developer Registration
Step 2: Click on Tools and install the required packages:
install.packages("Rfacebook")
install.packages("RColorBrewer")
install.packages("RCurl")
install.packages("rjson")
install.packages("httr")
library(Rfacebook)
library(httpuv)
library(RColorBrewer)
acess_token="EAATgfMOrIRoBAOR9XUl3VGzbLMuWGb9FqGkTK3PFBuRy
UVZA WAL7ZBw0xN3AijCsPiZBylucovck4YUhU昀欀
WLMZBo640k2ZAupKgsaKog9736lec
P8E52qkl5de8M963oKG8KOCVUXqqLiRcI7yIbEONeQt0eyLI6LdoeZA65Hy
xf8so1 UMbywAdZCZAQBpNiZAPPj7G3UX5jZAvUpRLZCQ5SIG"
options(RCurlOptions = list(verbose = FALSE, capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
me <- getUsers("me", token = acess_token)
View(me)
myFriends <- getFriends(acess_token, simplify = FALSE)
table(myFriends)
pie(table(myFriends$gender))
Output: