BDA Lab

The document outlines the installation, configuration, and operation of Hadoop 2.8.0 on Windows 10, including steps for setting up Java, configuring Hadoop files, and running MapReduce programs. It also provides R programming examples for implementing various statistical methods like linear regression, logistic regression, and decision trees, as well as data visualization techniques. Additionally, it includes code snippets for clustering and support vector machine implementations.


AIM

To install, configure, and run Hadoop and HDFS.

The following software is required to install Hadoop 2.8.0 on Windows 10 (64-bit):

1) Download Hadoop 2.8.0


(Link: http://wwweu.apache.org/dist/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
OR http://archive.apache.org/dist/hadoop/core/hadoop-2.8.0/hadoop-2.8.0.tar.gz)

2) Java JDK 1.8.0.zip


(Link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)

Set up:

1) Check whether Java 1.8.0 is already installed on your system by running "javac -version" to see the Java version.

2) If Java is not installed on your system, first install Java under "C:\Java".

3) Extract hadoop-2.8.0.tar.gz (or Hadoop-2.8.0.zip) and place the contents under "C:\Hadoop-2.8.0".

4) Set the HADOOP_HOME environment variable on Windows 10.

5) Set the JAVA_HOME environment variable on Windows 10.

6) Next, add the Hadoop bin directory and the Java bin directory to the PATH variable; a sketch of the commands follows below.
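As a minimal sketch (assuming the install locations used above; adjust the paths to your own layout), the variables can be set from an administrative Command Prompt:

:: set Hadoop and Java home directories (paths are assumptions from the steps above)
setx HADOOP_HOME "C:\Hadoop-2.8.0"
setx JAVA_HOME "C:\Java"
:: append the Hadoop bin/sbin and Java bin directories to PATH
setx PATH "%PATH%;C:\Hadoop-2.8.0\bin;C:\Hadoop-2.8.0\sbin;C:\Java\bin"

Open a new Command Prompt afterwards so the updated variables take effect.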
Configuration:

a) Edit the file C:/Hadoop-2.8.0/etc/hadoop/core-site.xml, paste the XML paragraph below, and save the file.
<configuration>
<property>
<name>fs.defaultFS</name><value>hdfs://localhost:9000</value>
</property>
</configuration>

b) Rename "mapred-
site.xml.template" to "mapred-site.xml" and edit this file C:/Hadoop-
2.8.0/etc/hadoop/mapred-site.xml, paste below xml paragraph and save
this file.

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

c) Create a folder "data" under "C:\Hadoop-2.8.0"

1) Create a folder "datanode" under "C:\Hadoop-2.8.0\data"

2) Create a folder "namenode" under "C:\Hadoop-2.8.0\data"

d) Edit the file C:/Hadoop-2.8.0/etc/hadoop/hdfs-site.xml, paste the XML paragraph below, and save the file.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-2.8.0\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-2.8.0\data\datanode</value>
</property>
</configuration>
e) Edit the file C:/Hadoop-2.8.0/etc/hadoop/yarn-site.xml, paste the XML paragraph below, and save the file.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
f) Edit the file C:/Hadoop-2.8.0/etc/hadoop/hadoop-env.cmd: replace the line set JAVA_HOME=%JAVA_HOME% with set JAVA_HOME=C:\Java (the path under which JDK 1.8.0 was installed).
Hadoop Configuration
7) Download the file Hadoop Configuration.zip (Link: https://github.com/MuhammadBilalYar/HADOOP-INSTALLATION-ON-WINDOW-10/blob/master/Hadoop%20Configuration.zip)
8) Delete the bin folder at C:\Hadoop-2.8.0\bin and replace it with the bin folder from the Hadoop Configuration.zip just downloaded.
9) Open cmd and run the command "hdfs namenode -format" to format the NameNode.
Testing
10) Open cmd, change directory to "C:\Hadoop-2.8.0\sbin", and type "start-all.cmd" to start Hadoop.
11) Make sure these daemons are running: a) NameNode b) DataNode c) YARN ResourceManager d) YARN NodeManager
12) Open: http://localhost:8088 (YARN ResourceManager web UI)

13) Open: http://localhost:50070 (NameNode web UI)


AIM:

To implement a word count / word frequency program using MapReduce.


Procedure:
Prepare:
1. Download MapReduceClient.jar
(Link: https://github.com/MuhammadBilalYar/HADOOP-INSTALLATION-ON-WINDOW-10/blob/master/MapReduceClient.jar)
2. Download input_file.txt
(Link: https://github.com/MuhammadBilalYar/HADOOP-INSTALLATION-ON-WINDOW-10/blob/master/input_file.txt)
Place both files in "C:/"
Hadoop Operation:
1. Open cmd in administrative mode, move to "C:/Hadoop-2.8.0/sbin", and start the cluster:
start-all.cmd

2. Create an input directory in HDFS.

hadoop fs -mkdir /input_dir


3. Copy the input text file input_file.txt into the HDFS input directory (input_dir).
hadoop fs -put C:/input_file.txt /input_dir
4. Verify that input_file.txt is available in the HDFS input directory (input_dir).
hadoop fs -ls /input_dir/

5. Verify the content of the copied file.

hdfs dfs -cat /input_dir/input_file.txt


6. Run MapReduceClient.jar, providing the input and output directories.

hadoop jar C:/MapReduceClient.jar wordcount /input_dir /output_dir

7. Verify the content of the generated output file.

hdfs dfs -cat /output_dir/*


Some other useful commands
8) To leave safe mode:

hdfs dfsadmin -safemode leave


9) To delete a file from an HDFS directory:

hadoop fs -rm -r /input_dir/input_file.txt
10) To delete a directory from HDFS:

hadoop fs -rm -r /output_dir


AIM:

To implement a MapReduce (MR) program that processes a weather dataset.

Program :

AverageMapper.java

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import java.io.IOException;

public class AverageMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final int MISSING = 9999;

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);        // year field of the record
        int temperature;
        if (line.charAt(87) == '+')                  // signed temperature field
            temperature = Integer.parseInt(line.substring(88, 92));
        else
            temperature = Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);
        if (temperature != MISSING && quality.matches("[01459]"))
            context.write(new Text(year), new IntWritable(temperature));
    }
}

AverageReducer.java

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import java.io.IOException;

public class AverageReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;     // running total of temperatures for this year
        int count = 0;   // number of readings for this year
        for (IntWritable value : values) {
            sum += value.get();
            count += 1;
        }
        context.write(key, new IntWritable(sum / count));   // average temperature
    }
}

AverageDriver.java

import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AverageDriver {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Please enter the input and output parameters");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(AverageDriver.class);
        job.setJobName("Average temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(AverageMapper.class);
        job.setReducerClass(AverageReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
AIM:

To write an R program implementing linear and logistic regression.

PROGRAM:

****SIMPLE LINEAR REGRESSION****


dataset = read.csv("data-marketing-budget-12mo.csv", header=T,
                   colClasses = c("numeric", "numeric", "numeric"))
head(dataset,5)

#/////Simple Regression/////
simple.fit = lm(Sales~Spend, data=dataset)
summary(simple.fit)
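As a quick visual check (a sketch; Sales and Spend are the column names used above), the fitted line can be overlaid on the raw data:

plot(dataset$Spend, dataset$Sales)   # scatter of spend vs. sales
abline(simple.fit, col="red")        # overlay the fitted regression line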

OUTPUT:
****Logistic Regression ****
# select some columns from mtcars
input <- mtcars[,c("am","cyl","hp","wt")]
print(head(input))

am.data = glm(formula = am ~ cyl+hp+wt, data = input, family = binomial)
print(summary(am.data))
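To turn the fitted model into probabilities (a sketch reusing the am.data model above), predict() with type = "response" returns the estimated probability of a manual transmission for each car:

probs <- predict(am.data, type = "response")   # fitted P(am = 1) per car
head(round(probs, 3))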

OUTPUT:
AIM

To implement a support vector machine (SVM) that finds the optimum hyperplane (a line in 2D, a plane in 3D) maximizing the margin between two classes.

Program

library(e1071)
plot(iris)
iris
plot(iris$Sepal.Length, iris$Sepal.Width, col=iris$Species)
plot(iris$Petal.Length, iris$Petal.Width, col=iris$Species)
s <- sample(150,100)
col <- c("Petal.Length", "Petal.Width", "Species")
iris_train <- iris[s,col]
iris_test <- iris[-s,col]
svmfit <- svm(Species ~ ., data = iris_train, kernel = "linear", cost = .1, scale = FALSE)
print(svmfit)
plot(svmfit, iris_train[,col])
tuned <- tune(svm, Species~., data = iris_train, kernel = "linear",
              ranges = list(cost=c(0.001,0.01,0.1,1,10,100)))
summary(tuned)
p <- predict(svmfit, iris_test[,col], type="class")
plot(p)
table(p, iris_test[,3])
mean(p == iris_test[,3])
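The tuning step only reports the best cost; as a sketch, the refitted best model can be extracted from the tune object (best.model is the e1071 accessor) and used for prediction:

best <- tuned$best.model                  # SVM refit at the best cost found by tune()
p.best <- predict(best, iris_test[,col])
mean(p.best == iris_test$Species)         # test accuracy of the tuned model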

OUTPUT:
AIM

To implement a decision tree, which represents a decision situation visually and shows all the factors in the analysis that are considered relevant to the decision.

PROGRAM

library(MASS)
library(rpart)
head(birthwt)
hist(birthwt$bwt)
table(birthwt$low)
cols <- c('low', 'race', 'smoke', 'ht', 'ui')
birthwt[cols] <- lapply(birthwt[cols], as.factor)
set.seed(1)
train <- sample(1:nrow(birthwt), 0.75 * nrow(birthwt))
birthwtTree <- rpart(low ~ . - bwt, data = birthwt[train, ], method = 'class')
plot(birthwtTree)
text(birthwtTree, pretty = 0)
summary(birthwtTree)
birthwtPred <- predict(birthwtTree, birthwt[-train, ], type = 'class')
table(birthwtPred, birthwt[-train, ]$low)
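The confusion table above can be collapsed into a single accuracy figure (a sketch reusing the objects above):

mean(birthwtPred == birthwt[-train, ]$low)   # proportion of test cases classified correctly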


OUTPUT:
AIM:

To write an R program implementing clustering techniques.

PROGRAM:

library(datasets)
head(iris)
library(ggplot2)
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
set.seed(20)
irisCluster <- kmeans(iris[, 3:4], 3, nstart = 20)
irisCluster
table(irisCluster$cluster, iris$Species)
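To compare the clusters with the true species visually (a sketch reusing the objects above), color the points by cluster assignment instead of species:

ggplot(iris, aes(Petal.Length, Petal.Width,
                 color = factor(irisCluster$cluster))) +
  geom_point()   # points colored by k-means cluster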
OUTPUT:
AIM

To implement data visualization, which provides an efficient graphical display for summarizing and reasoning about quantitative information.

1. Histogram

A histogram is basically a plot that breaks the data into bins (or breaks) and shows the frequency distribution of these bins. You can change the breaks and see the effect this has on the understandability of the visualization.

Note: We have used the par(mfrow=c(2,3)) command to fit multiple graphs on the same page for the sake of clarity (see the code below).

PROGRAM:
library(RColorBrewer)
data(VADeaths)
par(mfrow=c(2,3))
hist(VADeaths, breaks=10, col=brewer.pal(3,"Set3"), main="Set3 3 colors")
hist(VADeaths, breaks=3, col=brewer.pal(3,"Set2"), main="Set2 3 colors")
hist(VADeaths, breaks=7, col=brewer.pal(3,"Set1"), main="Set1 3 colors")
hist(VADeaths, breaks=2, col=brewer.pal(8,"Set3"), main="Set3 8 colors")
hist(VADeaths, col=brewer.pal(8,"Greys"), main="Greys 8 colors")
hist(VADeaths, col=brewer.pal(8,"Greens"), main="Greens 8 colors")
OUTPUT:
2.1. Line Chart

Below is the line chart showing the increase in air passengers over a given time period. Line charts are commonly preferred when analyzing a trend spread over a time period. Furthermore, a line plot is also suitable where we need to compare relative changes in quantities across some variable (like time). Below is the code:

PROGRAM:

data(AirPassengers)
plot(AirPassengers, type="l")   # Simple Line Plot

2.2. Bar Chart

Bar plots are suitable for showing comparisons between cumulative totals across several groups. Stacked plots are used for bar plots across various categories. Here's the code:
PROGRAM:
data("iris")

barplot(iris$Petal.Length) #Creating simple Bar Graph

barplot(iris$Sepal.Length,col=brewer.pal(3,"Set1"))
barplot(table(iris$Species,iris$Sepal.Length),col = brewer.pal(3,"Set1"))
#Stacked Plot
OUTPUT:

3. Box Plot

A box plot shows five statistically significant numbers: the minimum, the 25th percentile, the median, the 75th percentile, and the maximum. It is thus useful for visualizing the spread of the data and deriving inferences accordingly.

PROGRAM:

data(iris)
par(mfrow=c(2,2))
boxplot(iris$Sepal.Length, col="red")
boxplot(iris$Sepal.Length~iris$Species, col="red")
boxplot(iris$Sepal.Length~iris$Species, col=heat.colors(3))
boxplot(iris$Sepal.Length~iris$Species, col=topo.colors(3))
boxplot(iris$Petal.Length~iris$Species)   # Creating Box Plot between two variables
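The five numbers a box plot draws can also be printed directly (a sketch; fivenum() is base R):

fivenum(iris$Sepal.Length)   # minimum, lower hinge, median, upper hinge, maximum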
OUTPUT:

4. Scatter Plot (including 3D and other features)

Scatter plots help in visualizing data easily and in simple data inspection. Here's the code for a simple scatter plot and a multivariate scatter plot:
PROGRAM:

plot(x=iris$Petal.Length)   # Simple Scatter Plot
plot(x=iris$Petal.Length, y=iris$Species)   # Multivariate Scatter Plot
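For the 3D case mentioned in the heading, a minimal sketch (assuming the scatterplot3d package is installed) plots three iris measurements at once:

library(scatterplot3d)
scatterplot3d(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length,
              color = as.numeric(iris$Species))   # one color per species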

OUTPUT:

5. Heat Map

One of the most innovative data visualizations in R, the heat map uses color intensity to visualize relationships between multiple variables. The result is an attractive 2D image that is easy to interpret. As a basic example, a heat map can highlight the popularity of competing items by ranking them according to their original market launch date, breaking this down further with sales statistics and figures over the course of time.
PROGRAM:

# simulate a dataset of 10 points
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.7)
y <- rnorm(10, mean=rep(c(1,9), each=5), sd=0.1)
dataFrame <- data.frame(x=x, y=y)
set.seed(143)
dataMatrix <- as.matrix(dataFrame)[sample(1:10),]   # convert to class 'matrix', then shuffle the rows
heatmap(dataMatrix)   # visualize hierarchical clustering via a heatmap
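To show the color matrix alone without the clustering dendrograms (a sketch; Rowv and Colv are standard heatmap() arguments):

heatmap(dataMatrix, Rowv = NA, Colv = NA)   # same matrix, no row/column reordering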

OUTPUT:
6. Correlogram

Correlated data is best visualized through corrplot. The 2D format is similar to a heat map, but it highlights statistics that are directly related. Most correlograms highlight the amount of correlation between datasets at various points in time. Comparing sales data between different months or years is a basic example.

PROGRAM:

#data("mtcars")

corr_matrix <‐

cor(mtcars)
# with circles corrplot(corr_matrix)

# with numbers and lower corrplot(corr_matrix,method = 'number',type = "lower")
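Since the point of a correlogram is to surface groups of related variables, ordering the matrix by hierarchical clustering often helps (a sketch; order = "hclust" is a standard corrplot option):

corrplot(corr_matrix, order = "hclust")   # variables reordered so correlated ones sit together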

OUTPUT:

7. Area Chart

Area charts express continuity between different variables or data sets. They are akin to the traditional line chart you know from grade school and are used in a similar fashion. Most area charts highlight trends and their evolution over the course of time, making them highly effective when trying to expose underlying trends, whether positive or negative.

PROGRAM:

data("airquality")

#dataset used

airquality %>%

group_by(Day)%>%

summarise(mean_wind=mean(

Wind)) %>% ggplot() +

geom_area(aes(x = Day, y =

mean_wind)) + labs(title = "Area

Chart of Average Wind per Day",

subtitle = "using airquality data", y =

"Mean Wind")
OUTPUT:
1) To use MongoDB with R, first download and install MongoDB. Next, start MongoDB. We can start MongoDB like so:

mongod

2) Inserting data

Let's insert the crimes data from data.gov into MongoDB. The dataset reflects reported incidents of crime (with the exception of murders, where data exists for each victim) that have occurred in the City of Chicago since 2001.

library(ggplot2)
library(dplyr)
library(maps)
library(ggmap)
library(mongolite)
library(lubridate)
library(gridExtra)

crimes = data.table::fread("Crimes_2001_to_present.csv")
names(crimes)

OUTPUT:
'ID' 'Case Number' 'Date' 'Block' 'IUCR' 'Primary Type' 'Description' 'Location Description' 'Arrest' 'Domestic' 'Beat' 'District' 'Ward' 'Community Area' 'FBI Code' 'X Coordinate' 'Y Coordinate' 'Year' 'Updated On' 'Latitude' 'Longitude' 'Location'

3) Let’s remove spaces in the column names to avoid any problems when
we query it from MongoDB.

names(crimes) = gsub(" ", "", names(crimes))
names(crimes)

4) Let's use the insert function from the mongolite package to insert rows into a collection in MongoDB. Let's create a database called Chicago and call the collection crimes.

my_collection = mongo(collection = "crimes", db = "Chicago")   # create connection, database and collection
my_collection$insert(crimes)

OUTPUT:
'ID' 'CaseNumber''Date' 'Block''IUCR' 'PrimaryType' 'Description'
'LocationDescription' 'Arrest' 'Domestic' 'Beat' 'District' 'Ward' 'CommunityArea'
'FBICode' 'XCoordinate' 'YCoordinate' 'Year' 'UpdatedOn' 'Latitude' 'Longitude'
'Location'
5) Let’s check if we have inserted the “crimes” data.

my_collection$count()

OUTPUT:

6261148

We see that the collection has 6261148 records.

6) First, let's look at what the data looks like by displaying one record:

my_collection$iterate()$one()

OUTPUT:
$ID
1454164
$CaseNumber
'G185744'
$Date
'04/01/2001 06:00:00 PM'
$Block
'049XX N MENARD AV'
$IUCR
'0910'
$PrimaryType
'MOTOR VEHICLE THEFT'
$Description
'AUTOMOBILE'
$LocationDescription
'STREET'
$Arrest
'false'
$Domestic
'false'
$Beat
1622
$District
16
$FBICode
'07'
$XCoordinate
1136545
$YCoordinate
1932203
$Year
2001
$UpdatedOn
'08/17/2015 03:03:40 PM'
$Latitude
41.970129962
$Longitude
-87.773302309
$Location
'(41.970129962, -87.773302309)'

7) How many distinct "PrimaryType" values do we have?

length(my_collection$distinct("PrimaryType"))
OUTPUT:

35
As shown above, there are 35 different crime primary types in the
database. We will see the patterns of the most common crime types below.
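To list the types themselves (a sketch reusing the same distinct() call):

head(my_collection$distinct("PrimaryType"), 10)   # first 10 of the 35 distinct crime types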

8) Now, let's see how many domestic assaults there are in the collection.

my_collection$count('{"PrimaryType":"ASSAULT", "Domestic":"true"}')

OUTPUT:

8247
9) To get the filtered data, we can also retrieve only the columns of interest.

query1 = my_collection$find('{"PrimaryType":"ASSAULT", "Domestic":"true"}')
query2 = my_collection$find('{"PrimaryType":"ASSAULT", "Domestic":"true"}',
                            fields = '{"_id":0, "PrimaryType":1, "Domestic":1}')
ncol(query1)   # with all the columns
ncol(query2)   # only the selected columns

OUTPUT:

22
2

10) To find out "Where do most crimes take place?", use the following command.

my_collection$aggregate('[{"$group": {"_id":"$LocationDescription", "Count": {"$sum":1}}}]') %>%
  na.omit() %>%
  arrange(desc(Count)) %>% head(10) %>%
  ggplot(aes(x=reorder(`_id`, Count), y=Count)) +
  geom_bar(stat="identity", color='skyblue', fill='#b35900') +
  geom_text(aes(label = Count), color = "blue") +
  coord_flip() +
  xlab("Location description")

11) If loading the entire dataset does not slow down our analysis, we can use data.table or dplyr, but when dealing with big data, MongoDB can give us a performance boost because the whole dataset is not loaded into memory. We can reproduce the above plot without using MongoDB, like so:

crimes %>% group_by(`LocationDescription`) %>%
  summarise(Total = n()) %>%
  arrange(desc(Total)) %>% head(10) %>%
  ggplot(aes(x=reorder(`LocationDescription`, Total), y=Total)) +
  geom_bar(stat="identity", color='skyblue', fill='#b35900') +
  geom_text(aes(label = Total), color = "blue") +
  coord_flip() +
  xlab("Location Description")
