0% found this document useful (0 votes)

18 views45 pages

Ccs 334 Bigdata Manual

Uploaded by

kingsolar719

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

18 views45 pages

Ccs 334 Bigdata Manual

Uploaded by

kingsolar719

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 45

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CCS334-BIG DATA ANALYTICS

LABORATORY

LAB MANUAL
(Regulation 2021)
LIST OF EXPERIMENTS:
1. Downloading and installing Hadoop; Understanding different Hadoop modes. Startup
scripts, Configuration files.
2. Hadoop Implementation of file management tasks, such as Adding files and directories,
retrieving files and Deleting files
3. Implement of Matrix Multiplication with Hadoop Map Reduce
4. Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase, Installing thrift along with Practice examples
7. Practice importing and exporting data from various databases. Software Requirements:
Cassandra, Hadoop, Java, Pig, Hive and HBase.

.
EXPT.NO.1(a) Downloading and installing Hadoop; Understanding different Hadoop modes.
Startup scripts, Configuration files.
DATE:

AIM:

To Downloading and installing Hadoop; Understanding different Hadoop modes. Startup

scripts, Configuration files.

PROCEDURE:

Prerequisites to Install Hadoop on Ubuntu

Hardware requirement- The machine must have 4GB RAM and minimum 60 GB hard disk
for better performance.
Check java version- It is recommended to install Oracle Java 8.
The user can check the version of java with below command.
$ java –version
STEP 1: Setup passwordless ssh
a) Install Open SSH Server and Open SSH Client
We will now setup the passwordless ssh client with the following command.
1.$sudo apt-get install openssh-server openssh-client

b) Generate Public & Private Key Pairs

2. ssh-keygen -t rsa -P “”

c) Configure password-less SSH

3. cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
d) Now verify the working of password-less ssh
$ ssh localhost

e) Now install rsync with command

$ sudo apt-get install rsync

STEP 2: Configure and Setup Hadoop

Downloading Hadoop
$wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6tar.gz
$tar xzf hadoop-3.3.6.tar.gz
Result:
Downloaded and installed Hadoop and also understand different Hadoop modes. Are
successfully implemented
EXPT.NO.1(b) Downloading and installing Hadoop; Startup scripts, Configuration files.

DATE:

AIM:
To Downloading and installing Hadoop; Understanding different Hadoop modes.
Startup scripts, Configuration files.
STEP 1:Setup Configuration
a) Setting Up the environment variables
Edit .bashrc- Edit the bashrc and therefore add hadoop in a path:
$nano bash.bashrc
export HADOOP_HOME=/home/cse/hadoop-3.3.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
exportHADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Source .bashrc in current login session in terminal

$source ~/.bashrc
b) Hadoop configuration file changes
Edit hadoop-env.sh
Edit hadoop-env.sh file which is in etc/hadoop inside the Hadoop installation directory.
$sudo nano $HADOOP_HOME/etc/hadoop/Hadoop-env.sh
The user can set JAVA_HOME:
export JAVA_HOME=<root directory of Java-installation> (eg:
/usr/lib/jvm/jdk1.8.0_151/)
Edit core-site.xml
$sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/cse/hdata</value>
</property>
</configuration>
Edit hdfs-site.xml
$sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

#Add below lines in this file(between "<configuration>" and "<"/configuration>")

<property>
<name>dfs.data.dir</name>
<value>/home/cse/dfsdata/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/cse/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

Edit mapred-site.xml
$sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
#Add below lines in this file(between "<configuration>" and "<"/configuration>")

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Edit yarn-site.xml
$sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

#Add below lines in this file(between "<configuration>" and "<"/configuration>")

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADO
OP_CONF_DIR,CLASSPATH_PERPEND_DISTCACHE,HADOOP_YARN_HOME,HA
DOOP_MAPRED_HOME</value>
</property>
Step 2: Start the cluster
We will now start the single node cluster with the following commands.
a) Format the namenode
$hdfs namenode –format

b) Start the HDFS

$start-all.sh
c)Verify if all process started
$ jps
6775 DataNode
7209 ResourceManager
7017 SecondaryNameNode
6651 NameNode
7339 NodeManager
7663 Jps
d)Web interface-For viewing Web UI of NameNode
visit : (http://localhost:9870)

Result:
Downloaded and installed Hadoop and also understand different Hadoop modes. Startup
scripts, Configuration files are successfully implemented
EXPT.NO.2 Hadoop Implementation of file management tasks, such as Adding files and
directories, retrieving files and Deleting files
DATE:

AIM: To implement the following file management tasks in Hadoop:

1. Adding files and directories
2. Retrieving files
3. Deleting Files

DESCRIPTION:- HDFS is a scalable distributed filesystem designed to scale to

petabytes of data while running on top of the underlying filesystem of the operating
system. HDFS keeps track of where the data resides in a network by associating the name of
its rack (or network switch) with the dataset. This allows Hadoop to efficiently schedule
tasks to those nodes that contain data, or which are nearest to it, optimizing
bandwidth utilization. Hadoop provides a set of command line utilities that work similarly
to the Linux file commands, and serve as your primary interface with HDFS.
We‘re going to have a look into HDFS by interacting with it from the command line.
We will take a look at the most common file management tasks in Hadoop, which include:
1. Adding files and directories to HDFS
2. Retrieving files from HDFS to local filesystem
3. Deleting files from HDFS

SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM

HDFS

Step 1:Starting HDFS

Initially you have to format the configured HDFS file system, open namenode (HDFS
server), and execute the following command.
$ hadoop namenode -format
After formatting the HDFS, start the distributed file system. The following command will
start the namenode as well as the data nodes as cluster.
$ start-dfs.sh

Listing Files in HDFS

After loading the information in the server, we can find the list of files in a directory, status
of a file, using ls Given below is the syntax of ls that you can pass to a directory or a
filename as an argument.

$ $HADOOP_HOME/bin/hadoop fs -ls <args>

Inserting Data into HDFS

Assume we have data in the file called file.txt in the local system which is ought to be saved
in the hdfs file system. Follow the steps given below to insert the required file in the Hadoop
file system.

Step-2: Adding Files and Directories to HDFS

$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input

Transfer and store a data file from local systems to the Hadoop file system using the put
command.

$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input

Step 3 :You can verify the file using ls command.

$ $HADOOP_HOME/bin/hadoop fs -ls /user/input

Step 4 Retrieving Data from HDFS
Assume we have a file in HDFS called outfile. Given below is a simple demonstration for
retrieving the required file from the Hadoop file system.

Initially, view the data from HDFS using cat command.

$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile
Get the file from HDFS to the local file system using get command.

$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/

Step-5: Deleting Files from HDFS
$ hadoop fs -rm file.txt

Step 6:Shutting Down the HDFS

You can shut down the HDFS by using the following command.

$ stop-dfs.sh

Result:
Thus the Installing of Hadoop in three operating modes has been successfully completed
EXPT.NO.3(a)
Implement of Matrix Multiplication with Hadoop Map Reduce
DATE:

AIM: To Develop a MapReduce program to implement Matrix

Multiplication. Description

In mathematics, matrix multiplication or the matrix product is a binary operationthat

produces a matrix from two matrices. The definition is motivated by linear equations and
linear transformations on vectors, which have numerous applicationsin applied mathematics,
physics, and engineering. In more detail, if A is an n × m matrix and B is an m × p matrix,
their matrix product AB is an n × p matrix, in which the m entries across a row of A are
multiplied with the m entries down a column of B and summed to produce an entry of AB.
When two linear transformations are represented by matrices, then the matrix product
represents the composition of the two transformations.

Algorithm for Map Function.

a.
for each element mij of M do
produce (key,value) pairs as ((i,k), (M,j,mij), for k=1,2,3,.. upto the number of
columns of N
b.
for each element njk of N do
produce (key,value) pairs as ((i,k),(N,j,Njk), for i = 1,2,3,.. Upto the number of
rows of M.
c.
return Set of (key,value) pairs that each key (i,k), has list with
values (M,j,mij) and (N, j,njk) for all possible values of j.
Algorithm for Reduce Function.
d.
for each key (i,k) do
e.
sort values begin with M by j in listM sort values begin with N by j in
list N multiply mij and njk for jth value of each list
f.
sum up mij x njk return (i,k), Σj=1 mij x njk

Step 1. Creating directory for matrix

Then open matrix1.txt and matrix2.txt put the values in that text files

Step 2. Creating Mapper file for Matrix Multiplication.

#!/usr/bin/env python
import sys
cache_info = open("cache.txt").readlines()[0].split(",")
row_a, col_b = map(int,cache_info)
for line in sys.stdin:
matrix_index, row, col, value = line.rstrip().split(",")
if matrix_index == "A":
for i in xrange(0,col_b):
key = row + "," + str(i)
print "%s\t%s\t%s"%(key,col,value)
else:
for j in xrange(0,row_a):
key = str(j) + "," + col
print "%s\t%s\t%s"%(key,row,value)
Step 3. Creating reducer file for Matrix Multiplication.

#!/usr/bin/env python
import sys
from operator import itemgetter
prev_index = None
value_list = []
for line in sys.stdin:
curr_index, index, value = line.rstrip().split("\t")
index, value = map(int,[index,value])
if curr_index == prev_index:
value_list.append((index,value))
else:
if prev_index:
value_list = sorted(value_list,key=itemgetter(0))
i=0
result = 0
while i < len(value_list) - 1:
if value_list[i][0] == value_list[i + 1][0]:
result += value_list[i][1]*value_list[i + 1][1]
i += 2
else:
i += 1
print "%s,%s"%(prev_index,str(result))
prev_index = curr_index
value_list = [(index,value)]

if curr_index == prev_index:
value_list = sorted(value_list,key=itemgetter(0))
i=0
result = 0
while i < len(value_list) - 1:
if value_list[i][0] == value_list[i + 1][0]:
result += value_list[i][1]*value_list[i + 1][1]
i += 2
else:
i += 1
print "%s,%s"%(prev_index,str(result))
Step 4. To view this file using cat command
$cat *.txt |python mapper.py
$ chmod +x ~/Desktop/mr/matrix/Mapper.py
$ chmod +x ~/Desktop/mr/matrixl/Reducer.py
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
> -input /user/cse/matrices/ \
> -output /user/cse/mat_output \
> -mapper ~/Desktop/mr/matrix/Mapper.py \
> -reducer ~/Desktop/mr/matrix/Reducer.py
Step 5: To view this full output

Result:
Thus the MapReduce program to implement Matrix Multiplication was successfully executed
EXPT.NO.4 Run a basic Word Count Map Reduce program to understand Map Reduce
Paradigm
DATE:

AIM: To Develop a MapReduce program to calculate the frequency of a given word

in agiven fileMap Function – It takes a set of data and converts it into another set of
data, where individual elements are broken down into tuples (Key-Value pair).
Example – (Map function in Word Count)
Input
Set of data
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN,BUS, buS, caR, CAR, car, BUS,
TRAIN
Output
Convert into another
set of data(Key,Value)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
Reduce Function – Takes the output from Map as an input and combines
those data tuplesinto a smaller set of tuples.
Example – (Reduce function in Word Count)
Input Set of
Tuples(output of
Map function)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1),
(train,1), (bus,1),(TRAIN,1),(BUS,1),
(buS,1),(caR,1),(CAR,1), (car,1), (BUS,1), (TRAIN,1)
Output Converts into smaller set of tuples
(BUS,7), (CAR,7), (TRAIN,4)

Workflow of MapReduce consists of 5 steps

1. Splitting – The splitting parameter can be anything, e.g. splitting
by space,comma, semicolon, or even by a new line (‘\n’).
2. Mapping – as explained above
3. Intermediate splitting – the entire process in parallel on different clusters.
In orderto group them in “Reduce Phase” the similar KEY data should be
on same cluster.
4. Reduce – it is nothing but mostly group by phase
5. Combining – The last phase where all the data (individual result set
from eachcluster) is combine together to form a Result
Now Let’s See the Word Count Program in Java

Step1 :Make sure Hadoop and Java are installed properly

hadoop version

javac –version

Step 2. Create a directory on the Desktop named Lab and

inside it create two folders;
one called “Input” and the other called “tutorial_classes”.
[You can do this step using GUI normally or through terminal
commands]
cd Desktop
mkdir Lab
mkdir
Lab/Input
mkdir Lab/tutorial_classes
Step 3. Add the file attached with this document
“WordCount.java” in the directory Lab
Step 4. Add the file attached with this document “input.txt” in
the directory Lab/Input.
Step 5. Type the following command to export the hadoop
classpath into bash.
export HADOOP_CLASSPATH=$(hadoop classpath)
Make sure it is now exported.
echo $HADOOP_CLASSPATH
Step 6. It is time to create these directories on HDFS rather than
locally. Type the following commands.
hadoop fs -mkdir /WordCountTutorial
hadoop fs -mkdir /WordCountTutorial/Input
hadoop fs -put Lab/Input/input.txt /WordCountTutorial/Input

Step 7. Go to localhost:9870 from the browser, Open“Utilities→ Browse File System”

and you should see the directories and files we placed in the file system.
Step 8. Then, back to local machine where we will compile the
WordCount.java file.
Assuming we are currently in the Desktop directory.
cd Lab
javac -classpath $HADOOP_CLASSPATH -d tutorial_classes WordCount.java

Put the output files in one jar file (There is a dot at the end)
jar -cvf WordCount.jar -C tutorial_classes .
Step 9. Now, we run the jar file on Hadoop.
hadoop jar WordCount.jar WordCount /WordCountTutorial/Input
/WordCountTutorial/Output
Step 10. Output the result:

hadoop dfs -cat /WordCountTutorial/Output/*

Program: Step 5. Type following Program :

package PackageDemo;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import
org.apache.hadoop.io.IntWritable;
import
org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import
org.apache.hadoop.mapreduce.Job;
import
org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import
org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
public static void main(String [] args) throws Exception
{
Configuration c=new Configuration();
String[] files=new
GenericOptionsParser(c,args).getRemainingArgs(); Path
input=new Path(files[0]);
Path output=new
Path(files[1]); Job j=new
Job(c,"wordcount");
j.setJarByClass(WordCount.c
lass);
j.setMapperClass(MapForWordCount.class);
j.setReducerClass(ReduceForWordCount.class);
j.setOutputKeyClass(Text.class);
j.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(j, input);
FileOutputFormat.setOutputPath(j, output);
System.exit(j.waitForCompletion(true)?0:1);
}
public static class MapForWordCount extends Mapper<LongWritable, Text,
Text,IntWritable>{
public void map(LongWritable key, Text value, Context con) throws IOException,
InterruptedException
{
String line = value.toString();
String[]
words=line.split(",");
for(String word: words
)
{
Text outputKey = new
Text(word.toUpperCase().trim());IntWritable
outputValue = new IntWritable(1);
con.write(outputKey, outputValue);
}
}
}
public static class ReduceForWordCount extends Reducer<Text, IntWritable,
Text,IntWritable>
{
public void reduce(Text word, Iterable<IntWritable> values, Context con)
throwsIOException,
InterruptedException
{
int sum = 0;
for(IntWritable value : values)
{
sum += value.get();
}
con.write(word, new IntWritable(sum));
}
}
}
The output is stored in /r_output/part-00000

OUTPUT:

Result:
Thus the Word Count Map Reduce program to understand Map Reduce Paradigm was
successfully execu
EXPT.NO.5(a)
Installation hive
DATE:

AIM: To installing hive with example

Steps for hive installation
• Download and Unzip Hive
• Edit .bashrc file
• Edit hive-config.sh file
• Create Hive directories in HDFS
• Initiate Derby database
• Configure hive-site.xml file
Step 1:

download and unzip Hive

=============================
wget https://downloads.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
tar xzf apache-hive-3.1.2-bin.tar.gz
step 2:
Edit .bashrc file
========================
sudo nano .bashrc
export HIVE_HOME= /home/hdoop/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin

step 3:
source ~/.bashrc
step 4:
Edit hive-config.sh file
====================================
sudo nano $HIVE_HOME/bin/hive-config.sh
export HADOOP_HOME=/home/cse/hadoop-3.3.6

step 5:
Create Hive directories in HDFS
===================================
hdfs dfs -mkdir /tmp
hdfs dfs -chmod g+w /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /user/hive/warehouse

step 6:

Fixing guava problem – Additional step

=================

rm $HIVE_HOME/lib/guava-19.0.jar
cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-27.0-jre.jar $HIVE_HOME/lib/

step 7: Configure hive-site.xml File (Optional)

Use the following command to locate the correct file:
cd $HIVE_HOME/conf
List the files contained in the folder using the ls command.
Use the hive-default.xml.template to create the hive-site.xml file:

cp hive-default.xml.template hive-site.xml
Access the hive-site.xml file using the nano text editor:
sudo nano hive-site.xml

Step 8: Initiate Derby Database

============================
$HIVE_HOME/bin/schematool -dbType derby –initSchema
Result:
Thus Installation hive was successfully installed and executed
EXPT.NO.5(b)
hive with examples
DATE:
AIM:

To installing hive with example

Create Database from Hive Beeline shell

1.Create database database_name;
Ex:
>Create database Emp;
>use Emp;
>create table emp.employee(sno int,user String,city String)Row format delimited fields
terminated by /n stored as textfile;

Show Database

Result:
Thus Installation hive was successfully installed and executed with example
EXPT.NO.6(a)
Installation of HBase, Installing thrift along with Practice examples
DATE:

AIM:
To Install HBase on Ubuntu 18.04 HBase in Standalone Mode
PROCEDURE:
Pre-requisite:

Ubuntu 16.04 or higher installed on a virtual machine.

step-1:Make sure that java has installed in your machine to verify that run java –version

If any Error Occured While Execute this command , then java is not installed in your system

To Install Java sudo apt install openjdk-8-jdk -y

Step-2:Download Hbase
wget https://dlcdn.apache.org/hbase/2.5.5/hbase-2.5.5-bin.tar.gz
Step-3:Extract The hbase-2.5.5-bin.tar.gz file by using the command tar xvf hbase-2.5.5-
bin.tar.gz

step-4: goto hbase2.5.5/conf folder and open hbase-env.sh file

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

step-5 : Edit .bashrc file

and then open .bashrc file and mention HBASE_HOME path as shown in below

export HBASE_HOME=/home/prasanna/hbase-2.5.5

here you can change name according to your local machine name

eg : export HBASE_HOME=/home/<your_machine_name>/hbase-2.5.5

export PATH= $PATH:$HBASE_HOME/bin

Note:*make sure that the hbase-2.5.5 folderin home directory before setting HBASE_HOME path
, if not then move the hbase-2.5.5 file to home directory*
step-6 : Add properties in the hbase-site.xml

put the below property between the <configuratio></configuration> tag

<property>
<name>hbase.rootdir</name>
<value>file:///home/prasanna/HBASE/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/prasanna/HBASE/zookeeper</value>
</property>
step-7:Goto To /etc/ folderand run the following command and configure
change in line no-2 by default the ip is 127.0.1.1
change it to 127.0.0.1 in second line onlystep-8:starting hbase
goto hbase-2.5.5/bin folder

After this run jps command to ensure that hbase is running

run http://localhost:16010 to see hbase web UI

step-9: accessing hbase shell by running ./hbase shell command

Result:

HBase was successfully installed on Ubuntu 18.04.

ANJALAI
EXPT.NO.6(b)
HBase, Installing thrift along with Practice examples
DATE:

Aim:

To Install HBase on Ubuntu 18.04 HBase in Standalone Mod

EXAMPLE
1) To create Table
syntax:
create ‘Table_Name’,’col_fam_1’,’col_fam_1’,.......’col_fam-n’
code :
create 'aamec','dept','year'

2) List All Tables

code :
list

3) insert data

syntax:
put ‘table_name’,’row_key’,’column_family:attribute’,’value’

here row_key is a unique key to retrive data

code :
this data will enter data into the dept column family
put 'aamec','cse','dept:studentname','prasanna'
put 'aamec','cse','dept:year','third'
put 'aamec','cse','dept:section','A'
This data will enter data into the year column family
put 'aamec','cse','year:joinedyear','2021'
put 'aamec','cse','year:finishingyear','2025'

4) Scan Table

same as desc in RDBMS

syntax:
scan ‘table_name’
code:
scan ‘aamec’

5) To get specific data

syntax:
get ‘table_name’,’row_key’,[optional column family: attribute]

code :
get ‘aamec’,’cse’
6. update table value

The same put command is used to update the table value ,if the row key is aldready present in the
database then it will update data according to the value ,if not present the it will create new row with the
given row key

previously the value for the section in cse is A ,But after running this command the value will be
changed into B

7)To Delete Data

syntax:
delete ‘table_name’,’row_key’,’column_family:attribute’
code :
delete 'aamec','cse','year:joinedyear'

8.De
lete Table
first we need to disable the table before dropping it
To Disable:
syntax:
disable ‘table_name’
code:
disable ‘aamec’

Result:

HBase was successfully installed with an example on Ubuntu 18.04.

EXPT.NO.7
Practice importing and exporting data from various databases.
DATE:

Aim:
To import or export, the order of columns in MySQL and Hive
Pre-requisite
Hadoop and Java
MySQL
Hive
SQOOP

Step 1:To start hdfs

Step 2: MySQL Installation

sudo apt install mysql-server ( use this command to install MySQL server)

COMMANDS:
~$ sudo su

After this enter your linux user password,then the root mode will be open here we don’t
need any authentication for mysql.

~root$ mysql

Creating user profiles and grant them permissions:

Mysql> CREATE USER ‘bigdata'@'localhost' IDENTIFIED BY ‘bigdata’;

ANJALAI AMMAL MAHALINGAM ENGINEERING COLLEGE

Mysql>grant all privileges on *.* to bigdata@localhost;

Note: This step is not required if you just use the root user to make CRUD
operations in the MySQL
Mysql> CREATE USER ‘bigdata’@’127.0.0.1' IDENTIFIED BY ‘ bigdata’;
Mysql>grant all privileges on *.* to [email protected];

Note: Here, *.* means that the user we create has all the privileges on all the tables
of all the databases.
Now, we have created user profiles which will be used to make CRUD operations in the
mysql
Step 3: Create a database and table and insert data.
Example:
create database Employe;
create table Employe.Emp(author_name varchar(65), total_no_of_articles int, phone_no
int, address varchar(65));
insert into Emp values(“Rohan”,10,123456789,”Lucknow”);
Step 3: Create a database and table in the hive where data should be imported.
create table geeks_hive_table(name string, total_articles int, phone_no int, address string)
row format delimited fields terminated by ‘,’;
Step 4: SQOOP INSTALLATION :

After downloading the sqoop , go to the directory where we downloaded the sqoop
and then extract it using the following command :

$ tar -xvf sqoop-1.4.4.bin hadoop-2.0.4-alpha.tar.gz

Then enter into the super user : $ su

Next to move that to the usr/lib which requires a super user privilege

$ mv sqoop-1.4.4.bin hadoop-2.0.4-alpha /usr/lib/sqoop

Then exit : $ exit

Goto .bashrc: $ sudo nano .bashrc , and then add the following

export SQOOP_HOME=/usr/lib/sqoop

export PATH=$PATH:$SQOOP_HOME/bin

$ source ~/.bashrc
Then configure the sqoop, goto the directory of the config folder of sqoop_home and then move the
contents of template file to the environment file.
$ cd $SQOOP_HOME/conf
$ mv sqoop-env-template.sh sqoop-env.sh
Then open the sqoop-environment file and then add the following,
export HADOOP_COMMON_HOME=/usr/local/Hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
Note : Here we add the path of the Hadoop libraries and files and it may different from the path which
we mentioned here. So, add the Hadoop path based on your installation.

Step 5: Download and Configure mysql-connector-java :

We can download mysql-connector-java-5.1.30.tar.gz file from the following link.

Next, to extract the file and place it to the lib folder of sqoop
$ tar -zxf mysql-connector-java-5.1.30.tar.gz
$ su
$ cd mysql-connector-java-5.1.30
$ mv mysql-connector-java-5.1.30-bin.jar /usr/lib/sqoop/lib
Note : This is library file is very important don’t skip this step because it contains the libraries to
connect the mysql databases to jdbc.
Verify sqoop: sqoop-version
Step 3: hive database Creation
hive> create database sqoop_example;
hive>use sqoop_example;
hive>create table sqoop(usr_name string,no_ops int,ops_names string);
Hive commands much more alike mysql commands.Here, we just create the structure to store the data
which we want to import in hive.
Step 6: Importing data from MySQL to hive :
sqoop import --connect \
jdbc:mysql://127.0.0.1:3306/database_name_in_mysql \
--username root --password cloudera \
--table table_name_in_mysql \
--hive-import --hive-table database_name_in_hive.table_name_in_hive \
--m 1
OUTPUT:

Result:
Thus the import and export, the order of columns in MySQL queries are exported
to hive successfully

CS609 Handouts PDF
No ratings yet
CS609 Handouts PDF
336 pages
Commands Duw3001
No ratings yet
Commands Duw3001
5 pages
Big Data & Analytics Lab Manual
No ratings yet
Big Data & Analytics Lab Manual
51 pages
Experiment No - 1
No ratings yet
Experiment No - 1
13 pages
Pgdca Syllabus
No ratings yet
Pgdca Syllabus
18 pages
CCS334 Bda Lab Manual
No ratings yet
CCS334 Bda Lab Manual
48 pages
213nt1306 - Big Data Analytics Lab Manual
No ratings yet
213nt1306 - Big Data Analytics Lab Manual
80 pages
Ccs334 Bda Lab Manual PRINT
No ratings yet
Ccs334 Bda Lab Manual PRINT
53 pages
Bda Lab Manual
No ratings yet
Bda Lab Manual
45 pages
Bda Lab Manual
No ratings yet
Bda Lab Manual
42 pages
DA Lab
No ratings yet
DA Lab
89 pages
Big Data Analytics lab-JD
No ratings yet
Big Data Analytics lab-JD
49 pages
BDA Lab Manual
No ratings yet
BDA Lab Manual
26 pages
Big Data Lab Manual
No ratings yet
Big Data Lab Manual
32 pages
Instant Ebooks Textbook Linux Administration: A Beginner's Guide 8th Edition Wale Soyinka Download All Chapters
100% (4)
Instant Ebooks Textbook Linux Administration: A Beginner's Guide 8th Edition Wale Soyinka Download All Chapters
53 pages
Big Data Manual
No ratings yet
Big Data Manual
19 pages
Bigdatamanualfinal 231019063224 d211cb48
No ratings yet
Bigdatamanualfinal 231019063224 d211cb48
45 pages
Bi Lab File
No ratings yet
Bi Lab File
19 pages
BDA Lab Manual - Organized
No ratings yet
BDA Lab Manual - Organized
69 pages
Cloud PDF
No ratings yet
Cloud PDF
47 pages
BDH Record - Merged
No ratings yet
BDH Record - Merged
47 pages
Exercise 03
No ratings yet
Exercise 03
2 pages
Tool Profiling 07 Oct 2021
No ratings yet
Tool Profiling 07 Oct 2021
91 pages
Big Data
No ratings yet
Big Data
23 pages
BDA-Lab Record
No ratings yet
BDA-Lab Record
43 pages
BigData Lab Manual
No ratings yet
BigData Lab Manual
44 pages
Bdafile
No ratings yet
Bdafile
9 pages
Exp 1-2
No ratings yet
Exp 1-2
9 pages
Bigdata Lab File
No ratings yet
Bigdata Lab File
20 pages
BDA Record
No ratings yet
BDA Record
34 pages
Ccs334 Bda Lab Ex
No ratings yet
Ccs334 Bda Lab Ex
45 pages
DOS Commands in Preparing Bootable Devices
No ratings yet
DOS Commands in Preparing Bootable Devices
12 pages
Hadoop 1
No ratings yet
Hadoop 1
15 pages
Programming With Python For Engineers
No ratings yet
Programming With Python For Engineers
41 pages
ICS 431-Ch14-File System Implementation
No ratings yet
ICS 431-Ch14-File System Implementation
29 pages
PDC All Labs
100% (1)
PDC All Labs
129 pages
Bda Record (24-25)
No ratings yet
Bda Record (24-25)
50 pages
Lab Manual
No ratings yet
Lab Manual
34 pages
BDA LabManual
No ratings yet
BDA LabManual
20 pages
CCS334 BIg Data Final Front Sheet BATHIMA - Pagenumber
No ratings yet
CCS334 BIg Data Final Front Sheet BATHIMA - Pagenumber
47 pages
Introduction To Database (Fundamental Concepts) : Fundamentals of Database, Chapter 1
No ratings yet
Introduction To Database (Fundamental Concepts) : Fundamentals of Database, Chapter 1
26 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
47 pages
Big Data File
No ratings yet
Big Data File
16 pages
Filesystem Labeling SELinux
No ratings yet
Filesystem Labeling SELinux
8 pages
Bda Manual
No ratings yet
Bda Manual
80 pages
Week 1 in Terminal
No ratings yet
Week 1 in Terminal
10 pages
SSJ Bda File
No ratings yet
SSJ Bda File
16 pages
Big Data - ASSIGNMENT 3
No ratings yet
Big Data - ASSIGNMENT 3
2 pages
BIG Data File
No ratings yet
BIG Data File
28 pages
CIS Docker Benchmark v1.3.1 PDF
No ratings yet
CIS Docker Benchmark v1.3.1 PDF
277 pages
DA Lab Program-1
No ratings yet
DA Lab Program-1
3 pages
03 - Mentor Server Architecture
No ratings yet
03 - Mentor Server Architecture
21 pages
Bda Record
No ratings yet
Bda Record
46 pages
Application Programming Open Book Exam
No ratings yet
Application Programming Open Book Exam
9 pages
Big Data Analytics Lab Experiments
No ratings yet
Big Data Analytics Lab Experiments
16 pages
Big Data Manual Ai
No ratings yet
Big Data Manual Ai
33 pages
How To Backup and Restore The Virtual I/O Server
No ratings yet
How To Backup and Restore The Virtual I/O Server
9 pages
Dsa Practical File
No ratings yet
Dsa Practical File
16 pages
Bda Manual
No ratings yet
Bda Manual
33 pages
Bda Record
No ratings yet
Bda Record
83 pages
Docu92565 Data Domain 6.1.2.30 Release Notes
100% (1)
Docu92565 Data Domain 6.1.2.30 Release Notes
89 pages
Upgrade Guide
100% (1)
Upgrade Guide
27 pages
CCS334-BDA LAB MANUAL Final
No ratings yet
CCS334-BDA LAB MANUAL Final
46 pages
TableServer MySQL 2 PDF
No ratings yet
TableServer MySQL 2 PDF
33 pages
Big Data Record 2024-25
No ratings yet
Big Data Record 2024-25
46 pages
Big Data
No ratings yet
Big Data
28 pages
Amrita CC 3.1
No ratings yet
Amrita CC 3.1
7 pages
Basic Linux Commands: College of Engineering
No ratings yet
Basic Linux Commands: College of Engineering
8 pages
Data Storage Data Processing: Hadoop Distributed File System (HDFS) Mapreduce
No ratings yet
Data Storage Data Processing: Hadoop Distributed File System (HDFS) Mapreduce
35 pages
Hands On-Exercies
No ratings yet
Hands On-Exercies
17 pages
Bigdatamanual
No ratings yet
Bigdatamanual
45 pages
Recovery - Magisk 25.2 (25200) .Apk - 2022 10 31 18 04 49
No ratings yet
Recovery - Magisk 25.2 (25200) .Apk - 2022 10 31 18 04 49
37 pages
Distributed File Systems
No ratings yet
Distributed File Systems
38 pages
HDFS
No ratings yet
HDFS
6 pages
BIGDATALABCURRENT
No ratings yet
BIGDATALABCURRENT
54 pages
Code 71
No ratings yet
Code 71
12 pages
List of Linux/i386 System Calls: 2. System Call in Depth
No ratings yet
List of Linux/i386 System Calls: 2. System Call in Depth
14 pages
CLS Linux Administrator Learning Path Brochure
No ratings yet
CLS Linux Administrator Learning Path Brochure
6 pages
@bigdatalabfile 09
No ratings yet
@bigdatalabfile 09
35 pages
Digital Meter
No ratings yet
Digital Meter
4 pages
Oracle OIC Dump For 1Z0-1072-24
No ratings yet
Oracle OIC Dump For 1Z0-1072-24
16 pages
Extreme Computing Lab Exercises Session One: 1 Getting Started
No ratings yet
Extreme Computing Lab Exercises Session One: 1 Getting Started
6 pages
BIGDATA LAB MANUAL
No ratings yet
BIGDATA LAB MANUAL
27 pages
Big Data Analytics
From Everand
Big Data Analytics
Nitin Kumar Yadav
No ratings yet
Adding ASM Disks HP-UX
No ratings yet
Adding ASM Disks HP-UX
14 pages
How To Use The Fdisk and Format Tools
No ratings yet
How To Use The Fdisk and Format Tools
12 pages
HP P6000 Enterprise Virtual Array Release Notes
No ratings yet
HP P6000 Enterprise Virtual Array Release Notes
10 pages
Software Requirements Specification: COMSATS University Islamabad, COMSATS Road, Off GT Road, Sahiwal, Pakistan
No ratings yet
Software Requirements Specification: COMSATS University Islamabad, COMSATS Road, Off GT Road, Sahiwal, Pakistan
13 pages
Lab 1 - Hadoop HDFS and MapReduce
No ratings yet
Lab 1 - Hadoop HDFS and MapReduce
4 pages
OCI Professional 2023
No ratings yet
OCI Professional 2023
43 pages

Ccs 334 Bigdata Manual

Uploaded by

Ccs 334 Bigdata Manual

Uploaded by

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CCS334-BIG DATA ANALYTICS

To Downloading and installing Hadoop; Understanding different Hadoop modes. Startup

Prerequisites to Install Hadoop on Ubuntu

b) Generate Public & Private Key Pairs

c) Configure password-less SSH

e) Now install rsync with command

STEP 2: Configure and Setup Hadoop

Source .bashrc in current login session in terminal

#Add below lines in this file(between "<configuration>" and "<"/configuration>")

#Add below lines in this file(between "<configuration>" and "<"/configuration>")

b) Start the HDFS

AIM: To implement the following file management tasks in Hadoop:

DESCRIPTION:- HDFS is a scalable distributed filesystem designed to scale to

SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM

Step 1:Starting HDFS

Listing Files in HDFS

$ $HADOOP_HOME/bin/hadoop fs -ls <args>

Inserting Data into HDFS

Step-2: Adding Files and Directories to HDFS

$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input

$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input

$ $HADOOP_HOME/bin/hadoop fs -ls /user/input

Initially, view the data from HDFS using cat command.

$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/

Step 6:Shutting Down the HDFS

AIM: To Develop a MapReduce program to implement Matrix

In mathematics, matrix multiplication or the matrix product is a binary operationthat

Algorithm for Map Function.

Step 1. Creating directory for matrix

Step 2. Creating Mapper file for Matrix Multiplication.

AIM: To Develop a MapReduce program to calculate the frequency of a given word

Workflow of MapReduce consists of 5 steps

Step1 :Make sure Hadoop and Java are installed properly

Step 2. Create a directory on the Desktop named Lab and

Step 7. Go to localhost:9870 from the browser, Open“Utilities→ Browse File System”

hadoop dfs -cat /WordCountTutorial/Output/*

Program: Step 5. Type following Program :

AIM: To installing hive with example

download and unzip Hive

Fixing guava problem – Additional step

step 7: Configure hive-site.xml File (Optional)

Step 8: Initiate Derby Database

To installing hive with example

Create Database from Hive Beeline shell

Ubuntu 16.04 or higher installed on a virtual machine.

To Install Java sudo apt install openjdk-8-jdk -y

step-4: goto hbase2.5.5/conf folder and open hbase-env.sh file

step-5 : Edit .bashrc file

export PATH= $PATH:$HBASE_HOME/bin

put the below property between the <configuratio></configuration> tag

After this run jps command to ensure that hbase is running

run http://localhost:16010 to see hbase web UI

HBase was successfully installed on Ubuntu 18.04.

To Install HBase on Ubuntu 18.04 HBase in Standalone Mod

2) List All Tables

here row_key is a unique key to retrive data

same as desc in RDBMS

5) To get specific data

7)To Delete Data

HBase was successfully installed with an example on Ubuntu 18.04.

Step 1:To start hdfs

Step 2: MySQL Installation

Creating user profiles and grant them permissions:

Mysql> CREATE USER ‘bigdata'@'localhost' IDENTIFIED BY ‘bigdata’;

ANJALAI AMMAL MAHALINGAM ENGINEERING COLLEGE

$ tar -xvf sqoop-1.4.4.bin hadoop-2.0.4-alpha.tar.gz

Then enter into the super user : $ su

$ mv sqoop-1.4.4.bin hadoop-2.0.4-alpha /usr/lib/sqoop

Step 5: Download and Configure mysql-connector-java :

You might also like