
Ujjain Engineering College, Ujjain

Indore Road, Ujjain, Madhya Pradesh 456010

Department of Computer Science and Engineering


Big Data Lab
LAB FILE
Session 2023-24
B.Tech VII SEMESTER

Name of Faculty: Rekha Singh
Name of Student: Ashwini Soni
Enrollment No.: 0701CS201009
UJJAIN ENGINEERING COLLEGE, UJJAIN
Indore Road, Ujjain Madhya Pradesh, 456010
Department of Computer Science and Engineering

Subject : Big Data Lab


Semester : VII Semester
Name of Student : Ashwini Soni
Enrollment no. : 0701CS201009
List of Experiments

S.No.  Aim  Date of Experiment  Date of Submission  Signature  Remark

1. Install Hadoop 3 for Single Node
2. HDFS Basic Commands
   Write HDFS commands to perform the following operations.
   1. Create a directory in HDFS at given path(s).
   2. List the contents of a directory.
   3. Upload and download a file in HDFS.
   4. See contents of a file.
   5. Copy a file from source to destination.
   6. Copy a file from/to local file system to HDFS.
   7. Move a file from source to destination.
   8. Remove a file or directory in HDFS.
   9. Display last few lines of a file.
3. MapReduce
   Write a program to count words with its frequency (write custom Mapper, Reducer and Driver classes).
4. Hive Installation step by step.
5. Hive basic queries –
   1. Write a query to count words with its frequency using Hive.
   2. Create a managed table Student with columns roll, name, address, city, state and load data into it.
   3. Create a managed table Result with columns roll, marks and load data into it.
EXPERIMENT-1
AIM :- Install Hadoop 3 for Single Node.

Prerequisites:
1. JAVA - Java JDK (installed)

2. HADOOP - Hadoop package (downloaded)

Step 1: Verify that Java is installed


javac -version
Step 2: Extract Hadoop at C:\Hadoop

Step 3: Setting up the HADOOP_HOME variable


Use the Windows environment variable settings to set the Hadoop path.

Step 4: Set JAVA_HOME variable


Use the Windows environment variable settings to set the Java path.
Step 5: Set Hadoop and Java bin directory path

Step 6: Hadoop Configuration :


For Hadoop configuration we need to modify the five files listed below and create two folders:
1. core-site.xml
2. mapred-site.xml
3. hdfs-site.xml
4. yarn-site.xml
5. hadoop-env.cmd
6. Create two folders, datanode and namenode
Step 6.1: core-site.xml configuration
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Step 6.2: mapred-site.xml configuration
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step 6.3: hdfs-site.xml configuration
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-2.8.0\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-2.8.0\data\datanode</value>
</property>
</configuration>
Step 6.4: yarn-site.xml configuration
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Step 6.5: hadoop-env.cmd configuration
set "JAVA_HOME=C:\Java" (where C:\Java is the path to the installed JDK folder, e.g. jdk1.8.0)
Step 6.6: Create datanode and namenode folders
1. Create folder "data" under "C:\Hadoop-2.8.0"
2. Create folder "datanode" under "C:\Hadoop-2.8.0\data"
3. Create folder "namenode" under "C:\Hadoop-2.8.0\data"
Step 7: Format the namenode folder

Open a command window (cmd) and type the command "hdfs namenode -format"

Step 8: Testing the setup


Open a command window (cmd) and type the command "start-all.cmd"

Step 8.1: Testing the setup:


Ensure that the NameNode, DataNode, ResourceManager and NodeManager daemons are running.
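A quick way to verify this (assuming the JDK's bin directory is on the PATH) is to run the jps command in a new cmd window; it should list NameNode, DataNode, ResourceManager and NodeManager:

jps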
Step 9: Open: http://localhost:8088 (the YARN ResourceManager web UI)
Step 10:
Open: http://localhost:50070 (the NameNode web UI; note that on Hadoop 3.x the default port is 9870)
EXPERIMENT-2
AIM :- HDFS Basic Commands
Write HDFS Commands to perform following operations.
HDFS is the primary or major component of the Hadoop ecosystem which is responsible for
storing large data sets of structured or unstructured data across various nodes and thereby
maintaining the metadata in the form of log files. To use the HDFS commands, first you need
to start the Hadoop services using the following command: sbin/start-all.sh

1. Create a directory in HDFS at given path(s).


mkdir: To create a directory. In Hadoop DFS there is no home directory by default, so let's
first create one.
Syntax:
hdfs dfs -mkdir /path
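For example (the directory name /geeks is illustrative):
hdfs dfs -mkdir /geeks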

2. List the contents of a directory.


ls: This command is used to list all the files in a directory. Use -ls -R for a recursive
listing; it is useful when we want the hierarchy of a folder.
Syntax:
hdfs dfs -ls /path
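For example, to list the root of HDFS:
hdfs dfs -ls /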

3. Upload and download a file in HDFS.


put: To copy files/folders from local file system to hdfs store. This is the most important
command. Local filesystem means the files present on the OS.
Syntax:
hdfs dfs -put <localsrc> <dest>
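For example (the local file name and target directory are illustrative):
hdfs dfs -put data.txt /geeks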
4. See contents of a file
cat: To print file contents.
Syntax:
hadoop fs -cat /path_to_file_in_hdfs
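For example (the path is illustrative):
hadoop fs -cat /geeks/data.txt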

5. Copy a file from source to destination


cp: This command is used to copy files within HDFS. Let's copy the
folder geeks to geeks_copied.
Syntax:
hadoop fs -cp <src> <dest>
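For example (both paths are illustrative):
hadoop fs -cp /geeks/data.txt /geeks_copied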

6. Copy a file from/To Local file system to HDFS


copyFromLocal (or put): To copy files/folders from the local file system to HDFS.
copyToLocal (or get): To copy files/folders from HDFS to the local file system.
Syntax:
hdfs dfs -copyFromLocal <localsrc> <hdfs dest>
hadoop fs -copyToLocal <hdfs source> <localdst>
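For example (the HDFS and local paths are illustrative):
hdfs dfs -copyFromLocal data.txt /geeks
hadoop fs -copyToLocal /geeks/data.txt /home/ashwini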

7. Move file from source to destination.


mv: This command is used to move a file from a source to a destination within HDFS; unlike
cp, the source is removed after the move.
Syntax:
hdfs dfs -mv <src> <dest>
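For example (both paths are illustrative):
hdfs dfs -mv /geeks/data.txt /geeks_copied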
8. Remove a file or directory in HDFS.
rm: This command is similar to the UNIX rm command, and it is used for removing a file
from the HDFS file system. The option -rm -r can be used to delete files and directories recursively.
Syntax:
hdfs dfs -rm -r /filename
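For example, to remove the illustrative directory created earlier along with its contents:
hdfs dfs -rm -r /geeks_copied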

9. Display last few lines of a file.


tail: The Hadoop fs shell tail command shows the last 1 KB of a file on the console or stdout.
For example, we can use it to display the last 1 KB of a file 'test' present in a directory on the
HDFS filesystem.
The -f option shows appended data as the file grows.
Syntax:

hdfs dfs -tail /file
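For example (the path is illustrative):
hdfs dfs -tail /geeks/data.txt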


EXPERIMENT-3
AIM :- MapReduce
Write a program to count words with its frequency (Write custom
Mapper,Reducer and Driver classes)
In the MapReduce word count example, we find out the frequency of each word. Here, the role of
the Mapper is to emit a (word, 1) key-value pair for every word in the input, and the role of the
Reducer is to aggregate the values for each key by summing them. So, everything is represented in the form of key-value pairs.

Steps to execute MapReduce word count example:

• Create a text file in your local machine and write some text into it.

$ nano data.txt

• Check the text written in the data.txt file.

$ cat data.txt
In this example, we find out the frequency of each word exists in this text file.

• Create a directory in HDFS where the text file will be kept.

$ hdfs dfs -mkdir /test

• Upload the data.txt file on HDFS in the specific directory.

$ hdfs dfs -put /home/codegyani/data.txt /test

1.Driver Code (WCDriver.java)


import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class WCDriver extends Configured implements Tool {
public int run(String args[]) throws IOException
{
if (args.length < 2)
{
System.out.println("Please give valid inputs");
return -1;
}
JobConf conf = new JobConf(WCDriver.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
conf.setMapperClass(WCMapper.class);
conf.setReducerClass(WCReducer.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
JobClient.runJob(conf);
return 0;
}
// Main Method
public static void main(String args[]) throws Exception
{
int exitCode = ToolRunner.run(new WCDriver(), args);
System.out.println(exitCode);
}
}

2.Mapper Code(WCMapper.java)
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WCMapper extends MapReduceBase implements
Mapper<LongWritable,
Text, Text, IntWritable> {
// Map function
public void map(LongWritable key, Text value, OutputCollector<Text,
IntWritable> output, Reporter rep) throws IOException {
String line = value.toString();
// Splitting the line on spaces
for (String word : line.split(" ")) {
if (word.length() > 0) {
output.collect(new Text(word), new IntWritable(1));
}}
}}

3. Reducer Code (WCReducer.java)
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class WCReducer extends MapReduceBase implements Reducer<Text,
IntWritable, Text, IntWritable> {
// Reduce function
public void reduce(Text key, Iterator<IntWritable> value,
OutputCollector<Text, IntWritable> output,
Reporter rep) throws IOException
{
int count = 0;
// Counting the frequency of each word
while (value.hasNext())
{
IntWritable i = value.next();
count += i.get();
}
output.collect(key, new IntWritable(count));
}}
Create the jar file of this program and name it countworddemo.jar.
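One possible way to build the jar (a minimal sketch; the class output directory is illustrative, and it assumes the hadoop command is on the PATH so that hadoop classpath can supply the required libraries):

$ javac -classpath "$(hadoop classpath)" -d wc_classes WCDriver.java WCMapper.java WCReducer.java
$ jar -cvf countworddemo.jar -C wc_classes .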
Run the jar file: hadoop jar /home/codegyani/countworddemo.jar WCDriver /test/data.txt
/r_output
The output is stored in /r_output/part-00000

Now execute the command to see the output.


hdfs dfs -cat /r_output/part-00000
EXPERIMENT-4
AIM :- Hive Installation step by step.
1. Prerequisites
1. Hardware Requirement
* RAM — Min. 8GB, if you have SSD in your system then 4GB RAM would also
work.
* CPU — Min. Quad core, with at least 1.80GHz
2. JRE 1.8 — Offline installer for JRE
3. Java Development Kit — 1.8
4. Unzipping software such as 7-Zip or WinRAR
* I will be using 64-bit Windows for the process; please check and download
the version supported by your system (x86 or x64) for all the software.
5. Hadoop
* I am using Hadoop-2.9.2, you can also use any other STABLE version for
Hadoop.
6. MySQL (with MySQL Workbench)
7. Download Hive zip
* I am using Hive-3.1.2, you can also use any other STABLE version for
Hive.

Fig 1:- Download Hive-3.1.2


2. Unzip and Install Hive
After Downloading the Hive, we need to Unzip the apache-hive-3.1.2-bin.tar.gz file.

Fig 2:- Extracting Hive Step-1


Once extracted, we would get a new file apache-hive-3.1.2-bin.tar
Now, once again we need to extract this tar file.

Fig 3:- Extracting Hive Step-2


• Now we can organize our Hive installation: create a folder and move the
final extracted files into it, e.g.:-

Fig 4:- Hive Directory


• Please note while creating folders, DO NOT ADD SPACES IN BETWEEN THE
FOLDER NAME.(it can cause issues later)
• I have placed my Hive in D: drive you can use C: or any other drive also.
3. Setting Up Environment Variables
Another important step in setting up a work environment is to set your Systems
environment variable.
To edit environment variables, go to Control Panel > System > click on the “Advanced
system settings” link
Alternatively, We can Right click on This PC icon and click on Properties and click on
the “Advanced system settings” link

Fig. 5:- Path for Environment Variable

Fig. 6:- Advanced System Settings Screen


3.1 Setting HIVE_HOME
• Open environment Variable and click on “New” in “User Variable”
Fig. 7:- Adding Environment Variable
• On clicking “New”, we get below screen.

Fig. 8:- Adding HIVE_HOME


• Now as shown, add HIVE_HOME in variable name and path of Hive in Variable
Value.
• Click OK and we are half done with setting HIVE_HOME.
3.2 Setting Path Variable
• Last step in setting Environment variable is setting Path in System Variable.

Fig. 9:- Setting Path Variable


• Select Path variable in the system variables and click on “Edit”.
Fig. 10:- Adding Path
• Now we need to add these paths to Path Variable :-
* %HIVE_HOME%\bin
• Click OK and OK. & we are done with Setting Environment Variables.
3.3 Verify the Paths
• Now we need to verify that what we have done is correct and reflecting.
• Open a NEW Command Window
• Run following commands
echo %HIVE_HOME%
4. Editing Hive
Once we have configured the environment variables next step is to configure Hive. It has
7 parts:-
4.1 Replacing bins
First step in configuring the hive is to download and replace the bin folder.
* Go to this GitHub Repo and download the bin folder as a zip.
* Extract the zip and replace all the files present under bin folder to
%HIVE_HOME%\bin
Note:- If you are using different version of HIVE then please search for its respective bin
folder and download it.
4.2 Creating File Hive-site.xml
Now we need to create the Hive-site.xml file in hive for configuring it :-
(We can find these files in Hive -> conf -> hive-default.xml.template)
We need to copy the hive-default.xml.template file and paste it in the same location and
rename it to hive-site.xml. This will act as our main Config file for Hive.

Fig. 11:- Creating Hive-site.xml


4.3 Editing Configuration Files
4.3.1 Editing the Properties
Now open the newly created hive-site.xml and edit the following properties:
<property>
<name>hive.metastore.uris</name>
<value>thrift://<Your IP Address>:9083</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value><Your drive Folder>/${hive.session.id}_resources</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
</property>
Replace the value for <Your IP Address> with the IP address of your system and
replace <Your drive Folder> with the Hive folder path.
4.3.2 Removing Special Characters
This is a short step: we need to remove all the &#8; special characters present in the hive-
site.xml file (they make the XML invalid).
4.3.3 Adding few More Properties
Now we need to add the following properties as it is in the hive-site.xml File.
<property>
<name>hive.querylog.location</name>
<value>$HIVE_HOME/iotmp</value>
<description>Location of Hive run time structured log file</description>
</property><property>
<name>hive.exec.local.scratchdir</name>
<value>$HIVE_HOME/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property><property>
<name>hive.downloaded.resources.dir</name>
<value>$HIVE_HOME/iotmp</value>
<description>Temporary local directory for added resources in the remote file
system.</description>
</property>
Great..!!! We are almost done with the Hive part, for configuring MySQL database as
Metastore for Hive, we need to follow below steps:-
4.4 Creating Hive User in MySQL
The next important step in configuring Hive is to create users for MySQL.
These Users are used for connecting Hive to MySQL Database for reading and writing
data from it.
Note:- You can skip this step if you created the hive user during SQOOP installation.
• Firstly, we need to open the MySQL Workbench and open the workspace(default
or any specific, if you want). We will be using the default workspace only for
now.
Fig 12:- Open MySQL Workbench
• Now Open the Administration option in the Workspace and select Users and
privileges option under Management.

Fig 13:- Opening Users and Privileges


• Now select the Add Account option and create a new user with Login
Name as hive, Limit to Host Mapping as localhost and a Password of
your choice.

Fig 14:- Creating Hive User


• Now we have to define the roles for this user under Administrative Roles and
select DBManager ,DBDesigner and BackupAdmin Roles

Fig 15:- Assigning Roles


• Now we need to grant schema privileges for the user by using Add Entry option
and selecting the schemas we need access to.

Fig 16:- Schema Privileges


I am using schema matching pattern as %_bigdata% for all my bigdata related schemas.
You can use other 2 options also.
After clicking OK we need to select All the privileges for this schema.
Fig 17:- Select All privileges in the schema
• Click Apply and we are done with the creating Hive user.
4.5 Granting permission to Users
Once we have created the user hive the next step is to Grant All privileges to this user for
all the Tables in the previously selected Schema.
• Open the MySQL cmd Window. We can open it by using the Window’s Search
bar.
Fig 18:- MySQL cmd
• Upon opening it will ask for your root user password(created while setting up
MySQL).
• Now we need to run the below command in the cmd window.
grant all privileges on test_bigdata.* to 'hive'@'localhost';
where test_bigdata is your schema name and 'hive'@'localhost' is the user name
@ host name.
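Optionally, you can reload the grant tables afterwards so that the new privileges take effect immediately (a standard MySQL command, not specific to Hive):
flush privileges;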
4.6 Creating Metastore
Now we need to create our own metastore for Hive in MySQL.
Firstly, we need to create a database for the metastore in MySQL, or we can use the one
used in the previous step (test_bigdata in my case).
Now navigate to the path below:
hive -> scripts -> metastore -> upgrade -> mysql and execute the file hive-schema-
3.1.0.mysql.sql in MySQL against your database.
Note:- If you are using a different database, select the corresponding folder inside the upgrade
folder and execute its hive-schema file.
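For example, from the MySQL command line (the path is illustrative and depends on where Hive was extracted; test_bigdata is the metastore database used earlier):
mysql> use test_bigdata;
mysql> source D:/hive/apache-hive-3.1.2-bin/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql;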
4.7 Adding a Few More Properties (Metastore related Properties)
Finally, we need to open our hive-site.xml file once again and make some changes there.
These are related to the Hive metastore, which is why they were not added at the start, so as to
keep the different sets of properties distinct.
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/<Your
Database>?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL
flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://localhost:9000/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value><Hive Password></value>
<description>password to use against metastore database</description>
</property>
<property>
<name>datanucleus.schema.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateTables</name>
<value>true</value>
<description>validates existing schema against code. turn this on if you want to verify
existing schema</description>
</property>
Replace the value for <Hive Password> with the hive user password that we created in
MySQL user creation. And <Your Database> with the database that we used for
metastore in MySQL.
5. Starting Hive
5.1 Starting Hadoop
Now we need to open a new Command Prompt (remember to run it as administrator to
avoid permission issues) and execute the command below:
start-all.cmd
Fig. 19:- start-all.cmd
All the 4 daemons should be UP and running.
5.2 Starting Hive Metastore
Open a cmd window, run below command to start the Hive metastore.
hive --service metastore

Fig 20:- Starting Hive Metastore


5.3 Starting Hive
Now open a new cmd window and run the below command to start Hive
hive
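Once the Hive shell opens, a simple query can be used as a quick check that Hive can talk to the metastore, for example:
hive> show databases;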
EXPERIMENT-5
AIM :- Hive basic queries –
I) Write a query to count words with its frequency using hive.
Input-
This is my first hive tutorial, which is known as hello world program in big data , big
data technologies are now on demand.
Hive Query

Step 1. Create a table in hive


hive> create table feedback(comments string);

Step 2. Load data from the sample file


Syntax:

hive> load data local inpath '/home/ashwini/hadoop_data/comments.txt' into table feedback;

Step 3. Convert comments into an array


hive> select split(comments,' ') from feedback;

Step 4. Use table generation udf


hive> select explode( split(comments,' ')) from feedback;

The output of the above explode with split function is


This
is
my
first
hive
tutorial,
which
is
known
as
hello
world
program
in
big
data
,
big
data
technologies
are
now
on
demand.

Step 5. Final step


hive> select word, count(*) from (select explode(split(comments,' ')) as word from feedback) temp group by word;

Output -
, 1
This 1
are 1
as 1
big 2
data 2
demand. 1
first 1
hello 1
hive 1
in 1
is 2
known 1
my 1
now 1
on 1
program 1
technologies1
tutorial, 1
which 1
world 1

II) Create a managed table Student with columns roll, name, address, city, state
and load data into it.
Creating DataBase in Hive

CREATE DATABASE student_detail;

SHOW DATABASES;

USE student_detail;

Creating Table in Hive

CREATE TABLE IF NOT EXISTS student(roll_no INT, name STRING, address STRING, city STRING, state STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
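A sample line of student.txt matching this comma-delimited schema could look like the following (the values are illustrative):
1,Ashwini,Indore Road,Ujjain,MP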
Load Data into table

load data inpath '/hdoop/student.txt' overwrite into table student;


select * from student;

III) Create a managed table Result with columns roll, marks and load data into
it.

Creating DataBase in Hive

CREATE DATABASE student;

SHOW DATABASES;

USE student;
Creating Table in Hive

CREATE TABLE IF NOT EXISTS result(roll_no INT, marks FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
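A sample line of Result.txt matching this schema could look like the following (the values are illustrative):
1,85.5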

Load Data into table

load data inpath '/hadoop/Result.txt' overwrite into table result;


select * from result;
