BigData_Lab_Manual
DEPARTMENT OF ARTIFICIAL
INTELLIGENCE AND DATA SCIENCE
LAB MANUAL
for
CCS334 – BIG DATA ANALYTICS
(Regulation 2021, V Semester, Theory cum Practical
Course)
Output:
Step 5 : Switch user
Switch to the newly created hadoop user:
$ su - hadoop
Step 6 : Configure SSH
Now configure password-less SSH access for the newly created hadoop user. When prompted for the file in which to save the key and for a passphrase, simply press Enter to accept the defaults. First, generate an SSH key pair (public and private keys):
$ssh-keygen -t rsa
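Then append the public key to the authorized keys and restrict its permissions:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 640 ~/.ssh/authorized_keys
Verify the password-less login:
$ ssh localhost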
Next, you will need to configure the Hadoop and Java environment variables on your system. Open the ~/.bashrc file in your favorite text editor. If you use the nano editor, paste with Ctrl+Shift+V, then save with Ctrl+X followed by Y and Enter:
$ nano ~/.bashrc
Append the following lines to the file (adjust the paths to match your installation).
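For example, if Hadoop is extracted to /home/hadoop/hadoop, a typical set of entries is:
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"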
Save and close the file. Then, activate the environment variables with the following
command:
$ source ~/.bashrc
Next, open the Hadoop environment variable file:
$ nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Search for the "export JAVA_HOME" line and configure it:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Next, edit the core-site.xml file and update with your system hostname:
$nano $HADOOP_HOME/etc/hadoop/core-site.xml
Change the following name as per your system hostname:
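A minimal core-site.xml example (replace your-hostname with your system hostname):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://your-hostname:9000</value>
</property>
</configuration>
Edit hdfs-site.xml, mapred-site.xml, and yarn-site.xml in the same way if your setup requires it, then format the NameNode (first run only) and start the Hadoop services:
$ hdfs namenode -format
$ start-all.sh
Verify the running Hadoop daemons: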
$ jps
Open the YARN ResourceManager web interface in your browser (replace the IP address with your server's address):
http://192.168.16:8088
Let's create some directories in the HDFS filesystem using the following commands:
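For example:
$ hdfs dfs -mkdir /test1
$ hdfs dfs -mkdir /logs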
Also, put some files into the Hadoop file system. In this example, we copy log files from the host machine into HDFS.
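For example:
$ hdfs dfs -put /var/log/*.log /logs/
List the files to verify:
$ hdfs dfs -ls /logs/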
You can also verify the above files and directories in the Hadoop NameNode web interface (typically http://<your-server-ip>:9870 in Hadoop 3.x). Go to the web interface and click Utilities => Browse the file system. You should see the directories you created earlier on the following screen:
To stop the Hadoop services, run the following command:
$ stop-all.sh
Result:
The step-by-step installation and configuration of Hadoop on an Ubuntu Linux system has been successfully completed.
Exp 2: Hadoop implementation of file management tasks, such as adding files
and directories, retrieving files, and deleting files
AIM:
To implement file management tasks such as adding files and directories, retrieving files,
and deleting files.
DESCRIPTION:
HDFS is a scalable distributed filesystem designed to scale to petabytes of data while running
on top of the underlying filesystem of the operating system. HDFS keeps track of where the
data resides in a network by associating the name of its rack (or network switch) with the
dataset. This allows Hadoop to efficiently schedule tasks to those nodes that contain data, or
which are nearest to it, optimizing bandwidth utilization. Hadoop provides a set of command
line utilities that work similarly to the Linux file commands and serve as your primary
interface with HDFS.
PROCEDURE:
Step-1: Adding Files and Directories to HDFS. Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of /user/$USER, where $USER is your login username. This directory isn't automatically created for you, though, so let's create it with the mkdir command. Log in as your hadoop user and start the Hadoop services:
start-all.sh
For the purpose of illustration, we use chuck. You should substitute your user name in
the example commands.
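A minimal sequence (assuming a local file named example.txt):
hadoop fs -mkdir -p /user/chuck
hadoop fs -put example.txt /user/chuck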
Step-2: Retrieving Files from HDFS. The Hadoop get command copies files from HDFS back to the local filesystem, while cat prints a file's contents to standard output. To view example.txt, we can run the following command.
hadoop fs -cat /user/chuck/example.txt
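Step-3: Deleting Files from HDFS. The Hadoop rm command removes files from HDFS. To delete the file added above, we can run the following command.
hadoop fs -rm /user/chuck/example.txt
Result:
Thus, the file management tasks of adding, retrieving, and deleting files in HDFS have been executed successfully.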
Exp 3: Implement Matrix Multiplication with Hadoop MapReduce
AIM:
To implement the Matrix Multiplication with Hadoop Map Reduce
Procedure:
Step 1: Before writing the code, let's first create the matrices and put them in HDFS.
● Create two files, m1 and m2, and enter the matrix values (separate columns with spaces and rows with new lines):
$nano m1
1 2 3
4 5 6
$ nano m2
78
9 10
11 12
● Put the above files into HDFS at the location /user/path/to/matrices/:
hdfs dfs -mkdir -p /user/path/to/matrices
hdfs dfs -put /path/to/m1 /user/path/to/matrices/
hdfs dfs -put /path/to/m2 /user/path/to/matrices/
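Step 2: Mapper Logic - Mapper.py:
The script below is an illustrative sketch of one possible mapper (its file name and logic are assumptions, not a definitive listing). For every output cell (i, j) and every k it emits the required element of m1 and of m2 under a four-field tab-separated key (i, j, k, matrix tag), so that with single-digit indices and a single reducer Hadoop's sort delivers the values in exactly the order the reducer expects (the Step 4 command below sets the matching sort options).
$ nano Mapper.py
#!/usr/bin/env python3
# Mapper.py - illustrative sketch for multiplying a 2x3 matrix (m1) by a 3x2 matrix (m2)
import os
import sys

# Dimensions of m1 (m_r x m_c) and m2 (n_r x n_c); these must match Reducer.py.
m_r, m_c = 2, 3
n_r, n_c = 3, 2

# Hadoop Streaming exposes the current input file name through this
# environment variable; it is used to tell matrix m1 apart from matrix m2.
input_file = os.environ.get('mapreduce_map_input_file',
                            os.environ.get('map_input_file', ''))
is_m1 = 'm1' in input_file

row = 0
for line in sys.stdin:
    values = line.split()
    if not values:
        continue
    if is_m1:
        # m1[row][k] is needed for every output column j.
        for k, v in enumerate(values):
            for j in range(n_c):
                print('%d\t%d\t%d\t0\t%s' % (row, j, k, v))
    else:
        # m2[row][j] is needed for every output row i (here row plays the role of k).
        for j, v in enumerate(values):
            for i in range(m_r):
                print('%d\t%d\t%d\t1\t%s' % (i, j, row, v))
    row += 1
Step 3: Reducer Logic - Reducer.py:
Create Reducer.py with the code below. For each cell of the result it reads the element pairs in order, multiplies each pair, and sums the products.
$ nano Reducer.py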
#!/usr/bin/env python3
import sys
m_r = 2
m_c = 3
n_r = 3
n_c = 2
matrix = []
for row in range(m_r):
    r = []
    for col in range(n_c):
        s = 0
        for el in range(m_c):
            mul = 1
            for num in range(2):
                line = input()  # Read input from standard input
                n = list(map(int, line.split('\t')))[-1]
                mul *= n
            s += mul
        r.append(s)
    matrix.append(r)
# Print the matrix
for row in matrix:
    print('\t'.join([str(x) for x in row]))
Step 4: Running the Map-Reduce Job on Hadoop
You can run the MapReduce job and view the result with the commands below (assuming you have already put the input files in HDFS).
The Hadoop streaming jar normally ships with Hadoop under $HADOOP_HOME/share/hadoop/tools/lib; if you need to download the jar separately, use the following command:
$ wget https://jar-download.com/download-handling.php
$ chmod +x /path/to/Mapper.py
$ chmod +x /path/to/Reducer.py
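Then submit the streaming job and view the output (a typical invocation; the exact jar file name depends on your Hadoop version, and the -D options request a single reducer and a four-field sort key as required by the Mapper.py/Reducer.py scripts above):
$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
-D mapreduce.job.reduces=1 \
-D stream.num.map.output.key.fields=4 \
-files /path/to/Mapper.py,/path/to/Reducer.py \
-mapper Mapper.py \
-reducer Reducer.py \
-input /user/path/to/matrices \
-output /user/path/to/matrix_output
$ hdfs dfs -cat /user/path/to/matrix_output/part-*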
Result:
Thus, the matrix multiplication using map reduce program has been successfully completed.
EXP 4: Run a basic Word Count Map Reduce program to understand Map
Reduce Paradigm.
AIM:
To run a basic Word Count MapReduce program.
Procedure:
Step 1: Create Data File:
Create a file named "word_count.txt" and populate it with the text data that you wish to
analyse.
Log in as your hadoop user.
nano word_count.txt
Output:
Step 2: Mapper Logic - mapper.py:
Create a file named "mapper.py" to implement the logic for the mapper.
nano mapper.py
# Copy and paste the mapper.py code
#!/usr/bin/env python3
# import sys because we need to read and write data to STDIN and STDOUT
import sys
# reading entire line from STDIN (standard input)
for line in sys.stdin:
    # to remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # we are looping over the words array and printing the word
    # with the count of 1 to the STDOUT
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        print('%s\t%s' % (word, 1))
Here, in the above program, #! is known as the shebang line; it tells the system which interpreter should be used to run the script. The file will be run using the command we specify when submitting the streaming job.
Step 3: Reducer Logic - reducer.py:
Create a file named "reducer.py" to implement the logic for the reducer. The reducer
will aggregate the occurrences of each word and generate the final output.
nano reducer.py
# Copy and paste the reducer.py code
#!/usr/bin/env python3
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

# read the mapper output (word \t count) from STDIN
for line in sys.stdin:
    line = line.strip()
    # parse the word and its count
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # silently ignore lines whose count is not a number
        continue
    # Hadoop sorts the mapper output by key, so all counts for the
    # same word arrive together and can be summed
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # emit the total for the previous word
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# emit the total for the last word
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
Step 4: Prepare Hadoop Environment:
Start the Hadoop daemons and create a directory in HDFS to store your data.
start-all.sh
hdfs dfs -mkdir /word_count_in_python
hdfs dfs -copyFromLocal /path/to/word_count.txt /word_count_in_python
Step 6: Make Python Files Executable:
Give executable permissions to your mapper.py and reducer.py files, as shown below.
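chmod +x /path/to/mapper.py
chmod +x /path/to/reducer.py
Finally, run the streaming job and view the word counts (a typical invocation; adjust the hadoop-streaming jar path and the file locations to your installation):
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
-files /path/to/mapper.py,/path/to/reducer.py \
-mapper mapper.py \
-reducer reducer.py \
-input /word_count_in_python/word_count.txt \
-output /word_count_out
hdfs dfs -cat /word_count_out/part-*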
Result:
Thus, the program for basic Word Count Map Reduce has been executed successfully.
Exp5: Installation of Hive on Ubuntu
Aim:
To download and install Hive, and to understand its startup scripts and configuration files.
Procedure:
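1. Download and extract Apache Hive (version 3.1.2 is used throughout this experiment; the archive URL below is one possible source):
$ wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
$ tar -xzf apache-hive-3.1.2-bin.tar.gz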
2. Exporting the Hadoop path in hive-env.sh (to communicate with the Hadoop ecosystem, we define the Hadoop home path in Hive's environment configuration). Open hive-env.sh as shown below:
$ cd apache-hive-3.1.2-bin/conf
$ cp hive-env.sh.template hive-env.sh
$ nano hive-env.sh
Append the following lines to it:
export HADOOP_HOME=/home/hadoop/hadoop
export HIVE_CONF_DIR=/home/hadoop/apache-hive-3.1.2-bin/conf
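3. Configure the Hive metastore by editing hive-site.xml in the same conf directory ($ nano hive-site.xml) and adding the properties below inside its <configuration> element. A JDBC connection URL is also required; a typical value, assuming a local MySQL metastore database named metastore, is:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>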
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>your_new_password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
</configuration>
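Finally, initialize the metastore schema and start the Hive shell to verify the installation (run these from the Hive bin directory, or add it to your PATH):
$ cd ../bin
$ ./schematool -dbType mysql -initSchema
$ ./hive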
Result:
Thus, the Apache Hive installation is completed successfully on Ubuntu.
Exp 6: Design and test various schema models to optimize data storage and retrieval
Using Hive.
Aim:
To design and test various schema models to optimize data storage and retrieval using
Hive.
Procedure:
Step 1: Start Hive
Open a terminal and start Hive by running:
$hive
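Step 2: Create and test sample schema models. The statements below are an illustrative sketch (the database, table, and column names are examples only); they compare a plain table with a partitioned table, since partitioning by a frequently filtered column lets Hive scan only the relevant partition and so speeds up retrieval.
hive> CREATE DATABASE IF NOT EXISTS college;
hive> USE college;
hive> CREATE TABLE student_raw (roll_no INT, name STRING, dept STRING, marks INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> CREATE TABLE student_part (roll_no INT, name STRING, marks INT) PARTITIONED BY (dept STRING);
hive> INSERT INTO student_raw VALUES (1, 'Asha', 'ADS', 85), (2, 'Bala', 'CSE', 78);
hive> INSERT OVERWRITE TABLE student_part PARTITION (dept='ADS') SELECT roll_no, name, marks FROM student_raw WHERE dept='ADS';
hive> SELECT * FROM student_part WHERE dept='ADS';
Step 3: Compare the query plans with and without partition pruning (for example, EXPLAIN SELECT * FROM student_raw WHERE dept='ADS' versus the same query on student_part) to see how the schema affects data retrieval.
Result:
Thus, various schema models were designed and tested using Hive to optimize data storage and retrieval.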
Exp 7: Installation of HBase on Ubuntu
Aim:
To download and install HBase, and to understand the different HBase modes, startup
scripts, and configuration files.
Procedure:
$ wget https://downloads.apache.org/hbase/2.5.5/hbase-2.5.5-bin.tar.gz
Step 4. Let us extract the tar file using the command below and rename the folder to hbase
to make it meaningful.
$tar -xzf hbase-2.5.5-bin.tar.gz
$ mv hbase-2.5.5 hbase
Step 5. Now edit the hbase-env.sh configuration file, which is present under conf in the hbase directory, and add the JAVA path as mentioned below.
/hbase/conf$ nano hbase-env.sh
Now set the JAVA path:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Step 6. After this, edit the .bashrc file to update the environment variables of Apache HBase so
that it can be accessed from any directory.
nano .bashrc
Add the below path.
export HBASE_HOME=/home/hadoop/hbase
export PATH=$PATH:$HBASE_HOME/bin
Reload the file so that the changes take effect:
$ source .bashrc
Step 7. Now configure hbase-site.xml by appending the XML below. First create the data directories:
cd ..
mkdir hbasfile
mkdir zookeeper
chmod 777 hbasfile zookeeper
cd hbase/conf
nano hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>file:///home/hadoop/hbasfile</value>
</property>
<property>
<name>hbase.zookeeper.property.datadir</name>
<value>/home/hadoop/zookeeper</value>
</property>
Step 8. Now start Apache HBase and verify it using the below commands.
cd hbase
bin/start-hbase.sh
jps
Step 9. After this, we can see from the jps command that the HBase services are running; now let us
start the HBase shell using the below command.
bin/hbase shell
Result:
Thus, the Apache HBase installation is completed successfully on Ubuntu.
Exp 8: Design and test various schema models to optimize data storage and
retrieval Using Hbase.
Aim:
To design and test various schema models to optimize data storage and retrieval Using
Hbase.
Procedure:
Step-by-step schema design experiment with HBase using sample commands:
1. Start HBase:
- If HBase is not running, start it with the following command:
start-hbase.sh
hbase shell
2. Create a Table:
Create a sample HBase table named "student" with a column family "info" using the
HBase shell:
hbase:002:0>create 'student', 'info'
3. Insert Data:
- Insert sample data into the "student" table. Let's assume you're storing student information
with student IDs as row keys:
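A few illustrative put commands (the column qualifiers and values are examples only):
hbase:003:0> put 'student', '1', 'info:name', 'Asha'
hbase:004:0> put 'student', '1', 'info:dept', 'ADS'
hbase:005:0> put 'student', '2', 'info:name', 'Bala'
hbase:006:0> put 'student', '2', 'info:dept', 'CSE'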
4. Query Data:
- Retrieve student information using the row key, or scan the whole table:
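hbase:007:0> get 'student', '1'
hbase:008:0> scan 'student'
5. Drop the Table:
- When the experiment is finished, disable and drop the table: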
hbase:017:0>disable 'student'
hbase:018:0>drop 'student'
Result:
Thus, the execution of various Hbase commands has been successfully completed.
Exp 9: Practice importing and exporting data from various databases.
Aim:
To practice importing and exporting data from various databases.
Procedure:
Sqoop is basically used to transfer data between relational databases such as MySQL and Oracle
and data warehouses such as Hadoop HDFS (Hadoop Distributed File System). When data is
transferred from a relational database to HDFS, we say we are importing data; when we transfer
data from HDFS to a relational database, we say we are exporting data.
Note: To import or export, the order of columns in both MySQL and Hive should be the
same.
i) Installation of Sqoop:
Step 1: Download the stable version of Apache Sqoop from the website:
URL: https://archive.apache.org/dist/sqoop/1.4.7/
$ wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Step 2: Unzip the downloaded file using the tar command
$tar -xvzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Step 3: Edit the .bashrc file by using the command
$nano .bashrc
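Append lines like the following to it (assuming Sqoop was extracted to /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0; adjust the path to your setup):
export SQOOP_HOME=/home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin
Step 4: Reload the .bashrc file: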
$source .bashrc
Step 5: Check the installed sqoop version using the below command
$sqoop version
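ii) Importing data from MySQL to Hive:
Step 1: Create the source database and table in MySQL and load a few sample rows (the database and table names below match the Sqoop command in Step 4; the sample values are only illustrative):
mysql> create database importtohadoop;
mysql> use importtohadoop;
mysql> create table album_details(album_name varchar(65), year int, artist varchar(65));
mysql> insert into album_details values ('Thriller', 1982, 'Michael Jackson');
mysql> insert into album_details values ('Back in Black', 1980, 'AC/DC');
Step 2: Verify the data in MySQL:
mysql> select * from album_details;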
Step 3: Create a database and table in Hive into which the data should be imported.
hive>create database album_hive;
hive>use album_hive;
hive>create table album_details_hive(album_name varchar(65), year int, artist
varchar(65));
Step 4: Run this command in the terminal:
$ sqoop import --connect "jdbc:mysql://127.0.0.1:3306/importtohadoop?useSSL=false" \
--username root --password your_new_password \
--table album_details \
--hive-import --hive-table album_hive.album_details_hive \
-m 1
Step 5: Check in Hive whether the data has been imported successfully.
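For example:
hive> use album_hive;
hive> select * from album_details_hive;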
Result:
Thus, importing and exporting data between MySQL and Hive has been implemented successfully.