
BDA Lab2

This document discusses using Sqoop to load data from a relational database (RDBMS) into Hadoop and analyze the data using Pig and Hive. It provides an overview of Sqoop and its features for transferring data between Hadoop and external data stores like RDBMS. The document then outlines steps to use Sqoop to load a dataset from a MySQL database into HDFS and analyze it using Hive, including importing and querying tables and adding new rows.


Name: Utsav Vijay Gavli
Roll No.: 16
Name: Mohit Gangwani
Roll No.: 17
Div: D17A

Aim: Use Sqoop to load data from an RDBMS (weblog/transaction data) into Hadoop and analyze it using Pig and Hive.

Theory:

1. Sqoop is a tool in the Hadoop ecosystem used to load data from relational database management
systems (RDBMS) into Hadoop and to export it back to the RDBMS. Simply put, Sqoop helps
professionals work with large amounts of data in Hadoop.
2. Sqoop transfers bulk data between Hadoop and external datastores, such as relational
databases (MS SQL Server, MySQL). Before Sqoop existed, loading data from several
heterogeneous sources into Hadoop was extremely challenging.

3. The problems administrators encountered included:

a. Maintaining data consistency
b. Ensuring efficient utilization of resources
c. Loading bulk data into Hadoop was not possible
d. Loading data using scripts was slow
The solution was Sqoop. Using Sqoop in Hadoop overcame all the challenges of the traditional
approach: it could load bulk data from an RDBMS into Hadoop with ease.
4. Sqoop Features:
a. Parallel Import/Export
Sqoop uses the YARN framework to import and export data. This provides fault tolerance on top of
parallelism.
b. Import Results of an SQL Query
Sqoop enables us to import the results returned by an SQL query into HDFS.
c. Connectors for All Major RDBMS Databases
Sqoop provides connectors for multiple RDBMSs, such as MySQL and Microsoft SQL Server.
d. Kerberos Security Integration
Sqoop supports the Kerberos network authentication protocol, which enables nodes
communicating over an insecure network to authenticate one another securely.
e. Full and Incremental Load
Sqoop can load an entire table, or only the rows added since the last import, with a single
command (see the sketch after this list).
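
As a sketch of the incremental load in (e): the command below appends only rows whose id exceeds the last imported value. The JDBC URL, credentials, table, and column are assumptions for illustration, not values from this lab.

    $ sqoop import \
        --connect jdbc:mysql://localhost/salesdb \
        --username root -P \
        --table sales \
        --incremental append \
        --check-column id \
        --last-value 100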
5. Sqoop Architecture:
a. The client submits the import/export command.
b. Sqoop fetches data from different databases. Here, we have an enterprise data warehouse,
document-based systems, and a relational database. We have a connector for each of these; connectors
help Sqoop work with a range of accessible databases.

c. Multiple mappers perform map tasks to load the data onto HDFS.

d. Similarly, multiple map tasks export the data from HDFS back to the RDBMS when the Sqoop export
command is used.
6. Sqoop Processing:
Processing takes place step by step, as shown below:
a. Sqoop runs in the Hadoop cluster.
b. It imports data from the RDBMS or NoSQL database to HDFS.
c. It uses mappers to slice the incoming data into multiple chunks and loads the data into HDFS.
d. It exports data back into the RDBMS while ensuring that the schema of the data in the database is
maintained.

Conclusion: Sqoop was used to load data from an RDBMS into Hadoop, and the data was analyzed using Hive.
Steps:

1) Logging in to MySQL
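
A typical login command (assuming a local MySQL server and the root user; credentials vary per setup):

    $ mysql -u root -p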

2) Creating a database
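
For illustration, assume a database named salesdb (the name is hypothetical):

    mysql> CREATE DATABASE salesdb;
    mysql> USE salesdb;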

3) Creating tables
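
A minimal sketch of a transactions-style table; the column names and types are assumptions, not the exact schema used in the lab:

    mysql> CREATE TABLE sales (
        ->   id INT PRIMARY KEY,
        ->   product VARCHAR(50),
        ->   amount DECIMAL(10,2),
        ->   sale_date DATE
        -> );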

4) Describing the table created in MySQL
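
Verifying the schema:

    mysql> DESCRIBE sales;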


5) Loading the dataset from the local machine into the sales table
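
Assuming the dataset is a comma-separated file at a hypothetical path, and that the server permits LOAD DATA LOCAL INFILE:

    mysql> LOAD DATA LOCAL INFILE '/home/cloudera/sales.csv'
        -> INTO TABLE sales
        -> FIELDS TERMINATED BY ','
        -> LINES TERMINATED BY '\n';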

6) Selecting rows from the table
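
A quick sanity check on the loaded data:

    mysql> SELECT * FROM sales LIMIT 10;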


7) Using Sqoop to list all the tables present in the MySQL database
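
A sketch of the list-tables command; the JDBC URL and credentials depend on the environment:

    $ sqoop list-tables \
        --connect jdbc:mysql://localhost/salesdb \
        --username root -P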

8) Importing tables from the RDBMS to HDFS using Sqoop
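
A single-mapper import into a hypothetical HDFS target directory (-m 1 keeps the output in one file):

    $ sqoop import \
        --connect jdbc:mysql://localhost/salesdb \
        --username root -P \
        --table sales \
        --target-dir /user/cloudera/sales \
        -m 1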

9) Checking whether the tables were imported properly
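
Listing the target directory and printing the part file Sqoop produced (the file name assumes the single-mapper import above):

    $ hdfs dfs -ls /user/cloudera/sales
    $ hdfs dfs -cat /user/cloudera/sales/part-m-00000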

10) Importing the tables from HDFS into Hive
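
One way to move the imported files into Hive is to create a matching table and load the files into it; Sqoop's --hive-import option does the same in one step. The schema and path below are the assumed ones from the earlier sketches:

    hive> CREATE TABLE sales (id INT, product STRING, amount DOUBLE, sale_date STRING)
        > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    hive> LOAD DATA INPATH '/user/cloudera/sales/part-m-00000' INTO TABLE sales;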


11) Checking that the table has been created in Hive
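
The table's files should appear under the Hive warehouse directory (the default path shown is configuration-dependent):

    $ hdfs dfs -ls /user/hive/warehouse/sales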

12) Connecting to Hive and showing the tables
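
Opening the Hive shell and listing the tables:

    $ hive
    hive> SHOW TABLES;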

13) Executing queries in Hive
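
Two illustrative queries over the assumed columns:

    hive> SELECT * FROM sales LIMIT 5;
    hive> SELECT product, SUM(amount) AS total_sales FROM sales GROUP BY product;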


14) Writing a query to add a new tuple to the Hive table
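
Hive supports row-level inserts from version 0.14 onward; the values below are made up for illustration:

    hive> INSERT INTO TABLE sales VALUES (101, 'Keyboard', 799.00, '2023-01-15');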

15) Checking that the row was added
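
Querying for the newly inserted row:

    hive> SELECT * FROM sales WHERE id = 101;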


16) Exporting the table back to MySQL
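
Sqoop export expects the target table to already exist in MySQL. A sketch assuming a pre-created sales_export table and the comma-delimited warehouse files from the earlier steps:

    $ sqoop export \
        --connect jdbc:mysql://localhost/salesdb \
        --username root -P \
        --table sales_export \
        --export-dir /user/hive/warehouse/sales \
        --input-fields-terminated-by ',' \
        -m 1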

17) Checking that the rows were exported to the MySQL table
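
Back in MySQL, the exported rows should be visible:

    mysql> SELECT * FROM sales_export LIMIT 10;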
