BDA Lab2
Name: Mohit Gavli
Div: D17A
Roll No.: 16
Aim: Use Sqoop to load data (weblog/transaction data) from an RDBMS into Hadoop and analyze it using Pig and Hive.
Theory:
1. The Hadoop ecosystem consists of various components, each suited to a different specialty. One such component is Sqoop, a tool used to load data from relational database management systems (RDBMS) into Hadoop and to export it back to the RDBMS. Simply put, Sqoop helps professionals work with large amounts of data in Hadoop.
2. Sqoop is a tool used to transfer bulk data between Hadoop and external datastores such as relational databases (MS SQL Server, MySQL). Before tools like Sqoop existed, loading data into Hadoop from several heterogeneous sources was extremely challenging.
d. Similarly, numerous map tasks export the data from HDFS back to the RDBMS using the Sqoop export command (a sketch of such an export command follows).
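For illustration, a Sqoop export of such data might look like the sketch below; the JDBC URL, database name (weblogdb), table name (transactions), HDFS directory, and credentials are placeholder assumptions rather than values from this experiment:

    sqoop export \
      --connect jdbc:mysql://localhost:3306/weblogdb \
      --username root -P \
      --table transactions \
      --export-dir /user/hadoop/transactions \
      --input-fields-terminated-by ','

Each map task launched by this command writes one slice of the HDFS data back into the RDBMS table.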
5. Sqoop Processing:
Processing takes place step by step, as shown below (a sketch of the corresponding import command follows the list):
a. Sqoop runs in the Hadoop cluster.
b. It imports data from the RDBMS or NoSQL database to HDFS.
c. It uses multiple mappers to slice the incoming data into chunks and loads each chunk into HDFS in parallel.
d. It exports data back into the RDBMS while ensuring that the schema of the data in the database is maintained.
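A minimal sketch of the corresponding import, assuming a MySQL database named weblogdb and a table named weblogs with an integer id column (all placeholder names), might be:

    sqoop import \
      --connect jdbc:mysql://localhost:3306/weblogdb \
      --username root -P \
      --table weblogs \
      --target-dir /user/hadoop/weblogs \
      --split-by id \
      --num-mappers 4

Here --num-mappers sets how many parallel map tasks slice the table, and --split-by names the column used to partition the rows among them.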
Conclusion: Sqoop was used to load data from an RDBMS into Hadoop, and the imported data was analyzed using Hive.
1) Log in to MySQL
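Assuming a local MySQL server and the root user (placeholder credentials), the login step might look like:

    mysql -u root -p

The -p flag prompts for the password before opening the MySQL shell.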
3) Creating tables
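As a sketch of this step, the database and a transactions table might be created as follows; the database name, table name, and column layout are illustrative assumptions, not the actual schema used in the experiment:

    mysql -u root -p -e "
    CREATE DATABASE IF NOT EXISTS weblogdb;
    CREATE TABLE weblogdb.transactions (
      txn_id   INT PRIMARY KEY,
      user_id  INT,
      amount   DECIMAL(10,2),
      txn_date DATE
    );"

This is the kind of table that the Sqoop import and export commands sketched above would read from and write back to.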