Banking Problem Database
You are provided with two datasets (structured and semi-structured) containing information about bank deposit details.
INPUT FILES
Input files are available in the ~/Desktop/Project/wingst2-banking-challenge/ folder.
Structured Data: Chase_Bank.csv
Semi-structured Data: Chase_Bank_1.json
a. SQOOP output should be stored in the hdfs:/user/labuser/sqoop_bank directory.
b. PIG output should be stored in the hdfs:/user/labuser/bank1 directory.
Output files of your assessment (*.txt) should be present in the local challenge folder (/home/labuser/Desktop/Project/wings-xx-challenge).
Login to MySQL:
Username: root
Password: labuserbdh
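The login step above can be done from the terminal; passing the password on the command line is shown for convenience in this lab (omit the value after -p to be prompted interactively instead):

```shell
# Connect to the local MySQL server with the lab credentials above.
mysql -u root -plabuserbdh
```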
Create a new database and table using MySQL commands to load the structured data.
DB Name: bank_db
Table: bank
The Create Table and Load scripts are given below for your reference.
create table bank (Id int, Institution_Name varchar(2000), Branch_Name varchar(2000), Branch_Number int, City varchar(2000), County varchar(2000), State varchar(2000), Zipcode int, 2010_Deposits int, 2011_Deposits int, 2012_Deposits int, 2013_Deposits int, 2014_Deposits int, 2015_Deposits int, 2016_Deposits int);
Load Chase_Bank.csv into the bank table:
load data local infile '/home/labuser/Desktop/Project/wingst2-banking-challenge/Chase_Bank.csv' into table bank fields terminated by ',' lines terminated by '\n' ignore 1 rows (Id, Institution_Name, Branch_Name, Branch_Number, City, County, State, Zipcode, 2010_Deposits, 2011_Deposits, 2012_Deposits, 2013_Deposits, 2014_Deposits, 2015_Deposits, 2016_Deposits);
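Optionally, a quick sanity check that the load worked (not required by the assessment; the expected row count is not given in this document):

```sql
-- Confirm the load: total rows and a few sample rows.
SELECT COUNT(*) FROM bank;
SELECT * FROM bank LIMIT 5;
```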
Use Sqoop to import the bank table to HDFS (/user/labuser/sqoop_bank).
Columns to be imported: … 2016_Deposits
Filter condition: … NOT IN ("Rochester", "Austin", "Chicago", "Indianapolis")
Copy the Sqoop output to the sqoop_output.txt file.
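The import step above could look like the following sketch. The full column list and the column the NOT IN filter applies to are truncated in the original spec, so both are assumptions here (the column list is inferred from the Hive partition-table step later, plus State for partitioning; the filter is assumed to apply to City, since the listed values are city names):

```shell
# Hypothetical sketch of the Sqoop import; adjust the JDBC URL,
# column list, and filter column to match your actual spec.
sqoop import \
  --connect jdbc:mysql://localhost/bank_db \
  --username root --password labuserbdh \
  --table bank \
  --columns "Id,City,County,Zipcode,2010_Deposits,2011_Deposits,2012_Deposits,2013_Deposits,2014_Deposits,2015_Deposits,2016_Deposits,State" \
  --where "City NOT IN ('Rochester','Austin','Chicago','Indianapolis')" \
  --target-dir /user/labuser/sqoop_bank \
  -m 1

# Copy the imported data into the required output file.
hdfs dfs -cat /user/labuser/sqoop_bank/part-* > ~/Desktop/Project/wingst2-banking-challenge/sqoop_output.txt
```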
Note:- Make sure that your output files are available in the challenge folder.
You will load and analyse the Sqoop output data in Hive using HQL in the next steps.
Load the semi-structured data (Chase_Bank_1.json) into Pig.
Note:- You can either load this data directly from the challenge input folder, or use the required commands to copy it to HDFS and then into Pig.
Using Pig, find the minimum number of deposits in 2016 (i.e., MIN(Deposits_2016)) for each county. Assign the alias "minimum_dep".
Sample output: {"group":"Gillespie","minimum_dep":212776}{"group":"Imperial","minimum_dep":148284}
Copy the Pig output to the pig_output.txt file.
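A Pig Latin sketch of the step above. The input path and the JSON schema are assumptions (only Deposits_2016 and the county field are confirmed by the task and sample output); the GROUP/FOREACH pattern naturally produces the "group" key seen in the sample:

```pig
-- Hypothetical sketch: adjust the HDFS path and schema to the
-- actual structure of Chase_Bank_1.json.
bank_json = LOAD '/user/labuser/Chase_Bank_1.json'
    USING JsonLoader('Id:int, City:chararray, County:chararray, Deposits_2016:int');
by_county = GROUP bank_json BY County;
min_dep   = FOREACH by_county GENERATE group, MIN(bank_json.Deposits_2016) AS minimum_dep;
STORE min_dep INTO '/user/labuser/bank1' USING JsonStorage();
```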
Database: hive_db
Partition Table: bank_part
Columns: Id, City, County, Zipcode, 2010_Deposits, 2011_Deposits, 2012_Deposits, 2013_Deposits, 2014_Deposits, 2015_Deposits, 2016_Deposits
Partition Column: State
Read records which satisfy the below conditions and load them into the bank_part table:
City in (Bronx, NewYorkCity, Dallas, Houston, Columbus)
State in ("NY", "OH", "TX")
Hint:- Create a temporary table to load the Sqoop output, and then load the data into the partitioned table with the necessary filters.
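The hint above can be sketched in HiveQL as follows. The comma delimiter and the staging-table layout are assumptions based on the Sqoop output described earlier; backticks are required because Hive identifiers may not start with a digit unquoted:

```sql
-- Hypothetical sketch: staging table over the Sqoop output, then a
-- filtered dynamic-partition insert into the partitioned table.
CREATE DATABASE IF NOT EXISTS hive_db;
USE hive_db;

CREATE TABLE bank_temp (
  Id INT, City STRING, County STRING, Zipcode INT,
  `2010_Deposits` INT, `2011_Deposits` INT, `2012_Deposits` INT,
  `2013_Deposits` INT, `2014_Deposits` INT, `2015_Deposits` INT,
  `2016_Deposits` INT, State STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA INPATH '/user/labuser/sqoop_bank' INTO TABLE bank_temp;

CREATE TABLE bank_part (
  Id INT, City STRING, County STRING, Zipcode INT,
  `2010_Deposits` INT, `2011_Deposits` INT, `2012_Deposits` INT,
  `2013_Deposits` INT, `2014_Deposits` INT, `2015_Deposits` INT,
  `2016_Deposits` INT)
PARTITIONED BY (State STRING);

-- Allow dynamic partitioning on State, then filter while loading.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE bank_part PARTITION (State)
SELECT Id, City, County, Zipcode,
       `2010_Deposits`, `2011_Deposits`, `2012_Deposits`,
       `2013_Deposits`, `2014_Deposits`, `2015_Deposits`,
       `2016_Deposits`, State
FROM bank_temp
WHERE City IN ('Bronx', 'NewYorkCity', 'Dallas', 'Houston', 'Columbus')
  AND State IN ('NY', 'OH', 'TX');
```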
Use the following command before executing your Hive queries to remove the WARNING messages from the Hive output:
export HIVE_SKIP_SPARK_ASSEMBLY=true
Write an HQL query to fetch the records which satisfy the below criteria: 2014_Deposits greater than 50000, 2015_Deposits greater than 60000, 2016_Deposits greater than 70000, and City in (NewYorkCity, Dallas, Houston).
Copy the Hive output to the hive_output.txt file.
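The criteria above translate to a query along these lines (backticks again because Hive identifiers may not start with a digit unquoted; table and database names follow the earlier steps):

```sql
-- Fetch records meeting all four criteria from the partitioned table.
SELECT * FROM hive_db.bank_part
WHERE `2014_Deposits` > 50000
  AND `2015_Deposits` > 60000
  AND `2016_Deposits` > 70000
  AND City IN ('NewYorkCity', 'Dallas', 'Houston');
```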
Note:- Given below is a sample format to copy output to a file from the terminal:
hive -e "select * from bank_part;" > output.txt
VALIDATION :
Before closing the environment, ensure that all the output files are available in the local directory Desktop/Project/wingst2-banking-challenge/:
sqoop_output.txt
hive_output.txt
pig_output.txt
Click on SUBMIT.