Big Data
Run Hive Job on EMR – Demo
Table of Contents
Steps to create EMR Cluster – Demo
Step 1: Select the S3 service.
Step 2: Click Create bucket.
Step 3: Write the Bucket name. Click Create.
Step 4: Click the bucket name.
Step 5: Click Create folder.
Step 6: Type the folder name. Click Save.
Step 7: Select the EMR service.
Step 8: Click Create clusters.
Step 9: Type the Cluster name. Click the folder icon.
Step 10: Select the S3 bucket created earlier. Click Select.
Step 11: Choose the latest version.
Step 12: Choose the instance type as “m4.large”. Choose the number of instances as per your requirement. Enter the EC2 key pair. Click Create cluster.
Step 13: Check the cluster status.
Step 14: Go to EC2 service. Three instances are created automatically.
Step 15: Click the master node Security group.
Step 16: Click the Inbound tab. Click Edit.
Step 17: Click Add Rule button.
Step 18: Add “SSH” and make it anywhere. Click Save.
Step 19: SSH your instance.
Steps to run Hive Job on EMR – Demo
Pre-requisite: An EMR cluster and an S3 bucket, both created earlier in the "Steps to create EMR Cluster" demo.
Step 1: Click the cluster you created earlier.
Step 2: Click the Steps tab. Click the “Add Step” button.
Step 3: Select the step type “Hive program”. Give it a name. Enter the Script S3 location and the Input S3 location.
Script location: s3://us-east-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q
Input location: s3://us-east-1.elasticmapreduce.samples
Step 4: In a new browser tab, go to the S3 service and click the bucket you created earlier.
Step 5: Click Create folder.
Step 6: Type the folder name “outputs”. Click Save.
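Steps 4 to 6 can also be scripted. The sketch below assumes the boto3 library and an S3 client created with boto3.client("s3"). The helper name create_s3_folder and the bucket name in the usage note are hypothetical. S3 has no real folders, so the sketch stores an empty object whose key ends in "/", which the console then displays as a folder.

```python
def create_s3_folder(s3_client, bucket, folder):
    """Create an empty marker object so the S3 console shows a folder.

    s3_client is assumed to be a boto3 S3 client; bucket and folder
    are placeholders supplied by the caller.
    """
    # Normalise the name so the key always ends with exactly one slash.
    key = folder.strip("/") + "/"
    # put_object with no Body stores a zero-byte object under that key.
    s3_client.put_object(Bucket=bucket, Key=key)
    return key
```

Example call, with a placeholder bucket name: create_s3_folder(boto3.client("s3"), "my-emr-demo-bucket", "outputs").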
Step 7: Return to the EMR service tab. Click the folder icon for the output S3 location.
Step 8: Select the folder “outputs” from the bucket. Click Select.
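The Hive step configured in the console can also be expressed as a step definition and submitted with boto3. The sketch below is one possible equivalent, not the exact definition the console generates. It assumes an EMR release that ships command-runner.jar (4.x or later) and an EMR client created with boto3.client("emr"). The function names, the ActionOnFailure value, and the cluster ID and bucket in the usage note are illustrative assumptions. Only the script and input locations come from Step 3.

```python
# Script and input locations from Step 3 of this demo.
SCRIPT_URI = "s3://us-east-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q"
INPUT_URI = "s3://us-east-1.elasticmapreduce.samples"


def build_hive_step(name, script_uri, input_uri, output_uri):
    """Build a step definition that runs a Hive script via command-runner.jar."""
    return {
        "Name": name,
        # Assumption: keep the cluster running if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hive-script", "--run-hive-script", "--args",
                "-f", script_uri,               # Hive script to run
                "-d", f"INPUT={input_uri}",     # ${INPUT} inside the script
                "-d", f"OUTPUT={output_uri}",   # ${OUTPUT} inside the script
            ],
        },
    }


def submit_hive_step(emr_client, cluster_id, step):
    """Add the step to a running cluster and return the new step ID."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

Example call, with a placeholder cluster ID and bucket: submit_hive_step(boto3.client("emr"), "j-XXXXXXXXXXXXX", build_hive_step("Hive program", SCRIPT_URI, INPUT_URI, "s3://my-emr-demo-bucket/outputs/")).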
Step 9: Check the cluster status.
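The status check can be automated by polling. The sketch below polls the state of the Hive step rather than the cluster, on the assumption that the step state is what shows whether the job has finished. It assumes a boto3 EMR client. The function name, polling interval, and poll limit are illustrative.

```python
import time

# Step states after which EMR makes no further progress on the step.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(emr_client, cluster_id, step_id, poll_seconds=30, max_polls=120):
    """Poll a step until it reaches a terminal state and return that state."""
    for _ in range(max_polls):
        response = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = response["Step"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)  # still PENDING or RUNNING, so wait and retry
    raise TimeoutError(f"Step {step_id} did not finish after {max_polls} polls")
```

A return value of "COMPLETED" means the output should be available in the S3 outputs folder. Any other value means the step logs need checking.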
Step 10: Select the S3 bucket you created earlier. Select the outputs folder.
Step 11: Open the os_requests folder.
Step 12: Download the output file. Open it in a text editor such as Notepad.
Step 13: Check the file.
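Steps 10 to 13 can also be done in code instead of downloading the file by hand. The sketch below assumes a boto3 S3 client and assumes the Hive script wrote plain-text rows using Hive's default Ctrl-A (\x01) field separator. The function name and the bucket in the usage note are hypothetical. It reads at most the first page of the listing (up to 1,000 objects), which should be enough for a small demo output.

```python
def read_hive_output(s3_client, bucket, prefix):
    """Return the rows found under an S3 prefix as lists of column strings."""
    rows = []
    listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is missing from the response when nothing matches the prefix.
    for obj in listing.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue  # skip folder marker objects
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        for line in body.decode("utf-8").splitlines():
            if line:
                # Assumption: columns are separated by Hive's default \x01.
                rows.append(line.split("\x01"))
    return rows
```

Example call, with a placeholder bucket name: read_hive_output(boto3.client("s3"), "my-emr-demo-bucket", "outputs/os_requests/").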