
CLOUD COMPUTING LAB EXERCISE -4

AIM: AWS EMR Big Data Processing with Spark and Hadoop

Procedure:

Step1: Log in to the AWS console and open EMR by searching for it in the search bar.

Step2: Click on Create cluster and provide the required details. If cluster creation fails in the default availability zone, change the subnet (availability zone) and try again.
Step3: Create a new key pair by clicking Create new key pair in the Security configuration and EC2 key pair section. Give the key pair a name, choose .pem as the private key file format, and click Create key pair; a .pem private key file with that name will be downloaded to your system. Select the downloaded key pair for the cluster. Later, convert the .pem file to .ppk using PuTTYgen so that PuTTY can connect to the primary node.
Step4: In the IAM roles section, select the default roles, leave the remaining settings at their defaults, and click Create cluster. The cluster will be created with status Starting; wait until the status changes to Waiting.
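
The console steps in Step2 through Step4 can also be done in one shot from the AWS CLI; the sketch below is a minimal example, and the release label, instance type, instance count, and key pair name are placeholder values to adjust:

    aws emr create-cluster \
        --name "lab4-cluster" \
        --release-label emr-6.15.0 \
        --applications Name=Spark Name=Hadoop \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --ec2-attributes KeyName=<keypair-name>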
Step5: In the meantime, create an S3 bucket: search for S3 in the search bar and open it in a new tab. Click on Create bucket, provide the required details, and then click Create bucket.
Step6: Open the bucket, click on Create folder, give the folder a name, and click Create. Then open the folder, click Upload, choose the data source file, and click Upload.
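
The same bucket setup can be done with the AWS CLI; the bucket, folder, and file names below are placeholders:

    aws s3 mb s3://<bucket-name>
    aws s3 cp <datafile.csv> s3://<bucket-name>/input/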
Step7: Now return to the cluster; the status should have changed to Waiting. Create an SSH inbound rule for the cluster's security group: go to the Properties tab, expand the EC2 security groups section, and select the primary node's security group.
Step8: Click on Edit inbound rules and add a rule with Type = SSH and Source = My IP, then click Save rules.
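
The same rule can also be added with the AWS CLI; the security group ID and IP address below are placeholders:

    aws ec2 authorize-security-group-ingress \
        --group-id <primary-node-sg-id> \
        --protocol tcp --port 22 \
        --cidr <your-public-ip>/32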
Step9: Back on the cluster page, click Connect to the primary node using SSH; it shows the instructions for connecting to the primary node with PuTTY. Open PuTTY, enter the host name, load the .ppk private key file, and click Open.
Step10: You are now connected to the primary node. (On macOS/Linux you can connect directly instead of using PuTTY: restrict the key's permissions with chmod 400, then run ssh -i ~/<pemfilename>.pem hadoop@<hostname>, press Enter, and answer yes to the host-key prompt.) Open the vi editor by running the command vi filename.py. Press 'i' to enter insert mode and type the code; when finished, press Esc, type :wq, and press Enter to save the file and return to the shell.
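
As an illustration, filename.py could contain a minimal PySpark job such as the sketch below. It assumes the source file from Step6 is a CSV with a header row; the bucket name, folder names, and the grouping column are placeholders to adapt to your data:

    # filename.py - a minimal PySpark sketch for this exercise.
    # Assumes the input is a CSV with a header; <bucket-name> is a placeholder.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("EMRLab4").getOrCreate()

    # Read the source file uploaded to the input folder in Step6.
    df = spark.read.option("header", "true").csv("s3://<bucket-name>/input/")

    # Example processing: count the rows for each value of the first column.
    result = df.groupBy(df.columns[0]).count()

    # Write the result as a single CSV file to an output folder.
    (result.coalesce(1)
           .write.mode("overwrite")
           .option("header", "true")
           .csv("s3://<bucket-name>/output/"))

    spark.stop()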
Step11: Now run the code with the command spark-submit filename.py. Spark processes the source file and writes the output to the output folder in the S3 bucket.
Step12: Now check the result in the S3 bucket. Open the output folder, select the CSV part file, download it, and open it. This is the required result.
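
The output can also be listed and downloaded with the AWS CLI if it is configured on your machine; the bucket name is a placeholder:

    aws s3 ls s3://<bucket-name>/output/
    aws s3 cp s3://<bucket-name>/output/ . --recursive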
Step13: Clean up your Amazon EMR resources by terminating your cluster:

1. Choose Clusters, and then choose the cluster you want to terminate.

2. Under the Actions dropdown menu, choose Terminate cluster.

Deleting S3 resources: To avoid additional charges, you should delete your Amazon S3 bucket. Deleting the bucket removes all of the Amazon S3 resources used in this exercise.
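
Both cleanup steps can also be performed with the AWS CLI; the cluster ID and bucket name below are placeholders:

    aws emr terminate-clusters --cluster-ids <cluster-id>
    aws s3 rb s3://<bucket-name> --force

Here --force deletes every object in the bucket before removing the bucket itself.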
==================================================================================

Thank you
