Downloading Pig Datasets

This document provides instructions for downloading various datasets that will be used in a Pig module. It lists links to 9 datasets hosted on S3 and provides instructions for downloading them on Windows, Linux/Mac systems, or an AWS EC2 instance. The instructions specify using the browser on Windows or the wget command on Linux/Mac and EC2 after ensuring wget is installed using yum.

Uploaded by Ram Guggul
Copyright © All Rights Reserved

DOWNLOADING PIG DATASETS

The following links point to the datasets used throughout the module. Instructions for
downloading them on a Windows machine, a Linux/Mac machine, and an AWS EC2 instance
are provided below.

● https://s3.amazonaws.com/pig-dataset/count-words.pig

● https://s3.amazonaws.com/pig-dataset/data-bag.txt

● https://s3.amazonaws.com/pig-dataset/dropbox-policy.txt

● https://s3.amazonaws.com/pig-dataset/u.data

● https://s3.amazonaws.com/pig-dataset/u.item

● https://s3.amazonaws.com/pig-dataset/products.csv

● https://s3.amazonaws.com/pig-dataset/discountCodes.props

● https://s3.amazonaws.com/pig-dataset/sales_code.csv

● https://s3.amazonaws.com/pig-dataset/sales.csv

For Windows users:

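To download the datasets on your Windows machine, paste each of the links above into your
browser's address bar and the file will start downloading. Depending on the browser, plain-text
files such as data-bag.txt may open in a tab instead; in that case, save them with Ctrl+S.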

For Linux/Mac users:

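Use the wget command in a terminal to download the files. (Note that the wget package must be
installed on your machine. On Red Hat-based distributions such as CentOS or Fedora, install it
with sudo yum -y install wget; on Debian/Ubuntu, use sudo apt-get install -y wget. yum is not
available on macOS, so there use brew install wget if Homebrew is installed.)
For example,

wget https://s3.amazonaws.com/pig-dataset/data-bag.txt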
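Which install command applies depends on the package manager your system uses. As a minimal sketch, the hypothetical helper wget_install_hint below (it is not part of any existing tool) checks whether wget is already available and otherwise prints the install command for the first package manager it finds. It only prints the command; it does not install anything.

```shell
#!/usr/bin/env bash
# wget_install_hint is a hypothetical helper for illustration only.
# It prints (but does not run) the command that would install wget,
# based on the first package manager found on the PATH.
wget_install_hint() {
  if command -v wget >/dev/null 2>&1; then
    echo "wget is already installed"
  elif command -v yum >/dev/null 2>&1; then
    # Red Hat-based systems (CentOS, Fedora, Amazon Linux)
    echo "sudo yum -y install wget"
  elif command -v apt-get >/dev/null 2>&1; then
    # Debian/Ubuntu
    echo "sudo apt-get install -y wget"
  elif command -v brew >/dev/null 2>&1; then
    # macOS with Homebrew
    echo "brew install wget"
  else
    echo "no supported package manager found; install wget manually"
  fi
}

wget_install_hint
```

The checks run in order, so a machine that already has wget never gets an install suggestion.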
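
For AWS EC2 instance:
Use the wget command in the EC2 instance's terminal to download the files. (Note that the wget
package must be installed on the instance. On Amazon Linux and other yum-based AMIs, install it
with

sudo yum -y install wget

and on an Ubuntu AMI use sudo apt-get install -y wget. The sudo prefix is needed because the
default login user on these AMIs is not root.)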

For example,

wget https://s3.amazonaws.com/pig-dataset/data-bag.txt
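All nine files share the same S3 prefix, so on Linux/Mac or EC2 they can be fetched in a single loop instead of running one wget command per file. The sketch below assumes bash and wget. The function names pig_dataset_urls and download_pig_datasets are hypothetical, and the file list is copied from the links above. The -nc (--no-clobber) option skips files that already exist, and -P (--directory-prefix) sets the download directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; the file names are the nine
# datasets listed in this document.
PIG_DATASET_BASE="https://s3.amazonaws.com/pig-dataset"
PIG_DATASET_FILES=(
  count-words.pig data-bag.txt dropbox-policy.txt
  u.data u.item products.csv
  discountCodes.props sales_code.csv sales.csv
)

# Print the full URL of every dataset, one per line.
pig_dataset_urls() {
  local f
  for f in "${PIG_DATASET_FILES[@]}"; do
    printf '%s/%s\n' "$PIG_DATASET_BASE" "$f"
  done
}

# Download every dataset into directory $1 (default: current directory).
# -nc skips files that already exist; -P sets the download directory.
# Returns non-zero if any wget call fails.
download_pig_datasets() {
  local dest="${1:-.}" url status=0
  mkdir -p "$dest" || return 1
  # Process substitution keeps the loop in the current shell,
  # so the status variable survives the loop.
  while read -r url; do
    wget -nc -P "$dest" "$url" || status=1
  done < <(pig_dataset_urls)
  return "$status"
}
```

Usage, assuming the functions are saved in a file such as pig-download.sh (a hypothetical name): run `source pig-download.sh` and then `download_pig_datasets ~/pig-data`. A failed download does not stop the loop; the remaining files are still attempted and the function reports the failure through its exit status.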