Answers
This guide walks through executing all 8 questions step by step, explaining which directory you need to be in and the exact commands to run.
First, let's get your Hadoop environment running in Docker (since we're doing this on macOS). You can use the official Hadoop Docker image, but if you're new to Hadoop setup I recommend a pre-configured image, which will save you time.
Now, let's run the container, which will automatically set up a small Hadoop cluster with HDFS.
Once the container starts, you should be inside the container’s shell.
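For example (the image name and tag below are assumptions; any single-node Hadoop image will work, so adjust to whatever you pull):

```shell
# Pull a single-node Hadoop image (image name/tag are placeholders)
docker pull apache/hadoop:3

# Start the container with an interactive shell
docker run -it --name hadoop-lab apache/hadoop:3 /bin/bash
```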
Let’s break down each task one by one now that your Docker container is up and running.
Sub-steps:
1. Create a directory in HDFS to hold your input files.
2. Upload files from your local filesystem into that HDFS directory.
3. Display the contents of the uploaded file to verify it arrived intact.
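These sub-steps map onto commands like the following (the directory and file names are placeholders; substitute your own):

```shell
# 1. Create a directory in HDFS (-p creates parent directories as needed)
hdfs dfs -mkdir -p /user/<username>/input

# 2. Upload a local file into the new HDFS directory
hdfs dfs -put localfile.txt /user/<username>/input/

# 3. Display the file's contents from HDFS
hdfs dfs -cat /user/<username>/input/localfile.txt
```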
Sub-steps:
If Hive is not already installed in your Docker container, follow these installation steps:
1. Download and extract Hive:
   wget http://apache.mirror.digitalpacific.com.au/hive/stable/hive-2.3.7-bin.tar.gz
   tar -xzf hive-2.3.7-bin.tar.gz
2. Move it into place:
   mv hive-2.3.7-bin /opt/hive
3. Set the environment variables (HADOOP_HOME should point at your Hadoop installation):
   export HADOOP_HOME=/opt/hadoop
   export HIVE_HOME=/opt/hive
   export PATH=$HIVE_HOME/bin:$PATH
4. Launch the Hive shell:
   hive
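With the PATH set, you can sanity-check the installation without entering the interactive shell (this assumes the Hadoop daemons are already running):

```shell
# Run a trivial query non-interactively to confirm Hive starts cleanly
hive -e 'SHOW DATABASES;'
```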
Sub-steps:
This is more about the process than just the code. Follow these steps:
1. Format the HDFS NameNode (first run only):
   hdfs namenode -format
2. Start the Hadoop daemons:
   start-dfs.sh
   start-yarn.sh
3. View Results:
The first steps are the same as in Question 1. Then verify the output:
   hdfs dfs -cat /output/part-r-00000
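The part-r-00000 file is the output of a reducer, so a MapReduce job must have produced /output first. Assuming this question uses the WordCount example bundled with Hadoop (the /input and /output paths are placeholders):

```shell
# Run the bundled WordCount example; /output must not already exist
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount /input /output

# Inspect the first reducer's output file
hdfs dfs -cat /output/part-r-00000
```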
1. Install HBase:
   wget https://downloads.apache.org/hbase/2.4.9/hbase-2.4.9-bin.tar.gz
   tar -xzf hbase-2.4.9-bin.tar.gz
   mv hbase-2.4.9 /opt/hbase
2. Configure HBase (set JAVA_HOME in /opt/hbase/conf/hbase-env.sh) and start it:
   start-hbase.sh
3. Open the HBase shell:
   hbase shell
4. Create and populate the employee table, then scan it:
   scan 'employee'
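Before 'employee' can be scanned, it has to be created and populated. A minimal HBase shell session might look like this (the column family and row contents are illustrative; the original exercise's schema was not shown):

```
create 'employee', 'info'                   # table with one column family
put 'employee', '1', 'info:name', 'Alice'   # insert individual cells
put 'employee', '1', 'info:salary', '50000'
put 'employee', '2', 'info:name', 'Bob'
scan 'employee'                             # list all rows
```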
Pig is a high-level platform for Hadoop. Here's a simple Pig script to find the maximum temperature
per year:
5. DUMP max_temp;
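Putting that DUMP in context, a complete script to find the maximum temperature per year might look like this (the input path and field layout are assumptions; adjust them to the actual dataset):

```pig
-- Load year/temperature records (tab-separated, schema assumed)
records = LOAD '/input/temperatures.txt' AS (year:int, temp:int);

-- Group by year and take the maximum temperature in each group
grouped = GROUP records BY year;
max_temp = FOREACH grouped GENERATE group AS year, MAX(records.temp) AS max_temp;

DUMP max_temp;
```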
Question 8: Pig Latin Script for Filtering, Grouping, and Revenue Calculation
6. DUMP revenue;
8. pig revenue_analysis.pig
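A sketch of revenue_analysis.pig that performs the filtering, grouping, and revenue calculation named in the heading (the field names, filter condition, and input path are all assumptions):

```pig
-- Load sales records; comma-separated with an assumed schema
sales = LOAD '/input/sales.csv' USING PigStorage(',')
    AS (product:chararray, qty:int, price:double);

-- Filter out rows with non-positive quantities
valid = FILTER sales BY qty > 0;

-- Compute per-row revenue, then sum it per product
line_rev = FOREACH valid GENERATE product, qty * price AS rev;
by_product = GROUP line_rev BY product;
revenue = FOREACH by_product GENERATE group AS product, SUM(line_rev.rev) AS total_revenue;

DUMP revenue;
```

Save this as revenue_analysis.pig and run it with pig revenue_analysis.pig, as in step 8.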
Directory Information:
• You don’t need to change directories for each of these commands in HDFS. You’ll mostly be
using absolute paths or relative paths under /user/<username>/.
• Just ensure you're in the right directory where your input files are located and where you
want to store outputs.
Final Thoughts
This should give you step-by-step guidance for all 8 questions, from the Docker setup through the Hadoop, Hive, HBase, and Pig commands. You can execute all of these tasks inside the Docker container environment.