Lab 1: Accessing Cloudera Distribution for Hadoop (VMware & Cluster Environment)
Further reading:
https://www.edureka.co/blog/cloudera-hadoop-tutorial/
Methods of Accessing Cloudera Hadoop:
A: Cloudera Quick Start VM (Single Machine)
Installation: https://ugc.futurelearn.com/uploads/files/3c/c9/3cc92360-1155-4eee-8d59-4c7c0e3d192c/Instructions_Installing_Cloudera.pdf
Note: You can follow these instructions for installing Cloudera, except for the VM download link, which is no longer available. Use the links below to download the VM for your virtualization tool.
The following external links provide an overview of the Cloudera VM:
https://www.coursera.org/learn/hadoop/lecture/oPPuR/exploring-the-cloudera-vm-hands-on-part-1
https://www.edureka.co/blog/cloudera-hadoop-tutorial/
If you choose VirtualBox as the virtualization tool, download the Cloudera VM from here:
https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.12.0-0-virtualbox.zip
If you choose VMware as the virtualization tool, download the Cloudera VM from here:
https://downloads.cloudera.com/demo_vm/vmware/cloudera-quickstart-vm-5.12.0-0-vmware.zip
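If you prefer to script the download, a minimal Python sketch could look like the one below. It assumes the two download links above are still reachable; the helper names `quickstart_url` and `download_quickstart` are hypothetical and the script uses only the standard library.

```python
import urllib.request
from pathlib import Path

# Download links taken from the lab sheet, one per virtualization tool.
QUICKSTART_URLS = {
    "virtualbox": "https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.12.0-0-virtualbox.zip",
    "vmware": "https://downloads.cloudera.com/demo_vm/vmware/cloudera-quickstart-vm-5.12.0-0-vmware.zip",
}


def quickstart_url(tool: str) -> str:
    """Return the QuickStart VM download URL for 'virtualbox' or 'vmware'."""
    key = tool.strip().lower()
    if key not in QUICKSTART_URLS:
        raise ValueError(f"unsupported virtualization tool: {tool!r}")
    return QUICKSTART_URLS[key]


def download_quickstart(tool: str, dest_dir: str = ".") -> Path:
    """Download the zip for the chosen tool into dest_dir (a large file)."""
    url = quickstart_url(tool)
    target = Path(dest_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


if __name__ == "__main__":
    # Only prints the URL; call download_quickstart("vmware") to fetch the file.
    print(quickstart_url("VMware"))
```

Matching the tool name case-insensitively lets students type "VMware" or "vmware" without errors.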
Desktop
If the VM runs successfully in your chosen virtualization tool, you should see the following:
Note: Make sure you have assigned at least 2 processors and a minimum of 8 GB of memory to the VM in VMware or VirtualBox (depending on your selected virtualization tool).
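For VirtualBox, the processor and memory minimums can also be applied from the command line with `VBoxManage modifyvm`, using its `--cpus` and `--memory` (in MB) options while the VM is powered off. The sketch below only builds that command and checks it against the minimums from the note; the VM name `cloudera-quickstart` and the function name are hypothetical, so use the name shown in your VirtualBox manager.

```python
# Minimums stated in the lab note: 2 processors and 8 GB of memory.
MIN_CPUS = 2
MIN_MEMORY_MB = 8 * 1024


def vbox_resource_command(vm_name: str, cpus: int = MIN_CPUS,
                          memory_mb: int = MIN_MEMORY_MB) -> list[str]:
    """Build a VBoxManage command that assigns CPUs and memory to a powered-off VM."""
    if cpus < MIN_CPUS:
        raise ValueError(f"need at least {MIN_CPUS} processors, got {cpus}")
    if memory_mb < MIN_MEMORY_MB:
        raise ValueError(f"need at least {MIN_MEMORY_MB} MB of memory, got {memory_mb}")
    return ["VBoxManage", "modifyvm", vm_name,
            "--cpus", str(cpus), "--memory", str(memory_mb)]


if __name__ == "__main__":
    # "cloudera-quickstart" is a hypothetical VM name.
    cmd = vbox_resource_command("cloudera-quickstart")
    print(" ".join(cmd))
    # To apply it, pass cmd to subprocess.run(cmd, check=True).
```

In VMware, the same values are set in the VM's Settings dialog (stored as `numvcpus` and `memsize` in the `.vmx` file).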
Terminal
You can open a terminal by clicking the icon at the top right:
Figure: 1 NameNode/Manager Node, 1 DataNode/Worker Node
Cloudera Home Directory
Accessing Cloudera Manager
Architecture of Server Environment
In this architecture, six virtual machines form a cluster.
Five virtual machines are dedicated to Cloudera Hadoop.
One server is dedicated to the edge node (which also hosts the RapidMiner server).
Of the five Hadoop virtual machines, one is the master node and the remaining four are data nodes.
All the main services, such as Cloudera Manager, HDFS, MariaDB, Hive Server, Hue Server, Spark Server, YARN and Python, are installed on the master node.
The data nodes (worker nodes) run the HDFS DataNode, the YARN NodeManager and Python.
The edge node hosts the RapidMiner server, Radoop and Jupyter Notebook, and serves as the gateway for accessing HDFS, Hive, Spark, Sqoop and YARN.
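The role layout described above can be captured as a small data structure, which makes the counts (six VMs, one master, four data nodes, one edge node) and the service placement easy to check. This is an illustrative sketch only: the host names `master01`, `data01` to `data04` and `edge01`, and the helper names, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str   # hypothetical host name
    role: str   # "master", "data" or "edge"
    services: list[str] = field(default_factory=list)


# Services per role, as listed in the lab description.
MASTER_SERVICES = ["Cloudera Manager", "HDFS", "MariaDB", "Hive Server",
                   "Hue Server", "Spark Server", "YARN", "Python"]
DATA_SERVICES = ["HDFS DataNode", "YARN NodeManager", "Python"]
EDGE_SERVICES = ["RapidMiner Server", "Radoop", "Jupyter Notebook",
                 "HDFS gateway", "Hive gateway", "Spark gateway",
                 "Sqoop gateway", "YARN gateway"]


def build_cluster() -> list[Node]:
    """Six VMs: one master and four data nodes for Cloudera Hadoop, plus one edge node."""
    nodes = [Node("master01", "master", list(MASTER_SERVICES))]
    nodes += [Node(f"data0{i}", "data", list(DATA_SERVICES)) for i in range(1, 5)]
    nodes.append(Node("edge01", "edge", list(EDGE_SERVICES)))
    return nodes


def hosts_running(nodes: list[Node], service: str) -> list[str]:
    """Return the names of the nodes that run the given service."""
    return [n.name for n in nodes if service in n.services]


if __name__ == "__main__":
    cluster = build_cluster()
    hadoop_nodes = [n for n in cluster if n.role != "edge"]
    print(len(cluster), len(hadoop_nodes))  # 6 5
    print(hosts_running(cluster, "YARN NodeManager"))
```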
Getting into UiTM Network (Outside UiTM network)
Each student will be given access to a PC in the big data lab.
You will need the AnyDesk software to access the PC.
The ID and password will be given to you.
Step 2: Type in the given password. (Note: the cursor will not move while you are typing the password.)
If you typed it correctly, you will see the following:
Step 2: Enter the given username and password and click Sign In. You should see the following:
If you have executed any MapReduce application earlier (see Executing MapReduce Program), you can view it as follows:
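The YARN ResourceManager also exposes the list of applications through its REST endpoint `/ws/v1/cluster/apps`, which can be filtered with the `applicationTypes` query parameter. This minimal sketch assumes the ResourceManager web UI is on its default port 8088 and uses the QuickStart host name `quickstart.cloudera`; replace the host for the cluster environment. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: ResourceManager web UI on its default port 8088; change the host as needed.
RM_BASE = "http://quickstart.cloudera:8088"


def apps_url(base: str = RM_BASE, app_type: str = "MAPREDUCE") -> str:
    """URL of the ResourceManager cluster-applications REST endpoint, filtered by type."""
    query = urllib.parse.urlencode({"applicationTypes": app_type})
    return f"{base}/ws/v1/cluster/apps?{query}"


def summarize_apps(payload: dict) -> list[tuple[str, str, str, str]]:
    """Extract (id, name, state, finalStatus) from a /ws/v1/cluster/apps response."""
    apps = payload.get("apps") or {}  # "apps" is null when no application matches
    return [(a["id"], a["name"], a["state"], a["finalStatus"])
            for a in apps.get("app", [])]


def fetch_apps(base: str = RM_BASE) -> list[tuple[str, str, str, str]]:
    """Query the ResourceManager and summarize the MapReduce applications it reports."""
    with urllib.request.urlopen(apps_url(base), timeout=10) as resp:
        return summarize_apps(json.load(resp))


if __name__ == "__main__":
    # Prints the request URL; call fetch_apps() from a machine that can reach the cluster.
    print(apps_url())
```

Separating `summarize_apps` from the network call keeps the parsing testable without a running cluster.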
Accessing HUE
Pre-requisite: not needed
Step 2: Enter the given username and password. You should see the following:
Step 3: You can view HDFS contents as follows:
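Outside HUE, a directory's HDFS contents can also be listed through the WebHDFS REST API, whose `LISTSTATUS` operation returns JSON. This minimal sketch assumes WebHDFS is enabled and the NameNode web port is 50070 (the Hadoop 2 default used by CDH 5); the host name, the `cloudera` user and the helper names are placeholders, so substitute the values you were given.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: WebHDFS enabled, NameNode HTTP port 50070, placeholder host name.
NAMENODE = "http://quickstart.cloudera:50070"


def liststatus_url(path: str, base: str = NAMENODE, user: str = "cloudera") -> str:
    """Build the WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /user/cloudera")
    query = urllib.parse.urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"{base}/webhdfs/v1{urllib.parse.quote(path)}?{query}"


def format_listing(payload: dict) -> list[str]:
    """Turn a LISTSTATUS response into 'd name' or '- name size' lines."""
    lines = []
    for st in payload["FileStatuses"]["FileStatus"]:
        if st["type"] == "DIRECTORY":
            lines.append(f"d {st['pathSuffix']}")
        else:
            lines.append(f"- {st['pathSuffix']} {st['length']}")
    return lines


def list_hdfs(path: str) -> list[str]:
    """Fetch and format the listing of an HDFS directory."""
    with urllib.request.urlopen(liststatus_url(path), timeout=10) as resp:
        return format_listing(json.load(resp))


if __name__ == "__main__":
    # Prints the request URL; call list_hdfs("/user/cloudera") against a live NameNode.
    print(liststatus_url("/user/cloudera"))
```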