Cloud Computing Record
Para-Virtualization
Para-virtualization (PV) is a hardware virtualization technique in which the guest operating
system (OS) is modified before installation inside a virtual machine (VM), so that the guest
cooperates with the hypervisor instead of running on top of a fully emulated hardware
environment. The modified guests share resources and collaborate through an interface that is
similar, but not identical, to the underlying host hardware.
Because the modified guest makes explicit requests to the hypervisor rather than having every
privileged operation trapped and emulated, para-virtualization minimizes overhead and
improves VM performance relative to conventional full hardware virtualization, where such
capacity would otherwise be underutilized.
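A toy model can make the performance argument concrete. The classes and counts below are purely illustrative (no real hypervisor interface is involved): a fully virtualized guest pays one trap into the hypervisor per privileged operation, while a para-virtualized guest batches the same work into a single explicit hypercall.

```python
# Toy model contrasting full virtualization (trap-and-emulate) with
# para-virtualization (explicit hypercalls). All names are illustrative.

class Hypervisor:
    def __init__(self):
        self.world_switches = 0  # guest <-> hypervisor transitions

    def trap(self, op):
        # Full virtualization: every privileged instruction traps
        # into the hypervisor individually.
        self.world_switches += 1

    def hypercall(self, ops):
        # Para-virtualization: the modified guest batches its requests
        # into one explicit call to the hypervisor.
        self.world_switches += 1

hv = Hypervisor()
ops = ["set_page_table_entry"] * 8

# Unmodified guest: one trap per privileged operation.
for op in ops:
    hv.trap(op)
full_cost = hv.world_switches

hv.world_switches = 0
# Para-virtualized guest: one hypercall carrying all 8 updates.
hv.hypercall(ops)
pv_cost = hv.world_switches

print(full_cost, pv_cost)  # 8 1
```

Only the ratio of world switches matters here; fewer transitions between guest and hypervisor is one source of PV's performance advantage.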
STEP 2: The previous step opens a new page listing links to the VirtualBox binaries and their
source code. Click on the 'Windows hosts' option listed below the heading 'VirtualBox 5.2.8
platform packages', and the VirtualBox EXE file will be downloaded.
STEP 3: Once the VirtualBox EXE file is downloaded, double-click on it to open the
VirtualBox installation window.
➢ Click on 'Yes' to proceed with the installation process.
➢ Once the installation is complete, a prompt opens. Check 'Start Oracle VM
VirtualBox 6.1.12 after installation' and then click 'Finish'. This opens the newly
installed VirtualBox, which enables you to create a virtual machine to run any
operating system on your PC.
STEP 2: The 'Create Virtual Machine' dialog opens. Choose a descriptive name for the new
virtual machine, along with the type and version of the operating system to be installed.
STEP 3: Select the amount of memory to allocate to the virtual machine. Here the
recommended memory size of 1024 MB is selected. Click on 'Next'.
STEP 4: To attach a new virtual hard disk to the machine, select the 'Create a virtual
hard disk now' option and click on the 'Create' button.
STEP 5: Choose the hard disk file type as VDI (VirtualBox Disk Image) and click on 'Next'.
STEP 6: Choose the Storage type on the physical hard disk as the ‘Dynamically allocated’
and click on ‘Next’.
STEP 7: Select the file location and the size of the virtual hard disk, then click
on 'Create'.
STEP 8: Customize Virtual Machine – You have created a new virtual machine; now
customize it. Click on the 'Start' option on the menu bar and, from the drop-down box that
appears, select 'Normal Start'.
STEP 9: Now select a start-up disk from your PC and click on ‘Start’.
STEP 11: Click on the ‘Install Ubuntu’ option to start the installation process.
STEP 12: a. Select English as the language to be used for the installation process and
press Enter.
b. Select 'English (US)' as the country of origin of the keyboard and as the
keyboard layout.
10 | I I I M S c C o m p u t e r S c i e n c e
Cloud Computing Record
STEP 14: The installer now starts loading the additional components.
STEP 16: Now set up the users and the passwords in the following steps:
a. Enter the full name of the user.
STEP 18: Choose the ‘Guided – use entire disk and set up LVM’ option to partition the
disks.
STEP 20: Click on ‘Yes’ to write the changes to disks and configure LVM.
STEP 21: Give the amount of volume group to use for guided partitioning as 10.7 GB.
STEP 24: To configure the package manager, leave the HTTP proxy information blank to
indicate none and click on 'Continue'.
STEP 26: Once the installation process has completed successfully, a window pops
up for the user to log in to the system.
c. Use the cat command to store the public key as authorized_keys in the .ssh
directory, and set the file's permissions with the chmod command:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
The new user is now able to SSH without needing to enter a password every time.
Verify everything is set up correctly by using the hdoop user to SSH to localhost:
ssh localhost
Once the download is complete, extract the files to initiate the Hadoop installation:
tar xzf hadoop-3.2.1.tar.gz
The Hadoop binary files are now located within the hadoop-3.2.1 directory.
Define the Hadoop environment variables by adding the following content to the end of the
file:
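The screenshot with the variable definitions is not reproduced here. For a single-node Hadoop 3.2.1 setup, the lines added to the end of ~/.bashrc typically look like the following; the installation path /home/hdoop/hadoop-3.2.1 is an assumption and must match the directory where the archive was actually extracted:

```shell
# Hadoop environment variables (adjust HADOOP_HOME to your extraction path)
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
```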
Apply the changes to the current running environment by using the following command:
source ~/.bashrc
B. Edit hadoop-env.sh File
sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and set it to the full path of
the OpenJDK installation on your system, as in the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
The path needs to match the location of the Java installation on your system.
Add the following configuration to override the default values for the temporary
directory and add your HDFS URL to replace the default local file system setting.
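The XML added to core-site.xml is captured only as a screenshot. A typical single-node configuration is shown below; the temporary-directory path and the hdfs://localhost:9000 URL are assumptions that should be adapted to your setup:

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdoop/tmpdata</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```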
Once the namenode, datanodes, and secondary namenode are up and running, start the YARN
resource and nodemanagers by typing:
./start-yarn.sh
Type this simple command to check if all the daemons are active and running as Java
processes:
jps
If everything is working as intended, the resulting list of running Java processes contains all
the HDFS and YARN daemons.
The default port 9864 is used to access individual DataNodes directly from your browser:
http://localhost:9864
5. Delete the MSc_Practicals folder from the desktop, then move it back from HDFS to
the desktop
7. Create a text file titled 'file' on the desktop. Move the text file to the Hadoop file
system and open it
WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
9. Compile the Java code – create a tutorial file based on the class folder
10. Go to the WordCount directory using cd and put the output files in one JAR file
http://localhost:8088 (YARN Resource Manager UI)
http://localhost:9870 (NameNode UI)
http://localhost:9864 (DataNode UI)
8. Make a maxmin directory in HDFS, and another folder named input inside the
maxmin folder
10. Compile the Java code – create a file based on the class folder
11. Go to the maxmin directory using cd and put the output files in one JAR file
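The Java source for the max/min job appears only in the screenshots, so as a rough illustration of the logic it implements, here is a small Python sketch. The input format (one `key value` pair per line, e.g. a year and a reading) is an assumption, not the record's actual data layout.

```python
# Illustrative max/min MapReduce logic (not the record's Java code).
# Assumes input lines of the form "<key> <numeric value>".

from collections import defaultdict

def map_phase(lines):
    # Emit a (key, value) pair for each input line.
    for line in lines:
        key, value = line.split()
        yield key, int(value)

def reduce_phase(pairs):
    # Group values by key, then reduce each group to (min, max).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: (min(vals), max(vals)) for key, vals in groups.items()}

lines = ["2001 45", "2001 12", "2002 33", "2002 7"]
result = reduce_phase(map_phase(lines))
print(result)  # {'2001': (12, 45), '2002': (7, 33)}
```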
• To serve the input to the Python program, we use the cat command:
cat input_file | mapfunction.py
• To run the entire MapReduce in one line, we can make use of:
cat input_file | mapfunction.py | reducefunction.py
• To make sure that the Python scripts can successfully be executed, we must change
their permission rights to executable.
• Make sure the file exists in HDFS by making use of the ls command.
• HADOOP_STREAMING_PATH – $HADOOP_HOME/share/hadoop/tools/lib/hadoop-
streaming-3.2.1.jar
This notifies the hadoop jar command of the presence of Hadoop Streaming so that we can use
it instead of the javac compiler. Make sure that the destination folder exists in the correct
location.
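The two scripts in the `cat input_file | mapfunction.py | reducefunction.py` pipeline can be sketched as plain functions, using word count as the example job (in the real scripts the lines would arrive on sys.stdin, and Hadoop Streaming would sort the mapper output before the reducer sees it):

```python
# Streaming-style word count: the mapper emits "word\t1" lines and the
# reducer sums counts for consecutive identical keys, mirroring how
# Hadoop Streaming feeds sorted mapper output to the reducer.

def mapfunction(lines):
    # Mapper: one "word\t1" line per token.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducefunction(sorted_lines):
    # Reducer: sum counts for runs of identical keys in sorted input.
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

# Equivalent of: cat input_file | mapfunction.py | sort | reducefunction.py
mapped = sorted(mapfunction(["to be or", "not to be"]))
print(list(reducefunction(mapped)))  # ['be\t2', 'not\t1', 'or\t1', 'to\t2']
```

The explicit sorted() call stands in for the shuffle-and-sort step that Hadoop performs between the two scripts.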
Create an input directory in the python folder and move sample.txt to the Hadoop file system
Create a mapper file in the cricket folder for finding the batting average
Create another mapper file in the cricket folder for finding the wickets average
Create a reducer file in the cricket folder for finding the batting average
Create another reducer file in the cricket folder for finding the wickets average
View the output of the mapper files in sorted order using the cat command
Run the reducer files on their respective mapper files and check if the python code is
working properly
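As an illustration of what the batting-average mapper and reducer pair might compute, here is a compact Python sketch. The input format `player,runs` (one innings per line) and the definition of the average as total runs divided by innings are assumptions; the record's actual scripts may differ.

```python
# Illustrative batting-average mapper/reducer for the cricket exercise.
# Assumes input lines of the form "player,runs", one innings per line.

from collections import defaultdict

def mapper(lines):
    # Emit (player, runs) for each innings.
    for line in lines:
        player, runs = line.split(",")
        yield player, int(runs)

def reducer(pairs):
    # Average the runs per player across all their innings.
    totals = defaultdict(lambda: [0, 0])  # player -> [run total, innings]
    for player, runs in pairs:
        totals[player][0] += runs
        totals[player][1] += 1
    return {p: s / n for p, (s, n) in totals.items()}

sample = ["Dhoni,50", "Dhoni,100", "Kohli,75"]
result = reducer(mapper(sample))
print(result)  # {'Dhoni': 75.0, 'Kohli': 75.0}
```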