
A REPORT OF SIX WEEKS INDUSTRIAL TRAINING

at

NATIONAL INSTITUTE OF ELECTRONICS & INFORMATION TECHNOLOGY


(NIELIT), DELHI.
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD

OF THE DEGREE OF

BACHELOR OF ENGINEERING

(Computer Science & Engineering)

JUNE-JULY, 2019

SUBMITTED BY:

NAME: JASVEEN KAUR

UNIVERSITY UID: 17BCS1074

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CHANDIGARH UNIVERSITY GHARUAN, MOHALI


CONTENTS

Topic Page No.

Certificate by Company/Industry/Institute i

Candidate’s Declaration ii

Abstract iii

Acknowledgement iv

About the Company/ Industry / Institute v

List of Figures vi

List of Tables vii

Definitions, Acronyms and Abbreviations viii

CHAPTER 1 INTRODUCTION 1-19

1.1 Background of the topic of the training 1

1.2 Theoretical explanation about the same 4

1.3 SW/HW tools learned 7

1.4 Operating system used 14

CHAPTER 2 TRAINING WORK UNDERTAKEN 20-40

2.1 Sequential learning steps 20

2.2 Methodology followed 30

2.3 Project 40

CHAPTER 3 RESULTS AND DISCUSSION 40-50

3.1 Result __

3.2 Discussion __
3.3 Screenshots

CHAPTER 4 CONCLUSION AND FUTURE SCOPE 50-60

4.1 Conclusion 51

4.2 Future Scope 52

REFERENCES __

APPENDIX (Program or any additional information regarding training) __

(Note: Page No’s for different topics in report may vary according to the contents.

Headings within the chapters should be numbered as 1.1, 1.2, 1.3 and so on for chapter 1. Similarly,

as 2.1,2.2, 2.3 and so on for chapter 2. The corresponding subheadings as 1.1.1, 1.1.2, 1.1.3 and so

on.)
CERTIFICATE
CHANDIGARH UNIVERSITY, GHARUAN, MOHALI

CANDIDATE'S DECLARATION

I, Jasveen Kaur, hereby declare that I have undertaken six weeks of industrial training at the National Institute of Electronics & Information Technology (NIELIT), Delhi during the period from 13th June 2019 to 24th July 2019, in partial fulfillment of the requirements for the award of the degree of B.E. (COMPUTER SCIENCE & ENGINEERING) at CHANDIGARH UNIVERSITY GHARUAN, MOHALI. The work presented in the training report submitted to the Department of Computer Science & Engineering at CHANDIGARH UNIVERSITY GHARUAN, MOHALI is an authentic record of my training work.

Signature of the Student

The six weeks industrial training Viva–Voce Examination of__________________ has been held on
____________ and accepted.

Signature of Internal Examiner Signature of External Examiner


ABSTRACT

This report describes what Big Data is, the advantages and disadvantages of Big Data, some of the things that can be accomplished with it, and how it is utilized, followed by a conclusion. The part on the utilization of Big Data covers where the data comes from, what organizations can do with the data, and how doing so benefits them.

The conclusion discusses what the future may look like with Big Data, what people will be doing when everything generates data and, finally, what I want to do with Big Data.
ACKNOWLEDGMENT

On the submission of my project report on “Big Data Analysis with Hadoop”, I would like to extend my gratitude and sincere thanks to my teacher, Prof. Jyoti of the National Institute of Electronics & Information Technology (NIELIT), Delhi, for her constant motivation and support during the course of the work. This project required a lot of work, patience and dedication, and its implementation would not have been possible without the support of my teacher. I truly appreciate and value her esteemed guidance and encouragement from the beginning to the end of this report. I am indebted to her for having helped me shape the problem and for providing insights towards the solution.

Above all, I would like to thank all my friends whose direct and indirect support helped me complete the project in time. The project would have been impossible without their perpetual moral support.

-Jasveen Kaur
About the Institute

NIELIT, New Delhi was set up in March 2000. It is a professionally managed Centre with a proven track record. It is an IT organization with clear-cut strategies, and its various operations are aimed at giving its customers a total package of IT solutions and products. It has proven its capability of providing quality computer education and of handling large projects for government organizations in different sectors. The Centre initially worked as a branch office of the NIELIT Chandigarh Centre and has been an independent Centre of NIELIT since 1st November 2012, following the division of jurisdiction among Centres. The Centre has accomplished many feats in executing various turnkey IT projects involving the computerization of many hospitals and various government offices of the Delhi Government, PSUs and autonomous bodies of the Government of India. Preparation of IT plans has also been undertaken for many offices of the Delhi Government.

Training, computerization, IT planning, website development and web application development have been its major thrust areas, in which it has excelled as a Centre and earned a name for itself. The Centre imparts training on DOEACC O/A/B Level courses. In addition, it also offers various short- and long-term computer courses for all categories of students and professionals.
List of Figures
List of Tables
Definitions, Acronyms and Abbreviations
CHAPTER 1 INTRODUCTION

1.1 Background of the topic of the training

Data can be defined as the quantities, characters, or symbols on which operations are performed by a

computer, which may be stored and transmitted in the form of electrical signals and recorded on

magnetic, optical, or mechanical recording media.

Big Data is also data, but of a huge size. Big Data is a term used to describe a collection of data that is huge in size and yet grows exponentially with time. Big Data is so large and complex that it is difficult to store and process using available database management tools or traditional data processing applications.

Big Data solutions provide the tools, methodologies, and technologies that are used to capture, store,

search & analyse the data in seconds to find relationships and insights for innovation and competitive

gain that were previously unavailable.

Hadoop is at the heart of Big Data processing. Hadoop is developed by Apache and is an open-source tool whose source code can be modified by developers according to their requirements. It is based on Google’s MapReduce, a framework used to split a large job into a set of smaller parts. Hadoop divides Big Data into small sets of data and stores them on different servers at the same time.

So if we need to do any data manipulation or search for any particular record, it is made faster using Hadoop, which processes the small parts of the data in parallel on different servers; fetching any particular record is therefore faster than it would be if the Big Data were stored on only one server.

Examples of Big Data

• The New York Stock Exchange generates about one terabyte of new trade data per day.

• Social media: statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.

• A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, the data generated runs into many petabytes.

Characteristics of Big Data

(i) Volume – The name Big Data itself is related to a size which is enormous. The size of data plays a very crucial role in determining its value. Whether a particular set of data can actually be considered Big Data or not also depends upon its volume. Hence, 'Volume' is one characteristic which needs to be considered while dealing with Big Data.

(ii) Variety – The next aspect of Big Data is its variety. Variety refers to heterogeneous sources and the nature of the data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining and analyzing data.

(iii) Velocity – The term 'velocity' refers to the speed of generation of data. How fast the data is generated and processed to meet demands determines the real potential of the data. Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

(iv) Variability – This refers to the inconsistency which can be shown by the data at times, thus hampering the process of handling and managing the data effectively.
Benefits of Big Data Processing

The ability to process Big Data brings multiple benefits, such as:

o Businesses can utilize outside intelligence while taking decisions: access to social data from search engines and sites like Facebook and Twitter enables organizations to fine-tune their business strategies.

o Improved customer service: Traditional customer feedback systems are getting replaced by new

systems designed with Big Data technologies. In these new systems, Big Data and natural

language processing technologies are being used to read and evaluate consumer responses.

o Early identification of risk to the product/services, if any

o Better operational efficiency

Big Data technologies can be used for creating a staging area or landing zone for new data before

identifying what data should be moved to the data warehouse. In addition, such integration of Big

Data technologies and data warehouse helps an organization to offload infrequently accessed data.
1.2 Theoretical explanation about the same

Types of Big Data

Big Data could be found in three forms:

1. Structured

2. Unstructured

3. Semi-structured

1. Structured: Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. Over a period of time, talent in computer science has achieved great success in developing techniques for working with such data (where the format is well known in advance) and in deriving value out of it. However, nowadays we are foreseeing issues when the size of such data grows to a huge extent; typical sizes are in the range of multiple zettabytes.

2. Unstructured: Any data with an unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges in terms of processing it to derive value out of it. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of data available with them, but unfortunately they do not know how to derive value out of it, since this data is in its raw or unstructured form. The output returned by Google Search is an example of unstructured data.

3. Semi-structured: Semi-structured data can contain both forms of data. Semi-structured data appears structured in form, but it is not actually defined by, for example, a table definition as in a relational DBMS. A typical example of semi-structured data is data represented in an XML file, such as personal data stored in an XML file, as sketched below.
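A minimal sketch of what such a record could look like (the element names here are assumed purely for illustration, not taken from any particular schema):

<person>
   <name>Raja</name>
   <age>30</age>
   <city>Delhi</city>
</person>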


Apache Hadoop

Apache Hadoop is an open-source framework that allows storing and processing of big data. Hadoop runs on its own cluster (a set of machines) of commodity hardware, where a number of machines work in a distributed way.

Hadoop is written in Java and is not OLAP (online analytical processing); it is used for batch/offline processing. It is used by Facebook, Yahoo, Google, Twitter, LinkedIn and many more. Moreover, it can be scaled up just by adding nodes to the cluster.

Modules of Hadoop

1. HDFS: Hadoop Distributed File System. Google published its GFS paper and HDFS was developed on the basis of it. It states that files will be broken into blocks and stored on nodes over the distributed architecture.

2. YARN: Yet Another Resource Negotiator is used for job scheduling and for managing the cluster.

3. MapReduce: This is a framework which helps Java programs to do parallel computation on data using key-value pairs. The Map task takes input data and converts it into a data set which can be computed as key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the reducer gives the desired result.


1.3 SW/HW tools learned

1.3.1 Map reduce

MapReduce is a processing technique and a programming model for distributed computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The Reduce task then takes the output from a map as an input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce task is always performed after the map job.

The major advantage of MapReduce is that it is easy to scale data processing over multiple

computing nodes. Under the MapReduce model, the data processing primitives are called

mappers and reducers. Decomposing a data processing application into mappers and reducers is

sometimes nontrivial. But, once we write an application in the MapReduce form, scaling the

application to run over hundreds, thousands, or even tens of thousands of machines in a cluster is

merely a configuration change. This simple scalability is what has attracted many programmers to

use the MapReduce model.

The Algorithm

• A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage.

• Map Stage: The map or mapper’s job is to process the input data. Generally, the input data is in the form of a file or directory and is stored in the Hadoop Distributed File System (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data.

• During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.

• The framework manages all the details of data-passing, such as issuing tasks, verifying task completion, and copying data around the cluster between the nodes.

• Most of the computing takes place on nodes with data on local disks, which reduces the network traffic.

• After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result, and sends it back to the Hadoop server.

• Reduce Stage: This stage is the combination of the Shuffle stage and the Reduce stage. The reducer’s job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in HDFS.

Figure 1: MapReduce Flow

Figure 2: The overall MapReduce word count process
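To make the word-count flow of Figure 2 concrete, the following is a minimal Java sketch of a word-count job written against the standard org.apache.hadoop.mapreduce API. It is an illustrative outline only (the class name and input/output paths are assumptions for the example), not the exact program developed during the training.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: split each input line into words and emit (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce task: sum the counts received for each word after the shuffle stage.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local aggregation on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Once packaged into a jar, such a job could be submitted with a command of the form $ hadoop jar wordcount.jar WordCount /input /output, where the jar name and HDFS paths are again only placeholders for this example.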

1.3.2 Pig
Apache Pig is an abstraction over MapReduce. It is a tool/platform used to analyse large data sets by representing them as data flows. Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.
To write data analysis programs, Pig provides a high-level language known as Pig Latin. This language
provides various operators using which programmers can develop their own functions for reading,
writing, and processing data.
All these scripts are internally converted to Map and Reduce tasks. Apache Pig has a component known
as Pig Engine that accepts the Pig Latin scripts as input and converts those scripts into MapReduce jobs.

Pig Architecture
The language used to analyse data in Hadoop using Pig is known as Pig Latin. It is a high-level data processing language which provides a rich set of data types and operators to perform various operations on the data.
To perform a particular task, programmers using Pig need to write a Pig script using the Pig Latin language and execute it using any of the execution mechanisms (Grunt shell, UDFs, Embedded). After execution, these scripts go through a series of transformations applied by the Pig framework to produce the desired output.
Internally, Apache Pig converts these scripts into a series of MapReduce jobs, and thus it makes the programmer’s job easy.
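As a hedged illustration (the file paths and alias names here are assumed for the example, not taken from the training project), a word count written in Pig Latin looks roughly like this:

-- load each line of a text file stored in HDFS (path assumed for the example)
lines = LOAD '/home/dexlab/sample.txt' AS (line:chararray);
-- split every line into individual words
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- group identical words together and count each group
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
-- store the result back into HDFS; Pig compiles these statements into MapReduce jobs
STORE counts INTO '/home/dexlab/wordcount_out';

Such a script can be run, for example, from the Grunt shell or with $ pig wordcount.pig.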
Apache Pig Components
As shown in the figure, there are various components in the Apache Pig framework. Let us take a look at
the major components.
Parser
Initially the Pig Scripts are handled by the Parser. It checks the syntax of the script, does type checking,
and other miscellaneous checks. The output of the parser will be a DAG (directed acyclic graph), which
represents the Pig Latin statements and logical operators.
In the DAG, the logical operators of the script are represented as the nodes and the data flows are
represented as edges.

Figure 3: Apache Pig Components


Pig Latin Data Model
The data model of Pig Latin is fully nested and it allows complex non-atomic datatypes such as map and
tuple. Given below is the diagrammatical representation of Pig Latin’s data model.
Atom
Any single value in Pig Latin, irrespective of its data type, is known as an Atom. It is stored as a string and can be used as a string or a number. int, long, float, double, chararray, and bytearray are the atomic values of Pig. A piece of data or a simple atomic value is known as a field.
Example − ‘raja’ or ‘30’
Tuple
A record formed by an ordered set of fields is known as a tuple; the fields can be of any type. A tuple is similar to a row in a table of an RDBMS.
Example − (Raja, 30)
Bag
A bag is an unordered set of tuples. In other words, a collection of tuples (non-unique) is known as a bag. Each tuple can have any number of fields (flexible schema). A bag is represented by ‘{}’. It is similar to a table in an RDBMS but, unlike a table in an RDBMS, it is not necessary that every tuple contain the same number of fields or that the fields in the same position (column) have the same type.
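For example, a bag holding two tuples could be written as {(Raja, 30), (Rani, 28)}, where the second name and age are chosen purely for illustration.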

Hive
Hive is a data warehouse package used for processing, managing and querying structured data in Hadoop. It eases the process of analysing and summarising big data.
It acts as a platform used to develop SQL-type scripts to perform MapReduce operations.
Hive was initially started by Facebook; the Apache Software Foundation then took it up and developed it further as an open-source project named Apache Hive. Several companies now use Hive. For example, Amazon uses it in Amazon Elastic MapReduce.

Figure 4: Architecture of Hive

This component diagram contains different units. The following table describes each unit and its operation:

User Interface: Hive is a data warehouse infrastructure software that can create interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HD Insight (on Windows Server).

Meta Store: Hive chooses respective database servers to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.

HiveQL Process Engine: HiveQL is similar to SQL and is used for querying schema information in the Metastore. It is one of the replacements of the traditional approach for MapReduce programs: instead of writing a MapReduce program in Java, we can write a query for the MapReduce job and process it.

Execution Engine: The conjunction part of the HiveQL Process Engine and MapReduce is the Hive Execution Engine. The execution engine processes the query and generates results the same as MapReduce results. It uses the flavour of MapReduce.

HDFS or HBASE: The Hadoop Distributed File System or HBASE are the data storage techniques used to store data into the file system.
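As a small, hedged illustration of HiveQL (the table name, columns and file path below are assumed for the example, not taken from the training data):

-- create a managed table over comma-separated text data
CREATE TABLE employee (id INT, name STRING, salary FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- load a local CSV file into the table (path assumed for the example)
LOAD DATA LOCAL INPATH '/home/dexlab/employee.csv' INTO TABLE employee;

-- Hive compiles a query such as this into a MapReduce job behind the scenes
SELECT name, salary FROM employee WHERE salary > 50000 ORDER BY salary DESC;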

1.4 Operating system used


The operating system used was Linux (Ubuntu). Some of the common commands learned are listed below.

1. tar command examples
   Extract files from an existing tar archive.
   $ tar xvf archive_name.tar

2. grep command examples
   Search for a given string in a file (case-insensitive search).
   $ grep -i "the" demo_file
   Print the matched line, along with the 3 lines after it.
   $ grep -A 3 -i "example" demo_text
   Search for a given string in all files recursively.
   $ grep -r "dexlab" *

3. find command examples
   Find files by file name (case-insensitive find).
   $ find -iname "MyCProgram.c"
   Find all empty files in the home directory.
   $ find ~ -empty

4. ssh command examples
   Log in to a remote host.
   $ ssh -l jsmith remotehost.example.com
   Debug the ssh client.
   $ ssh -v -l jsmith remotehost.example.com
   Display the ssh client version.
   $ ssh -V

5. vim command examples
   Go to the 143rd line of a file.
   $ vim +143 filename.txt
   Go to the first match of the specified search term.
   $ vim +/search-term filename.txt
   Open the file in read-only mode.
   $ vim -R filename.txt

6. sort command examples
   Sort a file in ascending order.
   $ sort names.txt
   Sort a file in descending order.
   $ sort -r names.txt
   Sort the passwd file by the 3rd field.
   $ sort -t: -k 3n /etc/passwd | more

7. ls, ps and kill command examples
   Display file sizes in human-readable format (e.g. KB, MB).
   $ ls -lh
   Find a running Unix process and terminate it.
   $ ps -ef | grep vim
   dexlab 7243 7222 9 22:43 pts/2 00:00:00 vim
   $ kill -9 7243

8. rm command examples
   Get confirmation before removing a file. This is very useful when giving shell metacharacters in the file name argument.
   $ rm -i filename.txt
   Print the file name and get confirmation before removing each file.
   $ rm -i file*
   The following example recursively removes all files and directories under the example directory, and also removes the example directory itself.
   $ rm -r example

9. cp command examples
   Copy file1 to file2, preserving the mode, ownership and timestamp.
   $ cp -p file1 file2
   Copy file1 to file2; if file2 exists, prompt for confirmation before overwriting it.
   $ cp -i file1 file2

10. mv command examples
   Rename file1 to file2; if file2 exists, prompt for confirmation before overwriting it.
   $ mv -i file1 file2

11. cat command examples
   You can view multiple files at the same time. The following example prints the contents of file1 followed by file2 to stdout.
   $ cat file1 file2
   While displaying the file, the following cat -n command prepends the line number to each line of the output.
   $ cat -n /etc/logrotate.conf

12. chmod command examples
   The chmod command is used to change the permissions of a file or directory.
   Give full access (i.e. read, write and execute) to the user and group on a specific file.
   $ chmod ug+rwx file.txt
   Revoke all access (i.e. read, write and execute) for the group on a specific file.
   $ chmod g-rwx file.txt
   Apply the file permissions recursively to all the files in the sub-directories.
   $ chmod -R ug+rwx file.txt

chown command examples
   The chown command is used to change the owner and group of a file.
   Change the owner to oracle and the group to dba on a file, i.e. change both owner and group at the same time.
   $ chown oracle:dba dbora.sh
   Use -R to change the ownership recursively.
   $ chown -R oracle:dba /home/oracle

passwd command examples
   Change your password from the command line using passwd. This will prompt for the old password followed by the new password.
   $ passwd
   The super user can use the passwd command to reset another user's password. This will not prompt for the current password of the user.
   $ passwd dexlab
   Remove the password for a specific user. The root user can disable the password for a specific user; once the password is disabled, the user can log in without entering a password.
   $ passwd -d dexlab

mkdir command examples
   The following example creates a directory called temp under your home directory.
   $ mkdir ~/temp
   Create nested directories using one mkdir command. If any of these directories already exist, it will not display an error.
   $ mkdir -p dir1/dir2/dir3/dir4/

uname command examples
   The uname command displays important information about the system such as the kernel name, host name, kernel release number, processor type, etc. Sample uname output from an Ubuntu laptop can be obtained as shown below.
   $ uname -a

whereis command examples
   When you want to find out where a specific Unix command exists (for example, where does the ls command exist?), you can execute the following command.
   $ whereis ls
   When you want to search for an executable from a path other than the whereis default path, you can use the -B option and give the path as an argument to it. This searches for the executable lsmk in the /tmp directory and displays it, if it is available.
   $ whereis -u -B /tmp -f lsmk

whatis command examples
   The whatis command displays a single-line description of a command.
   $ whatis ls

tail command examples
   Print the last 10 lines of a file (the default).
   $ tail filename.txt
   Print N lines from the file named filename.txt.
   $ tail -n N filename.txt
   View the content of the file in real time using tail -f. This is useful for viewing log files that keep growing. The command can be terminated using CTRL-C.
   $ tail -f log-file

less command examples
   less is very efficient while viewing huge log files, as it does not need to load the full file while opening it.
   $ less huge-log-file.log
   Once you open a file using the less command, the following two keys are very helpful: CTRL+F moves forward one window and CTRL+B moves backward one window.

su command examples
   Switch to a different user account using the su command. The super user can switch to any other user without entering their password.


   $ su - dexlab
   Execute a single command from a different account name. In the following example, the user can execute the ls command as the raj user; once the command is executed, control returns to the original account.
   [dexlab@dexlab]$ su - raj -c 'ls'

mysql command examples
   mysql is probably the most widely used open-source database on Linux. Even if you do not run a mysql database on your own server, you might end up using the mysql command (client) to connect to a mysql database running on a remote server.
   To connect to a remote mysql database (this will prompt for a password):
   $ mysql -u root -p -h 192.168.1.2
   To connect to a local mysql database:
   $ mysql -u root -p
   If you want to specify the mysql root password on the command line itself, enter it immediately after -p (without any space).

apt-get command example
   To install a package on Ubuntu Linux:
   $ sudo apt-get install <package-name>

1.4.1 Some Basic Hadoop Shell Commands

   Print the Hadoop version.
   $ hadoop version
   List the contents of the root directory in HDFS.
   $ hadoop fs -ls /
   Report the amount of space used and available on the currently mounted filesystem.
   $ hadoop fs -df hdfs:/
   Count the number of directories, files and bytes under the paths that match the specified file pattern.


   $ hadoop fs -count hdfs:/
   Run the cluster balancing utility.
   $ hadoop balancer
   Create a new HDFS directory.
   $ hadoop fs -mkdir /home/dexlab/hadoop
   Create a new sample text file in the local directory and add it to the directory you created in HDFS in the previous step.
   $ vim sample.txt -> i -> "text" -> :wq
   $ hadoop fs -put /home/dexlab/pg/sample.csv /"directory name"
   $ hadoop fs -ls /vivek
   $ hadoop fs -cat /vivek/sample.csv
   $ hadoop fs -put data/sample.txt /home/dexlab
   List the contents of this new directory in HDFS.
   $ hadoop fs -ls /home/dexlab/hadoop
   Add an entire local directory into the /home/dexlab/hadoop directory in HDFS.
   $ hadoop fs -put data/retail /home/dexlab/hadoop
   List the hadoop directory again.
   $ hadoop fs -ls hadoop
   Add the purchases.txt file from the local directory /home/dexlab/training/ to the hadoop directory you created in HDFS.
   $ hadoop fs -copyFromLocal /home/training/purchases.txt hadoop/
   View the contents of the text file purchases.txt which is present in your hadoop directory.
   $ hadoop fs -cat hadoop/purchases.txt
   cp is used to copy files between directories present in HDFS.
   $ hadoop fs -cp /user/training/*.txt /home/dexlab/hadoop
   The get command can be used as an alternative to the -copyToLocal command.
   $ hadoop fs -get hadoop/sample.txt /home/dexlab/training/
   Empty the HDFS trash.
   $ hadoop fs -expunge
   Make the NameNode leave safe mode.
   $ sudo -u hdfs hdfs dfsadmin -safemode leave
   List all the Hadoop file system shell commands.
   $ hadoop fs
   Show the help for the file system shell commands.
   $ hadoop fs -help


CHAPTER 2 TRAINING WORK UNDERTAKEN

2.1 Sequential learning steps

2.1.1 Installing Java

Step I: The following command is used to install the default JDK.
$ sudo apt-get install default-jdk

Step II: For setting up the PATH and JAVA_HOME variables, add the following commands to the ~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
Now apply all the changes to the current running system.
$ source ~/.bashrc

Step III: Now verify the installation using the command java -version from the terminal.

2.1.2 Installing Hadoop in Pseudo Distributed Mode

The following steps are used to install Hadoop 2.6.0 in pseudo distributed mode.

Step I: Setting up Hadoop

You can set the Hadoop environment variables by appending the following commands to the ~/.bashrc file.
export HADOOP_HOME=/home/dexlab/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Now apply all the changes to the current running system.
$ source ~/.bashrc

Step II: Hadoop Configuration

You can find all the Hadoop configuration files in the location /home/dexlab/hadoop/etc/hadoop. You need to make suitable changes in those configuration files according to your Hadoop infrastructure.
$ cd /home/dexlab/hadoop/etc/hadoop

In order to develop Hadoop programs using Java, you have to reset the Java environment variable in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java on your system, for example:
export JAVA_HOME=/usr/local/jdk1.7.0_71

Given below is the list of files that you have to edit to configure Hadoop.

core-site.xml
The core-site.xml file contains information such as the port number used for the Hadoop instance, the memory allocated for the file system, the memory limit for storing the data, and the size of the read/write buffers.
Open core-site.xml and add the following properties between the <configuration> and </configuration> tags.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

hdfs-site.xml
The hdfs-site.xml file contains information such as the value of the replication data, the namenode path, and the datanode path of your local file systems, i.e. the place where you want to store the Hadoop infrastructure.
Let us assume the following data (here dexlab is the user name, and the namenode and datanode directories are created for the HDFS file system):
dfs.replication (data replication value) = 1
namenode path = /home/dexlab/hadoop/hdfs/namenode
datanode path = /home/dexlab/hadoop/hdfs/datanode
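With these assumed values, the properties added between the <configuration> and </configuration> tags of hdfs-site.xml would look roughly as follows (dfs.namenode.name.dir and dfs.datanode.data.dir are the property names commonly used in Hadoop 2.x; check them against your own distribution):

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/dexlab/hadoop/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/dexlab/hadoop/hdfs/datanode</value>
   </property>
</configuration>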
CU Citation Reference

Citation standards in this reference are provided for:

 Books

 Conference Technical Articles/Papers

 Periodicals (Journals/ Transaction/Magazines/Letters)

 Reports

 Online sources

 Patents, Standards, Thesis (M.E) and Dissertations (Ph.D )

NOTE: For two authors use style [J. K. Author and A. N. Writer] and

For three or more authors: [separate author names by comma and also use word ‘and’ before

the name of last author e.g.: J. K. Author, R. Cogdell, R. E. Haskell, and A. N. Writer]

Books

Basic Format:

[1] J. K. Author, Title of His Published Book, xth ed. City of Publisher, Country: Abbrev. of

Publisher, year.
Examples:

[1] B. Klaus and P. Horn, Robot Vision. Cambridge, USA: MIT Press, 1986.

[2] L. Stein, Computers and You, J. S. Brake, Ed. New York, USA: Wiley, 1994.

[3] M. Abramowitz and I. A. Stegun, Eds., Handbook of Mathematical Functions (Applied

Mathematics Series 55).

Washington, DC, USA: NBS, 1964.

Conference Technical Articles/Papers

Basic Format:

[1] J. K. Author, “Title of paper,” Unabbreviated Name of Conference, City of Conference,

Country, year, pp. xxx-xxx.

Example:

[1] H. Chen, S. C. Laroiya, and M. Adithan, “Precision Machining of Advanced Ceramics,” International Conference on Advanced Manufacturing Technology (ICMAT - 94), Johor Bahru, Malaysia, 1994, pp. 203-210.

Periodicals (Journals/ Transaction/Magazines/Letters)

Basic Format:

[1] J. K. Author, “Name of paper,” Unabbreviated Title of Periodical, vol. x, no. x, pp. xxx-xxx,

Abbrev. Month, year.

Examples:

[1] R. E. Kalman, “New results in linear filtering and prediction theory,” Journal of Electrical

Engineering, vol. 83, no. 5, pp. 95-108, Mar. 1961.


[2] Y. V. Lavrova, “Geographic distribution of ionospheric disturbances in the F2 layer,” IET

Microwaves, Antennas and Propagation, vol. 19, no. 29, pp. 31–43, Feb. 1961.

[3] E. P. Wigner, “On a modification of the Rayleigh–Schrodinger perturbation theory,” (in

German), International Journal of Computational Intelligence Studies, vol. 53, p. 475, Sep.

1935.

[4] W. Rafferty, “Ground antennas in NASA’s deep space telecommunications,” IEEE

Transactions on Antennas and Propagation, vol. 82, pp. 636-640, May 1994.

Reports:

The general form for citing technical reports is to place the name and location of the company or

institution after the author and title and to give the report number and date at the end of the reference.

Basic Format:

[1] J. K. Author, “Title of report,” Name of Company, City of Company, Country, Report No.,

xxx, year.

Examples:

[1] E. E. Reber, “Oxygen absorption in the earth’s atmosphere,” Aerospace Corporation, Los Angeles, USA, Tech. Rep. TR-0200 (4230-46)-3, Nov. 1988.

Online Sources

FTP

Basic Format:

[1] J. K. Author. (year). Title (edition) [Type of medium]. Available FTP: Directory: File:

Example:
[1] R. J. Vidmar. (1994). On the use of atmospheric plasmas as electromagnetic reflectors

[Online]. Available FTP:atmnext.usc.edu Directory: pub/etext/1994 File: atmosplasma.txt

WWW

Basic Format:

[1] J. K. Author. (year, month day). Title (edition) [Type of medium]. Available: http://www.(URL)

Example:

[1] J. Jones. (1991, May 10). Networks (2nd ed.) [Online]. Available: http://www.atm.com

Patents, Standards, Thesis (M.S.) and Dissertations (Ph.D.)

Patents

Basic Format:

[1] J. K. Author, “Title of patent,” U.S. Patent x xxx xxx, Abbrev. Month, day, year.

Example:

[1] J. P. Wilkinson, “Nonlinear resonant circuit devices,” U.S. Patent 3 624 125, July 16, 1990.

NOTE: Use “issued date” if several dates are given.

Standards

Basic Format:

[1] Title of Standard, Standard number, date.

Examples:

[1] IEEE Criteria for Class IE Electric Systems, IEEE Standard 308, 1969.

[2] Letter Symbols for Quantities, ANSI Standard Y10.5-1968.


Thesis (Master) and Dissertations (Ph.D.)

Basic Format:

[1] J. K. Author, “Title of thesis,” M.S. thesis, Abbrev. Dept., Abbrev. Univ., City of Univ.,

Country, year.

[2] J. K. Author, “Title of dissertation,” Ph.D. dissertation, Abbrev. Dept., Abbrev. Univ., City

of Univ., Country, year.

Examples:

[1] J. O. Williams, “Narrow-band analyzer,” Ph.D. dissertation, Dept. Elect. Eng., Harvard Univ.,

Cambridge, MA, 1993.

[2] N. Kawasaki, “Parametric study of thermal and chemical non equilibrium nozzle flow,” M.S.

thesis, Dept. Electron. Eng., Osaka Univ., Osaka, Japan, 1993.

References in Text

References in Text:

References need to be cited in the text and should appear on the line, in square brackets, inside the punctuation. Grammatically, they may be treated as if they were footnote numbers, e.g.,

as shown by Brown [4], [5]; as mentioned earlier [2], [4]–[7], [9]; Smith [4] and Brown and Jones [5]; Wood et al. [7]

or as nouns:

as demonstrated in [3]; according to [4] and [6]–[9].

NOTE: Use et al. when three or more names are given.


Reference List Style

Reference numbers are set flush left and form a column of their own, hanging out beyond the body of

the reference.

The reference numbers are on the line, enclosed in square brackets. In all references, the given name

of the author or editor is abbreviated to the initial only and precedes the last name. There must be

only one reference with each number.

[1] R. E. Kalman, “New results in linear filtering and prediction theory,” Journal of Electrical Engineering, vol. 83,
no. 5, pp. 95-108, Mar. 1961.
[2] Ye. V. Lavrova, “Geographic distribution of ionospheric disturbances in the F2 layer,” Applied Soft Computing,
vol. 19, no. 29, pp. 31–43, Feb. 1961.

[3] E. P. Wigner, “On a modification of the Rayleigh–Schrodinger perturbation theory,” (in German), International
Journal of Computational Intelligence Studies, vol. 53, p. 475, Sep. 1935.
[4] W. Rafferty, “Ground antennas in NASA’s deep space telecommunications,” IEEE Transactions on Antennas
and Propagation, vol. 82, no. 3, pp. 636-640, May 1994.

Important: Editing of references may entail careful renumbering of references, as well as the citations in text.
B.E TRAINING REPORT GUIDELINES

1. The report shall be computer typed (English- British, Font -Times Roman, Size-12 point, Double spacing

between lines) and printed on A4 size paper.

2. The report shall be spiral bound. The name of the candidate, degree, month of training, college name shall be

printed on the title page [refer sample sheet (title page/front page)].

3. The report shall be typed on one side only with double space with a margin 3.5 cm on the left, 2.5 cm on the

top, and 1.25 cm on the right and at bottom.

4. In the report, the title page [Refer sample sheet (title Page/front page)] should be given first then the

Certificate by Company/Industry/Institute and then candidate’s declaration, followed by an abstract of the

report (not exceeding one page). This should be followed by the acknowledgment, list of figures/list of tables,

notations/nomenclature, and then contents with page nos.

5. The diagrams should be printed on a light/white background. Tabular matter should be clearly arranged, and the font of the tabular matter should be Times Roman, Size-10 point, with single spacing between lines. The decimal point may be indicated by a full stop (.). The caption for a figure must be given at the BOTTOM (center aligned) of the figure, and the caption for a table must be given at the TOP (center aligned) of the table. The font for the captions should be Times Roman, Italics, Size-10 point.

6. The font for the chapter titles should be Times Roman, Bold, Capital, Size-16 point and center aligned. The

font for the Headings should be Times Roman, Bold, and Size-14 point. The font for the sub-headings should

be Times Roman, Bold, and Size-12 point.

7. Equations should be numbered as 1.1, 1.2, 1.3 etc in chapter 1. Similarly as 2.1, 2.2, 2.3 etc in chapter 2 and so

on.

8. Figures should be numbered as Figure 1.1, Figure 1.2, Figure 1.3 etc in chapter 1. Similarly as Figure 2.1, Figure 2.2, Figure 2.3 etc in chapter 2 and so on.

9. Tables should be numbered as Table 1.1, Table 1.2, Table 1.3 etc in chapter 1. Similarly as Table 2.1, Table

2.2, Table 2.3 etc in chapter 2 and so on.

10. Conclusions and future scope each must not exceed more than one page.

11. The graphs (optional) should be combined for the same parameters for proper comparison. Single graph

should be avoided as far as possible.

12. The training report must consist of following chapters:

[Chapter-1] INTRODUCTION

[Chapter-2] TRAINING WORK UNDERTAKEN

[Chapter-3] RESULTS AND DISCUSSIONS

[Chapter-4] CONCLUSION AND FUTURE SCOPE

13. References (For style of references follow the instructions attached)

14. Appendix: any additional information regarding the training (e.g. programs), if any, is to be included in the appendix.

15. Paste a CD containing the soft copy of the report (in DOCX and PDF), the implementation and reference papers, and other material (if any) related to the work, on the inner side of the back hard cover.
