HDFS and Pig
Apache HDFS, or the Hadoop Distributed File System, is a block-structured file system where
each file is divided into blocks of a pre-determined size. These blocks are stored across a
cluster of one or several machines. The Apache Hadoop HDFS Architecture follows a
master/slave design, where a cluster comprises a single NameNode (master node) and all the
other nodes are DataNodes (slave nodes). HDFS can be deployed on a broad spectrum of
machines that support Java. Though one can run several DataNodes on a single machine, in
practice these DataNodes are spread across various machines.
HDFS Architecture:
HDFS follows a master-slave architecture and has the following elements.
NameNode:
NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and
manages the blocks present on the DataNodes (slave nodes). NameNode is a highly available
server that manages the file system namespace and controls access to files by clients. I will
be discussing this High Availability feature of Apache Hadoop HDFS in my next blog. The
HDFS architecture is built in such a way that user data never resides on the NameNode; the
data resides on DataNodes only.
Functions of NameNode:
It is the master daemon that maintains and manages the DataNodes (slave nodes).
It records the metadata of all the files stored in the cluster, e.g. the location of the blocks
stored, the size of the files, permissions, hierarchy, etc. There are two files associated
with the metadata:
o FsImage: It contains the complete state of the file system namespace since the
start of the NameNode.
o EditLogs: It contains all the recent modifications made to the file system with
respect to the most recent FsImage.
It records each change that takes place to the file system metadata. For example, if a
file is deleted in HDFS, the NameNode will immediately record this in the EditLog.
It regularly receives a Heartbeat and a block report from all the DataNodes in the
cluster to ensure that the DataNodes are live.
It keeps a record of all the blocks in HDFS and in which nodes these blocks are
located.
The NameNode is also responsible for maintaining the replication factor of all the blocks,
which we will discuss in detail later in this HDFS tutorial blog.
In case of DataNode failure, the NameNode chooses new DataNodes for new replicas,
balances disk usage, and manages the communication traffic to the DataNodes.
DataNode:
DataNodes are the slave nodes in HDFS. Unlike the NameNode, a DataNode is commodity
hardware, that is, an inexpensive system which is not of high quality or high availability. The
DataNode is a block server that stores the data in a local file system such as ext3 or ext4.
Functions of DataNode:
These are slave daemons or processes that run on each slave machine.
The actual data is stored on DataNodes.
The DataNodes perform the low-level read and write requests from the file system’s
clients.
They send heartbeats to the NameNode periodically to report the overall health of HDFS;
by default, this frequency is set to 3 seconds.
By now, you must have realized that the NameNode is extremely important to us. If it fails,
we are doomed. But don't worry, we will be talking about how Hadoop solved this single
point of failure problem in the next Apache Hadoop HDFS Architecture blog. So, just relax
for now and let's take one step at a time.
Secondary NameNode:
Apart from these two daemons, there is a third daemon, or process, called the Secondary
NameNode. The Secondary NameNode works concurrently with the primary NameNode as a
helper daemon. And don't be confused: the Secondary NameNode is not a backup
NameNode.
Functions of Secondary NameNode:
The Secondary NameNode constantly reads the file system state and metadata from the
RAM of the NameNode and writes it to the hard disk or the file system.
It is responsible for combining the EditLogs with FsImage from the NameNode.
It downloads the EditLogs from the NameNode at regular intervals and applies them to the
FsImage. The new FsImage is copied back to the NameNode, which uses it the next time the
NameNode is started.
Blocks:
Blocks are nothing but the smallest contiguous locations on your hard drive where data is
stored. In general, in any file system, you store the data as a collection of blocks. Similarly,
HDFS stores each file as blocks which are scattered throughout the Apache Hadoop cluster.
The default size of each block is 128 MB, which you can configure as per your requirement.
It is not necessary that in HDFS each file is stored in an exact multiple of the configured
block size (128 MB, 256 MB, etc.). Let's take an example where I have a file "example.txt"
of size 514 MB. Suppose that we are using the default block size of 128 MB. Then, how
many blocks will be created? 5, right? The first four blocks will be of 128 MB, but the last
block will be of 2 MB only.
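As a quick sanity check of this block arithmetic, here is a minimal Java sketch; the file name
and sizes are simply the values from the example above, not anything read from an actual
cluster.

    public class BlockSplitExample {
        public static void main(String[] args) {
            long fileSizeMb = 514;   // size of example.txt from the example above
            long blockSizeMb = 128;  // default HDFS block size

            // Full 128 MB blocks, plus one smaller block for the remainder.
            long fullBlocks = fileSizeMb / blockSizeMb;                // 4
            long lastBlockMb = fileSizeMb % blockSizeMb;               // 2
            long totalBlocks = fullBlocks + (lastBlockMb > 0 ? 1 : 0); // 5

            System.out.println("Total blocks: " + totalBlocks);
            System.out.println("Full blocks:  " + fullBlocks + " x " + blockSizeMb + " MB");
            System.out.println("Last block:   " + lastBlockMb + " MB");
        }
    }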
Replication Management:
HDFS provides a reliable way to store huge data in a distributed environment as data blocks.
The blocks are also replicated to provide fault tolerance. The default replication factor is 3,
which is again configurable, so each block is replicated three times and stored on different
DataNodes.
Therefore, if you are storing a file of 128 MB in HDFS using the default configuration, you
will end up occupying a space of 384 MB (3*128 MB) as the blocks will be replicated three
times and each replica will be residing on a different DataNode.
Note: The NameNode collects block reports from the DataNodes periodically to maintain
the replication factor. Therefore, whenever a block is over-replicated or under-replicated, the
NameNode deletes or adds replicas as needed.
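The replication factor of an individual file can also be inspected and changed
programmatically. Below is a minimal sketch using the HDFS Java API; the path
/user/data/example.txt is only a hypothetical example, and the code assumes the cluster
configuration files are on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();        // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/data/example.txt");  // hypothetical HDFS path

            // Current replication factor of the file (3 with the default configuration).
            short current = fs.getFileStatus(file).getReplication();
            System.out.println("Current replication: " + current);

            // Ask for a different replication factor; the NameNode then adds or
            // deletes replicas in the background until the target is met.
            fs.setReplication(file, (short) 2);

            fs.close();
        }
    }

The same effect can be achieved from the shell with the setrep command described below.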
Hadoop Shell Commands:
1. cat:
Copies source paths to stdout.
2. chgrp:
Change group association of files. With -R, make the change recursively through the
directory structure. The user must be the owner of files, or else a super-user.
Additional information is in the Permissions User Guide.
3. chmod:
Change the permissions of files. With -R, make the change recursively through the
directory structure. The user must be the owner of the file, or else a super-user. Additional
information is in the Permissions User Guide.
4. chown:
Change the owner of files. With -R, make the change recursively through the directory
structure. The user must be a super-user. Additional information is in the Permissions User
Guide.
5. put:
Copy single src, or multiple srcs from local file system to the destination filesystem. Also
reads input from stdin and writes to destination filesystem.
6. copyFromLocal:
Similar to put command, except that the source is restricted to a local file reference.
7. copyToLocal:
Similar to get command, except that the destination is restricted to a local file reference.
8. cp:
Copy files from source to destination. This command allows multiple sources as well in
which case the destination must be a directory.
9. du:
Displays aggregate length of files contained in the directory, or the length of a file in case it
is just a file.
10. dus:
Displays a summary of file lengths.
11. expunge:
Empty the Trash. Refer to HDFS Design for more information on Trash feature.
12. get:
Copy files to the local file system. Files that fail the CRC check may be copied with the
-ignorecrc option. Files and CRCs may be copied using the -crc option.
13. getmerge:
Takes a source directory and a destination file as input and concatenates files in src into the
destination local file. Optionally addnl can be set to enable adding a newline character at
the end of each file.
14. ls:
For a file returns stat on the file with the following format:
filename <number of replicas> filesize modification_date
modification_time permissions userid groupid
For a directory it returns list of its direct children as in unix. A directory is listed as:
dirname <dir> modification_date modification_time permissions
userid groupid
15. lsr:
Recursive version of ls. Similar to Unix ls -R.
16. mkdir:
Takes path URIs as argument and creates directories. The behavior is much like Unix
mkdir -p, creating parent directories along the path.
17. moveFromLocal:
18. mv:
Moves files from source to destination. This command allows multiple sources as well in
which case the destination needs to be a directory. Moving files across filesystems is not
permitted.
19. rm:
Delete files specified as args. Only deletes non empty directory and files. Refer to rmr for
recursive deletes.
20. rmr:
Recursive version of delete.
21. setrep:
Changes the replication factor of a file. -R option is for recursively increasing the replication
factor of files within a directory.
22. stat:
Returns the stat information on the path.
23. tail:
Displays last kilobyte of the file to stdout. -f option can be used as in Unix.
24. test:
The Hadoop fs shell command test is used for file test operations. It checks whether the
given path exists, has zero length, or is a directory, and reports the result through its return
value.
Options:
-e check to see if the file exists. Return 0 if true.
-z check to see if the file is zero length. Return 0 if true
-d check return 1 if the path is directory else return 0.
25. text:
Takes a source file and outputs the file in text format. The allowed formats are zip and
TextRecordInputStream.
26. touchz:
Create a file of zero length.
27. version:
Prints the Hadoop version. Usage: hadoop version.
28. df:
Displays free space.
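Several of these shell commands have direct equivalents in the HDFS Java API
(org.apache.hadoop.fs.FileSystem). The sketch below is only illustrative: the paths are
hypothetical, and a reachable cluster configuration on the classpath is assumed.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsShellEquivalents {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // mkdir: creates the directory, including missing parents.
            fs.mkdirs(new Path("/user/data"));

            // put / copyFromLocal: copy a local file into HDFS.
            fs.copyFromLocalFile(new Path("/tmp/example.txt"),        // hypothetical local file
                                 new Path("/user/data/example.txt")); // hypothetical HDFS path

            // ls: list the direct children of a directory.
            for (FileStatus status : fs.listStatus(new Path("/user/data"))) {
                System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
            }

            // get / copyToLocal: copy a file back to the local file system.
            fs.copyToLocalFile(new Path("/user/data/example.txt"), new Path("/tmp/copy.txt"));

            // rm: delete a file (the second argument enables recursive deletes, as in rmr).
            fs.delete(new Path("/user/data/example.txt"), false);

            fs.close();
        }
    }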
Pig Architecture:
Apache Pig internally converts Pig Latin scripts into MapReduce jobs using the following
components.
Parser:
Initially the Pig Scripts are handled by the Parser. It checks the syntax of the script, does type
checking, and other miscellaneous checks. The output of the parser will be a DAG (directed
acyclic graph), which represents the Pig Latin statements and logical operators.
In the DAG, the logical operators of the script are represented as the nodes and the data
flows are represented as edges.
Optimizer:
The logical plan (DAG) is passed to the logical optimizer, which carries out logical
optimizations such as projection pushdown.
Compiler:
The compiler compiles the optimized logical plan into a series of MapReduce jobs.
Execution engine:
Finally, the MapReduce jobs are submitted to Hadoop in a sorted order, where they are
executed to produce the desired results.
Pig Data Model:
Atom:
Any single value in Pig Latin, irrespective of its data type, is known as an Atom. It is stored
as a string and can be used as a string or as a number. int, long, float, double, chararray, and
bytearray are the atomic values of Pig. A piece of data or a simple atomic value is known as
a field.
Example − ‘raja’ or ‘30’
Tuple:
A record that is formed by an ordered set of fields is known as a tuple; the fields can be of
any type. A tuple is similar to a row in a table of an RDBMS.
Example − (Raja, 30)
Bag:
A bag is an unordered collection of tuples, and each tuple can have any number of fields. It
is represented by ‘{}’.
Example − {(Raja, 30)}
Map:
A map (or data map) is a set of key-value pairs. The key needs to be of type chararray and
should be unique. The value might be of any type. It is represented by ‘[]’
Example − [name#Raja, age#30]
Relation:
A relation is a bag of tuples. The relations in Pig Latin are unordered (there is no guarantee
that tuples are processed in any particular order).
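To make these types concrete, the sketch below builds a tuple, a bag, and a map using Pig's
Java data classes (org.apache.pig.data); the values simply mirror the examples above, and
the Pig libraries are assumed to be on the classpath.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.pig.data.BagFactory;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    public class PigDataModelExample {
        public static void main(String[] args) throws Exception {
            // Tuple: an ordered set of fields, like (Raja, 30).
            Tuple tuple = TupleFactory.getInstance().newTuple(2);
            tuple.set(0, "Raja");
            tuple.set(1, 30);

            // Bag: an unordered collection of tuples; a relation is a bag of tuples.
            DataBag bag = BagFactory.getInstance().newDefaultBag();
            bag.add(tuple);

            // Map: key-value pairs with chararray keys, like [name#Raja, age#30].
            Map<String, Object> map = new HashMap<>();
            map.put("name", "Raja");
            map.put("age", 30);

            System.out.println("Tuple: " + tuple);
            System.out.println("Bag:   " + bag);
            System.out.println("Map:   " + map);
        }
    }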
Pig Execution Modes:
Local Mode:
In this mode, all the files are installed and run from your local host and local file system.
There is no need for Hadoop or HDFS. This mode is generally used for testing purposes.
MapReduce Mode:
MapReduce mode is where we load or process the data that exists in the Hadoop File System
(HDFS) using Apache Pig. In this mode, whenever we execute the Pig Latin statements to
process the data, a MapReduce job is invoked in the back-end to perform a particular
operation on the data that exists in the HDFS.
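As a rough sketch of how the execution mode is chosen when Pig is embedded in Java, the
ExecType passed to PigServer decides whether the statements run against the local file
system or are compiled into MapReduce jobs on the cluster; the file student.txt and its
schema below are only hypothetical.

    import java.util.Iterator;

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.apache.pig.data.Tuple;

    public class PigModeExample {
        public static void main(String[] args) throws Exception {
            // Use ExecType.LOCAL for local mode or ExecType.MAPREDUCE for MapReduce mode.
            PigServer pig = new PigServer(ExecType.LOCAL);

            // student.txt is a hypothetical comma-separated input file.
            pig.registerQuery("students = LOAD 'student.txt' USING PigStorage(',') "
                    + "AS (name:chararray, age:int);");
            pig.registerQuery("adults = FILTER students BY age >= 18;");

            // Iterate over the result, similar to using the Dump operator in the Grunt shell.
            Iterator<Tuple> it = pig.openIterator("adults");
            while (it.hasNext()) {
                System.out.println(it.next());
            }
            pig.shutdown();
        }
    }

Independently of the execution mode, there are three ways in which you can run Apache Pig: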
Interactive Mode (Grunt shell) − You can run Apache Pig in interactive mode using the
Grunt shell. In this shell, you can enter the Pig Latin statements and get the output (using
Dump operator).
Batch Mode (Script) − You can run Apache Pig in Batch mode by writing the Pig Latin
script in a single file with .pig extension.
Embedded Mode (UDF) − Apache Pig allows us to define our own functions (User
Defined Functions) in programming languages such as Java, and to use them in our script.
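As a minimal example of this embedded/UDF approach, the hypothetical UpperCase
function below extends Pig's EvalFunc class; once packaged into a jar (the jar name here is
only a placeholder), it can be registered and called from a Pig Latin script.

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // A hypothetical UDF that upper-cases its first argument.
    public class UpperCase extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().toUpperCase();
        }
    }

In a script it would then be used along the lines of REGISTER myudfs.jar; followed by
upper_names = FOREACH students GENERATE UpperCase(name); where the jar name and
the students relation are again only placeholders.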