Checkpointing and Deepdive

Uploaded by

Nadeem Khan

HDFS CHECKPOINTING INTRODUCTION AND DEEPDIVE

The Namenode maintains two constructs to manage the metadata: the FSImage and the edit log.

The FSImage is a point-in-time snapshot of the metadata held in the Namenode's memory, and the edit log records the changes made to the metadata since that snapshot.

Let us take an example. Assume I add a file A to the system, I move that file to a different location, and then I delete file A. After that I add another file C. So effectively I added file A and then deleted it, and there is no point in maintaining the meta information about file A.

I can regenerate the FSImage at the Namenode itself, or, for backup purposes, I can take a checkpoint and generate the new FSImage with the help of one of two other systems: the secondary Namenode or the standby Namenode. That process is what we call checkpointing.

The new FSImage that gets generated will contain the existing metadata, whatever was available in the old FSImage. Assume the initial FSImage already had file 1; it will continue to exist.

In the meantime we have made four changes. File A was added and then deleted, so effectively it does not exist; only file C exists, and the meta information about file C will be included in the FSImage.

So the meta information from the old FSImage, plus the net effect of playing back the edit log, goes into the new FSImage. Generating this new FSImage is the process we call checkpointing.
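To make the playback idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the FSImage is modeled as a set of paths, and the `Edit` class, the operation names, and the paths are all hypothetical rather than anything taken from HDFS itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One hypothetical edit log entry: an operation and the path(s) it touches."""
    op: str             # "add", "rename", or "delete"
    path: str
    new_path: str = ""  # only used by "rename"


def checkpoint(fsimage: set[str], edits: list[Edit]) -> set[str]:
    """Replay the edits over a copy of the old image and return the new image."""
    image = set(fsimage)  # copy, so the old image is left untouched
    for e in edits:
        if e.op == "add":
            image.add(e.path)
        elif e.op == "rename":
            image.discard(e.path)
            image.add(e.new_path)
        elif e.op == "delete":
            image.discard(e.path)
        else:
            raise ValueError(f"unknown op: {e.op}")
    return image


# The example from the talk: file 1 already exists, then four changes happen.
old_image = {"/data/file1"}
edit_log = [
    Edit("add", "/data/fileA"),
    Edit("rename", "/data/fileA", "/archive/fileA"),
    Edit("delete", "/archive/fileA"),
    Edit("add", "/data/fileC"),
]
new_image = checkpoint(old_image, edit_log)
print(sorted(new_image))  # ['/data/file1', '/data/fileC']
```

File A leaves no trace in the new image, which is the point of the example: only the net effect of the edit log survives the checkpoint.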

Each entry within the edit log is a transaction, and a run of consecutive transactions is stored together as a segment. So every change to the metadata is appended as a transaction to the current segment of the edit log.

A checkpoint can be performed by either of two systems.

One is the secondary Namenode, which acts as a cold backup. At a regular interval it performs the checkpoint: it communicates with the Namenode, gets the necessary files (the edits and the FSImage), creates the new FSImage, and pushes it back to the Namenode.

The other system that does the checkpointing is the standby Namenode, which provides the hot backup. Again, the standby Namenode can take the checkpoint at a regular interval, or we can make it take one through a manual trigger.

The standby Namenode is not designed primarily for checkpointing. Its primary responsibility is the high availability of the Namenode; that is a different topic, and we will see it in another session. In addition to high availability support, it does the checkpointing as well.

For checkpointing, the standby Namenode needs a precondition to be met: either the regular interval elapses or a trigger arrives through administrative commands. If either precondition is met, it creates a new FSImage and an MD5 checksum for that FSImage, and pushes that FSImage to the active Namenode.
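The MD5 that travels with the FSImage lets the receiving side confirm the file arrived intact. A rough sketch of that check in Python follows; the helper names `md5_of_file` and `verify_image` and the file name `fsimage_demo` are made up for illustration, and this is not how HDFS itself is implemented.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: str, expected_md5: str) -> bool:
    """True if the file's current digest matches the digest recorded earlier."""
    return md5_of_file(path) == expected_md5


with tempfile.TemporaryDirectory() as d:
    image_path = os.path.join(d, "fsimage_demo")  # hypothetical file name
    with open(image_path, "wb") as f:
        f.write(b"file1\nfileC\n")
    checksum = md5_of_file(image_path)            # recorded by the sender
    ok_before = verify_image(image_path, checksum)
    with open(image_path, "ab") as f:             # simulate a damaged transfer
        f.write(b"corruption")
    ok_after = verify_image(image_path, checksum)

print(ok_before, ok_after)  # True False
```

Reading in chunks matters here because a real FSImage can be large, so loading it whole just to hash it would waste memory.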

It doesn't need to ask the Namenode for the changes, because the standby Namenode is a hot backup and at any point in time it is kept in sync with the Namenode through shared edit log storage (an NFS directory, or JournalNodes when the Quorum Journal Manager is used).

If we use the secondary Namenode to do the checkpointing, the secondary Namenode communicates with the Namenode at a regular interval; the default is 60 minutes. Whenever the precondition is met, that is, the regular interval elapses or an administrative command is issued, the secondary Namenode contacts the Namenode. All communication between the secondary Namenode and the Namenode happens over the HTTP protocol.

It fetches the latest FSImage and the edits that happened within the Namenode up to that particular point. Before providing the FSImage and the edit logs to the secondary Namenode, the Namenode rolls the edit log. Basically, it finalizes all the changes that have happened so far, bundles them as a file, and starts writing any new change into a new edit log. So it rolls the current edit log into a file with a .1 suffix or with a prefix, and a new log is started.

Whatever FSImage and edit logs are available up to that point are brought over to the secondary Namenode. If the FSImage on the Namenode has changed, that is loaded into the secondary Namenode as well. The secondary Namenode then plays back the edit logs, applies all the changes, and generates a new FSImage.

It pushes that new FSImage to the Namenode, so that on the next restart the Namenode can use the new FSImage and the restart will be quick.
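The whole cycle (roll, fetch, play back, push) can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the `NameNode` class, its methods, and the `(op, path)` tuples are hypothetical stand-ins, there is no HTTP transfer, and real HDFS handles old segments more carefully than this.

```python
class NameNode:
    """Toy stand-in for the Namenode's metadata state (not the HDFS API)."""

    def __init__(self, image):
        self.image = set(image)   # paths in the last checkpointed FSImage
        self.finalized = []       # closed edit log segments, oldest first
        self.in_progress = []     # segment currently receiving new edits

    def log(self, op, path):
        """Append one transaction to the open segment."""
        self.in_progress.append((op, path))

    def roll_edit_log(self):
        """Close the open segment, start a fresh one, return the closed segments."""
        if self.in_progress:
            self.finalized.append(self.in_progress)
            self.in_progress = []
        return list(self.finalized)

    def install_image(self, new_image, segments_covered):
        """Accept a new image and drop the segments it already includes."""
        self.image = set(new_image)
        self.finalized = self.finalized[segments_covered:]


def secondary_checkpoint(nn):
    """One checkpoint cycle as the secondary Namenode would drive it."""
    segments = nn.roll_edit_log()        # 1. Namenode rolls its edit log
    image = set(nn.image)                # 2. fetch the current image (a copy)
    for segment in segments:             # 3. play back every finalized edit
        for op, path in segment:
            if op == "add":
                image.add(path)
            elif op == "delete":
                image.discard(path)
    nn.install_image(image, len(segments))  # 4. push the new image back
    return image


nn = NameNode({"/file1"})
nn.log("add", "/fileA")
nn.log("delete", "/fileA")
nn.log("add", "/fileC")
new_image = secondary_checkpoint(nn)
nn.log("add", "/fileD")                  # lands in the new open segment

print(sorted(new_image))   # ['/file1', '/fileC']
print(nn.in_progress)      # [('add', '/fileD')]
```

Rolling first is what lets the Namenode keep accepting changes during the checkpoint: new edits go to the fresh segment while the closed ones are being merged elsewhere.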

In summary, the checkpointing is done by the secondary Namenode or the standby Namenode.

So there will be a question: why can't the Namenode do the checkpoint itself? Checkpointing is a heavily CPU-intensive and I/O-intensive process, and we would have to pause the users accessing the file system while the checkpoint is happening. I cannot hold the users back from accessing the data or accessing HDFS while the checkpoint process is going on.

So we offload the work; the Namenode offloads the checkpointing process either to the secondary Namenode or to the standby Namenode. The secondary Namenode is dedicated to the checkpointing process, and the standby Namenode, along with providing high availability, can do the checkpointing as well.

The interval at which the checkpoint should happen and where the checkpoint should be stored can all be controlled from the configuration files.

So let me check the configurations: what the checkpoint interval is, and where the checkpoint needs to be stored.

A checkpoint also happens whenever the number of transactions in the edit logs reaches a threshold limit. There is also a setting for how often the number of transactions is checked to decide whether the checkpoint should happen or not; that can be controlled as well.

Down the line we will see this in practice: by adding files and making changes to HDFS, we will see how the edit files get updated and how the FSImage gets updated.
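The trigger rule described here, time elapsed or transaction count, whichever comes first, can be written as a small decision function. This is a sketch in Python, not HDFS code: `CheckpointConf` and `should_checkpoint` are hypothetical names, and the property names and default values in the comments are what I believe the stock `hdfs-default.xml` uses, so please verify them against your Hadoop version.

```python
from dataclasses import dataclass


@dataclass
class CheckpointConf:
    """Checkpoint settings; the defaults are assumed, so verify them for your version."""
    period_s: int = 3600       # dfs.namenode.checkpoint.period (60 minutes)
    txns: int = 1_000_000      # dfs.namenode.checkpoint.txns
    check_period_s: int = 60   # dfs.namenode.checkpoint.check.period


def should_checkpoint(conf: CheckpointConf,
                      seconds_since_last: int,
                      uncheckpointed_txns: int) -> bool:
    """True when either the time threshold or the transaction threshold is reached."""
    return (seconds_since_last >= conf.period_s
            or uncheckpointed_txns >= conf.txns)


conf = CheckpointConf()
print(should_checkpoint(conf, 120, 5_000))       # False: neither threshold reached
print(should_checkpoint(conf, 3600, 0))          # True: the interval has elapsed
print(should_checkpoint(conf, 120, 1_000_000))   # True: the transaction limit is hit
```

In this model the checking node would call `should_checkpoint` once every `check_period_s` seconds, which is the "keep verifying" threshold mentioned above. Where the checkpoint is stored is a separate setting (I believe `dfs.namenode.checkpoint.dir`).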

