Checkpointing and Deepdive
The NameNode maintains two constructs to manage the metadata: the FSImage and the
edit log.
The FSImage is a snapshot of the metadata held in the NameNode's memory,
and the edit log records the changes made to that metadata since the snapshot.
Say I now add another file, file C. Effectively, file A was added and then deleted,
so there is no point in keeping the meta information about file A. I can
regenerate the FSImage on the NameNode itself, or, for backup purposes, I can take
a checkpoint and generate the new FSImage with the help of one of two other
systems, the Secondary NameNode or the Standby NameNode. That process is what we
call checkpointing.
The new FSImage that gets generated will contain whatever metadata was already in
the existing FSImage.
In the meantime we have made four changes. File A was added and then deleted, so
effectively it no longer exists; only file C exists, so the meta information about
file C is included in the new FSImage. In other words, the new FSImage is the
metadata of the old FSImage after the edit log has been played back over it.
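The playback described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the FSImage is reduced to a set of paths, the edit log to a list of (operation, path) pairs, and the names replay, ADD and DELETE are made up for illustration. Only the operations the walkthrough actually names are replayed.

```python
def replay(fsimage, edit_log):
    """Return a new snapshot: the old one with every logged change applied in order."""
    new_image = set(fsimage)  # work on a copy so the old snapshot stays untouched
    for op, path in edit_log:
        if op == "ADD":
            new_image.add(path)
        elif op == "DELETE":
            new_image.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_image


# Toy data: file A is added and deleted, then file C is added.
old_fsimage = set()
edit_log = [("ADD", "/fileA"), ("DELETE", "/fileA"), ("ADD", "/fileC")]

new_fsimage = replay(old_fsimage, edit_log)
print(sorted(new_fsimage))  # ['/fileC']
```

File A leaves no trace in the new snapshot, which is why the replayed edits no longer need to be kept once the new FSImage exists.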
A checkpoint can be performed by two systems.
One is the Secondary NameNode, which acts as a cold backup; at a regular interval
it performs the checkpoint.
It communicates with the NameNode, gets the necessary files (the edits and the
FSImage), creates the new FSImage, and pushes it back to the NameNode. The other
system that does checkpointing is the Standby NameNode, which provides the hot
backup. The Standby NameNode can likewise take a checkpoint at a regular interval,
or we can make it take one through a manual trigger.
Even though the Standby NameNode is not designed primarily for checkpointing,
it does the checkpointing as well. Its primary responsibility is to keep the
NameNode highly available; that is a different topic, which we will see in
another session. For checkpointing, the Standby NameNode
needs to meet a precondition: either the regular interval has elapsed, or a
trigger has come through administrative commands.
If either precondition is met, it creates a new FSImage and an MD5 checksum for
that FSImage.
It does not need to ask the NameNode for the changes, because the Standby
NameNode is a hot backup and at any point in time it is kept in sync with the
NameNode through a shared NFS location. If we use the Secondary NameNode for
checkpointing at a regular interval, the default is 60 minutes. Whenever the
precondition is met, that is, on the regular interval or through an
administrative command, the Secondary NameNode communicates with the NameNode.
All the communication between the Secondary NameNode and the NameNode happens
through the HTTP
protocol. It fetches the latest FSImage and the edits that happened on the
NameNode up to that point. Before providing the FSImage and the edit logs to the
Secondary NameNode, the NameNode rolls the edit log: whatever has been logged so
far is closed off as a file, and any change made on the NameNode from that point
on goes as a new entry into a fresh edit log.
So the current edit log is rolled into a file with a .1 suffix or a prefix, and a
new log is started. Whatever FSImage and edit logs are available as of that point
are brought into the Secondary NameNode.
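To make the rolling step concrete, here is a minimal sketch, assuming the edit log is just an in-memory list of entries. The class ToyEditLog and its methods are hypothetical names for illustration and are not part of any Hadoop API; real edit logs are files on disk.

```python
class ToyEditLog:
    """Toy edit log: closed segments plus one open segment that receives new entries."""

    def __init__(self):
        self.closed_segments = []  # rolled segments, safe to hand to a checkpointer
        self.current = []          # open segment that new changes are appended to

    def append(self, entry):
        self.current.append(entry)

    def roll(self):
        """Close the open segment, start a fresh one, and return copies of the closed ones."""
        if self.current:
            self.closed_segments.append(self.current)
            self.current = []
        return [list(segment) for segment in self.closed_segments]


log = ToyEditLog()
log.append("ADD /fileA")
log.append("DELETE /fileA")

handed_over = log.roll()   # what a checkpointer would fetch
log.append("ADD /fileC")   # lands in the new segment, not in the rolled one

print(handed_over)   # [['ADD /fileA', 'DELETE /fileA']]
print(log.current)   # ['ADD /fileC']
```

Because the rolled segment is closed, it can be copied elsewhere while new changes keep landing in the fresh segment.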
If the FSImage on the NameNode has changed, that FSImage is loaded into the
Secondary NameNode as well. The Secondary NameNode plays back the edit logs,
applies all the changes, generates a new FSImage,
and pushes that new FSImage to the NameNode, so that on its next restart the
NameNode can use the new FSImage.
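The whole cycle (roll, fetch, play back, push) can be put together in one rough sketch. Everything here is an assumption made for illustration: ToyNameNode, merge and md5_of are invented names, the FSImage is a sorted list of paths, /fileD is a made-up client write, and the MD5 is computed over a JSON dump rather than over a real image file. It is not how Hadoop implements the transfer, which the text says happens over HTTP.

```python
import hashlib
import json


class ToyNameNode:
    """Hypothetical stand-in for a NameNode; not the Hadoop API."""

    def __init__(self):
        self.fsimage = []      # paths in the last saved snapshot
        self.rolled = []       # edits closed off for the checkpointer
        self.open_edits = []   # edits logged since the last roll

    def record(self, op, path):
        # Client changes always go to the open edit log.
        self.open_edits.append((op, path))

    def roll_and_export(self):
        # Close off the current edits and hand out copies of the image and edits.
        self.rolled, self.open_edits = self.rolled + self.open_edits, []
        return list(self.fsimage), list(self.rolled)

    def install(self, new_fsimage):
        # Accept the merged image; the rolled edits are now folded into it.
        self.fsimage, self.rolled = list(new_fsimage), []


def merge(fsimage, edits):
    """Play the edits back over the image and return the new image."""
    paths = set(fsimage)
    for op, path in edits:
        if op == "ADD":
            paths.add(path)
        elif op == "DELETE":
            paths.discard(path)
    return sorted(paths)


def md5_of(fsimage):
    """MD5 hex digest of the toy image (illustrative serialization only)."""
    return hashlib.md5(json.dumps(fsimage).encode("utf-8")).hexdigest()


namenode = ToyNameNode()
for op, path in [("ADD", "/fileA"), ("DELETE", "/fileA"), ("ADD", "/fileC")]:
    namenode.record(op, path)

image, edits = namenode.roll_and_export()  # roll the log, then fetch
namenode.record("ADD", "/fileD")           # a client write during the checkpoint
new_image = merge(image, edits)            # playback on the checkpointing side
checksum = md5_of(new_image)               # digest stored alongside the new image
namenode.install(new_image)                # push the new image back

print(namenode.fsimage)     # ['/fileC']
print(namenode.open_edits)  # [('ADD', '/fileD')]
```

The write to /fileD is accepted while the merge is in progress and simply waits in the open edit log for the next checkpoint.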
We do not want to pause users' access to the file system while a checkpoint is
happening. I cannot hold users back from accessing the data, or HDFS, while the
checkpoint process is going on.
So the NameNode offloads the checkpointing process either to the Secondary
NameNode or to the Standby NameNode, which can do the checkpointing in addition
to providing high availability.
The interval at which the checkpoint should happen, and where the checkpoint
should happen, can all be configured.
There is also a threshold on the number of transactions, which is checked
repeatedly to decide whether the checkpoint should be taken.
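The two triggers, elapsed time and transaction count, amount to a simple check. The sketch below is an assumption-laden illustration, not Hadoop code: the function name checkpoint_due is made up, the 3600 second default matches the 60 minutes mentioned earlier, and the 1,000,000 transaction default is my understanding of the stock setting (in hdfs-site.xml these are, as far as I know, dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns), so verify both against your own cluster configuration.

```python
def checkpoint_due(seconds_since_last, txns_since_last,
                   period_seconds=3600, txn_threshold=1_000_000):
    """Return True when either the time interval or the transaction threshold is reached."""
    return (seconds_since_last >= period_seconds
            or txns_since_last >= txn_threshold)


print(checkpoint_due(3600, 10))       # True: the interval has elapsed
print(checkpoint_due(120, 1_000_000)) # True: enough transactions have piled up
print(checkpoint_due(120, 10))        # False: neither condition is met
```

Whichever condition is reached first triggers the checkpoint, so a busy cluster does not have to wait a full hour.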
Down the line we will see this in practice: by adding files and making changes to
HDFS,
we will see how the edit files get updated and how the FSImage gets updated.