Transient-Snapshot Based Minimum-Process Synchronized Check Pointing Etiquette For Mobile Distributed Systems
Deepak Chandra Uprety et al., International Journal of Advanced Trends in Computer Science and Engineering, 10(4), July – August 2021, 2861 – 2866
reinstatement_point is negligible compared to the tentative one. We put forward a three-phase etiquette for CGS_assortment. In the suggested etiquette, however, harmonization with the originator M_S_St is achieved without sending explicit orchestration communications. We want to emphasize that in all collaborative CGS_assortment schemes available in the literature, harmonization among methods and the originator takes place through explicit orchestration communications [2, 3, 4, 7]. In this way, we try to significantly diminish the orchestration overhead in collaborative CGS_assortment.

In order to keep the hindering of methods to a bare minimum, we collect the causal_depend_arrays[] (causal dependency arrays) and compute the exact least_int_method_set[] at the beginning of the etiquette, as in [3]. The number of methods that register reinstatement_points is curtailed to 1) avoid waking Mob_Nodes that are in doze mode of operation, 2) limit the thrashing of Mob_Nodes with CGS_assortment activity, and 3) save the limited battery life of Mob_Nodes and the low bandwidth of wireless channels.

The new ideas used in this etiquette are as follows. In the suggested etiquette, harmonization with the originator M_S_St is done without sending explicit orchestration communications. The originator M_S_St (say M_S_Stin) collects the causal_depend_array[] of all methods, computes the least_int_method_set[], and broadcasts the transient reinstatement_point invitation to all M_S_Sts along with the least_int_method_set[]. Suppose M_S_Sti gets the transient reinstatement_point invitation in the first phase from M_S_Stin. It sets its timer (timer_transient) and sends the transient reinstatement_point invitation to all pertinent local Mob_Nodes. timer_transient is the maximum permissible time for all pertinent methods to register their transient reinstatement_points. On receiving the transient reinstatement_point invitation, a Mob_Node registers its transient reinstatement_point and sends the response to M_S_Sti. If, before the expiry of timer_transient, M_S_Sti gets a negative response from some Mob_Node to its transient reinstatement_point invitation, then M_S_Sti sends the negative response to M_S_Stin, and M_S_Stin issues an abandon communication to all M_S_Sts. Otherwise, on expiry of timer_transient, if M_S_Sti has not received a positive response to the transient reinstatement_point invitation from all pertinent local Mob_Nodes, it sends a failure communication to M_S_Stin, and M_S_Stin issues an abandon broadcast. Alternatively, on expiry of timer_transient, M_S_Sti issues the tentative reinstatement_point invitation to the pertinent Mob_Nodes in its cubicle and sets timer_tent. On expiry of timer_transient, if M_S_Sti has not received an abort message from M_S_Stin, it is presumed that all pertinent methods have captured their transient reinstatement_points, and the etiquette enters the second phase, in which all pertinent methods convert their transient reinstatement_points into tentative ones. Similarly, timer_tent is the maximum allowable time for all pertinent methods to convert their transient reinstatement_points into tentative ones. If some method fails to register its tentative reinstatement_point, then M_S_Sti informs M_S_Stin, and M_S_Stin issues abort. Otherwise, after the timeout of timer_tent, M_S_Sti commits the reinstatement_points of the methods of the least_int_method_set[] that are local to its cubicle. On expiry of timer_tent, if M_S_Sti has not received an abort message from M_S_Stin, it is presumed that all pertinent methods have captured their tentative reinstatement_points, and the etiquette enters the third phase, in which all pertinent methods convert their tentative reinstatement_points into permanent ones. In this way, the three-phase collaborative CGS_assortment etiquette commits without sending or receiving many orchestration communications. Only in the case of a failure does an M_S_St issue the failure communication to M_S_Stin, after which M_S_Stin issues the abandon. The suggested etiquette may take longer to commit. In doing so, however, we save orchestration communications to a significant extent, and no extra hindering of methods takes place due to the longer commit time.

2. THE PROPOSED CHECKPOINTING ALGORITHM

2.1 System Model and Data Structures

Our system model is similar to [4]. The data structures are listed below. Unless stated otherwise, all data structures are adjusted on completion of a CGS_assortment procedure.

(a) Each method Pi maintains the following data structures, which are preferably stored on the local M_S_St:

p-c_s_ni
A monotonically increasing integer reinstatement_point sequence number for each method. It is incremented by 1 on a transient reinstatement_point.

tentativei
A flag that indicates that Pi has captured its tentative reinstatement_point for the current initiation.

cdd_set[]
A bit array of size n; cdd_seti[j] is set to ‘1’ if Pi receives an application_communication from Pj such that Pi becomes causally dependent upon Pj for the current CI. Initially, the bit array is set to zeroes for all methods except for itself, whose bit is set to ‘1’. For Mob_Nodei it is kept at the local M_S_St. On global commit, the cdd_set[] of all methods are updated.

hinderingi
A flag that indicates that the method is in its hindering period. It is set to ‘1’ when Pi receives the cdd_set[] invitation. A method comes out of the hindering state only after taking its transient reinstatement_point if it is a member of the least_int_method_set[]; otherwise, it comes out of the hindering state after getting the transient reinstatement_point invitation.
least_int_method_set[]
A bit array of size n. Computed by taking the transitive closure of the cdd_set[] of all methods with the cdd_set[] of the originator method. Minimum set = {Pk such that least_int_method_set[k] = 1}.

r_tent[]
A bit array of length n. r_tent[i] is set to ‘1’ if Pi has captured a tentative reinstatement_point.

r_mut[]
A bit array of length n. r_mut[i] is set to ‘1’ if Pi has captured a transient reinstatement_point.

timer1
A flag; set to ‘1’ when the maximum allowable time for collecting the least_int_method global reinstatement_point expires.

(c) Each M_S_St (including originator_M_S_St) maintains the following data structures:

Chkpt
A flag which is set to 1 when the M_S_St receives the reinstatement_point invitation in the least_int_method etiquette.

Mss_id
An integer. It is unique to each M_S_St and cannot be null.

timer_transient
The maximum allowable time for all pertinent methods to register their transient reinstatement_points. It also includes the time in which an M_S_St informs M_S_Stin and M_S_Stin informs all M_S_Sts.

timer_tent
The maximum allowable time for all pertinent methods to convert their transient reinstatement_points into tentative ones. It also includes the time in which an M_S_St informs M_S_Stin and M_S_Stin informs all M_S_Sts.

2.2 Proposed Algorithm
out of hindering state after getting the least_int_method_set[]. At this point, we conclude that this method is not going to be included in the minimum set. It should be noted that the hindering time of a method is bare minimum.

On receiving the transient reinstatement_point invitation along with the least_int_method_set[], an M_S_St, say M_S_Stj, performs the following actions. It sets the timer timer_transient and sends the transient reinstatement_point invitation to Pi only if Pi belongs to the least_int_method_set[] and Pi is running in its cubicle. On receiving the reinstatement_point invitation, Pi registers its transient reinstatement_point and informs M_S_Stj. On receiving a positive response from Pi, M_S_Stj updates p-c_s_ni, resets hinderingi, and sends the buffered application_communications to Pi, if any. Alternatively, if Pi is not in the least_int_method_set[] and Pi is in the cubicle of M_S_Stj, M_S_Stj resets hinderingi and sends the buffered application_communications to Pi, if any. For a disconnected Mob_Node that is a member of the least_int_method_set[], the M_S_St that holds its disconnected reinstatement_point converts it into the required one.

During the hindering period, Pi processes m, received from Pj, if the following conditions are met: (i) (!bufferi), i.e. Pi has not buffered any application_communication; (ii) (m.psn <= c_s_n[j]), i.e. Pj has not registered its reinstatement_point before sending m; (iii) (cdd_seti[j] = 1), i.e. Pi is already dependent upon Pj in the current CI or Pj has captured some permanent reinstatement_point after sending m. Otherwise, the local M_S_St of Pi buffers m for the hindering period of Pi and sets bufferi.

On expiry of timer_transient, if M_S_Stj has not received a positive response to the transient reinstatement_point invitation from all pertinent local Mob_Nodes, it sends a failure communication to M_S_Stin, and M_S_Stin issues abort. Alternatively, on expiry of timer_transient, M_S_Stj issues the tentative reinstatement_point invitation to the pertinent Mob_Nodes in its cubicle and sets timer_tent.

If some method fails to register its tentative reinstatement_point, then M_S_Stj informs M_S_Stin, and M_S_Stin issues abort. Otherwise, after the timeout of timer_tent, M_S_Stj commits the reinstatement_points of the methods of the least_int_method_set[] that are local to its cubicle. On expiry of timer_tent, if M_S_Stj has not received an abort message from M_S_Stin, it is presumed that all pertinent methods have captured their tentative reinstatement_points successfully, and the etiquette enters the third phase, in which all pertinent methods convert their tentative reinstatement_points into permanent ones.

We explain the recommended least_int_method CGS_assortment etiquette with the help of an example. In Figure 1, at time t0, P5 initiates the CGS_assortment procedure and sends an invitation to all methods for their causal_depend_arrays[]. At time t1, P5 receives the causal_depend_arrays[] from all methods and computes the least_int_method_set[], which is {P4, P5, P6}. For the sake of simplicity, the control communications by which the methods send their causal_depend_arrays[] to the originator method P5 are not shown in Figure 1. P5 sends the least_int_method_set[] to all methods and registers its own transient reinstatement_point C51. On receiving the least_int_method_set[], a method records its transient reinstatement_point if it is a member of the least_int_method_set[]. When P4 and P6 get the least_int_method_set[], they find themselves to be members of it; therefore, they register their transient reinstatement_points, C41 and C61, respectively. When P1, P2 and P3 get the least_int_method_set[], they find that they do not belong to it; therefore, they do not register transient reinstatement_points. It should be noted that these methods have not sent any application_communication to any method of the least_int_method_set[]. In other words, P5 is not transitively dependent upon them. Therefore, for the sake of consistency, it is not necessary for them to register their reinstatement_points in the current initiation.

A method enters the hindering state immediately after sending its cdd_set[]. A method comes out of the hindering state only after taking its transient reinstatement_point if it is a member of the least_int_method_set[]; otherwise, it comes out of the hindering state after getting the least_int_method_set[]. We want to say that the hindering time of a method in this etiquette is negligibly small. Moreover, a method is allowed to perform its normal computation, send application_communications, and partially receive them during the hindering period. For example, P5 receives m4 during its hindering period. As cdd_set5[6] = 1 due to m2, the receipt of m4 will not alter cdd_set5[]; therefore, P5 processes m4. P2 receives m15 from P3 during its hindering period; cdd_set2[3] = 0 and the receipt of m15 can alter cdd_set2[]; therefore, P2 buffers m15. Similarly, P4 buffers m16. P4 processes m16 only after taking its transient reinstatement_point C41. P2 processes m15 after getting the least_int_method_set[]. P4 processes m7 because, at this moment, it is not in the hindering state. Similarly, P4 processes m8.

On getting the transient reinstatement_point invitation, a method, say P6, sets the timer timer_transient. If P6 fails to register its transient reinstatement_point, it informs P5, and P5 issues abort. In this way, if any method fails to register its reinstatement_point in harmonization with others in the first phase, then all the methods need to abort only their transient reinstatement_points and not tentative reinstatement_points as in other etiquettes [2, 3, 4]. In this way, we are able to significantly diminish the loss of CGS_assortment effort in case of a failure during CGS_assortment. On the other hand, on timeout of timer_transient with no abort communication from P5, it is presumed that all pertinent methods have captured their
transient reinstatement_points successfully and the etiquette should enter the second phase. Therefore, P6 converts its transient reinstatement_point into a tentative one and sets the timer timer_tent. If P6 fails to convert its transient reinstatement_point into a tentative one, it informs P5, and P5 issues abort. Similarly, if any other method fails to register its tentative reinstatement_point, it informs P5, and P5 acts accordingly. Otherwise, on timeout of timer_tent, P6 converts its tentative reinstatement_point into a permanent one. On timeout of timer_tent with no abort communication from the applicable methods, it is presumed that all pertinent methods have captured their tentative reinstatement_points successfully, and the etiquette enters the third phase. In this way, we commit the reinstatement_points without much harmonization.

2.3 Performance Analysis of the Proposed Protocol

2*0.8=1.6 ms. In the suggested etiquette, selective incoming application_communications at a method are blocked during its hindering period. We consider the worst case, in which all incoming application_communications are blocked. The blocking period in the suggested scheme is negligibly small; therefore, the number of application_communications blocked in the etiquette is insignificant [refer to Table 1]. It should be noted that the number of application_communications blocked during CGS_assortment depends upon the application_communication sending rate and the capacity of the static communication link. Referring to Table 1, we can say that the number of application_communications buffered during CGS_assortment in the suggested etiquette is negligibly small.

Table 1: Average number of communications buffered during CGS_assortment
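The number of buffered application_communications depends on the receive rule of Section 2.2, under which an incoming communication is buffered only when one of the three stated conditions fails during the hindering period. The following minimal Python sketch illustrates that rule; it is not the authors' implementation. The class and function names, the storage of c_s_n[] as a per-method list, the 0-based indexing (index k stands for P(k+1)), and the sequence-number values are assumptions made for illustration. The sketch also assumes that m4 is sent by P6, as the cdd_set5[6] check in the example suggests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    sender: int   # index j of the sending method Pj
    psn: int      # sequence number carried by the communication (m.psn)


@dataclass
class MethodState:
    cdd_set: List[int]            # cdd_set_i[j] == 1 if Pi already depends on Pj
    c_s_n: List[int]              # assumed: last known sequence number of each Pj
    hindering: bool = False       # hindering_i
    buffer_flag: bool = False     # buffer_i
    buffered: List[Message] = field(default_factory=list)
    delivered: List[Message] = field(default_factory=list)


def receive(state: MethodState, m: Message) -> bool:
    """Process m if allowed; otherwise buffer it. Returns True if processed."""
    if state.hindering:
        allowed = (
            not state.buffer_flag                    # (i) nothing buffered so far
            and m.psn <= state.c_s_n[m.sender]       # (ii) sent before Pj's new reinstatement_point
            and state.cdd_set[m.sender] == 1         # (iii) dependency on Pj already recorded
        )
        if not allowed:
            state.buffered.append(m)
            state.buffer_flag = True
            return False
    state.delivered.append(m)
    return True


def leave_hindering(state: MethodState) -> None:
    """Reset the hindering flag and hand over buffered communications in arrival order."""
    state.hindering = False
    state.delivered.extend(state.buffered)
    state.buffered.clear()
    state.buffer_flag = False


# P5 (index 4) already depends on P6 (index 5), so m4 is processed while hindered.
p5 = MethodState(cdd_set=[0, 0, 0, 0, 1, 1], c_s_n=[0] * 6, hindering=True)
m4_processed = receive(p5, Message(sender=5, psn=0))

# P2 (index 1) does not depend on P3 (index 2), so m15 is buffered until hindering ends.
p2 = MethodState(cdd_set=[0, 1, 0, 0, 0, 0], c_s_n=[0] * 6, hindering=True)
m15_processed = receive(p2, Message(sender=2, psn=0))
leave_hindering(p2)
```

Condition (i) keeps delivery in order: once one communication has been buffered, every later one received in the same hindering period is buffered behind it, so the buffer grows only for as long as the method remains hindered.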
3. Guohong Cao and Mukesh Singhal, “Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-171, February 2001.
4. Guohong Cao and Mukesh Singhal, “On Coordinated Checkpointing in Distributed Systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 12, pp. 1213-1224, December 1998.
5. Weigang Ni, Susan V. Vrbsky and Sibabrata Ray, “Pitfalls in Distributed Non-blocking Checkpointing”, University of Alabama.
6. Prakash R. and Singhal M., “Maximal Global Snapshot with concurrent initiators”, Proc. Sixth IEEE Symp. Parallel and Distributed Processing, pp. 344-351, Oct. 1994.
7. Koo R. and S. Toueg, “Checkpointing and Rollback-Recovery for Distributed Systems”, IEEE Transactions on Software Engineering, SE-13(1):23-31, January 1987.
8. Bidyut Gupta, S. Rahimi and Z. Lui, “A New High Performance Checkpointing Approach for Mobile Computing Systems”, IJCSNS International Journal of Computer Science and Network Security, Vol. 6 No. 5B, May 2006.
9. Acharya A. and Badrinath B. R., “Checkpointing Distributed Applications on Mobile Computers”, Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, pp. 73-80, September 1994.
10. Ch. D.V. Subba Rao and M. M. Naidu, “A New, Efficient Coordinated Checkpointing Protocol Combined with Selective Sender-Based Message Logging”.
11. Nuno Neves and W. Kent Fuchs, “Adaptive Recovery for Mobile Environments”, in Proc. IEEE High-Assurance Systems Engineering Workshop, October 21-22, 1996, pp. 134-141.
12. Y. Manabe, “A Distributed Consistent Global Checkpoint Algorithm With minimum number of Checkpoints”, Technical Report of IEICE, COMP97-6 (April 1997).
13. J. L. Kim and T. Park, “An efficient protocol for checkpointing recovery in Distributed Systems”, IEEE Transactions on Parallel and Distributed Systems, 4(8): pp. 955-960, Aug 1993.
14. Elnozahy E.N., Alvisi L., Wang Y.M. and Johnson D.B., “Survey of Rollback-Recovery Protocols in Message-Passing Systems”, ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002.
15. S. Venkatesan and T.T.-Y. Juang, “Low Overhead Optimistic Crash Recovery”, Preliminary version appears in Proc. 11th Int’l Conf. Distributed Computing Systems as “Crash Recovery with Little Overhead,” pp. 454-461, 1991.
16. Parveen Kumar, Lalit Kumar, R K Chauhan, “A Non-intrusive Hybrid Synchronous Checkpointing Protocol for Mobile Systems”, IETE Journal of Research, Vol. 52 No. 2 & 3, 2006.
17. J.L. Kim, T. Park, “An efficient Protocol for checkpointing Recovery in Distributed Systems”, IEEE Trans. Parallel and Distributed Systems, pp. 955-960, Aug. 1993.
18. Mansouri, H., Pathan, A-S.K.: Review of checkpointing and rollback recovery protocols for mobile distributed computing systems. In: Ghosh, U., Rawat D.B., Datta, R., Pathan, A-S. K (eds.) Internet of Things and Secure Smart Environments: Successes and Pitfalls, CRC Press, Taylor & Francis Group (2020).
19. Mansouri, H., Pathan, A.-S.K.: Checkpointing distributed application running on mobile Ad Hoc networks. Int. J. High Perform. Comput. Networking 11(2), 95–107 (2018).
20. Mansouri, H., Pathan, A.-S.: A resilient hierarchical checkpointing algorithm for distributed systems running on cluster federation. In: Thampi, S.M., Martinez Perez, G., Ko, R., Rawat, D.B. (eds.) SSCC 2019. CCIS, vol. 1208, pp. 99–110. Springer, Singapore (2020).