ES Unit 4 QBWA
DEPARTMENT OF ECE
EC T72 – EMBEDDED SYSTEMS
Unit – 4 Reliability and Clock Synchronization
Type: 100% Theory
Question Bank with Answers
PART - A ANSWERS
1. Define reliability.
Let the random variable X be the lifetime (the time to failure) of a component. The probability that
the component survives until some time t is called the reliability R(t) of the component:
R(t) = P(X > t) = 1 - F(t)
where F(t) is the distribution function of X.
The component is assumed to be working properly at time t=0 and no component can work forever
without failure.
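As a minimal sketch of this definition, the function below evaluates R(t) = 1 - F(t) for one particular lifetime model. The exponential distribution and the name `reliability_exponential` are assumptions made for illustration; the definition itself holds for any lifetime distribution.

```python
import math


def reliability_exponential(t: float, failure_rate: float) -> float:
    """Return R(t) = P(X > t) = 1 - F(t), assuming X is exponentially distributed."""
    if t < 0:
        raise ValueError("t must be non-negative")
    cdf = 1.0 - math.exp(-failure_rate * t)  # F(t), probability of failure by time t
    return 1.0 - cdf


# The component works at t = 0 and its reliability falls as t grows.
for hours in (0.0, 100.0, 1000.0):
    print(hours, reliability_exponential(hours, failure_rate=0.001))
```

The loop illustrates the two properties stated in the answer: R(0) = 1, and R(t) tends to 0 because no component works forever.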
2. What is hazard rate?
The Hazard/Instantaneous Failure Rate measures the dynamic (instantaneous) speed of failures. To
understand the hazard function we need to review conditional probability and conditional density
functions (very similar concepts).
Hazard measures the conditional probability of a failure given the system is currently working. The
failure density (pdf) measures the overall speed of failures.
3. Elucidate fault latency.
Fault latency is the duration between the onset of a fault and its manifestation as an error. This
duration can be considerable.
Since faults are invisible and show themselves only when they cause errors, such latency can
affect the reliability of the overall system.
4. Mention the types of faults.
There are three fault types. They are
Transient Faults
Intermittent Faults
Permanent Faults
Unit – 4 Page 1 of 9
5. Define transient faults.
Let a and b be the permanent and transient failure rates, respectively, of each processor. Failures are
assumed to occur as a Poisson process. Intermittent failures are ignored in this model.
We assume that all the failures are independent of one another, and that faults manifest themselves
immediately (i.e., the fault latency is zero).
Processors that are taken offline due to a fault being detected are tested continuously to see if their
fault is transient, and if it is, whether it has died away.
6. Draw the structure of correction of propagation delays.
are immediately identified and disconnected from the system.
As a result, the system will always consist of good processors only, and will continue to function
until it has fewer than two functional processors.
19. Write short notes on reliability evaluation technique.
Computers used in life-critical applications must be so reliable that they cannot be validated by
experiment alone, so mathematical models are used to evaluate their reliability.
We construct a mathematical model of the real-time computer and solve it. In doing so we add one
possible source of error: the assumptions of the mathematical model.
If these are not correct, then neither are the results of our model. Reliability evaluation techniques
are introduced for this purpose.
20. Draw markov chain model for NMR cluster.
The voter becomes less reliable as the cluster size increases; an increase in the cluster size can
actually decrease the reliability of the cluster.
25. Sketch the slave – master interaction in read clock request.
30. State drift rate.
The drift rate is the rate at which the clock can gain or lose time. If ρ is the maximum drift rate of
a nonfaulty clock, then for real times t1 ≤ t2,
(1-ρ)(t2-t1) ≤ C(t2) - C(t1) ≤ (1+ρ)(t2-t1)
where C(t) is the clock reading at real time t.
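The drift bound can be checked mechanically. The sketch below is illustrative only: the function name `within_drift_bound` and its parameters are hypothetical, and it assumes the clock is read at two real times with t2 ≥ t1, so the elapsed real time is t2 - t1.

```python
def within_drift_bound(c_t1: float, c_t2: float, t1: float, t2: float, rho: float) -> bool:
    """Check (1 - rho)(t2 - t1) <= C(t2) - C(t1) <= (1 + rho)(t2 - t1)."""
    elapsed = t2 - t1  # elapsed real time
    if elapsed < 0:
        raise ValueError("expected t1 <= t2")
    advance = c_t2 - c_t1  # how far the clock reading moved in that interval
    return (1.0 - rho) * elapsed <= advance <= (1.0 + rho) * elapsed


# A clock that gains 50 microseconds over 10 s is within a drift rate of 1e-5.
print(within_drift_bound(0.0, 10.00005, 0.0, 10.0, rho=1e-5))
# A clock that gains 10 ms over the same interval is not.
print(within_drift_bound(0.0, 10.01, 0.0, 10.0, rho=1e-5))
```

A clock that fails this check over some interval cannot be a nonfaulty clock with maximum drift rate ρ.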
PART – B ANSWERS
distribution is Fl(t) = 1 - exp(-[λt]^α). We will denote by fl(t) the associated density function (we will
assume here that Fl(t) is differentiable).
The hazard rate h(t) of a component with age t is defined as the rate of failure at time t, given
that it has not failed up to time t. We can use Bayes’s law to express the hazard rate as function
of the lifetime distribution function.
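Written out, the conditional-probability argument gives the following expression. This is the standard derivation rather than text recovered from the original answer; the lifetime X and the small interval Δt are symbols introduced here for illustration.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t < X \le t + \Delta t \mid X > t)}{\Delta t}
     = \lim_{\Delta t \to 0} \frac{F_l(t + \Delta t) - F_l(t)}{\Delta t \,[1 - F_l(t)]}
     = \frac{f_l(t)}{1 - F_l(t)}
```

The middle step uses P(A | B) = P(A and B) / P(B) with B = {X > t}, so that P(B) = 1 - F_l(t).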
If the failure process is Poisson with rate λ (i.e., if the lifetime is exponentially
distributed with mean 1/λ), then the hazard rate is h(t) = λ.
The hazard rate is thus independent of the age of the component if the failure process is
Poisson. If the failure process is Weibull with shape and scale parameters α and λ,
respectively, then h(t) = αλ(λt)^(α-1).
If 0 < α < 1, then h(t) decreases with time. This means that the failure rate of a component drops as it
ages. Components with decreasing hazard rates are said to have the used-better-than-new property.
If α = 1, the failure process is Poisson. If α > 1, h(t) increases with time; that is, the failure rate
increases with age, and such components have the new-better-than-used property.
The rate then becomes approximately constant, before aging effects set in and cause the hazard
rate to rise with age.
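The three Weibull regimes (α < 1, α = 1, α > 1) can be checked numerically. The sketch below assumes the lifetime distribution F(t) = 1 - exp(-(λt)^α), for which f(t)/(1 - F(t)) reduces to αλ(λt)^(α-1); the function name `weibull_hazard` and the sample values are hypothetical.

```python
def weibull_hazard(t: float, alpha: float, lam: float) -> float:
    """Hazard rate h(t) = alpha * lam * (lam * t)**(alpha - 1) of a Weibull lifetime."""
    if t <= 0:
        raise ValueError("t must be positive")
    return alpha * lam * (lam * t) ** (alpha - 1.0)


# Compare the hazard rate at two ages for each regime of the shape parameter.
for alpha in (0.5, 1.0, 2.0):
    young = weibull_hazard(1.0, alpha, lam=0.2)
    old = weibull_hazard(10.0, alpha, lam=0.2)
    trend = "decreasing" if old < young else "constant" if old == young else "increasing"
    print(f"alpha={alpha}: h(1)={young:.4f}, h(10)={old:.4f} -> {trend}")
```

With α = 1 the expression collapses to λ, which is the constant hazard rate of the Poisson failure process.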
2. Discuss in detail about fault tolerant synchronization algorithms.
3. With neat diagram, explain about completely connected zero propagation time system.
4. Elaborate about sparse interconnected zero propagation time system.
5. Briefly explain non-fault tolerant synchronization algorithms.
6. Explain about transient faults in detail.
Let a and b be the permanent and transient failure rates, respectively, of each processor. Failures are
assumed to occur as a Poisson process. Intermittent failures are ignored in this model.
We assume that all the failures are independent of one another, and that faults manifest themselves
immediately (i.e., the fault latency is zero).
Processors that are taken offline due to a fault being detected are tested continuously to see if their
fault is transient, and if it is, whether it has died away.
If this is the case, the processors are inducted back into the cluster. Let the time between when a
processor suffers transient failure and when it is brought back on line be exponentially distributed
with mean 1/e.
We will ignore the time it takes to reintegrate it into the system; this can be taken into account by
assuming it to be part of the delay time. Here we assume that system failure occurs when there are
fewer than two operational processors.
Once again, we use a Markov chain. However, unlike in the previous case where processors could
only be in one of two states, permanently failed and good, in this model they can be in one of three
states: permanently failed, currently offline due to transient failure, and good.
Since the total number of processors is fixed at N, we need two state variables to denote the state of
the system. Let us denote the state by (s1, s2), where s1 and s2 denote, respectively, the number of
functional processors and the number of processors currently undergoing transient failure.
The number of processors that have failed permanently is N - s1 - s2. The Markov chain for this model
is shown below.
While it may look complicated, generating this chain is quite simple. Let us consider transitions out
of state (i, j). In this state, we have i functional processors and j processors that are currently suffering
transient failure.
The rest of the processors, numbering N - i - j have suffered permanent failure. Of course, the
system does not know whether a failed processor is suffering a transient or a permanent failure.
It will keep trying to run tests on all the failed processors, and if a previously failed processor
recovers, it will pass the test.
The i functional processors may suffer either permanent or transient failure. The permanent failure
rate per processor is a, so the overall rate due to permanent failure out of state (i, j) (and into state
(i-1, j)) is ia.
Similarly, the overall rate due to transient failure out of state (i, j) (and into state (i-1, j+1)) is ib. Even
processors that are currently suffering transient failures are not immune to permanent failures; this
explains the transition from state (i, j) to state (i, j-1) with a rate of ja.
Transient faults die away in an exponentially distributed duration of mean 1/e, and so the rate out of
state (i, j) to state (i+1, j-1) is given by je.
When only two processors are functional, the failure of any one of these spells failure for the
whole system, which explains the transition to the FAIL state.
It only remains for us to write the differential equations connected with this process. This can be
done by inspection of the Markov chain.
Let π_{i,j}(t) denote the probability of being in state (i, j) at time t, where i, j ≤ N. If i < 0, j < 0, or
i + j > N, define π_{i,j}(t) = 0.
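The transition rules described above (rates ia, ib, ja and je out of state (i, j), plus the FAIL state) can be sketched in code. This is an illustrative reading of the chain, not the original figure or equations: the names `transitions` and `failure_probability` are hypothetical, and forward-Euler integration with a fixed step `dt` is an assumed solution method chosen for brevity.

```python
from collections import defaultdict

FAIL = "FAIL"


def transitions(i: int, j: int, a: float, b: float, e: float) -> dict:
    """Return {next_state: rate} out of state (i, j): i functional, j transient-offline."""
    if i < 2:
        raise ValueError("states with fewer than two functional processors are FAIL")
    out = defaultdict(float)
    # With only two functional processors, losing either one is system failure.
    out[(i - 1, j) if i > 2 else FAIL] += i * a      # permanent failure of a good processor
    out[(i - 1, j + 1) if i > 2 else FAIL] += i * b  # transient failure of a good processor
    if j > 0:
        out[(i, j - 1)] += j * a      # an offline processor fails permanently
        out[(i + 1, j - 1)] += j * e  # a transient fault dies away
    return dict(out)


def failure_probability(n: int, a: float, b: float, e: float,
                        t_end: float, dt: float = 1e-3) -> float:
    """Approximate P(system in FAIL at t_end), starting with all n processors good."""
    prob = defaultdict(float)
    prob[(n, 0)] = 1.0
    for _ in range(int(round(t_end / dt))):
        nxt = defaultdict(float)
        for state, p in prob.items():
            if state == FAIL:
                nxt[FAIL] += p  # FAIL is absorbing
                continue
            rates = transitions(state[0], state[1], a, b, e)
            # dt must be small enough that the total outflow in one step stays below 1.
            nxt[state] += p * (1.0 - sum(rates.values()) * dt)
            for target, rate in rates.items():
                nxt[target] += p * rate * dt
        prob = nxt
    return prob[FAIL]


print(transitions(3, 1, a=0.1, b=0.2, e=1.0))
print(failure_probability(4, a=1e-3, b=1e-2, e=1.0, t_end=10.0, dt=1e-2))
```

Each Euler step applies the balance implied by the chain: probability flows out of a state at the sum of its outgoing rates and into each neighbour at the corresponding rate. Setting b = 0 removes transient faults and leaves a chain with permanent failures only.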
7. Write a detailed note on permanent faults.
NMR clusters.
Consider an N-modular-redundant cluster. For the moment, let us assume that the fault latency is
zero (i.e., faults start generating errors immediately when they arrive) and that faulty processors
are immediately identified and disconnected from the system.
As a result, the system will always consist of good processors only, and will continue to function
until it has fewer than two functional processors.
Combinatorial model
The system will fail only if there are fewer than two functional processors left in the system.
Since there is no repair and all the failures are assumed to be permanent, the failure probability can
be found by counting (hence the term combinatorial) all the various ways in which fewer than two
processors are left and weighting each by its probability of occurrence.
The probability that an individual processor suffers failure some time in an interval of duration t is
given by Fl(t)=1-exp(-λt).
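The counting argument can be sketched as follows. The sketch assumes that processor failures are independent and that every processor has the same failure rate λ; the function name `nmr_failure_probability` is hypothetical. Fewer than two processors are left when either all N have failed or exactly N - 1 have failed.

```python
import math


def nmr_failure_probability(n: int, failure_rate: float, t: float) -> float:
    """P(fewer than two of n processors are still functional at time t)."""
    f = 1.0 - math.exp(-failure_rate * t)  # per-processor failure probability by time t
    p_none_left = f ** n                          # all n processors have failed
    p_one_left = n * f ** (n - 1) * (1.0 - f)     # exactly one survivor, n ways to pick it
    return p_none_left + p_one_left


# Failure probability of clusters of different sizes over the same interval.
for n in (2, 3, 5):
    print(n, nmr_failure_probability(n, failure_rate=0.01, t=10.0))
```

Each term is one "way" of ending with fewer than two processors, weighted by its probability of occurrence, which is the combinatorial count the answer describes.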
Markov chain model
Markov chain models, while more complex than combinatorial models for such simple cases, are
the solution method of choice when the systems are more complex.
The system can be modeled as a Markov chain as shown in figure below, where the states
represent the number of functional processors.
Since failed units are removed immediately from the system, and there is no repair, we have a
“pure-death process”. This chain is identical to one discussed
Block R1 receives the kth ticks from c_i and c_j at real times c_i^k and c_j^k + d(j,i), respectively, while
block R2 receives these signals at times c_i^k + d(i,j) + d(j,i) and c_j^k + d(j,i), respectively.
The skews as determined by R1 and R2 are therefore
c_i^k - c_j^k - d(j,i) and c_i^k + d(i,j) + d(j,i) - c_j^k - d(j,i), respectively.
Averaging them gives c_i^k - c_j^k + [d(i,j) - d(j,i)]/2.
If d(i,j) = d(j,i), the output of the averager approximates the correct skew, c_i^k - c_j^k.
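The cancellation of propagation delays can be illustrated with a short sketch. The function name `averaged_skew` and its parameters are hypothetical: `tick_i` and `tick_j` stand for the real times of the kth ticks of clocks i and j, and `d_ij`, `d_ji` for the two propagation delays.

```python
def averaged_skew(tick_i: float, tick_j: float, d_ij: float, d_ji: float) -> float:
    """Average the skews measured by blocks R1 and R2."""
    # R1 sees clock i's tick directly and clock j's tick delayed by d(j,i).
    skew_r1 = tick_i - (tick_j + d_ji)
    # R2 sees clock i's tick delayed by d(i,j) + d(j,i) and clock j's tick delayed by d(j,i).
    skew_r2 = (tick_i + d_ij + d_ji) - (tick_j + d_ji)
    return (skew_r1 + skew_r2) / 2.0


# Symmetric delays cancel, leaving the true skew of 2.0.
print(averaged_skew(10.0, 8.0, d_ij=0.5, d_ji=0.5))
# Asymmetric delays leave a residual error of (d_ij - d_ji) / 2.
print(averaged_skew(10.0, 8.0, d_ij=1.0, d_ji=0.5))
```

The residual error depends only on the difference between the two delays, which is why the correction works best when the paths in both directions are matched.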