ES Unit 4 QBWA

Uploaded by

shreesamyuktha.v

CHRIST COLLEGE OF ENGG & TECH, PUDUCHERRY

DEPARTMENT OF ECE
EC T72 – EMBEDDED SYSTEMS
Unit – 4 Reliability and Clock Synchronization
Type: 100% Theory
Question Bank with Answers
PART - A ANSWERS

1. Define reliability.
 Let the random variable X be the lifetime, or the time to failure, of a component. The probability that
the component survives until some time t is called the reliability R(t) of the component:
R(t) = P(X > t) = 1 - F(t), where F(t) is the distribution function of the lifetime X.
 The component is assumed to be working properly at time t = 0, and no component can work forever
without failure.
2. What is hazard rate?
 The hazard (instantaneous failure) rate measures the instantaneous rate of failures. To
understand the hazard function we need to review conditional probability and conditional density
functions (very similar concepts).
 The hazard rate measures the conditional probability of a failure given that the system is currently
working. The failure density (pdf), by contrast, measures the unconditional rate of failures.
3. Elucidate fault latency.
 Fault latency is the duration between the onset of a fault and its manifestation as an error. This
duration can be considerable.
 Since faults are invisible, showing themselves only when they cause errors, such latency can
affect the reliability of the overall system.
4. Mention the types of faults.
 There are three fault types. They are
 Transient Faults
 Intermittent Faults
 Permanent Faults

5. Define transient faults.
 A transient fault is one that dies away after some time, unlike a permanent fault.
 Let a and b be the permanent and transient failure rates, respectively, of each processor. Failures are
assumed to occur as a Poisson process. Intermittent failures are ignored in this model.
 We assume that all the failures are independent of one another, and that faults manifest themselves
immediately (i.e., the fault latency is zero).
 Processors that are taken offline due to a fault being detected are tested continuously to see if their
fault is transient, and if it is, whether it has died away.
6. Draw the structure of correction of propagation delays.

7. Elucidate clock synchronization.


8. Define redundancy.
9. Enumerate the advantages of hardware synchronization.
10. Write about slave – master interaction in read clock request.
11. Define synchronization in hardware.
12. Elucidate permanent faults only.
13. What is the need for reliability evaluation?
14. Mention the condition for correctness.
15. List the disadvantages of hardware synchronization.
16. Write short notes on synchronization in software.
17. Enumerate the types of redundancy.
18. Define NMR clusters.
 Consider an N-modular-redundant cluster. For the moment, let us assume that the fault latency is
zero (i.e., faults start generating errors immediately when they arrive) and that faulty processors
are immediately identified and disconnected from the system.
 As a result, the system will always consist of good processors only, and will continue to function
until it has fewer than two functional processors.
19. Write short notes on reliability evaluation technique.
 Computers used in life-critical applications must be so reliable that they cannot be validated by
experiment alone, so mathematical models are used to evaluate their reliability.
 We construct a mathematical model of the real-time computer and solve it. In doing so we add one
possible source of error: the assumptions of the mathematical model.
 If these assumptions are not correct, then neither are the results of the model. Reliability evaluation
techniques are introduced to deal with this.
20. Draw markov chain model for NMR cluster.

21. What is the need for synchronization?


22. What are the impacts of faults?
23. Elucidate combinatorial model.
 The system will fail only if there are fewer than two functional processors left in the system.
 Since there is no repair and all the failures are assumed to be permanent, the failure probability can
be found by counting (hence the term combinatorial) all the various ways in which fewer than two
processors are left and weighting each by its probability of occurrence.
 The probability that an individual processor suffers failure some time in an interval of duration t is
given by Fl(t) = 1 - exp(-λt), where λ is the failure rate of the processor.
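The counting argument can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (independent, permanent, exponentially distributed failures with rate λ); the function names are hypothetical and do not come from the notes.

```python
from math import comb, exp

def processor_failure_prob(lam: float, t: float) -> float:
    """Fl(t) = 1 - exp(-lam * t): one processor fails some time in [0, t]."""
    return 1.0 - exp(-lam * t)

def nmr_failure_prob(n: int, lam: float, t: float) -> float:
    """Probability that fewer than two of n processors are functional at time t."""
    f = processor_failure_prob(lam, t)
    # Count the ways of having exactly k survivors (k = 0 or 1) and weight
    # each by its probability of occurrence.
    return sum(comb(n, k) * (1.0 - f) ** k * f ** (n - k) for k in (0, 1))

if __name__ == "__main__":
    # Example values only: a 3-processor cluster, lam = 1e-4 per hour, t = 100 hours.
    print(nmr_failure_prob(3, 1e-4, 100.0))
```

For N = 2 the sum reduces to 1 - exp(-2λt), i.e., the cluster fails as soon as either processor fails.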
24. State Voter reliability.
 There are two typical designs for voter reliability, one in which there is exactly one voter providing
output for the cluster, and the second in which there are N voters, one per processor.
 We focus on the first design, and leave the second as an exercise. The system will fail whenever
fewer than two processors are functioning or the voter fails.

 The voter becomes less reliable as the cluster size increases, so an increase in the cluster size can
actually decrease the reliability of the cluster.
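The effect of the single voter can be illustrated with a rough Python sketch. It assumes, purely for illustration, that the voter fails independently of the processors and that its failure rate grows linearly with the number of inputs; neither the linear model nor the names used here come from the notes.

```python
from math import comb, exp

def cluster_reliability(n: int, lam: float, t: float) -> float:
    """Probability that at least two of n processors are functional at time t."""
    p = exp(-lam * t)  # survival probability of one processor
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(2, n + 1))

def system_reliability(n: int, lam: float, lam_voter_per_input: float, t: float) -> float:
    """Single-voter design: the voter AND at least two processors must work."""
    # ASSUMPTION: voter failure rate = n * lam_voter_per_input (hypothetical model).
    voter = exp(-n * lam_voter_per_input * t)
    return voter * cluster_reliability(n, lam, t)

if __name__ == "__main__":
    for n in (3, 5, 9):
        print(n, system_reliability(n, 1e-4, 1e-5, 100.0))
```

With these example numbers the 9-processor cluster comes out less reliable than the 3-processor one, because the loss in voter reliability outweighs the gain from the extra processors.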
25. Sketch the slave – master interaction in read clock request.

26. Define PLL.


 A phase-locked loop (PLL) is an electronic circuit with a voltage-controlled oscillator that
constantly adjusts to match the frequency of an input signal.
 PLLs are used to generate, stabilize, modulate, demodulate, filter or recover a signal from a "noisy"
communications channel where data has been interrupted.


27. Mention the types of interconnection system.
 The types of interconnection system are
 Completely connected, zero propagation time system
 Sparse interconnection, zero propagation time system
28. Draw the structure of PLL using synchronization model.

29. What are propagation delays?


 Propagation delay is the time required for a digital signal to travel from the input(s) of a logic
gate to the output. It is measured in microseconds (µs), nanoseconds (ns), or picoseconds (ps).
 The propagation delay for an integrated circuit (IC) logic gate may differ for each of the inputs. If
all other factors are held constant, the average propagation delay in a logic gate IC increases as the
complexity of the internal circuitry increases.

30. State drift rate.
 The drift rate is the rate at which the clock can gain or lose time. If ρ is the maximum drift rate of
a non-faulty clock and C(t) is the clock reading at real time t, then for t2 ≥ t1
(1 - ρ)(t2 - t1) ≤ C(t2) - C(t1) ≤ (1 + ρ)(t2 - t1)
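As a small illustration of the drift-rate bound, the following Python sketch checks whether two clock readings are consistent with a maximum drift rate ρ, reading the bound as (1 - ρ)(t2 - t1) ≤ C(t2) - C(t1) ≤ (1 + ρ)(t2 - t1) for t2 ≥ t1. The function names are hypothetical, and the worst-case skew helper assumes two non-faulty clocks that start in agreement and drift in opposite directions.

```python
def within_drift_bound(c_t1: float, c_t2: float, t1: float, t2: float, rho: float) -> bool:
    """True if the clock readings C(t1), C(t2) respect the maximum drift rate rho."""
    if t2 < t1:
        raise ValueError("expected t2 >= t1")
    elapsed_real = t2 - t1
    elapsed_clock = c_t2 - c_t1
    return (1 - rho) * elapsed_real <= elapsed_clock <= (1 + rho) * elapsed_real

def max_skew_after(interval: float, rho: float) -> float:
    """Worst-case divergence of two non-faulty clocks over a real-time interval."""
    # One clock runs at rate (1 + rho), the other at (1 - rho).
    return 2 * rho * interval

if __name__ == "__main__":
    print(within_drift_bound(0.0, 100.00005, 0.0, 100.0, 1e-6))
    print(max_skew_after(100.0, 1e-6))
```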
PART – B ANSWERS

1. Explain reliability models for hardware redundancy.


 The most difficult problem in reliability modeling is to keep the complexity of the models
sufficiently small.
 Even when the various parameters of the model are exponentially distributed, the models can become
unacceptably complex. Current techniques to reduce the complexity of such models consist largely of
state aggregation, in which multiple states are grouped together and treated as a single state; and
decomposition, in which the overall model is broken down into submodels, each of which is solved
separately.
 The reliability of components is usually specified through a probability distribution
function of the lifetime of those components.
 For example, if failures occur as a Poisson process with rate λ, the lifetime distribution is given by
Fl(t) = 1 - exp(-λt).
 If failures occur as a Weibull process with shape parameter α and scale parameter λ, the lifetime
distribution is Fl(t) = 1 - exp(-[λt]^α). We will denote by fl(t) the associated density function (we will
assume here that Fl(t) is differentiable).
 The hazard rate h(t) of a component with age t is defined as the rate of failure at time t, given
that it has not failed up to time t. We can use Bayes's law to express the hazard rate as a function
of the lifetime distribution function: h(t) = fl(t) / (1 - Fl(t)).
 If the failure process is Poisson with rate λ (i.e., if the lifetime is exponentially
distributed with mean 1/λ), then the hazard rate is h(t) = λ.
 The hazard rate is thus independent of the age of the component if the failure process is
Poisson. If the failure process is Weibull with shape and scale parameters α and λ,
respectively, the hazard rate is h(t) = αλ(λt)^(α-1).
 If 0 < α < 1, then h(t) decreases with time. This means that the failure rate of a component drops as it
ages. Components with decreasing hazard rates are said to have the used-better-than-new property.

 If α = 1, the failure process is Poisson. If α > 1, h(t) increases with time; that is, the failure rate
increases with age, and such components have the new-better-than-used property.
 In practice, the hazard rate of many components is initially high and decreases as early defects are
weeded out. The rate then becomes approximately constant, before aging effects set in and cause the
hazard rate to rise with age.
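The three regimes of the Weibull hazard rate can be checked numerically. The sketch below computes h(t) = fl(t) / (1 - Fl(t)) directly from the lifetime distribution Fl(t) = 1 - exp(-[λt]^α); the function names and the example parameter values are illustrative assumptions.

```python
from math import exp

def weibull_cdf(t: float, alpha: float, lam: float) -> float:
    """Lifetime distribution Fl(t) = 1 - exp(-(lam * t) ** alpha)."""
    return 1.0 - exp(-((lam * t) ** alpha))

def weibull_pdf(t: float, alpha: float, lam: float) -> float:
    """Density fl(t), the derivative of Fl(t) with respect to t (t > 0)."""
    return alpha * lam * (lam * t) ** (alpha - 1) * exp(-((lam * t) ** alpha))

def hazard(t: float, alpha: float, lam: float) -> float:
    """Hazard rate h(t) = fl(t) / (1 - Fl(t))."""
    return weibull_pdf(t, alpha, lam) / (1.0 - weibull_cdf(t, alpha, lam))

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        # alpha < 1: decreasing hazard; alpha = 1: constant; alpha > 1: increasing.
        print(alpha, [round(hazard(t, alpha, 1.0), 4) for t in (0.5, 1.0, 2.0)])
```

For α = 1 the ratio collapses to the constant λ, which matches the Poisson case.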
2. Discuss in detail about fault tolerant synchronization algorithms.
3. With neat diagram, explain about completely connected zero propagation time system.
4. Elaborate about sparse interconnected zero propagation time system.
5. Briefly explain non-fault tolerant synchronization algorithms.
6. Explain about transient faults in detail.
 Let a and b be the permanent and transient failure rates, respectively, of each processor. Failures are
assumed to occur as a Poisson process. Intermittent failures are ignored in this model.
 We assume that all the failures are independent of one another, and that faults manifest themselves
immediately (i.e., the fault latency is zero).
 Processors that are taken offline due to a fault being detected are tested continuously to see if their
fault is transient, and if it is, whether it has died away.
 If this is the case, the processors are inducted back into the cluster. Let the time between when a
processor suffers transient failure and when it is brought back on line be exponentially distributed
with mean 1/e.
 We will ignore the time it takes to reintegrate it into the system; this can be taken into account by
assuming it to be part of the delay time. Here we assume that system failure occurs when there are
fewer than two operational processors.
 Once again, we use a Markov chain. However, unlike in the previous case where processors could
only be in one of two states, permanently failed and good, in this model they can be in one of three
states: permanently failed, currently offline due to transient failure, and good.
 Since the total number of processors is fixed at N, we need two state variables to denote the state of
the system. Let us denote the state by (s1, s2), where s1 and s2 denote, respectively, the number of
functional processors and the number of processors currently undergoing transient failure.
 The number of processors that have failed permanently is N - s1 - s2. The Markov chain for this model
is shown below.

 While it may look complicated, generating this chain is quite simple. Let us consider the transitions
out of state (i, j). In this state, we have i functional processors and j processors that are currently
suffering transient failure.
 The rest of the processors, numbering N - i - j have suffered permanent failure. Of course, the
system does not know whether a failed processor is suffering a transient or a permanent failure.
 It will keep trying to run tests on all the failed processors, and if a previously failed processor
recovers, it will pass the test.
 The i functional processors may suffer either permanent or transient failure. The permanent failure
rate per processor is a, so the overall rate due to permanent failure out of state (i, j) (and into state
(i-1, j)) is ia.
 Similarly, the overall rate due to transient failure out of state (i, j) (and into state (i-1, j+1)) is ib.
Even processors that are currently suffering transient failures are not immune to permanent failures;
this explains the transition from state (i, j) to (i, j-1) with a rate of ja.
 Transient faults die away in an exponentially distributed duration of mean 1/e, and so the rate out of
state (i, j) to (i+1, j-1) is given by je.
 When only two processors are functional, the failure of any one of these spells failure for the
whole system, which explains the transition to the FAIL state.
 It only remains for us to write the differential equations connected with this process. This can be
done by inspection of the Markov chain.

 Let π_{i,j}(t) denote the probability of being in state (i, j) at time t, where i, j ≤ N. If i < 0, j < 0,
or i + j > N, define π_{i,j}(t) = 0.
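The transitions described above can be turned into a small numerical model. The Python sketch below builds the chain from the four rates ia, ib, ja and je, treats FAIL as an absorbing state entered from any state with two functional processors at rate 2(a + b), and integrates the state probabilities with a simple forward-Euler scheme. The function names, the step count and the choice of Euler integration are assumptions made for illustration, not part of the notes.

```python
from math import comb, exp

def transitions(i, j, a, b, e):
    """Return (next_state, rate) pairs out of state (i, j); 'FAIL' is absorbing."""
    out = []
    if i == 2:
        # Either kind of failure of one of the last two functional processors
        # fails the whole system.
        out.append(("FAIL", 2 * (a + b)))
    else:
        out.append(((i - 1, j), i * a))       # permanent failure of a functional processor
        out.append(((i - 1, j + 1), i * b))   # transient failure of a functional processor
    if j > 0:
        out.append(((i, j - 1), j * a))       # permanent failure of an offline processor
        out.append(((i + 1, j - 1), j * e))   # transient fault dies away, processor rejoins
    return out

def failure_probability(n, a, b, e, t, steps=20000):
    """Probability of being in FAIL at time t, starting with all n processors good."""
    states = [(i, j) for i in range(2, n + 1) for j in range(0, n - i + 1)]
    pi = {s: 0.0 for s in states}
    pi["FAIL"] = 0.0
    pi[(n, 0)] = 1.0
    dt = t / steps
    for _ in range(steps):
        delta = {s: 0.0 for s in pi}
        for s in states:
            for nxt, rate in transitions(s[0], s[1], a, b, e):
                flow = rate * pi[s] * dt      # probability mass moved in this step
                delta[s] -= flow
                delta[nxt] += flow
        for s in pi:
            pi[s] += delta[s]
    return pi["FAIL"]

if __name__ == "__main__":
    # Example values only: 3 processors, a = 0.01, b = 0.05, e = 1.0, t = 10.
    print(failure_probability(3, 0.01, 0.05, 1.0, 10.0))
```

Setting b = 0 removes the transient states, and the result then agrees (up to the integration error) with the combinatorial model for permanent faults only.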
7. Write a detailed note on permanent faults.
NMR clusters.
 Consider an N-modular-redundant cluster. For the moment, let us assume that the fault latency is
zero (i.e., faults start generating errors immediately when they arrive) and that faulty processors
are immediately identified and disconnected from the system.
 As a result, the system will always consist of good processors only, and will continue to function
until it has fewer than two functional processors.
Combinatorial model
 The system will fail only if there are fewer than two functional processors left in the system.
 Since there is no repair and all the failures are assumed to be permanent, the failure probability can
be found by counting (hence the term combinatorial) all the various ways in which fewer than two
processors are left and weighting each by its probability of occurrence.
 The probability that an individual processor suffers failure some time in an interval of duration t is
given by Fl(t) = 1 - exp(-λt), where λ is the failure rate of the processor.
Markov chain model
 Markov chain models, while more complex than combinatorial models for such simple cases, are
the solution method of choice when the systems are more complex.
 The system can be modeled as a Markov chain as shown in the figure below, where the states
represent the number of functional processors.
 Since failed units are removed immediately from the system, and there is no repair, we have a
“pure-death process”. This chain is identical to one discussed
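As a sketch of what solving this pure-death chain involves, the state equations can be written down by inspection. The notation is assumed for illustration: π_i(t) is the probability that exactly i processors are functional at time t, λ is the permanent failure rate of each processor, and FAIL is the absorbing state reached when fewer than two processors remain.

```latex
\begin{align*}
\frac{d\pi_N(t)}{dt} &= -N\lambda\,\pi_N(t), \\
\frac{d\pi_i(t)}{dt} &= (i+1)\lambda\,\pi_{i+1}(t) - i\lambda\,\pi_i(t), \qquad 2 \le i < N, \\
\frac{d\pi_{\mathrm{FAIL}}(t)}{dt} &= 2\lambda\,\pi_2(t), \\
\pi_N(0) &= 1, \qquad \pi_i(0) = 0 \ \text{for } i < N .
\end{align*}
```

With independent exponential lifetimes the solution is \pi_i(t) = \binom{N}{i} e^{-i\lambda t} (1 - e^{-\lambda t})^{N-i} for 2 \le i \le N, which is the same result the combinatorial model gives.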

8. Explain about fault tolerant using signal propagation delays.


 We have assumed that signal-propagation times are negligible. This assumption is true if the
geographical extent of the system is not large.
 If Φ is the nominal clock frequency and θ is the minimum phase difference that can be resolved by
the reference circuitry, signal propagation delays are negligible if they are less than θ/(2πΦ).
 If propagation times are greater than θ/(2πΦ), we must design the system to compensate for them. If
there is a large variation in the propagation time between the various clock pairs, failure to correct
for it can result in the formation of multiple nonoverlapping cliques.
 For example, suppose clocks c1 and c2 are faulty and the propagation time from c1 and c2 to clocks
c3, ..., c8 is much less than that to clocks c9, ..., c15; the two groups may then form separate cliques.
 If the connections are point-to-point and dedicated to transmitting clock pulses, it is possible to
accurately estimate propagation delays between clocks. If these delays can be timed during the
design process, the reference can correct for them.
 A second approach involves the estimation of the delays during operation, at the cost of doubling
the number of lines interconnecting the clocks. In this scheme, d(i,j) denotes the propagation delay
from clock ci to clock cj.
 When clock cj receives a clock tick from ci, it immediately sends it back to ci on a special line
provided. Blocks R1 and R2 determine the skew between the signals that they receive.
 Let c_i^k and c_j^k be the real times when clocks ci and cj deliver their kth ticks.
 Block R1 receives the kth ticks from ci and cj at real times c_i^k and c_j^k + d(j,i), respectively, while
block R2 receives these signals at times c_i^k + d(i,j) + d(j,i) and c_j^k + d(j,i), respectively.
 The skews as determined by R1 and R2 are therefore
c_i^k - c_j^k - d(j,i) and c_i^k + d(i,j) + d(j,i) - c_j^k - d(j,i) = c_i^k - c_j^k + d(i,j), respectively.
 Averaging them gives
c_i^k - c_j^k + [d(i,j) - d(j,i)] / 2.
 If d(i,j) = d(j,i), the delay terms cancel and the output of the averager is the correct skew, c_i^k - c_j^k.
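The averaging step can be verified with a few lines of Python. The sketch below computes the skews seen by blocks R1 and R2 from the tick times and the two propagation delays, then averages them; the function and parameter names are hypothetical, and the numbers in the example are arbitrary.

```python
def measured_skews(c_i: float, c_j: float, d_ij: float, d_ji: float):
    """Skews seen by blocks R1 and R2 for the kth ticks of clocks ci and cj."""
    r1 = c_i - (c_j + d_ji)                   # R1: own tick vs. cj's tick delayed by d(j,i)
    r2 = (c_i + d_ij + d_ji) - (c_j + d_ji)   # R2: echoed ci tick vs. cj's tick
    return r1, r2

def averaged_skew(c_i: float, c_j: float, d_ij: float, d_ji: float) -> float:
    """Average of the two skews: (c_i - c_j) + (d_ij - d_ji) / 2."""
    r1, r2 = measured_skews(c_i, c_j, d_ij, d_ji)
    return (r1 + r2) / 2.0

if __name__ == "__main__":
    print(averaged_skew(10.0, 7.0, 0.5, 0.5))  # symmetric delays
    print(averaged_skew(10.0, 7.0, 1.0, 0.5))  # asymmetric delays leave a residual error
```

With symmetric delays the propagation terms cancel; with asymmetric delays the residual error is half the difference between d(i,j) and d(j,i).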
