The document discusses an innovative methodology for performing a Failure Mode and Effects Analysis (FMEA) at the System-on-Chip (SoC) level in compliance with IEC 61508. The methodology identifies 'sensible zones' from the RTL design and uses tools to extract these zones and compute metrics like Diagnostic Coverage required by IEC 61508.

Using an innovative SoC-level FMEA methodology to design in compliance

with IEC61508

Conference Paper · April 2007

DOI: 10.1145/1266366.1266472 · Source: DBLP

Using an innovative SoC-level FMEA methodology
to design in compliance with IEC61508

Riccardo Mariani, Gabriele Boschi, Federico Colucci

YOGITECH SpA
Pisa, Italy
http://www.yogitech.com

Abstract

This paper proposes an innovative methodology to perform and validate a Failure Mode and Effects Analysis (FMEA) at System-on-Chip (SoC) level. This is done in compliance with IEC 61508, an international norm for the functional safety of electronic safety-related systems, of which an overview is given in the paper. The methodology is based on a theory to decompose a digital circuit into “sensible zones” and a tool that automatically extracts these sensible zones from the RTL description. It also includes a spreadsheet to compute the metrics required by the IEC norm, such as Diagnostic Coverage and Safe Failure Fraction. The FMEA results are validated by using another tool suite including a fault injection environment. The paper explains how to benefit from the information provided by such an approach and, as an example, describes how the methodology has been applied to design memory sub-systems to be used in fault-robust microcontrollers for automotive applications. This methodology has been approved by TÜV-SÜD as the flow to assess and validate the Safe Failure Fraction of a given SoC in adherence to IEC 61508.

1. Introduction

New technologies allow deep system integration in automotive as well: replacing mechanics with electronics is becoming a reality. Therefore, automotive Systems on Chip (SoCs) are more and more complex: they have a mix of commodity and safety functions, an increased use of third-party IPs and complex interconnection scenarios. On the other side, as a consequence of such increased complexity, the population of faults is increasing as well. These include: modelling uncertainty, functional verification holes, unforeseen interactions and misuse, specification misunderstanding, higher electromagnetic susceptibility, soft errors and malicious accesses. In particular, hardware faults (systematic or random) are worsened by increased soft-error failure rates (e.g. due to cosmic rays), by coupling effects and disturbances that are more and more important, and by the intrinsic uncertainty due to model inaccuracy that affects new technologies. If we define "robustness" as the ability to continue the mission reliably despite the existence of systematic, random or malicious faults [1,2], how can such systems be made more robust?

For automotive, aerospace, biomedical and similar applications where human life is concerned, safety is the driving factor. In such a context, fault-oriented quality metrics (e.g. ppm) are not enough, since they mostly confine the reliability issues to the semiconductor supplier's duty. International norms exist to define requirements for safety, such as IEC61508 for functional safety of electrical/electronic/programmable electronic safety-related systems [3,4] or its “customization” to the automotive field, ISO26262, still in the preliminary definition phase. Therefore, designers of electronic systems to be used in safety-critical applications should take these requirements into account and adapt their architectures. It is worth noting that IEC61508 also introduces requirements in terms of design flows and validation criteria, so the whole implementation process (from specification to verification and validation) should be adapted accordingly.

However, these norms generally refer to complete systems and not to Systems-on-Chip. Even if they also contain guidelines and requirements for the system components (including CPUs, memory systems, bus infrastructure and so on), and even if an extension of IEC61508 to ASICs is likely to appear in the next months, a consolidated methodology to systematically transport the IEC61508 requirements to SoC level does not exist yet. For instance, it is not trivial to compute the Safe Failure Fraction (SFF, better defined in the next section) of a SoC, and the extension of system-level methods such as Failure Mode and Effects Analysis (FMEA) to SoCs is still confined to low-complexity integrated circuits or to basic critical points such as muxing or digital-to-analog interfaces.

This paper shows how to make use of the FMEA at System-on-Chip (SoC) level as well, and how to benefit from the information provided by such analysis to implement a structured approach to increase the robustness of the SoC. This is done in compliance with IEC61508, i.e. taking into account the failure modes and requirements described therein, and it allows the main metrics required by the norm to be extracted. It will also be described how the proposed methodology has been applied to design memory sub-systems to be used

978-3-9810801-2-4/DATE07 © 2007 EDAA

in fault-robust microcontrollers for automotive applications.

2. IEC61508 basic concepts

The basic concept of IEC61508 is the definition of “Safety Integrity Level” (SIL), i.e. the discrete level (one out of a possible four) for specifying the safety integrity requirements of the safety functions to be allocated to the safety-related systems, where safety integrity level 4 has the highest level of safety integrity and safety integrity level 1 has the lowest [3]. As already said, the IEC61508 requirements are generally related to complete systems: however, for system components too it can be said that the safety integrity level is granted based on the value of the Safe Failure Fraction (SFF) of the given component. SFF is equal to the ratio between the sum of safe failures (i.e. failures which don’t have the potential to put the safety-related system in a hazardous or fail-to-function state) and detected dangerous failures, over the sum of all the possible failures (safe plus dangerous).

Another important concept is the Hardware Fault Tolerance (HFT). A system with a HFT of N means that N+1 faults could cause a loss of the safety function. With a HFT equal to zero, a SFF equal to or greater than 99% is required in order for the system or component to be granted SIL3. With a HFT equal to one, the SFF should be greater than 90%. It is worth noting that SIL3 is the safety integrity level required for x-by-wire systems or systems with high criticality such as active brake systems.

IEC61508 also specifies faults or failures to be detected during operation or to be analyzed in the derivation of the safe failure fraction. Some examples are the following. For variable memories: DC fault model for data and addresses; dynamic cross-over for memory cells; no, wrong or multiple addressing; change of information caused by soft errors. For processing units: DC fault model for data and addresses for both internal registers and RAMs; dynamic cross-over for memory cells; wrong coding or wrong execution for coding and execution, including flag registers, and so on.

The norm also assesses some of the state-of-the-art techniques for fault detection and tolerance with respect to the maximum diagnostic coverage (i.e. probability of detection of dangerous failures) considered achievable: as an example, RAM monitoring with Hamming codes or ECCs, or double RAMs with hardware/software comparison, are the ones with the highest value.

IEC61508 also specifies which kind of documentation and design flow should be followed, such as the release of a Safety Requirements Specification (SRS) including a detailed FMEA (Failure Mode and Effects Analysis) of the system or sub-system.

3. Extending FMEA to SoC: the principles

The commonly used way to provide the information required by the SRS is to perform a Failure Modes and Effects Analysis. This paper presents a way to perform the FMEA at SoC level with a systematic approach, supported by a spreadsheet and a tool to extract the information from the RTL.

In a first step, a set of “sensible zones” is identified from the RTL description. A sensible zone is one of the elementary failure points of the SoC in which one or more faults converge to lead to a failure (Figure 1). Valid definitions of sensible zones are:
• Memory elements such as registers, flip-flops or variables. These sensible zones are points where many kinds of faults converge. Example: stuck-at or bridging faults in the combinational logic generating the input of the memory element.
• Primary inputs and primary outputs of the SoC.
• Logical entities that may or may not map directly to a memory element. Example: a wrong conditional field of a conditional instruction, where this wrong field can be caused either by a bit-flip or by wrong processing in the logic reading the opcode from the bus.
• Critical nets such as clocks or long nets that could generate multiple failures.
• Entire sub-blocks, to take bigger cones of logic into account more simply or to consider as a whole a complex block with a small number of outputs. Example: faults in a coder leading to a wrong output value.

Figure 1: the sensible zone (diagram labels: failure mode, sensible zone, main effect, observation point)

It is worth noting that electronic circuits, in particular processing units, are mostly architected as groups of interconnected Moore machines. In such structures, the state register has a fundamental role in the functional behaviour of the machine, so such state registers are the best candidates to become sensible zones.

Another important element of the SoC-level FMEA is the “Observation point”. The observation point is either another sensible zone, a primary output (most of the cases), a primary function of the SoC (when the analysis is more high-level) or an alarm of the diagnostic. The effects of failure modes in a sensible zone are measured at these observation points.

Failure modes can be of two main types. The first is directly linked to physical faults: for example, if the sensible zone is a memory element, it can be a bit-flip in the register. The second is the end consequence of faults in the logic cone of the sensible zone: for example, a wrong value in a register bit due to stuck-at or bridging faults in the combinational logic in front of the D pin of a register. A failure mode can also be a temporal sum of faulty events (such as multiple faults hitting a memory element). The basic failure modes for a given SoC can be determined from the tables in the Appendix of IEC 61508-2 [3].
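As a rough illustration of the concepts introduced so far, the following Python sketch models sensible zones as FMEA rows and evaluates the Safe Failure Fraction exactly as worded in Section 2, then checks it against the SIL3 thresholds quoted there. It is only a sketch under stated assumptions: the class `SensibleZone`, its field names, the two example zones and all failure-rate values are hypothetical and do not come from the tool or spreadsheet described in this paper.

```python
from dataclasses import dataclass


@dataclass
class SensibleZone:
    """One FMEA row: a sensible zone, its failure mode and where the effect is observed."""
    name: str
    kind: str                # e.g. "register", "primary_output", "critical_net", "sub_block"
    failure_mode: str
    observation_point: str
    lambda_safe: float       # safe failure rate (hypothetical value, in FIT)
    lambda_dd: float         # dangerous detected failure rate (hypothetical value, in FIT)
    lambda_du: float         # dangerous undetected failure rate (hypothetical value, in FIT)


def safe_failure_fraction(zones):
    """SFF = (safe + dangerous detected) / (safe + dangerous), summed over all zones."""
    safe = sum(z.lambda_safe for z in zones)
    detected = sum(z.lambda_dd for z in zones)
    undetected = sum(z.lambda_du for z in zones)
    total = safe + detected + undetected
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (safe + detected) / total


def meets_sil3(sff, hft):
    """SIL3 thresholds as quoted in Section 2 (HFT 0: SFF >= 99%, HFT 1: SFF > 90%)."""
    if hft == 0:
        return sff >= 0.99
    if hft == 1:
        return sff > 0.90
    raise ValueError("only HFT 0 and 1 are covered by this sketch")


# Two made-up zones, purely to exercise the functions above.
zones = [
    SensibleZone("state_reg", "register", "bit-flip", "primary output", 4.0, 5.5, 0.25),
    SensibleZone("clk_net", "critical_net", "stuck-at", "alarm", 2.0, 8.0, 0.25),
]
sff = safe_failure_fraction(zones)
print(f"SFF = {sff:.3%}; SIL3 with HFT=0: {meets_sil3(sff, 0)}; with HFT=1: {meets_sil3(sff, 1)}")
```

With these invented numbers the same component would fall short of the HFT = 0 threshold while satisfying the HFT = 1 one, which is the kind of trade-off the SFF/HFT pairing is meant to expose.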
Concerning the correspondence between failure modes of sensible zones and HW faults of their converging cones, it is useful to distinguish three classes of physical HW faults: local, wide and global HW faults.

We consider “local” the physical HW faults affecting one or more gates of a logic cone contributing to a single sensible zone. Each local HW fault, or combination of them, occurring in the logic cone in front of the sensible zone will result in a failure in it, if not masked by conditions or by other HW faults. It is worth noting that if a certain local HW fault is masked, so that it doesn’t generate any effect in the sensible zone (e.g. if there is a transient fault in a gate but this glitch isn’t sampled by the clock of the register corresponding to its sensible zone, and so on), this fault is not considered a hazard since it doesn’t perturb the function to be performed by that sensible zone. The type of failure that will occur in the sensible zone depends on the type of physical fault that occurred (e.g. stuck-at, bridging fault, etc.).

We consider “wide” the physical HW faults affecting one or more gates of a logic cone contributing to more than one sensible zone. Examples of wide physical HW faults are a single physical HW fault (e.g. a stuck-at at the output of a gate) generating a failure in two or more sensible zones, or a single physical HW fault belonging to a logic cone contributing to two or more sensible zones but generating a fault only in some of these zones (see Figure 2). In such a case, we have multiple failures. It is worth noting that this case also includes situations like faults in clock or reset buffers affecting multiple flip-flops. Physical faults like resistive or capacitive coupling between lines are also included in this model.

Figure 2: multiple failures (diagram labels: failure modes, sensible zones, main/secondary effects, observation points)

We consider “global” the physical HW faults affecting many logic cones and therefore contributing to more than one sensible zone. Examples of global physical HW faults are the following: faults in the PLL, in the clock generation or in the first level of the clock trees, affecting a large number of sensible zones; power supply faults affecting large areas of the silicon component; thermal faults slowing down a sizeable region of the SoC.

Concerning the effects of a fault, we define the “main effect” as the effect that will at least occur as a result of the failure mode of the considered sensible zone with respect to an observation point, if not masked internally. The “secondary effects” are the other effects occurring at other observation points, resulting from the migration of the sensible zone failure through its output logic cone and from there to other sensible zones, up to the other observation points. These are particularly important to take into account the very frequent situation in which a single local HW fault generates a failure of a single sensible zone, but the effect manifests itself at different observation points (see Figure 3).

Figure 3: secondary effects (diagram labels: failure mode, sensible zone, main effect, secondary effect, observation points)

The extraction of sensible zones and observation points is automatically performed by a tool based on commercially available EDA tools, such as those from Cadence or Synopsys, working on the synthesized RTL. Besides collecting and properly compacting the registers, the tool also extracts the data needed by the FMEA statistical model, such as the composition of the logic cone in front of each sensible zone (i.e. gate count, interconnections and so forth) and the correlation between sensible zones in terms of shared gates and nets.

Starting from the elementary failure in time (FIT) per gate and per register, both for transient and permanent faults, all the data automatically extracted by the tool are used to compute the failure rates for each sensible zone. A spreadsheet contains all these data as well as other information provided by the user, such as:
• S and D factors, to estimate the Safe fraction and Dangerous fraction of the possible failures for the given failure mode in the given sensible zone. Two types of S and D factors are used: architectural and applicational. An example of architecture-dependent S/D is a sensible zone that is always inactive at run-time because it is blocked by a set of masking gates. An example of application-dependent S/D is a sensible zone not used by the given application. Usually only architectural S/D factors are considered.
• The frequency class F of the given sensible zone, used to estimate its usage frequency.
• The lifetime ζ, defined as the average time between the last read and the write in that zone.

Based on this information, the spreadsheet computes all the metrics required by IEC61508, such as the safe (λS) and dangerous (λD) failure rates for each sensible zone and for the whole SoC. It also delivers a ranking of sensible zones in terms of their criticality.

The proposed approach is therefore a mix of analyses performed at different levels, namely at RTL level (for the estimation of S, D and F factors) and at gate level (for the statistics related to the logic cones of the sensible zones and so forth): this guarantees the best accuracy of the results and also offers the possibility
to analyze which of the different possible implementations is the most critical in terms of safety.

4. Using the FMEA to design diagnostic

The methodology proposed in this paper has the specific target of evaluating the SoC in order to find the best strategy for error detection and correction. Therefore, the two main quantities that have to be measured are the Diagnostic Coverage and the Safe Failure Fraction, defined by the following formulas [3]:

$$DC = \frac{\lambda_{DD}}{\lambda_{DD} + \lambda_{DU}}, \qquad SFF = \frac{\lambda_{S} + \lambda_{DD}}{\lambda_{S} + \lambda_{DD} + \lambda_{DU}}$$

where λS is the rate of safe failures, λDD is the rate of dangerous detected failures and λDU is the rate of dangerous undetected failures (λD=λDD+λDU).

To compute such values, the spreadsheet includes, for each sensible zone, the fraction of the dangerous failure rate associated with each failure mode that is claimed to be detected by the diagnostic technique. This Detected Dangerous Failure fraction (DDF) is given separately for transient/intermittent faults and for permanent faults. A distinction is also made between DDF due to HW and to SW techniques.

These coverage values are computed based on the architecture, on the numbers given by the previously described tool (concerning the interconnections between sensible zones), on what is accepted by the IEC norm (Annex 2, tables A.2-A.13, which specify the maximum diagnostic coverage considered achievable by a given technique) and on the estimation of the user. The FMEA validation flow described in the next section must be executed in order to have the highest level of confidence in such estimations.

An important step of the FMEA is to span the values of the assumptions (such as the elementary failure rates for transient and permanent faults, or the user assumptions such as S, D and F) in order to measure the sensitivity of the final DC/SFF to these changes.

5. How to validate the FMEA

A strict and measurable validation flow is important in order to cross-check the FMEA. As recommended by the IEC61508 norm, fault injection has a crucial role in that. The proposed methodology uses a validation flow based on a mix of tools, the main ones being a simulation-based fault injector [8,9] and a fault simulator like [11]. The fault injector tool is built on top of a state-of-the-art functional verification tool [10] and makes use of a standard verification language [12].

By integrating fault injection with functional verification, it is possible to set up a fault injection flow that solves many of the issues that affect most of the environments presented in the literature. Thanks to the interaction with the functional verification tool, verification components available on the market can be easily reused as a workload to inject faults, obtaining design validation and reliability evaluation at the same time. The use of a standard language enables an easy and configurable way to model the faults. The engine of the coverage-driven functional verification tool makes it possible to uniquely correlate Workload, Operational Profiles, Fault List, and final measures.

The fault injector is composed of (figure 4):
• Environment builder: this block extracts from the FMEA all the information related to the environment for the injection campaign and builds all the required environment configuration files.
• Operational Profiler, Collapser and Randomiser: starting from the information extracted by the Environment Builder, this block extracts the Operational Profile (OP) from a given workload. An Operational Profile (OP) is a collection of information about all relevant fault-free system activities: traced information items are read/write activity associated with processor registers, address bus, data bus, and memory locations in the system under test, but they may also include other higher-level information like the most probable expected sets of inputs that the system or application will receive. The purpose of the OP is to better understand the situation in which the system or the application will be used, and then to analyze this information to ensure that only faults which will produce an error are selected during the fault list generation process. In this way the generated fault list is compact and non-trivial. As described later, the completeness of the workload is measured in a deterministic way to check whether it is complete in terms of its capability to trigger all the sensible zones of the DUT.
• Fault Injection Manager: this function runs the whole injection campaign based on automatically generated fault lists and collects all the results.
• Result analyzer: this function collects all the results generated by the injection campaign and automatically fills a sheet included in the FMEA spreadsheet. In particular, S, D, F and DDF are extracted and compared with the values in the FMEA. The validation is successful if the percentages are in line with the estimated values. A “table of effects” is also extracted for each sensible zone, i.e. a table of the observation points at which a deviation has been measured with respect to a golden simulation without injected faults. This table is automatically compared with the FMEA to check whether the identification of main/secondary effects is consistent.
• Monitors and Coverage Collection: this function, composed of a set of monitors automatically instantiated by the environment builder, generates and collects all the information needed to build the coverage measures for the analysis of the completeness of the fault injection campaign. In this context, “coverage” means a measure of the completeness of the fault injection experiment. It measures how many times a fault injection point (SENS) is triggered by an injection, how many changes occurred on the observation point (OBSE), how many mismatches occurred between faulty and golden DUT, how many times the diagnostic point (DIAG) changed and so forth. Only when all the coverage items are covered at 100% can we consider the fault injection experiment complete.
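The completeness criterion of the last item can be sketched in a few lines of Python. This is only an illustration under explicit assumptions: a coverage item is taken to be "covered" as soon as its monitor has fired at least once, and the function names, the event format and the item names (`reg_a`, `out_0`, `alarm_ecc`, ...) are hypothetical, not those of the tool described above.

```python
from collections import Counter

# Coverage classes named in the text: injection points, observation points, diagnostic points.
COVERAGE_CLASSES = ("SENS", "OBSE", "DIAG")


def coverage_report(expected_items, events):
    """Return the percentage of covered items per coverage class.

    expected_items: dict mapping a class name to the set of items that must be hit.
    events: iterable of (class_name, item_name) tuples emitted by the monitors.
    Assumption of this sketch: an item is covered once it has been hit at least once.
    """
    hits = Counter(events)
    report = {}
    for cls in COVERAGE_CLASSES:
        items = expected_items.get(cls, set())
        covered = {item for item in items if hits[(cls, item)] > 0}
        report[cls] = 100.0 * len(covered) / len(items) if items else 100.0
    return report


def campaign_complete(report):
    """The experiment is considered complete only when every class reaches 100%."""
    return all(percentage == 100.0 for percentage in report.values())


expected = {
    "SENS": {"reg_a", "reg_b"},
    "OBSE": {"out_0"},
    "DIAG": {"alarm_ecc"},
}
events = [("SENS", "reg_a"), ("OBSE", "out_0"), ("DIAG", "alarm_ecc"), ("SENS", "reg_a")]

partial = coverage_report(expected, events)                     # reg_b never triggered
full = coverage_report(expected, events + [("SENS", "reg_b")])  # one more injection hits reg_b
print(partial, campaign_complete(partial))
print(full, campaign_complete(full))
```

Counting hits rather than storing booleans keeps the raw numbers the text mentions (how many times a point was triggered) available for reporting, while the completeness decision only needs the hit/no-hit view.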

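The comparison performed by the result analyzer (measured S, D and DDF against the FMEA estimates) can be sketched in the same spirit. The classification rule used here is an assumption of this sketch, not the paper's definition: an injection is counted as dangerous when a deviation from the golden simulation is seen at an observation point, and as detected when a diagnostic alarm is also raised. The names `InjectionOutcome`, `measured_fractions` and `in_line`, and the 5% tolerance, are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InjectionOutcome:
    zone: str
    deviation_at_observation: bool  # faulty DUT differs from golden DUT at an observation point
    alarm_raised: bool              # a diagnostic (DIAG) point changed


def measured_fractions(outcomes):
    """Derive S, D and DDF from a list of injection outcomes for one sensible zone."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("no injection outcomes to analyze")
    dangerous = [o for o in outcomes if o.deviation_at_observation]
    detected = [o for o in dangerous if o.alarm_raised]
    return {
        "S": (total - len(dangerous)) / total,
        "D": len(dangerous) / total,
        # With no dangerous outcome at all, the detected fraction is treated as 1.0 here.
        "DDF": len(detected) / len(dangerous) if dangerous else 1.0,
    }


def in_line(measured, estimated, tolerance=0.05):
    """True when every estimated value is matched within the (assumed) absolute tolerance."""
    return all(abs(measured[key] - estimated[key]) <= tolerance for key in estimated)


# Ten made-up injections into one zone: 4 safe, 5 dangerous detected, 1 dangerous undetected.
outcomes = (
    [InjectionOutcome("addr_latch", False, False)] * 4
    + [InjectionOutcome("addr_latch", True, True)] * 5
    + [InjectionOutcome("addr_latch", True, False)]
)
measured = measured_fractions(outcomes)
ok = in_line(measured, {"S": 0.40, "D": 0.60, "DDF": 0.85})
too_optimistic = in_line(measured, {"S": 0.40, "D": 0.60, "DDF": 0.99})
print(measured, ok, too_optimistic)
```

A mismatch such as the second comparison is the situation in which the FMEA estimate would have to be revisited before the validation can be declared successful.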
The validation procedure is the following:

a) an exhaustive fault injection of sensible zone failures is performed: based on the failure mode and condition specified in the FMEA, on the workload and on the extracted operational profile, a certain number of faults is injected for each sensible zone. At the end of this analysis, both the results and the coverage are cross-checked with the FMEA.

b) in parallel, the efficiency of the workload in covering the HW gates of the gate-level netlist is measured, for instance by using a toggle count coverage or a standard fault coverage. If the toggle count percentage (i.e. nets/gates toggling at least once) or the fault coverage is greater than a defined value (default 99%), the validation is successful.

c) for critical areas (where the analysis is more difficult) or where particular HW implementations are present (asynchronous circuitry and so on), a selective HW fault injection is performed, injecting local faults with the fault injector. The validation is successful if the results of such injection confirm the results of the exhaustive sensible zone failure fault injection. Otherwise, some new lines should be included in the FMEA to take into account the newly detected effects. For these critical areas, the fault simulator can be used to precisely measure the fault coverage for permanent faults with respect to the workload and the implemented diagnostic. The validation is successful if the results of such a run are in line with the DDF estimated in the FMEA sheet.

d) for wide/global HW faults, a selective fault injection is performed. The validation is successful if the results of such injection confirm the results of the exhaustive sensible zone failure fault injection. Otherwise, some new lines should be included in the FMEA to take into account the newly detected effects.

Figure 4: the fault injector (diagram blocks: workload/testbench, golden DUT, faulty DUT, monitors for SENS, OBSE and DIAG, coverage collection, fault injection manager, result analyzer, GUI, environment builder fed from the FMEA with the list of sensible zones, operational profiler, collapser, randomizer, OP, candidate fault list, fault list)

6. Example

To show how this methodology can be successfully applied to the design of safety-critical SoCs, a proof-of-concept example is described in the following. It is the design of a memory sub-system, based on the architecture already presented in [6,9]. This architecture is represented in figure 5 and, besides the memory controller and the memory array, it is mainly composed of a memory protection IP made of two different functional units:

a) F-MEM: it interfaces the memory array and it hosts the coder/decoder and a “scrubbing” feature, as well as the controller that generates the corresponding alarms. In a few words, the scrubbing function stores the locations where an error occurred, in order to repair them when the memory isn’t used by the system; it can also perform a background scanning of the memory for fault forecasting.

b) MCE: it interfaces the F-MEM with the memory controller and with the bus, providing the DMA access for the F-MEM scrubbing feature as well as a “distributed MPU” functionality. This MPU function considers the memory as divided into a number of pages associated with attributes and permissions. The MCE block uses signals from the bus (in this case an AHB multilayer bus) to discriminate these attributes and permissions, and in case of faults proper alarms are generated.

Figure 5: the memory sub-system (diagram blocks: memory, F-MEM with ECC_SHELL, coder, decoder, error control, scrubbing and alarms; MCE/MPU with MCE MUX, MCE DMA, MPU and MCE AHBIF; memory controller; bus)

The FMEA methodology has been applied to this architecture with the goal of achieving a SIL3 memory sub-system, i.e. with a SFF equal to or greater than 99%.

A first implementation of the memory sub-system was done. Concerning the coder/decoder, a SEC-DED algorithm was used with a standard modified Hamming architecture. This first circuit included a write buffer and a pipeline stage in the decoder, in order to guarantee timing closure and to avoid the degradation of the memory access time due to the ECC.

At first, the sensible zones were extracted by using the previously described tool: about 170 sensible zones resulted, including the memory controller, the memory and the F-MEM/MCE blocks. The memory was modeled by using a proper fault model, as for instance described in [13-15]. Then, the FMEA spreadsheet was completed, including S, D, F and DDF values, following the procedure described in sections 3 and 4.

The spreadsheet identified the critical zones. Besides the memory array itself, the most critical blocks were the BIST control logic, the registers involved in address
latching, most of the blocks of the decoder, the registers The methodology has been used to certify the
of the write buffer, some of the blocks of the MCE fRMEM product of YOGITECH SpA according IEC
handling the interconnections with the bus and so forth. 61508. It is currently in use for the final certification of
With the initial implementation, resulting SFF (around the other IPs of YOGITECH faultRobust technology and
95%) was not enough to reach SIL3. Then, the for the complete analysis of fault-robust microcontrollers
architecture was modified by adding the addresses to the for automotive applications [16,17].
coding (required as well by IEC61508), by adding parity
bits to the write buffer and by deeply modifying the
decoder implementation. In particular, this last action References
was really important to increase the SFF: [1] J.C Laprie, “Dependable Computing and Fault Tolerance Concepts
i) an “error checker” was added immediately after the and Terminology “, IEEE Computer, 1985
“code generator” section of the decoder, in order to [2] H. Tahne, “Safe and Reliable Computer Control: Systems Concepts
and Methods”, Mech. Lab, Univ. Stock, 1996
cover also the errors in such coder; [3] CEI International Standard IEC 61508, 1998-2000
ii) a double-redundant “error checker” was [4] S.Brown, “Overview of IEC 61508 Design of
implemented after the intermediate decoder pipeline electrical/electronic/programmable electronic safetyrelated
stage, to check the correctness of code and data fields systems”, Computing & Control Engineering Journal February
2000, pages 6-12
after the pipeline as also – in case of no errors – directly [5] R.E. McDermott et al, “The Basic of FMEA”, Quality Resources
connect the decoder output with the memory data. The Press, 1996
spreadsheet shown that this measure was strongly [6] R. Mariani, G. Boschi, “A System Level Approach for Embedded
decreasing the error probability of the second part of the Memory Robustness” Special Issue: Papers selected from the 1st
International Conference on Memory Technology and Design -
decoder architecture; ICMTD’05.
iii) a “distributed” syndrome checking architecture [7] R. Mariani, M. Chiavacci, S. Motto, “Dependable microcontroller,
was implemented to allow finer error detection (i.e. to
discriminate whether an error is in the code field, in the
data field, or is an addressing error, etc.). As shown in
the FMEA, this architecture also strongly decreased the
error probability. New alarms were generated by these
checking architectures: as shown by the FMEA, by
combining the alarms generated by the error checker
after the decoder’s coder, the redundant error checkers
after the pipeline and the final syndrome checks, it is
possible to cover the possible error combinations in the
decoder with a very high level of coverage.
Moreover, some SW start-up tests were identified for
the memory controller parts not covered by the memory
protection IP. The resulting SFF of this second
implementation was 99.38% and it was very stable as
well, i.e. changes to S, D, F and the fault models did not
change the result appreciably. The previously
described validation flow was run with different
syntheses of the design, in order to have the highest
confidence in the results and to cross-check their
sensitivity to the final implementation.

method for designing a dependable microcontroller and computer program product therefor”, European Patent, EP1496435
[8] R. Mariani, P. Fuhrmann, B. Vittorelli, “Cost-effective Approach to Error Detection for an Embedded Automotive Platform”, SAE 2006 World Congress & Exhibition, April 2006, Detroit, MI, USA
[9] www.fr.yogitech.com
[10] http://www.cadence.com/products/functional_ver
[11] http://www.cadence.com/products/digital_ic/encountertest
[12] IEEE standard 1647, http://www.ieee1647.org/
[13] S. Mukherjee et al., “Cache Scrubbing in Microprocessors: Myth or Necessity?”, 2004
[14] S. Mukherjee et al., “A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor”, 2003
[15] M. Spica, “Do we need anything more than single bit error correction (ECC)?”, 2004
[16] R. Mariani, “A Platform-based Technology for Fault-robust SoC Design”, IP/SOC 2006 Conference, December 2006, Grenoble, France
[17] R. Mariani, P. Fuhrmann, B. Vittorelli, “Fault-Robust Microcontrollers for Automotive Applications”, 12th IEEE International On-Line Testing Symposium, 12 July 2006, Como, Italy

7. Conclusions
In summary, the methodology proposed in this paper
is a new way to extract useful information from a SoC,
to take into consideration the IEC guidelines about fault
models and failure modes, to compute (following the IEC
61508 norm) the Safe Failure Fraction and the
Diagnostic Coverage, and to validate the results by means
of a complete flow including a fault injector. It is an
innovative and systematic approach to assessing the safety
of a circuit, delivering very detailed reports on sensible
zones, fault effects, failure rates, etc. that can be used
for SoC analysis. It also allows the identification of the
critical parts of a circuit and the exploration of possible
implementations for best safety.
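As an illustration of the two metrics named above, the following minimal sketch computes the Diagnostic Coverage and the Safe Failure Fraction from per-zone failure rates, using the usual IEC 61508 definitions DC = λDD / (λDD + λDU) and SFF = (λS + λDD) / (λS + λDD + λDU). It is not the tool flow described in this paper: the zone names, the `ZoneRates` type and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneRates:
    """Failure rates (e.g. in FIT) of one sensible zone. Hypothetical values."""
    name: str
    safe: float                  # lambda_S: safe failures
    dangerous_detected: float    # lambda_DD: dangerous, detected by diagnostics
    dangerous_undetected: float  # lambda_DU: dangerous, not detected


def diagnostic_coverage(zones):
    """DC = sum(lambda_DD) / sum(lambda_DD + lambda_DU)."""
    dd = sum(z.dangerous_detected for z in zones)
    du = sum(z.dangerous_undetected for z in zones)
    return dd / (dd + du)


def safe_failure_fraction(zones):
    """SFF = sum(lambda_S + lambda_DD) / sum(lambda_S + lambda_DD + lambda_DU)."""
    s = sum(z.safe for z in zones)
    dd = sum(z.dangerous_detected for z in zones)
    du = sum(z.dangerous_undetected for z in zones)
    return (s + dd) / (s + dd + du)


# Hypothetical example data, not taken from the paper's case study.
ZONES = [
    ZoneRates("decoder", safe=40.0, dangerous_detected=58.0, dangerous_undetected=2.0),
    ZoneRates("pipeline", safe=30.0, dangerous_detected=69.0, dangerous_undetected=1.0),
]

if __name__ == "__main__":
    print(f"DC  = {diagnostic_coverage(ZONES):.4f}")
    print(f"SFF = {safe_failure_fraction(ZONES):.4f}")
```

The sketch assumes that at least one failure rate is non-zero. Re-running it with perturbed rates is a simple way to see how strongly the SFF depends on the assumed inputs.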
The methodology has been developed under the
supervision of TÜV-SÜD and it has been approved by
TÜV as the flow to assess and validate the Safe Failure
Fraction of a given SoC in adherence to IEC 61508.
