D2 L5, Module-7 ABC of Product Reliability
Product Reliability
(Module-7)
By
P. NARASIMHA RAO
Retd. Scientist-G, DLRL(DRDO)
FIETE, MSEMCI, MISCA, MISOI, MIDST
Chartered Engineer (IETE)
IPMA Certified Project Management Professional
STQC Certified Reliability Professional
Adjunct Faculty, NI-MSME
Mentor, BYST (CII)
at a 5-Day Course on
“Design Thinking and New Product Innovation
in the DRDO Context”
(9 – 13 October 2023)
Organised by:
Institute of Defence Scientists & Technologists
DLRL Campus, Hyderabad – 500 005.
Agenda
Assigning Reliability
Building Reliability
Conforming Reliability
Conclusion
Discussions
Quality and Reliability
Reliability is the probability that the unit performs its intended function
adequately for a given period of time under the stated operating conditions
or environment. It is primarily associated with the design.
More simply, Reliability is the ability of the unit to maintain its Quality under
specified conditions for a specified time.
Unlike Quality, Reliability is measurable and several metrics have been defined
to express reliability. They include failure rate, MTBF, MTTF, MTTR and so on.
A good quality product need not always be reliable. (A high-quality smartphone
may not work reliably when you are travelling in a train.)
And sometimes a reliable product may not be of good quality. (The walkie-talkie
sets used by the Guard and the Driver of the train are definitely not of
good quality, but they work reliably.)
Reliability Indices
Reliability is the probability that the unit performs its intended function adequately
for a given period of time under the stated operating conditions or environment.
The reliability R(t) of a unit under test (UUT) can be expressed mathematically as:
$R(t) = e^{-\lambda t}$, where λ = failure rate = 1/MTBF (assumed constant) and t = mission time.
The reliability R_sys of a complex system whose constituent units are connected in series is the
product of the individual reliabilities R1, R2, R3, …, Rn of those units:
$R_{sys} = R_1 \cdot R_2 \cdot R_3 \cdots R_n$
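The two formulas above can be evaluated directly. The following is a minimal sketch that assumes a constant failure rate (exponential model) and units in series. The MTBF values and the 10-hour mission time are hypothetical numbers chosen only for illustration.

```python
import math

def reliability(failure_rate_per_hour: float, mission_time_hours: float) -> float:
    """R(t) = exp(-lambda * t); valid only for a constant failure rate."""
    return math.exp(-failure_rate_per_hour * mission_time_hours)

def series_reliability(unit_reliabilities: list[float]) -> float:
    """Rsys = R1 * R2 * ... * Rn for units connected in series."""
    return math.prod(unit_reliabilities)

# Hypothetical example: three units with MTBFs of 2000 h, 5000 h and 10000 h,
# evaluated over a 10-hour mission.
mtbfs = [2000.0, 5000.0, 10000.0]
mission_time = 10.0
unit_r = [reliability(1.0 / m, mission_time) for m in mtbfs]
r_sys = series_reliability(unit_r)
print([round(r, 5) for r in unit_r], round(r_sys, 5))
```

Each R_i here is exponential, so Rsys equals exp(-(λ1 + λ2 + λ3) t). In other words, the failure rates of series units simply add.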
Mean Time Between Failures (MTBF) is the average time the equipment
performs its intended functions between failures, i.e., the productive time divided
by the number of failures during that time.
Mean Time to Repair (MTTR) is the average time to correct a failure and return the
equipment to a condition where it can perform the intended function. It is the sum
of all repair time (elapsed time) incurred during a specified period, including
equipment and process test time but excluding maintenance delay, divided by
the number of failures during that period.
Failure Rate is the ratio of the number of failures reported / experienced by a
device to the total equipment operating time. Failure rate is the reciprocal of the
mean time between failures and is typically measured in failures per million hours.
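The three definitions above reduce to simple ratios. The sketch below uses hypothetical field data from one reporting period: 12,000 operating hours and three failures with the listed repair times. It shows that the failure rate in failures per million hours is 10^6 divided by the MTBF.

```python
# Hypothetical field data (not from the source) for one reporting period.
operating_hours = 12_000.0          # productive / operating time
repair_hours = [2.5, 4.0, 1.5]      # elapsed repair time per failure, excluding maintenance delay

def mtbf(productive_time: float, failures: int) -> float:
    """Productive time divided by the number of failures."""
    return productive_time / failures

def mttr(repair_times: list[float]) -> float:
    """Total repair time divided by the number of failures."""
    return sum(repair_times) / len(repair_times)

def failure_rate_per_million_hours(failures: int, operating_time: float) -> float:
    """Failures per 10^6 operating hours (the reciprocal of MTBF, scaled)."""
    return failures * 1e6 / operating_time

n_failures = len(repair_hours)
mtbf_h = mtbf(operating_hours, n_failures)
mttr_h = mttr(repair_hours)
lam = failure_rate_per_million_hours(n_failures, operating_hours)
print(mtbf_h, round(mttr_h, 3), lam)
```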
Reliability Block Diagram (RBD)
[Figure: series reliability block diagram R1 → R2 → R3 → … → Rn]
$R_{sys} = R_1 \cdot R_2 \cdot R_3 \cdots R_n$
Electronic Reliability (Mature)
• Derating Guidelines
• Computerized Predictions
• Part Screening Techniques
• Reliability Design Guides
• EE Reliability Engineers
Mechanical Reliability (Maturing)
• Sensory
– See it
– Feel it
– Hear it
Human Reliability
• Operator error is inevitable
• Human error rates are high
• Documented rates are often wrong, often misapplied
• Error rate "controls" are available, e.g.:
– Training / retraining / drilling
– Using checklists / inspectors / backups
– Using Performance Shaping Factors
– Exploiting "stereotypical" behavior
• BUT: Person-to-person variability precludes "hard rules."
The field of Human Factors is a complex one with many aspects. Much has been studied.
Much has been documented. Certainties are few, save one, i.e., human operators WILL ERR!
It is easy for someone to ask "What is the reliability of your system?"
But it is very difficult for them to quantify the reliability figure they expect.
Reliability Engineering
[Figure: reliability engineering flow with the blocks Reliability Goal; Reliability apportionment at module level; BOM of each module; Parts Count Method; Reliability of components; FMECA; FTA; Reliability improvement by change / replacement]
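The flow from a reliability goal to module-level allocation and a parts-count check can be sketched in a few lines. This is a minimal illustration under stated assumptions. It assumes equal apportionment across modules in series (one common simple method, not necessarily the one used in the case study). It also assumes a parts-count estimate of the form Σ quantity × generic failure rate × quality factor, in the spirit of the MIL-HDBK-217 parts count method. The goal, mission time, and BOM rates are all hypothetical.

```python
import math

def equal_apportionment(r_goal: float, n_modules: int) -> float:
    """Equal apportionment for a series system: each module gets R_goal ** (1/n)."""
    return r_goal ** (1.0 / n_modules)

def parts_count_failure_rate(bom: list[tuple[int, float, float]]) -> float:
    """Parts-count estimate: sum of quantity * generic failure rate * quality factor.

    Each BOM entry is (quantity, generic failure rate per 10^6 h, quality factor).
    """
    return sum(qty * lam_g * pi_q for qty, lam_g, pi_q in bom)

# Hypothetical goal: system reliability 0.95 over a 10 h mission, 4 modules in series.
mission_time = 10.0
r_module_target = equal_apportionment(0.95, 4)
# Allowed module failure rate (per 10^6 h) implied by the apportioned target
lam_allowed = -math.log(r_module_target) / mission_time * 1e6

# Hypothetical BOM for one module: (quantity, generic rate, quality factor)
module_bom = [(40, 0.05, 1.0), (12, 0.2, 2.0), (3, 1.5, 1.0)]
lam_predicted = parts_count_failure_rate(module_bom)
print(round(r_module_target, 4), round(lam_allowed, 1), round(lam_predicted, 2))
print("meets allocation" if lam_predicted <= lam_allowed
      else "needs improvement (FMECA / FTA, change / replace parts)")
```

If a module's predicted rate exceeds its allocation, the figure's feedback path applies: FMECA / FTA identifies the weak components, which are then changed or replaced.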
Assigning Reliability – A case study of COMINT System
[Figure: COMINT system, showing the System Controller (within the MARS LRU) and the COMINT/OWS Simulator]
Product Assurance
Choice of components/parts
Component De-rating
The design shall be foolproof to prevent any intentional or unintentional
vandalism. This should be achieved by providing sufficient interlocking
mechanisms, both in hardware and in the user interfaces, to protect the product
from permanent failures during its use by untrained or unauthorized operators.
Why?
Philosophy
Overstress is a valid way to accelerate the discovery of deficiencies and
defects, but it is not a valid means of compressing test time when reliability is to be
measured. Ideal testing for reliability is seldom practical or cost-effective.
Categories of Reliability Testing
1. Reliability Qualification Testing (RQT): Carried out for reliability design qualification.
It is performed on pre-production or initial production hardware to determine design
compliance with the specified reliability requirements.
2. Production Reliability Acceptance Testing (PRAT): A periodic series of tests indicating
continuing production of acceptable equipment. It is used to assure individual item or lot
compliance with reliability requirements.
3. Reliability Compliance Testing (RCT): An experiment used to show whether or not the value
of a reliability characteristic of an item / component meets its stated reliability requirement.
This test is used as a condition of acceptance of the units by the customer.
4. Reliability Determination Testing (RDT): An experiment used to determine the value of a
reliability characteristic of an item / component. This test is normally used to provide
information where a specific reliability requirement has not been stated.
Reliability Testing – Parameters measured
1. Upper Test MTBF (θ0): An effective test plan will 'accept', with high probability,
equipment with a true MTBF that approaches or exceeds θ0.
2. Lower Test MTBF (θ1): An effective test plan will 'reject', with high probability,
equipment with a true MTBF that approaches or is less than θ1.
3. Discrimination ratio (d = θ0/θ1): Indicates the capacity of the test to discriminate
between good and bad equipment.
4. Producer's Risk (α): Probability of 'rejecting' equipment whose true MTBF equals the
Upper Test MTBF (θ0).
5. Consumer's Risk (β): Probability of 'accepting' equipment whose true MTBF equals the
Lower Test MTBF (θ1).
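To see how these five parameters interact, consider a time-terminated test. It runs for a total test time T and accepts the equipment if at most c failures occur. Under a constant failure rate, the number of failures is Poisson with mean T/θ. The minimal sketch below computes d, α, and β from that model. The plan values (θ0 = 2000 h, θ1 = 1000 h, T = 3000 h, c = 2) are hypothetical and are not taken from any standard test-plan table.

```python
import math

def prob_accept(true_mtbf: float, test_time: float, max_failures: int) -> float:
    """P(accept) for a time-terminated test that accepts if failures <= max_failures.

    Assumes a constant failure rate, so failures in test_time are Poisson
    with mean test_time / true_mtbf.
    """
    m = test_time / true_mtbf
    return sum(math.exp(-m) * m**k / math.factorial(k) for k in range(max_failures + 1))

# Hypothetical test plan (not a standard plan).
theta0, theta1 = 2000.0, 1000.0
T, c = 3000.0, 2
d = theta0 / theta1                      # discrimination ratio
alpha = 1.0 - prob_accept(theta0, T, c)  # producer's risk: reject good equipment
beta = prob_accept(theta1, T, c)         # consumer's risk: accept bad equipment
print(d, round(alpha, 3), round(beta, 3))
```

This short plan leaves a large consumer's risk. Lengthening T (with a correspondingly larger c) lowers both risks, which is why a small discrimination ratio generally demands long test times.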
Important: Reliability tests are either failure terminated or time terminated.
Failure terminated tests are performed on a sample (of that particular production
batch) and give realistic figures for the reliability test parameters. Time terminated
tests are performed over a large population and present less realistic figures for
the reliability test parameters. It is important to know whether the reliability test
parameters presented by a manufacturer are derived from time terminated or
failure terminated tests.
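One concrete place where the test type matters is the confidence bound on MTBF. Assume a constant failure rate, r failures, and T total test hours. The common chi-square method gives a one-sided lower confidence limit θ_L = 2T / χ²(confidence; ν). Here ν = 2r for a failure-terminated test and ν = 2r + 2 for a time-terminated test. The sketch below implements that method in pure Python, using the closed-form chi-square CDF for even degrees of freedom and bisection. The 2-failure, 1000-hour data are hypothetical.

```python
import math

def chi2_cdf_even_dof(x: float, dof: int) -> float:
    """Chi-square CDF for even degrees of freedom (closed form via a Poisson sum)."""
    k = dof // 2
    half = x / 2.0
    return 1.0 - math.exp(-half) * sum(half**j / math.factorial(j) for j in range(k))

def chi2_quantile_even_dof(p: float, dof: int) -> float:
    """Invert the CDF by bisection; adequate for the small dof used here."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_dof(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def mtbf_lower_bound(total_time: float, failures: int, confidence: float,
                     time_terminated: bool) -> float:
    """One-sided lower confidence limit on MTBF (constant failure rate assumed)."""
    if failures < 1 and not time_terminated:
        raise ValueError("a failure-terminated test needs at least one failure")
    dof = 2 * failures + (2 if time_terminated else 0)
    return 2.0 * total_time / chi2_quantile_even_dof(confidence, dof)

# Hypothetical data: 2 failures in 1000 h of accumulated test time, 90 % confidence.
T, r = 1000.0, 2
point_estimate = T / r
lower_ft = mtbf_lower_bound(T, r, 0.90, time_terminated=False)
lower_tt = mtbf_lower_bound(T, r, 0.90, time_terminated=True)
print(point_estimate, round(lower_ft, 1), round(lower_tt, 1))
```

The same data yield the same point estimate but a lower, more conservative bound when the test is treated as time terminated. A quoted MTBF figure is therefore incomplete without the test type.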
Defect Management – An Example & Case Study
Problem Definition: Electronic packaging of modules with different rated operating
temperatures. One SRU is specified for –10 °C to +50 °C, whereas all other SRUs comply
with the specified operating temperature range of –20 °C to +55 °C.
Design Approach: Use appropriate heaters and fans to keep the environment inside the
LRU within the operating limits of all SRUs.
Initial Design logic: 1. Fans 'on' by default with power 'on'. 2. Heater 'on' when the inside
temperature is between –20 °C and –10 °C.
Outcome: The unit failed in low-temperature testing at –20 °C. The inside temperature of the
unit did not rise because the fans were continuously 'on'.
Design logic-2: 1. Heater 'on' and fans 'off' when the inside temperature is between –20 °C
and +10 °C. 2. Heater 'off' and fans 'on' when the inside temperature is above +10 °C.
Outcome: The unit passed the low/high temperature tests but failed in the high-altitude
(11,800 m at –20 °C) tests. The defect found was the burn-out of one power supply module inside
the LRU. It was not a random failure, as had been assumed in the first two iterations. The problem
was traced to ineffective cooling by the fans, caused by air stagnation at high altitude.
This resulted in excessive heating by the heaters (switched on at –20 °C) and, in turn, a
thermal runaway of the Zener diode in the power supply module.
Design logic-3: 1. A larger number of heaters and fans, each of lower rating, were
employed. 2. Different sets of heaters and fans are made to act in different
temperature ranges, viz., –20 °C to –15 °C, –15 °C to –10 °C, –10 °C to –5 °C, –5 °C to 0 °C,
0 °C to +5 °C, and +5 °C to +10 °C. 3. Heaters 'off' and fans 'on' when the inside temperature is
above +10 °C.
Outcome: The unit worked defect-free in its rated operating temperature range.
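Design logic-3 amounts to a band-based selection of heater / fan sets. The sketch below is hypothetical. The band limits and the "heaters off, fans on above +10 °C" rule follow the text. The one-set-per-band mapping and the handling of temperatures below –20 °C are assumptions made only for illustration.

```python
from typing import Optional, Tuple

# Temperature bands (degC) from Design logic-3 in the text.
BANDS_C = [(-20, -15), (-15, -10), (-10, -5), (-5, 0), (0, 5), (5, 10)]

def control_state(inside_temp_c: float) -> Tuple[str, Optional[int]]:
    """Return the mode and the index of the heater/fan set engaged (hypothetical mapping)."""
    if inside_temp_c > 10:
        # Above +10 degC: heaters off, fans on
        return ("fans_only", None)
    for index, (low, high) in enumerate(BANDS_C):
        if low <= inside_temp_c <= high:
            # Assumption: band k engages heater/fan set k; boundaries go to the colder band
            return ("heater_fan_set", index)
    # Below -20 degC lies outside the rated range; keep the coldest-band set (assumption)
    return ("heater_fan_set", 0)

for t in (-18.0, -7.0, 3.0, 12.0):
    print(t, control_state(t))
```

A practical controller would probably add hysteresis at each band boundary so that sets do not toggle repeatedly when the inside temperature hovers near a limit.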
[Figures: Hardware components of the Monitoring, Analysis and Recording Subsystem (Analysis & Recording Module, Monitoring Receivers, CSM Controller, Flash Memory, MARS + CC LRU); CSM AEW&C Operational Requirements: mechanical details assembly-wise, MARS+CC enclosure, Monitoring (Narrow Band)]
Non-Conformance Management
The following committees / boards shall be constituted for effective non-conformance
management:
Material Review Board : A Material Review Board (MRB) shall give dispositions for
various non-conformances which could be deviations in parts, materials, fabrication
process, plating / painting process, dimensional deviations in mechanical
components, PCB etc.
Test & Evaluation Committee: Apart from activities related to test and evaluation,
such as generation of the test plan and review of test results, the T&E committee shall also look
into non-conformities arising from module / subsystem testing.
Configuration Change Control Management: Handled by the Configuration Management Review
Board (CMRB). This board also reviews and approves the test document of each
subsystem. Any change in design subsequent to drawing approval shall go through the
Design Change Control Procedure.
System Review Board (SRB): The SRB shall critically examine any configuration changes
proposed / required and dispose of them appropriately at the system level.
Any issue related to intra-system and inter-system interfaces shall be referred to the SRB
at the system level.
Waivers / Dispositions: Waivers in respect of any specification shall not be
allowed. However, if such waivers / dispositions are inevitable, they shall be approved
by the Project Director / Programme Director.
Design Approvals and Type Certification
1. Preliminary Design Review
Stage of interaction: System level, after the project sanction.
Purpose and outcome: To ensure that QA aspects are properly and adequately reflected in the
system-level design. Design go-ahead is given for the project.
2. Detailed Design Review
Stage of interaction: Subsystem level, after the preliminary design is cleared.
Purpose and outcome: To ensure that QA aspects are properly and adequately reflected in the
subsystem-level design. Design go-ahead is given at the subsystem level.
3. Critical Design Review
Stage of interaction: Carried out on the Engg. Model with hardware developed, integrated and
tested, meeting the Operational Requirements.
Purpose and outcome: To ensure that QA aspects are properly and adequately reflected in the
finalized design.
4. Type Certification
Stage of interaction: Full-range qualification tests on the qualification model with the
involvement of all concerned.
Purpose and outcome: Certifies the design for realization / production of subsequent units for
operational use in the project.
5. Provisional Clearance
Stage of interaction: Full-range acceptance tests on the flight model with the involvement of
all concerned.
Purpose and outcome: Clearance is given for the usability of the SRU / LRU in the application /
platform specified for the project.
6. Production Clearance
Stage of interaction: The UUT has successfully completed its field trials and is proven for
operation in the intended environment.
Purpose and outcome: Clearance is given for the usability of the SRU / LRU for bulk production
by the PSU and induction into service.
Conclusion
In most projects, we are quick to prove the proto model at the earliest
but slow down when slating it for developmental flight / user trials.
The delays arise purely in our own field trials, for want of reliable working of the
system. Somehow, we are under the impression that reliability either
comes automatically from heaven or is the responsibility of the manufacturer or
production agency and can be added at a later stage. While emphasizing the
need to deliver reliable products, it is important to realise that reliability needs to
be built into the design and not into the manufacturing process.
Quality & Reliability shall be built in right from component level to system
level. They need to be addressed at both hardware and software levels.
There is no dearth of Reliability Prediction / Estimation Tools in the marketplace;
they need to be selected judiciously and utilised optimally.
Our aim should be to realize products in the very first attempt without
any time and cost overruns, and to deliver them to a customer who is not only satisfied
but delighted with their performance, quality and reliability.
Thank You