ASSIGNMENT 1

The document discusses various software failures across different industries, highlighting the responsibilities, causes, and programming languages involved in each incident. Key cases include the Therac-25 radiation overdoses, Ariane 5 rocket explosion, Knight Capital financial collapse, and others, emphasizing the shared responsibility among engineers, management, and organizations. Each case illustrates the critical importance of thorough testing, proper design, and effective error handling in software development.

Uploaded by

tilay1921

MEKELLE UNIVERSITY

SCHOOL OF COMPUTING EIT-M


SOFTWARE TESTING AND QUALITY ASSURANCE

TILAHUN ……………….EITM/UR170315/12

Q1 The Therac-25 radiation therapy overdoses are a tragic case of software-related
errors in a critical medical device.

Responsibility

No single engineer was directly named as responsible for the failures, as
responsibility was broadly shared among the manufacturer (Atomic Energy of
Canada Limited, AECL) and the software development team.

The root causes of the incident included:

1. Inadequate testing: AECL did not test the software thoroughly for real-world, edge-case
scenarios.
2. Poor design and error handling: The software had no failsafe mechanisms for detecting and
responding to errors.
3. Failure to apply lessons learned: Software was reused from earlier machines (Therac-6 and
Therac-20), where hardware interlocks had masked similar faults, but those lessons were not
carried over to the design of the Therac-25.

Cause of the Failure

1. Concurrency Issue: A race condition in the software allowed incorrect dose settings. Rapid
keypresses by operators could cause the machine to switch modes incorrectly, leading to
massive radiation overdoses.
2. Lack of Safeguards: The reliance on software-only safety mechanisms (instead of hardware
interlocks) meant there were no physical backups to catch software errors.
3. Inadequate Error Messages: Errors were displayed as cryptic codes, which operators
misunderstood as minor issues, leading to repeated overdoses.
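
The race condition in item 1 can be sketched as a check-then-act bug. The example below is a simplified, deterministic model in Python, not AECL's code (the original was PDP-11 assembly with concurrent tasks sharing variables). All names (Console, Machine, start_setup, and so on) are hypothetical. It shows how an operator edit that lands between two setup steps can leave the beam at high power with no target in place, and how re-checking the input before committing avoids that state.

```python
from dataclasses import dataclass


@dataclass
class Console:
    """What the operator has typed (hypothetical model)."""
    mode: str = "xray"


@dataclass
class Machine:
    """Hardware state configured from the console (hypothetical model)."""
    beam_power: str = "low"
    target_in_place: bool = False


def start_setup(console: Console, machine: Machine) -> str:
    """Step 1: read the mode once and set beam power from that snapshot."""
    snapshot = console.mode
    machine.beam_power = "high" if snapshot == "xray" else "low"
    return snapshot


def finish_setup_unsafe(console: Console, machine: Machine) -> None:
    """Step 2 (buggy): position the target from the current console mode,
    without checking that it still matches the mode used in step 1."""
    machine.target_in_place = console.mode == "xray"


def finish_setup_safe(console: Console, machine: Machine, snapshot: str) -> bool:
    """Step 2 (guarded): refuse to finish if the operator edited the mode."""
    if console.mode != snapshot:
        machine.beam_power = "low"  # fall back to a safe state
        return False                # caller must restart setup
    machine.target_in_place = snapshot == "xray"
    return True


def is_hazardous(machine: Machine) -> bool:
    """High-power beam with no target in the path is the overdose state."""
    return machine.beam_power == "high" and not machine.target_in_place


if __name__ == "__main__":
    console, machine = Console("xray"), Machine()
    start_setup(console, machine)
    console.mode = "electron"          # quick edit between the two steps
    finish_setup_unsafe(console, machine)
    print("hazardous:", is_hazardous(machine))
```

The guarded version treats any edit during setup as a reason to start over, which is one simple way to close the window. A hardware interlock would be an independent second line of defense.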

Programming Language

The control software for the Therac-25 was written in Assembly language,
specifically for the DEC PDP-11 minicomputer. Assembly is a low-level language
that is highly efficient but prone to human error due to its complexity and lack of
abstraction.

Q2 The Ariane 5 rocket explosion in 1996 was a costly and dramatic failure caused
by a software bug.

Responsibility

No single engineer was held personally responsible for the failure. Instead, the blame
was shared among:

1. Ariane 5 project managers who approved the reuse of software without ensuring it was
adapted for Ariane 5’s operational parameters.
2. Software engineers and developers who failed to consider the potential incompatibilities
between Ariane 4 and Ariane 5.
3. Testing teams who overlooked the need for testing under the conditions unique to Ariane 5's
higher speeds and accelerations.

Ultimately, the European Space Agency (ESA) and the rocket's manufacturer,
Aérospatiale, bore the organizational responsibility.

Cause of the Failure

1. Reused Software: The inertial reference system (SRI) used in Ariane 5 relied on software
originally written for Ariane 4.
2. Data Conversion Error: A 64-bit floating-point number (related to horizontal velocity) was
converted to a 16-bit signed integer, but Ariane 5’s higher horizontal velocity produced a value
beyond the integer’s maximum range. This caused an arithmetic overflow error.
3. Unnecessary Functionality: The failed calculation came from a function that was no longer
relevant to Ariane 5 but had not been removed from the software.
4. Inadequate Testing: The software wasn’t tested under Ariane 5’s unique flight conditions, as
engineers believed the software was already reliable from its use in Ariane 4.

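This chain of errors shut down the inertial reference system, which then sent diagnostic
data that the flight computer interpreted as flight data. The resulting abrupt course
correction began to break the rocket apart, triggering the onboard self-destruction
mechanism.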
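
The data conversion error can be illustrated with a short sketch. This is Python rather than the original Ada, the input values are made up, and the function names are hypothetical. It contrasts a conversion that raises on overflow with one that handles the overflow by clamping to the 16-bit range, which is one possible recovery strategy and not a claim about what the flight software should have done.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a signed 16-bit integer, raising if the value does not fit."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return n


def to_int16_saturating(value: float) -> int:
    """Convert to a signed 16-bit integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


def convert_with_fallback(value: float) -> int:
    """Try the checked conversion and clamp instead of failing outright."""
    try:
        return to_int16_checked(value)
    except OverflowError:
        # Hypothetical recovery policy: degrade gracefully, keep running.
        return to_int16_saturating(value)


if __name__ == "__main__":
    for sample in (1000.0, 40000.0):  # made-up values, not flight data
        print(sample, "->", convert_with_fallback(sample))
```

An unhandled OverflowError here plays the role of the unhandled exception in the SRI: the conversion itself detects the problem, but what happens next depends entirely on whether a handler exists.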

Programming Language

The software for the inertial reference system (SRI) was written in Ada, a
programming language commonly used in aerospace and other high-assurance
systems. Ada is known for its robustness and strong typing, but even with these
safeguards, the overflow error occurred because:

 The specific range of values that Ariane 5 would experience wasn’t accounted for.
 Overflow checks were intentionally disabled to save computing time, which turned out to be
a critical mistake.

Q3 Knight Capital Financial Collapse (2012) and the Patriot Missile Failure (1991
Gulf War): responsibility, causes, and programming languages used

Knight Capital Financial Collapse (2012)

Responsibility

 No single engineer was held directly accountable. The failure was attributed to organizational
lapses in deployment practices and inadequate testing.
 The responsibility lay with Knight Capital’s technical team, particularly those overseeing
software deployment and change management. Leadership failed to ensure robust
safeguards against deployment errors.

Cause

1. Software Deployment Error:

o A feature flag that had once controlled a retired function called "Power Peg" was reused
for new code. Because of an incomplete software update, setting the flag activated the old
Power Peg code, and the system started generating unintended, erroneous trades.
o Specifically, the deployment process failed to update software on one of the eight
servers, creating an inconsistency.
2. Lack of Testing:

o The new trading algorithm wasn’t properly tested in production-like environments.

3. No Failsafe Mechanism:

o There were no effective systems in place to halt the runaway trades quickly once
the problem was detected.
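
The deployment inconsistency described above is the kind of problem a pre-activation check can catch. The following is a minimal sketch under stated assumptions: the server names, version strings, and function names are hypothetical, and nothing here reflects Knight Capital's actual tooling. It refuses to enable a feature unless every server reports the expected build.

```python
def find_stale_servers(deployed: dict[str, str], expected: str) -> list[str]:
    """Return the servers whose reported version differs from the expected one."""
    return sorted(name for name, version in deployed.items() if version != expected)


def safe_to_enable(deployed: dict[str, str], expected: str) -> bool:
    """A feature flag should only be switched on when no server is stale."""
    return not find_stale_servers(deployed, expected)


if __name__ == "__main__":
    # Eight hypothetical servers; one missed the update.
    fleet = {f"server{i}": "2.0" for i in range(1, 9)}
    fleet["server8"] = "1.9"
    print("stale:", find_stale_servers(fleet, "2.0"))
    print("safe to enable:", safe_to_enable(fleet, "2.0"))
```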

Programming Language

 The exact programming language used by Knight Capital was not disclosed, but trading
systems are often built with C++, Java, or Python for performance and integration with
financial systems.

Q3 Patriot Missile Failure (1991 Gulf War)

Responsibility

 The failure was attributed to systemic oversight in software design and deployment:
o The U.S. Army and Raytheon, the manufacturer of the Patriot missile system, were
held accountable for not addressing known limitations of the system’s time-tracking
mechanism.
o There was a failure to update the system to handle extended operation times in
combat scenarios.

Cause

1. Software Clock Drift:

o The system counted time in tenths of a second and converted it to seconds using a 24-bit
fixed-point register, which cannot represent 0.1 exactly. After running for extended
periods (about 100 hours), the truncation error accumulated to roughly a third of a
second, leading to a trajectory miscalculation.
o The error caused the system to misjudge the position of the incoming Scud missile by
about 600 meters.

2. Inadequate Updates:

o The Patriot system was originally designed for short-term deployments, and its
software wasn’t updated to handle prolonged use in the Gulf War.
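
The accumulated clock error can be reproduced with a few lines of arithmetic. The sketch below rests on two assumptions taken from commonly cited analyses rather than from this document: that 0.1 was chopped to 23 bits after the binary point, and that a Scud travels at about 1,676 m/s. The function names are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23           # assumption: bits kept after the binary point
SCUD_SPEED_M_PER_S = 1676    # assumption: approximate Scud velocity
TICKS_PER_HOUR = 36000       # the clock counts tenths of a second


def chopped_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """The value actually stored when 0.1 is truncated to `bits` binary places."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours_up: int, bits: int = FRACTION_BITS) -> float:
    """Total timing error after `hours_up` hours of continuous operation."""
    error_per_tick = Fraction(1, 10) - chopped_tenth(bits)
    return float(hours_up * TICKS_PER_HOUR * error_per_tick)


def range_error_meters(hours_up: int) -> float:
    """Distance the target moves during the accumulated timing error."""
    return clock_drift_seconds(hours_up) * SCUD_SPEED_M_PER_S


if __name__ == "__main__":
    print(f"drift after 100 h: {clock_drift_seconds(100):.4f} s")
    print(f"range error:       {range_error_meters(100):.0f} m")
```

Under these assumptions the drift after 100 hours comes to about 0.34 seconds, which corresponds to a tracking error of several hundred meters.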

Programming Language

 The Patriot missile system was written in Ada, a language widely used in military and
aerospace applications for its reliability and support for real-time systems.

Q4 Toyota Unintended Acceleration (2000s)

Responsibility

 No individual engineer was specifically held accountable. Responsibility fell on:

o Toyota's engineering team: For failing to adequately address critical flaws in the
Electronic Throttle Control System (ETCS).
o Toyota management: For insufficient oversight and lack of transparency when
issues surfaced.

 The software defects themselves were uncovered during the investigations by an
independent analysis from Barr Group.

Cause

1. Software Bug in ETCS:

o The ETCS had stack overflow vulnerabilities, which caused memory corruption and
erratic behavior in the throttle control.
o It was poorly designed and lacked defensive coding practices. A single-point failure
could lead to unintended acceleration.

2. Inadequate Fail-Safes:

o Toyota’s software lacked sufficient fail-safes to detect and mitigate software errors.
For example, watchdog timers or redundancy in the throttle system were either
ineffective or absent.

3. Poor Code Quality:

o Software was found to have over 10,000 global variables, making it prone to bugs
and difficult to debug.
o Critical tasks in the system were not protected from interference by non-critical
tasks.

Programming Language

 The ETCS software was written in C, which is commonly used for embedded systems. The
lack of proper safeguards and testing in a safety-critical application made it vulnerable to
catastrophic failures.

Q5 Stuxnet Malware Attack (2010)

Responsibility

 Stuxnet was a highly sophisticated cyber weapon believed to have been developed jointly by
intelligence agencies of the United States (NSA) and Israel (Unit 8200). It targeted Iran’s
nuclear program.
 It was crafted by a team of highly skilled cybersecurity engineers and developers, but no
individuals have been officially named.

Cause

1. Targeted Malware:

o Stuxnet exploited four zero-day vulnerabilities in Microsoft Windows, along with
vulnerabilities in Siemens PLC (Programmable Logic Controller) systems.
o It specifically targeted Siemens Step7 software to manipulate centrifuge speeds in
Iran’s Natanz uranium enrichment facility.

2. Malware Functionality:

o Stuxnet subtly altered centrifuge speeds (spinning them too fast or too slow) while
sending false operational data to monitoring systems, making the sabotage difficult
to detect.

3. Highly Specific Design:

o The malware was custom-designed to exploit specific vulnerabilities in Siemens
industrial control systems used by the Iranian facility.

Programming Language

 Stuxnet was primarily written in a combination of:

o C and C++: For its payload and core malware logic.
o Assembly language: For low-level code to interface with PLCs and exploit
vulnerabilities in the Step7 software.
o Custom scripts: Likely used for Windows exploitation and deployment.

Q6. Heathrow Terminal 5 Baggage System Failure (2008)

Responsibility

 Responsibility lay with BAA (British Airports Authority), the system implementers, and
Siemens, which developed the baggage handling system.
 The software engineers and project managers overseeing the system were blamed for poor
planning, insufficient testing, and an overly ambitious rollout.
 British Airways (BA), the airline operating at Terminal 5, was also criticized for failing to
adequately prepare for operational challenges.

Cause

1. Software Glitch:

o The root cause was a series of errors in the automated baggage handling system
software, which failed to correctly track and route luggage.
o The glitches caused mismatched tags, leading to lost and misrouted baggage.

2. System Overload:

o The system could not handle the real-world load of passengers and baggage, leading
to cascading failures.

3. Inadequate Testing:

o The system was not tested thoroughly under realistic operational conditions before
launch.

4. Poor Integration:

o Communication issues between baggage conveyors, scanners, and the software
exacerbated the failure.

Programming Language

 The baggage handling system was developed using Java, C, and C++, which are commonly
used for high-performance real-time systems. Specific details of the software
implementation are not fully public.

Q7. AT&T Network Outage (1990)

Responsibility

 The failure was caused by a bug in AT&T's long-distance switching software.


 The AT&T software engineering team responsible for updating the switching system was
held accountable for introducing the bug.
 AT&T management was criticized for rolling out the software without adequate fail-safes or
testing.

Cause

1. Subtle Software Bug:

o The issue stemmed from a single line of faulty code in the switching software used
in AT&T’s network.
o A break statement was placed incorrectly in the code handling the system recovery
process. When switches experienced high traffic, this bug caused cascading reboots
across the network, leading to a total collapse.

2. Cascading Failures:
o The bug caused switches to repeatedly fail and reboot, flooding the network with
error messages and preventing recovery.

3. Insufficient Testing:

o The updated software wasn’t rigorously tested under high-load conditions, which
might have exposed the bug.
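
The misplaced break can be illustrated by analogy. The original code was C, where the break sat inside an if nested in a switch and exited more than the author intended. The sketch below is a Python analog with hypothetical message fields and function names: a break meant to skip one message instead abandons the rest of the batch, and the fix is to use continue.

```python
def process_messages_buggy(messages: list[dict]) -> list[int]:
    """Handle a batch of messages; the break exits the whole loop by mistake."""
    handled = []
    for msg in messages:
        if msg["type"] == "status":
            if msg.get("duplicate"):
                break  # intended to skip this message only
            handled.append(msg["id"])
        else:
            handled.append(msg["id"])
    return handled


def process_messages_fixed(messages: list[dict]) -> list[int]:
    """Same logic, but a duplicate is skipped and processing carries on."""
    handled = []
    for msg in messages:
        if msg["type"] == "status" and msg.get("duplicate"):
            continue  # skip only the duplicate
        handled.append(msg["id"])
    return handled


if __name__ == "__main__":
    batch = [
        {"id": 1, "type": "status"},
        {"id": 2, "type": "status", "duplicate": True},
        {"id": 3, "type": "call"},
    ]
    print("buggy:", process_messages_buggy(batch))
    print("fixed:", process_messages_fixed(batch))
```

A test that feeds the handler a duplicate in the middle of a batch exposes the difference immediately, which is the kind of load-dependent case the cause list says was not exercised.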

Programming Language

 The AT&T switching software was written in C, which is widely used in telecommunications
for its performance and control over hardware.

Q8. Intel Pentium Floating-Point Bug (1994)

Responsibility

 The Intel hardware design team was responsible for the issue. It arose during the
development of the floating-point division algorithm in the Pentium processor.
 Intel’s management exacerbated the problem by initially downplaying its significance, which
led to a public relations disaster.

Cause

1. Flawed Lookup Table in the Floating-Point Unit (FPU):

o The bug stemmed from missing entries in the lookup table used by the floating-
point division (FDIV) algorithm. Five entries in this table were omitted due to a
human error during the design phase.
o As a result, certain division calculations involving specific ranges of numbers
produced incorrect results, with errors up to 0.006%.

2. Delayed Acknowledgment:

o Intel initially claimed that the error would only affect users performing complex
calculations, alienating the scientific and engineering community. Public outrage
forced Intel to offer free replacements and recall affected processors.
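
The size of the error can be checked with the division most often quoted in reports of the bug. The operands and the flawed quotient below come from those public reports, not from this document, and the function names are hypothetical. On correct hardware the residual is effectively zero; the flawed chips were reported to return a quotient that is off in the fifth significant digit.

```python
# Operands from the widely reported FDIV test case.
X = 4195835.0
Y = 3145727.0

# Quotient reported from flawed Pentium chips (quoted, not computed here).
FLAWED_QUOTIENT = 1.333739068902


def fdiv_residual(x: float = X, y: float = Y) -> float:
    """x - (x / y) * y, which should be (almost exactly) zero."""
    return x - (x / y) * y


def has_fdiv_bug() -> bool:
    """True if this machine's division shows the reported large residual."""
    return abs(fdiv_residual()) > 1.0


def relative_error(observed: float, x: float = X, y: float = Y) -> float:
    """Relative difference between an observed quotient and x / y."""
    exact = x / y
    return abs(exact - observed) / exact


if __name__ == "__main__":
    print("bug present on this machine:", has_fdiv_bug())
    print(f"relative error of flawed result: {relative_error(FLAWED_QUOTIENT):.2e}")
```

The relative error of the reported flawed quotient works out to roughly 0.006%, matching the figure given above.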

Programming Language

 The defect was in a lookup table built into the processor's floating-point hardware, so no
programming language in the usual sense was involved.
 The error was related to the hardware implementation of the division algorithm rather than
high-level software.

Q9. Mars Climate Orbiter Loss (1999)


Responsibility

 Responsibility lay with both Lockheed Martin, which built the spacecraft and provided the
software, and NASA’s navigation team, which failed to detect and correct the mismatch
during testing and operations.
 A lack of communication and system integration checks between the teams contributed to
the failure.

Cause

1. Unit Conversion Error:

o Ground software developed by Lockheed Martin produced thruster impulse data in
imperial units (pound-force seconds). NASA’s navigation software expected the data in
metric units (newton-seconds).
o This mismatch caused the trajectory to be miscalculated, leading the spacecraft to
enter the Martian atmosphere at far too low an altitude, where it was lost.

2. Lack of System Integration Testing:

o Despite multiple opportunities, the error was not caught during pre-launch
simulations or operations because the system was not tested end-to-end with
realistic inputs.
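
One way to make this class of error harder to commit is to carry the unit in the type instead of passing bare numbers between teams. The sketch below is illustrative only: the Impulse class and its methods are hypothetical and are not drawn from either team's software. Every value is stored in newton-seconds, and callers must state which unit they are supplying.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that is always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        return cls(value * LBF_S_TO_N_S)


def total_impulse(events: list[Impulse]) -> float:
    """Sum a series of thruster events, in newton-seconds."""
    return sum(event.newton_seconds for event in events)


if __name__ == "__main__":
    raw = 10.0  # a made-up reading whose unit must be stated explicitly
    as_metric = Impulse.from_newton_seconds(raw)
    as_imperial = Impulse.from_pound_force_seconds(raw)
    print("read as N·s:  ", as_metric.newton_seconds)
    print("read as lbf·s:", as_imperial.newton_seconds)
```

Reading the same number under the wrong unit changes the result by a factor of about 4.45, which is why an end-to-end test with realistic inputs would have been likely to expose the mismatch.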

Programming Language

 The ground navigation software and onboard spacecraft software were likely written in a
combination of:

o C and C++: Commonly used for flight control and navigation systems at NASA.
o FORTRAN: Historically used in scientific computing and space missions during that
era.

Q10. Microsoft Global Outage

Responsibility

 The outage was triggered by a faulty update from CrowdStrike, a third-party cybersecurity
company. While CrowdStrike deployed the update, Microsoft shared responsibility for
insufficient testing and lack of safeguards to prevent widespread disruption on its Windows
systems.
 Both CrowdStrike engineers (responsible for the update) and Microsoft’s quality assurance
and response teams (for insufficient error-handling mechanisms) were implicated.

Cause

1. Faulty Update:
o The CrowdStrike update delivered a faulty configuration file to its Falcon sensor, which
runs inside the Windows kernel. Reading the file caused an out-of-bounds memory access
that crashed Windows, triggering the Blue Screen of Death (BSOD) on millions of devices.

2. Widespread Disruption:

o Devices worldwide became non-operational, affecting critical services like hospitals,
airlines, and emergency services.

3. Lack of Testing:

o The update wasn’t adequately tested under real-world scenarios or with various
configurations before being pushed globally.

4. Automatic Updates:

o The automatic application of updates on devices amplified the scope of the impact,
as there were no effective rollback mechanisms in place.
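
A staged rollout is one common safeguard against the "pushed globally" failure mode described above. The sketch below is a generic illustration, not a description of how CrowdStrike or Microsoft deploy updates. The function name, stage sizes, and host names are all hypothetical. The update goes to a small batch first and the rollout halts as soon as failures exceed a threshold, leaving the remaining hosts untouched.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],
    stage_sizes: Sequence[int] = (1, 10, 100),
    max_failures: int = 0,
) -> dict:
    """Apply an update in growing batches, halting if too many hosts fail."""
    updated: list[str] = []
    failed: list[str] = []
    position = 0
    # After the listed stages, one final stage covers whatever is left.
    for size in list(stage_sizes) + [len(hosts)]:
        batch = hosts[position:position + size]
        if not batch:
            break
        position += len(batch)
        for host in batch:
            (updated if apply_update(host) else failed).append(host)
        if len(failed) > max_failures:
            return {
                "status": "halted",
                "updated": updated,
                "failed": failed,
                "untouched": list(hosts[position:]),
            }
    return {"status": "complete", "updated": updated, "failed": failed, "untouched": []}


if __name__ == "__main__":
    fleet = [f"host{i}" for i in range(50)]
    result = staged_rollout(fleet, apply_update=lambda host: False)  # bad update
    print(result["status"], "after", len(result["failed"]), "failure(s);",
          len(result["untouched"]), "hosts untouched")
```

With a bad update, only the first batch is affected. A real system would also need a way to roll the failed hosts back, which this sketch does not attempt.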

Programming Language

 CrowdStrike software:

o Likely developed using C++, Python, or Go for cybersecurity tools, given their
widespread use for performance, system-level operations, and malware analysis.

 Windows Core OS Components:

o Written largely in C and C++, the primary languages for Microsoft’s operating
systems.

Q11. Change Healthcare Ransomware Attack

Responsibility

 Responsibility for this attack primarily lies with the cybercriminals behind the ransomware.
However, Change Healthcare’s IT and security teams share responsibility for failing to
implement sufficient cybersecurity defenses.
 Ransomware attacks often exploit known vulnerabilities or weak defenses in IT systems.

Cause

1. Ransomware Infection:

o Hackers deployed ransomware to encrypt critical systems used by Change
Healthcare, disrupting claims processing and payments across U.S. pharmacies and
hospitals.
o Sensitive medical data of millions of Americans was exfiltrated during the attack.

2. Cybersecurity Gaps:
o Potential causes include:

 Unpatched software vulnerabilities exploited by attackers.


 Phishing or social engineering targeting employees.
 Insufficient monitoring or failure to detect and respond to early signs of
intrusion.

3. Ransom Payment:

o Change Healthcare reportedly paid a ransom to recover encrypted systems,
highlighting a lack of backup strategies and contingency plans.

Programming Language

 Ransomware:

o Typically written in C, C++, or Python for payload delivery and encryption
functionality.
o The ransomware may have used PowerShell scripts or Assembly for low-level
operations.

 Change Healthcare Systems:

o Likely built using Java, C++, and SQL for enterprise-grade backend processing, along
with JavaScript or .NET for web interfaces.

Q12: The Ivanti VPN Vulnerability

Responsibility

 Ivanti's software development team was responsible for the flaw in its Connect Secure VPN
product (formerly Pulse Secure). Federal agencies also bore some blame for delayed updates.

Cause

 Authentication Bypass Vulnerability: Attackers exploited a flaw allowing unauthorized
access to sensitive networks. This highlighted poor code validation and lack of robust
security testing.

Programming Language

 Likely written in C/C++ for VPN core functionality and Java or Python for management
interfaces.

Q13. Tesla Memory Failure

Responsibility
 Tesla’s firmware and hardware engineering team faced criticism for using eMMC memory
chips with limited write cycles in the Media Control Unit (MCU).

Cause

 Wear-out of memory chips storing logs, causing failure of safety-critical systems like turn
signals and rearview cameras. This revealed inadequate hardware lifecycle testing.

Programming Language

 C/C++ for embedded systems, with Python or JavaScript for vehicle UI features.

Both failures underscore the need for proactive testing and security-first designs in
critical systems.

Q14: XZ Utils Supply Chain Compromise

Responsibility

 Malicious actors inserted the code, but XZ Utils maintainers were responsible for
governance and reviewing contributions.

Cause

 Code Injection: A contributor who had gained maintainer access over time inserted a hidden
backdoor into the release, highlighting insufficient vetting of contributions.

Programming Language

 Written in C, the common language for compression libraries in Linux systems.

Q15 Amazon AWS Outage

Responsibility

 The AWS configuration team caused the issue during a routine update, revealing weaknesses
in operational safeguards.

Cause

 Misconfigured Resource: A single error in resource scaling led to cascading service
disruptions due to centralized dependencies.

Programming Language

 Core AWS systems are primarily written in Java, Python, and C++ for backend infrastructure.

These failures stress the need for supply chain security and fail-safe designs in
cloud architectures.

Q16. Critical SOHO Router Attacks

Responsibility

 A Chinese threat group exploited the vulnerability, but router manufacturers (e.g., Linksys,
Netgear) and networking software engineers were responsible for inadequate security
hardening.

Cause

 Exploitable Vulnerabilities: Weak default passwords and unpatched firmware allowed
attackers to create a botnet targeting critical infrastructure.

Programming Language

 Likely written in C/C++ for low-level router firmware, with some higher-level scripting (e.g.,
Python) for attack automation.

Q17. Zoom Security Lapses

Responsibility

 Zoom’s engineering team failed to implement strong security measures, particularly around
user privacy and meeting access.

Cause

 Inadequate Security Features: Lack of end-to-end encryption and weak authentication led to
unauthorized access, including Zoombombing incidents.

Programming Language

 C/C++ for core app development, with JavaScript for web-based components, and Python
for backend infrastructure.

Q18. Boeing CST-100 Starliner Issues

Responsibility

 Boeing's engineering team and software developers were responsible for the technical
glitches, particularly in the mission control system.

Cause

 Software Bugs: Coding errors and inadequate testing led to failures in the spacecraft’s
software system, delaying mission readiness and raising concerns about Boeing’s quality
control.

Programming Language

 C/C++ for embedded systems and flight software, with Java or Python for testing and
simulation tools.

Q19. Virtual Health Vaccine Scheduling Glitch

Responsibility

 The development team behind the scheduling software and the system testers were
responsible for the glitch.

Cause

 Database Issues: Poor data handling and lack of proper validation led to duplicate
appointments and scheduling inefficiencies.
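
The missing validation can be sketched as two uniqueness checks applied before a booking is stored. This is a minimal in-memory illustration with hypothetical names (Schedule, book), not the vendor's design. A production system would enforce the same rules with database constraints so that concurrent requests cannot slip past them.

```python
class Schedule:
    """In-memory appointment book that rejects duplicate bookings."""

    def __init__(self) -> None:
        self._by_slot: dict[str, str] = {}     # slot -> patient
        self._by_patient: dict[str, str] = {}  # patient -> slot

    def book(self, patient_id: str, slot: str) -> bool:
        """Store the booking only if the slot is free and the patient has none."""
        if slot in self._by_slot or patient_id in self._by_patient:
            return False
        self._by_slot[slot] = patient_id
        self._by_patient[patient_id] = slot
        return True

    def slot_for(self, patient_id: str):
        """Return the patient's slot, or None if they have no booking."""
        return self._by_patient.get(patient_id)


if __name__ == "__main__":
    schedule = Schedule()
    print(schedule.book("patient-1", "09:00"))  # first booking
    print(schedule.book("patient-1", "09:30"))  # same patient again
    print(schedule.book("patient-2", "09:00"))  # same slot again
```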

Programming Language

 Likely Java or Ruby on Rails for backend, and JavaScript for front-end interfaces.
