ASSIGNMENT 1
TILAHUN, EITM/UR170315/12
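Q1 Therac-25 Radiation Overdoses (1985-1987)
Responsibility
Atomic Energy of Canada Limited (AECL), which designed and built the Therac-25, bore primary responsibility:
1. Inadequate testing: AECL did not test the software thoroughly for real-world and edge-case
scenarios.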
2. Poor design and error handling: The software had no failsafe mechanisms for detecting and
responding to errors.
3. Failure to apply lessons learned: The Therac-25 reused software from the earlier Therac-6
and Therac-20, where hardware interlocks had masked its flaws. When those interlocks were
removed from the Therac-25, the same flaws became dangerous.
Cause of the Failure
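1. Concurrency Issue: A race condition between the operator’s data-entry task and the task that
set up the machine allowed inconsistent settings. If an operator edited the treatment mode
quickly (within roughly eight seconds), the change could be missed, and the machine fired a
high-intensity beam without the X-ray target in place, delivering massive radiation overdoses.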
2. Lack of Safeguards: The reliance on software-only safety mechanisms (instead of hardware
interlocks) meant there were no physical backups to catch software errors.
3. Inadequate Error Messages: Errors were displayed as cryptic codes such as "MALFUNCTION 54",
which operators took to be minor issues. Because treatment could be resumed with a single
keypress, some patients received repeated overdoses.
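The race condition in item 1 can be illustrated with a deliberately simplified model. This is a hypothetical Python sketch, not AECL’s PDP-11 assembly code: `Settings`, `configure_hardware`, and `fire` are invented names, and the version counter is one possible safeguard, assumed for illustration. The hardware is configured from a snapshot of the settings; if the operator edits them before firing, the unchecked path fires with stale settings.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Operator-entered treatment settings (hypothetical model)."""
    mode: str = "xray"   # "xray" or "electron"
    version: int = 0     # incremented on every edit

    def edit(self, mode: str) -> None:
        self.mode = mode
        self.version += 1


def configure_hardware(settings: Settings) -> dict:
    """Slow setup step: snapshot the settings the hardware is built from.

    On the real machine, positioning the hardware took seconds, during
    which the operator could keep typing.
    """
    return {"mode": settings.mode, "version": settings.version}


def fire(settings: Settings, hw: dict, check_version: bool) -> str:
    # Safeguard: refuse to fire if the settings changed after setup began.
    if check_version and hw["version"] != settings.version:
        return "ABORT: settings changed during setup"
    return f"beam fired in {hw['mode']} mode (screen shows {settings.mode})"


s = Settings()
hw = configure_hardware(s)   # hardware set up for X-ray mode
s.edit("electron")           # operator quickly corrects the entry
print(fire(s, hw, check_version=False))
print(fire(s, hw, check_version=True))
```

With `check_version=False` the beam fires in X-ray mode while the screen shows electron mode; with the check enabled, the stale configuration is rejected. A hardware interlock provides a similar check that does not depend on the software being correct.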
Programming Language
The control software for the Therac-25 was written in Assembly language,
specifically for the DEC PDP-11 minicomputer. Assembly is a low-level language
that is highly efficient but prone to human error due to its complexity and lack of
abstraction.
Q2 The Ariane 5 rocket explosion in 1996 was a costly and dramatic failure caused
by a software bug.
Responsibility
No single engineer was held personally responsible for the failure. Instead, the blame
was shared among:
1. Ariane 5 project managers who approved the reuse of software without ensuring it was
adapted for Ariane 5’s operational parameters.
2. Software engineers and developers who failed to consider the potential incompatibilities
between Ariane 4 and Ariane 5.
3. Testing teams who overlooked the need for testing under the conditions unique to Ariane 5's
higher speeds and accelerations.
Ultimately, the European Space Agency (ESA) and the rocket's manufacturer,
Aérospatiale, bore the organizational responsibility.
Cause
1. Reused Software: The inertial reference system (SRI) used in Ariane 5 relied on software
originally written for Ariane 4.
2. Data Conversion Error: A 64-bit floating-point value related to horizontal velocity (the
horizontal bias) was converted to a 16-bit signed integer. Ariane 5’s higher horizontal velocity
produced a value larger than 32,767, the largest value such an integer can hold, causing an
arithmetic overflow (an operand error).
3. Unnecessary Functionality: The failed calculation came from a function that was no longer
relevant to Ariane 5 but had not been removed from the software.
4. Inadequate Testing: The software wasn’t tested under Ariane 5’s unique flight conditions, as
engineers believed the software was already reliable from its use in Ariane 4.
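This chain of errors shut down both inertial reference units (the backup had already failed the
same way moments earlier). The onboard computer then interpreted their diagnostic output as
flight data and commanded extreme nozzle deflections; the resulting aerodynamic loads broke up
the rocket, triggering the onboard self-destruction mechanism.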
Programming Language
The software for the inertial reference system (SRI) was written in Ada, a
programming language commonly used in aerospace and other high-assurance
systems. Ada is known for its robustness and strong typing, but even with these
safeguards, the overflow error occurred because:
o The specific range of values that Ariane 5 would experience wasn’t accounted for.
o Overflow protection was deliberately left off this conversion (one of several unprotected
variables) to save computing time, which turned out to be a critical mistake.
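The conversion problem can be sketched in Python. Python integers never overflow, so the 16-bit range is enforced by hand; the three functions are hypothetical illustrations of a designer’s options, not the SRI’s Ada code, and the value 40000.0 is an invented example, not flight data. On Ariane 5, the conversion raised an operand error that nothing handled, so the SRI shut down.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(x: float) -> int:
    """Raise on overflow, as Ada's run-time range check did."""
    n = math.trunc(x)  # truncate toward zero for simplicity
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return n


def to_int16_wrapping(x: float) -> int:
    """What raw two's-complement hardware would store: silently wrong."""
    return (math.trunc(x) - INT16_MIN) % 65536 + INT16_MIN


def to_int16_saturating(x: float) -> int:
    """Guarded conversion: clamp to the representable range."""
    return max(INT16_MIN, min(INT16_MAX, math.trunc(x)))


horizontal_bias = 40000.0  # hypothetical value too large for 16 bits
print(to_int16_wrapping(horizontal_bias))    # -25536
print(to_int16_saturating(horizontal_bias))  # 32767
try:
    to_int16_checked(horizontal_bias)
except OverflowError as err:
    print("operand error:", err)
```

Which option is right depends on the system: saturating keeps the unit running with a bounded value, while raising is only safe if a handler decides what to do next.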
Q3 Knight Capital Financial Collapse (2012) and the Patriot Missile Failure (1991
Gulf War), focusing on responsibility, causes, and programming languages used:
Knight Capital Financial Collapse (2012)
Responsibility
No single engineer was held directly accountable. The failure was attributed to organizational
lapses in deployment practices and inadequate testing.
The responsibility lay with Knight Capital’s technical team, particularly those overseeing
software deployment and change management. Leadership failed to ensure robust
safeguards against deployment errors.
Cause
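1. Deployment Error:
o New code for Knight’s Retail Liquidity Program reused a flag that had formerly activated an
obsolete function called "Power Peg". Because the software update was incomplete, enabling
the flag activated the dormant Power Peg code, which began generating unintended, erroneous
trades; Knight lost hundreds of millions of dollars in about 45 minutes.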
o Specifically, the deployment process failed to update software on one of the eight
servers, creating an inconsistency.
2. Lack of Testing:
3. No Failsafe Mechanism:
o There were no effective systems in place to halt the runaway trades quickly once
the problem was detected.
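A deployment gate of the kind that was missing can be sketched as follows: before a feature flag is switched on, every server must report the expected build. This is a hypothetical illustration; the host names, version strings, and function names are invented and do not reflect Knight’s actual systems.

```python
def stale_servers(versions: dict[str, str], expected: str) -> list[str]:
    """Return the servers that are not running the expected build."""
    return sorted(host for host, v in versions.items() if v != expected)


def enable_flag(versions: dict[str, str], expected: str) -> str:
    """Enable a feature flag only if the whole fleet is consistent."""
    stale = stale_servers(versions, expected)
    if stale:
        return "flag NOT enabled; stale servers: " + ", ".join(stale)
    return "flag enabled on all servers"


# Eight servers, one of which the deployment missed (hypothetical names).
fleet = {f"srv{i}": "rlp-1.0" for i in range(1, 9)}
fleet["srv8"] = "legacy"
print(enable_flag(fleet, "rlp-1.0"))  # flag NOT enabled; stale servers: srv8
```

Refusing to flip the flag is cheaper than any kill switch: the stale server never gets a chance to run the old code path.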
Programming Language
The exact programming language used by Knight Capital was not disclosed, but trading
systems are commonly built in C++ or Java for performance, with Python often used for
analysis and integration with financial systems.
Patriot Missile Failure (1991 Gulf War)
Responsibility
The failure was attributed to systemic oversight in software design and deployment:
o The U.S. Army and Raytheon, the manufacturer of the Patriot missile system, were
held accountable for not addressing known limitations of the system’s time-tracking
mechanism.
o There was a failure to update the system to handle extended operation times in
combat scenarios.
Cause
1. Clock Drift:
o The system counted time in tenths of a second and converted it to seconds by multiplying
by 1/10 stored in a 24-bit fixed-point register. Because 1/10 has no exact binary
representation, every tick carried a tiny truncation error. After extended operation (about
100 hours at Dhahran), the error had accumulated to roughly a third of a second, leading to a
trajectory miscalculation.
o The error shifted the range gate the system used to track the incoming Scud missile by more
than half a kilometer, so the missile was not intercepted.
2. Inadequate Updates:
o The Patriot system was originally designed for short-term deployments, and its
software wasn’t updated to handle prolonged use in the Gulf War.
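The clock-drift arithmetic can be checked with exact fractions. The sketch assumes, following commonly published analyses, that 1/10 was kept to 23 bits after the binary point and truncated rather than rounded, which reproduces the frequently cited error of about 0.000000095 seconds per tick. The Scud speed of 1,676 m/s is likewise an approximate, commonly quoted figure, not a value from this document.

```python
import math
from fractions import Fraction

FRACTION_BITS = 23          # assumption: bits kept after the binary point
SCUD_SPEED_M_PER_S = 1676   # assumption: approximate Scud velocity

# 1/10 truncated to the register's precision, and the error left per tick.
TENTH = Fraction(1, 10)
tenth_stored = Fraction(math.floor(TENTH * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = TENTH - tenth_stored


def clock_drift_seconds(hours: int) -> float:
    """Accumulated timing error after running for `hours` hours."""
    ticks = hours * 3600 * 10  # the clock counts tenths of a second
    return float(ticks * error_per_tick)


for hours in (1, 8, 20, 100):
    drift = clock_drift_seconds(hours)
    print(f"{hours:>3} h: drift {drift:.4f} s, "
          f"Scud moves {drift * SCUD_SPEED_M_PER_S:.0f} m in that time")
```

For 100 hours the drift is about 0.34 seconds, during which a Scud at the assumed speed covers roughly 575 meters, the same order of magnitude as the tracking error described in this section.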
Programming Language
The Patriot missile system was written in Ada, a language widely used in military and
aerospace applications for its reliability and support for real-time systems.
Q4 Toyota Unintended Acceleration (2000s)
Responsibility
Cause
1. Software Defects:
o The Electronic Throttle Control System (ETCS) software had stack overflow vulnerabilities,
which could corrupt memory and cause erratic behavior in the throttle control.
o The code was poorly designed and lacked defensive coding practices, so a single-point
failure could lead to unintended acceleration.
2. Inadequate Fail-Safes:
o Toyota’s software lacked sufficient fail-safes to detect and mitigate software errors.
For example, watchdog timers or redundancy in the throttle system were either
ineffective or absent.
o Software was found to have over 10,000 global variables, making it prone to bugs
and difficult to debug.
o Critical tasks in the system were not protected from interference by non-critical
tasks.
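The missing fail-safes can be illustrated with a small Python model of a throttle-command function that falls back to idle whenever a watchdog expires, two redundant pedal sensors disagree, or the brake is pressed (a brake-override rule). Everything here is a hypothetical sketch: the class and function names, the 100 ms timeout, and the 5% sensor tolerance are assumptions, and a real engine controller would run its watchdog as an independent hardware timer rather than inside the task it monitors.

```python
class Watchdog:
    """Trips if the control task has not checked in within `timeout_ms`."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.last_kick_ms = 0

    def kick(self, now_ms: int) -> None:
        self.last_kick_ms = now_ms

    def expired(self, now_ms: int) -> bool:
        return now_ms - self.last_kick_ms > self.timeout_ms


SENSOR_TOLERANCE = 0.05  # assumed maximum disagreement between pedal sensors


def throttle_command(pedal_a: float, pedal_b: float, brake_pressed: bool,
                     wd: Watchdog, now_ms: int) -> float:
    """Return a throttle opening in [0, 1]; any detected fault means idle."""
    if wd.expired(now_ms):
        return 0.0  # control task stalled
    if abs(pedal_a - pedal_b) > SENSOR_TOLERANCE:
        return 0.0  # redundant sensors disagree
    if brake_pressed:
        return 0.0  # brake override
    return min(max(pedal_a, 0.0), 1.0)


wd = Watchdog(timeout_ms=100)
wd.kick(now_ms=0)
print(throttle_command(0.30, 0.31, False, wd, now_ms=50))   # 0.3
print(throttle_command(0.30, 0.31, False, wd, now_ms=500))  # 0.0
```

Each check fails toward idle, so a single fault (a stalled task, a bad sensor, or a stuck pedal with the brake applied) cannot by itself keep the throttle open.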
Programming Language
The ETCS software was written in C, which is commonly used for embedded systems. The
lack of proper safeguards and testing in a safety-critical application made it vulnerable to
catastrophic failures.
Stuxnet Cyberattack on Iran’s Nuclear Program (2010)
Responsibility
Stuxnet was a highly sophisticated cyber weapon believed to have been developed jointly by
intelligence agencies of the United States (NSA) and Israel (Unit 8200). It targeted Iran’s
nuclear program.
It was crafted by a team of highly skilled cybersecurity engineers and developers, but no
individuals have been officially named.
Cause
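1. Targeted Malware:
o Stuxnet targeted Siemens Step 7 software and the programmable logic controllers (PLCs)
that drove uranium-enrichment centrifuges at Iran’s Natanz facility. It spread through
infected USB drives and exploited several previously unknown (zero-day) Windows
vulnerabilities.
2. Malware Functionality: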
o Stuxnet subtly altered centrifuge speeds (spinning them too fast or too slow) while
sending false operational data to monitoring systems, making the sabotage difficult
to detect.
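Programming Language
Stuxnet’s Windows components were written mainly in C and C++, while its sabotage payload
ran as code on the Siemens PLCs, programmed through the Step 7 environment.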
Heathrow Terminal 5 Baggage System Failure (2008)
Responsibility
Responsibility lay with BAA (British Airports Authority), the system implementers, and
Siemens, which developed the baggage handling system.
The software engineers and project managers overseeing the system were blamed for poor
planning, insufficient testing, and an overly ambitious rollout.
British Airways (BA), the airline operating at Terminal 5, was also criticized for failing to
adequately prepare for operational challenges.
Cause
1. Software Glitch:
o The root cause was a series of errors in the automated baggage handling system
software, which failed to correctly track and route luggage.
o The glitches caused mismatched tags, leading to lost and misrouted baggage.
2. System Overload:
o The system could not handle the real-world load of passengers and baggage, leading
to cascading failures.
3. Inadequate Testing:
o The system was not tested thoroughly under realistic operational conditions before
launch.
4. Poor Integration:
Programming Language
The baggage handling system was developed using Java, C, and C++, which are commonly
used for high-performance real-time systems. Specific details of the software
implementation are not fully public.
AT&T Long-Distance Network Collapse (1990)
Responsibility
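Cause
1. Faulty Code:
o The issue stemmed from a single line of faulty code in the switching software used
in AT&T’s network.
o In the C code that handled recovery messages from neighboring switches, a break
statement inside an if clause exited the enclosing switch statement early, skipping code that
should have run. When a switch received two messages from a recovering neighbor in quick
succession, the bug corrupted its data and forced it to reset; each reset triggered the same
problem in its neighbors, causing cascading reboots across the network and a total collapse.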
2. Cascading Failures:
o The bug caused switches to repeatedly fail and reboot, flooding the network with
error messages and preventing recovery.
3. Insufficient Testing:
o The updated software wasn’t rigorously tested under high-load conditions, which
might have exposed the bug.
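Python has no switch statement, so the sketch below reproduces the same class of mistake with a loop: a `break` meant to end the handling of one message instead leaves the enclosing block, so later work is silently skipped. It is a hypothetical analog with invented message names, not the AT&T switching code, which was written in C.

```python
def process_buggy(queue: list[str]) -> list[str]:
    done = []
    for msg in queue:
        if msg == "neighbor-recovered":
            done.append("mark neighbor in service")
            break  # BUG: meant to finish this message, but exits the loop
        done.append(f"handle {msg}")
    return done


def process_fixed(queue: list[str]) -> list[str]:
    done = []
    for msg in queue:
        if msg == "neighbor-recovered":
            done.append("mark neighbor in service")
            continue  # finish this message and move on to the next one
        done.append(f"handle {msg}")
    return done


queue = ["call-setup", "neighbor-recovered", "call-setup"]
print(process_buggy(queue))  # the second call-setup is never handled
print(process_fixed(queue))
```

Tests that exercise the rarely used recovery path, including back-to-back messages, are the usual defense against this kind of control-flow slip.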
Programming Language
The AT&T switching software was written in C, which is widely used in telecommunications
for its performance and control over hardware.
Intel Pentium FDIV Bug (1994)
Responsibility
The Intel hardware design team was responsible for the issue. It arose during the
development of the floating-point division algorithm in the Pentium processor.
Intel’s management exacerbated the problem by initially downplaying its significance, which
led to a public relations disaster.
Cause
1. Design Error:
o The bug stemmed from missing entries in the lookup table used by the Pentium’s
floating-point division (FDIV) algorithm, a radix-4 SRT division. Five of the table’s 1,066
entries were omitted due to a human error during the design phase.
o As a result, certain division calculations involving specific ranges of numbers
produced incorrect results, with relative errors of up to about 0.006%.
2. Delayed Acknowledgment:
o Intel initially claimed that the error would only affect users performing complex
calculations, alienating the scientific and engineering community. Public outrage
forced Intel to offer free replacements and recall affected processors.
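The widely circulated test for the flaw divides 4195835 by 3145727: a correct floating-point unit returns about 1.333820449, while affected Pentiums returned about 1.333739069. The Python sketch below runs the check on the current machine and computes the relative error of the published flawed result; the flawed quotient is hard-coded from published reports because it cannot be reproduced on correct hardware.

```python
X, Y = 4195835.0, 3145727.0

# Quotient reported for flawed Pentium processors (hard-coded, not computed).
FLAWED_QUOTIENT = 1.333739068902037589


def fdiv_residual(x: float = X, y: float = Y) -> float:
    """x - (x / y) * y: essentially 0 on a correct FPU, 256 on a flawed Pentium."""
    return x - (x / y) * y


correct = X / Y
relative_error = (correct - FLAWED_QUOTIENT) / correct

print(f"quotient on this machine: {correct:.9f}")
print(f"residual: {fdiv_residual()}")
print(f"relative error of flawed result: {relative_error:.4%}")
```

The relative error of about 0.006% is small, which is why the bug went unnoticed in everyday use, yet it appears in the fourth significant digit, far beyond what IEEE 754 double precision permits.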
Programming Language
No programming language was at fault. The lookup table was built into the processor’s
hardware (a programmable logic array) rather than written in software or microcode, so the
error lay in the chip’s hardware design rather than in high-level software.
Mars Climate Orbiter (1999)
Responsibility
Responsibility lay with both Lockheed Martin, which built the spacecraft and provided the
software, and NASA’s navigation team, which failed to detect and correct the unit mismatch
during testing and operations.
A lack of communication and system integration checks between the teams contributed to
the failure.
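Cause
1. Unit Mismatch:
o Lockheed Martin’s ground software reported thruster impulse in pound-force seconds, while
NASA’s navigation software expected newton-seconds (1 lbf·s is about 4.45 N·s). The
resulting trajectory errors brought the spacecraft far too close to Mars, and it was lost on
arrival in September 1999.
2. Missed Opportunities:
o Despite multiple opportunities, the error was not caught during pre-launch
simulations or operations because the system was not tested end-to-end with
realistic inputs.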
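One common defense against unit mismatches is to carry units in the type system instead of passing bare numbers between teams. The Python sketch below is a hypothetical illustration (the `Impulse` class and the 10.0 lbf·s reading are invented): values must be constructed through a named unit, so pound-force seconds are converted before navigation code sees them.

```python
from dataclasses import dataclass

NEWTONS_PER_POUND_FORCE = 4.4482216152605  # exact by definition of lbf


@dataclass(frozen=True)
class Impulse:
    """Impulse stored internally in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_lbf_s(cls, value: float) -> "Impulse":
        return cls(value * NEWTONS_PER_POUND_FORCE)

    @classmethod
    def from_n_s(cls, value: float) -> "Impulse":
        return cls(value)


reading_lbf_s = 10.0  # hypothetical thruster impulse reported in lbf·s

# Bug pattern: the bare number is treated as if it were already in N·s.
wrong = Impulse.from_n_s(reading_lbf_s)
# Fix: the unit is stated where the value enters the program.
right = Impulse.from_lbf_s(reading_lbf_s)

print(f"wrong: {wrong.newton_seconds:.2f} N·s, right: {right.newton_seconds:.2f} N·s")
print(f"underestimated by a factor of {right.newton_seconds / wrong.newton_seconds:.2f}")
```

The type does not prevent every mistake, but it forces each producer of data to state its unit explicitly, which is exactly the information that was lost at the interface between the two teams.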
Programming Language
The ground navigation software and onboard spacecraft software were likely written in a
combination of:
o C and C++: Commonly used for flight control and navigation systems at NASA.
o FORTRAN: Historically used in scientific computing and space missions during that
era.
CrowdStrike-Related Windows Outage (2024)
Responsibility
The outage was triggered by a faulty content update that CrowdStrike, a third-party
cybersecurity company, pushed to its Falcon sensor software on Windows computers.
CrowdStrike bore primary responsibility. Microsoft did not create or distribute the update,
although the incident renewed debate over allowing third-party security drivers to run inside
the Windows kernel, where a single fault can crash the whole system.
CrowdStrike’s engineers (responsible for the update) and its quality assurance and release
processes (which let a defective update reach customers worldwide at once) were implicated.
Cause
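1. Faulty Update:
o A defective configuration file (Channel File 291) caused the Falcon sensor’s kernel-mode
driver to read memory out of bounds. Because the driver runs inside the Windows kernel, the
fault crashed the operating system, triggering the Blue Screen of Death (BSOD) on about 8.5
million devices.
2. Widespread Disruption:
o Airlines, hospitals, banks, and broadcasters worldwide were disrupted, and many affected
machines had to be repaired manually because they crashed on startup before a fix could be
applied.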
3. Lack of Testing:
o The update wasn’t adequately tested under real-world scenarios or with various
configurations before being pushed globally.
4. Automatic Updates:
o The automatic application of updates on devices amplified the scope of the impact,
as there were no effective rollback mechanisms in place.
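Staged (canary) rollout is a standard way to limit the damage from a bad update: push to a small wave first, check health, and stop before the rest of the fleet is touched. The Python sketch below is hypothetical; the wave sizes, the zero-failure threshold, and the `apply_update` callback are assumptions, and it does not represent CrowdStrike’s actual deployment system.

```python
from typing import Callable, Sequence


def staged_rollout(hosts: Sequence[str],
                   apply_update: Callable[[str], bool],
                   stages: Sequence[float] = (0.01, 0.1, 0.5, 1.0),
                   max_failure_rate: float = 0.0) -> str:
    """Update hosts in growing waves; halt after the first unhealthy wave.

    `apply_update` returns True if the host is healthy after updating.
    """
    updated = 0
    for fraction in stages:
        target = int(len(hosts) * fraction)
        wave = hosts[updated:target]
        failures = sum(1 for host in wave if not apply_update(host))
        updated = target
        if wave and failures / len(wave) > max_failure_rate:
            return (f"halted after {updated} of {len(hosts)} hosts "
                    f"({failures} failed in the last wave)")
    return f"completed on {len(hosts)} hosts"


fleet = [f"host{i}" for i in range(1000)]
print(staged_rollout(fleet, apply_update=lambda host: False))  # bad update
print(staged_rollout(fleet, apply_update=lambda host: True))   # good update
```

Machines that crash on startup cannot download a fix, so the value of staging lies in stopping early, when only the first small wave is affected, rather than in rolling back afterwards.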
Programming Language
CrowdStrike software:
o Likely developed using C++, Python, or Go for cybersecurity tools, given their
widespread use for performance, system-level operations, and malware analysis.
Microsoft Windows:
o Written in C, C++, and C#, the primary languages for Microsoft’s operating systems
and antivirus tools.
Change Healthcare Ransomware Attack (2024)
Responsibility
Responsibility for this attack primarily lies with the cybercriminals behind the ransomware.
However, Change Healthcare’s IT and security teams share responsibility for failing to
implement sufficient cybersecurity defenses.
Ransomware attacks often exploit known vulnerabilities or weak defenses in IT systems.
Cause
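1. Ransomware Infection:
o In February 2024, an affiliate of the ALPHV/BlackCat ransomware group breached Change
Healthcare and encrypted its systems, disrupting claims processing and pharmacy services
across the U.S. healthcare system.
2. Cybersecurity Gaps:
o The attackers logged in with stolen credentials through a Citrix remote-access portal that
did not require multi-factor authentication.
3. Ransom Payment:
o UnitedHealth Group, Change Healthcare’s parent company, paid a ransom of about
$22 million.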
Programming Language
Ransomware:
o The ALPHV/BlackCat ransomware is written in Rust.
Change Healthcare systems:
o Likely built using Java, C++, and SQL for enterprise-grade backend processing, along
with JavaScript or .NET for web interfaces.
Ivanti Pulse Secure VPN Vulnerability
Responsibility
Ivanti's software development team was responsible for the flaw in its Pulse Secure VPN
product. Federal agencies also bore some blame for delayed updates.
Cause
Programming Language
Likely written in C/C++ for VPN core functionality and Java or Python for management
interfaces.
Tesla Media Control Unit (MCU) Memory Failure
Responsibility
Tesla’s firmware and hardware engineering team faced criticism for using eMMC memory
chips with limited write cycles in the Media Control Unit (MCU).
Cause
The eMMC memory chips wore out from the continual writing of log files, which caused failures
in safety-related functions such as turn signals and rearview cameras. This revealed
inadequate hardware lifecycle testing.
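Flash wear-out can be estimated with simple arithmetic: the total data a chip can absorb is roughly its capacity times its rated program/erase (P/E) cycles, divided by write amplification, and dividing that by the daily write volume gives a lifetime. The figures in the example below (8 GB, 3,000 cycles, 5 GB of logs per day, write amplification of 2) are hypothetical, chosen only to show the calculation.

```python
def flash_lifetime_years(capacity_gb: float, pe_cycles: int,
                         writes_gb_per_day: float,
                         write_amplification: float = 1.0) -> float:
    """Rough years until the rated P/E cycles are used up (assumes even wear)."""
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    return total_writable_gb / writes_gb_per_day / 365


# Hypothetical inputs, not Tesla's actual specifications.
years = flash_lifetime_years(capacity_gb=8, pe_cycles=3000,
                             writes_gb_per_day=5, write_amplification=2.0)
print(f"estimated lifetime: {years:.1f} years")
```

Doubling the daily log volume halves the estimated lifetime, which is why reducing unnecessary writes matters as much as choosing a larger chip.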
Programming Language
C/C++ for embedded systems, with Python or JavaScript for vehicle UI features.
Both failures underscore the need for proactive testing and security-first designs in
critical systems.
XZ Utils Backdoor (2024)
Responsibility
Malicious actors inserted the code, but XZ Utils maintainers were responsible for
governance and reviewing contributions.
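Cause
Over more than two years, a contributor using the name "Jia Tan" gained the trust of the
project’s maintainer and became a co-maintainer, then hid a backdoor in the release tarballs of
XZ Utils 5.6.0 and 5.6.1 (CVE-2024-3094). The malicious code, concealed in binary test files and
activated by a modified build script, targeted OpenSSH servers on some Linux distributions.
Microsoft engineer Andres Freund discovered it in March 2024 after investigating unusually slow
SSH logins.
Programming Language
XZ Utils is written in C; the backdoor was injected through its build system (a modified M4
build script) rather than through the visible source code.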
Amazon Web Services (AWS) Outage
Responsibility
AWS’s configuration team caused the issue during a routine update, revealing weaknesses in
operational safeguards.
Cause
Programming Language
Core AWS systems are primarily written in Java, Python, and C++ for backend infrastructure.
These failures stress the need for supply chain security and fail-safe designs in
cloud architectures.
Router Exploitation by a Chinese Threat Group
Responsibility
A Chinese threat group exploited the vulnerability, but router manufacturers (e.g., Linksys,
Netgear) and networking software engineers were responsible for inadequate security
hardening.
Cause
Programming Language
Likely written in C/C++ for low-level router firmware, with some higher-level scripting (e.g.,
Python) for attack automation.
Zoom Security and Privacy Failures (2020)
Responsibility
Zoom’s engineering team failed to implement strong security measures, particularly around
user privacy and meeting access.
Cause
Inadequate Security Features: Lack of end-to-end encryption and weak authentication led to
unauthorized access, including Zoombombing incidents.
Programming Language
C/C++ for core app development, with JavaScript for web-based components, and Python
for backend infrastructure.
Boeing Starliner Spacecraft Software Problems
Responsibility
Boeing’s engineering team and software developers were responsible for the technical
glitches, particularly in the mission control system.
Cause
Software Bugs: Coding errors and inadequate testing led to failures in the spacecraft’s
software system, delaying mission readiness and raising concerns about Boeing’s quality
control.
Programming Language
C/C++ for embedded systems and flight software, with Java or Python for testing and
simulation tools.
Appointment Scheduling Software Glitch
Responsibility
The development team behind the scheduling software and the system testers were
responsible for the glitch.
Cause
Database Issues: Poor data handling and lack of proper validation led to duplicate
appointments and scheduling inefficiencies.
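Duplicate bookings are usually prevented in the database itself rather than only in application code, so that two simultaneous requests cannot both succeed. The sketch below uses Python’s built-in `sqlite3` module with a UNIQUE constraint on provider and start time; the table layout and the `book` function are hypothetical, since the document does not identify the actual system.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE appointments (
            id          INTEGER PRIMARY KEY,
            provider_id INTEGER NOT NULL,
            starts_at   TEXT    NOT NULL,  -- ISO 8601 timestamp
            patient_id  INTEGER NOT NULL,
            UNIQUE (provider_id, starts_at)  -- one booking per slot
        )
    """)
    return conn


def book(conn: sqlite3.Connection, provider_id: int, starts_at: str,
         patient_id: int) -> bool:
    """Try to book a slot; return False if it is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO appointments (provider_id, starts_at, patient_id) "
                "VALUES (?, ?, ?)",
                (provider_id, starts_at, patient_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(book(conn, 1, "2024-05-01T09:00", patient_id=10))  # True
print(book(conn, 1, "2024-05-01T09:00", patient_id=11))  # False: duplicate slot
```

Because the constraint lives in the schema, every client that writes to the table is protected, including ones added later that never call `book`.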
Programming Language
Likely Java or Ruby on Rails for backend, and JavaScript for front-end interfaces.