IAU ST Lecture2

The document discusses the design and architecture of reliable, scalable, and maintainable data-intensive applications. It highlights key concerns such as reliability, scalability, and maintainability, and outlines strategies to address challenges like fault tolerance, human errors, and system complexity. Additionally, it emphasizes the importance of operability, simplicity, and evolvability in software systems to minimize maintenance costs and adapt to changing requirements.


Big Data Analytics

Lecture 2
Mohammad Hamzei
Department of Computer Engineering
Islamic Azad University, South Tehran Branch
[email protected]
Reliable, Scalable, and Maintainable Applications
Introduction

• Many applications today are data-intensive, as opposed to compute-intensive.
– The bigger problems are usually the amount of data, the complexity of data, and the speed at which it is changing.
Introduction

Standard building blocks of data-intensive applications:
• Store data so that they, or another application, can find it again later
(databases)
• Remember the result of an expensive operation, to speed up reads (caches)
• Allow users to search data by keyword or filter it in various ways (search
indexes)
• Send a message to another process, to be handled asynchronously (stream
processing)
• Periodically crunch a large amount of accumulated data (batch processing)
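As a rough illustration of the cache building block above, the sketch below remembers the result of an expensive operation so that repeated reads skip the recomputation. The function `expensive_lookup` and the `call_count` counter are hypothetical names introduced only for this example; `functools.lru_cache` comes from the Python standard library.

```python
from functools import lru_cache

# Counts how often the expensive path actually runs (illustration only).
call_count = 0


@lru_cache(maxsize=128)
def expensive_lookup(key: str) -> str:
    """Hypothetical stand-in for a slow database query or computation."""
    global call_count
    call_count += 1
    return key.upper()


first = expensive_lookup("user:42")   # computed, then stored in the cache
second = expensive_lookup("user:42")  # served from the cache, no recomputation
print(first, second, call_count)      # prints "USER:42 USER:42 1"
```

The trade-off in a real system is that cached results can become stale when the underlying data changes, so a cache usually needs an invalidation or expiry policy.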
Architecture (Example)
Challenges

• If you are designing a data system or service, a lot of tricky questions arise.
– How do you ensure that the data remains correct
and complete, even when things go wrong
internally?
– How do you provide consistently good performance
to clients, even when parts of your system are
degraded?
– How do you scale to handle an increase in load?
– What does a good API for the service look like?
Concerns

• Important concerns in most software systems:
1. Reliability
2. Scalability
3. Maintainability
Reliability

• Reliability: The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error).
Reliability

• “Continuing to work correctly, even when things go wrong.” Typical expectations include:
– The application performs the function that the user expected.
– It can tolerate the user making mistakes or using the software in unexpected ways.
– Its performance is good enough for the required use case, under the expected load and data volume.
– The system prevents any unauthorized access and abuse.
Fault and Failure

• The things that can go wrong are called faults, and systems that anticipate faults and can cope with them are called fault-tolerant or resilient.
• A fault is usually defined as one component of
the system deviating from its spec, whereas a
failure is when the system as a whole stops
providing the required service to the user.
• Fault tolerance: use mechanisms that prevent faults from causing failures.
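One possible mechanism of this kind is failover: if one component faults, try another so that the system as a whole still serves the request. The sketch below is a minimal illustration under that assumption, not a prescribed design; `read_with_failover`, `broken_replica`, and `healthy_replica` are hypothetical names.

```python
from typing import Callable, Sequence


def read_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order; fail only if every replica faults."""
    last_error = None
    for read in replicas:
        try:
            return read()
        except ConnectionError as exc:
            # A fault in one component: remember it and try the next replica.
            last_error = exc
    # Only when all components have faulted does the system as a whole fail.
    raise RuntimeError("all replicas failed") from last_error


def broken_replica() -> str:
    """Hypothetical replica that always faults."""
    raise ConnectionError("replica unreachable")


def healthy_replica() -> str:
    """Hypothetical replica that answers normally."""
    return "value"


result = read_with_failover([broken_replica, healthy_replica])
print(result)  # prints "value": the fault did not become a failure
```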
Fault tolerance

• Although we generally prefer tolerating faults over preventing faults, there are cases where prevention is better than cure (e.g., because no cure exists).
– This is the case with security matters, for example: if
an attacker has compromised a system and gained
access to sensitive data, that event cannot be
undone.
Faults and Errors

• Hardware faults

• Software errors

• Human errors
Hardware Faults

• Hard disks crash, RAM becomes faulty, the power grid has a blackout, someone unplugs the wrong network cable.
• Solution: add redundancy to the individual hardware components in order to reduce the failure rate of the system.
– Disks may be set up in a RAID configuration.
– Servers may have dual power supplies and hot-swappable CPUs.
– Datacenters may have batteries and diesel generators for backup power.
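A back-of-the-envelope calculation shows why redundancy reduces the failure rate. The sketch below assumes that component failures are independent, which is a simplification (real faults are often correlated), and the 5% failure probability is a made-up figure used only for illustration.

```python
def all_fail_probability(p_single: float, copies: int) -> float:
    """Probability that every one of `copies` components fails,
    ASSUMING failures are independent (a simplifying assumption)."""
    return p_single ** copies


# Hypothetical figure: each disk has a 5% chance of failing in some period.
single = all_fail_probability(0.05, 1)    # one disk, no redundancy
mirrored = all_fail_probability(0.05, 2)  # two mirrored disks
print(f"{single:.4f} {mirrored:.4f}")     # prints "0.0500 0.0025"
```

Under these assumptions, mirroring lowers the chance of losing both copies from 5% to 0.25%. Correlated faults, such as a shared power supply, would make the real improvement smaller.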
Hardware Faults

• There is a move toward systems that can tolerate the loss of entire machines, by using software fault-tolerance techniques in preference or in addition to hardware redundancy.
• On cloud platforms such as Amazon Web Services (AWS), it is fairly common for virtual machine instances to become unavailable without warning, as the platforms are designed to prioritize flexibility and elasticity over single-machine reliability.
Software Errors

• There is no quick solution to the problem of systematic faults in software.
• Lots of small things can help:
– carefully thinking about assumptions and interactions in the system
– thorough testing
– process isolation
– allowing processes to crash and restart
– measuring, monitoring, and analyzing system behavior in production
Human Errors

• How do we make our systems reliable, in spite of unreliable humans?
• The best systems combine several approaches:
– Design systems in a way that minimizes
opportunities for error.
– Decouple the places where people make the most
mistakes from the places where they can cause
failures.
• provide fully featured non-production sandbox
environments where people can explore and experiment
safely, using real data, without affecting real users.
Human Errors

• The best systems combine several approaches (continued):
– Test thoroughly at all levels, from unit tests to
whole-system integration tests and manual tests
– Allow quick and easy recovery from human errors, to
minimize the impact in the case of a failure
– Set up detailed and clear monitoring, such as
performance metrics and error rates.
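The monitoring point above can be sketched as a small error-rate tracker over a sliding window of recent requests. The class name `ErrorRateMonitor`, the window size, and the 5% alert threshold are all assumptions chosen for illustration, not values taken from the lecture.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the error rate over the most recent `window` requests."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.outcomes = deque(maxlen=window)  # True = success, False = error
        self.threshold = threshold            # hypothetical alert threshold

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_alert(self) -> bool:
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.05)
for ok in [True] * 9 + [False]:
    monitor.record(ok)
print(monitor.error_rate(), monitor.should_alert())  # prints "0.1 True"
```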
Scalability

• Scalability: As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.
• Scalability is the term we use to describe a system’s ability to cope with increased load.
• How do we maintain good performance even
when our load parameters increase by some
amount?
Scalability

• Describing the performance of a system:
– In a batch processing system such as Hadoop, we
usually care about throughput—the number of
records we can process per second, or the total time
it takes to run a job on a dataset of a certain size.
– In online systems, what’s usually more important is
the service’s response time—that is, the time
between a client sending a request and receiving a
response.
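To make the two measures concrete, the sketch below times a toy workload and derives both a throughput figure (records per second for the whole job) and a per-request response time. The `handle` function and the choice of 1,000 records are hypothetical, and the median is used here simply as one reasonable way to summarize response times.

```python
import time
from statistics import median


def handle(record: int) -> int:
    """Hypothetical unit of work standing in for processing one request."""
    return record * 2


records = list(range(1000))
response_times = []

job_start = time.perf_counter()
for record in records:
    request_start = time.perf_counter()
    handle(record)
    # Response time: from sending the request to receiving the response.
    response_times.append(time.perf_counter() - request_start)
elapsed = time.perf_counter() - job_start

# Throughput: how many records the whole job processed per second.
throughput = len(records) / elapsed
median_response = median(response_times)
print(f"throughput: {throughput:.0f} records/s")
print(f"median response time: {median_response * 1e6:.2f} microseconds")
```

The actual numbers depend on the machine, so no expected output is shown.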
Scalability

• Scale up (vertical scaling)
– moving to a more powerful machine
• Scale out (horizontal scaling)
– distributing the load across multiple smaller machines
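A minimal sketch of scaling out, assuming stateless request handling: requests are spread over several machines in round-robin order. The server names and the `route` function are hypothetical, and round-robin is only one of several possible distribution strategies.

```python
from collections import Counter
from itertools import cycle

# Hypothetical pool of smaller machines sharing the load.
servers = ["server-a", "server-b", "server-c"]
next_server = cycle(servers)


def route(request_id: int) -> str:
    """Assign the request to the next server in round-robin order."""
    return next(next_server)


assignments = Counter(route(i) for i in range(9))
print(dict(assignments))
# prints "{'server-a': 3, 'server-b': 3, 'server-c': 3}"
```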
Scalability

• Some systems are elastic, meaning that they can automatically add computing resources when they detect a load increase.
• Other systems are scaled manually.
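The decision an elastic system makes can be sketched as a simple capacity rule: given the observed load and an assumed capacity per instance, compute how many instances are needed. The function `desired_instances`, the load units, and the capacity of 100 requests per second per instance are hypothetical values for illustration; real autoscalers typically add smoothing and cool-down periods.

```python
import math


def desired_instances(current_load: float,
                      capacity_per_instance: float,
                      min_instances: int = 1) -> int:
    """Number of instances needed to serve `current_load`,
    never dropping below `min_instances`."""
    needed = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, needed)


# Hypothetical figures: each instance handles 100 requests per second.
print(desired_instances(250, 100))   # prints "3"
print(desired_instances(1000, 100))  # prints "10"
print(desired_instances(0, 100))     # prints "1"
```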
Scalability

• Distributing stateless services across multiple machines is fairly straightforward.
• Taking stateful data systems from a single node to a distributed setup can introduce a lot of additional complexity.
– Keep your database on a single node (scale up) until scaling cost or high-availability requirements force you to make it distributed.
Scalability

• The problem may be
– the volume of reads
– the volume of writes
– the volume of data to store
– the complexity of the data
– the response time requirements
– the access patterns
– or (usually) some mixture of all of these plus many more issues.
Maintainability

• It is well known that the majority of the cost of software is not in its initial development, but in its ongoing maintenance:
– fixing bugs
– keeping its systems operational
– investigating failures
– adapting it to new platforms
– modifying it for new use cases
– repaying technical debt
– adding new features.
Maintainability

• Three design principles for software systems to minimize pain during maintenance:
– Operability
– Simplicity
– Evolvability
Operability

• Operability: Make it easy for operations teams to keep the system running smoothly.
• Data systems can do various things to make routine tasks easy, including:


– Providing visibility into the runtime behavior and
internals of the system, with good monitoring
– Providing good support for automation and
integration with standard tools
– Avoiding dependency on individual machines
Operability

• Data systems can do various things to make routine tasks easy, including:
– Providing good documentation and an easy-to-
understand operational model
– Providing good default behavior, but also giving
administrators the freedom to override defaults
when needed
– Self-healing where appropriate, but also giving
administrators manual control over the system state
when needed
– Exhibiting predictable behavior, minimizing surprises
Simplicity

• Make it easy for new engineers to understand the system, by removing as much complexity as possible from the system.
• In complex software, there is also a greater risk
of introducing bugs when making a change:
– when the system is harder for developers to
understand and reason about, hidden assumptions,
unintended consequences, and unexpected
interactions are more easily overlooked
Simplicity

• Making a system simpler does not necessarily mean reducing its functionality; it can also mean removing accidental complexity.
• Complexity is accidental if it is not inherent in
the problem that the software solves (as seen by
the users) but arises only from the
implementation.
• One of the best tools we have for removing
accidental complexity is abstraction.
Evolvability

• Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change.
• Also known as extensibility, modifiability, or plasticity.
• The Agile community has also developed
technical tools and patterns that are helpful
when developing software in a frequently
changing environment, such as test-driven
development (TDD) and refactoring.
