
Availability Management Worksheet

Uploaded by

JMJ m
Copyright © All Rights Reserved

Module: ITIL2

Worksheet of Session 3: Availability Management
CNET Department

Sana’a Community College, 2nd Year

Name: Qaed Ahmed Qaed Azzan

1. Define the following terms:


 Availability: Availability means that the IT service is continuously available to the customer, with
little downtime and rapid service recovery.
 Reliability: Adequate reliability means that the service is available for an agreed
period without interruptions. This concept also includes resilience. The reliability of a service
increases if downtime can be prevented. Reliability is calculated using statistics.
 Maintainability: Maintainability and recoverability relate to the activities needed to keep the
service in operation, and to restore it when it fails. This includes preventive maintenance and
scheduled inspections.
 Serviceability: Serviceability relates to contractual obligations of external service providers
(contractors, third parties).
 MTTR (Mean Time to Repair): average time between the occurrence of a fault and service
recovery, also known as the downtime. It is the sum of the detection time and the resolution time.
This metric relates to the recoverability and serviceability of the service.
 MTBF (Mean Time Between Failures): mean time between the recovery from one incident
and the occurrence of the next incident, also known as uptime. This metric relates to the
reliability of the service.
 MTBSI (Mean Time Between System Incidents): mean time between the occurrence of two
consecutive incidents. The MTBSI is the sum of the MTTR and MTBF.
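The definitions above fit together as shown here. The last line reads the MTBF/MTBSI ratio as the long-run fraction of time the service is up. That is the usual steady-state approximation from reliability engineering, added here as an assumption rather than something the worksheet states.

```latex
\begin{aligned}
\text{MTTR}  &= \operatorname{mean}\bigl(t_{\text{detection}} + t_{\text{resolution}}\bigr) \\
\text{MTBSI} &= \text{MTTR} + \text{MTBF} \\
A &\approx \frac{\text{MTBF}}{\text{MTBSI}} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}
\end{aligned}
```

For example, with a hypothetical MTBF of 198 hours and MTTR of 2 hours, MTBSI = 200 hours and A is approximately 198 / 200 = 0.99.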

2. Objectives of Availability Management


 The objective of Availability Management is to provide a cost-effective and defined level of availability of
the IT service that enables the business to reach its objectives.
 This means that the demands of the customer (the business) have to be aligned with what the IT
infrastructure and IT organization are able to offer.
 If supply and demand differ, Availability Management must provide a solution.
 Furthermore, Availability Management ensures that the achieved availability levels are measured, and,
where necessary, improved continuously. This means that the process includes both proactive and reactive
activities.

3. Activities of Availability Management


 Planning
o Determining the availability requirements
 This activity must be undertaken before an SLA can be concluded and should address both new IT services
and changes to existing services.
 It must be decided at the earliest possible stage if and how the IT organization can fulfill the
requirements.
 This activity should identify:
 Key business functions.
 Agreed definition of IT service downtime.
 Quantifiable availability requirements.
 Quantifiable impact on the business functions of unscheduled IT service downtime.
 Business hours of the customer.
 Agreements about maintenance windows.
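After the availability requirement, business hours and maintenance windows are agreed, the target can be converted into a concrete downtime budget. The Python code below is a minimal sketch. The function names and figures (10 business hours a day, 22 working days, 4 hours of maintenance, a 99.5% target) are hypothetical. It also assumes that the agreed maintenance windows fall inside business hours and are excluded from the agreed service time.

```python
def agreed_service_hours(business_hours_per_day: float, days: int,
                         maintenance_hours: float = 0.0) -> float:
    """Agreed service time: customer business hours minus agreed maintenance windows."""
    return business_hours_per_day * days - maintenance_hours


def allowed_downtime_hours(target_pct: float, service_hours: float) -> float:
    """Unscheduled downtime that still meets the availability target."""
    if not 0.0 <= target_pct <= 100.0:
        raise ValueError("target_pct must be between 0 and 100")
    return service_hours * (1.0 - target_pct / 100.0)


# Hypothetical month: 10 business hours/day, 22 working days, 4 h of maintenance.
ast = agreed_service_hours(10.0, 22, maintenance_hours=4.0)
print(ast)                                           # 216.0
print(round(allowed_downtime_hours(99.5, ast), 2))   # 1.08
```

At a 99.5% target, this hypothetical customer can tolerate only about an hour of unscheduled downtime per month. That is the kind of quantifiable figure the requirements activity should produce.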
o Designing for availability
 Availability should be designed in as early as possible. This prevents excessive development costs, unplanned expenditure at later stages,
Single Points Of Failure (SPOF), additional costs charged by suppliers, and delayed releases.
 A good design, based on the appropriate availability standards, will make it possible to conclude effective
maintenance contracts with suppliers.
 The design process employs a range of techniques such as Component Failure Impact Analysis (CFIA) to
identify SPOFs, CRAMM (see IT Service Continuity Management) and simulation techniques.
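A CFIA is often recorded as a grid of infrastructure components against the services that depend on them. The Python sketch below assumes a simple marking convention: "X" where a component failure interrupts the service, and "A" where an alternative component takes over. It flags every component with at least one "X" as a SPOF. The component and service names are made up for illustration.

```python
from typing import Dict, List

# component -> {service: mark}; "X" = service interrupted, "A" = alternative takes over.
# Services not listed for a component are assumed to be unaffected by its failure.
CfiaGrid = Dict[str, Dict[str, str]]


def single_points_of_failure(grid: CfiaGrid) -> List[str]:
    """Components whose failure alone interrupts at least one service."""
    return sorted(c for c, impacts in grid.items() if "X" in impacts.values())


def services_at_risk(grid: CfiaGrid, component: str) -> List[str]:
    """Services that would be interrupted if this component failed."""
    return sorted(s for s, mark in grid.get(component, {}).items() if mark == "X")


# Hypothetical infrastructure.
grid: CfiaGrid = {
    "core-switch": {"email": "X", "payroll": "X"},
    "db-server-1": {"payroll": "A"},   # clustered with db-server-2
    "db-server-2": {"payroll": "A"},
    "mail-server": {"email": "X"},
    "ups": {"email": "A", "payroll": "A"},
}

print(single_points_of_failure(grid))         # ['core-switch', 'mail-server']
print(services_at_risk(grid, "core-switch"))  # ['email', 'payroll']
```

The resulting SPOF list shows where redundancy (or a maintenance contract with a short response time) would improve availability most.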
o Designing for recoverability
 As completely uninterrupted availability is rarely feasible, periods of unavailability must be considered.
When an IT service is interrupted it is important that the fault is quickly and adequately solved, and that
the agreed availability standards are fulfilled.
 Designing for recoverability involves issues such as an effective Incident Management process with
appropriate escalation, communication, and backup and recovery procedures. The tasks, responsibilities
and authority should be clearly defined.
o Security issues
 Security and reliability are closely linked. A poor information security design can affect the availability of
the service. High availability can be supported by effective information security. During the planning
stage, the security issues should be considered and their impact on the provision of services should be
analyzed.
 Some of the issues include:
 Determining who is authorized to access secure areas.
 Determining which critical authorizations may be issued.
o Maintenance management
 There will normally be scheduled windows of unavailability.
 These periods can be used for preventive action, such as software and hardware upgrades. Changes can
also be implemented during these windows.
 The definition, implementation, and verification of maintenance activities have developed into major
issues in Availability Management.
 Maintenance must be carried out when the impact on services can be minimized.
 This means that it must be known in advance what the maintenance objectives are, when the
maintenance should be undertaken and what maintenance activities are involved (this could be based on
CFIA).
 This information is essential for Change Management and other activities.
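One simple way to keep maintenance impact low is to check a proposed window against the customer's business hours. This Python sketch assumes business hours of 08:00 to 17:00, Monday to Friday. It only handles windows that start and end on the same day, and all names are hypothetical.

```python
from datetime import datetime, time

# Assumed customer business hours (Monday to Friday); adjust to the agreed SLA.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)


def overlaps_business_hours(start: datetime, end: datetime) -> bool:
    """Return True if a same-day maintenance window overlaps business hours."""
    if end <= start or start.date() != end.date():
        raise ValueError("window must be a non-empty interval within one day")
    if start.weekday() >= 5:  # Saturday (5) or Sunday (6): no business hours
        return False
    day = start.date()
    business_start = datetime.combine(day, BUSINESS_START)
    business_end = datetime.combine(day, BUSINESS_END)
    # Two half-open intervals overlap if each starts before the other ends.
    return start < business_end and business_start < end


print(overlaps_business_hours(datetime(2024, 3, 6, 22, 0),
                              datetime(2024, 3, 6, 23, 30)))  # False (Wednesday evening)
print(overlaps_business_hours(datetime(2024, 3, 6, 16, 0),
                              datetime(2024, 3, 6, 18, 0)))   # True
```

A Change Management tool could run a check like this before a change is scheduled. A real implementation would also need public holidays and windows that cross midnight.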
o Developing the Availability Plan
 The Availability Plan is one of the major products of Availability Management.
 It is a long-term plan concerning availability over the next few years.
 It is not the implementation plan for Availability Management.
 The plan should be a living document. Initially it should describe the current situation, and at a later stage
it can be expanded to include improvement activities for existing services and guidelines, as well as plans
for new services and guidelines for maintenance.
 A comprehensive and accurate plan requires liaison with areas such as Service Level Management, IT
Service Continuity Management, Capacity Management, and Financial Management for IT Services and
Application Development (directly or through Change Management).
 Monitoring
o Measuring and reporting
Measuring and reporting are important Availability Management activities as they provide the basis for verifying
service agreements, resolving problems, and defining proposals for improvement.
If you don't measure it, you can't manage it.
If you don't measure it, you can't improve it.
If you don't measure it, you probably don't care.
If you can't influence it, then don't measure it.
The life cycle of each incident includes the following elements:
 Occurrence of the incident: the time at which the user becomes aware of the fault, or when the fault is
identified by other means (technically or physically).
 Detection: the service provider is informed of the fault. The incident status is now 'reported'. The time this
took is known as the detection time.
 Response: the service provider needs time to respond. This is known as the response time. This time is used
for diagnosis, which can then be followed by repair. The Incident Management process includes Acceptance
and Registration, Classification, Matching, Analysis, and Diagnosis.
 Repair: the service provider restores the service or the components that caused the fault.
 Service recovery: the service is restored. This includes activities such as configuration and initialization, and
the service is restored to the user.
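The life-cycle stages above can be recorded as timestamps on each incident, and the measurable periods then follow by subtraction. The Python sketch below is an illustration only. The field names are hypothetical, and it assumes that the response (diagnosis) period ends where repair begins and that downtime runs from occurrence to service recovery.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class IncidentTimeline:
    occurred: datetime   # user becomes aware of the fault
    detected: datetime   # service provider informed; status 'reported'
    diagnosed: datetime  # end of the response (diagnosis) period; repair starts
    repaired: datetime   # faulty component restored
    recovered: datetime  # service restored to the user

    def periods(self) -> Dict[str, timedelta]:
        """Split the incident into the measurable periods of its life cycle."""
        stamps = [self.occurred, self.detected, self.diagnosed,
                  self.repaired, self.recovered]
        if any(later < earlier for earlier, later in zip(stamps, stamps[1:])):
            raise ValueError("timestamps must follow the life-cycle order")
        return {
            "detection": self.detected - self.occurred,
            "response": self.diagnosed - self.detected,
            "repair": self.repaired - self.diagnosed,
            "recovery": self.recovered - self.repaired,
            "downtime": self.recovered - self.occurred,
        }


incident = IncidentTimeline(
    occurred=datetime(2024, 3, 6, 9, 0),
    detected=datetime(2024, 3, 6, 9, 10),
    diagnosed=datetime(2024, 3, 6, 9, 40),
    repaired=datetime(2024, 3, 6, 10, 30),
    recovered=datetime(2024, 3, 6, 10, 45),
)
minutes = {k: int(v.total_seconds() // 60) for k, v in incident.periods().items()}
print(minutes)
# {'detection': 10, 'response': 30, 'repair': 50, 'recovery': 15, 'downtime': 105}
```

The four periods add up to the downtime. This is why a shorter response time, which the IT organization controls, directly reduces downtime.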

The figure below illustrates the periods that can be measured.

Figure: Availability measurement.

As the figure shows, the response time of the IT organization and any external contractors is one of the factors
determining the downtime. As this factor can be controlled by the IT organization and directly affects the service
quality, agreements about it can be included in the SLA.
The measurements can be averaged to give a good impression of the relevant factors. The averages can be used
to determine the achieved service levels, and to estimate the expected future availability of a service.
This information can also be used to develop improvement plans.
The following metrics are commonly used in Availability Management:
 Mean Time to Repair - MTTR: average time between the occurrence of a fault and service recovery, also
known as the downtime. It is the sum of the detection time and the resolution time. This metric relates to
the recoverability and serviceability of the service.
 Mean Time Between Failures - MTBF: mean time between the recovery from one incident and the
occurrence of the next incident, also known as uptime. This metric relates to the reliability of the service.
 Mean Time Between System Incidents - MTBSI: mean time between the occurrence of two consecutive
incidents. The MTBSI is the sum of the MTTR and MTBF.
The ratio of the MTBF and the MTBSI indicates if there are many minor faults or just a few major faults.
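Averaging over a series of incidents gives the three metrics directly. In this Python sketch, each incident is reduced to a hypothetical (occurrence, recovery) pair in hours from the start of the reporting period, and incidents are assumed not to overlap. With a short sample, MTTR + MTBF can differ slightly from the directly measured MTBSI. This is because MTTR averages over every incident, while MTBF and MTBSI average over the gaps between incidents.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (fault occurred, service recovered), in hours from the start of the period
Outage = Tuple[float, float]


def availability_metrics(outages: List[Outage]) -> Dict[str, float]:
    """MTTR, MTBF and MTBSI from a list of non-overlapping outages."""
    if len(outages) < 2:
        raise ValueError("at least two incidents are needed for MTBF and MTBSI")
    ordered = sorted(outages)
    pairs = list(zip(ordered, ordered[1:]))
    if any(nxt[0] < prev[1] for prev, nxt in pairs):
        raise ValueError("outages must not overlap")
    return {
        # downtime: occurrence of the fault to service recovery
        "MTTR": mean(end - start for start, end in ordered),
        # uptime: recovery from one incident to occurrence of the next
        "MTBF": mean(nxt[0] - prev[1] for prev, nxt in pairs),
        # occurrence of one incident to occurrence of the next
        "MTBSI": mean(nxt[0] - prev[0] for prev, nxt in pairs),
    }


metrics = availability_metrics([(10.0, 12.0), (100.0, 101.0), (300.0, 303.0)])
print(metrics)  # {'MTTR': 2.0, 'MTBF': 143.5, 'MTBSI': 145.0}
print(metrics["MTBF"] / metrics["MTBSI"])  # the MTBF/MTBSI ratio
```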

4. Critical Success Factors


 The business must have clearly defined availability objectives and wishes.
 Service Level Management must have been set up to formalize agreements.
 Both parties must use the same definitions of availability and downtime.
 Both the business and the IT organization must be aware of the benefits of Availability
Management.
5. Performance Indicators
 The following performance indicators show the effectiveness and efficiency of Availability
Management:
 Percentage availability (uptime) per service or group of users.
 Downtime duration.
 Downtime frequency.
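These three indicators can be produced per service from the agreed service time and a log of unscheduled outages. This Python sketch assumes the commonly used formula: availability % = (agreed service time - downtime) / agreed service time x 100. The service names and hours are invented for the example. Outages for services without an agreed service time are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kpis_per_service(agreed_hours: Dict[str, float],
                     outages: List[Tuple[str, float]]) -> Dict[str, dict]:
    """Percentage availability, downtime duration and downtime frequency per service.

    agreed_hours: agreed service time per service, in hours.
    outages: one (service, downtime in hours) entry per unscheduled outage.
    """
    downtime: Dict[str, float] = defaultdict(float)
    frequency: Dict[str, int] = defaultdict(int)
    for service, hours in outages:
        downtime[service] += hours
        frequency[service] += 1

    report = {}
    for service, ast in agreed_hours.items():
        if ast <= 0:
            raise ValueError(f"agreed service time for {service} must be positive")
        dt = downtime[service]
        report[service] = {
            "availability_pct": 100.0 * (ast - dt) / ast,
            "downtime_hours": dt,
            "downtime_count": frequency[service],
        }
    return report


agreed = {"email": 720.0, "payroll": 200.0}
log = [("email", 1.5), ("email", 2.5), ("payroll", 2.0)]
for name, kpi in kpis_per_service(agreed, log).items():
    print(name, round(kpi["availability_pct"], 2), kpi["downtime_hours"], kpi["downtime_count"])
# email 99.44 4.0 2
# payroll 99.0 2.0 1
```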
6. Reports
 The availability reports for the customer were discussed above. The following metrics can be
reported for process control:
 Detection times.
 Response times.
 Repair times.
 Recovery times.
 Successful use of appropriate methods (CFIA, CRAMM, SOA).
 Extent of process implementation: services, SLAs and customer groups covered by SLAs.
7. Roles and Responsibilities
 The organization can establish the role of Availability Manager to define and control the process.
 The task of the Availability Manager could include the following elements:
 Defining and developing the process in the organization.
 Ensuring that IT services are designed such that the achieved service levels (in terms of availability,
reliability, serviceability, maintainability, and recoverability) correspond with the agreed service
levels.
 Reporting.
 Optimizing the availability of the IT infrastructure to provide a cost-effective improvement of the
service provided to the business.
8. Costs and Problems
 Most problems concern the organization. Problems to be expected include:
 Senior management divides responsibility for availability between several disciplines (line
managers, process managers).
 Each manager feels responsible for his or her own area, and there is no overall coordination.
 IT management fails to understand the added value provided to the Incident, Problem, and Change
Management processes.
 The current availability level is considered sufficient.
 There is no support for appointing a single, responsible process manager.
 The process manager does not have the required authority.
 Even with sufficient management support, problems may still arise due to:
 Underestimating resources.
 Lack of effective measurement and reporting tools.
 Lack of other processes such as Service Level Management, Configuration Management, and
Problem Management.
 If Availability Management is used inefficiently the following problems may arise:
 It will be difficult to define appropriate availability standards.
 It will be more difficult to guide internal and external suppliers.
 It will be difficult to compare the costs of availability and unavailability.
 If availability standards were not considered during the design, later modification to meet these
standards may be many times more expensive.
 Availability standards may not be fulfilled, which can lead to a failure to meet the business objectives.
 Customer satisfaction may be reduced.
