Digital Service Efficiency - A New Management Scorecard - Shekhar Dasgupta


1

Digital Service Efficiency: A New Management Scorecard
(DCM 10.2)
Shekhar Dasgupta
Founder
GreenField Software

2
Digital Service Efficiency: A New Management Scorecard


This presentation defines and outlines management scorecards, including Kaplan
& Norton's Balanced Scorecard. It then discusses why a supplementary scorecard
should be used to measure IT efficiencies with respect to specific data center
operational roles. Finally, it shows how next-gen DCIM (Data Center
Infrastructure Management) software should build a role-based DSE framework to
achieve organizational objectives and goals.
3
Scorecard Examples

Management Scorecards
What are they?

Organizational Performance Management frameworks
Mix of financial & non-financial measures against benchmarks
Started in the late 1970s by Dr. Aubrey Daniels
Goal: align top management around common organizational objectives & positive outcomes through key measurement parameters

What do they measure?

People Performance
Process Efficiency
Systems Efficiency

4
Balanced Scorecard
Developed by Kaplan & Norton in the 1990s
Linked company strategy to financial & non-financial KPIs
Multiple variants, including industry-specific templates
Technology recognized as an enabler for business process efficiencies and a driver for innovation and growth
Observations:
- Tool for objectively incentivizing executives on non-financial KPIs
- Practitioners have not evolved any IT infrastructure-related KPIs, nor directly linked them to the BSC framework


5
Why a New Scorecard for Data Centers?
For the data center to function effectively and compete better:
- Who is responsible for making that happen?
- How does one measure whether the new processes are being managed effectively?
- What are the costs, and how do they measure against benchmarks?
- How does one measure whether the innovations and new systems are delivering the desired outcomes?
6
Digital Service Efficiency: A New Scorecard for Data Centers.
The Digital Service Efficiency (DSE) methodology is eBay's miles-per-gallon (MPG)
equivalent for viewing the productivity and efficiency of technical
infrastructure across four key areas: performance, cost, environmental impact
and revenue.
The DSE methodology equips decision-makers to see the results of their
technical infrastructure choices to date (i.e., what MPG they achieved with
their design and operations), and serves as the flexible tool they need when
faced with making new decisions (i.e., what knobs to turn to achieve maximum
performance across all dimensions). Ultimately, DSE enables balance within the
technology ecosystem by exposing how turning knobs in one dimension affects
the others.
Original Designer: Dean Nelson from eBay, Inc.
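
One way to picture the miles-per-gallon idea is to normalize each of the four areas by a common unit of work. The sketch below is illustrative only: the `PeriodTotals` fields, the choice of transactions as the unit of work, and the metric names are assumptions made for this example, not eBay's published formulas.

```python
from dataclasses import dataclass


@dataclass
class PeriodTotals:
    """Hypothetical totals for one reporting period."""
    transactions: float   # units of work served (performance)
    energy_kwh: float     # energy consumed by the infrastructure
    cost_usd: float       # infrastructure cost
    co2e_kg: float        # emissions (environmental impact)
    revenue_usd: float    # revenue attributed to the infrastructure


def dse_scorecard(totals):
    """Express cost, emissions and revenue per million transactions."""
    if totals.transactions <= 0 or totals.energy_kwh <= 0:
        raise ValueError("transactions and energy must be positive")
    millions = totals.transactions / 1_000_000
    return {
        "transactions_per_kwh": totals.transactions / totals.energy_kwh,
        "cost_per_million_tx": totals.cost_usd / millions,
        "co2e_kg_per_million_tx": totals.co2e_kg / millions,
        "revenue_per_million_tx": totals.revenue_usd / millions,
    }
```

Putting all four ratios on the same denominator is what lets a reader see how turning a knob in one dimension moves the others.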
7

DSE Dashboard
eBay's real-time dashboard is available at http://tech.ebay.com/dashboard
8
Next Gen DCIM
Delivering Role-Based DSE Scorecard
9

DCIM & Role-Based Digital Service Efficiency

Today's DCIM
Current DCIM measures in real time:
- IT asset utilization
- DC power usage (PUE, CUE)
- Cooling requirements
- Floor & rack space occupancy
Data is analyzed to improve energy efficiency & capacity planning
Helps to predict & prevent failures

Next-Gen DCIM
DCIM DSE will provide role-based scorecards
DCIM DSE will provide granular cost measurement across the complete IT infrastructure
The DCIM DSE scorecard will measure infrastructure capability for process improvements & technology innovations
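
For readers unfamiliar with the two power metrics named above, here is a minimal sketch of how they are commonly computed: PUE is total facility energy divided by IT equipment energy, and CUE is the CO2 emitted for the facility's energy divided by IT equipment energy. The function names and the single grid emission factor are assumptions for illustration.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def cue(total_facility_kwh, it_equipment_kwh, kg_co2e_per_kwh):
    """Carbon Usage Effectiveness in kg CO2e per IT kWh.

    Assumes all energy comes from one source with a single emission factor.
    """
    return kg_co2e_per_kwh * pue(total_facility_kwh, it_equipment_kwh)
```

A PUE of 1.0 would mean every kilowatt-hour reaches IT equipment; anything above that is overhead such as cooling and power distribution losses.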


10
Data Center Operations & Roles
Roles: Data Center Manager; Facility Staff; IT Staff
Data Center Manager: Responsible for overall data center operations
Data Center Facility Staff: Responsible for data center facilities operations
Data Center IT Staff: Responsible for data center IT operations
11
DCIM DSE Scorecard
For Facility Staff
12
Data Center Facility Staff
Facility KRA
Infrastructure monitoring & health check
Scheduled & Preventive Maintenance
Incident and Problem Management
Maintaining Energy efficiency
Uptime Reporting
Facility KPIs

13
Infrastructure Monitoring KRA & KPIs
DCIM provides better DC facility monitoring by:
- Real-time monitoring of power systems: electrical panels (HT & LT panels), UPS, PDUs (row & rack)
- Real-time monitoring of cooling systems: chillers, PACs, AHUs
- Real-time monitoring of DC environmental statistics: temperature, humidity, water leak, smoke, fire
- Monitoring all of the above subsystems through a single dashboard, with alerts on abnormal conditions over email/SMS
KRA: Data center infrastructure monitoring & health check
Cooling KPIs: Fan Runtime; Supply Air Temperature; Supply Air Humidity; Rack Cooling Index; Return Temperature Index; Return Air Humidity; Power Consumption (kW)
UPS KPIs: Utility Line & Output Voltage; Power Loss; UPS Load; Remaining Battery Capacity; Internal UPS Temperature; UPS battery run time remaining before battery exhaustion; Elapsed time since the UPS switched to battery power
Environment KPIs: Cabinet Internal Temperature; Cabinet Internal Humidity; Room Ambient Temperature/Humidity; Smoke; Water Leak Detection; Cabinet Door Ajar; Motion
14
Maintenance KRA & KPIs
DCIM:
- Real-time monitoring & alerts help staff during routine checks and help prevent failures of facility equipment
- Helps schedule routine maintenance for facility devices
- Breakdown maintenance analysis prevents similar failures or enables faster recovery
KRA: Preventive & Breakdown Maintenance
Scheduled Maintenance KPIs: Age of Device; Criticality of Device; Date of Last Check-Up; Check-Up Frequency
Breakdown Maintenance KPIs: Failure Rate; Mean Time Between Failures; Mean Time To Repair; Total Maintenance Cost / Asset Replacement Value; Condition-Based Maintenance %; Uptime / Required Time; Spare Parts Used versus Availability; Immediate Corrective Maintenance Time / Total Downtime Related to Maintenance
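
Several of the breakdown maintenance KPIs above (failure rate, Mean Time Between Failures, Mean Time To Repair, uptime against required time) can be derived from a simple repair log. The sketch below assumes a hypothetical input of one repair duration per failure over a fixed observation period; the function and key names are illustrative.

```python
def maintenance_kpis(period_hours, repair_hours):
    """Derive basic reliability KPIs for one device.

    period_hours: length of the observation (required) period.
    repair_hours: list with the repair duration of each failure in that period.
    """
    failures = len(repair_hours)
    downtime = sum(repair_hours)
    uptime = period_hours - downtime
    if uptime <= 0:
        raise ValueError("downtime cannot meet or exceed the period")
    return {
        "failure_rate_per_hour": failures / uptime,
        # MTBF and MTTR are undefined when nothing failed
        "mtbf_hours": uptime / failures if failures else None,
        "mttr_hours": downtime / failures if failures else None,
        "availability": uptime / period_hours,
    }
```

For example, three failures repaired in 2, 3 and 5 hours over a 1,000-hour period leave 990 hours of uptime, so MTBF is 330 hours and availability is 99%.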
15
Incident Management KRA & KPIs
DCIM enforces an ITSM best-practice framework on data center facility operations and ensures that all incidents and service requests are tracked to closure
KRA: Incident and Problem Management
Incident Measures:
- Number of incidents
- Breakdown of incidents at each stage (logged, WIP and closed)
- Number and % of major incidents
- Number of incidents reopened as % of total
- Breakdown of incidents by time of day
Resolution Measures:
- Mean response time versus target response time
- Mean elapsed time for incident resolution (turnaround time)
- % of incidents resolved within target resolution time
- Number and % of incidents incorrectly assigned
- Number and % of incidents incorrectly categorized
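
A few of the incident and resolution measures listed above can be computed directly from ticket timestamps. This is a minimal sketch: the `Incident` record and its fields are hypothetical and do not reflect any particular ITSM tool's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    opened: datetime
    resolved: Optional[datetime] = None  # None while logged or WIP
    major: bool = False
    reopened: bool = False


def incident_kpis(incidents, target):
    """Summarize incidents against a target resolution time (a timedelta)."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = len(incidents)
    durations = [i.resolved - i.opened for i in incidents if i.resolved is not None]
    within = sum(1 for d in durations if d <= target)
    return {
        "total": total,
        "closed": len(durations),
        "pct_major": 100.0 * sum(i.major for i in incidents) / total,
        "pct_reopened": 100.0 * sum(i.reopened for i in incidents) / total,
        "mean_resolution": sum(durations, timedelta()) / len(durations) if durations else None,
        "pct_within_target": 100.0 * within / len(durations) if durations else None,
    }
```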


16
DCIM: Helping facility staff with their PUE KPI
Ensure a stable PUE for the data center
DCIM monitors data center PUE in real time and analyzes historical PUE data to recommend ways to improve it
KPI: Maintain efficiency level (PUE)
Other power management measures: watts per sq ft, Rack Cooling Index (RCI)
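
As a rough illustration of what "stable PUE" could mean in practice, the sketch below flags days on which PUE rises noticeably above its trailing average, and computes power density per square foot. The window size, tolerance and function names are assumptions chosen for the example, not a DCIM product's actual analytics.

```python
from statistics import mean


def pue_drift_days(daily_pue, window=7, tolerance=0.05):
    """Return indices of days whose PUE exceeds the trailing-window mean by more than tolerance."""
    flagged = []
    for i in range(window, len(daily_pue)):
        baseline = mean(daily_pue[i - window:i])
        if daily_pue[i] - baseline > tolerance:
            flagged.append(i)
    return flagged


def watts_per_sq_ft(it_load_watts, floor_area_sq_ft):
    """Power density of the white space."""
    if floor_area_sq_ft <= 0:
        raise ValueError("floor area must be positive")
    return it_load_watts / floor_area_sq_ft
```

A trailing mean is used rather than a fixed target so that seasonal shifts in cooling load do not raise alerts on their own.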
17
DCIM: Helping facility staff with their Uptime KPI
Periodic reporting of facility uptime, RTO & RPO statistics of facility services & subsystems
DCIM provides facility uptime and recovery metrics, including reporting on health & functional statistics of facility subsystems such as power, cooling and environmental components
DCIM provides dashboards, analytics and scheduled reports on facility uptime, DC energy efficiency (PUE) and incident management
KPI: Facility uptime as per SLA
18
DCIM DSE Scorecard
For IT Staff
19
KRA: Data Center IT Staff
IT KRA
IT Monitoring
IT Hardware Maintenance
IT Asset Management
IT Vendor/Contract Management
Business Continuity
Reporting
IT KPIs

20
Monitoring & Provisioning KRAs & KPIs
DCIM:
- Real-time monitoring of resource utilization of IT devices: server CPU, memory, storage, network bandwidth
- Proactive monitoring raises alerts when thresholds are breached
- Auto-provisioning of racks & devices
- Virtualization Planner identifies servers that can be virtualized; it also identifies under-utilized IT devices and recommends retirement or replacement
KRA: IT Monitoring & Provisioning
Monitoring KPIs: CPU Utilization; Memory Utilization; Power Consumption; Storage Utilization versus Free Storage; Server Uptime versus Target; Failures Prevented Due to Proactive Monitoring; Failures Due to Human Errors
Provisioning KPIs: Time to Harden a New Server; Time to Provision a New Device; Time to Provision New Rack Space; Time to Virtualize a New System; Time to Replace a Legacy System; Time to Decommission a Legacy System; Time to Install Patches & Updates
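
The threshold alerting and under-utilization checks described above reduce to simple comparisons once utilization samples are available. The sketch below is a simplified illustration; the threshold values, the 10% CPU floor and the function names are hypothetical.

```python
# Hypothetical alert thresholds, in percent utilization
THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "storage": 80.0}


def breached_metrics(sample, thresholds=THRESHOLDS):
    """Return the metrics in one utilization sample that exceed their threshold."""
    return sorted(
        metric for metric, limit in thresholds.items()
        if sample.get(metric, 0.0) > limit
    )


def underutilized_servers(avg_cpu_by_server, floor=10.0):
    """Return servers whose average CPU utilization is below the floor.

    These are candidates for virtualization, retirement or replacement.
    """
    return sorted(name for name, cpu in avg_cpu_by_server.items() if cpu < floor)
```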
21
IT Hardware Maintenance KRA & KPIs
DCIM helps schedule preventive maintenance (PM) based on the following:
- Age of a device as recorded in DCIM
- Utilization/load of the device as monitored by DCIM
DCIM helps IT staff understand the cascading effect of the temporary unavailability (due to PM) of a particular device and send prior notification
KRA: IT Hardware Maintenance
Scheduled Maintenance KPIs: Age of Device; Criticality of Device based on utilization and application hosted; Date of Last Check-Up; Date of last upgrade/nature of upgrade
Breakdown Maintenance KPIs: Failure Rate; Mean Time Between Failures; Mean Time To Repair; Total Maintenance Cost / Asset Replacement Value; Condition-Based Maintenance %; Uptime / Required Time; Spare Parts Used versus Availability; Immediate Corrective Maintenance Time / Total Downtime Related to Maintenance
22
IT Asset Management KRA & KPIs
DCIM serves as enterprise asset management software for both IT & Facilities
DCIM auto-discovers intelligent assets and creates the asset database
DCIM helps manage IT asset relationships
DCIM also maintains information about redundant assets in HA and DR setups
KRA: IT Asset Management
Asset Management KPIs:
- Time taken to add or delete intelligent & non-intelligent assets
- Time taken to update due to MAC (moves, adds, changes)
- Time taken to add interdependencies between assets
- % accuracy of asset database
- % over- & under-provisioned
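
The "% accuracy of asset database" KPI implies comparing what the database records against what a physical audit finds. A minimal sketch of that comparison follows, assuming each asset is keyed by an ID and described by a rack location; both the data shape and the function name are illustrative.

```python
def asset_db_accuracy(recorded, audited):
    """Percentage of audited assets whose recorded location matches the audit.

    recorded: mapping of asset ID to location as stored in the asset database.
    audited:  mapping of asset ID to location as found on the floor.
    Assets missing from the database count as mismatches.
    """
    if not audited:
        raise ValueError("audit must cover at least one asset")
    matches = sum(
        1 for asset_id, location in audited.items()
        if recorded.get(asset_id) == location
    )
    return 100.0 * matches / len(audited)
```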
23
Vendor/Contract Management KRA & KPIs
DCIM tracks support renewal dates
DCIM tracks hardware vendors/suppliers and service providers
KRA: Vendor/Contract Management
Vendor Management KPIs:
- % of systems out of support renewal
- % uptime by device category and vendor
- % contractor compliance with SLA terms
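
Tracking support renewal dates makes the first vendor KPI above straightforward to compute. The sketch below assumes a hypothetical mapping of system name to support end date and a 90-day notice window; neither comes from the presentation.

```python
from datetime import date, timedelta


def support_status(support_end_by_system, today, notice_days=90):
    """Report systems out of support and those whose renewal falls due soon."""
    if not support_end_by_system:
        raise ValueError("at least one system is required")
    horizon = today + timedelta(days=notice_days)
    expired = sorted(n for n, end in support_end_by_system.items() if end < today)
    due_soon = sorted(
        n for n, end in support_end_by_system.items() if today <= end <= horizon
    )
    return {
        "pct_out_of_support": 100.0 * len(expired) / len(support_end_by_system),
        "expired": expired,
        "due_soon": due_soon,
    }
```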
24
Business Continuity KRA & KPIs
DCIM helps in better impact analysis of outages and faster RCA of any incident, and thereby in faster turnaround time
KRA: Business Continuity
Business Continuity KPIs:
- Recovery Time Objective (RTO)
- Recovery Point Objective (RPO)
- Actual versus RTO & RPO
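
Comparing actuals with RTO and RPO needs only three timestamps per outage: when service was lost, when it was restored, and when the last recoverable copy of the data was taken. A minimal sketch, with hypothetical parameter names:

```python
from datetime import datetime, timedelta


def recovery_vs_objectives(outage_start, service_restored, last_good_backup, rto, rpo):
    """Compare one outage's actual recovery time and data-loss window with RTO and RPO."""
    recovery_time = service_restored - outage_start      # compared with RTO
    data_loss_window = outage_start - last_good_backup   # compared with RPO
    return {
        "recovery_time": recovery_time,
        "rto_met": recovery_time <= rto,
        "data_loss_window": data_loss_window,
        "rpo_met": data_loss_window <= rpo,
    }
```

An outage can meet one objective and miss the other, which is why the two are reported separately.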
25
DCIM: Helping IT staff with their Reporting KRA
DCIM provides superior reporting on IT infrastructure availability, resource utilization and incident management
KRA: Reporting
Figure: Trend Comparison for Multiple Servers
26
DCIM DSE Scorecard
For Data Center Manager
27
KRA: Data Center Manager
DC Manager KRA
Increase profitability by controlling data center cost
Minimize DC failure and improve availability
Improve operational efficiency and meet business SLA
Data center capacity planning
Adopt Green practices for sustainable DC operations
Reports & Analytics
DC Manager KPIs

28
Data Center Manager Cost & Profitability KRA & KPIs
DCIM:
Control CapEx:
- Repurpose under-utilized servers
- Discover stranded capacities & defer costly upgrades
Reduce OpEx:
- Reduce cooling costs
- Reduce server footprint
KRA: Increase profitability by controlling data center cost

29
Data Center Manager Availability KRA & KPIs
DCIM:
- Ability to predict failures
- Better impact analysis in the event of subsystem/component failure
- Faster RCA and turnaround time capabilities
KRA: Minimize DC failure and improve availability
Actuals versus SLA Benchmarks for:
- Number of incidents/alarms
- Breakdown of alarms at each stage (logged, WIP and closed)
- Major alarms by type (Facilities: Fire, Temp, ...; IT: Server, Storage, Application)
- RTO
- RPO
30
Data Center Manager Operational Efficiency vs. SLA
DCIM automates critical data center processes like Asset Management, Capacity Planning and Provisioning, thereby minimizing human error, increasing accuracy and data integrity, and improving the operational efficiency of the data center.
KRA: Improve operational efficiency and meet business SLA
Actuals versus SLA Benchmarks for:
- Asset DB accuracy
- Time and cost to provision additional resources
- Availability by servers, storage and applications
- Watts per rack and watts per sq ft
- PUE & CUE
31
Data Center Manager: Capacity Planning KRA & KPIs
DCIM:
- Monitor current capacity utilization
- Forecast future capacity requirements accurately
- Design and implement critical capacities efficiently, without under- or over-provisioning
KRA: Data center capacity planning
Monitoring KPIs: Incidents due to Capacity Shortages; Capacity Adjustments; Unplanned Capacity Adjustments; Resolution Time of Capacity Shortage; Percentage of Capacity Monitoring
Planning & Forecasting KPIs: Exactness of Capacity Forecast; % reduction in panic buying; % reduction in lost business due to inadequate capacity; Capacity Reserves; Relative reduction in cost of production of Capacity Plan
Sources: 1. Clemson Computing & Information Technology
2. IT Process Maps
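
The slide does not say how "Exactness of Capacity Forecast" or "Capacity Reserves" are measured. One plausible reading, shown here purely as an assumption, is the mean absolute percentage error of past forecasts and the unused share of installed capacity.

```python
def forecast_error_pct(forecast, actual):
    """Mean absolute percentage error between forecast and actual capacity demand."""
    if not actual or len(forecast) != len(actual):
        raise ValueError("forecast and actual must be non-empty and the same length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual demand values must be positive")
    errors = [abs(f - a) / a for f, a in zip(forecast, actual)]
    return 100.0 * sum(errors) / len(errors)


def capacity_reserve_pct(installed, used):
    """Share of installed capacity (for example kW or rack units) not yet in use."""
    if installed <= 0:
        raise ValueError("installed capacity must be positive")
    return 100.0 * (installed - used) / installed
```

A lower forecast error means fewer unplanned capacity adjustments, while the reserve figure shows how much headroom remains before the next purchase.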
32
Data Center Manager Green Practices KRA & KPIs
DCIM:
- Monitor energy consumption in the data center down to the lowest level
- Find ways to reduce energy consumption and improve efficiency
- Ensure that DC operations comply with the organization's sustainability goals
KRA: Adopt Green practices for sustainable DC operations

33
Data Center Manager Reporting KRA & KPIs
DCIM:
Reports & Analytics on
- Uptime and availability
- Energy efficiency and health
- Data center costs and savings
- Capacity/resource utilization and availability
- Operational efficiency and SLA compliance
KRA: Reports & Analytics

34
How Will the DSE Scorecard Help Data Center Operations?
Link Back to Organizational Vision & Strategy & BSC
Next-Gen DCIM w/ DSE Scorecard:
- Are the Data Centre infrastructure & capital costs aligned to process improvements?
- Have we been able to reduce infrastructure OpEx?
- Are we maintaining a risk-free Data Centre infrastructure?
- Is the infrastructure delivering on the technology innovation?
35
Thank You
Shekhar Dasgupta
[email protected]
Mobile: 408-431-1044
