Exchange 2010 Tested Solutions: 9,000 Mailboxes Deployed on Dell Servers, Dell EqualLogic Storage, and F5 Load Balancing Solutions
In Exchange 2010 Tested Solutions, Microsoft and participating server, storage, and network
partners examine common customer scenarios and key design decision points facing customers
who plan to deploy Microsoft Exchange Server 2010. Through this series of white papers, we
provide examples of well-designed, cost-effective Exchange 2010 solutions deployed on
hardware offered by some of our server, storage, and network partners.
You can download this document from the Microsoft Download Center.
Applies to:
Microsoft Exchange Server 2010 release to manufacturing (RTM)
Microsoft Exchange Server 2010 with Service Pack 1 (SP1)
Windows Server 2008 R2
Windows Server 2008 R2 Hyper-V
Table of Contents
Solution Summary
Customer Requirements
Design Assumptions
Solution Design
Determine Whether Client Access and Hub Transport Server Roles Will Be Deployed in
Separate Virtual Machines
Determine Number of Client Access and Hub Transport Server Combo Virtual Machines
Required
Determine Memory Required per Combined Client Access and Hub Transport Virtual
Machines
Plan Namespaces
Solution Overview
Database Layout
Storage Configuration
This document provides an example of how to design, test, and validate an Exchange Server
2010 solution for environments with 9,000 mailboxes deployed on Dell server and storage
solutions and F5 load balancing solutions. One of the key challenges with designing Exchange
2010 environments is examining the current server and storage options available and making the
right hardware choices that provide the best value over the anticipated life of the solution.
Following the step-by-step methodology in this document, we walk through the important design
decision points that help address these key challenges while ensuring that the customer's core
business requirements are met. After we have determined the optimal solution for this customer,
the solution undergoes a standard validation process to ensure that it holds up under simulated
production workloads for normal operating, maintenance, and failure scenarios.
Return to top
Solution Summary
The following tables summarize the key Exchange and hardware components of this solution.
Exchange components
Exchange component
Value or description
9000
None
Site resiliency
Yes
Virtualization
Hyper-V
Hardware components
Hardware component
Value or description
Server partner
Dell
Server model
PowerEdge M610
Server type
Blade
Processor
Storage partner
Dell EqualLogic
Storage type
Disk type
Return to top
Customer Requirements
One of the most important first steps in Exchange solution design is to accurately summarize the
business and technical requirements that are critical to making the correct design decisions. The
following sections outline the customer requirements for this solution.
Return to top
Value
9000
0%
100%
Value
750 MB (742)
Yes
450 @ 4 gigabytes (GB)
900 @ 1 GB
7650 @ 512 MB
included
750 MB
Value
Yes
450 @ 150 messages per day
8550 @ 100 messages per day
75
100
Value
% in Exchange ActiveSync
Return to top
Value
9000
The following table outlines the geographic distribution of datacenters that could potentially
support the Exchange e-mail infrastructure.
Geographic distribution of datacenters
Datacenter site requirements
Value or description
9000
Yes
Return to top
Value
Value or description
No
Yes
Yes
Yes
No
Not applicable
14 days
Return to top
Design Assumptions
This section includes information that isn't typically collected as part of customer requirements,
but is critical to both the design and the approach to validating the design.
Return to top
Value
<70%
<70%
<70%
<70%
<70%
<80%
<80%
<80%
<80%
<80%
Return to top
Value or description
20%
Value or description
1%
No
20%
Yes
Yes
Value or description
20%
None
Return to top
Solution Design
The following section provides a step-by-step methodology used to design this solution. This
methodology takes customer requirements and design assumptions and walks through the key
design decision points that need to be made when designing an Exchange 2010 environment.
Return to top
Active/Passive distribution Active mailbox database copies are deployed in the primary
datacenter and only passive database copies are deployed in a secondary datacenter. The
secondary datacenter serves as a standby datacenter and no active mailboxes are hosted in
the datacenter under normal operating conditions. In the event of an outage impacting the
primary datacenter, a manual switchover to the secondary datacenter is performed and active
databases are hosted there until the primary datacenter returns online.
Active/Passive distribution
Active/Active distribution (single DAG) Active mailbox databases are deployed in the
primary and secondary datacenters. A corresponding passive copy is located in the alternate
datacenter. All Mailbox servers are members of a single database availability group (DAG). In
this model, the wide area network (WAN) connection between two datacenters is potentially a
single point of failure. Loss of the WAN connection results in Mailbox servers in one of the
datacenters going into a failed state due to loss of quorum.
Active/Active distribution (single DAG)
Disaster recovery In the event of a hardware or software failure, multiple database copies
in a DAG enable high availability with fast failover and no data loss. DAGs can be extended
to multiple sites and can provide resilience against datacenter failures.
Recovery of accidentally deleted items With the new Recoverable Items folder in
Exchange 2010 and the hold policy that can be applied to it, it's possible to retain all deleted
and modified data for a specified period of time, so recovery of these items is easier and
faster. For more information, see Messaging Policy and Compliance, Understanding
Recoverable Items, and Understanding Retention Tags and Retention Policies.
Long-term data storage Sometimes, backups also serve an archival purpose. Typically,
tape is used to preserve point-in-time snapshots of data for extended periods of time as
governed by compliance requirements. The new archiving, multiple-mailbox search, and
message retention features in Exchange 2010 provide a mechanism to efficiently preserve
data in an end-user accessible manner for extended periods of time. For more information,
see Understanding Personal Archives, Understanding Multi-Mailbox Search, and
Understanding Retention Tags and Retention Policies.
There are technical reasons and several issues that you should consider before using the
features built into Exchange 2010 as a replacement for traditional backups. Prior to making this
decision, see Understanding Backup, Restore and Disaster Recovery.
*Design Decision Point*
In this example, maintaining tape backups has been difficult, and testing and validating restore
procedures hasn't occurred on a regular basis. Therefore, using Exchange native data protection
in place of traditional backups as the database resiliency strategy is preferred.
High availability database copy This database copy is configured with a replay lag time of
zero. As the name implies, high availability database copies are kept up-to-date by the
system, can be automatically activated by the system, and are used to provide high
availability for mailbox service and data.
Lagged database copy This database copy is configured to delay transaction log replay for
a period of time. Lagged database copies are designed to provide point-in-time protection,
which can be used to recover from store logical corruptions, administrative errors (for
example, deleting or purging a disconnected mailbox), and automation errors (for example,
bulk purging of disconnected mailboxes).
*Design Decision Point*
In this example, all three mailbox database copies will be deployed as high availability database
copies. The primary need for a lagged copy is to provide the ability to recover single deleted
items. This requirement can be met using the deleted items retention feature.
You have active mailbox users in multiple sites (active/active site configuration).
Design for all copies activated In this model, the Mailbox server role is sized to
accommodate the activation of all database copies on the server. For example, a Mailbox
server may host four database copies. During normal operating conditions, the server may
have two active database copies and two passive database copies. During a failure or
maintenance event, all four database copies would become active on the Mailbox server.
This solution is usually deployed in pairs. For example, if deploying four servers, the first pair
is servers MBX1 and MBX2, and the second pair is servers MBX3 and MBX4. In addition,
when designing for this model, you will size each Mailbox server for no more than 40 percent
of available resources during normal operating conditions. In a site resilient deployment with
three database copies and six servers, this model can be deployed in sets of three servers,
with the third server residing in the secondary datacenter. This model provides a three-server
building block for solutions using an active/passive site resiliency model.
This model can be used in the following scenarios:
Active/Passive multisite configuration where failure domains (for example, racks, blade
enclosures, and storage arrays) require easy isolation of database copies in the primary
datacenter
Configurations that aren't required to survive the simultaneous loss of any two Mailbox
servers in the DAG
This model requires servers to be deployed in pairs for single site deployments and sets of
three for multisite deployments. The following table illustrates a sample database layout for
this model.
Design for all copies activated
Design for targeted failure scenarios In this model, the Mailbox server role is designed to
accommodate the activation of a subset of the database copies on the server. The number of
database copies in the subset will depend on the specific failure scenario that you're
designing for. The main goal of this design is to evenly distribute active database load across
the remaining Mailbox servers in the DAG.
This model should be used in the following scenarios:
Configurations required to survive the simultaneous loss of any two Mailbox servers in
the DAG
The DAG design for this model requires between 3 and 16 Mailbox servers. The following
table illustrates a sample database layout for this model.
Design for targeted failure scenarios
Secondary datacenter
12
Return to top
variations per mailbox. The Mailbox Server Role Requirements Calculator does these
calculations for you. You can also use the following information to do the calculations manually.
The following calculations are used to determine the mailbox size on disk for the three mailbox
tiers in this solution:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average
message size)
Average size on disk = [(657 × 7650) + (1205 × 900) + (4548 × 450)] ÷ 9000
= 907 MB
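The weighted average above can also be reproduced with a short script if you want to rerun it with different tier counts. The following Windows PowerShell sketch is illustrative only; it uses the per-tier size-on-disk values (657 MB, 1205 MB, and 4548 MB) from the calculation above.

$tiers = @(
    @{ Mailboxes = 7650; SizeOnDiskMB = 657 },   # Tier 1: 512 MB quota, 100 messages per day
    @{ Mailboxes = 900;  SizeOnDiskMB = 1205 },  # Tier 2: 1024 MB quota, 100 messages per day
    @{ Mailboxes = 450;  SizeOnDiskMB = 4548 }   # Tier 3: 4096 MB quota, 150 messages per day
)
$totalMailboxes = 0
$totalMB = 0
foreach ($t in $tiers) {
    $totalMailboxes += $t.Mailboxes
    $totalMB += $t.Mailboxes * $t.SizeOnDiskMB
}
# Weighted average mailbox size on disk, rounded up: 907 MB
[math]::Ceiling($totalMB / $totalMailboxes)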
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Total database capacity = (database size + index size) ÷ 0.80 to add 20% volume free
space
= (5890 + 589) ÷ 0.80
= 8099 GB
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Total database capacity = (database size + index size) ÷ 0.80 to add 20% volume free
space
= (1271 + 127) ÷ 0.80
= 1747 GB
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average
message size)
Total database capacity = (database size + index size) ÷ 0.80 to add 20% volume free
space
= (2400 + 240) ÷ 0.80
= 3301 GB
Total database capacity (all tiers) = 8099 + 1747 + 3301
= 13147 GB
= 12.3 terabytes
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Log files size = (log file size × number of logs per mailbox per day × number of days
required to replace failed infrastructure × number of mailbox users) + (1% mailbox move
overhead)
= (1 MB × 20 × 3 × 7650) + (7650 × 0.01 × 512)
= 498168 MB
= 487 GB
Total log capacity = log files size ÷ 0.80 to add 20% volume free space
= 487 ÷ 0.80
= 608 GB
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Log files size = (log file size × number of logs per mailbox per day × number of days
required to replace failed infrastructure × number of mailbox users) + (1% mailbox move
overhead)
= (1 MB × 20 × 3 × 900) + (900 × 0.01 × 1024)
= 63216 MB
= 62 GB
Total log capacity = log files size ÷ 0.80 to add 20% volume free space
= 62 ÷ 0.80
= 77 GB
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average
message size)
Log files size = (log file size × number of logs per mailbox per day × number of days
required to replace failed infrastructure × number of mailbox users) + (1% mailbox move
overhead)
= (1 MB × 30 × 3 × 450) + (450 × 0.01 × 4096)
= 58932 MB
= 58 GB
Total log capacity = log files size ÷ 0.80 to add 20% volume free space
= 58 ÷ 0.80
= 72 GB
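The per-tier log calculations above all follow the same pattern, so they can be wrapped in a small helper. This PowerShell sketch is an illustration only (the function name and parameters are not part of the solution); it reproduces the Tier 3 result and can be rerun with the Tier 1 and Tier 2 inputs.

function Get-TierLogCapacityGB {
    param([double]$LogsPerMailboxPerDay, [int]$DaysToReplaceHardware, [int]$Mailboxes, [int]$QuotaMB)
    # 1 MB log files, plus 1% mailbox move overhead, then 20% volume free space
    $logFilesMB = (1 * $LogsPerMailboxPerDay * $DaysToReplaceHardware * $Mailboxes) +
                  ($Mailboxes * 0.01 * $QuotaMB)
    [math]::Round(($logFilesMB / 1024) / 0.80)
}
Get-TierLogCapacityGB -LogsPerMailboxPerDay 30 -DaysToReplaceHardware 3 -Mailboxes 450 -QuotaMB 4096   # ~72 GB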
Storage capacity requirements summary:
Average mailbox size on disk: 907 MB
Total database capacity required: 13147 GB
Total log capacity required: 757 GB
Total capacity required: 13904 GB
Total capacity required for three database copies: 41712 GB
Total capacity required for three database copies: 41 terabytes
Return to top
because storage subsystems can handle sequential I/O much more efficiently than random I/O.
These operations include background database maintenance, log transactional I/O, and log
replication I/O. In this step, you calculate the total IOPS required to support all mailbox users,
using the following:
Note:
To determine the IOPS profile for a different message profile, see the table "Database
cache and estimated IOPS per mailbox based on message activity" in Understanding
Database and Log Performance Factors.
Total required IOPS = IOPS per mailbox user × number of mailboxes × I/O overhead factor
= 0.15 × 450 × 1.2
= 81
Total required IOPS (all tiers) = 1107
Average IOPS per mailbox = 1107 ÷ 9000 = 0.123
The high level storage IOPS requirements are approximately 1107. When choosing a storage
solution, ensure that the solution meets this requirement.
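If you want to reproduce the 1,107 IOPS figure across all three tiers, the following PowerShell sketch shows the same math. The per-mailbox IOPS value for the two 100-message tiers (0.10) is an assumption taken from the message-profile table referenced in the note above; only the Tier 3 value (0.15) is shown explicitly in this section.

$ioOverhead = 1.2
$tiers = @(
    @{ Mailboxes = 7650; IopsPerMailbox = 0.10 },   # Tier 1: 100 messages per day (assumed from profile table)
    @{ Mailboxes = 900;  IopsPerMailbox = 0.10 },   # Tier 2: 100 messages per day (assumed from profile table)
    @{ Mailboxes = 450;  IopsPerMailbox = 0.15 }    # Tier 3: 150 messages per day
)
$totalIops = 0
foreach ($t in $tiers) { $totalIops += $t.IopsPerMailbox * $t.Mailboxes * $ioOverhead }
"Total required IOPS: {0}" -f [math]::Round($totalIops)     # ~1107
"Average IOPS per mailbox: {0}" -f ($totalIops / 9000)      # ~0.123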
Return to top
Storage controllers:
Components
data protection.
data protection.
Volumes
Up to 1024.
Up to 1024.
RAID support
Network interfaces
Reliability
Redundant, hot-swappable
controllers, power supplies,
cooling fans, and disks.
Redundant, hot-swappable
controllers, power supplies,
cooling fans, and disks.
For a list of supported disk types, see "Physical Disk Types" in Understanding Storage
Configuration.
To help determine which disk type to choose, see "Factors to Consider When Choosing Disk
Types" in Understanding Storage Configuration.
Return to top
IOPS per mailbox based on message activity and mailbox database cache" in Understanding the
Mailbox Database Cache.
The following table outlines the database cache per user for various message profiles.
Database cache per user
50 messages sent or received per mailbox per day: 3 MB
100 messages sent or received per mailbox per day: 6 MB
150 messages sent or received per mailbox per day: 9 MB
200 messages sent or received per mailbox per day: 12 MB
In this step, you determine high level memory requirements for the entire environment. In a later
step, you use this result to determine the amount of physical memory needed for each Mailbox
server. Use the following information:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average
message size)
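The cache figures used later in the design (roughly 55 GB in total and about 6.2 MB per active mailbox) can be derived from the table above. The following PowerShell lines are a minimal sketch of that arithmetic, assuming 6 MB per user for the two 100-message tiers and 9 MB per user for the 150-message tier.

$cacheMB = (7650 * 6) + (900 * 6) + (450 * 9)
"Total database cache: {0} MB" -f $cacheMB                        # 55350 MB, roughly 55 GB
"Average cache per active mailbox: {0} MB" -f ($cacheMB / 9000)   # ~6.15 MB, rounded to 6.2 MB in the memory sizing step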
Return to top
Megacycles per
Megacycles per
Megacycles per
per day
mailbox database
passive mailbox
passive mailbox
database
50
0.1
0.15
100
0.2
0.3
150
0.3
0.45
200
0.4
0.6
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average
message size)
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average
message size)
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average
message size)
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average
message size)
If you expect server capacity to be underutilized, virtualization may allow better utilization of the
hardware and let you purchase fewer physical servers.
You may want to use Windows Network Load Balancing when deploying the Client Access, Hub
Transport, and Mailbox server roles on the same physical server.
If your organization already uses virtualization throughout its server infrastructure, you may want
to virtualize Exchange as well to stay aligned with corporate standards.
The Dell eleventh generation PowerEdge servers offer industry leading performance and
efficiency. Innovations include increased memory capacity and faster I/O rates, which help deliver
the performance required by today's most demanding applications.
Description
Chassis\enclosure
Power supplies
Cooling fans
Input device
Components
Description
Management
Description
Processors (x2)
Form factor
Memory
12 DIMM slots
1 GB/2 GB/4 GB/8 GB/16 GB ECC DDR3
Support for up to 192 GB using 12 × 16 GB DIMMs
Drives
Components
Description
I/O slots
Description
Processors (x2)
Form factor
Memory
18 DIMM slots
1 GB/2 GB/4 GB/8 GB/16 GB ECC DDR3
Support for up to 192 GB using 12 × 16 GB DIMMs
Drives
SSD:
Components
Description
Hard Drives
I/O slots
Description
Processors (x2)
Form factor
2U rack
Memory
Drives
I/O slots
Description
Processors (x4)
Form factor
2U rack
Memory
Drives
I/O slots
6 PCIe G2 slots:
Five x8 slots
One x4 slot
To help simplify the process of obtaining the benchmark value for your server and processor, we
recommend you use the Exchange Processor Query tool. This tool automates the manual steps
to determine your planned processor's SPECInt 2006 rate value. To run this tool, your computer
must be connected to the Internet. The tool uses your planned processor model as input, and
then runs a query against the Standard Performance Evaluation Corporation Web site returning
all test result data for that specific processor model. The tool also calculates an average SPECint
2006 rate value based on the number of processors planned to be used in each Mailbox server.
Use the following calculations:
Processor and server platform = Intel X5550 2.66 gigahertz (GHz) in a Dell M610
Adjusted megacycles per core = (new platform per core value) × (hertz per core of baseline
platform) ÷ (baseline per core value)
= (29.25 × 3330) ÷ 18.75
= 5195
Adjusted megacycles per server = adjusted megacycles per core × number of cores
= 5195 × 8
= 41558
Available megacycles per VM = adjusted available megacycles per server ÷ number of VMs
= 37403 ÷ 2
= 18701
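The following PowerShell sketch strings the megacycle math together in one place. The 0.90 factor is an assumption representing roughly 10 percent of physical processor capacity reserved for the Hyper-V root and virtualization overhead, which is how the adjusted available value of 37403 is reached from 41558.

$baselinePerCore    = 18.75    # baseline platform SPECint_rate2006 value per core
$baselineMHzPerCore = 3330     # baseline platform megacycles per core
$newPerCore         = 29.25    # planned platform (X5550) SPECint_rate2006 value per core
$coresPerServer     = 8
$adjustedPerCore    = ($newPerCore * $baselineMHzPerCore) / $baselinePerCore   # ~5195
$adjustedPerServer  = $adjustedPerCore * $coresPerServer                       # ~41558
$availablePerServer = $adjustedPerServer * 0.90                                # ~37403 (assumed hypervisor overhead)
$availablePerVM     = $availablePerServer / 2                                  # ~18701 for two VMs per root server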
Return to top
Number of VMs required = total mailbox count in site ÷ active mailboxes per VM
= 9000 ÷ 4250
= 2.2
Based on processor capacity, a minimum of three Mailbox server VMs is required to support the
anticipated peak workload during normal operating conditions.
Secondary datacenter
12
Return to top
Number of active mailboxes per server = total mailbox count ÷ server count
= 9000 ÷ 6
= 1500
Step 2: Determine number of active mailboxes per server in a worst case failure
event
To determine the number of active mailboxes per server in a worst case failure event, use the
following calculation:
Number of active mailboxes per server = total mailbox count ÷ server count
= 9000 ÷ 3
= 3000
Return to top
Step 1: Determine database cache requirements per server for the worst
case failure scenario
In a previous step, you determined that the database cache requirements for all mailboxes were
55 GB and that the average cache required per active mailbox was 6.2 MB.
To design for the worst case failure scenario, you calculate based on active mailboxes residing
on three of six Mailbox servers. Use the following calculation:
Memory required for database cache = number of active mailboxes × average cache per
mailbox
= 3000 × 6.2 MB
= 18600 MB
= 18.2 GB
Server physical memory and default database cache size (Mailbox server role):
24 GB physical memory: 17.6 GB database cache
32 GB physical memory: 24.4 GB database cache
48 GB physical memory: 39.2 GB database cache
The recommended memory configuration to support 18.2 GB of database cache for a mailbox
role server is 32 GB.
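A quick way to express the selection rule is to pick the smallest standard configuration whose default database cache size covers the requirement. The PowerShell sketch below is illustrative only and uses the cache sizes from the table above.

$required = 18.2
$configs = @(
    @{ ServerMemoryGB = 24; DatabaseCacheGB = 17.6 },
    @{ ServerMemoryGB = 32; DatabaseCacheGB = 24.4 },
    @{ ServerMemoryGB = 48; DatabaseCacheGB = 39.2 }
)
# Smallest configuration whose cache size meets or exceeds the 18.2 GB requirement: 32
($configs | Where-Object { $_.DatabaseCacheGB -ge $required } |
    Sort-Object { $_.ServerMemoryGB } | Select-Object -First 1).ServerMemoryGB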
Return to top
1:1
Minimum supported
Recommended maximum
4 GB
1 GB per core
4 GB
2 GB per core
2 GB per core
Based on the preceding table, each combined Client Access and Hub Transport server VM
requires a minimum of 8 GB of memory.
Return to top
The correct distribution is one Client Access and Hub Transport server role VM on each of the
physical host servers and one Mailbox server role VM on each of the physical host servers. So in
this solution there will be nine Hyper-V root servers each supporting one Client Access and Hub
Transport server role VM and one Mailbox server role VM.
Virtual machine distribution (correct)
Return to top
Root server memory = Client Access and Hub Transport server role VM memory + Mailbox
server role VM memory
= 8 GB + 32 GB
= 40 GB
Database configuration
In this solution, a minimum of 12 databases will be used. The exact number of databases may be
adjusted in future steps to accommodate the database copy layout.
Return to top
In the previous step, it was determined that the three PS6500E arrays represent three failure
domains. Consider a design in which all six blades in the primary datacenter reside in the first
enclosure and are connected to the two PS6500Es in the primary datacenter. If an issue impacts
that enclosure, there are no other servers left in the primary datacenter, and you're forced to
conduct a manual site switchover to the secondary datacenter. A better design is to deploy three
blade enclosures, each with three of the nine server blades. Pair the servers in the first enclosure
with the first PS6500E, the servers in the second enclosure with the second PS6500E, and the
three servers in the secondary site with the PS6500E in the secondary site. By aligning the server
and storage failure domains, the database copies are laid out in a manner that protects against
issues with either a storage array or an entire blade enclosure.
Failure domains associated with servers in two sites
Return to top
Unique database count = total number of Mailbox servers in primary datacenter × number of
Mailbox servers in failure domain
= 6 × 3
= 18
Database copy layout with C1 database copies
MBX1: DB1 (C1), DB2 (C1), DB3 (C1)
MBX2: DB4 (C1), DB5 (C1), DB6 (C1)
MBX3: DB7 (C1), DB8 (C1), DB9 (C1)
MBX4: DB10 (C1), DB11 (C1), DB12 (C1)
MBX5: DB13 (C1), DB14 (C1), DB15 (C1)
MBX6: DB16 (C1), DB17 (C1), DB18 (C1)
Next distribute the C2 database copies (or the copies with an activation preference value of 2) to
the servers in the second failure domain. During the distribution, you distribute the C2 copies
across as many servers in the alternate failure domain as possible to ensure that a single server
failure has a minimal impact on the servers in the alternate failure domain.
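In the Exchange Management Shell, the second copies are added with the Add-MailboxDatabaseCopy cmdlet and the activation preference value described above. The server assignments shown below are illustrative only; the point is that each C2 copy lands on a different server in the alternate failure domain.

Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX4 -ActivationPreference 2
Add-MailboxDatabaseCopy -Identity DB2 -MailboxServer MBX5 -ActivationPreference 2
Add-MailboxDatabaseCopy -Identity DB3 -MailboxServer MBX6 -ActivationPreference 2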
MBX1
DB1
C1
DB2
C1
DB3
C1
MBX2
MBX3
MBX4
MBX5
MBX6
C2
C2
C2
DB4
C1
DB5
C1
DB6
C1
C2
C2
C2
DB7
C1
DB8
C1
DB9
C1
C2
C2
C2
Consider the opposite configuration for the other failure domain. Again, you distribute the C2
copies across as many servers in the alternate failure domain as possible to ensure that a single
server failure has a minimal impact on the servers in the alternate failure domain.
Database copy layout with C2 database copies distributed in the opposite configuration
DB
MBX1
DB10
C2
DB11
MBX2
C2
C2
DB17
DB18
MBX6
C1
C1
C2
DB15
DB16
MBX5
C1
C2
DB14
MBX4
C1
DB12
DB13
MBX3
C1
C2
C1
C2
C1
C2
C1
C2
C1
In a maintenance scenario, you could move the active mailbox databases from the servers in the
first failure domain (MBX1, MBX2, MBX3) to the servers in the second failure domain (MBX4,
MBX5, MBX6), complete maintenance activities, and then move the active database copies back
to the C1 copies on the servers in the first failure domain. You can conduct maintenance activities
on all servers in the primary datacenter in two passes.
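A maintenance pass of this kind is typically driven with the Move-ActiveMailboxDatabase cmdlet. The commands below are a hedged example rather than the validated procedure for this solution; the server names are illustrative.

# Move all mounted databases off MBX1 before maintenance
Get-MailboxDatabaseCopyStatus -Server MBX1 | Where-Object { $_.Status -eq "Mounted" } |
    ForEach-Object { Move-ActiveMailboxDatabase $_.DatabaseName -ActivateOnServer MBX4 -Confirm:$false }

# After maintenance, return a database to its preference 1 copy
Move-ActiveMailboxDatabase DB1 -ActivateOnServer MBX1 -Confirm:$false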
Database copy layout during server maintenance
MBX1
DB1
C1
DB2
C1
DB3
C1
MBX2
MBX3
C1
DB6
C1
C1
DB9
C1
C2
C2
DB18
C3
C2
C3
C3
C2
C3
C2
C3
C3
C1
C2
C3
C1
C2
C3
C1
C2
DB15
DB17
C3
C1
DB12
C3
C2
DB8
C3
C1
C2
C2
C3
C1
C3
C1
C2
C1
C2
MBX9
C3
C2
C1
DB16
MBX8
C3
C2
DB7
DB14
MBX7
C2
DB5
DB13
MBX6
C2
C1
DB11
MBX5
C2
DB4
DB10
MBX4
C3
C3
C1
C3
Return to top
Summary of storage requirements:
Average mailbox size on disk: 907 MB
Total database capacity required: 13147 GB
Total log capacity required: 757 GB
Total capacity required: 13904 GB
Total capacity required for three database copies: 41712 GB
Total capacity required for three database copies: 41 terabytes
Active databases
Passive databases
Lagged databases
Total LUNs
27
Database capacity = [(number of mailbox users × average mailbox size on disk) + (20% data
overhead factor)] + (10% content indexing overhead)
= [(500 × 907) + (90700)] + 54420
= 598620 MB
= 585 GB
Log capacity = (log size × number of logs per mailbox per day × number of days required to
replace hardware × number of mailbox users) + (mailbox move percent overhead)
= (1 MB × 20.5 × 3 × 500) + (500 × 0.01 × 907 MB)
= 35285 MB
= 35 GB
LUN size = [(database capacity) + (log capacity)] ÷ 0.80 to add 20% volume free space
= [(585) + (35)] ÷ 0.80
= 775 GB
Available capacity per LUN = 21299 ÷ 27 = 789 GB
The actual LUN size will be 789 GB, which will support the required LUN size of 775 GB.
Storage capacity per array:
Usable capacity per array: 21299 GB
Number of LUNs per array: 27
Required LUN size: 775 GB
Actual LUN size: 789 GB
Database copy distribution across the three storage arrays
DB1 through DB9: C1 copy on Array1, C2 copy on Array2, C3 copy on Array3
DB10 through DB18: C2 copy on Array1, C1 copy on Array2, C3 copy on Array3
Return to top
Plan Namespaces
When you plan your Exchange 2010 organization, one of the most important decisions that you
must make is how to arrange your organization's external namespace. A namespace is a logical
structure usually represented by a domain name in Domain Name System (DNS). When you
define your namespace, you must consider the different locations of your clients and the servers
that house their mailboxes. In addition to the physical locations of clients, you must evaluate how
they connect to Exchange 2010. The answers to these questions will determine how many
namespaces you must have. Your namespaces will typically align with your DNS configuration.
We recommend that each Active Directory site in a region that has one or more Internet-facing
Client Access servers have a unique namespace. This is usually represented in DNS by an A
record, for example, mail.contoso.com or mail.europe.contoso.com.
For more information, see Understanding Client Access Server Namespaces.
There are a number of different ways to arrange your external namespaces, but usually your
requirements can be met with one of the following namespace models:
Consolidated datacenter model This model consists of a single physical site. All servers
are located within the site, and there is a single namespace, for example, mail.contoso.com.
Single namespace with proxy sites This model consists of multiple physical sites. Only
one site contains an Internet-facing Client Access server. The other sites aren't exposed to
the Internet. There is only one namespace for the sites in this model, for example,
mail.contoso.com.
Single namespace and multiple sites This model consists of multiple physical sites. Each
site can have an Internet-facing Client Access server. Alternatively, there may be only a
single site that contains Internet-facing Client Access servers. There is only one namespace
for the sites in this model, for example, mail.contoso.com.
Regional namespaces This model consists of multiple physical sites and multiple
namespaces. For example, a site located in New York City would have the namespace
mail.usa.contoso.com, a site located in Toronto would have the namespace
mail.canada.contoso.com, and a site located in London would have the namespace
mail.europe.contoso.com.
Multiple forests This model consists of multiple forests that have multiple namespaces. An
organization that uses this model could be made up of two partner companies, for example,
Contoso and Fabrikam. Namespaces might include mail.usa.contoso.com,
mail.europe.contoso.com, mail.asia.fabrikam.com, and mail.europe.fabrikam.com.
server arrays. A single namespace will be load balanced across the Client Access servers in the
primary active Client Access server array using redundant hardware load balancers. In a site
failure, the namespace will be load balanced across the Client Access servers in the secondary
Client Access server array.
Return to top
BIG-IP Local Traffic Manager (LTM) BIG-IP LTM is designed to monitor and manage traffic
to Client Access, Hub Transport, Edge Transport, and Unified Messaging servers, while
ensuring that users are always sent to the best performing resource. Whether your users are
connecting via MAPI, Outlook Web App, ActiveSync, or Outlook Anywhere, BIG-IP LTM will
load balance the connections appropriately, allowing you to seamlessly scale to any size
deployment. BIG-IP LTM now offers several modules that also provide significant value in an
Exchange environment, which include:
Access Policy Manager (APM) Designed to secure access to Exchange resources, APM
can authenticate users before they attach to your Exchange Client Access servers,
providing a strong perimeter security.
WAN Optimization Module (WOM) Focused on network optimization for WANs, WOM
has proven capable in accelerating DAG replication by over five times between
datacenters.
BIG-IP Global Traffic Manager (GTM) BIG-IP GTM can provide wide area resiliency,
providing disaster recovery and load balancing for those with multiple datacenter Exchange
deployments.
BIG-IP Application Security Manager (ASM) A fully featured Layer 7 firewall, ASM thwarts
HTTP, XML, and SMTP based attacks. By combining a negative and positive security model,
ASM provides protection against all L7 attacks, both known and unknown.
For more information about these technologies, see F5 Solutions for Exchange Server.
Sizing the appropriate F5 hardware model for your Exchange 2010 deployment is an exercise
best done with the guidance of your local F5 team. F5 offers production hardware-based and
software-based BIG-IP platforms that range from supporting up to 200 megabits per second
(Mbps) all the way up to 80 Gbps. To learn more about the specifications for each of the F5 BIG-IP LTM hardware platforms, see BIG-IP System Hardware Datasheet.
Option 1: BIG-IP 1600 series
The BIG-IP 1600 offers all the functionality of TMOS in a cost-effective, entry-level platform for
intelligent application delivery.
BIG-IP 1600 appliance-based networking technologies
Components
Value or description
Traffic throughput
1 Gbps
Software compression
Included: 50 Mbps
Maximum: 1 Gbps
Processor
Memory
4 GB
Power supply
Typical consumption
Value or description
Traffic throughput
4 Gbps
Hardware SSL
Software compression
Included: 50 Mbps
Memory
8 GB
Power supply
Typical consumption
Value or description
Traffic throughput
6 Gbps
Hardware SSL
FIPS SSL
Software compression
Included: 50 Mbps
Maximum: 5 Gbps
Processor
Memory
8 GB
16
Power supply
Typical consumption
Security
Acceleration
IMAP
Outlook Anywhere
Average user usage profile, such as number of messages per day and average e-mail message
size
This information can be used to ensure the right BIG-IP LTM platform is selected.
*Design Decision Point*
The BIG-IP 3900 is selected for this solution. The 3900's 4 Gbps throughput capacity and
connection count limits are enough to cover normal usage as well as unexpected traffic spikes for
15,000 active mailboxes with a 50 messages per day profile. The quad core CPU is also capable
of handling the processing associated with connection and persistence handling.
Return to top
Connection mirroring This ensures the connection table in each BIG-IP LTM is mirrored to
its peer. This means that in case of a BIG-IP LTM failure, no connections are dropped
because the BIG-IP LTM failover partner is already aware of the previously established
connections and assumes responsibility for them.
Network-based outage detection This ensures that a network outage is treated as just as
critical as a server outage by the BIG-IP LTM, and that proper remediation steps are taken to
attempt to remedy the situation.
Besides deploying BIG-IP LTMs in redundant pairs, customers often build redundancy into the
architecture by building a multiple datacenter environment. BIG-IP GTM is designed to add
datacenter load balancing so that wide area resiliency is also achieved. For more information
about GTM, see Global Load Balancing Solutions.
Return to top
Solution Overview
The previous section provided information about the design decisions that were made when
considering an Exchange 2010 solution. The following section provides an overview of the
solution.
Return to top
Logical solution
Return to top
Physical solution
Return to top
Description
Server vendor
Dell
Server model
Processor
Chipset
Intel 5520/5500/X58
Memory
48 GB
Operating system
Virtualization
Microsoft Hyper-V
Internal disk
RAID-1
RAID controller
Network interface
Description
Physical or virtual
Hyper-V VM
Virtual processors
Memory
8 GB
Storage
Operating system
Exchange version
Return to top
Description
Physical or virtual
Hyper-V VM
Virtual processors
Memory
32 GB
Storage
Pass-through storage
9 volumes × 789 GB
Operating system
Exchange version
Third-party software
None
Return to top
Database Layout
The following diagram illustrates the database layout across the primary and secondary
datacenters.
Database layout
Return to top
Description
Storage vendor
Dell
Storage model
EqualLogic PS6500E
Component
Description
Category
iSCSI
Disks
Active disks
46
Spares
RAID level
10
Usable capacity
20.8 terabytes
Storage Configuration
Each of the Dell EqualLogic PS6500E storage arrays used in the solution was configured as
illustrated in the following table.
Storage configuration
Component
Description
Storage enclosures
27
LUN size
789 GB
RAID level
RAID-10
The following table illustrates how the available storage was designed and allocated between the
three PS6500E storage arrays.
PS6500 storage array design and allocation
DB1 through DB9: C1 copy on Array1, C2 copy on Array2, C3 copy on Array3
DB10 through DB18: C2 copy on Array1, C1 copy on Array2, C3 copy on Array3
Return to top
Description
Vendor
Dell
Model
Ports
Port bandwidth
128 Gbps
For more information, download a .pdf file about the PowerConnect M6220 Ethernet Switch.
Return to top
Description
Vendor
F5
Model
BIG-IP 3900
Traffic throughput
4 Gbps
Hardware SSL
Software compression
Included: 50 Mbps
Maximum: 3.8 Gbps
Processor
Memory
8 GB
Power supply
Typical consumption
Return to top
Performance tests
Functional tests
Return to top
Tool Set
For validating Exchange storage sizing and configuration, we recommend the Microsoft
Exchange Server Jetstress tool. The Jetstress tool is designed to simulate an Exchange I/O
workload at the database level by interacting directly with the ESE, which is also known as Jet.
The ESE is the database technology that Exchange uses to store messaging data on the Mailbox
server role. Jetstress can be configured to test the maximum I/O throughput available to your
storage subsystem within the required performance constraints of Exchange. Or, Jetstress can
accept a target profile of user count and per-user IOPS, and validate that the storage subsystem
is capable of maintaining an acceptable level of performance with the target profile. Test duration
is adjustable and can be run for a minimal period of time to validate adequate performance or for
an extended period of time to additionally validate storage subsystem reliability.
The Jetstress tool can be obtained from the Microsoft Download Center at the following locations:
The documentation included with the Jetstress installer describes how to configure and execute a
Jetstress validation test on your server hardware.
Approach to Storage Validation
There are two main types of storage configurations:
With DAS or internal disk scenarios, there's only one server accessing the disk subsystem, so the
performance capabilities of the storage subsystem can be validated in isolation.
In SAN scenarios, the storage utilized by the solution may be shared by many servers and the
infrastructure that connects the servers to the storage may also be a shared dependency. This
requires additional testing, as the impact of other servers on the shared infrastructure must be
adequately simulated to validate performance and functionality.
validation requirements that can be met with additional testing, so this list isn't intended to be
exhaustive:
Validation of worst case database switchover scenario In this test case, the level of I/O
is expected to be serviced by the storage subsystem in a worst case switchover scenario
(largest possible number of active copies on fewest servers). Depending on whether the
storage subsystem is DAS or SAN, this test may be required to run on multiple hosts to
ensure that the end-to-end solution load on the storage subsystem can be sustained.
Validation of storage performance under storage failure and recovery scenario (for
example, failed disk replacement and rebuild) In this test case, the performance of the
storage subsystem during a failure and rebuild scenario is evaluated to ensure that the
necessary level of performance is maintained for optimal Exchange client experience. The
same caveat applies for a DAS vs. SAN deployment: If multiple hosts are dependent on a
shared storage subsystem, the test must include load from these hosts to simulate the entire
effect of the failure and rebuild.
%Processor Time
The report file shows various categories of I/O performed by the Exchange system:
Transactional I/O Performance This table reports I/O that represents user activity against
the database (for example, Outlook generated I/O). This data is generated by subtracting
background maintenance I/O and log replication I/O from the total I/O measured during the
test. This data provides the actual database IOPS generated along with I/O latency
measurements required to determine whether a Jetstress performance test passed or failed.
Background Database Maintenance I/O Performance This table reports the I/O
generated due to ongoing ESE database background maintenance.
Log Replication I/O Performance This table reports the I/O generated from simulated log
replication.
Total I/O Performance This table reports the total I/O generated during the Jetstress test.
Return to top
Tool Set
For validation of end-to-end solution performance and scalability, we recommend the Microsoft
Exchange Server Load Generator tool (Loadgen). Loadgen is designed to produce a simulated
client workload against an Exchange deployment. This workload can be used to evaluate the
performance of the Exchange system, and can also be used to evaluate the effect of various
configuration changes on the overall solution while the system is under load. Loadgen is capable
of simulating Microsoft Office Outlook 2007 (online and cached), Office Outlook 2003 (online and
cached), POP3, IMAP4, SMTP, ActiveSync, and Outlook Web App (known in Exchange 2007
and earlier versions as Outlook Web Access) client activity. It can be used to generate a single
protocol workload, or these client protocols can be combined to generate a multiple protocol
workload.
You can get the Loadgen tool from the Microsoft Download Center at the following locations:
The documentation included with the Loadgen installer describes how to configure and execute a
Loadgen test against an Exchange deployment.
Approach to Server Validation
When validating your server design, test the worst case scenario under anticipated peak
workload. Based on a number of data sets from Microsoft IT and other customers, peak load is
generally equal to 2x the average workload throughout the remainder of the work day. This is
referred to as the peak-to-average workload ratio.
Peak load
In this Performance Monitor snapshot, which displays various counters that represent the amount
of Exchange work being performed over time on a production Mailbox server, the average value
for RPC operations per second (the highlighted line) is about 2,386 when averaged across the
entire day. The average for this counter during the peak period from 10:00 through 11:00 is about
4,971, giving a peak-to-average ratio of 2.08.
To ensure that the Exchange solution is capable of sustaining the workload generated during the
peak average, modify Loadgen settings to generate a constant amount of load at the peak
average level, rather than spreading out the workload over the entire simulated work day.
Loadgen task-based simulation modules (like the Outlook simulation modules) utilize a task
profile that defines the number of times each task will occur for an average user within a
simulated day.
The total number of tasks that need to run during a simulated day is calculated as the number of
users multiplied by the sum of task counts in the configured task profile. Loadgen then
determines the rate at which it should run tasks for the configured set of users by dividing the
total number of tasks to run in the simulated day by the simulated day length. For example, if
Loadgen needs to run 1,000,000 tasks in a simulated day, and a simulated day is equal to 8
hours (28,800 seconds), Loadgen must run 1,000,000 ÷ 28,800 = 34.72 tasks per second to meet
the required workload definition. To increase the amount of load to the desired peak average,
divide the default simulated day length (8 hours) by the peak-to-average ratio (2) and use this as
the new simulated day length.
Using the task rate example again, 1,000,000 ÷ 14,400 = 69.44 tasks per second. This reduces
the simulated day length by half, which results in doubling the actual workload run against the
server and achieving our goal of a peak average workload. You don't adjust the run length
duration of the test in the Loadgen configuration. The run length duration specifies the duration of
the test and doesn't affect the rate at which tasks will be run against the Exchange server.
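The same arithmetic can be checked quickly in PowerShell. This is a minimal sketch of the example above, not a Loadgen configuration step.

$tasksPerDay     = 1000000
$simulatedDaySec = 8 * 3600                        # default 8-hour simulated day (28,800 seconds)
$peakToAverage   = 2
$averageRate = $tasksPerDay / $simulatedDaySec     # ~34.72 tasks per second
$peakDaySec  = $simulatedDaySec / $peakToAverage   # 14,400 seconds
$peakRate    = $tasksPerDay / $peakDaySec          # ~69.44 tasks per second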
Normal operating conditions In this test case, the basic design of the solution is validated
with all components in their normal operating state (no failures simulated). The desired
workload is generated against the solution, and the overall performance of the solution is
validated against the metrics that follow.
Single server failure or single server maintenance (in site) In this test case, a single
server is taken down to simulate either an unexpected failure of the server or a planned
maintenance operation for the server. The workload that would normally be handled by the
unavailable server is now handled by other servers in the solution topology, and the overall
performance of the solution is validated.
Message profile (messages per day), with messages sent and received per day:
50 messages per day: 10 sent, 40 received
100 messages per day: 20 sent, 80 received
150 messages per day: 30 sent, 120 received
200 messages per day: 40 sent, 160 received
The following example assumes that each Mailbox server has 5,000 active mailboxes with a 150
messages per day profile (30 messages sent and 120 messages received per day).
Peak message delivery rate for 5,000 active mailboxes
Message profile (messages received per mailbox per day): 120
Active mailboxes per server: 5000
Messages received per server per day: 5000 × 120 = 600000
Average message delivery rate per second: 600000 ÷ 28800 = 20.83
Peak message delivery rate per second (2 × average): 20.83 × 2 = 41.67
You expect 41.67 messages per second delivered on each Mailbox server running 5,000 active
mailboxes with a message profile of 150 messages per day during peak load.
Measuring Actual Message Delivery Rate
The actual message delivery rate can be measured using the following counter on each Mailbox
server: MSExchangeIS Mailbox(_Total)\Messages Delivered/sec. If the measured message
delivery rate is within one or two messages per second of the target message delivery rate, you
can be confident that the desired load profile was run successfully.
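One convenient way to sample this counter during a test run is the Get-Counter cmdlet in Windows PowerShell. The server name below is illustrative; substitute your own Mailbox server names.

Get-Counter "\MSExchangeIS Mailbox(_Total)\Messages Delivered/sec" -ComputerName MBX1 -SampleInterval 5 -MaxSamples 12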
Hyper-V has three main components: the virtualization stack, the hypervisor, and devices. The
virtualization stack handles emulated devices, manages VMs, and services I/O. The hypervisor
schedules virtual processors, manages interrupts, services timers, and controls other chip-level
functions. The hypervisor doesn't handle devices or I/O (for example, there are no hypervisor
drivers). The devices are part of the root server or installed in guest servers as part of integration
services. Because the root server has a full view of the system and controls the VMs, it also
provides monitoring information via Windows Management Instrumentation (WMI) and
performance counters.
Processor
When validating physical processor utilization on the root server (or within the guest VM), the
standard Processor\% Processor Time counter isn't very useful.
Instead, you can examine the Hyper-V Hypervisor Logical Processor\% Total Run Time counter.
This counter shows the percentage of processor time spent in guest and hypervisor runtime and
should be used to measure the total processor utilization for the hypervisor and all VMs running
on the root server. This counter shouldn't exceed 80 percent or whatever the maximum utilization
target you have designed for.
Counter
Target
<80%
If you're interested in what percentage of processor time is spent servicing the guest VMs, you
can examine the Hyper-V Hypervisor Logical Processor\% Guest Run Time counter. If you're
interested in what percentage of processor time is spent in the hypervisor, you can look at the Hyper-V Hypervisor Logical Processor\% Hypervisor Run Time counter. This counter should be below
5 percent. The Hyper-V Hypervisor Root Virtual Processor\% Guest Run Time counter shows the
percentage of processor time spent in the virtualization stack. This counter should also be below
5 percent. These two counters can be used to determine what percentage of your available
physical processor time is being used to support virtualization.
Counter
Target
<80%
<5%
<5%
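These Hyper-V counters can be collected from the root server with Get-Counter, for example during a Loadgen run. The root server name below is illustrative; adjust the sample interval and count to match your test duration.

Get-Counter -ComputerName HyperVRoot01 -SampleInterval 5 -MaxSamples 60 -Counter @(
    "\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time",
    "\Hyper-V Hypervisor Logical Processor(_Total)\% Guest Run Time",
    "\Hyper-V Hypervisor Logical Processor(_Total)\% Hypervisor Run Time",
    "\Hyper-V Hypervisor Root Virtual Processor(_Total)\% Guest Run Time")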
Memory
You need to ensure that your Hyper-V root server has enough memory to support the memory
allocated to VMs. Hyper-V automatically reserves 512 MB (this may vary with different Hyper-V
releases) for the root operating system. If you don't have enough memory, Hyper-V will prevent
the last VM from starting. In general, don't worry about validating the memory on a Hyper-V root
server. Be more concerned with ensuring that sufficient memory is allocated to the VMs to
support the Exchange roles.
Application Health
An easy way to determine whether all the VMs are in a healthy state is to look at the Hyper-V
Virtual Machine Health Summary counters.
Counter
Target
Mailbox Servers
When validating whether a Mailbox server was properly sized, focus on processor, memory,
storage, and Exchange application health. This section describes the approach to validating each
of these components.
Processor
During the design process, you calculated the adjusted megacycle capacity of the server or
processor platform. You then determined the maximum number of active mailboxes that could be
supported by the server without exceeding 80 percent of the available megacycle capacity. You
also determined what the projected CPU utilization should be during normal operating conditions
and during various server maintenance or failure scenarios.
During the validation process, verify that the worst case scenario workload doesn't exceed
80 percent of the available megacycles. Also, verify that actual CPU utilization is close to the
expected CPU utilization during normal operating conditions and during various server
maintenance or failure scenarios.
For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter and
verify that this counter is less than 80 percent on average.
Counter
Target
<80%
For virtual Exchange deployments, the Processor(_Total)\% Processor Time counter is measured
within the VM. In this case, the counter isn't measuring the physical CPU utilization. It's
measuring the utilization of the virtual CPU provided by the hypervisor. Therefore, it doesn't
provide an accurate reading of the physical processor and shouldn't be used for design validation
purposes. For more information, see Hyper-V: Clocks lie... which performance counters can you
trust.
For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor
Virtual Processor\% Guest Run Time counter. This provides a more accurate value for the
amount of physical CPU being utilized by the guest operating system. This counter should be less
than 80 percent on average.
Counter
Target
<80%
Memory
During the design process, you calculated the amount of database cache required to support the
maximum number of active databases on each Mailbox server. You then determined the optimal
physical memory configuration to support the database cache and system memory requirements.
Validating whether an Exchange Mailbox server has sufficient memory to support the target
workload isn't a simple task. Using available memory counters to view how much physical
memory is remaining isn't helpful because the memory manager in Exchange is designed to use
almost all of the available physical memory. The information store (store.exe) reserves a large
portion of physical memory for database cache. The database cache is used to store database
pages in memory. When a page is accessed in memory, the information doesn't have to be
retrieved from disk, reducing read I/O. The database cache is also used to optimize write I/O.
When a database page is modified (known as a dirty page), the page stays in cache for a period
of time. The longer it stays in cache, the better the chance that the page will be modified multiple
times before those changes are written to the disk. Keeping dirty pages in cache also causes
multiple pages to be written to the disk in the same operation (known as write coalescing).
Exchange uses as much of the available memory in the system as possible, which is why there
aren't large amounts of available memory on an Exchange Mailbox server.
It may not be easy to know whether the memory configuration on your Exchange Mailbox server
is undersized. For the most part, the Mailbox server will still function, but your I/O profile may be
much higher than expected. Higher I/O can lead to higher disk read and write latencies, which
may impact application health and client user experience. In the results section, there isn't any
reference to memory counters. Potential memory issues will be identified in the storage validation
and application health result sections, where memory-related issues are more easily detected.
Storage
If you have performance issues with your Exchange Mailbox server, those issues may be
storage-related issues. Storage issues may be caused by having an insufficient number of disks
to support the target I/O requirements, having overloaded or poorly designed storage connectivity
infrastructure, or by factors that change the target I/O profile like insufficient memory, as
discussed previously.
The first step in storage validation is to verify that the database latencies are below the target
thresholds. In previous releases, logical disk counters determined disk read and write latency. In
Exchange 2010, the Exchange Mailbox server that you are monitoring is likely to have a mix of
active and passive mailbox database copies. The I/O characteristics of active and passive
database copies are different. Because the size of the I/O is much larger on passive copies, there
are typically much higher latencies on passive copies. Latency targets for passive databases are
200 msec, which is 10 times higher than targets on active database copies. This isn't much of a
concern because high latencies on passive databases have no impact on client experience. But if
you are using the traditional logical disk counters to measure latencies, you must review the
individual volumes and separate volumes containing active and passive databases. Instead, we
recommend that you use the new MSExchange Database counters in Exchange 2010.
When validating latencies on Exchange 2010 Mailbox servers, we recommend you use the
counters in the following table for active databases.
Counter
Target
<20 msec
<20 msec
<1 msec
We recommend that you use the counters in the following table for passive databases.
Counter
Target
<200 msec
<200 msec
<200 msec
Note:
To view these counters in Performance Monitor, you must enable the advanced database
counters. For more information, see How to Enable Extended ESE Performance
Counters.
When you're validating disk latencies for Exchange deployments running on Microsoft Hyper-V,
be aware that the I/O Database Average Latency counters (as with many time-based counters)
may not be accurate because the concept of time within the VM is different than on the physical
server. The following example shows that the I/O Database Reads (Attached) Average Latency is
22.8 in the VM and 17.3 on a physical server for the same simulated workload. If the values of
time-based counters are over the target thresholds, your server may still be running correctly. Review
all health criteria to make a decision regarding server health when your Mailbox server role is
deployed within a VM.
Values of disk latency counters for virtual and physical Mailbox servers
Counter
22.792
17.250
17.693
18.131
27.758
MSExchange Database/
10.829
8.483
0.944
0.411
10.184
10.963
1.966
1.695
334.371
341.139
180.656
183.360
2.062
2.065
0.511
0.514
MSExchangeIS
RPC Averaged Latency
MSExchangeIS Mailbox
In addition to disk latencies, review the Database\Database Page Fault Stalls/sec counter. This
counter indicates the rate of page faults that can't be serviced because there are no pages
available for allocation from the database cache. This counter should be 0 on a healthy server.
Counter
Target
<1
Also, review the Database\Log Record Stalls/sec counter, which indicates the number of log
records that can't be added to the log buffers per second because the log buffers are full. This
counter should average less than 10.
Counter
Target
<10
Target
MSExchangeIS\RPC Requests
Next, make sure that the transport layer is healthy. Any issues in transport or issues downstream
of transport affecting the transport layer can be detected with the MSExchangeIS
Mailbox(_Total)\Messages Queued for Submission counter. This counter should be less than 50
at all times. There may be temporary increases in this counter, but the counter value shouldn't
grow over time and shouldn't be sustained for more than 15 minutes.
Counter
Target
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission
<50
Next, ensure that maintenance of the database copies is in a healthy state. Any issues with log
shipping or log replay can be identified using the MSExchange Replication(*)\CopyQueueLength
and MSExchange Replication(*)\ReplayQueueLength counters. The copy queue length shows
the number of transaction log files waiting to be copied to the passive copy log file folder and
should be less than 1 at all times. The replay queue length shows the number of transaction log
files waiting to be replayed into the passive copy and should be less than 5. Higher values don't
impact client experience, but result in longer store mount times when a handoff, failover, or
activation is performed.
Counter
Target
MSExchange Replication(*)\CopyQueueLength
<1
MSExchange
Replication(*)\ReplayQueueLength
<5
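Both queue lengths are also exposed directly by the Get-MailboxDatabaseCopyStatus cmdlet, which can be simpler than Performance Monitor when spot-checking a server. The server name shown is illustrative.

Get-MailboxDatabaseCopyStatus -Server MBX1 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize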
Target
<80%
For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor
Virtual Processor\% Guest Run Time counter. This provides an accurate value for the amount of
physical CPU being utilized by the guest operating system. This counter should be less than
80 percent on average.
Counter
Target
<80%
Application Health
To determine whether the MAPI client experience is acceptable, use the MSExchange
RpcClientAccess\RPC Averaged Latency counter. This counter should be below 250 msec. High
latencies can be associated with a large number of RPC requests. The MSExchange
RpcClientAccess\RPC Requests counter should be below 40 on average.
Counter
Target
<250 msec
<40
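Both counters can be sampled on each Client Access server VM with Get-Counter while the Loadgen workload is running. The server name below is illustrative.

Get-Counter -ComputerName CAS1 -SampleInterval 5 -MaxSamples 60 -Counter @(
    "\MSExchange RpcClientAccess\RPC Averaged Latency",
    "\MSExchange RpcClientAccess\RPC Requests")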
Transport Servers
To determine whether a transport server is healthy, review processor, disk, and application
health. For an extended list of important counters, see Transport Server Counters.
Processor
For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter. This
counter should be less than 80 percent on average.
Counter | Target
Processor(_Total)\% Processor Time | <80%
For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor
Virtual Processor\% Guest Run Time counter. This provides an accurate value for the amount of
physical CPU being utilized by the guest operating system. This counter should be less than
80 percent on average.
Counter | Target
Hyper-V Hypervisor Virtual Processor(_Total)\% Guest Run Time | <80%
Disk
To determine whether disk performance is acceptable, use the Logical Disk(*)\Avg. Disk
sec/Read and Write counters for the volumes containing the transport logs and database. Both of
these counters should be less than 20 msec.
Counter | Target
Logical Disk(*)\Avg. Disk sec/Read | <20 msec
Logical Disk(*)\Avg. Disk sec/Write | <20 msec
Application Health
To determine whether a Hub Transport server is sized properly and running in a healthy state,
examine the MSExchangeTransport Queues counters outlined in the following table. All of these
queues will have messages at various times. You want to ensure that queue lengths aren't
sustained and growing over a period of time. If large queue lengths persist, this could indicate an
overloaded Hub Transport server, a network issue, or an overloaded Mailbox server that's unable
to receive new messages. Check the other components of the Exchange environment to verify
the cause.
Counter | Target
MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000
MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250
MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250
MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100
MSExchangeTransport Queues(_total)\Submission Queue Length | <100
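Queue depth can also be inspected directly from the Exchange Management Shell. The following is a minimal sketch, using a placeholder Hub Transport server name:

# List the queues on a Hub Transport server, largest first
Get-Queue -Server <HubTransportServer> | Sort-Object MessageCount -Descending | Format-Table Identity,Status,MessageCount -AutoSize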
Return to top
To validate that all passive copies of databases on a server can be successfully activated on
other servers hosting a passive copy, run the following command.
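A server switchover of this kind is typically performed with the Move-ActiveMailboxDatabase cmdlet. The following is a minimal sketch, using a placeholder server name consistent with those used elsewhere in this document:

# Activate, on other DAG members, the database copies that are currently active on this server
Move-ActiveMailboxDatabase -Server <MailboxServer>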
To validate that one copy of each of the active databases will be successfully activated on
another Mailbox server hosting passive copies of the databases, shut down the server by
performing the following action.
Turn off the current active server.
Success criteria: The active mailbox databases are mounted on another Mailbox server in the
DAG. This can be confirmed by running the following command.
Get-MailboxDatabaseCopyStatus <DatabaseName>
Press and hold the power button on the server until the server turns off.
Pull the power cables from the server, which results in the server turning off.
Success criteria: The active mailbox databases are mounted on another Mailbox server in the
DAG. This can be confirmed by running the following command.
Get-MailboxDatabase -Server <MailboxServer> | Get-MailboxDatabaseCopyStatus
Return to top
If the Mailbox servers in the failed datacenter are still accessible (usually not the case), run
the following command on each Mailbox server.
Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <insertsitename>
If the Mailbox servers in the failed datacenter are unavailable but Active Directory is operating
in the primary datacenter, run the following command on a domain controller.
Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <insertsitename>
-ConfigurationOnly
Note:
Failure to either turn off the Mailbox servers in the failed datacenter or to successfully
perform the Stop-DatabaseAvailabilityGroup command against the servers will create
the potential for split brain syndrome to occur across the two datacenters. You may need
to individually turn off computers through power management devices to satisfy this
requirement.
Success criteria: All Mailbox servers in the failed site are in a stopped state. You can verify this by
running the following command from a server in the failed datacenter.
Get-DatabaseAvailabilityGroup | Format-List
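The stopped servers can also be listed from the DAG object itself. The following is a minimal sketch, using a placeholder DAG name:

# List DAG members that are currently in the stopped state
(Get-DatabaseAvailabilityGroup <DAGName> -Status).StoppedMailboxServers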
Clients will continue to try to connect, and should automatically connect after Time to Live
(TTL) has expired for the original DNS entry, and after the entry is expired from the client's
DNS cache. Users can also run the ipconfig /flushdns command from a command prompt
to manually clear their DNS cache. If using Outlook Web App, the Web browser may need to
be closed and restarted to clear the DNS cache used by the browser. In Exchange 2010 SP1,
this browser caching issue can be mitigated by configuring the FailbackUrl parameter on
the Outlook Web App virtual directory (owa), as shown in the example following this list.
Clients starting or restarting will perform a DNS lookup on startup and will get the new IP
address for the service endpoint, which will be a Client Access server or array in the second
datacenter.
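A minimal sketch of configuring the SP1 FailbackUrl parameter follows; the virtual directory identity and URL shown are hypothetical placeholders:

# Exchange 2010 SP1: configure a failback URL on the OWA virtual directory (placeholder names)
Set-OwaVirtualDirectory -Identity "CAS01\owa (Default Web Site)" -FailbackUrl "https://mailfailback.contoso.com/owa"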
After the Mailbox servers in the primary datacenter have been incorporated into the DAG, they
will need some time to synchronize their database copies. Depending on the nature of the failure,
the length of the outage, and actions taken by an administrator during the outage, this may
require reseeding the database copies. For example, if during the outage, you remove the
database copies from the failed primary datacenter to allow log file truncation to occur for the
surviving active copies in the secondary datacenter, reseeding will be required. At this time, each
database can be synchronized individually. After a replicated database copy in the primary
datacenter is healthy, you can proceed to the next step.
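If reseeding is required, it can be performed with the Suspend-MailboxDatabaseCopy and Update-MailboxDatabaseCopy cmdlets. The following is a minimal sketch, using placeholder database and server names:

# Suspend the copy in the primary datacenter before reseeding
Suspend-MailboxDatabaseCopy -Identity "<DatabaseName>\<MailboxServer>"
# Reseed the copy from the active database, replacing any existing files
Update-MailboxDatabaseCopy -Identity "<DatabaseName>\<MailboxServer>" -DeleteExistingFiles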
1. During the datacenter switchover process, the DAG was configured to use an alternate
witness server. To reconfigure the DAG to use a witness server in the primary datacenter, run
the following command.
Set-DatabaseAvailabilityGroup -Identity <DAGName> -WitnessServer
<PrimaryDatacenterWitnessServer>
2. The databases being reactivated in the primary datacenter should now be dismounted in the
secondary datacenter. Run the following command.
Get-MailboxDatabase | Dismount-Database
3. After the databases have been dismounted, the Client Access server URLs should be moved
from the secondary datacenter to the primary datacenter. To do this, change the DNS record
for the URLs to point to the Client Access server or array in the primary datacenter.
Important:
Don't proceed to the next step until the Client Access server URLs have been moved
and the DNS TTL and cache entries have expired. Activating the databases in the
primary datacenter prior to moving the Client Access server URLs to the primary
datacenter will result in an invalid configuration (for example, a mounted database
that has no Client Access servers in its Active Directory site).
4. To activate the databases, run one of the following commands.
Get-MailboxDatabase <insertcriteriatoselectDBs> | Move-ActiveMailboxDatabase -ActivateOnServer <DAGMemberinSecondSite>
or
Move-ActiveMailboxDatabase -Server <DAGMemberinPrimarySite> -ActivateOnServer <DAGMemberinSecondSite>
5. To mount the databases, run the following command.
Get-MailboxDatabase <insertcriteriatoselectDBs> | Mount-Database
Success criteria: The active mailbox databases are successfully mounted on Mailbox servers in
the primary site. To confirm, run the following command.
Get-MailboxDatabaseCopyStatus <DatabaseName>
Return to top
Result: Pass

Overall throughput: 383 / 540

Per-database instance results (two values per instance):

Instance1: 42.2, 18.9
Instance2: 42.7, 17.9
Instance3: 42.9, 17.4
Instance4: 42.0, 17.9
Instance5: 42.0, 18.0
Instance6: 41.8, 17.0
Instance7: 42.8, 17.7
Instance8: 42.6, 17.4

Instance1: 25.9, 25.9
Instance2: 26.4, 25.1
Instance3: 26.4, 21.7
Instance4: 26.1, 22.6
Instance5: 25.9, 23.8
Instance6: 25.5, 19.8
Instance7: 26.3, 21.2
Instance8: 26.5, 18.5

Instance1: 23.8, 3.8
Instance2: 23.7, 3.7
Instance3: 24.0, 3.3
Instance4: 23.5, 3.8
Instance5: 23.7, 3.8
Instance6: 23.7, 3.5
Instance7: 23.7, 3.7
Instance8: 24.3, 3.3
Return to top
Counter | Target | Tested result
Message delivery rate | | 8.63

Target | Tested result
<70% | 54
Storage
The storage results look good. The average read latency for the active databases is 19.3 msec when
measured in the VM and 15.9 msec when measured on the EqualLogic storage array. As discussed in
"Server Validation: Performance and Health Criteria" earlier in this white paper, time-based
counters measured in a VM may not be accurate because the VM has a different concept of time
than the physical server. The difference between these counters is likely the result of a
combination of iSCSI network latency (generally <1 msec) and inaccurate counter values in the
VM.
Counter | Target | Tested result
MSExchange Database\I/O Database Reads (Attached) Average Latency (measured in the VM) | <20 msec | 19.3
MSExchange Database\I/O Database Reads (Attached) Average Latency (measured on the EqualLogic array) | <20 msec | 15.9
MSExchange Database\I/O Database Writes (Attached) Average Latency (measured in the VM) | <20 msec | 6.8
MSExchange Database\I/O Database Writes (Attached) Average Latency (measured on the EqualLogic array) | <20 msec | 2.5
<20 msec | 5.2
MSExchange Database\I/O Database Reads (Recovery) Average Latency | <200 msec | 23.7
<Reads average
MSExchange Database\I/O Database Writes (Recovery) Average Latency | <200 msec | 7.6
<200 msec | 7.5
Application Health
Exchange is healthy, and all of the counters used to determine application health are well under
target values.
Counter | Target | Tested result
MSExchangeIS\RPC Requests | <70 | 2.7
MSExchangeIS\RPC Averaged Latency | <10 msec | 2.4
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | <50 | 1.5
MSExchange Replication(*)\CopyQueueLength | <1 | 0.1
MSExchange Replication(*)\ReplayQueueLength | <5 | 2.1
Target | Tested result
<70% | 19
Storage
The storage results look good. The very low latencies should have no impact on message
transport.
Counter | Target | Tested result
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.012
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.012
Application Health
The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on
client experience.
Counter | Target | Tested result
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec |
MSExchange RpcClientAccess\RPC Requests | <40 |
Counter | Target | Tested result
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 1.5
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 |
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 1.1
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 |
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 0.4
Target | Tested result
<75% | 42
<5% |
<80% | 44
<5% |
Application Health
The VM health summary counters indicate that all VMs are in a healthy state.
Counter
Target
Tested result
Test Case: Single Server Failure or Single Server Maintenance (In Site)
Validation of Expected Load
The message delivery rate verifies that tested workload matched the target workload. The actual
message delivery rate is slightly higher than target.
Counter | Target | Tested result
Message delivery rate | 17.08 | 17.3
Target | Tested result
<70% | 69
Storage
In this test case, the average read latency for the active databases is 26.2 msec when measured in the
VM and 16.2 msec when measured on the EqualLogic storage array. As discussed in "Server
Validation: Performance and Health Criteria" earlier in this white paper, time-based counters
measured in a VM may not be accurate because the VM has a different concept of time than the
physical server. The difference between these counters is likely the result of a combination of
iSCSI network latency (generally <1 msec) and inaccurate counter values in the VM. Because the
read latency measured on the EqualLogic array is less than 20 msec, there's no concern about the
counter measured in the VM being over target.
Counter | Target | Tested result
MSExchange Database\I/O Database Reads (Attached) Average Latency (measured in the VM) | <20 msec | 26.2
MSExchange Database\I/O Database Reads (Attached) Average Latency (measured on the EqualLogic array) | <20 msec | 16.2
MSExchange Database\I/O Database Writes (Attached) Average Latency (measured in the VM) | <20 msec | 7.4
MSExchange Database\I/O Database Writes (Attached) Average Latency (measured on the EqualLogic array) | <20 msec | 2.1
<20 msec | 5.2
MSExchange Database\I/O Database Reads (Recovery) Average Latency | <200 msec | Not applicable
MSExchange Database\I/O Database Writes (Recovery) Average Latency | <200 msec | Not applicable
<200 msec | Not applicable
<Reads average
Counter | Target | Tested result
MSExchangeIS\RPC Requests | <70 | 8.0
MSExchangeIS\RPC Averaged Latency | <10 msec | 3.7
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | <50 | 3.3
MSExchange Replication(*)\CopyQueueLength | <1 | Not applicable
MSExchange Replication(*)\ReplayQueueLength | <5 | Not applicable
Target | Tested result
<70% | 26.3
Storage
The storage results look good. The very low latencies should have no impact on message
transport.
Counter | Target | Tested result
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.0041
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.0005
Application Health
The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on
client experience.
Counter | Target | Tested result
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec | 13.2
MSExchange RpcClientAccess\RPC Requests | <40 | 6.1
The Transport Queue counters are all well under target, confirming that the Hub Transport server
is healthy and able to process and deliver the required messages.
Counter | Target | Tested result
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 4.7
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 |
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 3.6
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 |
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 1.1
Target | Tested result
<75% | 49.9
<5% | 1.3
<80% | 51.2
<5% | 3.6
Application Health
The VM health summary counters indicate that all VMs are in a healthy state.
Counter
Target
Tested result
Counter | Target | Tested result
Message delivery rate | 17.08 | 17.4

Target | Tested result
<70% | 64%
Storage
In this test case, the average read latency for the active databases is 24.0 msec when measured in the
VM and 15.9 msec when measured on the EqualLogic storage array. As discussed in "Server
Validation: Performance and Health Criteria" earlier in this white paper, time-based counters
measured in a VM may not be accurate because the VM has a different concept of time than the
physical server. The difference between these counters is likely the result of a combination of
iSCSI network latency (generally <1 msec) and inaccurate counter values in the VM. Because the
read latency measured on the EqualLogic array is less than 20 msec, there's no concern about the
counter measured in the VM being over target.
Counter | Target | Tested result
MSExchange Database\I/O Database Reads (Attached) Average Latency (measured in the VM) | <20 msec | 24.0
MSExchange Database\I/O Database Reads (Attached) Average Latency (measured on the EqualLogic array) | <20 msec | 15.9
MSExchange Database\I/O Database Writes (Attached) Average Latency (measured in the VM) | <20 msec | 7.2
MSExchange Database\I/O Database Writes (Attached) Average Latency (measured on the EqualLogic array) | <20 msec | 2.0
<20 msec | 5.0
MSExchange Database\I/O Database Reads (Recovery) Average Latency | <200 msec | Not applicable
MSExchange Database\I/O Database Writes (Recovery) Average Latency | <200 msec | Not applicable
<200 msec | Not applicable
<Reads average
Application Health
Exchange is healthy, and all of the counters used to determine application health are well under
target values.
Counter | Target | Tested result
MSExchangeIS\RPC Requests | <70 | 7.8
MSExchangeIS\RPC Averaged Latency | <10 msec | 3.5
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | <50 | 3.0
MSExchange Replication(*)\CopyQueueLength | <1 | Not applicable
MSExchange Replication(*)\ReplayQueueLength | <5 | Not applicable
Target | Tested result
<70% | 25
Storage
The storage results look good. The very low latencies should have no impact on message
transport.
Counter | Target | Tested result
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.003
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.001
Application Health
The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on
client experience.
Counter | Target | Tested result
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec | 13.0
MSExchange RpcClientAccess\RPC Requests | <40 | 5.9
The Transport Queue counters are all well under target, confirming that the Hub Transport server
is healthy and able to process and deliver the required messages.
Counter | Target | Tested result
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 4.2
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 |
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 3.4
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 |
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 0.6
Target | Tested result
<75% | 47.5
<5% | 1.2
<80% | 48.7
Hyper-V Hypervisor Root Virtual Processor(_total)\% Guest Run Time: <5% | 3.5
Application Health
The VM health summary counters indicate that all VMs are in a healthy state.
Counter
Target
Tested result
Return to top
Conclusion
This white paper provides an example of how to design, test, and validate an Exchange 2010
solution for customer environments with 9,000 mailboxes deployed on Dell server and storage
solutions. The step-by-step methodology in this document walks through the important design
decision points that help address key challenges while ensuring that core business requirements
are met.
Return to top
For the complete Exchange 2010 documentation, see Exchange Server 2010.
For more information from Dell, see the following resources:
PowerEdge Servers
Dell Power Solutions Article: Optimizing Microsoft Exchange Server 2010 Deployments on
Dell Servers and Storage