Howto HA Zimbra8
This document is a compilation of information found on forums, blogs, and official docs, together with the results of my own experiments, so I don't claim any rights: you are free to reproduce, modify, and use it!
In this documentation, HA is set up between two VMs named fangorn (preferred master) and sylvebarbe (slave).
Two different storage techniques were tried: DRBD, and virtual bus sharing by VMware on a SAN.
Before the operations described in this document, the following packages were installed on the VMs:
- Pacemaker (http://packages.ubuntu.com/precise/pacemaker), our cluster manager.
- Corosync (http://packages.ubuntu.com/precise/corosync), our cluster engine.
- Openssh-server (http://packages.ubuntu.com/precise/openssh-server).
- Likewise-open (http://packages.ubuntu.com/precise/likewise-open).
- Sqlite3 (http://packages.ubuntu.com/precise/sqlite3).
- Sysstat (http://packages.ubuntu.com/precise/sysstat).
- Libgmp3c2 (http://packages.ubuntu.com/precise/libgmp3c2).
- Drbd8-utils (http://packages.ubuntu.com/precise/drbd8-utils).
- Zimbra Collaboration Suite 8 Open Source Edition (http://www.zimbra.com/downloads/os-downloads.html).
Finally, I tried to keep this document clear and simple, but my English may not be very good, so I apologize for any misunderstanding you may run into.
Taer.
The DRBD configuration (/etc/drbd.conf) looks like this:
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
resource zimbra {
    syncer {
        rate 30M;  # Increases the sync speed (should be about 30% of the max
                   # speed of the "I/O + network" chain). /!\ This is the
                   # DRBD 8.3 syntax, please check your version.
    }
    meta-disk internal;
    on sylvebarbe {
        device /dev/drbd0;          # Your DRBD device
        disk /dev/sda2;             # Your "real" partition
        address 192.168.6.11:7788;  # Your VM IP and the port used for DRBD
                                    # communication for this resource
    }
    on fangorn {
        device /dev/drbd0;
        disk /dev/sda2;
        address 192.168.6.10:7788;
    }
}
If an error appears, the simplest fix is to recreate the partition with fdisk on the whole disk (/!\ Beware: the partition must not be formatted):
fdisk /dev/sda
The command /etc/init.d/drbd status lets you follow the progress of the synchronization. Once the DRBD device is up and primary, create the file system on it:
mkfs.ext4 /dev/drbd0
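The commands that actually bring the DRBD resource online are not shown above; with the DRBD 8.3 syntax used in this document, the usual sequence (a sketch, to be adapted to your setup) looks roughly like this:

```
# On both nodes: create the metadata and bring the resource up
drbdadm create-md zimbra
drbdadm up zimbra

# On the preferred master only (here fangorn): force the initial sync
drbdadm -- --overwrite-data-of-peer primary zimbra
```

After the initial sync finishes, /dev/drbd0 can be formatted on the primary as shown above.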
If one of the disks is no longer available, DRBD keeps a local record of the modifications. When the slave reconnects, the master sends it all the missing updates, so writing can continue as if nothing had happened.
If Zimbra is already installed, errors may occur (port conflicts, LDAP errors, …); in that case, uninstall Zimbra first by following the official doc
(http://wiki.zimbra.com/wiki/UnInstalling_Zimbra_on_Linux).
Perform the installation that corresponds to your needs, but during the configuration be sure to change the following parameter:
1 > 1 > localhost (or whatever you want, as long as it is the same on both machines and appears in each /etc/hosts)
We will configure Corosync so that the VMs watch each other, and Pacemaker so that the active VM takes the virtual IP 192.168.100.40 and mounts all the partitions Zimbra needs.
1. Corosync
Edit the Corosync config file /etc/corosync/corosync.conf. This file must be identical on all the nodes (all the VMs of the cluster). The idea is to increase some timers to reduce the risk of a node being wrongly declared dead:
• token: 5000
• token_retransmits_before_loss_const: 20
• join: 1000
• consensus: 7500
• max_messages: 20
• secauth: on # on = use an auth key to allow a node to connect to the cluster
• Your network address in bindnetaddr. Example: if your eth0 has the address 192.168.6.11 with a netmask of 24, put 192.168.6.0.
• Your multicast address and port (or keep the default ones).
• To add a second interface, add another "interface" block and increase the "ringnumber"; put the address of your second network and a new multicast address. You will then have to change "rrp_mode" to active: your two interfaces will work simultaneously. With "rrp_mode" set to passive, the second interface is used only if the first one is broken. With "rrp_mode" set to none, only the first interface is ever used.
Example:
rrp_mode: none
interface {
ringnumber: 0
bindnetaddr: 192.168.6.0
mcastaddr: 226.94.1.1
mcastport: 5405
}
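Putting the timers and the interface block together, the totem section of corosync.conf would look roughly like this (a sketch using the values listed above; adapt bindnetaddr to your network):

```
totem {
    version: 2
    token: 5000
    token_retransmits_before_loss_const: 20
    join: 1000
    consensus: 7500
    max_messages: 20
    secauth: on
    rrp_mode: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.6.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
```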
Authentication
When the package was installed, a key was generated to authenticate your node to the cluster. You
can recreate it:
# corosync-keygen
Then, you have to send the key on the different nodes of the cluster:
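The copy command itself is not shown; a typical way to do it (assuming root SSH access between the nodes) is:

```
scp /etc/corosync/authkey root@sylvebarbe:/etc/corosync/authkey
```

The key file must end up as /etc/corosync/authkey, owned by root, on every node of the cluster.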
The configuration of the cluster is done. Enable the start of Corosync at boot by editing the file /etc/default/corosync (set START=yes). You can start it manually with: /etc/init.d/corosync start
2. Pacemaker
You now have to add the Pacemaker service to your cluster. Add the following block to your Corosync config file if it isn't already there:
service {
# Load the Pacemaker Cluster Resource Manager
ver: 0
name: pacemaker
}
============
Last updated: Fri Oct 4 11:29:26 2012
Stack: openais
Current DC: sylvebarbe - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============
That output comes from the crm_mon command, which shows the state of your cluster (node states, resource states, …).
Now, we will connect to the Pacemaker CLI (on only one of the nodes) with the command crm:
root@fangorn:~# crm
crm(live)# help
Available commands:
crm(live)# configure
crm(live)configure# show
node fangorn
node sylvebarbe
To sum up:
crm(live)configure# show
node fangorn
node sylvebarbe
property $id="cib-bootstrap-options" \
stonith-enabled="false" \
no-quorum-policy="ignore"
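The show output above can be produced with commands like the following inside crm configure (the property names are those in the output; don't forget to commit):

```
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
```

Disabling STONITH and ignoring quorum loss is only reasonable for a two-node cluster like this one.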
3. Virtual IP
The VIP is the service address of your cluster: it is the address you will use to reach the service in HA. This IP will be added as a secondary address on your network interface.
Still in crm configure, inject the following configuration and validate it with commit:
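The configuration itself is missing above; a typical IPaddr2 primitive for the VIP 192.168.100.40 (the resource name "vip" and the netmask are assumptions) would be:

```
primitive vip ocf:heartbeat:IPaddr2 \
    params ip="192.168.100.40" cidr_netmask="24" \
    op monitor interval="10s"
```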
============
Last updated: Mon Oct 4 13:34:42 2012
Stack: openais
Current DC: sylvebarbe - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
1 Resources configured.
============
4. DRBD
To configure this service, still in crm configure, inject the following config, and then validate with commit:
Role parameters:
ms ms-zimbra zimbra \
    meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
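The ms statement above references a primitive named zimbra, whose definition is not reproduced; with the ocf:linbit:drbd agent it would typically look like this (the monitor intervals are assumptions):

```
primitive zimbra ocf:linbit:drbd \
    params drbd_resource="zimbra" \
    op monitor interval="15s" role="Master" \
    op monitor interval="30s" role="Slave"
```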
Creation of a group containing the VIP resource and the FS resource; this group is defined as colocated with ms-zimbra (our main resource).
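That group configuration is not reproduced above; assuming a Filesystem primitive on /dev/drbd0 (the mount point and all resource names here are assumptions, only ms-zimbra comes from the document), it would look roughly like:

```
primitive fs-zimbra ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/opt/zimbra" fstype="ext4"
group zimbra-group vip fs-zimbra
colocation grp-with-drbd inf: zimbra-group ms-zimbra:Master
order grp-after-drbd inf: ms-zimbra:promote zimbra-group:start
```

The colocation keeps the group on the DRBD master, and the order rule ensures the file system is only mounted after the promotion.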
============
# /etc/init.d/drbd status
When I did this configuration, I had some resource errors; in that case make sure you forgot nothing (like the mount directory) and reboot your machine.
The pengine (policy engine) prevents a resource known to produce errors from starting again; that is why rebooting is important, to reset this behavior.
Schematization:
Since we will use a SAN, this time we will mount a file system that is not controlled by DRBD.
It will store the users' mail and mailbox indexes, and will be mounted at /mail/data on our servers.
To do so, on the first VM:
• Connect to VMCENTER.
• Open the properties of the VM.
• Add a virtual disk that supports clustering features.
• Put this disk on a new SCSI controller with the bus sharing mode set to "Virtual".
Then, on the second VM:
• Connect to VMCENTER.
• Open the properties of the VM.
• Add the previously created virtual disk.
• Put this disk on a new SCSI controller with the bus sharing mode set to "Virtual".
Note: virtual bus sharing allows several VMs to be linked to the same disk, but forbids the use of snapshots.
After a restart of our VMs, this new disk is ready; let's format it! Identify it first:
lshw -C disk
Create a partition and a file system, then the mount point and the Zimbra directories:
fdisk /dev/sdb
mkfs.ext4 /dev/sdb1
mkdir -p /mail/data
chown zimbra:zimbra /mail/data
chmod 775 /mail/data
su zimbra
mkdir /mail/data/index
mkdir /mail/data/store
index will contain the index files (by default /opt/zimbra/index), and store the mail data (by default /opt/zimbra/store).
Then, edit the config file with the "edit" command to add our new resource to the group, before validating with "commit":
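The resource itself is not shown; a Filesystem primitive for this SAN partition (the name fs-data is an assumption) would look roughly like:

```
primitive fs-data ocf:heartbeat:Filesystem \
    params device="/dev/sdb1" directory="/mail/data" fstype="ext4"
```

Then, in the edit session, append fs-data to the members of your existing group so it is mounted on the master along with the other resources.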
The file system is now automatically mounted on the master of the cluster.
Adrián Gibanel from bTactic (http://www.btactic.com) wrote an OCF script that handles the automatic launch of Zimbra. It was designed for Ubuntu 10.04 + Zimbra 7
(http://www.btactic.org/2012/08/26/zimbra-7-ose-ha-alta-disponibilidad-pacemaker-corosync-drbd-ubuntu-10-04/), but seems to work fine with Ubuntu 12.04 + Zimbra 8.
mkdir /usr/lib/ocf/resource.d/btactic
Copy the script from the page above into this directory, then make it executable:
chmod +x /usr/lib/ocf/resource.d/btactic/*
On one of the VMs, enter crm configure and inject the config:
Then, edit the config file with the edit command to add our new resource to the group, before validating with commit:
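The injected config is missing above; given the script's location, the agent is addressed as ocf:btactic:zimbra, and a primitive for it might look like this (the resource name ZMServer matches the status output mentioned later; the timeouts are assumptions, kept long because Zimbra is slow to start and stop):

```
primitive ZMServer ocf:btactic:zimbra \
    op start interval="0" timeout="360s" \
    op stop interval="0" timeout="360s" \
    op monitor interval="120s" timeout="40s"
```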
============
Last updated: Wed Oct 24 15:36:27 2012
Last change: Wed Oct 24 13:54:08 2012 via cibadmin on fangorn
Stack: openais
Current DC: sylvebarbe - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
6 Resources configured.
============
Zimbra is now started automatically on the master of the cluster. But the Zimbra services can take some time to start (between 1min30 and 2min), so don't panic if ZMServer says "Stopped" for a while.
D. Tests
For my tests I used a script that adds 100 mailboxes and sends a mail to the admin once each mailbox is created. This allows testing both methods (DRBD with user creation, and virtual bus sharing with mail sending).
All tests were made twice to confirm the results.
Fangorn still considers itself the master, and from its point of view Sylvebarbe is lost.
Mail and index data keep their previous state on Fangorn (all modifications made by Sylvebarbe are lost), plus all the mails that were queued on Fangorn. And this holds even if Fangorn was shut down after the network loss.
Note: if I restart Fangorn after the network loss, the Zimbra resource managed by the bTactic script takes the state "Stopped" and, after a few seconds, "Started Sylvebarbe" again. This is quite strange, since the only change in crm status is the appearance of Fangorn as slave.
All the modifications made to the database by the script on Fangorn are on its side of /dev/drbd0, so a resync of DRBD is enough to keep or discard the database modifications, depending on which node you choose as primary and which as secondary.
Fangorn is automatically promoted to master after some time.
When Fangorn starts up again, DRBD automatically resynchronizes, so Sylvebarbe's version is kept.
DRBD detects a split brain when the connection between the nodes comes back up and they perform their DRBD protocol handshake. If DRBD detects that the two nodes are (or were at disconnection time) in the primary role, replication is cut and the following message appears in the system log:
After split brain detection, one node will always be in StandAlone mode. The other will be either in StandAlone mode too (if both nodes detected the split brain simultaneously), or in WFConnection (if the connection went down before the other node had a chance to detect the split brain).
At this point, if DRBD does not recover by itself, a manual intervention is needed on the node whose modifications will be lost (the split brain victim):
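The commands themselves are missing above; with the DRBD 8.3 syntax used in this document, the usual manual recovery is:

```
# On the split brain victim: demote and reconnect, discarding local changes
drbdadm secondary zimbra
drbdadm -- --discard-my-data connect zimbra

# On the split brain survivor (only if it is also in StandAlone mode):
drbdadm connect zimbra
```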
Once reconnected, the split brain victim switches to SyncTarget mode, and its data are replaced by those of the split brain survivor.
After resynchronization, the split brain is resolved and the two nodes are UpToDate.