I'm setting up an HA cluster for Zimbra using DRBD. The two servers are identical, and I'm trying to get ZCS 7.1.1 running on CentOS 5.6. I followed a number of how-tos to get this up and running.

The plot thickens: multiple domains are configured on the server, and split DNS is deployed.

The plot thickens... again. httpd runs on port 80, and Zimbra is proxied from port 80 to port 81 via virtual hosting, as described in the wiki article on Zimbra and Apache.

Therefore, Zimbra and httpd are the services that need to start during a failover. The syncs and so on are done. When I manually kill node1 to test resource takeover on node2, the takeover fails with an error.

The log of the error:
Code:
ResourceManager[4212]:  2011/08/03_13:27:23 info: Running /etc/init.d/zimbra  start
ResourceManager[4212]:  2011/08/03_13:27:23 ERROR: Return code 127 from /etc/init.d/zimbra
ResourceManager[4212]:  2011/08/03_13:27:23 CRIT: Giving up resources due to failure of zimbra
ResourceManager[4212]:  2011/08/03_13:27:23 info: Releasing resource group: zimbra-1 IPaddr::192.168.168.10/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra httpd
ResourceManager[4212]:  2011/08/03_13:27:23 info: Running /etc/init.d/httpd  stop
ResourceManager[4212]:  2011/08/03_13:27:23 info: Running /etc/init.d/zimbra  stop
ResourceManager[4212]:  2011/08/03_13:27:23 ERROR: Return code 127 from /etc/init.d/zimbra
ResourceManager[4212]:  2011/08/03_13:27:24 info: Retrying failed stop operation [zimbra]
ResourceManager[4212]:  2011/08/03_13:27:24 info: Running /etc/init.d/zimbra  stop
For testing purposes, I removed httpd from my /etc/ha.d/haresources file and tried again, with the same result.

When I installed Zimbra on node2, I did a software-only install, since /opt/zimbra is already taken care of by the node1 install. That's the working theory, anyway.
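
A quick sanity check for that theory on node2 might look like the following. It is only a sketch: the script path comes from the ResourceManager log above, the helper name check_script is made up for illustration, and the idea that the init script depends on a local zimbra user is an assumption to verify.

```shell
# Report whether an init script exists and is executable.
# (check_script is a hypothetical helper name, not part of Zimbra.)
check_script() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -x "$1" ]; then
        echo "not-executable"
    else
        echo "ok"
    fi
}

# Path taken from the ResourceManager log; run this on node2.
echo "/etc/init.d/zimbra: $(check_script /etc/init.d/zimbra)"

# Assumption: the init script relies on a local 'zimbra' system user.
if id zimbra >/dev/null 2>&1; then
    echo "zimbra user: present"
else
    echo "zimbra user: missing"
fi
```

Anything other than "ok" and "present" would point at the node2 install rather than at DRBD or the ports.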

My drbd.conf file (confirmed to be the same on both servers; the sync uses a different interface):
Code:
#
# please have a look at the example configuration file in
# /usr/share/doc/drbd82/drbd.conf
#

common { syncer { rate 100M; al-extents 257; } }

resource r0 {
        protocol C;
        handlers { pri-on-incon-degr "halt -f"; }
        disk { on-io-error detach; }
        net {  cram-hmac-alg "sha1"; shared-secret "pass"; }
        startup { degr-wfc-timeout 15; wfc-timeout 20; }

        on zimbra-1 {
                address 172.16.0.1:7789;
                device /dev/drbd0;
                disk /dev/sda6;
                meta-disk internal;
        }

        on zimbra-2 {
                address 172.16.0.2:7789;
                device /dev/drbd0;
                disk /dev/sda6;
                meta-disk internal;
        }
}
My /etc/ha.d/haresources file (again, confirmed to be the same on both):
Code:
zimbra-1 IPaddr::192.168.168.10/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra httpd
My hosts file:
Code:
127.0.0.1               localhost.localdomain localhost
192.168.168.10          zimbra.domain1.com zimbra
192.168.168.10          mail.domain1.com mail
192.168.168.10          mail.domain2.com mail
192.168.168.10          mail.domain3.com mail
192.168.168.10          mail.domain4.com mail
192.168.168.11          zimbra-1
192.168.168.12          zimbra-2
I cannot figure out what return code 127 means.
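
For context, in POSIX shells an exit status of 127 conventionally means that a command could not be found. The snippet below is only a minimal sketch of that convention, using a deliberately nonexistent, hypothetical path; it does not reproduce what the cluster manager or the Zimbra init script actually does.

```shell
# Run a command that does not exist and capture the shell's exit status.
# The path is hypothetical and intentionally missing.
sh -c '/nonexistent/hypothetical-init-script start' 2>/dev/null
status=$?

# A POSIX shell reports 127 for "command not found"
# (126 would mean "found but not executable").
echo "exit status: $status"
```

If the same convention applies here, the thing that is missing could be the init script itself or a command it calls, but that is an assumption to check on node2.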

I'm wondering whether it has something to do with Zimbra on node1 listening on port 81, while node2 maybe tries to listen on port 80?