Hi,

Having a cold standby server in a different location would be nice, but once the servers are in different locations you can't use clustering anymore. So I spent some time and built a cold standby. Here is how I did it.

System: ZCS 4.5.6 NE on Ubuntu 6.06.1

Step one:
Create a 1:1 copy of your server. You can use whatever you prefer: LVM snapshot, physical hard drive copy, or rsync. I used rsync as I don't have physical access to the server. Important for this step: Zimbra needs to be down to make sure the copy is really 1:1.
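The exact rsync command for the initial copy isn't given above, so here is a minimal sketch of what it could look like. The function name, the choice of flags and the RSYNC override are my assumptions, not what was actually run; it also assumes Zimbra is stopped on both machines and that root can ssh to the primary.

```shell
#!/bin/bash
# Hypothetical helper: pull the whole Zimbra tree from the primary server.
# RSYNC can be overridden (e.g. RSYNC=echo) to preview the command.
copy_zimbra_tree() {
  local primary="$1"
  # -a keeps owners, permissions and timestamps; -H keeps hard links;
  # --numeric-ids copies uid/gid numbers as-is; --delete removes files
  # on the standby that no longer exist on the primary.
  ${RSYNC:-rsync} -aH --numeric-ids --delete \
    "root@${primary}:/opt/zimbra/" /opt/zimbra/
}
```

Called as "copy_zimbra_tree zmail.mydomain.tld". With --numeric-ids the zimbra user needs the same uid on both machines, which should be checked before relying on it.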

Step two:
Adjust DNS. As Zimbra won't start if DNS is not correct, we have to fake it a bit.
Let's say the primary server is zmail.mydomain.tld. Add an additional DNS entry for this server: zmail2.mydomain.tld.
Install a local DNS server on your cold standby server. I used dnsmasq.
We need this DNS server so the cold standby server can use the FQDN of your primary server while having a different IP. To do this I added this line to /etc/dnsmasq.conf (192.168.1.100 being the standby's own IP):
Code:
address=/zmail.mydomain.tld/192.168.1.100
and changed /etc/hosts to:
Code:
127.0.0.1       localhost.localdomain localhost
192.168.1.100  zmail.mydomain.tld zmail
Now the cold standby server can use "zmail.mydomain.tld" to run a local Zimbra configured for your primary server and still access the primary server using zmail2.mydomain.tld.
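Before starting Zimbra on the standby it may be worth checking that the name really maps to the local IP. A small sketch that looks a name up in a hosts-format file; the function name is made up for illustration and the second argument only exists so it can be tried on a copy of the file.

```shell
#!/bin/bash
# Hypothetical helper: print the first IP that a hosts-format file
# assigns to the given name (prints nothing if the name is not listed).
lookup_in_hosts() {
  local name="$1" hosts_file="${2:-/etc/hosts}"
  awk -v n="$name" '
    { sub(/#.*/, "") }   # drop comments
    { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
  ' "$hosts_file"
}
```

On the standby, "lookup_in_hosts zmail.mydomain.tld" should print 192.168.1.100 with the /etc/hosts shown above. This only checks /etc/hosts; the answer dnsmasq gives has to be checked separately with whatever DNS lookup tool is installed.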

Step three:
Configure passwordless ssh using ssh keys. We need this to run rsync from a cron job.
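For anyone who hasn't set this up before, a rough sketch of the key setup, run as root on the standby. The function name and the key size are my choice; the rest is plain OpenSSH.

```shell
#!/bin/bash
# Hypothetical helper: create a key pair without a passphrase so that
# a cron job can use it unattended.
make_sync_key() {
  local key_file="$1"
  # -N "" sets an empty passphrase, -q keeps ssh-keygen quiet
  ssh-keygen -q -t rsa -b 2048 -N "" -f "$key_file"
}

# Afterwards the public key has to be added to root's authorized_keys
# on the primary, for example:
#   make_sync_key /root/.ssh/id_rsa
#   ssh-copy-id -i /root/.ssh/id_rsa.pub root@zmail2.mydomain.tld
```

If the key is stored under a non-default file name, rsync has to be told about it with -e "ssh -i /path/to/key".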

Step four:
To be sure nothing goes wrong while syncing the backups, I moved /opt/zimbra/backup to /zmailbackup.

Step five:
Create some scripts to control the sync / restore and whether your server is "cold standby" or "active".

/root/coldstandby is just a marker file. I use it to check whether the server is in "cold standby" mode or live.

/root/change.zimbra.status.sh is used to switch the server between "cold standby" and "live". If the server is live you don't want to sync with your primary server anymore.

Code:
#!/bin/bash
case $1 in
"status")
   if [ -f /root/coldstandby ]; then
     echo "server is in cold standby mode"
   else
     echo "server is LIVE"
   fi
   ;;
"cold")
   echo "switching into cold mode"
   echo "... remove start scripts"
   update-rc.d -f zimbra remove
   echo "... activate sync, backup, restore"
   echo "if this file is missing server is live" > /root/coldstandby
   echo "... stop zimbra"
   /etc/init.d/zimbra stop
   echo "... done"
   ;;
"hot")
   echo "switching into live mode"
   echo "... install start scripts"
   update-rc.d zimbra defaults 99
   echo "... deactivate sync, backup, restore"
   rm /root/coldstandby
   echo "... check if a restore is running"
   # count running java processes whose command line mentions "restore"
   RESTORE=$(ps fax | grep -i java | grep -i restore | wc -l)
   while [ "$RESTORE" -gt 0 ]; do
     echo "!!! FOUND ACTIVE RESTORE PROCESS, PLEASE WAIT UNTIL FINISHED !!!"
     echo "... waiting for 5 min, then checking again"
     sleep 300
     RESTORE=$(ps fax | grep -i java | grep -i restore | wc -l)
   done
   echo "... no restore running anymore ... going live now"
   /etc/init.d/zimbra stop
   /etc/init.d/zimbra start
   ;;
*)
   echo "help..."
   echo "switch to live mode: ./change.zimbra.status.sh hot"
   echo "switch to cold mode: ./change.zimbra.status.sh cold"
   echo "query status: ./change.zimbra.status.sh status"
   ;;
esac
/root/sync.live.server.sh syncs the backup folder with the primary server and starts the restore.
Code:
#!/bin/bash
if [ -f /root/coldstandby ];
then
  echo "`date`: start syncing backups" >> /var/log/zimbra.cold.log
  rsync -a root@zmail2.mydomain.tld:/opt/zimbra/backup/* /zmailbackup/
  echo "`date`: start restoring backups" >> /var/log/zimbra.cold.log
  su - zimbra -c /opt/zimbra/cold.restore.sh
  echo "`date`: restore finished" >> /var/log/zimbra.cold.log
else
  echo "`date`: no sync/restore done server considered to be LIVE" >> /var/log/zimbra.cold.log
fi
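One thing the sync script above does not handle: cron starts it every two hours, and if a restore ever takes longer than that, two runs could overlap. I don't know how long restores take on other setups, so treat this as an optional sketch. The lock path and function names are made up.

```shell
#!/bin/bash
# Hypothetical lock location; any path writable by root works.
LOCK_DIR="${LOCK_DIR:-/var/run/zimbra.cold.sync.lock}"

# mkdir is atomic: it succeeds for exactly one caller and fails if the
# directory already exists, so it can serve as a simple lock.
acquire_lock() {
  mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
  rmdir "$LOCK_DIR" 2>/dev/null
}

# Sketch of use near the top of /root/sync.live.server.sh:
#   acquire_lock || { echo "`date`: previous run still active" >> /var/log/zimbra.cold.log; exit 0; }
#   trap release_lock EXIT
```

If the script gets killed hard the lock directory stays behind and has to be removed by hand.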
/opt/zimbra/cold.restore.sh restores the backup
Code:
#!/bin/bash
zmcontrol stop
LABEL=`zmrestoreldap -lbs -t /zmailbackup | sed -n 1p`
zmrestoreldap -lb $LABEL -t /zmailbackup
zmmailboxctl start
zmrestore -a "all" --ignoreRedoErrors -t /zmailbackup
zmcontrol stop
I had trouble with redo errors, and the only way I could get it to work was with the "--ignoreRedoErrors" option. I also tried zmrestoreoffline, but it did not work at all for me. The "zmcontrol stop" calls at the beginning and the end are just for safety.
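The restore script takes the first line of the label listing as LABEL. If the listing ever comes back empty (for example because the sync failed), zmrestoreldap would be called with an empty label. A small guard for that could look like this; the function name is made up and the labels in the usage example are placeholders, not real Zimbra label names.

```shell
#!/bin/bash
# Hypothetical helper: read a label listing on stdin, print the first
# line, and fail if there is none.
pick_first_label() {
  local label
  label=$(sed -n 1p)
  if [ -z "$label" ]; then
    echo "no backup label found" >&2
    return 1
  fi
  printf '%s\n' "$label"
}

# Sketch of use in /opt/zimbra/cold.restore.sh:
#   LABEL=$(zmrestoreldap -lbs -t /zmailbackup | pick_first_label) || exit 1
```

For example, "printf 'label-a\nlabel-b\n' | pick_first_label" prints label-a.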

The only thing that is left is a cron job. I have this line in my /etc/crontab:
Code:
55 */2  * * *   root   /root/sync.live.server.sh
For me this is 30 minutes after the server has done its backup, which works fine.
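In case the cron syntax is unclear: "55 */2" means minute 55 of every hour divisible by two. A quick sketch that prints the resulting start times (the function name is made up):

```shell
#!/bin/bash
# Print the daily start times for the crontab entry "55 */2 * * *".
list_sync_times() {
  local hour
  for hour in $(seq 0 2 23); do
    printf '%02d:55\n' "$hour"
  done
}
```

This gives 12 runs per day, from 00:55 to 22:55. Adjust the minute so it fits your own backup schedule on the primary.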

I know that the "cold standby" server is never 100% up to date, but this is OK for me.

Cheers
Andre