Thread: Backup mx + failover for NE

  1. #1
    Join Date
    Jun 2008
    Location
    Berkeley, CA
    Posts
    1,474
    Rep Power
    9

    Backup mx + failover for NE

    We are in final stages of evaluating Zimbra NE as a replacement for our current mail system, with Exchange as the other candidate.

    The ability to provide at least a cold standby (at a remote location) is a major requirement, and I believe I've worked out procedures which will allow me to perform frequent scheduled data synchronization and then bring up the standby should it be needed. Some of this is discussed in these threads:
    http://www.zimbra.com/forums/install...ity-tiger.html
    http://www.zimbra.com/forums/adminis...storation.html

    However I've also been asked to provide a backup mx in the remote location. I suppose the simplest approach would be to just run something on another (third) machine, but I'd like to be able to offer a more elegant solution that minimizes hardware and physical space requirements. In short, ideally, the standby machine would be able to operate as a secondary mx, and if I need to make the sync'ed data "live", I would be able to integrate any mail that had arrived in the meantime.

    My first thought on this would be to have some steps where the MTA being used as a secondary mx would be configured to stop listening on standard ports before Zimbra is made active. Then I could tell it to go ahead and deliver its mail to Zimbra on the same machine. (I'd possibly multihome and have the two MTAs on different IP addresses if necessary.)

    I'm looking for advice on whether this sounds feasible and any suggestions on proceeding. This would likely be done under Mac OS X 10.4 or 10.5, so any of the common MTAs available for that platform could be used, or even Communigate, which is our current mail server and could be relegated to that role.

    I also wonder if I'm making this more complicated than it needs to be. E.g. is there a way to simply submit mail directly by copying files? Or perhaps a way to run Zimbra (FOSS) as a backup mx and then integrate the outgoing mail queue when I bring up the NE?

    Will the next major release of Zimbra offer anything to help with this--hopefully without requiring purchase of multiple NE licenses?
    Last edited by ewilen; 01-21-2009 at 06:07 PM.

  2. #2
    Join Date
    Jun 2008
    Location
    Berkeley, CA
    Posts
    1,474
    Rep Power
    9

    Ah, I see from

    Zimbra Product Portal
    Bug 11423 - disaster recovery through server to server sync (beta)
    also Enterprise messaging and collaboration: Zimbra's product roadmap

    that "disaster recovery through server to server sync" is planned for 6.0, but the status is currently "at risk", and the details of how the feature will work aren't too clear--e.g., whether it would require a separate NE license to implement the feature on an NE installation.

  3. #3
    Join Date
    Sep 2006
    Location
    477 Congress Street | Portland, ME 04101
    Posts
    1,374
    Rep Power
    11

    Sounds like you have two requirements there:
    1. Backup MX in a data center separate from the production Zimbra data center.
    2. Cold standby Zimbra servers in the second data center.


    For the backup MX, we use a plain-jane Postfix box plus a pair of scripts. One script on the Zimbra box exports a list of valid email addresses and domains. A separate script on the Postfix box compares that list against the previously exported one and makes the appropriate changes to the relevant Postfix files. You can run the scripts as often as you like; we run them several times a day.

    In this way, we have automated the process of having our backup MX be completely up to date at all times.
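
    A minimal sketch of the comparison step on the Postfix side, in Python. This is not Mark's actual script: the function names are hypothetical, and it assumes the export is a plain list of addresses, one per line. The "address OK" output is the usual lookup-table line format for a Postfix recipient map such as relay_recipient_maps; which Postfix files are actually updated is not stated in this thread.

```python
from typing import Iterable, List, Set, Tuple


def normalize(addresses: Iterable[str]) -> Set[str]:
    """Lower-case and strip each entry; skip blank lines and '#' comments."""
    cleaned = set()
    for line in addresses:
        addr = line.strip().lower()
        if addr and not addr.startswith("#"):
            cleaned.add(addr)
    return cleaned


def diff_exports(previous: Iterable[str], current: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) addresses between two exported lists."""
    old, new = normalize(previous), normalize(current)
    return sorted(new - old), sorted(old - new)


def render_recipient_map(addresses: Iterable[str]) -> str:
    """Render one 'address OK' line per valid recipient (assumed target format)."""
    return "".join(f"{addr} OK\n" for addr in sorted(normalize(addresses)))


def domains_of(addresses: Iterable[str]) -> List[str]:
    """Collect the distinct domains that appear in the address list."""
    return sorted({a.split("@", 1)[1] for a in normalize(addresses) if "@" in a})


# Example with made-up addresses: only rewrite the map when something changed.
previous = ["alice@example.com", "bob@example.com"]
current = ["alice@example.com", "carol@example.com"]
added, removed = diff_exports(previous, current)
if added or removed:
    print(render_recipient_map(current), end="")
```

    After rewriting a hashed map file, Postfix normally needs postmap run against it before the change takes effect; that step is left out of the sketch.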

    For the cold standby servers requirement, I would think the real issues for management are to clarify the amount of Zimbra downtime they are willing to tolerate in the event of an outage at the primary data center, and whether or not they can tolerate any "lost" emails when switching over to the cold standby farm at the backup data center.

    The less downtime and data loss that can be tolerated, the more $$$ it will take.

    We are experimenting with ldap exports, mysql dumps, and syncing all of /opt/zimbra to Amazon S3, but this is more for D/R than near real-time failover to a secondary data center.

    Basically what I am saying is that satisfying the backup MX requirement is pretty straightforward and inexpensive, but satisfying the second requirement for cold standby servers I would not attempt without further refinement of the needs from management.

    Hope that helps,
    Mark

  4. #4
    Join Date
    Jun 2008
    Location
    Berkeley, CA
    Posts
    1,474
    Rep Power
    9

    Hi, Mark, thanks for your reply.

    Since we're doing comparative evaluations with Exchange, there's a strong implication that the benchmark will be defined in terms of Exchange 2007 SP1's Standby Continuous Replication. That said, I've suggested that bringing up the standby might not be instantaneous and that the frequency of replication might be as low as once per hour (meaning up to an hour's worth of messages could be lost), and this has been deemed acceptable.

    The method I've worked out is basically to have a clone installation of Zimbra on the standby, and to rsync /opt/zimbra/backup and /opt/zimbra/redolog to it. When necessary, I perform

    zmrestoreldap
    zmrestoreoffline
    zmplayredo

    in that order (with appropriate arguments and necessary turning on/off of zimbra services before each command). I'm not sure yet whether it will be better to execute those commands periodically or to wait until the standby needs to be brought online--possibly a daily or weekly automated run to ensure that restoration won't take too long when we really need the standby.
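
    The ordering above could be wrapped in a small driver that stops at the first failure, sketched here in Python. The helper names are hypothetical, the arguments for each command are deliberately left to the caller because they are not given in this post, and the service start/stop steps between commands are omitted.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence

# The three restore steps, in the order given in the post.
RESTORE_COMMANDS = ("zmrestoreldap", "zmrestoreoffline", "zmplayredo")


def build_plan(args_by_command: Dict[str, Sequence[str]]) -> List[List[str]]:
    """Return one argv list per step, in fixed order; arguments come from the caller."""
    return [[cmd, *args_by_command.get(cmd, [])] for cmd in RESTORE_COMMANDS]


def run_plan(plan: List[List[str]],
             runner: Optional[Callable[[List[str]], int]] = None) -> List[str]:
    """Run each step in order and stop at the first non-zero exit code.

    Returns the names of the steps that completed successfully.
    """
    if runner is None:
        # Default runner executes the command for real; pass your own to dry-run.
        runner = lambda argv: subprocess.run(argv).returncode
    completed = []
    for argv in plan:
        if runner(argv) != 0:
            break
        completed.append(argv[0])
    return completed


def dry_run(argv: List[str]) -> int:
    """Print the command instead of running it, and report success."""
    print("would run:", " ".join(argv))
    return 0


# Dry run with no arguments filled in; nothing is executed.
run_plan(build_plan({}), runner=dry_run)
```

    Passing a runner function keeps the ordering logic testable on a machine without Zimbra installed, which also makes it easy to rehearse a periodic automated run.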

    Given that we have roughly 100 users and will be pushing the data from a T1 to a 6Mb DSL connection, I believe this is feasible with rsync.
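
    A rough way to sanity-check that, sketched in Python. It assumes the T1 (nominal 1.544 Mbit/s) is the bottleneck, since the data is pushed out over it and the 6Mb DSL side is faster. The 1 GB figure is an illustrative assumption, not a measurement from our system, and rsync compression and protocol overhead are ignored unless folded into the efficiency factor.

```python
T1_MBPS = 1.544  # nominal T1 line rate, megabits per second


def transfer_seconds(num_bytes: float, link_mbps: float = T1_MBPS,
                     efficiency: float = 1.0) -> float:
    """Seconds needed to push num_bytes over the link at the given efficiency."""
    bits = num_bytes * 8
    return bits / (link_mbps * 1_000_000 * efficiency)


def max_bytes_per_window(window_seconds: float, link_mbps: float = T1_MBPS,
                         efficiency: float = 1.0) -> float:
    """Largest amount of changed data that fits in one sync window."""
    return window_seconds * link_mbps * 1_000_000 * efficiency / 8


# Illustrative numbers only: a 1 GB delta, and an hourly sync window.
minutes = transfer_seconds(1_000_000_000) / 60
megabytes = max_bytes_per_window(3600) / 1_000_000
print(f"1 GB delta: {minutes:.0f} minutes at full line rate")
print(f"hourly window holds about {megabytes:.0f} MB")
```

    At full line rate a 1 GB delta takes roughly 86 minutes, and an hourly window holds about 695 MB, so hourly replication works only if each hour's changes stay well under that and the link isn't saturated by other traffic.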

    As you can probably see, the standby's purpose is more along the lines of disaster recovery than near-realtime failover. However, in the event of extended downtime at our primary site (e.g., a cable cut on our T1), we would expect to bring up the standby at the secondary site so that users with alternate means of Internet connectivity would be able to access their data and continue their work.

    With all that in mind I have to admit that we may not actually need a live secondary mx--brief outages at our primary site should simply result in the sender trying again, while anything over 4-8 hours ought to trigger our standby plan. However it's probably easier to just set up a secondary mx than to argue the point.

    What I was hoping to avoid, though, was an extra machine in the secondary site's server room. If we end up using Exchange, then (according to my Exchange-savvy colleague) the primary and secondary servers will be able to handle replication while also jointly accepting mail as primary and secondary mx's. With Communigate, I've found that it's simple to rsync data to a "standby directory" on the secondary mx, and then if I need to make that data "live", I can also easily integrate any messages which have arrived on the secondary mx in the time since the primary became unavailable, using a feature called Foreign Queue Processing.

    I suppose that having an extra machine for secondary mx isn't too bad and may be preferable to over-complicating the configuration of the standby. Nevertheless if there's a good way to save space and avoid maintaining a separate piece of hardware, I feel that would be desirable.

  5. #5
    Join Date
    Sep 2006
    Location
    477 Congress Street | Portland, ME 04101
    Posts
    1,374
    Rep Power
    11

    Exchange replication is indeed more advanced than Zimbra (we support both), and yes, the Exchange replica box can do double duty as a backup MX. You will definitely spend less setup time IMHO deploying Exchange than Zimbra in a failover, WAN-connected two-data-center deployment scenario.

    One thing regarding costs: last I checked, E2K7 no longer bundles an Outlook CAL like all previous versions of Exchange did. So, unless you are an edu, those Outlook CALs can run you ~$70 per seat, and with 100 users that's another $7,000 in Exchange licensing you may not have counted on.

    Zimbra makes a big deal about their deployments being much less expensive than Exchange, not only on the licensing front but also on the hardware requirements front, and that has indeed been our experience. You could run 100 users on Zimbra comfortably on a used HP DL-360 G4 with 6GB of RAM and a pair of old single-core 3.0GHz Xeons and RAID1 146GB or 300GB disks.

    One "brute-force" D/R scenario we use is to have a spare identical chassis in the production rack and in the second data center. If the first data center gets an extended outage, we go to the data center, shut down the Zimbra server, rsync /opt/zimbra somewhere local just in case, and remove the disk drives. We take the drives to the secondary data center, shove them in the spare chassis there, boot the server, reconfigure the NICs, make a change to public DNS and everything is back in service.

    Same as if there is a hardware issue in the primary data center; just yank the disks and put them in the spare chassis there.

    If you can afford the downtime for someone to get to the data center to do this, it is a much much less expensive way to get some good redundancy. Elegant it is not! But effective it is!

    Hope that helps,
    Mark

  6. #6
    Join Date
    Jun 2008
    Location
    Berkeley, CA
    Posts
    1,474
    Rep Power
    9

    Thanks again for sharing your experiences. Our data centers will be on opposite coasts, though, so I don't think the drive-swapping method will work for us.

    Also, yes, we've looked at the Exchange licensing costs and those will probably be a major argument in favor of Zimbra.

    I have another question but I'll send it via PM.
