
Thread: HA Open Zimbra Idea - MailStore via http rather than file system.

  1. #1
    Join Date
    Jan 2006
    Posts
    33
    Rep Power
    9

    Default HA Open Zimbra Idea - MailStore via http rather than file system.

    I have an idea for modifying the mailstore for imap for added scalability and failover. It would use web servers to store the actual files rather than the standard file system.

    I'm hoping to generate some interest and knock this out. I think it should be a pretty simple change and give HUGE gains. It would definitely be a bit slower to access the message file via http rather than the local filesystem, but it should still be pretty fast, and it gives a simple solution for replication/failover/HA on the open source version of Zimbra.

    It would work something like below.

    1. Modify "com.zimbra.cs.store.StoreManager" so that it stores the message file via POST to a web server. It should post to two different servers, for failover.

    2. The two urls to the file are stored in the database with the messageId.

    3. When the imap server needs to "get" a message file, it will request it from one of the urls listed in the database. If one fails it will fail over to the other.

    4. If a server is low on storage, just add another plain old web server. It would just need a script to receive uploads via post. Then add the server to the list of available web storage servers.

    5. MySQL and LDAP are replicated to a second server for application failover. Storage fails over using the above solution.
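
    The write and read paths in the list above can be sketched roughly as follows. This is only an illustration in Python, not Zimbra code: FakeWebStore is a hypothetical in-memory stand-in for a web server with an upload script, the url_table dict stands in for the database rows, and names such as store_message and fetch_message are made up for the example.

```python
class FakeWebStore:
    """In-memory stand-in for a plain web server with an upload script (hypothetical)."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.files = {}
        self.up = True  # flip to False to simulate an outage

    def post(self, name, data):
        if not self.up:
            raise ConnectionError(self.base_url + " is down")
        self.files[name] = data
        return self.base_url + "/" + name

    def get(self, url):
        if not self.up:
            raise ConnectionError(self.base_url + " is down")
        return self.files[url.rsplit("/", 1)[1]]


def store_message(message_id, data, servers, url_table, copies=2):
    """POST the message to up to `copies` servers and record the resulting urls."""
    urls = []
    for server in servers.values():
        if len(urls) == copies:
            break
        try:
            urls.append(server.post("%s.msg" % message_id, data))
        except ConnectionError:
            continue  # skip the dead server and try the next one on the list
    if not urls:
        raise IOError("no storage server accepted message %s" % message_id)
    url_table[message_id] = urls  # stands in for the database row
    return urls


def fetch_message(message_id, servers, url_table):
    """Try each recorded url in turn, failing over when a server is down."""
    for url in url_table[message_id]:
        base_url = url.rsplit("/", 1)[0]
        try:
            return servers[base_url].get(url)
        except ConnectionError:
            continue  # fail over to the next recorded copy
    raise IOError("no copy of message %s is reachable" % message_id)
```

    In this sketch a message counts as stored once at least one server accepts it; whether that is the right rule is exactly the kind of detail that still needs working out.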


    I'm going to start digging through the code this week to see how doable this is. Let me know if you have any interest in helping out. I could def use a hand from someone who has worked with the Zimbra code.

    Thanks,

    Kevin

  2. #2
    Join Date
    May 2007
    Location
    Vancouver, Canada
    Posts
    75
    Rep Power
    8

    Default

    That's an interesting idea, but it seems to me that it might be putting a lot of "HA" effort into the wrong place.

    What you are essentially trying to do, it sounds like, is provide HA for the mail store. But by moving this up to the application layer, you're forcing the application to become more aware of its surroundings than it really needs to be. For example, when you do a "POST" to store a message, will the Zimbra server do a POST to multiple web servers to ensure that the message is stored everywhere? If so, what happens if one of them fails - should it consider the message saved or not? And if you don't do multiple POSTs, but instead rely on the web server to replicate the data, how is that different from just using disk replication technology on the Zimbra server - i.e. mirrored drives, or a full SAN?

    Perhaps more importantly, what happens if the Zimbra server itself dies? Even though you've replicated your underlying data, with your zimbra server dead, you still can't serve it up to the user. So you've got DR (saved data), but no HA (you can't get at it)

    I think you could probably achieve a better level of HA without actually modifying the Zimbra codebase at all -- e.g. use MySQL replication, as you suggested, but then replicate the mailstore filesystem at the OS level to a second box (or use a SAN, but that adds substantially more cost). On the second box, you have your running MySQL replica, and a 'standby' copy of Zimbra that can be started if the first server should fail, and its local copy of data will already be in sync.

    I've been reading up on Sun's Availability Suite, which was recently open-sourced. It has some impressive features that would work well in this situation. It's just one reason I've been working on getting Zimbra running on Solaris.

  3. #3
    Join Date
    Jan 2006
    Posts
    33
    Rep Power
    9

    Default SAN would work great, but...

    I definitely recognize that SAN is the most common way to go for HA in a mail environment, and it is a good way to go. I'm in search of an inexpensive alternative.

    My issue is that while the SAN direction seems to be the most common for HA, the infrastructure is prohibitively expensive for a lot of implementations and rather complex to set up, especially in a managed hosting environment.

    My goal is not just to provide an HA solution, but one that can be setup very easily with little expense in a managed hosting environment.

    So in response to your points:

    1) "Application layer aware of too much."
    Well it is already aware of the file system. I'm not sure that having it aware of a url is any more complex than having it aware of a path in the file system.

    2) "Post to multiple servers could fail"
    True, this is part of the reason I posted, to work out details. I had also considered a replication daemon. The application could then write to only one location, and the daemon would replicate the file from there.

    3) "Why not use disk replication"
    I have considered this. We used disk replication at one of my previous companies to store photo and video file backups/failover. It did work, but it required custom OS installs to support and was a bit finicky. We eventually moved to a system like the one I am describing. It was easy to implement, could be set up on any server without kernel or other OS modifications, and worked very well.

    4) "what happens if a post fails"
    No problem, it can be replicated later. So long as it is in one location the replication can happen as soon as the secondary storage is up.

    5) "what happens if zimbra dies"
    This was the reason for the replicated MySQL and LDAP. The second machine would have a hot copy of Zimbra waiting to go. I was figuring on two frontend Zimbra boxes for starters: one active, the other ready to go. If the active one went down, the secondary would be ready. All the message files would be replicated separately to the web storage servers, so they'd be ready to go.

    So yeah SAN sounds good. I think the http idea could provide some additions though:
    - easy - if you are low on storage just add a webserver. No need to mount new drives configure new servers... just add it to the list of available storage servers in your Zimbra config.
    - cheap - storage is cheap these days. 1TB of storage could be added with shared web host for $7/month.
    - managed hosting - this would work with managed hosting environments. It would be cost prohibitive to set up a similar solution with SAN. I'd likely need a rack, servers and custom hardware.



    Anyway, I'm gonna give it a shot... my main concern right now is actually speed. We have been successful with this model for photos and videos. The upload speed to the servers could be a deal breaker. If it is fast enough on read/write, I think it should be pretty straightforward to have this running soon.
    Last edited by kbaker; 07-20-2007 at 11:38 PM.

  4. #4
    dijichi2 is offline OpenSource Builder & Moderator
    Join Date
    Oct 2005
    Posts
    1,176
    Rep Power
    12

    Default

    Zimbra have said in the past that there will be full mailstore replication builtin to Zimbra. Might be worth asking first if this is still the case?

  5. #5
    dijichi2 is offline OpenSource Builder & Moderator
    Join Date
    Oct 2005
    Posts
    1,176
    Rep Power
    12

    Default

    ps - I did try a long time ago a blank proof of concept using drbd and heartbeat for failover clustering which worked fine. It was a bit clunky and unsupported, and this was a NE install so I didn't use it, but it is possible. This involves no modification to Zimbra.

    Since these days as hillman points out Sun Cluster is in the process of being opensourced, and Veritas SF is available for Linux, as well as the advent of OCFS, GFS, drbd etc, it might be worth investigating providing HA from the system level first before modifying Zimbra itself.

    The major win of course of modifying Zimbra itself would be live-live capability and 'proper' (ie redundant) load balancing.

  6. #6
    Join Date
    Jan 2006
    Posts
    33
    Rep Power
    9

    Default

    Something like Live-live redundancy should be possible with two instances of Zimbra running off of a single database, that would failover to a secondary if it goes down.

    Unfortunately actual live-live would be very difficult to achieve w/o major modification to the Zimbra code base, I think. This is due to the requirement in the IMAP RFC (RFC 3501) that message UIDs be strictly ascending within a mailbox. W/o some sort of token passing between instances this would probably break the RFC.

    We had a pretty long discussion about it on the DBMAIL list about a year ago.
    Being that hardware is so cheap these days, I'm less interested in live-live. More focused on massive scalability of the message store.

  7. #7
    Join Date
    Jan 2006
    Posts
    33
    Rep Power
    9

    Default

    Being a software engineer, not an admin, I am more interested in the software http failover solution. If this works, I can just add web servers every time I want storage simplifying the HA significantly.

    Regardless, I think it is an interesting approach and should be at least tested for performance. I'll only be looking into system level replication if this doesn't work out. Actually I already have dbmail with mysql replication working great, so would probably just stick to that.

    Thanks for the comments guys. If anyone is interested in a software solution for HA in Zimbra, rather than the hardware solution, give me a heads up... I'll be working on it.

  8. #8
    dijichi2 is offline OpenSource Builder & Moderator
    Join Date
    Oct 2005
    Posts
    1,176
    Rep Power
    12

    Default

    having software replication would be fantastic. having software replication that worked effectively over WAN links would be even better. live-live is really important in this day and age as any downtime costs $$$ and jobs. at the very least, active-standby running in redundant datacentres is virtually an audit requirement for most large companies.

    have you looked at the way cyrus does it? they started with a murder which was their odd name for frontend load balancing/intelligent redirection to wherever the mailbox was on the backend servers, then replication was added in relatively recently.

  9. #9
    Join Date
    Jan 2006
    Posts
    33
    Rep Power
    9

    Default

    Thanks for the reply. Here are some responses to your thoughts. I'm going to start a wiki on this as soon as I get my dev environment setup here to do Zimbra builds.

    1) Replication over WAN - This should be simple with this system. To me it is somewhat of a no brainer to use web servers for storage backend as they have been developed and perfected for years to deliver files efficiently over WAN. The replication is rather simple. When a message comes in copy to two of your http message store servers from a master list. If it fails on one, move on, a daemon will clean up in the background as soon as a secondary storage resource becomes available. If we have any speed issues for reads from the message store, something like a local squid cache on the Zimbra server could probably take care of it.
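
    A rough idea of what one pass of that background daemon might look like, in Python. Everything here is an assumption made for illustration: FakeWebStore is a hypothetical in-memory stand-in for an http storage server, url_table stands in for the database, and replicate_pending is a made-up name.

```python
class FakeWebStore:
    """In-memory stand-in for an http message store server (hypothetical)."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.files = {}
        self.up = True  # flip to False to simulate an outage

    def post(self, name, data):
        if not self.up:
            raise ConnectionError(self.base_url + " is down")
        self.files[name] = data
        return self.base_url + "/" + name

    def get(self, url):
        if not self.up:
            raise ConnectionError(self.base_url + " is down")
        return self.files[url.rsplit("/", 1)[1]]


def replicate_pending(url_table, servers, copies=2):
    """One daemon pass: top up every message that has fewer than `copies` urls.

    Returns the number of new copies written during this pass.
    """
    written = 0
    for message_id, urls in url_table.items():
        if len(urls) >= copies:
            continue  # already fully replicated
        # Read the message back from any copy that is still reachable.
        data = None
        for url in urls:
            try:
                data = servers[url.rsplit("/", 1)[0]].get(url)
                break
            except ConnectionError:
                continue
        if data is None:
            continue  # no readable copy right now, retry on the next pass
        name = urls[0].rsplit("/", 1)[1]
        holders = {url.rsplit("/", 1)[0] for url in urls}
        for base_url, server in servers.items():
            if len(urls) >= copies:
                break
            if base_url in holders:
                continue  # this server already has a copy
            try:
                urls.append(server.post(name, data))
                written += 1
            except ConnectionError:
                continue  # still down, leave it for a later pass
    return written
```

    The daemon would just call something like this on a timer; a pass that finds nothing to do is cheap, so the interval could be fairly short.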


    2) Live live/Active-standby - Achieving Active-standby should be trivial. The replication is broken into two parts in Zimbra, meta data and message store. The first part is all the meta data associated with a message, this is stored in the database. LDAP and MySQL can both be setup to replicate to a secondary Zimbra server that is in "active-standby".

    The second, message store, would be handled with the HA http message storage. The great thing about this concept is that it is inherently live-live. The message is stored to two or more locations, and a reference to each is saved in the database. None of the message store locations has to be the primary.

    When a message must be read from the http message store, the Zimbra server can choose from any of the locations with the same response. If this choice is random it should spread the load well over the storage servers, maximizing hardware usage. If one fails it can simply use one of the other message store urls. So with this we get both failover and load balancing for message stores.
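
    The random choice with failover described above could be as small as the following Python sketch. The fetch argument is a placeholder for whatever actually performs the http GET, and read_from_any is a made-up name; neither comes from Zimbra.

```python
import random


def read_from_any(urls, fetch, rng=random):
    """Try the stored urls in random order and return the first successful read.

    `fetch` is any callable that takes a url and returns the message bytes,
    raising ConnectionError when that storage server is unreachable.
    """
    candidates = list(urls)
    rng.shuffle(candidates)  # random order spreads reads across the replicas
    for url in candidates:
        try:
            return fetch(url)
        except ConnectionError:
            continue  # this replica is down, fall through to the next one
    raise IOError("all message store urls failed")
```

    Because every read shuffles the list first, no location acts as the primary, and a dead server only costs one failed attempt before the next url is tried.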

    I'd be happy with active-standby on the meta data, mysql/ldap, on first pass. Full live-live would likely need some significant changes to Zimbra, so I'd probably look into that later. I believe Cyrus uses a token passing system to achieve live-live. There is a good discussion on how this might work on the dbmail list here "http://www.dbmail.org/dokuwiki/doku.php?id=unique_token". There is also an alternative idea based on generating ascending guid message id's from configured system information here "http://www.dbmail.org/dokuwiki/doku.php?id=unique_id_guid".

  10. #10
    Join Date
    Jan 2006
    Posts
    33
    Rep Power
    9

    Default Next Steps: Starting tests and dev

    I'm going to be approaching this project something like this. Of course on my "free time" so could take a bit.

    - get Zimbra dev environment setup
    - do successful unchanged build
    - replace MessageStore filesystem write code with post to remote apache server using jakarta httpclient
    - replace MessageStore filesystem read code with http read using jakarta httpclient
    - build server
    - test delivery and read
    - move http storage server list into configuration.
    - setup replication code, at delivery time multi-writes.
    - test build
    - write http replication daemon to run in the background, taking care of any multi-writes that don't make it due to connection failure.

    Just a quick overview.... off I go. It'll prob be a week or so before I have any working code, but have to start somewhere.

    Cheers,
