
Thread: [SOLVED] Compressing blobs in volume

  1. #1
    Join Date
    Feb 2006
    Location
    Newcastle, UK
    Posts
    17
    Rep Power
    9


    Hi all,

    We've been having disk space problems on one of our Zimbra servers for a couple of months now, mainly because it isn't particularly big (~100GB). To help it out, I added a secondary volume (an NFS mounted partition from a NAS box) and started running HSM.

    Unfortunately, our usage is growing exponentially, and all I can do now is keep reducing the age threshold for HSM. I've got it at 90 days now, which is rapidly getting to the point where far too recent mail is going to get moved by HSM.

    I noticed just now that when the initial volume was set up on this server, it seems compression was turned off on the message volume. Is there any way to apply compression to blobs already on a volume? I'm surmising that I have some 90GB of blobs that would benefit greatly from being compressed and would love to be able to recurse over the volume with some magic utility, compressing the blobs as it finds them :-)

    Any pointers?

    Cheers,

    Kev

    PS. Network mounted disks are my only hope for this box at present - through a lack of foresight on my part, we used 1U servers with both disk bays filled, with the intention of connecting them to a SAN. The budget for the SAN was pulled...

  2. #2
    Join Date
    Feb 2006
    Location
    Newcastle, UK
    Posts
    17
    Rep Power
    9


    No-one any ideas?

  3. #3
    Join Date
    Nov 2006
    Location
    UK
    Posts
    8,017
    Rep Power
    25


    Hopefully I'll have an answer for you soon ... I've asked an employee whether disabling blob compression and then enabling it again would make it go through the whole store and compress as necessary.

  4. #4
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    21


    When you enable compression existing blobs aren't touched. Vote for Bug 35143 - blob compress and decompress tool first (then we can make it an automatic process after).

    Two solutions are further below, but bear with the edumacation - it'll help you long term to understand/for others coming across this thread.

    My preferred order for compression:
    • Start with --zip on backups if they're not on the same system.
      • (Note: --zipStore, i.e. zip without compression, will be the default argument to zmbackup in 6.0.)
      • Fewer files make it easier to copy or rsync later, and prevent you from running out of inodes. You still have the benefit of single instance storage; shared blobs are only added once, then referenced with a pointer. Of course you lose the hard linking optimization (speed and space) for blobs that are already in an earlier full backup when working from the same disk, so it’s more advantageous for those off-site single copies.
    • Compress Secondary/HSM
    • Compress Primary Stores
    • Compress Indexes
    Since the perf impact of this is largely dependent on the message profile at a particular site (specs on usage/users/server would be handy), we don't have any blanket formula to define the effect of compression, but there are a few things to keep in mind:

    A) CPU utilization - If you're maxed out now, don't. Compression is hard on the CPU, and you don't want to turn it into a bottleneck.

    B) Memory - When messages are compressed on disk, they have to be read entirely into memory; they can't be streamed. This may sound somewhat counter-intuitive, but it means that memory usage is higher when using compression. (On huge sites with weak servers, if you're not careful you'll hit an OOME when you have compressed blobs in the 100s of MB range.)

    Bug 28329 - stream: don't hold contact blob in memory
    Bug 31694 - stream: MessageCache should not load full messages into memory
    Bug 22678 - remove MailItem.getContent() and MessageCache blob caching
    Bug 10136 - Don't write blobs when holding mailbox lock
    Bug 34890 - Implement file descriptor cache for blob store

    The message cache size is specified by zimbraMessageCacheSize (default is 1671168 bytes, roughly 1.6MB). Messages that are loaded into memory consume the same number of bytes as the uncompressed message size. Messages that are streamed from disk (no compression & less than 1.5MB) are estimated to take up 4k in the message cache - they're pointed at, not loaded.

    (Extra tidbit: Remember incoming messages larger than zimbraMailDiskStreamingThreshold (1048576 or 1MB) are streamed to disk during LMTP delivery, instead of being read into memory. This limits memory consumption at the expense of higher disk utilization.)

    C) If you want to slow growth with minimal perf impact, you can enable compression with a high threshold. (The default for which blobs to compress is 4096 or 4k.)
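To get a feel for what a given threshold would cover before changing anything, a rough survey of the store can help. The sketch below is only an illustration: `eligible_bytes` is a hypothetical helper name, it assumes blobs at or above the threshold are the ones that get compressed (the thread doesn't say whether the comparison is inclusive), and it simply compares on-disk file sizes.

```python
import os


def eligible_bytes(store_root, threshold=4096):
    """Return (file_count, total_bytes) for files whose size is >= threshold.

    Assumption: the compression threshold is compared against the plain
    on-disk size of each blob, inclusive of the threshold itself.
    """
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(store_root):
        for name in filenames:
            # lstat so that symlinks are measured, not followed
            size = os.lstat(os.path.join(dirpath, name)).st_size
            if size >= threshold:
                count += 1
                total += size
    return count, total
```

Calling it with a few thresholds (say 4096, 65536 and 1048576) against the store path shows how many bytes would still be eligible as the threshold rises, which is the trade-off point C describes.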

    All that said though - if you're out of disk space, you're out of disk space. So compression appears to be your next remaining option.

    The two solutions:


    Now you have several choices. Just remember that a volume id column for the blob is stored in each mail_item table in MySQL. Examine your zmvolume -l output very carefully so you don't get confused about which is which, or check in the DB: su - zimbra > mysql > select * from zimbra.volume;

    Solution 1: Easiest/and most importantly no downtime trick - The HSM would compress if we moved from a primary to a secondary volume. So create a compressed HSM volume on your main server.

    We honor the compression setting of a target volume only when we store/move a message (delivery or HSM). That's the point of Bug 35143 - blob compress and decompress tool: to handle existing blobs that aren't being moved. (I guess another for those that want to uncompress - never tried this solution in reverse.)

    Message stores:
    • Alpha1: Primary on server
    • Bravo2: Secondary on NAS (compressed)
    • Charlie3: Secondary on server (compressed)
    1.1) Set current HSM to Charlie3 with a super low time value, then let it move/compress everything for you overnight or kick off a zmhsm --start session (check with --status).

    1.2) Change the HSM current back to Bravo2.

    The goal is to change the type of Charlie3 to Primary Message, then mark it current. (Future: Bug 18720 - Add support for more than one current secondary storage volume in HSM )

    1.3) The admin console on 5.0.13 won't let you remove the 'current secondary volume' drop down field, you have to use CLI:
    -ts,--turnOffSecondary Turns off the current secondary message volume
    zmvolume -ts 3

    And zmvolume -l will look like:
    name: messageCharlie3
    type: secondaryMessage
    path: /opt/zimbra/store3
    compressed: false
    current: false

    1.4) Now that it's not locked as current anymore, just change the type using Admin console or CLI:
    -t,--type <arg> Volume type (primaryMessage, secondaryMessage, or index)
    zmvolume -t primaryMessage 3

    1.5) Finally switch your 'current primary message store' from /opt/zimbra/store1 to /opt/zimbra/store3 (drop down in admin console or zmvolume -sc 3) and you're set - store1 isn't deleted and can still be accessed; new files just aren't going there.
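Since it's easy to mix up which volume is which while juggling types and current flags, it can help to check the listing mechanically between steps. This is a minimal sketch that assumes the `key: value` layout shown in the zmvolume -l excerpt in step 1.3; the real output may carry extra fields, and both function names are hypothetical.

```python
def parse_volume_listing(text):
    """Split 'key: value' lines into one dict per volume.

    A new record starts whenever a key repeats, so it does not matter
    which field happens to come first in each block.
    """
    volumes = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank lines and anything that is not key: value
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if current is None or key in current:
            current = {}
            volumes.append(current)
        current[key] = value
    return volumes


def uncompressed_current_volumes(volumes):
    """Names of volumes that are current but do not have compression on."""
    return [
        v.get("name", "?")
        for v in volumes
        if v.get("current") == "true" and v.get("compressed") == "false"
    ]
```

Feeding it the captured text of zmvolume -l after step 1.5 should return an empty list if the new current primary really is the compressed one.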

    Remember when a new message is delivered or created, the message is saved in the current message volume. Additional message volumes can be created, but only one is configured as the current volume where new messages are stored.

    Try to get the main store down to the minimum/1 day first, but even if you don't, just leave it alone: the uncompressed stuff will eventually get moved to the NAS, 1 > 2 (bypassing 3).
    If you really want to get rid of store1 sooner see this thread: http://www.zimbra.com/forums/adminis...e-volumes.html

    For the future: Bug 35142 - volume migration tool & Bug 23472 - Ability to move/consolidate messages from one message store to another.


    Solution 2: The entirely manual method is to compress blobs in place, by iterating through the store:

    2.1) Compression of blobs in the message store is done with gzip (don't remember, but possibly with null as os type and no mtime).

    2.2) Compressed blobs are not identifiable by name, so compression of existing (already delivered) blobs will need to be done to a temp file on the same FS, then moved over the existing file atomically. Compressing in place (overwriting the existing file) will open you up to data corruption, and should be avoided.

    2.3) Do not compress blobs with a link count > 1 - these are hardlinked across mailboxes (exist in more than one mailbox) and the compression and replacement (in 2.2 above) will break the hardlink, likely negating any space savings achieved.

    2.4) Do not compress any files that are smaller than the FS block size, since you won't save any space. (Btw what filesystem type are you using?)

    2.5) If you're compressing in place on the active primary store, don't do it while HSM is running. If you don't want to shut things down, you can compress while things are running, but you can expect to see errors while attempting to access blobs > 1MB that reside in message cache. This problem will fix itself after a server restart or after a message is pushed out of the cache by newer messages. Personally I feel it's better to do it with Zimbra stopped.
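The skip rules in 2.2 to 2.4 can be checked per file before touching anything. The following is a minimal sketch, not a Zimbra tool: `skip_reason` is a hypothetical helper, it detects already-compressed blobs by the standard gzip magic bytes (an assumption about how to tell them apart, since the name doesn't), and it takes the block size from statvfs on a Unix-like system.

```python
import os
import stat

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def skip_reason(path):
    """Return a short reason to leave this blob alone, or None if it is a candidate."""
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode):
        return "not a regular file"
    if st.st_nlink > 1:
        # Point 2.3: replacing the file would break the hardlink
        return "hardlinked"
    vfs = os.statvfs(path)
    block_size = vfs.f_frsize or vfs.f_bsize
    if st.st_size < block_size:
        # Point 2.4: a file under one block cannot free any space
        return "smaller than one block"
    with open(path, "rb") as f:
        if f.read(2) == GZIP_MAGIC:
            # Point 2.2 / 2B: avoid double compression
            return "already gzip-compressed"
    return None
```

Running this over the store as a dry run and counting the reasons gives a rough idea of how many blobs are worth compressing before committing to any downtime.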

    So:
    2A) Shut down ZCS.
    2B) Compress files in the blob store. Make sure the filename does not change and that you don't double-compress files - see point 2.2 above.
    2C) Start ZCS
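For step 2B, the temp-file-then-rename approach from point 2.2 could look roughly like this. It is a sketch under assumptions, not a tested Zimbra procedure: `compress_blob_in_place` is a hypothetical name, the exact gzip header Zimbra writes isn't confirmed in this thread (mtime=0 just mirrors the "no mtime" guess in 2.1), and ownership is assumed to be fine because the script runs as the zimbra user.

```python
import gzip
import os
import shutil
import stat
import tempfile

GZIP_MAGIC = b"\x1f\x8b"


def compress_blob_in_place(path):
    """Gzip a blob via a temp file in the same directory, then rename over it.

    Returns False (and does nothing) if the file already looks gzipped.
    """
    with open(path, "rb") as f:
        if f.read(2) == GZIP_MAGIC:
            return False  # avoid double compression
    st = os.stat(path)
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same filesystem, so the final rename is atomic
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".gz-tmp-")
    try:
        with os.fdopen(fd, "wb") as raw:
            # filename="" and mtime=0 keep the name and timestamp out of the header
            with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, gz)
            raw.flush()
            os.fsync(raw.fileno())
        os.chmod(tmp_path, stat.S_IMODE(st.st_mode))  # keep original permissions
        os.replace(tmp_path, path)  # atomic swap; the filename does not change
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

It deliberately does not check link count or block size, so those checks from 2.3 and 2.4 need to happen before calling it, and a trial run on a copy of a few blobs is a sensible first step.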
    Last edited by mmorse; 02-27-2009 at 10:38 AM. Reason: rearrange tidbits under solutions - called store3 charlie (as you probably already have a 2)

  5. #5
    Join Date
    Nov 2006
    Location
    UK
    Posts
    8,017
    Rep Power
    25


    Very informative indeed; thanks Mike

  6. #6
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    21


    I rearranged the tidbits under the solutions themselves for clarity - the solution1 trick really is the easiest.

  7. #7
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    21


    FYI for 6.0+ re not loading compressed blobs into memory:
    There's a new /opt/zimbra/uncompressed directory controlled by 2 global/server attributes:
    zimbraMailUncompressedCacheMaxFiles 5000
    zimbraMailUncompressedCacheMaxBytes 1073741824 (1GB)

    Edit:
    The planned location changed to /opt/zimbra/tmp/uncompressed then again to /tmp/uncompressed.
    Last edited by mmorse; 03-04-2009 at 11:39 AM.
