Thread: Strange data store size increase on migration

  1. #1
    zwvpadmin Guest

    Strange data store size increase on migration

    So I've successfully completed a challenging migration yet there is something quite strange going on.

    I upgraded from Zimbra 6.0.13 on Ubuntu 8.04 x32, to Ubuntu 10.04 x64 Zimbra 7.1.3.

    Because the architecture change ruled out a straight upgrade, I used imapsync to copy all user data from the old server to the new one across the network. This worked relatively well with some scripting and a user account/password list. The new system is up and running fine and there don't seem to be any problems, but there is one very strange thing.

    On the old server my /opt directory was approximately 215 gigs. There were actually MORE user accounts (280) on the old system as I used this opportunity to prune old unused mailboxes. The new system has fewer user accounts (220) and yet the /opt directory is pushing 430 gigs.

    This strikes me as incredibly strange, especially since the increase appears to be exactly double. What could have caused this? I wouldn't be too concerned, except the drive assigned to /opt is only 500 GB (RAID 50), which "should" have lasted for years. If this is normal and uncorrectable, I will need to increase its size quickly.

    The only other thing that I feel is worth mentioning is that I had synced approximately 80% of the data to the new server while it was running 7.1.2, then ran the upgrade to 7.1.3 and resynced. Is it possible that it literally duplicated data? I checked several mailboxes and no duplicate mail messages appear to be visible...

    Any ideas???

  2. #2
    phoenix (Zimbra Consultant & Moderator)

    Originally posted by zwvpadmin:
    > So I've successfully completed a challenging migration yet there is something quite strange going on.
    >
    > I upgraded from Zimbra 6.0.13 on Ubuntu 8.04 x32, to Ubuntu 10.04 x64 Zimbra 7.1.3.
    >
    > because of the arch incompatibility to do a straight upgrade, ....

    The correct procedure to follow would have been the one in the Zimbra wiki article "Network Edition: Moving from 32-bit to 64-bit Server".

    Originally posted by zwvpadmin:
    > The only other thing that I feel is worth mentioning is that I had synced approximately 80% of the data to the new server on install 7.1.2 and then ran the upgrade to 7.1.3 and resynch'd. Is it possible that it literally duplicated data??

    I would think this is the likely cause. Duplicated mail won't show in the client or web UI but will be on the mail store. If I were you, I'd look there to see whether there's duplicated data.

    BTW, the RAID level you're using is not recommended for Zimbra. If you want good (or improved) performance you should use RAID10.
    Regards


    Bill



  3. #3
    zwvpadmin Guest

    I would have liked to follow that guide, but we also made some configuration changes to our domain structure to accommodate unplanned acquisitions and the probability of future expansion. In addition, the original configuration was built around an improperly designed network (in place before I came into the picture), and further changes to the network domain and nomenclature also had to be made.

    Unfortunately my approach was unavoidable, based on lengthy research and a few conversations with Zimbra employees about it.

    That being said, how can I go about checking for duplicates in the data store itself? If they do not show up under client mailboxes, I'm unclear how to locate and eliminate them. Please advise.

    Thanks!

  4. #4
    zwvpadmin Guest

    Duplicate data in the data store

    How can I go about checking for duplicate data in the data store itself? If the duplicates do not show up under client mailboxes, I'm unclear how to locate and eliminate them.

    I am working with Zimbra 7.1.4 on Ubuntu 10.04 LTS x64

    Please advise.

    Thanks!

  5. #5

    What kind of duplicate data are you looking for to begin with? Zimbra already kind of 'dedupes' by default: if many users on your system receive the same e-mail, it stores it once and hard-links it for the other accounts.

  6. #6

    @bdial: Dedupe only takes effect at LMTP submission time. If you use imapsync, zmrestore, or any other non-SMTP, non-LMTP import method, there is no dedupe.

    @phoenix: I find it hard to imagine duplicating data with imapsync by accident and not seeing the effects in the client. Maybe you'd copy the same message twice with different INTERNALDATEs, but surely clients would notice the duplicates in search. Or did you simply mean that LMTP dedupe is lost?

    You can use freedups or similar (freedup.org, hardlink) to hard-link retroactively. As long as you restrict it to /opt/zimbra/store*, it's safe. It will take a LONG time to run, at least 24 hours, but it's mostly metadata reads so should not hurt performance much. If you really have a nearly 50% data duplication rate among accounts, that's unusual.
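    Before hard-linking anything, it may help to measure how much of the store is already hard-linked on each server. The following is only a sketch: `linked_summary` is a hypothetical helper name, and it assumes the message blobs live under /opt/zimbra/store as described above. It relies only on `find -links +1`, which matches files whose link count is greater than one.

```shell
# Count regular files under a directory, and how many of those names share
# an inode with at least one other name (link count greater than 1).
linked_summary() {
    dir=$1
    total=$(find "$dir" -type f | wc -l)
    linked=$(find "$dir" -type f -links +1 | wc -l)
    # Arithmetic expansion strips any padding that wc adds on some systems.
    echo "total=$((total)) hardlinked=$((linked))"
}

# Assumed usage, run on both the old and the new server:
#   linked_summary /opt/zimbra/store
```

    If the old server reports a large hard-linked share and the new one reports close to zero, that would be consistent with the dedupe having been lost during the imapsync copy.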

    Another possibility: Is this OSS or NE? 7.x silently made --zip the default for zmbackup, which increases the size of /opt/zimbra/backup. But only after the second or third full backup.

    Quick test: df -i. Are there radically more inodes in use on the new server? This correlates with number of files. This will tell you if you should be looking for lots of duplicated small files, or a smaller number of really big files.

    Just as quick: mysql -e 'select count(*) from mboxgroup10.mail_item' will give you a count of messages in roughly a 1% sample of the mail store. There are 100 MySQL databases, creatively named mboxgroup1 through mboxgroup100, and individual accounts are "randomly" assigned to one. I would not expect the counts on each server to match exactly, because user jdoe is likely in a different mboxgroup on server1 than on server2, but if you consistently get a higher count on the new server than on the old server, something strange is going on.
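    Since accounts land in different groups on each server, comparing the total across all groups is probably more telling than comparing any single group. A minimal sketch, assuming the 100 databases named above; `mboxgroup_count_sql` is a hypothetical helper that only prints SQL, so it can be inspected before anything is sent to the database.

```shell
# Emit one COUNT query per mailbox group database (mboxgroup1..mboxgroup100).
mboxgroup_count_sql() {
    i=1
    while [ "$i" -le 100 ]; do
        echo "SELECT 'mboxgroup$i', COUNT(*) FROM mboxgroup$i.mail_item;"
        i=$((i + 1))
    done
}

# Assumed usage as the zimbra user, where `mysql` is on the PATH:
#   -N drops column headers, -f continues past a group that does not exist.
#   mboxgroup_count_sql | mysql -N -f | awk '{ s += $2; print } END { print "total", s }'
```

    Run it on both servers and compare the two totals.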

    du -skh /opt/zimbra/{store,index,db,log,[...]} will tell you definitively where the space has gone, but it will take a long time to return (hours, at least).
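    To avoid tallying the directories by hand, the per-directory sizes can be sorted so the largest consumer comes first. This is a sketch only; `usage_by_dir` is a hypothetical helper name, and it simply covers every immediate subdirectory of the path it is given.

```shell
# Print the size in KiB of each immediate subdirectory, largest first.
usage_by_dir() {
    du -sk "$1"/*/ 2>/dev/null | sort -rn
}

# Assumed usage on a real server (expect a long runtime on a large store):
#   usage_by_dir /opt/zimbra
```

    Running the same command on both servers and comparing the top few lines should show whether the growth is in store, backup, or somewhere else.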

  7. #7
    zwvpadmin Guest

    Thank you, Rich G., for the thorough response. I think losing the SMTP/LMTP dedupe is probably what I'm up against. Users in my company are constantly mass-mailing each other notices and inquiries; I would imagine 1 in 10 messages is duplicated across 50% or more of the mailboxes. It's atypical, but that's the nature of the beast in this situation.

    Doing what you've suggested here are some results:

    df -i:
                Filesystem    Inodes    IUsed     IFree  IUse%  Mounted on
    OLD /opt:   /dev/md0    23805952  1378225  22427727     6%  /opt
    NEW /opt:   /dev/sdb1   32776192  1669350  31106842     6%  /opt

    mysql -e 'select count(*) from mboxgroupX.mail_item' (OLD):
    1:10171, 2:42741, 3:30073, 4:22586, 5:2407, 6:32999, 7:9994, 8:23582, 9:60016, 10:2697 Total: 237266

    mysql -e 'select count(*) from mboxgroupX.mail_item' (NEW):
    1:1427, 2:21283, 3:155, 4:957, 5:19096, 6:61267, 7:15310, 8:34707, 9:176, 10:6489 Total: 160867 <- reflecting fewer user mailboxes?

    I'm running the du -skh /opt/zimbra/... commands and tallying up for later.

    Based on what I'm seeing here, I would have to say the issue is almost certainly the lost dedupe. I'm going to try running the freedups program you suggested and see if that helps. Let me know if anything else here strikes you.

  8. #8
    zwvpadmin Guest

    I had significant trouble compiling freedup from source, and the only prebuilt versions available were 32-bit (i386 and i586), so I opted for an alternate version of fdupes in Maverick that includes:

    -L --hardlink: replace all duplicate files with hardlinks to the first file in each set of duplicates

    It was available at http://mirrors.us.kernel.org/ubuntu/...R2-3_amd64.deb

    I ran a test command without the -L action:

    fdupes -m -n -r /opt

    It took HOURS to complete, and even with Zimbra running live (so fdupes was unable to read thousands of locked files) it still showed:

    525279 duplicate files (in 196227 sets), occupying 172900.5 megabytes

    I believe I've found my solution. I would imagine the actual amount to be closer to 200 GB, which would bring my data store back down to around 230 GB, much closer to what I expected.

    I'll run this process over the weekend when I can shut down Zimbra to let it run overnight and report back. Thanks again for all the input.
    Last edited by zwvpadmin; 01-19-2012 at 05:49 AM.

  9. #9

    > fdupes -m -n -r /opt

    Don't run that without the -n!

    Keep it within /opt/zimbra/store. Other "duplicate" files are likely to be temporary lockfiles and whatnot, which would be bad to combine.
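    For a read-only estimate that stays inside the store, duplicates can also be counted with standard tools before any hard-linking is attempted. This is a sketch, not a replacement for fdupes: `dup_report` is a hypothetical helper, it assumes GNU `sha256sum` is installed, it skips empty files in the same spirit as -n, and it counts names that are already hard-linked to each other as duplicates, so it can overstate what is left to reclaim.

```shell
# Report sets of byte-identical, non-empty files under a directory.
# Nothing is modified; files are only read and checksummed.
dup_report() {
    find "$1" -type f -size +0 -exec sha256sum {} + |
        awk '{ n[$1]++ }
             END {
                 for (h in n) if (n[h] > 1) { sets++; extra += n[h] - 1 }
                 printf "sets=%d redundant_copies=%d\n", sets, extra
             }'
}

# Assumed usage, restricted to the message store as advised above:
#   dup_report /opt/zimbra/store
```

    Checksumming every blob reads the full contents of the store, so expect it to take at least as long as the fdupes run did.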

  10. #10
    zwvpadmin Guest

    Makes sense. What about /opt/zimbra/{index,db,data}?

    Or are duplicates likely only within the store?
