
Thread: De-duplicate mailstore

  1. #11
    Join Date
    Jan 2008
    Posts
    658
    Rep Power
    8

    Default

    Quote Originally Posted by brian View Post
    Voted! Thanks, Brian.

  2. #12
    Join Date
    Aug 2008
    Posts
    10
    Rep Power
    7

    Default

    Quote Originally Posted by veronica View Post
    De-dupe is per mail store, as users moved to a separate mail store have a new SQL database.
    Moreover, ZCS de-dupe operates only at initial message delivery time, via LMTP, to multiple recipients, e.g. an LMTP session with multiple RCPT TO: commands. Otherwise, there is no de-duplication. Also, ZCS de-duplication works on an entire-message basis: messages must be 100% identical, every header line, as they are during a simultaneous LMTP delivery.

    ZCS does not implement attachment de-duplication, e.g. calculating a hash for an attachment and only storing it once, regardless of how often it is delivered.

    Moving users, migrations, etc. do not use LMTP and thus "break" de-duplication.

    Larry

  3. #13
    Join Date
    Aug 2008
    Posts
    10
    Rep Power
    7

    Default

    Quote Originally Posted by Klug View Post
    However, if you use "manual hardlinks" instead of integrated SIS, what happens if one user deletes the mail the hardlink points to?
    When ZCS removes a blob (file), the link count on the inode that the file name references is decremented by one. When an inode's link count goes to zero, the OS may then remove the data from disk. If the other user(s) still have blobs/files that reference the inode, the data will not be removed.

    Some further explanation: in *nix, a file on disk exists as an inode. A directory file contains references to inodes. Each reference increments the link count for an inode. Most inodes have a link count of one, as most "files" "exist" in just one directory. I say "file" because the file is actually the inode, the name in the directory just references the inode. There can be many references, all with different names in different locations. When all the references go away, the inode can go away and the space made available again for allocation.
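
    The link-count behaviour described above can be tried out in a scratch directory. This is only a minimal sketch: the file names user1.msg and user2.msg are made up for illustration and are not ZCS paths, and it assumes GNU stat (the -c option).

```shell
#!/bin/sh
# Two directory entries for one inode, then one of them is removed.
# Names are hypothetical; requires GNU stat for the -c option.

link_count() {
    # Print how many directory entries reference the file's inode.
    stat -c '%h' "$1"
}

demo_dir=$(mktemp -d)
printf 'Subject: test\n\nhello\n' > "$demo_dir/user1.msg"
ln "$demo_dir/user1.msg" "$demo_dir/user2.msg"    # second name, same inode

echo "links after ln: $(link_count "$demo_dir/user2.msg")"

rm "$demo_dir/user1.msg"                           # "one user deletes the mail"
echo "links after rm: $(link_count "$demo_dir/user2.msg")"

cat "$demo_dir/user2.msg"                          # the data is still readable
```

    The second name keeps the inode alive: removing user1.msg only drops the count from 2 to 1.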

    Larry

  4. #14
    Join Date
    Mar 2006
    Location
    Beaucaire, France
    Posts
    2,322
    Rep Power
    13

    Default

    I have not tried (yet) to play with this: I intend to build a 6.0.x VM, create a bunch of users on it then migrate (imapsync) the same data to each account and then use hardlink to reclaim the space.

    I want to do this in order to test the way ZCS handles the hardlink when modifying something in the message.

    If you look at freedups' README you can see that not all apps handle it the same way...

  5. #15
    Join Date
    Aug 2008
    Posts
    10
    Rep Power
    7

    Default

    Quote Originally Posted by Klug View Post
    I have not tried (yet) to play with this: I intend to build a 6.0.x VM, create a bunch of users on it then migrate (imapsync) the same data to each account and then use hardlink to reclaim the space.

    I want to do this in order to test the way ZCS handles the hardlink when modifying something in the message.

    If you look at freedups' README you can see that not all apps handle it the same way...
    This is my understanding of how ZCS functions:

    ZCS blobs in the database have an identifier and a change identifier. In the MySQL mboxgroupXX.mail_item table, these are the columns id and mod_content. For a blob with an id of 1 and a mod_content value of 1, the blob on disk is stored as /opt/zimbra/store/{dir}/{mbxid}/msg/{dir}/1-1.msg. The {dir} values differ based on hashing algorithms applied to the mailbox id and the blob id.

    Whenever ZCS makes a change to a blob, the mod_content value should be changed (e.g. to 2), and the new, modified blob written to disk as /opt/zimbra/store/{dir}/{mbxid}/msg/{dir}/1-2.msg. As ZCS currently implements single-store as OS-level hard links between identical blobs in multiple accounts, this change-tracking mechanism permits one account to change a shared data item without affecting other accounts.
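
    To make the naming and the "write a new file instead of editing in place" idea concrete, here is a rough sketch in a scratch directory. It is not ZCS code: the acctA/acctB directories are hypothetical stand-ins for two mailboxes, the {dir} hashing is left out, and both accounts reuse id 1 purely to keep the example short.

```shell
#!/bin/sh
# Sketch of <id>-<mod_content>.msg naming plus a change made by one account.
# Layout and account names are assumptions for illustration only.

blob_name() {
    # $1 = id, $2 = mod_content  ->  "<id>-<mod_content>.msg"
    printf '%s-%s.msg\n' "$1" "$2"
}

store=$(mktemp -d)
mkdir -p "$store/acctA/msg" "$store/acctB/msg"

# One message shared by two accounts through a hard link.
printf 'original body\n' > "$store/acctA/msg/$(blob_name 1 1)"
ln "$store/acctA/msg/$(blob_name 1 1)" "$store/acctB/msg/$(blob_name 1 1)"

# Account A changes the item: the new revision goes to a NEW file,
# then the old name is removed. The shared inode is never written to.
printf 'modified body\n' > "$store/acctA/msg/$(blob_name 1 2)"
rm "$store/acctA/msg/$(blob_name 1 1)"

cat "$store/acctB/msg/$(blob_name 1 1)"    # account B still sees the original
```

    Because the modified content lands in 1-2.msg rather than being written through the shared 1-1.msg, account B's copy is untouched.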

    If you want to see how many of your blobs have hard links and are presumably shared by other accounts, go to your blob store (see the path as above) and use the stat command, e.g.

    Code:
    $ stat -c '%n %h' * | grep -v ' 1$'
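
    The stat command above only looks at one directory. A possible recursive variant, assuming GNU find (for -printf), is sketched below; the function name list_shared_blobs is made up for this example.

```shell
#!/bin/sh
# Recursively list files that have more than one hard link.
# Assumes GNU find; the function name is hypothetical.

list_shared_blobs() {
    # $1 = directory to scan.
    # Prints "<link count> <path>" for each regular file with a link count above 1.
    find "$1" -type f -links +1 -printf '%n %p\n'
}
```

    For example, list_shared_blobs /opt/zimbra/store | wc -l would give a rough count of blob names that share an inode with at least one other name.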
    Larry

  6. #16
    Join Date
    Nov 2008
    Location
    US
    Posts
    21
    Rep Power
    7

    Default

    Quote Originally Posted by lweeks View Post
    For a blob with an id of 1 and a mod_content value of 1, the blob on disk is stored as /opt/zimbra/store/{dir}/{mbxid}/msg/{dir}/1-1.msg. The {dir} values differ based on hashing algorithms applied to the mailbox id and the blob id. Whenever ZCS makes a change to a blob, the mod_content value should be changed (e.g. to 2), and the new, modified blob written to disk as /opt/zimbra/store/{dir}/{mbxid}/msg/{dir}/1-2.msg. As ZCS currently implements single-store as OS level hard links between identical blobs in multiple accounts,
    This implies that one could run a de-dupe utility on /opt/zimbra/store to reclaim space in cases such as mine.

    Can anyone speak to any database implications (i.e., are the links/identical blobs tracked in the db)?
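
    Before letting any de-dupe utility loose on the store, a read-only report of byte-identical files may be a safer first step. The sketch below only lists candidates and changes nothing; it assumes GNU coreutils (sha256sum, uniq -w and --all-repeated), the function name report_duplicate_blobs is made up, and whether replacing the reported files with hard links is safe with respect to the database is exactly the open question above.

```shell
#!/bin/sh
# Report files with identical content under a directory. Read-only.
# Assumes GNU coreutils; the function name is hypothetical.

report_duplicate_blobs() {
    # $1 = directory to scan.
    # Prints "<sha256>  <path>" for every file whose hash occurs more than once,
    # with a blank line between groups. Files that are already hard-linked
    # to each other show up too, since they have the same content.
    find "$1" -type f -exec sha256sum {} + \
        | sort \
        | uniq -w 64 --all-repeated=separate
}
```

    Running report_duplicate_blobs /opt/zimbra/store on a test system would show how much identical content exists, keeping in mind that messages differing in even one header line will not match.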
