
Thread: Script: (Thunderbird) mbox tree to (Zimbra Desktop) importable .eml tree



    Our Thunderbird clients have a lot of local folders that we'd like to migrate to Zimbra Desktop local folders. Thunderbird stores local folders as mbox files, which can be nested in subdirectories. Zimbra Desktop can import tgz files that contain a directory tree of .eml files, where each .eml file holds one message.
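To make the mbox-to-.eml idea concrete, here is a minimal sketch that splits a single mbox file with Python's standard `mailbox` module. It is not the script from this post: it assumes Python 3, and `split_mbox` is a hypothetical helper name used only for illustration. The files are numbered from 0, like the output described in this thread.

```python
import mailbox
import os


def split_mbox(mbox_path, dest_dir):
    """Write every message of one mbox file to dest_dir as 0.eml, 1.eml, ...

    Hypothetical helper for illustration. Returns the number of messages.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # create=False: fail instead of creating an empty mbox if the path is wrong
    box = mailbox.mbox(mbox_path, create=False)
    written = 0
    try:
        for key in box.iterkeys():
            # get_bytes() returns the raw message without the "From " separator
            raw = box.get_bytes(key)
            eml_path = os.path.join(dest_dir, "%d.eml" % written)
            with open(eml_path, "wb") as out:
                out.write(raw)
            written += 1
    finally:
        box.close()
    return written
```

Calling `split_mbox("Inbox", "dest/Inbox")` would produce `dest/Inbox/0.eml`, `dest/Inbox/1.eml` and so on. The sketch handles one file only and does not touch mtimes or walk subdirectories.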

    After playing with other approaches, including a bash script, I gave up and wrote my own Python script, which I'm sharing here:

    """
    Created on 02.11.2011
    @version 0.9
    @author: Fabrice Bongartz (fabrice (at) fabrice d.o.t. me)
    @copyright: (C) 2011 Fabrice Bongartz
    This program is free software; you can redistribute it and/or modify it under
    the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 3 of the License, or (at your option)
    any later version.
    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
    details. You should have received a copy of the GNU General Public License
    along with this program; if not, see <>.
    """
    import sys, os, os.path, time, email.utils, re, argparse, tarfile
    from argparse import ArgumentTypeError
    re_from = re.compile(r"^From ")
    re_date = re.compile(r"^Date: +")
    # strips "From <sender> " from an mbox separator line, leaving only the date
    re_strip_from = re.compile(r"From ([^@]+@)*(\(.+\)|\S+) *")
    def get_date_from_fromline(from_line):
        return re_strip_from.sub("", from_line)
    def get_date_from_dateline(date_line):
        return re_date.sub("", date_line)
    def mbox2eml(mbox_file, dest_dir, change_mtime, prefer_dateheader):
        # jump back to the beginning of the file
        mbox_file.seek(0)
        previous_was_newline = True
        nr = 0
        in_headers = False
        date_from = None # to store dates from the "From " header.
        date_date = None # to store dates from the "Date: " header.
        cur_path = None
        cur_file = None
        bof = True
        for line in mbox_file:
            if (previous_was_newline): # True for the first iteration
                if (re_from.match(line) != None):
                    # we're at the beginning of a new message
                    if (bof == False): # True for the first iteration
                        # finish the last message
                        if (cur_file.closed == False): cur_file.close()
                        if (change_mtime):
                            if (prefer_dateheader and date_date != None):
                                mtime = time.mktime(date_date)
                            else: mtime = time.mktime(date_from)
                            os.utime(cur_path, (mtime, mtime))
                    # prepare for a new message
                    cur_path = os.path.join(dest_dir, str(nr) + ".eml")
                    cur_file = open(cur_path, "w")
                    date_from = email.utils.parsedate(get_date_from_fromline(line))
                    date_date = None
                    in_headers = True
                    bof = False
                    nr += 1
                elif (in_headers): in_headers = False
            if (prefer_dateheader and in_headers and re_date.match(line) != None):
                date_date = email.utils.parsedate(get_date_from_dateline(line))
            # write the current line
            cur_file.write(line)
            # determine if we're at a newline
            if line.replace("\r\n", "\n") == "\n": previous_was_newline = True
            else: previous_was_newline = False
        # treat the last remaining message
        if (bof == False): # should only be True here if the file was empty
            if (cur_file.closed == False): cur_file.close()
            if (change_mtime):
                if (prefer_dateheader and date_date != None):
                    mtime = time.mktime(date_date)
                else: mtime = time.mktime(date_from)
                os.utime(cur_path, (mtime, mtime))
        return True
    def is_mbox_file(f):
        """
        Determine if the given file object is an mbox file. This simply checks
        if the file's first line has the string "From " at its start.
        """
        if (re_from.match(f.readline()) != None):
            return True
        return False
    def recurse_mbox(mbox_start_dir, tmp_dir, change_mtime = True,
                     prefer_dateheader = True):
        for root, dirs, files in os.walk(mbox_start_dir, True, None, False):
            for f_str in files:
                # skip Thunderbird's .msf index files
                if (not f_str.endswith(".msf")):
                    f = open(os.path.join(root, f_str), "r")
                    if (is_mbox_file(f)):
                        print "Treating mbox " + os.path.join(root, f_str)
                        rel_path = os.path.relpath(root, mbox_start_dir)
                        if (rel_path == "."): rel_path = ""
                        dest_dir = os.path.join(tmp_dir, rel_path, f_str).replace(".sbd", "")
                        if (os.path.isdir(dest_dir) == False): os.makedirs(dest_dir)
                        mbox2eml(f, dest_dir, change_mtime, prefer_dateheader)
    def create_tgz(dir, tgz_path):
        tgz = tarfile.open(tgz_path, "w:gz")
        for root, dirnames, filenames in os.walk(dir):
            for f in filenames:
                filepath = os.path.join(root, f)
                relpath = os.path.relpath(filepath, dir)
                print "Adding to targz:", filepath
                tgz.add(filepath, relpath)
        # close the archive so the gzip stream is flushed to disk
        tgz.close()
    def initialize_options():
            ap = argparse.ArgumentParser()
            ap.add_argument("-s", "--mbox-start-dir", required = True,
                           dest = "mbox_start_dir", help = "A source directory "
                            + "that contains a tree of mbox files.")
            ap.add_argument("-d", "--destination", required = True,
                            dest = "dest_dir", help = "Directory where the "
                            + "eml-file tree should be created.")
            ap.add_argument("-z", "--tgz", dest = "tgz", help = "Create a gzipped "
                            + "tar archive that contains the directory tree "
                            + "specified with -d/--destination at the given "
                            + "path. This is optional. Note that this might be "
                            + "faster using an optimized commandline tool like "
                            + "gnu tar.")
            ap.add_argument("-M", "--dont-change-mtimes", dest = "dont_change_mtimes",
                            action = "store_true", default = False, help = "By "
                            + "default, the mtime and atime of the created eml "
                            + "files will be changed to a date found in each email "
                            + "header. This option disables changing mtime/atime.")
            ap.add_argument("-I", "--ignore-date-header", default = False,
                            action = "store_true", dest = "ignore_date_header",
                            help = "By default, and if -M/--dont-change-mtimes "
                            + "wasn't specified, in order to change the "
                            + "atime+mtime of each eml file, the program will look "
                            + "for a Date: line in the email headers. If no Date: "
                            + "line was found, the date from the \"From \" line at "
                            + "the beginning of the message will be used. This "
                            + "option disables looking for the Date: line so that "
                            + "the \"From \" line will always be used.")
            return ap.parse_args()
    if __name__ == '__main__':
        opts = initialize_options()
        # check args
        if (not os.path.isdir(opts.mbox_start_dir)):
            raise ArgumentTypeError("The given mbox start dir is not a directory")
        if (not os.path.isdir(opts.dest_dir)):
            raise ArgumentTypeError("The given destination dir is not a directory")
        # walk the mboxes and create the destination eml file structure
        recurse_mbox(opts.mbox_start_dir, opts.dest_dir,
                     (opts.dont_change_mtimes == False),
                     (opts.ignore_date_header == False))
        # optionally create a gzipped tar archive
        if (opts.tgz): create_tgz(opts.dest_dir, opts.tgz)
    Save the script above to a file somewhere, for example under /home/user/migrate/

    To get command-line help, execute the script with the -h option like this:
    python -h
    So, in order to migrate Thunderbird's local folders to a Zimbra Desktop installation, follow these steps:
    1. Make sure you read the script's command-line help. Ideally, also read the source code so you understand what the script does.
    2. Make sure you have Python 2 installed. There should be no other dependencies. (Tested with Python 2.7 on Arch Linux.)
    3. Copy the local folders from Thunderbird to the machine where you'll be executing the script. If you're running the script on the PC that holds the Thunderbird profile, you may skip this step.
    4. Create a destination directory, for example /home/user/migrate/dest. Make sure the user that will execute the script has write privileges in that directory.
    5. Launch the script. Example:
      python /home/user/migrate/ -s THUNDERBIRD_DIR -d /home/user/migrate/dest -z /home/user/migrate/importme.tgz
    6. The directory tree with .eml files will be created in the dest directory. After that, dest's contents will be tar/gzipped to importme.tgz.
    7. Open Zimbra Desktop, go to Preferences -> Local Folders -> Import / Export and import the tgz file.
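Before importing in step 7, it can help to check what actually ended up in the archive. The following is a small sketch for that, assuming Python 3 and the standard `tarfile` module; `list_eml_members` is a hypothetical helper name and not part of the script.

```python
import tarfile


def list_eml_members(tgz_path):
    """Return the sorted names of all .eml files inside a gzipped tar archive.

    Hypothetical helper for illustration only.
    """
    with tarfile.open(tgz_path, "r:gz") as tgz:
        return sorted(
            member.name
            for member in tgz.getmembers()
            if member.isfile() and member.name.endswith(".eml")
        )
```

For example, `len(list_eml_members("/home/user/migrate/importme.tgz"))` gives the number of messages in the archive, which can be compared against the message counts Thunderbird shows for the local folders.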

    Any improvements to the script are welcome. For example, I didn't implement any error handling or logging. There could be more comments and so on.
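As one possible starting point for error handling and logging, here is a sketch of a more defensive date lookup. It reuses the script's "From " line pattern and `email.utils.parsedate`, but logs a warning and falls back instead of failing when a date can't be parsed. It assumes Python 3; `pick_mtime` and the logger name are hypothetical and not part of the script.

```python
import email.utils
import logging
import re
import time

log = logging.getLogger("mbox2eml_sketch")  # hypothetical logger name

# Same pattern as in the script: strips "From <sender> " and leaves the date.
re_strip_from = re.compile(r"From ([^@]+@)*(\(.+\)|\S+) *")


def pick_mtime(from_line, date_header=None, prefer_dateheader=True):
    """Return a timestamp usable with os.utime(), or None if no date parses.

    The Date: header value is tried first (if preferred and present),
    then the date taken from the mbox "From " separator line.
    """
    candidates = []
    if prefer_dateheader and date_header:
        candidates.append(date_header)
    candidates.append(re_strip_from.sub("", from_line))
    for text in candidates:
        parsed = email.utils.parsedate(text)  # None when unparseable
        if parsed is None:
            log.warning("Could not parse date %r", text.strip())
            continue
        try:
            # like the script, this interprets the parsed time as local time
            return time.mktime(parsed)
        except (OverflowError, ValueError):
            log.warning("Date out of range: %r", text.strip())
    return None
```

A caller would skip `os.utime()` when `pick_mtime()` returns `None`, so a single message with a broken date doesn't abort the whole run.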

    EDIT: I noticed that Zimbra Desktop doesn't seem to like tgz files over 2 GB in size on Windows (can someone confirm this?), so I had to split the local folders for some big accounts into several archives.
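If you need to split, one way to plan it is to group folders into batches by size. The sketch below is only an illustration, assuming Python 3: `dir_size` and `plan_batches` are hypothetical helpers, and they work on uncompressed sizes, which is only a rough stand-in for the size of the resulting tgz.

```python
import os


def dir_size(path):
    """Return the total size in bytes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def plan_batches(sizes, limit):
    """Group (name, size) pairs, in order, into batches with a sum <= limit.

    An item that is larger than limit on its own still gets a batch
    to itself, so it has to be split further by hand.
    """
    batches, current, total = [], [], 0
    for name, size in sizes:
        if current and total + size > limit:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches
```

For example, `plan_batches([(d, dir_size(d)) for d in folders], 2 * 1024 ** 3)` returns lists of folders, each of which could then be packed into its own tgz file.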

    I hope this can be useful for someone.
    Last edited by fbongartz; 11-04-2011 at 12:53 AM. Reason: 2 GB limit for tgz files under windows?
