Zimbra 7.1.1 shutting down every couple hours
This might be the same type of problem reported by other users of 7.1.1, but I'm making a new post because I think my symptoms are different and I am attempting an analysis of what's going on.
This is what seems to be happening:
1. A slow DB response to zmconfigd's queries for configuration variables causes the query to time out.
2. zmconfigd applies some built-in default values to the variables it queries which are different from the values it normally gets. It also detects that all services have been disabled.
(e.g., zimbraMailSSLPort changes from 443 to 0, zimbraImapSSLBindPort changes from 993 to 7993)
3. The services are scheduled to stop because they are incorrectly detected as disabled by zmconfigd's timed-out query.
4. The configuration queries come back successfully. However, zmconfigd has remembered the bogus configuration values, so it treats the correct values as a configuration change.
5. zmconfigd writes the 'new' configuration values to config files.
6. The rewrite of the configuration files causes zmconfigd to trigger a service restart.
7. When the LDAP server (mailboxd) goes down, the configuration queries return bogus values again, which leads to the configuration files being written again and another restart of services.
8. Repeat steps 3-6
9. Eventually it gets an "exit code 1" from "bin/postfix stop norewrite", so it loops continuously trying to stop postfix. Note that postfix is actually already stopped; perhaps the exit code of 1 indicates that no work was performed.
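
To make the suspected sequence in steps 1-6 easier to follow, here is a small Python model of it. This is only a sketch of my theory, not zmconfigd's actual code: the function names (fetch_config, run_cycles) and the event labels are made up for illustration, and the only values taken from my system are the two port numbers mentioned above.

```python
# Hypothetical model of the suspected failure mode; NOT zmconfigd's real code.
# Only the port values come from the observations in this post.
DEFAULTS = {"zimbraMailSSLPort": 0, "zimbraImapSSLBindPort": 7993}
REAL = {"zimbraMailSSLPort": 443, "zimbraImapSSLBindPort": 993}


def fetch_config(timed_out):
    """Return the real values, or the built-in defaults if the query timed out."""
    return dict(DEFAULTS) if timed_out else dict(REAL)


def run_cycles(timeouts):
    """Replay a sequence of poll results and record what each poll would trigger."""
    previous = dict(REAL)  # last configuration the daemon remembers
    events = []
    for timed_out in timeouts:
        current = fetch_config(timed_out)
        changed = sorted(k for k in current if current[k] != previous[k])
        if changed:
            # Any difference is treated as a genuine change: rewrite and restart.
            events.append(("rewrite+restart", changed))
        else:
            events.append(("no-op", []))
        previous = current  # the bogus values are remembered too
    return events


# One good poll, one timed-out poll, then a good poll again.
events = run_cycles([False, True, False])
for event in events:
    print(event)
```

In this model a single timed-out poll produces two rewrite/restart events: one when the defaults replace the real values, and a second when the real values come back. If the theory is right, not remembering values obtained from a timed-out query would avoid both.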