Hi all,

we are facing random stopping of some Zimbra services. We can't figure out what's triggering this services shutdown.

[root@blackbox ~]# sudo -u zimbra zmcontrol status
Host blackbox.open1.it
ldap Running
logger Running
mailbox Stopped
tomcat is not running
mta Stopped
zmsaslauthdctl is not running
snmp Running
spell Running
[root@blackbox ~]#



HELP! Due to the nature of these kind of problems, we can't reproduce them systematically to get 'em fixed. That's why I'm bothering you guys, we are kinda in a hurry.... :-(

Marcello Golfieri

P.S. I post our last logs:


/var/log/messages [extract] :

[...]
May 4 04:40:05 blackbox zimbramon[23811]: 23811:info: 2007-05-04 04:40:01, STATUS: blackbox.open1.it: ldap: Running
May 4 04:40:05 blackbox zimbramon[23811]: 23811:info: 2007-05-04 04:40:01, STATUS: blackbox.open1.it: logger: Running
May 4 04:40:05 blackbox zimbramon[23811]: 23811:info: 2007-05-04 04:40:01, STATUS: blackbox.open1.it: mailbox: Running
May 4 04:40:05 blackbox zimbramon[23811]: 23811:info: 2007-05-04 04:40:01, STATUS: blackbox.open1.it: mta: Running
May 4 04:40:05 blackbox zimbramon[23811]: 23811:info: 2007-05-04 04:40:01, STATUS: blackbox.open1.it: snmp: Running
May 4 04:40:05 blackbox zimbramon[23811]: 23811:info: 2007-05-04 04:40:01, STATUS: blackbox.open1.it: spell: Running
May 4 04:42:03 blackbox zimbramon[24448]: 24448:info: 2007-05-04 04:42:01, STATUS: blackbox.open1.it: ldap: Running
May 4 04:42:03 blackbox zimbramon[24448]: 24448:info: 2007-05-04 04:42:01, STATUS: blackbox.open1.it: logger: Running
May 4 04:42:03 blackbox zimbramon[24448]: 24448:info: 2007-05-04 04:42:01, STATUS: blackbox.open1.it: mailbox: Running
May 4 04:42:03 blackbox zimbramon[24448]: 24448:info: 2007-05-04 04:42:01, STATUS: blackbox.open1.it: mta: Running
May 4 04:42:03 blackbox zimbramon[24448]: 24448:info: 2007-05-04 04:42:01, STATUS: blackbox.open1.it: snmp: Running
May 4 04:42:03 blackbox zimbramon[24448]: 24448:info: 2007-05-04 04:42:01, STATUS: blackbox.open1.it: spell: Running
May 4 04:44:18 blackbox zimbramon[24943]: 24943:info: 2007-05-04 04:44:01, STATUS: : Cannot: determine
May 4 04:44:20 blackbox saslauthd[20142]: server_exit : master exited: 20142
May 4 04:45:57 blackbox ntpd[3462]: sendto(193.6.222.20): Invalid argument
May 4 04:46:04 blackbox zimbramon[25549]: 25549:info: 2007-05-04 04:46:01, STATUS: blackbox.open1.it: ldap: Running
May 4 04:46:04 blackbox zimbramon[25549]: 25549:info: 2007-05-04 04:46:01, STATUS: blackbox.open1.it: logger: Running
May 4 04:46:04 blackbox zimbramon[25549]: 25549:info: 2007-05-04 04:46:01, STATUS: blackbox.open1.it: mailbox: Stopped
May 4 04:46:04 blackbox zimbramon[25549]: 25549:info: 2007-05-04 04:46:01, STATUS: blackbox.open1.it: mta: Stopped
May 4 04:46:04 blackbox zimbramon[25549]: 25549:info: 2007-05-04 04:46:01, STATUS: blackbox.open1.it: snmp: Running
May 4 04:46:04 blackbox zimbramon[25549]: 25549:info: 2007-05-04 04:46:01, STATUS: blackbox.open1.it: spell: Running
May 4 04:48:00 blackbox ntpd[3462]: sendto(200.73.83.34): Invalid argument
May 4 04:48:03 blackbox zimbramon[26063]: 26063:info: 2007-05-04 04:48:01, STATUS: blackbox.open1.it: ldap: Running
May 4 04:48:03 blackbox zimbramon[26063]: 26063:info: 2007-05-04 04:48:01, STATUS: blackbox.open1.it: logger: Running
May 4 04:48:03 blackbox zimbramon[26063]: 26063:info: 2007-05-04 04:48:01, STATUS: blackbox.open1.it: mailbox: Stopped
May 4 04:48:03 blackbox zimbramon[26063]: 26063:info: 2007-05-04 04:48:01, STATUS: blackbox.open1.it: mta: Stopped
May 4 04:48:03 blackbox zimbramon[26063]: 26063:info: 2007-05-04 04:48:01, STATUS: blackbox.open1.it: snmp: Running
May 4 04:48:03 blackbox zimbramon[26063]: 26063:info: 2007-05-04 04:48:01, STATUS: blackbox.open1.it: spell: Running
May 4 04:50:01 blackbox zimbramon[26439]: 26439:info: 2007-05-04 04:50:01, QUEU
E: 0 0
[...]


################################################## ##

/opt/zimbra/log/mailbox.log:



2007-05-04 01:10:05,509 INFO [LmtpServer-42] [] LmtpHandler - [192.168.100.101] connected
2007-05-04 01:10:05,640 INFO [LmtpServer-42] [name=admin@blackbox.open1.it;] mailbox - Added message id=663 digest=gMe,AaipTL4n5+N7PSjoOgTW2
7E= mailbox=2 rcpt=admin@blackbox.open1.it
2007-05-04 01:10:59,390 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:11:45,614 INFO [LmtpServer-42] [] LmtpHandler - [192.168.100.101] quit from client
2007-05-04 01:11:45,615 INFO [LmtpServer-42] [] ProtocolHandler - Handler exiting normally
2007-05-04 01:11:59,385 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:12:59,380 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:13:59,374 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:14:59,368 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:15:59,363 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:16:59,358 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:17:59,353 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:18:59,348 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:19:59,342 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 01:20:59,345 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=1, after=0 (8ms)
2007-05-04 01:30:02,491 INFO [http-7071-Processor49] [ip=127.0.0.1;name=zimbra;] mailbox - Starting table maintenance. MinRows=10000, MaxRo
ws=1000000, GrowthFactor=10
2007-05-04 01:40:49,877 INFO [Timer-0] [] SessionCache - Removed 2 idle sessions (AdminSession: 2). 1 active sessions remain.
2007-05-04 03:10:49,976 INFO [Timer-0] [] SessionCache - Removed 2 idle sessions (AdminSession: 2). 1 active sessions remain.
2007-05-04 04:02:05,386 INFO [LmtpServer-43] [] LmtpHandler - [192.168.100.101] connected
2007-05-04 04:02:05,923 INFO [LmtpServer-43] [name=admin@blackbox.open1.it;] mailbox - Added message id=664 digest=pwyKH,HUwALlrZUHhEovFi9kh
9M= mailbox=2 rcpt=admin@blackbox.open1.it
2007-05-04 04:02:58,483 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:03:45,896 INFO [LmtpServer-43] [] LmtpHandler - [192.168.100.101] quit from client
2007-05-04 04:03:45,897 INFO [LmtpServer-43] [] ProtocolHandler - Handler exiting normally
2007-05-04 04:03:58,478 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:04:58,482 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (1ms)
2007-05-04 04:05:58,475 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:06:58,470 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:07:58,464 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:08:58,459 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:09:58,454 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:10:58,448 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:11:58,442 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=0, after=1 (0ms)
2007-05-04 04:12:58,445 INFO [IndexWritersSweeper] [] MailboxIndex - open index writers sweep: before=1, closed=1, after=0 (9ms)
2007-05-04 04:44:23,950 INFO [SIGTERM handler] [] StatsImageServlet - Servlet StatsImageServlet shutting down
2007-05-04 04:44:23,951 INFO [SIGTERM handler] [] soap - Servlet AdminServlet shutting down
2007-05-04 04:44:23,951 INFO [SIGTERM handler] [] TcpServer/7025 - LmtpServer initiating shutdown
2007-05-04 04:44:23,951 INFO [LmtpServer] [] TcpServer/7025 - finished accept loop
2007-05-04 04:44:23,970 INFO [SIGTERM handler] [] TcpServer/7025 - LmtpServer shutting down idle thread pool
2007-05-04 04:44:23,971 INFO [SIGTERM handler] [] TcpServer/110 - Pop3Server initiating shutdown
2007-05-04 04:44:23,971 INFO [SIGTERM handler] [] TcpServer/110 - Pop3Server shutting down idle thread pool
2007-05-04 04:44:23,971 INFO [Pop3Server] [] TcpServer/110 - finished accept loop
2007-05-04 04:44:23,971 INFO [SIGTERM handler] [] TcpServer/995 - Pop3Server initiating shutdown
2007-05-04 04:44:23,971 INFO [SIGTERM handler] [] TcpServer/995 - Pop3Server shutting down idle thread pool
2007-05-04 04:44:23,971 INFO [Pop3Server] [] TcpServer/995 - finished accept loop
2007-05-04 04:44:23,971 INFO [SIGTERM handler] [] TcpServer/143 - ImapServer initiating shutdown
2007-05-04 04:44:23,971 INFO [SIGTERM handler] [] TcpServer/143 - ImapServer shutting down idle thread pool
2007-05-04 04:44:23,971 INFO [ImapServer] [] TcpServer/143 - finished accept loop
2007-05-04 04:44:23,971 INFO [SIGTERM handler] [] TcpServer/993 - ImapSSLServer initiating shutdown
2007-05-04 04:44:23,971 INFO [SIGTERM handler] [] TcpServer/993 - ImapSSLServer shutting down idle thread pool
2007-05-04 04:44:23,971 INFO [ImapSSLServer] [] TcpServer/993 - finished accept loop
2007-05-04 04:44:23,985 INFO [NOTIFY-7035-Server] [] misc - spurious wakeup [1] [no attachment] interest=ACCEPT ready= key=7a74db2c poolSize
=0 queueSize=0
2007-05-04 04:44:23,985 INFO [NOTIFY-7035-Server] [] misc - shutting down thread pool
2007-05-04 04:44:23,985 INFO [NOTIFY-7035-Server] [] misc - waiting for thread pool to shutdown
2007-05-04 04:44:23,985 INFO [NOTIFY-7035-Server] [] misc - done waiting for thread pool to shutdown
2007-05-04 04:44:23,985 INFO [NOTIFY-7035-Server] [] misc - closing all selection keys
2007-05-04 04:44:23,986 INFO [NOTIFY-7035-Server] [] misc - closed selector
2007-05-04 04:44:23,986 INFO [SIGTERM handler] [] session - shutdown: clearing SessionCache
2007-05-04 04:44:23,986 INFO [NOTIFY-7035-Server] [] misc - ended server loop
2007-05-04 04:44:23,986 INFO [IndexWritersSweeper] [] MailboxIndex - IndexWritersSweeper thread exiting
2007-05-04 04:44:23,986 INFO [SIGTERM handler] [] MailboxIndex - Flushing all open index writers
2007-05-04 04:44:24,047 INFO [SIGTERM handler] [] RedoLogManager - Logged: 486 items, 3.835ms/item
2007-05-04 04:44:24,048 INFO [IncomingDirectorySweeper] [] FileBlobStore - IncomingDirectorySweeper thread exiting
2007-05-04 04:44:24,048 INFO [SIGTERM handler] [] extensions - Destroying extensions
2007-05-04 04:44:24,084 INFO [SIGTERM handler] [] extensions - Destroyed extension clamscanner: com.zimbra.clam.ClamScanner@com.zimbra.cs.ex
tension.ZimbraExtensionClassLoader@7f629e71
2007-05-04 04:44:24,085 INFO [SIGTERM handler] [] ContentServlet - Servlet ContentServlet shutting down
2007-05-04 04:44:24,085 INFO [SIGTERM handler] [] PublicICalServlet - Servlet PublicICalServlet shutting down
2007-05-04 04:44:24,086 INFO [SIGTERM handler] [] soap - Servlet SoapServlet shutting down
2007-05-04 04:44:24,086 INFO [SIGTERM handler] [] account - Servlet PreAuthServlet shutting down
2007-05-04 04:44:24,087 INFO [SIGTERM handler] [] mailbox - Servlet UserServlet shutting down
2007-05-04 04:44:24,087 INFO [SIGTERM handler] [] FileUploadServlet - Servlet FileUploadServlet shutting down

################################################## ##


/opt/zimbra/tomcat/logs/catalina.out [tail]:


[...]
"Finalizer" daemon prio=1 tid=0x0000002ac75d5640 nid=0x4e8e in Object.wait() [0x0000000040656000..0x0000000040656bb0]
at java.lang.Object.wait(Native Method)
- waiting on <0x0000002aa1da1d48> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue .java:116)
- locked <0x0000002aa1da1d48> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue .java:132)
at java.lang.ref.Finalizer$FinalizerThread.run(Finali zer.java:159)

"Reference Handler" daemon prio=1 tid=0x0000002ac75d4e20 nid=0x4e8d in Object.wait() [0x0000000040555000..0x0000000040555e30]
at java.lang.Object.wait(Native Method)
- waiting on <0x0000002aa16d04c8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:474)
at java.lang.ref.Reference$ReferenceHandler.run(Refer ence.java:116)
- locked <0x0000002aa16d04c8> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x00000000401160a0 nid=0x4e72 waiting on condition [0x0000007fbfffd000..0x0000007fbfffd690]
at java.lang.Thread.sleep(Native Method)
at com.zimbra.cs.launcher.TomcatLauncher.start(Tomcat Launcher.java:53)
at com.zimbra.cs.launcher.TomcatLauncher.main(TomcatL auncher.java:74)

"VM Thread" prio=1 tid=0x0000002ac75d0930 nid=0x4e8c runnable

"GC task thread#0 (ParallelGC)" prio=1 tid=0x0000000040133c80 nid=0x4e85 runnable

"GC task thread#1 (ParallelGC)" prio=1 tid=0x0000000040134b00 nid=0x4e86 runnable

"VM Periodic Task Thread" prio=1 tid=0x0000002ac75f0e70 nid=0x4e96 waiting on condition

[tomcat launcher] got signal SIGTERM invoking stop
4-mag-2007 4.44.22 org.apache.coyote.http11.Http11BaseProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-80
4-mag-2007 4.44.22 org.apache.coyote.http11.Http11BaseProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-7071
4-mag-2007 4.44.23 org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina
4-mag-2007 4.44.25 org.apache.coyote.http11.Http11BaseProtocol destroy
INFO: Stopping Coyote HTTP/1.1 on http-80
4-mag-2007 4.44.25 org.apache.coyote.http11.Http11BaseProtocol destroy
INFO: Stopping Coyote HTTP/1.1 on http-7071