Thread: Zimbra .pids / service monitoring

  1. #11
    Join Date
    May 2007
    Location
    London, UK
    Posts
    26
    Rep Power
    8

    Quote: Originally posted by padraig
    This looks good, but there is a danger that it could mask an underlying problem
    if processes die regularly.

    The posted config doesn't work, for a bunch of reasons.

    I'm working on a much better version right now. It should be finished today; I'll test it for a week and then post back here if it works.

  2. #12
    Join Date
    Feb 2008
    Posts
    3
    Rep Power
    7

    Anyone get this working successfully?
    This thread is a little old, but I'm hoping somebody got this to work.

    Quote: Originally posted by Leesbian
    The posted config doesn't work, for a bunch of reasons.

    I'm working on a much better version right now. It should be finished today; I'll test it for a week and then post back here if it works.

  3. #13
    Join Date
    Oct 2005
    Location
    Thatcher, AZ
    Posts
    5,606
    Rep Power
    21

    There is a very fundamental issue with this workflow that needs to be considered:

    If a service stops, it stops for a reason. This workflow does nothing to address that problem.

    This means that if there is a larger issue, such as an unhandled exception, it's only a matter of time before the service goes down again. Since this idea would automatically restart the service, you may never know that you hit an unhandled exception. It also might make things worse.

    Zimbra has great handlers. We have our own watchdog proc for things like mta, clam, and java. If those die, it tries to restart them. If there is a condition preventing the restart, it won't restart them.

    The moral of the story is that if the server goes down, you really should figure out why, as opposed to just restarting the service.

    I do think this is a good idea, which is why I'm saying the problem is with the workflow itself.

    There's a high availability/failover script floating around. You might want to look at that.

  4. #14
    Join Date
    Feb 2008
    Posts
    3
    Rep Power
    7

    Does the watchdog process send an email to the admin if a process dies and it has to restart it, or can't restart it? Is there an option to set something like that up? I realize that if a service does die there could be a bigger underlying issue, but I would like an alert telling me it has died and could or couldn't be restarted, rather than finding out from all my customers calling and complaining ;-)

    I was just trying to be proactive about being alerted to the issue first if something were to happen.

    Thanks for the input.

  5. #15
    Join Date
    Oct 2005
    Location
    Thatcher, AZ
    Posts
    5,606
    Rep Power
    21

    Well, it wouldn't be able to send an e-mail because the server is down, thus SMTP is down. If e-mail's down, you probably won't get the message anyway.

    What I would do is have a script that monitors the services. If the services go down, it could send an HTTP POST to your "support server" or something similar. If you're using Windows NT, you could whip up a script that, when that POST is received, uses the Windows Messenger service (not MSN Messenger, but the messenger protocol built into Windows NT machines) to send your machine an alert.

    Just some thoughts.

    Definitely possible.
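
    Since monit is what this thread is about, here is a rough sketch of the "HTTP POST to a support server" idea expressed as a monit rule rather than a standalone script. Everything specific in it is an assumption for illustration: the URL http://support.example.com/alert is a made-up endpoint, the curl path may differ on your system, and the receiving side still has to be written separately.

    Code:
    check process Zimbra.Postfix
      with pidfile /opt/zimbra/data/postfix/spool/pid/master.pid
      # If SMTP stops answering, notify an external host over HTTP instead of by mail.
      # Hypothetical endpoint and form field; adjust both to whatever your support server expects.
      if failed port 25 protocol smtp
        then exec "/usr/bin/curl -s -X POST -d service=postfix http://support.example.com/alert"

    Monit runs exec commands with a minimal environment, so the full path to curl is spelled out. The check deliberately has no start/stop lines, so it only reports and never restarts anything.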

  6. #16
    Join Date
    Oct 2005
    Location
    Thatcher, AZ
    Posts
    5,606
    Rep Power
    21

    Correction:
    SMTP may not be down, but another service could be. In any case, since this is a disaster-related script, you should plan for the event that SMTP is unavailable.

  7. #17
    Join Date
    Mar 2007
    Location
    Small village in the center of Italy
    Posts
    350
    Rep Power
    8

    Multistore is worth monitoring with monit

    Everything you say, John, is right, but:
    I have a multistore architecture with store servers connected over a WAN to a central hub.
    One store dies when its WAN connection to the master goes away; at the moment I don't know of any way to recover it without using monit. If you can suggest something different, you are welcome to!
    Any advice would be appreciated.

  8. #18
    Join Date
    Mar 2008
    Posts
    2
    Rep Power
    7

    A working monitor...

    Hey there... no one's done anything with this in a while, but I figured I would post my working monitor script. The one thing to note is that the purpose of the script is NOT to restart a failed process, but simply to give the administrator a heads-up that something is about to go bad (e.g. process hung, running out of resources, process died, etc.).

    Code:
    check system myhost.local
      if loadavg (1min) > 4 then alert
      if loadavg (5min) > 2 then alert
      if memory usage > 85% then alert
      if cpu usage (user) > 70% then alert
      if cpu usage (system) > 50% then alert
      if cpu usage (wait) > 20% then alert
    
    check process Zimbra.Apache
      with pidfile "/opt/zimbra/log/httpd.pid"
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      if failed port 80 protocol http then alert
      group zimbra
    
    check process Zimbra.Logwatch
      with pidfile "/opt/zimbra/log/logswatch.pid"
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      group zimbra
    
    check process Zimbra.MySQL
      with pidfile "/opt/zimbra/db/mysql.pid"
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      if failed port 7306 protocol mysql then alert
      group zimbra
    
    check process Zimbra.MySQL_Logger
      with pidfile "/opt/zimbra/logger/db/mysql.pid"
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      depends on Zimbra.MySQL
      group zimbra
    
    check process Zimbra.MTA_Config
      with pidfile "/opt/zimbra/log/zmmtaconfig.pid"
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      group zimbra
    
    check process Zimbra.Mailbox_Java
      with pidfile "/opt/zimbra/log/zmmailboxd_java.pid"
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      if failed port 143 protocol imap then alert
      group zimbra
    
    check process Zimbra.Mailbox_Control
      with pidfile "/opt/zimbra/log/zmmailboxd_manager.pid"
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      group zimbra
    
    check process Zimbra.ClamAV
      with pidfile /opt/zimbra/log/clamd.pid
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      group zimbra
    
    check process Zimbra.Cyrus_SASL
      with pidfile /opt/zimbra/cyrus-sasl/state/saslauthd.pid
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      group zimbra
    
    check process Zimbra.Postfix
      with pidfile /opt/zimbra/data/postfix/spool/pid/master.pid
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      if failed port 25 protocol smtp then alert
      group zimbra
    
    check process Zimbra.LDAP
      with pidfile /opt/zimbra/openldap/var/run/slapd.pid
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      if failed host myhost.local port 389 protocol ldap3 then alert
      group zimbra
    
    check process Zimbra.Amavis
      with pidfile /opt/zimbra/log/amavisd.pid
      if children > 255 for 5 cycles then alert
      if cpu usage > 95% for 3 cycles then alert
      group zimbra
    So, think of this as an early warning system. Monit can easily be set to use a different SMTP server than your Zimbra server, so it gets around that problem as well.
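
    For anyone wondering what pointing monit at a different SMTP server looks like, a minimal sketch of the global settings follows. The host name mail.example.com and both addresses are placeholders, not values from my setup, so substitute your own relay and recipient.

    Code:
    # Send alerts through a mail relay that is NOT the Zimbra box being monitored.
    # mail.example.com is a placeholder host name.
    set mailserver mail.example.com port 25

    # Sender address for alert messages (placeholder).
    set mail-format { from: monit@myhost.local }

    # Who receives the alerts (placeholder address).
    set alert admin@example.com

    These lines go at the top level of monitrc, outside any check block, so they apply to every "then alert" rule above.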

  9. #19
    Join Date
    Feb 2007
    Posts
    52
    Rep Power
    8

    To each his/her own..

    Quote: Originally posted by jholder
    There is a very fundamental issue with this workflow that needs to be considered:

    If a service stops, it stops for a reason. This workflow does nothing to address that problem.

    This means that if there is a larger issue, such as an unhandled exception, it's only a matter of time before the service goes down again. Since this idea would automatically restart the service, you may never know that you hit an unhandled exception. It also might make things worse.

    Zimbra has great handlers. We have our own watchdog proc for things like mta, clam, and java. If those die, it tries to restart them. If there is a condition preventing the restart, it won't restart them.

    The moral of the story is that if the server goes down, you really should figure out why, as opposed to just restarting the service.

    I do think this is a good idea, which is why I'm saying the problem is with the workflow itself.

    There's a high availability/failover script floating around. You might want to look at that.

    Everyone's requirements are different, so your mileage will vary. I've had processes die, and they could die for many reasons, sometimes even under load from a spam attack.

    Depending on your environment, you may not want the service left down. Say it fails at 4am and you get a wake-up call at 8am from irate users: your investigation time would be limited, and you would have to restart the service anyway.

    So the real moral of the story is: know what you need before you implement. Leaving a service down is great in theory, while we take our time exchanging pleasantries with Zimbra tech support to get the issue resolved. But that's not always a quick thing.

    As someone mentioned later, monit can be configured to send alerts via another SMTP server, so depending on your alert config, you will be notified of a down situation.

    You can also comment out the start/stop lines and just have the alerts sent out, pretty flexible.
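
    As a rough illustration of that last point, a check with its start/stop lines commented out might look like the sketch below. The two script paths are deliberately placeholders, since the right start and stop commands depend on your Zimbra install.

    Code:
    check process Zimbra.Postfix
      with pidfile /opt/zimbra/data/postfix/spool/pid/master.pid
      # Commented out: with no start/stop program, monit only alerts.
      # start program = "/path/to/your/start-script"
      # stop program  = "/path/to/your/stop-script"
      if failed port 25 protocol smtp then alert
      group zimbra

    Uncommenting the two lines gives monit the commands it needs for a restart action, but it will only restart the service if a rule says "then restart" instead of "then alert".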

  10. #20
    Join Date
    Mar 2008
    Posts
    2
    Rep Power
    7

    Absolutely

    Oh, I completely agree... That's the whole point of the monitrc posting that I put up. All it does is let the admin know that either (a) a service has gone down, or (b) the server appears to be struggling with something. Either way, they should look into it. The monit script I posted doesn't even have start/stop lines, and that's completely intentional.

    The idea behind having the alerts for child processes, memory utilization, load, etc. is that the administrator can get in and, worst case, warn the users that the system is going down. In my experience, the anger level of a client is generally inversely proportional to the amount of warning they had. E.g. "You're getting a lot of spam, it looks like it's about to hang the system" is often appreciated more than "The reason you haven't received email in the last 4 hours is because spam clogged the system".

    ... god I hate spam.
