Thread: Centos 4.5, RHCS, and Zimbra

  1. #1
    Join Date: Apr 2007 | Location: WV | Posts: 49 | Rep Power: 8

    Centos 4.5, RHCS, and Zimbra

    I'm working on installing Red Hat Cluster Suite to support a 2-node Zimbra cluster in active/passive mode. I am following the Zimbra document for installing a single-node cluster using the zimbra-cluster package.

    At this point, I have all the related packages installed (rgmanager, system-config-cluster, ccsd, magma, magma-plugins, cman, cman-kernel-smp, dlm, dlm-kernel-smp, fence, gulm, iddev) from the csgfs repository. The ccsd service seems to start; however, cman and rgmanager fail outright:

    Jul 20 09:50:15 wsl-mx1 ccsd: start succeeded
    Jul 20 09:50:20 wsl-mx1 cman: FATAL: Module cman not found. failed
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Cluster is not quorate. Refusing connection.
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Error while processing connect: Connection refused
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Invalid descriptor specified (-111).
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Someone may be attempting something evil.
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Error while processing get: Invalid request descriptor
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Invalid descriptor specified (-111).
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Someone may be attempting something evil.
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Error while processing get: Invalid request descriptor
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Invalid descriptor specified (-21).
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Someone may be attempting something evil.
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Error while processing disconnect: Invalid request descriptor
    Jul 20 09:50:26 wsl-mx1 clurgmgrd[5766]: Resource Group Manager Starting
    Jul 20 09:50:26 wsl-mx1 clurgmgrd[5766]: Loading Service Data
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Cluster is not quorate. Refusing connection.
    Jul 20 09:50:26 wsl-mx1 ccsd[5387]: Error while processing connect: Connection refused
    Jul 20 09:50:26 wsl-mx1 clurgmgrd[5766]: #5: Couldn't connect to ccsd!
    Jul 20 09:50:26 wsl-mx1 clurgmgrd[5766]: #8: Couldn't initialize services
    Jul 20 09:50:26 wsl-mx1 rgmanager: clurgmgrd startup failed
    Jul 20 09:50:39 wsl-mx1 ccsd[5387]: Unable to connect to cluster infrastructure after 150 seconds.

    Since cman complains about a missing module, I tried to verify and found it right away:
    [root@wsl-mx1]# ls -la /lib/modules/2.6.9-55.ELsmp/kernel/cluster/
    total 700
    drwxr-xr-x 2 root root 4096 Jul 20 09:19 .
    drwxr-xr-x 10 root root 4096 Jul 20 09:19 ..
    -rwxr-xr-x 1 root root 159744 Jun 17 18:32 cman.ko
    -rwxr-xr-x 1 root root 185592 Jun 17 18:32 cman.symvers
    -rwxr-xr-x 1 root root 150424 Jun 17 19:14 dlm.ko
    -rwxr-xr-x 1 root root 185884 Jun 17 19:14 dlm.symvers

    If I load the modules manually, they load fine, but cman and rgmanager still fail to start:

    [root@wsl-mx1]# insmod /lib/modules/2.6.9-55.ELsmp/kernel/cluster/cman.ko
    [root@wsl-mx1]# insmod /lib/modules/2.6.9-55.ELsmp/kernel/cluster/dlm.ko
    [root@wsl-mx1]# lsmod | egrep -i "dlm|cman"
    dlm 117604 0
    cman 125664 1 dlm

    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Cluster is not quorate. Refusing connection.
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Error while processing connect: Connection refused
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Invalid descriptor specified (-111).
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Someone may be attempting something evil.
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Error while processing get: Invalid request descriptor
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Invalid descriptor specified (-111).
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Someone may be attempting something evil.
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Error while processing get: Invalid request descriptor
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Invalid descriptor specified (-21).
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Someone may be attempting something evil.
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Error while processing disconnect: Invalid request descriptor
    Jul 20 11:11:52 wsl-mx1 clurgmgrd[6362]: Resource Group Manager Starting
    Jul 20 11:11:52 wsl-mx1 clurgmgrd[6362]: Loading Service Data
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Cluster is not quorate. Refusing connection.
    Jul 20 11:11:52 wsl-mx1 ccsd[5387]: Error while processing connect: Connection refused
    Jul 20 11:11:52 wsl-mx1 clurgmgrd[6362]: #5: Couldn't connect to ccsd!
    Jul 20 11:11:52 wsl-mx1 clurgmgrd[6362]: #8: Couldn't initialize services
    Jul 20 11:11:52 wsl-mx1 rgmanager: clurgmgrd startup failed

    My guess is that something is amiss with the library linking, but I'm not sure where to begin looking.
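
A note on the symptom above: "FATAL: Module ... not found" is the message modprobe prints when it cannot resolve a module name through the kernel's modules.dep index, whereas insmod is handed an explicit path and never consults that index. That would be consistent with the .ko files being present but not indexed, in which case running `depmod -a` rebuilds the index. As a minimal sketch, assuming the usual "path/to/module.ko: dependencies" line layout of modules.dep, the hypothetical helper `module_in_dep` below checks whether a module has its own entry:

```shell
# Hypothetical helper (not part of RHCS): succeed if MODULE has its own
# entry in the given modules.dep file.
module_in_dep() {
    dep_file="$1"
    module="$2"
    # An entry starts the line and ends with ".ko:"; modules that only
    # appear after the colon are dependencies, not entries.
    grep -Eq "^([^:]*/)?${module}\.ko:" "$dep_file"
}

# Example use against the running kernel's index:
# module_in_dep "/lib/modules/$(uname -r)/modules.dep" cman \
#     || echo "cman is not indexed; try depmod -a"
```

If the check fails for cman while cman.ko exists on disk, the index is the first thing to rebuild before looking at library linking.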

    [UPDATE: This issue has been resolved]
    Last edited by briansrapier; 07-26-2007 at 08:25 AM. Reason: resolved

  2. #2
    Join Date: Apr 2007 | Location: WV | Posts: 49 | Rep Power: 8

    I wound up rebuilding and sticking with the 2.6.9-55.ELsmp kernel rather than the 2.6.9-55.0.2.ELsmp one. Everything installed fine using the single-node cluster install, but the cluster fails to start.

    At first it complained that it could not locate /opt/zimbra-cluster/bin/zmcluctl. I discovered that I needed to install the zimbra-cluster rpm manually. Next, it complained about OCF_RESKEY_service_name not being set, so I set it and attempted a manual restart, but got:

    standard in must be a tty

    If I wait a while and attempt to run it again, I get:

    This node is already running Zimbra service zimba.

    But there are no zimbra processes running.

    Zimbra Support hasn't been any help at all in getting this resolved. If anyone has experience resolving this or similar issues, I would appreciate your assistance.

  3. #3
    Join Date: Mar 2006 | Location: Beaucaire, France | Posts: 2,322 | Rep Power: 13

    I had the same problem with the "missing module" (on RHEL 4).

    It was a bad up2date: some of the modules were updated but not all of them (and it was correct on one of the two nodes while broken on the other).

    I ran up2date again and everything went OK.
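
One way to spot a partial update like this is to compare the installed package lists of the two nodes. A rough sketch, assuming each node's list was saved beforehand with `rpm -qa | LC_ALL=C sort > nodeN.txt` (the file names and the helper `pkg_diff` are made up for illustration):

```shell
# Hypothetical helper: print packages that appear in only one of two
# sorted package lists. Entries unique to the first file are printed
# flush left; entries unique to the second are indented with a tab,
# which is comm's own column convention.
pkg_diff() {
    LC_ALL=C comm -3 "$1" "$2"
}

# Example use:
# pkg_diff node1.txt node2.txt
```

Empty output means both nodes carry the same package versions; any line that does show up points at a package that was updated on one node only.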

  4. #4
    Join Date: Apr 2007 | Location: WV | Posts: 49 | Rep Power: 8

    The resolution was 2-fold. First, if you previously attempted a cluster install, running 'install -u' does not remove all of the pieces. Some you will have to remove by hand. Here are the steps I followed:

    1. Disable the cluster services

    # clusvcadm -d [CLUSTERSVCNAME]

    2. Save a copy of cluster.conf

    # cp /etc/cluster/cluster.conf /etc/cluster/cluster.conf.good

    3. Remove the data on the SAN/iSCSI shared device

    # mount [SAN] [MOUNTPOINT]
    # rm -rf [MOUNTPOINT]/*

    4. Erase the zimbra-cluster RPM

    # rpm -e zimbra-cluster (very important!!)

    5. Uninstall ZCS

    # ./install.sh -u

    6. Remove the Zimbra-related directories

    # cd /opt
    # rm -rf zimbra-cluster
    # rm -rf zimbra

    Alternatively:
    # cd zimbra
    # rm -rf .* (to remove the .saveconfig, etc.)

    7. Clean up the passwd and group files, including the shadow files (zimbra, postfix and postdrop)

    # userdel zimbra
    # groupdel zimbra
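
The order of the steps above can be sketched as a dry-run script. This is only an illustration under assumptions: the function and variable names are made up, the shared device is assumed to be mounted already, and /etc/cluster/cluster.conf is assumed to be where the cluster configuration lives. By default it only prints what it would do.

```shell
#!/bin/sh
# Dry-run sketch of the cleanup order listed above. Nothing is executed
# unless DRY_RUN is set to 0.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: the standard RHCS location of the cluster configuration.
CLUSTER_CONF="${CLUSTER_CONF:-/etc/cluster/cluster.conf}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Arguments (all placeholders): cluster service name, mount point of the
# already-mounted shared device, directory holding the ZCS install.sh.
zimbra_cluster_cleanup() {
    svc="$1"; mountpoint="$2"; install_dir="$3"
    # Refuse to continue with an empty argument, so that step 3 can never
    # expand to "rm -rf /*".
    if [ -z "$svc" ] || [ -z "$mountpoint" ] || [ -z "$install_dir" ]; then
        echo "usage: zimbra_cluster_cleanup SERVICE MOUNTPOINT INSTALL_DIR" >&2
        return 1
    fi
    run clusvcadm -d "$svc"                      # 1. disable the service
    run cp "$CLUSTER_CONF" "$CLUSTER_CONF.good"  # 2. keep a copy of the config
    run rm -rf "$mountpoint"/*                   # 3. wipe the shared device
    run rpm -e zimbra-cluster                    # 4. erase the cluster RPM
    if [ "$DRY_RUN" = "1" ]; then                # 5. uninstall ZCS
        echo "would run: (cd $install_dir && ./install.sh -u)"
    else
        (cd "$install_dir" && ./install.sh -u)
    fi
    run rm -rf /opt/zimbra-cluster /opt/zimbra   # 6. remove leftover directories
    run userdel zimbra                           # 7. remove the zimbra user
    run groupdel zimbra                          #    and group
}

# Example use (prints the plan only):
# zimbra_cluster_cleanup zimbra /mnt/san /root/zcs
```

Reviewing the printed plan before setting DRY_RUN=0 is the point of the sketch, since steps 3 and 6 are destructive.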

    Second, I found that CMAN does not like service names longer than 16 characters, but the zimbra-cluster install requires the service name to be the same as the cluster hostname. In my case I have 2 nodes, 'mx.domain.com' and 'mx2.domain.com'. My cluster name is 'mx.domain.com'. I used 'mx.domain.com' for all the service-name-related items during the Zimbra installation, but during the last step, 'configure-cluster', I called it 'zimbra'.

    I'm not sure whether there was a conflict with the naming conventions, but it appears to be working. I haven't run any serious tests yet.
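
For anyone wanting to check a name before running the installer, here is a tiny sketch that takes the 16-character figure mentioned above at face value. The helper name `name_fits` is made up, and the limit is passed as a parameter in case the real one differs:

```shell
# Hypothetical check: succeed only if NAME is at most MAX characters long
# (MAX defaults to 16, the figure reported in this post).
name_fits() {
    name="$1"
    max="${2:-16}"
    [ "${#name}" -le "$max" ]
}

# Example use:
# name_fits "mx.domain.com" || echo "name too long for CMAN"
```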

  5. #5
    Join Date: Mar 2006 | Location: Beaucaire, France | Posts: 2,322 | Rep Power: 13

    Originally posted by briansrapier:
    First, if you previously attempted a cluster install, running 'install -u' does not remove all of the pieces.
    I should have mentioned that, sorry.
    Bug 17209 - ./install.sh -u does not delete cluster RPM
