Installing Solaris Cluster 3.3u1

A cluster is a group of nodes that work together to provide high availability for applications. Compare the two pictures: the first is the general example from the Oracle/Sun documentation, and the second is my actual setup for this exercise.

The main hardware components are shown in the pictures above.

Note: The cluster must never split into separate partitions that are both active and accessing data at the same time, or data corruption occurs. If a split happens, the partition with the majority of votes gains quorum and stays active. In a cluster with 2 nodes/servers (each with one vote), one more vote is obviously needed, so the shared disk (LUN) is installed as a quorum device (with one vote). This protects us from the 2 major problems that can happen in a cluster: split brain (both partitions active at once) and amnesia (a node boots with stale configuration data). The Sun picture here shows the main software components that make up the cluster software environment.
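As a quick worked example for this two-node setup: each node contributes 1 vote and the quorum disk contributes 1 more, so the total is 3 votes and quorum requires a majority, i.e. 2. If the interconnect breaks, the node that wins the race to reserve the quorum disk holds 2 of the 3 votes and stays up, while the other node, with only 1 vote, panics and drops out of the cluster.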

Let's now do a real installation of Solaris Cluster 3.3u1. Both SunFire T2000 servers run Solaris 10 update 10. The StorEdge 6120 has a 50G slice/LUN configured and allows access from both servers. The FC switch has no zoning configured for this exercise. If you need information on how to configure FC switch zoning, see SAN zoning.
Both T2000s have slice 4 of the root disk mounted as the UFS /globaldevices file system. The scinstall command later renames /globaldevices to, for example, /global/.devices/node@1, where 1 is the node number the host receives when it becomes a global-cluster member.
Let's compare the two hosts in the table and see the corresponding actions before/during/after installation:

Node                 unixlab-2                  unixlab-3
/etc/hosts file      add unixlab-3 here         add unixlab-2 here
From the /etc/vfstab file:
/dev/dsk/c0t0d0s1 -       -       swap    -       no      -
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 /       ufs     1   no  -
/dev/dsk/c0t0d0s3 /dev/rdsk/c0t0d0s3 /var    ufs     1   no  -
/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /.0     ufs     2   yes -
/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /backup ufs     2   yes -
/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /globaldevices  ufs 2  yes -
Same on unixlab-3.
From the format command (the shared 50G LUN):
3. c4t60003BACCC75000050E2094800021C74d0 <SUN-T4-0302-50.00GB>
  /scsi_vhci/ssd@g60003baccc75000050e2094800021c74
Same on unixlab-3.
Each node has to be able to SSH to the other without a password (see the sketch after these steps):
Create an ssh key with no passphrase: ssh-keygen -t dsa -b 1024
Copy the public key root_id_dsa.pub to unixlab-3 and append it to the file /.ssh/authorized_keys
Add the line "IdentityFile /.ssh/root_id_dsa" to the file unixlab-2:/etc/ssh/ssh_config
Repeat the same on unixlab-3 and copy its public key to unixlab-2
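A minimal sketch of the whole key exchange, run as root on unixlab-2 (the -f key path and the /tmp staging file are my assumptions; root's home directory on Solaris 10 is /):

# generate a DSA key with an empty passphrase under root's home (/)
ssh-keygen -t dsa -b 1024 -f /.ssh/root_id_dsa -N ""
# stage the public key on the peer and append it to authorized_keys
scp /.ssh/root_id_dsa.pub unixlab-3:/tmp/
ssh unixlab-3 'mkdir -p /.ssh; cat /tmp/root_id_dsa.pub >> /.ssh/authorized_keys'
# tell the ssh client to offer the new private key
echo "IdentityFile /.ssh/root_id_dsa" >> /etc/ssh/ssh_config
# verify: should print unixlab-3 without asking for a password
ssh unixlab-3 hostname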
Get the Solaris Cluster software download, unzip solaris-cluster-3_3u1-ga-sparc.zip, and go to the directory Solaris_sparc. Repeat the same on unixlab-3.
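For example (/tmp matches the prompt in the transcript below):

cd /tmp
unzip solaris-cluster-3_3u1-ga-sparc.zip
cd Solaris_sparc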
Installation
{unixlab-2}/tmp/Solaris_sparc# ./installer
Unable to access a usable display on the remote system. Continue in command-line mode?(Y/N)
Y
Java Accessibility Bridge for GNOME loaded.
Installation Type
-----------------
   Do you want to install the full set of Oracle Solaris Cluster Products and
   Services? (Yes/No) [Yes] {"<" goes back, "!" exits} No
Choose Software Components - Main Menu
-------------------------------
Note: "*  *" indicates that the selection is disabled
[ ] 1. Oracle Solaris Cluster Geographic Edition 3.3u1
[ ] 2. Quorum Server
[ ] 3. High Availability Session Store 4.4.3
[ ] 4. Oracle Solaris Cluster 3.3u1
[ ] 5. Java DB 10.2.2.1
[ ] 6. Oracle Solaris Cluster Agents 3.3u1

   Enter a comma separated list of products to install, or press R to refresh
   the list [] {"<" goes back, "!" exits}: 4,6

Choose Software Components - Confirm Choices
--------------------------------------------
Based on product dependencies for your selections, the installer will install:
[X] 4. Oracle Solaris Cluster 3.3u1
 *  *  Java DB 10.2.2.1
[X] 6. Oracle Solaris Cluster Agents 3.3u1
Component Selection - Selected Product "Oracle Solaris Cluster 3.3u1"
---------------------------------------------------------------------
** * Oracle Solaris Cluster Core
*[X] 2. Oracle Solaris Cluster Manager
Component Selection - Selected Product "Java DB 10.2.2.1"
---------------------------------------------------------
** * Java DB Client
** * Java DB Server
Component Selection - Selected Product "Oracle Solaris Cluster Agents 3.3u1"
----------------------------------------------------------------------------
*[X] 1. Oracle Solaris Cluster HA for Java System Application Server
*[X] 2. Oracle Solaris Cluster HA for Java System Message Queue
*[X] 3. Oracle Solaris Cluster HA for Java System Directory Server
*[X] 4. Oracle Solaris Cluster HA for Java System Messaging Server
*[X] 5. Oracle Solaris Cluster HA for Application Server EE (HADB)
*[X] 6. Oracle Solaris Cluster HA/Scalable for Java System Web Server
*[X] 7. Oracle Solaris Cluster HA for Instant Messaging
*[X] 8. Oracle Solaris Cluster HA for Java System Calendar Server
*[X] 9. Oracle Solaris Cluster HA for Apache Tomcat
*[X] 10. Oracle Solaris Cluster HA for Apache
*[X] 11. Oracle Solaris Cluster HA for DHCP
*[X] 12. Oracle Solaris Cluster HA for DNS
*[X] 13. Oracle Solaris Cluster HA for MySQL
*[X] 14. Oracle Solaris Cluster HA for Sun N1 Service Provisioning System
*[X] 15. Oracle Solaris Cluster HA for NFS
*[X] 16. Oracle Solaris Cluster HA for Oracle
*[X] 17. Oracle Solaris Cluster HA for Agfa IMPAX
*[X] 18. Oracle Solaris Cluster HA for Samba
   Enter a comma separated list of components to install (or A to install all )
   [A] {"<" goes back, "!" exits} 10,15,18
*[X] 10. Oracle Solaris Cluster HA for Apache
*[X] 15. Oracle Solaris Cluster HA for NFS
*[X] 18. Oracle Solaris Cluster HA for Samba
Checking System Status
    Available disk space...        : Checking .... OK
    Memory installed...            : Checking .... OK
    Swap space installed...        : Checking .... OK
    Operating system patches...    : Checking .... OK
    Operating system resources...  : Checking .... OK
System ready for installation

Screen for selecting Type of Configuration
1. Configure Now - Selectively override defaults or express through
2. Configure Later - Manually configure following installation
   Select Type of Configuration [1] {"<" goes back, "!" exits} 2
Ready to Install
----------------
The following components will be installed.

Product: Oracle Solaris Cluster
Uninstall Location: /var/sadm/prod/SUNWentsyssc33u1
Space Required: 513.57 MB
---------------------------------------------------
        Java DB
           Java DB Server
           Java DB Client
        Oracle Solaris Cluster 3.3u1
           Oracle Solaris Cluster Core
           Oracle Solaris Cluster Manager
        Oracle Solaris Cluster Agents 3.3u1
           Oracle Solaris Cluster HA for Apache
           Oracle Solaris Cluster HA for NFS
           Oracle Solaris Cluster HA for Samba
1. Install
2. Start Over
3. Exit Installation
   What would you like to do [1] 
Oracle Solaris Cluster
|-1%--------------25%-----------------50%-----------------75%--------------100%|
Installation Complete
Same installation on unixlab-3.
Adjust ${PATH} and ${MANPATH}. For the tcsh shell:
setenv PATH /usr/cluster/bin:${PATH}
setenv MANPATH /usr/cluster/man:${MANPATH}
Same on unixlab-3.
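If root's login shell is sh/ksh rather than tcsh, the equivalent lines (my assumption; put them in whatever profile you actually use) are:

PATH=/usr/cluster/bin:$PATH
MANPATH=/usr/cluster/man:$MANPATH
export PATH MANPATH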
Configure the cluster on all nodes with scinstall (as we selected to do it later). Do this from the SERIAL CONSOLE, since a reboot is needed.
The configuration is performed only on unixlab-2. unixlab-3 reboots first; once unixlab-3 is up, unixlab-2 reboots.
{unixlab-2}/# scinstall
  *** Main Menu ***
    Please select from one of the following (*) options:
      * 1) Create a new cluster or add a cluster node
        2) Configure a cluster to be JumpStarted from this install server
        3) Manage a dual-partition upgrade
        4) Upgrade this cluster node
      * 5) Print release information for this cluster node
    Option: 1

  *** New Cluster and Cluster Node Menu ***
    Please select from any one of the following options:
        1) Create a new cluster
        2) Create just the first node of a new cluster on this machine
        3) Add this machine as a node in an existing cluster
    Option: 1

 *** Create a New Cluster ***
    This option creates and configures a new cluster.
    If the "remote configuration" option is unselected from the Oracle
    Solaris Cluster installer when you install the Oracle Solaris Cluster
    framework on any of the new nodes, then you must configure either the
    remote shell (see rsh(1)) or the secure shell (see ssh(1)) before you
    select this option. If rsh or ssh is used, you must enable root access
    to all of the new member nodes from this node.
    Press Control-d at any time to return to the Main Menu.
    Do you want to continue (yes/no) [yes]? yes

  ### Typical or Custom Mode 
    This tool supports two modes of operation, Typical mode and Custom.
    For most clusters, you can use Typical mode. However, you might need
    to select the Custom mode option if not all of the Typical defaults
    can be applied to your cluster.
    Please select from one of the following options:
        1) Typical
        2) Custom

    Option [1]: 1

 ### Cluster Name 
    Each cluster has a name assigned to it. The name can be made up of any
    characters other than whitespace. Each cluster name should be unique
    within the namespace of your enterprise.

    What is the name of the cluster you want to establish?  suncluster

  ### Cluster Nodes 
    This Oracle Solaris Cluster release supports a total of up to 16
    nodes.
    Please list the names of the other nodes planned for the initial
    cluster configuration. List one node name per line. When finished,
    type Control-D:
    Node name (Control-D to finish):  unixlab-2
    Node name (Control-D to finish):  unixlab-3
    
    This is the complete list of nodes:
        unixlab-2
        unixlab-3
    Is it correct (yes/no) [yes]?

    Attempting to contact "unixlab-3" ... done
    Searching for a remote configuration method ... done

  ### Cluster Transport Adapters and Cables 
    You must identify the cluster transport adapters which attach this
    node to the private cluster interconnect.
    Select the first cluster transport adapter:
        1) e1000g1
        2) e1000g2
        3) e1000g3
        4) Other
    Option: 3

 Will this be a dedicated cluster transport adapter (yes/no) [yes]?

    Searching for any unexpected network traffic on "e1000g3" ... done
    Verification completed. No traffic was detected over a 10 second
    sample period.

    Select the second cluster transport adapter:
        1) e1000g1
        2) e1000g2
        3) e1000g3
        4) Other
    Option:  2

Will this be a dedicated cluster transport adapter (yes/no) [yes]?

    Searching for any unexpected network traffic on "e1000g2" ... done
    Verification completed. No traffic was detected over a 10 second
    sample period.

    Plumbing network address 172.16.0.0 on adapter e1000g3 >> NOT DUPLICATE ... done    
    Plumbing network address 172.16.0.0 on adapter e1000g2 >> NOT DUPLICATE ... done

  ### Quorum Configuration 
    Every two-node cluster requires at least one quorum device. By
    default, scinstall selects and configures a shared disk quorum device
    for you.

    This screen allows you to disable the automatic selection and
    configuration of a quorum device.

    You have chosen to turn on the global fencing. If your shared storage
    devices do not support SCSI, such as Serial Advanced Technology
    Attachment (SATA) disks, or if your shared disks do not support
    SCSI-2, you must disable this feature.

    If you disable automatic quorum device selection now, or if you intend
    to use a quorum device that is not a shared disk, you must instead use
    clsetup(1M) to manually configure quorum once both nodes have joined
    the cluster for the first time.

    Do you want to disable automatic quorum device selection (yes/no) [no]?

    Is it okay to create the new cluster (yes/no) [yes]?

    During the cluster creation process, cluster check is run on each of
    the new cluster nodes. If cluster check detects problems, you can
    either interrupt the process or check the log files after the cluster
    has been established.

    Interrupt cluster creation for cluster check errors (yes/no) [no]?

  Cluster Creation

    Testing for "/globaldevices" on "unixlab-2" ... done
    Testing for "/globaldevices" on "unixlab-3" ... done

    Starting discovery of the cluster transport configuration.

    The following connections were discovered:

        unixlab-2:e1000g3  switch1  unixlab-3:e1000g3
        unixlab-2:e1000g2  switch2  unixlab-3:e1000g2

    Completed discovery of the cluster transport configuration.

    Started cluster check on "unixlab-2".
    Started cluster check on "unixlab-3".

    cluster check failed for "unixlab-2".
    cluster check failed for "unixlab-3".

The cluster check command failed on both of the nodes.

Refer to the log file for details.
The name of the log file is /var/cluster/logs/install/scinstall.log.15266.

    Configuring "unixlab-3" ... done
    Rebooting "unixlab-3" ... done

    Configuring "unixlab-2" ... done
    Rebooting "unixlab-2" ...

SERIAL CONSOLE MESSAGES:

Booting in cluster mode
CMM: Node unixlab-3 (nodeid = 1) with votecount = 1 added.
CMM: Node unixlab-2 (nodeid = 2) with votecount = 0 added.
clcomm: Adapter e1000g2 constructed
clcomm: Adapter e1000g3 constructed
CMM: Node unixlab-2: attempting to join cluster.
clcomm: Path unixlab-2:e1000g2 - unixlab-3:e1000g2 online
CMM: Node unixlab-3 (nodeid: 1, incarnation #: 1357159875) has become reachable.
CMM: Cluster has reached quorum.
CMM: Node unixlab-3 (nodeid = 1) is up; new incarnation number = 1357159875.
CMM: Node unixlab-2 (nodeid = 2) is up; new incarnation number = 1357160079.
CMM:  Cluster members: unixlab-3 unixlab-2.
CMM: node reconfiguration #3 completed.
CMM: Node unixlab-2: joined cluster.
clcomm: Path unixlab-2:e1000g3 - unixlab-3:e1000g3 online
DID subpath "/dev/rdsk/c4t60003BACCC75000050E2094800021C74d0s2" created for instance "4".
did instance 6 created.
did subpath unixlab-2:/dev/rdsk/c0t0d0 created for instance 6.
did instance 7 created.
did subpath unixlab-2:/dev/rdsk/c0t1d0 created for instance 7.
did instance 8 created.
did subpath unixlab-2:/dev/rdsk/c0t2d0 created for instance 8.
Configuring DID devices
obtaining access to all attached disks
Configuring the /dev/global directory (global devices)
SCPOSTCONFIG: Configuring Oracle Solaris Cluster quorum...
SCPOSTCONFIG: clquorum:  (C192716) I/O error.
SCPOSTCONFIG: Will add the following quorum devices:
SCPOSTCONFIG:         /dev/did/rdsk/d4s2
SCPOSTCONFIG: scquorumconfig:  Quorum autoconfig failed
SCPOSTCONFIG: The quorum configuration task encountered a problem on node unixlab-2, 
manual configuration by using clsetup(1CL) might be necessary
No need to run the configuration here; it's done from unixlab-2. unixlab-3 reboots first, so watch the serial console messages.

console login:
Booting in cluster mode
CMM: Node unixlab-3 (nodeid = 1) with votecount = 1 added.
CMM: Node unixlab-3: attempting to join cluster.
CMM: Cluster has reached quorum.
CMM: Node unixlab-3 (nodeid = 1) is up; new incarnation number = 1357159875.
CMM: Cluster members: unixlab-3.
CMM: node reconfiguration #1 completed.
CMM: Node unixlab-3: joined cluster.
did instance 1 created.
did subpath unixlab-3:/dev/rdsk/c0t0d0 created for instance 1.
did instance 2 created.
did subpath unixlab-3:/dev/rdsk/c0t1d0 created for instance 2.
did instance 3 created.
did subpath unixlab-3:/dev/rdsk/c0t2d0 created for instance 3.
did instance 4 created.
did subpath unixlab-3:/dev/rdsk/c4t60003BACCC75000050E2094800021C74d0 created for instance 4.
Configuring DID devices
obtaining access to all attached disks
Configuring the /dev/global directory (global devices)
CMM: Node unixlab-2 (nodeid = 2) with votecount = 0 added.
CMM: Cluster members: unixlab-3.
CMM: node reconfiguration #2 completed.
clcomm: Adapter e1000g3 constructed
clcomm: Adapter e1000g2 constructed
clcomm: Path unixlab-3:e1000g3 - unixlab-2:e1000g3 errors during initiation
Path unixlab-3:e1000g3 - unixlab-2:e1000g3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
clcomm: Path unixlab-3:e1000g2 - unixlab-2:e1000g2 errors during initiation
Path unixlab-3:e1000g2 - unixlab-2:e1000g2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
clcomm: Path unixlab-3:e1000g2 - unixlab-2:e1000g2 online
CMM: Node unixlab-2 (nodeid: 2, incarnation #: 1357160079) has become reachable.
CMM: Node unixlab-2 (nodeid = 2) is up; new incarnation number = 1357160079.
CMM: Cluster members: unixlab-3 unixlab-2.
CMM: node reconfiguration #3 completed.
CCR: Ignoring override field for table directory on joining node unixlab-2.
CCR: Ignoring override field for table dcs_service_classes on joining node unixlab-2.
clcomm: Path unixlab-3:e1000g3 - unixlab-2:e1000g3 online
/etc/vfstab on unixlab-2:
#/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /globaldevices ufs 2 yes -
/dev/did/dsk/d6s4 /dev/did/rdsk/d6s4 /global/.devices/node@2 ufs 2 no global
NOTE: node@2 because this host (unixlab-2) became cluster node number 2

/etc/vfstab on unixlab-3:
#/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /globaldevices  ufs 2 yes  -
/dev/did/dsk/d1s4 /dev/did/rdsk/d1s4 /global/.devices/node@1 ufs 2 no global
NOTE: node@1 because this host (unixlab-3) became cluster node number 1
Global-devices file systems (both node mounts are visible from each node):
{unixlab-2}# df -h | grep global
/dev/did/dsk/d1s4      3.9G   6.6M   3.9G     1%    /global/.devices/node@1
/dev/did/dsk/d6s4      3.9G   6.6M   3.9G     1%    /global/.devices/node@2
Same here:
{unixlab-3}# df -h | grep global
/dev/did/dsk/d1s4      3.9G   6.6M   3.9G     1%    /global/.devices/node@1
/dev/did/dsk/d6s4      3.9G   6.6M   3.9G     1%    /global/.devices/node@2
Network interfaces on unixlab-2:
e1000g0: flags=9000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 192.168.28.215 netmask ffffff00 broadcast 192.168.28.255
        groupname sc_ipmp0
        ether 0:14:4f:6a:bf:16
e1000g2: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 3
        inet 172.16.1.2 netmask ffffff80 broadcast 172.16.1.127
        ether 0:14:4f:6a:bf:18
e1000g3: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4
        inet 172.16.0.130 netmask ffffff80 broadcast 172.16.0.255
        ether 0:14:4f:6a:bf:19
clprivnet0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 5
        inet 172.16.4.2 netmask fffffe00 broadcast 172.16.5.255
        ether 0:0:0:0:0:2
And on unixlab-3:
e1000g0: flags=9000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 192.168.28.216 netmask ffffff00 broadcast 192.168.28.255
        groupname sc_ipmp0
        ether 0:14:4f:82:5:46
e1000g2: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 6
        inet 172.16.1.1 netmask ffffff80 broadcast 172.16.1.127
        ether 0:14:4f:82:5:48
e1000g3: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4
        inet 172.16.0.129 netmask ffffff80 broadcast 172.16.0.255
        ether 0:14:4f:82:5:49
clprivnet0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 5
        inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
        ether 0:0:0:0:0:1
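To confirm that both private interconnect paths are really up, the transport status can be queried from either node (output omitted here; both paths should be reported as online):

{unixlab-2}# clinterconnect status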
Manually add the LUN as a quorum device (with the clsetup command, since "Quorum autoconfig failed" during cluster creation).
{unixlab-2}# clsetup
  ### Initial Cluster Setup
    This program has detected that the cluster "installmode" attribute is
    still enabled. As such, certain initial cluster setup steps will be
    performed at this time. This includes adding any necessary quorum
    devices, then resetting both the quorum vote counts and the
    "installmode" property.
    Please do not proceed if any additional nodes have yet to join the
    cluster.

    Is it okay to continue (yes/no) [yes]?
    Do you want to add any quorum devices (yes/no) [yes]?

    Following are supported Quorum Devices types in Oracle Solaris
    Cluster. Please refer to Oracle Solaris Cluster documentation for
    detailed information on these supported quorum device topologies.
    What is the type of device you want to use?

        1) Directly attached shared disk
        2) Network Attached Storage (NAS) from Network Appliance
        3) Quorum Server
    Option:  1

  ### Add a Shared Disk Quorum Device

    If you are using a dual-ported disk, by default, Oracle Solaris
    Cluster uses SCSI-2. If you are using disks that are connected to more
    than two nodes, or if you manually override the protocol from SCSI-2
    to SCSI-3, by default, Oracle Solaris Cluster uses SCSI-3.

    If you turn off SCSI fencing for disks, Oracle Solaris Cluster uses
    software quorum, which is Oracle Solaris Cluster software that
    emulates a form of SCSI Persistent Group Reservations (PGR).
    Warning: If you are using disks that do not support SCSI, such as
    Serial Advanced Technology Attachment (SATA) disks, turn off SCSI
    fencing.
    Is it okay to continue (yes/no) [yes]?

    Which global device do you want to use (dN)?  d4
    Is it okay to proceed with the update (yes/no) [yes]?

/usr/cluster/bin/clquorum add d4
    Command completed successfully.

    Do you want to add another quorum device (yes/no) [yes]?  no

    Once the "installmode" property has been reset, this program will skip
    "Initial Cluster Setup" each time it is run again in the future.
    However, quorum devices can always be added to the cluster using the
    regular menu options. Resetting this property fully activates quorum
    settings and is necessary for the normal and safe operation of the
    cluster.

    Is it okay to reset "installmode" (yes/no) [yes]?

/usr/cluster/bin/clquorum reset
/usr/cluster/bin/claccess deny-all

    Cluster initialization is complete.

Serial console messages:

CMM: Cluster members: unixlab-3 unixlab-2.
CMM: node reconfiguration #4 completed.
CMM: Votecount changed from 0 to 1 for node unixlab-2.
CMM: Cluster members: unixlab-3 unixlab-2.
CMM: Quorum device 1 (/dev/did/rdsk/d4s2) added; votecount = 1
Nothing to be done on unixlab-3 here.
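At this point the vote arithmetic from the introduction is complete: two node votes plus one quorum-device vote gives 3 configured votes, of which 2 are required for quorum.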
Quick check of quorum devices
{unixlab-2}/# cldevice list -v
DID Device  Full Device Path
----------  ----------------
d1          unixlab-3:/dev/rdsk/c0t0d0
d2          unixlab-3:/dev/rdsk/c0t1d0
d3          unixlab-3:/dev/rdsk/c0t2d0
d4          unixlab-3:/dev/rdsk/c4t60003BACCC75000050E2094800021C74d0
d4          unixlab-2:/dev/rdsk/c4t60003BACCC75000050E2094800021C74d0
d6          unixlab-2:/dev/rdsk/c0t0d0
d7          unixlab-2:/dev/rdsk/c0t1d0
d8          unixlab-2:/dev/rdsk/c0t2d0

{unixlab-2}/# clquorum list -v
Quorum              Type
------              ----
d4                  shared_disk
unixlab-3           node
unixlab-2           node

{unixlab-2}/# clquorum show
=== Cluster Nodes ===
Node Name:            unixlab-3
  Node ID:            1
  Quorum Vote Count:  1
  Reservation Key:    0x50E49D1600000001

Node Name:            unixlab-2
  Node ID:            2
  Quorum Vote Count:  1
  Reservation Key:    0x50E49D1600000002

=== Quorum Devices ===
Quorum Device Name:   d4
  Enabled:            yes
  Votes:              1
  Global Name:        /dev/did/rdsk/d4s2
  Type:               shared_disk
  Access Mode:        scsi2
  Hosts (enabled):    unixlab-3, unixlab-2
Same output on unixlab-3.
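As a final sanity check, the quorum vote summary and node status can be pulled from either node (output omitted; all three votes should be present):

{unixlab-2}# clquorum status
{unixlab-2}# clnode status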

