Monday, May 2, 2011

RHCS (Red Hat Cluster Suite) I/O fencing using SNMP IF-MIB

RHCS is comparable to HP MC/ServiceGuard, IBM HACMP, Sun Cluster, etc. The free open-source version is available in CentOS/Fedora.
Fencing is the act of isolating a cluster node from its storage when the node stops responding; otherwise, when the node recovers, the shared file system may be corrupted by more than one node writing to it at the same time.
RHCS supports the following fencing methods:
Power fencing:
Forcefully power-cycle the node, just like pulling out the power cord.
- Internal power fencing
    Lights-out management card (HP iLO, IBM RSA, Sun ILOM, Dell DRAC, IPMI, etc.)
- External power fencing
    Smart power switch (APC Switched Rack PDU, etc.)
Network port fencing:
Shut down the storage network port.
    - IP switch for iSCSI
    - SAN switch for FC SAN
SCSI-3 persistent reservation:
Revoke the failed node's registration on the shared SCSI device.
Virtual guest fencing:
Shut down the VM guest via the VM host.
- Xen, VMware, or any guest managed by libvirt tools
Full list:
http://www.redhat.com/cluster_suite/hardware/

# Each fence type above has its own fence agent, which is a Python/Perl script.
$ ls /sbin/fence*
/sbin/fence_ack_manual   /sbin/fence_drac     /sbin/fence_mcdata     /sbin/fence_tool
/sbin/fence_apc          /sbin/fence_drac5    /sbin/fence_node       /sbin/fence_virsh
/sbin/fence_apc_snmp     /sbin/fence_egenera  /sbin/fence_rhevm      /sbin/fence_vixel
/sbin/fence_bladecenter  /sbin/fence_ifmib    /sbin/fence_rps10      /sbin/fence_vmware
/sbin/fence_brocade      /sbin/fence_ilo      /sbin/fence_rsa        /sbin/fence_vmware_helper
/sbin/fence_bullpap      /sbin/fence_ilo_mp   /sbin/fence_rsb        /sbin/fence_wti
/sbin/fence_cisco_mds    /sbin/fence_ipmilan  /sbin/fence_sanbox2    /sbin/fence_xvm
/sbin/fence_cisco_ucs    /sbin/fence_lpar     /sbin/fence_scsi       /sbin/fence_xvmd
/sbin/fenced             /sbin/fence_manual   /sbin/fence_scsi_test
# Power fencing agents (fence_ilo, fence_apc, etc.) typically telnet/SSH to the IP of the LOM card or external power switch to turn off the power.
fence_ifmib is today's subject: it uses SNMP write access to shut down a network port on the iSCSI storage server.
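Under the hood, fence_ifmib simply writes down(2) to IF-MIB::ifAdminStatus (.1.3.6.1.2.1.2.2.1.7) for the target interface index. A minimal manual sketch of the same operation, using the SNMPv3 user created later in this post and assuming interface index 3 as the target port (the index is purely illustrative):
# hypothetical manual equivalent of fence_ifmib: shut down interface index 3 on the filer
$ snmpset -v3 -l authNoPriv -u fence_admin -a MD5 -A Pass1234 filer IF-MIB::ifAdminStatus.3 i 2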
LAB Setup
VirtualBox with 3 VMs:
filer: cluster management server (Luci) and iSCSI filer
node1: cluster node 1 with the Luci agent ricci
node2: cluster node 2 with the Luci agent ricci
filer, node1, and node2 are connected to two separate LANs: LAN1 carries application traffic and LAN2 carries iSCSI storage traffic. So the filer has 3 NICs (one on LAN1 and two on LAN2, one per node) and each cluster node has 2 NICs.

Using fence_ifmib, node1 can fence node2 by issuing snmpset via the LAN1 interface to shut down the filer's LAN2 NIC that node2 is connected to, and vice versa.
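For reference, a purely hypothetical addressing plan for this lab; none of these names or IPs come from the original setup, they only illustrate the two-LAN layout:
# LAN1 = 192.168.1.0/24 (application/management), LAN2 = 192.168.2.0/24 (iSCSI) -- assumed values
# /etc/hosts on all three VMs might look like:
192.168.1.10   filer        # filer eth0 on LAN1 (Luci, SNMP)
192.168.2.11   filer-san1   # filer eth1 on LAN2, serving node1
192.168.2.12   filer-san2   # filer eth2 on LAN2, serving node2
192.168.1.11   node1
192.168.1.12   node2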
Setup Cluster
On each node: yum install ricci; service ricci restart
On the filer: yum install luci; run "luci_admin init" to set the admin password; service luci restart
Log in to the Luci web interface on port 8084.
Create a new cluster and add the cluster node members (see the membership check below).
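Once both nodes have joined, membership can be verified from the command line on either node (a quick sanity check, not part of the original Luci workflow):
# list cluster members as seen by cman
$ cman_tool nodes
# show overall cluster and quorum status
$ clustat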
Setup fence_ifmib fence device
- On the filer, set up snmpd:
$ cat /etc/snmp/snmpd.conf 
createUser fence_admin   MD5 "Pass1234"
authuser   read,write -s usm  fence_admin  authnopriv .1.3.6.1.2.1.2.2.1
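After editing the config, snmpd has to be (re)started so the new SNMPv3 user takes effect (a step implied but not shown above):
$ service snmpd restart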
I use SNMPv3 USM because Luci requires an SNMP username/password. I think other cluster management tools (system-config-cluster, ccs_tool) also accept SNMP v2c and v1.
Further info on setting up SNMP v1/v2c/v3:
http://honglus.blogspot.com/2009/03/setup-net-snmp-on-linux-centos-52.html
http://honglus.blogspot.com/2011/03/setup-snmp-v3-usm-with-encryption.html
- On each cluster node, install the Net-SNMP utilities and assign a shared fence device to each node:
yum install net-snmp-utils
The port is the interface index number on the filer. Since the interface list starts with loopback, eth0 has index 2 and eth1 has index 3 (see the snmpwalk below). The port is not hardcoded in the fence device, so each node can share the same fence device but use a different port.
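The index-to-interface mapping (and the SNMPv3 credentials) can be verified from a cluster node; the output lines below are only an example of what to expect:
# walk the interface descriptions on the filer to find the ifIndex of each NIC
$ snmpwalk -v3 -l authNoPriv -u fence_admin -a MD5 -A Pass1234 filer IF-MIB::ifDescr
IF-MIB::ifDescr.1 = STRING: lo
IF-MIB::ifDescr.2 = STRING: eth0
IF-MIB::ifDescr.3 = STRING: eth1
IF-MIB::ifDescr.4 = STRING: eth2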
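Whichever way the fence device is entered (Luci or by hand), the configuration ends up in /etc/cluster/cluster.conf. A rough sketch of the relevant fragment is shown here; the fence_ifmib attribute names are written from memory and may differ between versions, so verify them against the fence_ifmib man page:
$ cat /etc/cluster/cluster.conf   # fragment only; attribute names are an assumption
<clusternode name="node1" nodeid="1">
  <fence>
    <method name="1">
      <device name="filer-snmp" port="3"/>   <!-- filer NIC serving node1 -->
    </method>
  </fence>
</clusternode>
...
<fencedevices>
  <fencedevice agent="fence_ifmib" name="filer-snmp" ipaddr="filer"
               login="fence_admin" passwd="Pass1234"
               snmp_version="3" snmp_sec_level="authNoPriv" snmp_auth_prot="MD5"/>
</fencedevices>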

Test
On either node, run "fence_node the-other-node-name", then go to the filer and check whether the corresponding storage NIC is down.
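If the test left the port administratively down, it can be brought back up manually with snmpset (interface index 3 is again just the illustrative target):
# check the admin status of the fenced port (should report down(2) after fencing)
$ snmpget -v3 -l authNoPriv -u fence_admin -a MD5 -A Pass1234 filer IF-MIB::ifAdminStatus.3
# re-enable the port: set ifAdminStatus back to up(1)
$ snmpset -v3 -l authNoPriv -u fence_admin -a MD5 -A Pass1234 filer IF-MIB::ifAdminStatus.3 i 1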
