Brian Likosar, Sr. Solutions Architect
Dave Sullivan, Sr. Consultant
Red Hat
4-May-2011
PROVIDING HIGH AVAILABILITY
FOR ORACLE DATABASES
WITHOUT HIGH COST
Costs?
● Oracle RAC costs: $4600/core, plus $5060 for support
● RHEL: Resilient Storage Add-On: $799/socket pair
● Specific example: HP DL360 G7 (2-way, 4 cores ea.)
● Oracle RAC: $41,860 per year
● RHEL: $799 per year
● To be fair, RAC provides scalability as well
● Sources: hp.com, shop.oracle.com, and www.redhat.com/rhel/purchasing_guide.html
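The per-year figure for the 8-core example above can be sanity-checked with a quick calculation (this treats the $5060 support figure as a flat add-on, which is what the slide's total implies):

```shell
# Sanity-check of the slide's RAC cost example: 8 cores at $4600/core
# plus $5060 support, vs. the flat $799/socket-pair Resilient Storage Add-On.
cores=8
rac_per_year=$((cores * 4600 + 5060))
rhel_per_year=799
echo "Oracle RAC: \$${rac_per_year}/yr vs RHEL: \$${rhel_per_year}/yr"
```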
Items to discuss
● HA (Highly Available)
● Minimize unexpected downtime
● Reduce MTTR (Mean Time To Recovery)
● Redundancy Across The Board (No SPOFs!)
● Very customizable
● Automated Fail-over
Infrastructure Component View
[Diagram: two clustered Oracle DB nodes with dual-rail power and a fence device, connected via the Production, Remote MGT, Heartbeat/Fence, and Backup VLANs across an inter-switch link to the production network]
Highly Available Oracle on RHEL with HA-LVM
● Oracle Database
● Red Hat Enterprise Linux 6
● Red Hat Cluster Suite
● Controls Oracle Database
● Automates Fail-over
● HA-LVM (part of Resilient Storage add-on)
● Shared storage
● iSCSI
● SAN
RHEL OS & Cluster Component View
[Diagram: system software stack — RHEL6 OS running Oracle 11gR2, RHEL Multipath, and Red Hat Cluster Suite; cluster components shown: cman, qdiskd, fenced, corosync, clvmd (or HA-LVM), and rgmanager managing lvm, fs, vip, and oracle resources on LVM]
Red Hat Enterprise Linux
● Included in the OS:
● DM (device mapper) Multipath
● LVM (logical volume management)
● Ext4 (4th extended filesystem)
● Red Hat Cluster Suite components:
● corosync (previously openais/aisexec) #heartbeat
● cman - “Cluster Manager”
● clvmd - “Cluster Logical Volume Manager”
● qdiskd - “Quorum Disk”
● fenced - “I/O Fencing”
● rgmanager - “Resource Group Manager”
Cluster Logical Volume Manager (CLVMD)
● Daemon that runs on all cluster nodes and controls concurrent access to the same storage
● For our purposes – it's what prevents our logical volumes from being mounted on more than one system at a time
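On RHEL of this era, this locking mode is what `lvmconf --enable-cluster` switches on; the relevant lvm.conf setting looks like this (excerpt; a sketch, not a full configuration):

```
# /etc/lvm/lvm.conf (excerpt)
global {
    # 3 = built-in clustered locking via clvmd; every cluster node
    # must be running clvmd for LVM commands to succeed
    locking_type = 3
}
```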
HA-LVM w/o CLVMD
● Older way of doing HA-LVM
● Uses Volume “Tagging” Scheme to provide LVM Mutual Exclusion
● Provides a way to do LVM maintenance on a larger clustered set of LVMs (i.e., flipping tags on the particular VG)
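A sketch of the tagging scheme, with a hypothetical hostname and volume group name; the HA-LVM resource agent flips the tag with `vgchange --addtag`/`--deltag` during fail-over:

```
# /etc/lvm/lvm.conf (excerpt) -- hostname and VG names are examples
locking_type = 1
# Only activate VGs/LVs that are listed system volumes or carry
# this node's hostname as a tag:
volume_list = [ "rootvg", "@node1.example.com" ]
```

After editing volume_list, the initrd must be rebuilt so that early boot honors the restriction.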
Quorum Disk
● Adds complexity to cluster
● Typically utilized in even node clusters to act as tie-breaker in split-brain situations
● Prevents fence-loop and fence death situations
● Provides heuristics to determine “Health” of the cluster
● Provides all-but-one failure mode as well as others
● e.g. 4-node cluster where 3 nodes fail
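A cluster.conf sketch of a quorum disk with a single ping heuristic; the device path, label, gateway address, and timings below are invented for illustration:

```xml
<!-- cluster.conf fragment; disk first initialized with:
     mkqdisk -c /dev/mapper/quorum -l oraqdisk -->
<quorumd interval="1" tko="10" votes="1" label="oraqdisk">
  <!-- node remains eligible only while it can reach the gateway -->
  <heuristic program="ping -c1 -w1 10.15.183.1" score="1" interval="2" tko="3"/>
</quorumd>
```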
I/O Fencing
● Provides Countermeasure To Remove Misbehaving Or Dead Node From Shared Storage
● Most Critical Part Of Cluster That Utilizes Shared Storage (SAN/ISCSI)
● Protects Data From Corruption
● Node Kernel Panic
● Node Freezes
● Node Hangs
I/O Fencing
● Allows Nodes To Safely Assume Control Over Shared Resources In Network Partition Situations
● Fencing Types
● Power Fencing
● Normally allows for full automated recovery
● Reduction in MTTR
● SCSI & SAN Fabric Fencing
● Node normally requires manual reboot
● Allows for system troubleshooting
● May not require additional hardware
Resource Manager (rgmanager)
● Daemon that watches for running processes
● Coordinates resources and their startup order
● Can be observed via “clustat” and controlled via “clusvcadm”
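A hypothetical session sketch (the service and node names match the sample configuration later in this deck):

```
# clustat                                                 # show member and service state
# clusvcadm -r summitdemo -m botan.salab.dfw.redhat.com   # relocate the service to another node
# clusvcadm -d summitdemo                                 # disable (stop) the service
# clusvcadm -e summitdemo                                 # re-enable it
```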
MPIO Layer: Device-Mapper-Multipath
● Redundancy at I/O Fibre Layer
● MPIO Failover pre-rhel5u5:
● polling_interval=5 #interval check for failed paths (seconds); default 5s
● Normal path check: 5 * polling_interval (default 20s) #interval check for good paths
● MPIO Failover post-rhel5u5:
● polling_interval=5 #interval check for failed paths (seconds); default 5s
● checker_timeout #assuming set; otherwise pulled from /sys/block/sdX/device/timeout
● rc.local to make persistent
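The polling knobs above live in /etc/multipath.conf; a minimal excerpt (the values are the slide's examples, not recommendations):

```
# /etc/multipath.conf (excerpt)
defaults {
    polling_interval 5     # seconds between path checks
    # checker_timeout 15   # RHEL 5.5+; if unset, falls back to
    #                      # /sys/block/sdX/device/timeout
}
```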
Cluster Timeouts
● Understanding of cluster timeouts is critical
● HBA Device Timeout (lpfc/qlogic/etc.)
● e.g. modinfo lpfc #lpfc_devloss_tmo
● Multipath Failover Timings
● Quorum Disk Timeout
● Quorum Device Poll
● Cman Timeout
Cluster Timeout Matrix
Component            Variable                                                          Equation   Example
HBA timeout          lpfc_devloss_tmo (lpfc), qlport_down_retry (qlogic); default=30s  x          10s
Multipath timeout    checker_timeout (as of rhel5u5) or /sys/block/sdx/device/timeout  x + 5s     15s
Qdisk timeout        interval * tko                                                    x + 20s    30s
Quorum device poll   quorum_dev_poll                                                   x + 25s    45s
Cman timeout         token                                                             x + 50s    60s

(Timeline: eviction ----> cman tmo -->)
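The layering rule — each layer must wait longer than the one beneath it — can be sketched using the equation column, with x a hypothetical 10-second HBA timeout. (Note that applying x + 25s literally gives 35s for the quorum device poll, while the slide's example column shows 45s, so treat the equations as guidelines rather than exact formulas.)

```shell
# Timeout cascade per the matrix: each layer's timeout builds on the
# HBA device-loss timeout x, so lower layers always give up first.
x=10                   # hypothetical HBA timeout (seconds)
multipath=$((x + 5))   # checker_timeout
qdisk=$((x + 20))      # interval * tko
qpoll=$((x + 25))      # quorum_dev_poll
cman=$((x + 50))       # totem token
echo "hba=${x}s multipath=${multipath}s qdisk=${qdisk}s qpoll=${qpoll}s cman=${cman}s"
```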
LVM / FS Configuration examples
● lvmconf --enable-cluster
● chkconfig clvmd on; service clvmd start
● pvcreate /dev/disk
● vgcreate oraclevg /dev/disk
● lvcreate -n oradatalv -L 10G oraclevg
● mkfs -t ext4 /dev/oraclevg/oradatalv
Other items to assist with setup
● ip addr add 1.2.3.4/24 dev eth0
● Have DBA create database using “cluster” filesystems and configure listener against alias (VIP)
● Be sure that they do not create any “local” configurations – use spfile where appropriate.
● Once you're able to test everything manually on both sides of cluster, create the cluster configuration
Common Pitfalls
● Multicast support / IGMP snooping
● Partition table state inconsistent
● Review/modification of oracledb.sh script required
● Understanding Power I/O Fencing
● When To Use Quorum Disk
● If you have a simple two-node cluster with properly architected network power fencing, you don't need qdisk
● Not Validating SPOFs
● Run any service scripts outside your cluster first to validate
● Think echo $? #should return 0
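rgmanager judges resource scripts by their exit status, so the manual check boils down to this sketch (`/bin/true` stands in for a real start script such as oracledb.sh):

```shell
# Any service/resource script must exit 0 on success; validate it
# standalone before handing it to the cluster.
/bin/true              # stand-in for ./oracledb.sh start
status=$?
if [ "$status" -eq 0 ]; then
    echo "script OK for cluster use"
fi
```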
Cluster Configuration File

<?xml version="1.0"?>
<cluster config_version="20" name="summit2011">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="ayame.salab.dfw.redhat.com" nodeid="1" votes="1">
      <fence>
        <method name="fence_ayame">
          <device name="wti" port="7"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="botan.salab.dfw.redhat.com" nodeid="2" votes="1">
      <fence>
        <method name="fence_botan">
          <device name="wti" port="8"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_wti" ipaddr="10.15.183.249" login="root" name="wti" passwd="red22hat"/>
  </fencedevices>
  .....
Cluster Configuration File Cont'd ....

  <rm>
    <failoverdomains/>
    <resources>
      <lvm lv_name="misclv" name="misclv" vg_name="demovg"/>
      <lvm lv_name="datalv" name="datalv" vg_name="demovg"/>
      <lvm lv_name="redolv" name="redolv" vg_name="demovg"/>
      <fs device="/dev/demovg/misclv" fsid="54499" mountpoint="/oradata/oramisc" name="oramisc"/>
      <fs device="/dev/demovg/redolv" fsid="54499" mountpoint="/oradata/oraredo" name="oraredo"/>
      <fs device="/dev/demovg/datalv" fsid="54499" mountpoint="/oradata/oradata" name="oradata"/>
      <ip address="10.15.183.183" monitor_link="off" sleeptime="30"/>
      <oracledb home="/home/oracle/product/11.2.0" listener_name="listener" name="summit" type="base" user="oracle" vhost="orcl.salab.dfw.redhat.com"/>
    </resources>
    <service autostart="1" exclusive="0" max_restarts="3" name="summitdemo" recovery="restart" restart_expire_time="90">
      <lvm ref="misclv"/>
      <lvm ref="datalv"/>
      <lvm ref="redolv"/>
      <fs ref="oramisc"/>
      <fs ref="oradata"/>
      <fs ref="oraredo"/>
      <ip ref="10.15.183.183"/>
      <oracledb ref="summit"/>
    </service>
  </rm>
</cluster>
Helpful links
● HA-LVM implementation: https://access.redhat.com/kb/docs/DOC-3068
● Multicast notes: https://access.redhat.com/kb/docs/DOC-5933
● Timing with qdisk: https://access.redhat.com/kb/docs/DOC-2882
Supplemental information for this presentation
● Will be available at http://people.redhat.com/~blikosar/
● Brian can be reached at [email protected] (don't expect rapid response, he's a busy guy!)