
Backup and recovery best practices for an ultra-large Oracle database white paper

Environment: Oracle 10g on HP-UX, HP Integrity Superdome, and HP StorageWorks XP12000 storage array

Executive summary ............................................................... 3
  Key findings .................................................................. 3
Overview ........................................................................ 4
Configuring the hardware ........................................................ 6
  Defining the zones ............................................................ 7
  Setting up the EVA5000 ........................................................ 8
  Setting up the XP12000 storage array .......................................... 8
  Setting up the HP StorageWorks ESL E-Series 712e Tape Library ................. 9
Configuring the software ....................................................... 10
  Managing processor interrupts ................................................ 10
  Setting up Secure Path ....................................................... 11
    XP Array—Active-Active ..................................................... 11
    EVA Array—Active-Passive ................................................... 12
  Setting up LVM ............................................................... 12
  Setting up the file systems .................................................. 13
  Working with Datagen ......................................................... 13
Backup and restore with the ESL E-Series tape library .......................... 15
  Setting up the ESL backup policy ............................................. 15
  Setting up the storage unit .................................................. 15
  Setting up the master server global attribute ................................ 15
  Backup issues ................................................................ 16
  Off-host backups ............................................................. 16
  Setting up restores .......................................................... 16
  Restore issues ............................................................... 17
Performance results ............................................................ 18
  EVA performance results ...................................................... 18
    Back up the Superdome to EVA ............................................... 18
    Back up the XP12000 Business Copy to EVA ................................... 19
  ESL E-Series backup and restore performance results .......................... 20
Best practices ................................................................. 21
  Best practices for the XP12000 storage array ................................. 21
  Tuning ESL for best performance .............................................. 21
Appendix A—Bill of materials ................................................... 22
Appendix B—Configuring RAID Manager ............................................ 24
  RAID Manager environment variables ........................................... 24
For more information ........................................................... 26
  HP ........................................................................... 26
  Oracle ....................................................................... 26
  VERITAS/Symantec ............................................................. 26

Executive summary

Data recovery and protection are top priorities for business today and essential elements of any business continuity strategy; the bigger the enterprise, the more challenging the problem, and the more robust and reliable the solution must be. Final choices and IT decisions depend on how a business perceives its tolerance for data inaccessibility or loss. As such, protection can be as simple as backing up information locally to a secondary storage device, or it can be complex, involving clustered applications and multiple terabytes of storage.

The HP StorageWorks Customer Focused Testing Team constructed an enterprise environment consisting of Oracle® 10g on HP-UX, an HP Integrity Superdome server, and an HP StorageWorks XP12000 storage array. The purpose of the testing was to create a backup and restore environment that provided data protection and recovery with minimal impact to daily operations. Additional goals were to use HP StorageWorks Business Copy XP and to determine best practices that can be deployed in this environment.

The objectives for the testing, based on actual customer input, included the following:

• Reduce the backup window from 18+ hours to less than 12 hours
• Establish best practices for offline backup and restore of an ultra-large Oracle database
• Determine the best practices for online backup
• Provide data archives

Key findings

Testing successfully provided the following high-level results:

• Successfully reduced downtime during the tape backup process to less than 8 hours.
  – Superdome restore—2.66 TB/hr (restore took approximately 8 hours, 32 minutes)
  – Superdome backup—2.76 TB/hr (backup took approximately 8 hours, 20 minutes)
  – HP Integrity rx4640 server backup—2.03 TB/hr (backup took about 11 hours, 20 minutes)

• Virtually eliminated downtime by implementing Business Copy.

Important findings uncovered during the tests are documented in the Best Practices chapter. See Best practices, p. 21.


Overview

The test environment was a representative configuration for an enterprise customer. HP constructed an Oracle 10g environment that used the HP-UX operating system and included an Integrity Superdome server and an XP12000 storage array. The main purpose of the project was to conduct backup and restore testing to determine the best ways to reduce database downtime and increase database availability. HP integrated and tested backup and recovery of a 24-TB Oracle database with the following objectives:

• Demonstrate reduced downtime of the Oracle environment during backup and recovery operations by reducing database downtime to less than 12 hours.

• Determine best practices for backup and recovery to tape and for replicating data using Business Copy XP.

• Characterize the impact of backup on users and application performance.

Testing included backing up the database, providing a means to archive the data, and performing full and partial restores after data loss. The backup included the database and all related information. Two different restore tests were performed. In the first test, a single file was deleted from the database and then restored. In the second test, the full database was restored after a disaster, such as the loss of an entire storage array.

Several options for backup and restore were evaluated, along with their impact on database downtime, complexity, and recovery speed:

• Superdome Integrity Server to HP StorageWorks ESL E-Series Tape library: This is a classic backup scenario using an offline database. Symantec NetBackup was used to spool the data directly to tape. The test goal was to measure the time between the point of database shutdown and when the database became active. Times for the full backups and restore were recorded.

• Superdome Integrity Server to HP StorageWorks Enterprise Virtual Array 5000 (EVA5000) storage array: In this scenario, the Superdome copied the database files on the XP12000 storage array to separate file systems that were backed by the EVA. Experiments on the best ways to implement the copy were performed. The database was brought down before copying began and restarted when complete. Backup and restore times were recorded.

• Business Copy XP to EVA5000 storage array: This scenario utilized the Business Copy feature of the XP12000 storage array. The database was shut down and then secondary volumes (svol) were split using Business Copy. Then, the database was brought online. The HP Integrity rx4640 server mounted the svols from the XP12000 storage array and copied the data to separate file systems. The time to copy the data from the XP to the EVA was recorded. Impact to performance was also assessed.

• Business Copy XP to ESL Library: This scenario used the 24-TB database. An XP Business Copy was split and mounted on the rx4640 server, which then spooled the data to tape. The time for this action was recorded. After the spool was complete, the rx4640 server unmounted the svols, and the split was rejoined. Performance was assessed during this scenario.


To run these tests, HP configured the system illustrated in Figure 1. The key components include the following:

• Oracle 10g—Oracle was used to provide the load to the Superdome and generate the data, which was backed up and restored.

• HP Integrity Superdome server—The Oracle server, which was also used for backing up and restoring.

• HP rx4640 server—This server was used to offload the Superdome to back up the Oracle data.

• HP XP12000 storage array—The XP12000 high-availability storage array was configured as the primary disk storage device for the Oracle production data, and the replicated data was used for backup. The XP12000 storage array was connected to the Superdome as well as the rx4640 backup server. Business copies were also made on the XP12000 storage array, which provided data backup.

• HP EVA5000—This storage was used for a disk-to-disk backup and to offload the XP12000 storage array.

• HP ESL E-Series 712e Tape Library—The ESL was configured as the nearline backup and restore device. The rx4640 and Superdome servers backed up the Oracle database files to the tape library.

• HP-UX operating system—Enterprise operating system used on both the Superdome and rx4640 servers.

• Symantec NetBackup 6.0 was used as the backup application.


Configuring the hardware

HP constructed an enterprise configuration using an Integrity Superdome server and an XP12000 storage array, which were best suited to an enterprise environment supporting ultra-large Oracle databases. See Appendix A—Bill of materials, p. 22, for the complete list of hardware.

Figure 1 shows the configuration.

Figure 1. Test configuration


This section provides details on the SAN connectivity and configuration. The data includes zoning and configuration for both arrays.

Defining the zones

Since HP-UX generates multiple busses for the multiple paths to which a device can be mapped, zoning was needed to reduce the total number of paths. Only one path should be used for tape devices as they are not supported in multipath configurations and sometimes cause problems with the backup application setup.

Since the current release of HP-UX is limited to 256 busses and multiple busses are generated for each path from a host port to an EVA port, bus exhaustion can occur if the EVA is not properly zoned. After zoning is introduced, it must be used in all cases for the paths to exist, which is why the XP12000 storage array was also zoned.

Twenty-four of the 32 Superdome HBA ports were zoned twice because the Superdome has 32 ports but needed to connect to 44 storage ports (32 XP, 8 ESL, and 4 EVA ports). Additionally, multiple paths from the Superdome were connected to the EVA. In total, 16 paths were zoned and the 32 Superdome ports connected to 56 storage connections. For the ports that were in multiple zones, those zones were from different groups (XP, tape, or EVA). The rx4640 server had six dual-port HBAs. Each of these ports was zoned twice, again each in a different group.

• Superdome to XP12000 storage array

Zones were established for each Superdome HBA port that was connected to an XP12000 storage array port.

• Superdome to ESL

Eight Superdome HBA ports were zoned with eight tape library interfaces. Selective presentation was used to make a one-to-one mapping.

• Superdome to EVA

Sixteen different Superdome HBA ports were mapped to the four EVA ports; four HBA ports to each EVA port.

• rx4640 server to XP12000 storage array

Eight rx4640 server ports were zoned to eight ports of the XP. The XP ports used for the rx4640 server were different from the XP ports used for the Superdome server.

• rx4640 server to ESL

Eight rx4640 server ports were zoned with all eight tape library interfaces. Selective presentation was used to make a one-to-one mapping.

• rx4640 server to EVA

Eight different rx4640 server HBA ports were mapped to the four EVA ports; two HBA ports to each EVA port.

• SMA to EVA

All EVA ports were zoned with the SMA ports.


Setting up the EVA5000

The EVA configuration consisted of one controller enclosure with two HSV110 controllers and eight disk enclosures, which were populated with fifty 250-GB FATA drives.

The EVA5000 presented four virtual disks, which were all from a single disk group. The disk group included all 50 available FATA disks and was configured for double disk failure protection. The virtual disks were presented to all host ports that were connected to any port of the EVA. HP StorageWorks Secure Path used the Least-Bandwidth policy to load-balance across all the active paths.

Each used host port was identified on the EVA. The OS type was set to HP-UX. The first two virtual disks from the EVA had Preferred Path/Mode set to Path A—Failover Only. The other two were set to Path B—Failover Only, which equally divided the load across the two controllers. On occasion, possibly because of a reboot, the devices would not be on the expected controller, and the Secure Path command to set the path would return this message:

Warning: LUN is part of a group, in which at least one other member is preferred to the other controller.

To reset the devices to the expected controller, the Preferred Path/Mode was set to No Preference, and then set back to the expected path, using either the Command View GUI or SSSU script options.

Setting up the XP12000 storage array

The XP configuration included the following:

• XP12000 Microcode: 50-04-06
• One DKC cabinet
• Two DKU cabinets
• Three-phase 208-VAC redundant power to each cabinet

All fiber connections were connected to two Brocade 24000 switches. All failover was from one switch to the other.

The array front-end configuration consisted of two 32-port fiber CHiP pairs for a total of 32 CHiP processors across both pairs. To achieve the required throughput, 32 fiber ports were used and each fiber port was dedicated to a CHiP processor. The two pairs of 32-port CHiP cards were dedicated to the Superdome database. There was also one 16-port pair of CHiP cards installed, which provided eight fiber connections to the backup node (rx4640 server) for the secondary Business Copy volumes.

The XP had a full configuration of four ACP pairs with 128 four-disk array groups divided evenly across the ACPs. These four-disk array groups were configured into 64 RAID groups of 7D+1P (seven data disks and one parity disk). Thirty-two of these RAID groups were dedicated to the production database and 32 were dedicated to the Business Copy. In the production RAID groups, each group had four OPEN-V ldevs of 191.9 GB in size. Each of these ldevs was presented as a LUN to the Superdome for a total of 128 LUNs. This provided a total of 24.7 TB of capacity for the Oracle database.

The XP12000 storage array presented 129 LUNs to the hosts. A total of 128 were allocated for Oracle LVM (Logical Volume Manager) usage and the other presented LUN was used to create a file system for the Oracle installation and application. On the Superdome, each LUN was presented through two paths allowing for multipath failover. On the rx4640 server, only one path was presented. Secure Path on the Superdome for the XP selected a preferred path that balanced the two switches and four PCI boxes.


For Business Copy, it is best to dedicate separate RAID groups to the secondary copies. In this case, 32 RAID groups were dedicated. The size and emulation of the ldevs for the secondary copies must be the same as the primary ldevs. Business Copies were made on the XP12000 storage array. The primary volumes were presented to the Superdome, and the secondary volumes were presented to the rx4640 server. When the copies were split, the secondary volumes were imported on the rx4640 server to create a copy of the LVM volume groups, and the /oracle directory was mounted there. One set of volume groups was active on the Superdome and the other set on the rx4640 server. Work on the Superdome continued while data was backed up using the rx4640 server.

It is important to account for the re-sync time required before a split of the Business Copy. If a significant number of writes occur from the host, there may be a delay ranging from minutes to hours between when the split is initiated and when the secondary volume is re-synchronized with the updates. (The length of the delay depends on several factors, including how fast the array can apply changes from the primary to the secondary volume; a large amount of activity increases it.) The XP will not split the Business Copy pairs until there is data consistency between the primary volume and the secondary volume.
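In practice, these split and re-sync operations are issued from the host with RAID Manager commands against a device group such as the one defined in Appendix B—Configuring RAID Manager. The following is a minimal sketch of that sequence, assuming the oradb device group from Appendix B and a running HORCM instance; it is illustrative rather than the exact command stream used in the tests:

# Split the Business Copy pairs once they reach consistency (the PVOLs keep serving I/O)
pairsplit -g oradb

# Wait until every pair in the group reports the split (PSUS) state before mounting the SVOLs
pairevtwait -g oradb -s psus -t 3600

# ... back up the secondary volumes from the rx4640 server ...

# Re-join the pairs and wait for the resynchronization to complete
pairresync -g oradb
pairevtwait -g oradb -s pair -t 7200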

Setting up the HP StorageWorks ESL E-Series 712e Tape Library

The HP StorageWorks ESL E-Series 712e Tape Library was configured with 16 Fibre Channel HP StorageWorks Ultrium 960 tape drives and the Enterprise Tape Library Architecture (ETLA). ETLA comprises Interface Controllers for the tape drives and robotics, and an Interface Manager for management of the library and ETLA.

Each of the four e2400-FC Interface Controllers has six Fibre Channel ports. Four ports are for the back-end tape devices, and the remaining two are for the SAN. All interfaces are 2 Gb/s. Each SAN port on the Interface Controllers was connected to a separate fabric to distribute the load evenly across fabrics and HBAs. A separate Interface Controller was used to connect the robotic device to the SAN; it required a SCSI connection from the robot and an FC connection to the SAN. Altogether, nine FC connections were made from the ESL to the SAN.

The tape library was managed from a dedicated SAN Management Appliance. HP StorageWorks Command View TL software can be installed on the same SAN Management Appliance used for HP StorageWorks Command View EVA.

Symantec NetBackup 6.0 was used as the backup application. No service or maintenance packages were installed. During testing, binary patches provided by Symantec were used to address uncovered issues (see Restore issues). In the NetBackup environment, the rx4640 server was the master server, the global catalog server, and the robotic control host. For clarification, the global catalog server manages the backup images and the media where the images reside. Since there can be only one controlling host per robotic device, NetBackup elects one of the hosts to be the robotic control host, which moves the media to the tape drives when backups or restores are activated. The Superdome was the NetBackup media server, which was responsible for writing data directly to the tape devices and operates under the control of the master server.


Configuring the software

See Appendix A—Bill of materials, p. 22, for the complete list of software. Table 1 lists the HP-UX kernel parameters that were modified for this testing. These settings produced the best results in a previous configuration. The default values are included for convenience.

Table 1. Altered kernel parameters

Tunable              Default         Used
max_async_ports      50              1024
max_thread_proc      256             1024
maxdsiz              1073741824      536870912
maxdsiz_64bit        4294967296      2147483648
maxfiles             2048            8192
maxfiles_lim         4096            16384
maxuprc              256             4096
nfile                65,536          200,000
ninode               8192            7500
nproc                4200            8192
npty                 60              200
nstrpty              60              200
scsi_max_qdepth      8               128
semvmx               32767           32768
shmmax               0x40000000      0x100000000

Managing processor interrupts

The Superdome includes multiple processor boards, or cells. Each board can house up to four CPUs, and each cell can connect to a PCI I/O shelf. Without intervention, the I/O for a particular HBA may be handled by a CPU on a processor board that is not connected to the local I/O shelf. In this case, the I/O must go through a slower, cell-interconnect path.

To circumvent this issue, HP-UX provides a command that allows an operator to manage the interrupt configuration of a system. The command was used to map the interrupts of the HBAs to the local cells and to evenly distribute the interrupts for that cell amongst the CPUs on the local processor board.

Verify that the cell to which an HBA is directly connected is the cell that handles the interrupts for that HBA. Performing this verification will keep the interrupts local to the cells. In this case, two interrupts were assigned to each CPU. This was not the default at system boot. The intctl command was used to migrate the interrupts to the desired CPU.

Following is an example of a command to migrate an interrupt (bmtic236 is the Superdome; the command moves interrupt 1 of the HBA at hardware path 0/0/6/1/1 to CPU 1):

root@bmtic236> intctl -M -H 0/0/6/1/1 -I 1 -c 1


Setting up Secure Path

Secure Path supports an active-active configuration and an active-passive configuration. The XP12000 storage array required the active-active software, and the EVA5000 required the active-passive software.

XP Array—Active-Active

On the XP12000 array, Secure Path provides the autopath command to display the device paths. In addition, the set_prefpath option was used to select the preferred path from the two paths available for each device. In this way, the paths selected were balanced across the HBAs and switches.

Figure 2 shows sample output of the autopath display. The “[PP]” indicates the selected preferred path.

Figure 2. Sample output of the autopath display

==================================================================
HPswsp Version : A.3.0F.00F.00F
==================================================================
Array WWN : 2A33
==================================================================
Lun WWN : 50_0-2A33-0000
Load Balancing Policy : Preferred Path
==================================================================
Device Path             Status
==================================================================
/dev/dsk/c3t0d0   [PP]  Active
/dev/dsk/c48t0d0        Active
==================================================================
Lun WWN : 50_0-2A33-0001
Load Balancing Policy : Preferred Path
==================================================================
Device Path             Status
==================================================================
/dev/dsk/c3t0d1         Active
/dev/dsk/c48t0d1  [PP]  Active
==================================================================
Lun WWN : 50_0-2A33-0003
Load Balancing Policy : Preferred Path
==================================================================
Device Path             Status
==================================================================
/dev/dsk/c3t0d2   [PP]  Active
/dev/dsk/c48t0d2        Active
==================================================================


EVA Array—Active-Passive

With the installation of the active-passive component, all of the multiple existing device special files (DSFs) were removed and new unified DSFs were generated.

For example, before the installation of Secure Path, each virtual disk resulted in the generation of multiple DSFs in the /dev/dsk and /dev/rdsk directories, such as /dev/dsk/c22t2d3. After installation, each device had a single new designation. The spmgr command is used to show and select the multiple paths that Secure Path can use for each virtual disk on the EVA.

Figure 3 shows sample output of the spmgr display.

Figure 3. Sample output of spmgr display

Server: bmtic236.cxo.cpqcorp.net    Report Created: Mon, Oct 31 08:38:07 2005
Command: spmgr display
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Storage:         5000-1FE1-5002-E1D0
Load Balance:    On      Balance Policy:    Least Bandwidth
Auto-restore:    Off     Path Verify:       On
Verify Interval: 30
HBAs:  fcd25 fcd23 fcd17 fcd27 fcd3  fcd1  fcd19 fcd5
       fcd7  fcd9  fcd21 fcd15 fcd13 fcd11 fcd29 fcd31
Controller:  P5849D5AAPC03F, Operational
             P5849D5AAPR05H, Operational
Devices:  c117t0d0  c117t0d1  c117t0d2  c117t0d3

  TGT/LUN   Device     WWLUN_ID                                  H/W_Path        #_Paths
   0/ 0     c117t0d0   6005-08B4-0001-25EC-0000-E000-0075-0000   255/255/0/0.0   16

  Controller        Path_Instance   HBA     Preferred?   Path_Status
  P5849D5AAPC03F                            YES
                    c186t0d1        fcd25   no           Active
                    c180t0d1        fcd17   no           Active
                    c164t0d1        fcd1    no           Active
                    c168t0d1        fcd5    no           Active
                    c172t0d1        fcd9    no           Active
                    c182t0d1        fcd21   no           Active
                    c176t0d1        fcd13   no           Active
                    c190t0d1        fcd29   no           Active

  Controller        Path_Instance   HBA     Preferred?   Path_Status
  P5849D5AAPR05H                            no
                    c185t0d1        fcd23   no           Standby
                    c187t0d1        fcd27   no           Standby
                    c166t0d1        fcd3    no           Standby
                    c181t0d1        fcd19   no           Standby
                    c170t0d1        fcd7    no           Standby
                    c177t0d1        fcd15   no           Standby
                    c174t0d1        fcd11   no           Standby
                    c191t0d1        fcd31   no           Standby

Setting up LVM

The Logical Volume Manager (LVM) is a subsystem for managing disk space. Two important requirements were as follows:

• Prove that LVM volume groups could support 24 TB of Oracle database files.
• Successfully execute backup and restore operations at the same time that a large Oracle database is active and online.


The LVM configuration consisted of 768 logical volumes, one for each of the Oracle database files, which were 32 GB each. There were also several additional files, such as the Oracle control files, system files, log files, and undo tablespace files. Within the constraints of LVM (a limit of 255 logical volumes per volume group), four volume groups were built, each holding 32 physical volumes. The physical extent size was determined by taking the smallest power of two greater than or equal to the VG size divided by the maximum number of physical extents (64 k), which gave 128 MB.
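As a worked illustration of that sizing rule (assuming the 32 physical volumes of roughly 192 GB each described in the XP12000 configuration): a volume group of about 32 x 192 GB, or roughly 6.1 TB, divided by 65,536 physical extents comes to roughly 96 MB per extent, and the smallest power of two at or above that value is 128 MB, the extent size that was used.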

To use the volumes split from the Business Copy as LVM volumes on the rx4640 server, vgexport was run on the Superdome to generate a map file for each of the volume groups. The vgimport command was then used to activate the volume groups on the rx4640 server; vgimport used both the map files and the list of devices in the volume groups.
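A minimal sketch of that sequence for one volume group follows; the map file name, the group-file minor number, and the step of copying the map file between hosts are illustrative assumptions rather than the exact steps recorded for the tests:

# On the Superdome: preview-export vg01 and write a sharable map file (the VG stays intact)
vgexport -p -s -m /tmp/vg01.map /dev/vg01

# Copy /tmp/vg01.map to the rx4640 server, then on the rx4640 server:
mkdir /dev/vg01
mknod /dev/vg01/group c 64 0x010000     # LVM group file; minor number must be unique on the host
vgimport -s -m /tmp/vg01.map /dev/vg01  # import the VG definition from the map file
vgchange -a y /dev/vg01                 # activate the imported volume group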

Setting up the file systems

Each of the four EVA virtual disks contained a VxFS file system, and one file system was used to back up each of the VGs. Using a large block size yielded the best performance; however, it is the least efficient for space utilization. If files greater than 2 GB in size are going to be used on a VxFS file system, the creation and mount commands need to use the largefiles option. The created file systems were tagged with their mount point, which is useful if device special files change or other uncommon situations occur.

Following is an example of creating a file system:

newfs -o largefiles -b 8192 /dev/rdsk/c117t0d0

The file systems were mounted with the options recommended in the document A Quantitative Comparison between Raw Devices and File Systems for implementing Oracle Databases, which can be found at:

http://www.oracle.com/technology/deploy/performance/pdf/TWP_Oracle_HP_files.pdf

Following is an example of a VxFS entry for /etc/fstab: (This entry is one line.)

/dev/dsk/c117t0d0 /bckp/vg01 vxfs largefiles,delaylog,nodatainlog,mincache=direct,convosync=direct 0 2
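Equivalently, a file system can be mounted by hand with the same options; the following sketch reuses the device and mount point from the fstab example above:

# Mount the VxFS file system with the recommended Oracle I/O options
mount -F vxfs -o largefiles,delaylog,nodatainlog,mincache=direct,convosync=direct /dev/dsk/c117t0d0 /bckp/vg01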

Working with Datagen

Datagen is a decision support (DSS) Oracle workload that originated from a real-world customer requirement. Datagen involves creating a large Oracle database and executing full table scans and table copies.

The Datagen workload is divided into two parts. In the first part, a large table is created and loaded using the SQL loader “direct load” option. The data is generated using a C program. When the table is completely loaded, full table scans are performed, which generates significant read I/O on the storage system. The second part of the Datagen workload involves creating another tablespace slightly larger than the first tablespace. In this case, the table is copied, which generates both read and write I/O on the storage system.

After creating the database, the Oracle init.ora parameters were tuned to generate best completion times. Refer to Table 2 and Table 3 for the specific Oracle parameters that were used during the testing. These are the optimal settings for this environment.


Table 2. Datagen-specific Oracle file settings

Quantity   Size     File                        Options
1                   Control
1          500m     Sys
1          1000m    Sysaux
1          8192m    Temp space                  Autoextend, Local Management
2          745m     Logs
256        8192m    Tablespace 1 data files
287        8192m    Tablespace 2 data files
1          512m     Undo tablespace             Autoextend, Local Management

Table 3. Datagen-specific Oracle parameters

Parameter                       Default        Used
sort_area_size                  65536          30,000,000
parallel_max_servers            15             2048
parallel_threads_per_cpu        2              8
db_block_size                   8192           32768
undo_management                 MANUAL         AUTO
dml_locks                       164            500
db_files                        200            800
Processes                       30             2048
db_cache_size                   8192           8192M, 16284M, 28672M, 57344M, 98304M, 114688M
log_buffer                      8,388,608      40,485,760
db_file_multiblock_read_count   8              2
shared_pool_size                84,000,000     1,000,000,000


Backup and restore with the ESL E-Series tape library

The major goal for the backups was to back up the Oracle databases presented to the Superdome server directly to the tape devices. During the backups, the database was taken offline to keep the data in a consistent state. A total of 777 raw volume files and one normal data directory, which housed some of the Oracle tools used for testing, were included in the backup. There was a total of approximately 23.11 TB to be backed up to tape. Data was sent to multiple tape drives in multiple streams to achieve high throughput. Since 64 streams were configured and there were only 16 drives, four streams were sent to the same tape drive. This method is known as multiplexing.

Setting up the ESL backup policy

• Multiple data streams—In the Policy Attributes, select the "Allow multiple data streams" option, which allows the backup to be split into multiple streams so that it can be sent to multiple devices simultaneously.

• Multiplexing—In the backup policy schedule, multiplexed stream writes to tape can be set. This improves tape drive efficiency by allowing a single tape device to be sent multiple data streams; in this case, four streams were sent. Select the Media Multiplexing checkbox and define a value.

• Setting backup streams—Files to be backed up were grouped using a NetBackup directive. The NEW_STREAM directive causes a new stream to be created when the "Allow multiple data streams" option is set. Adding this directive to the backup selections allows a group of file selections to be sent as one data stream from NetBackup. For this backup, 65 NEW_STREAM directives were set, each grouping 12 files into one stream: 64 streams for the Oracle logical volumes (such as /dev/vg01/roradf1_1) and one stream for the /oracle directory. A sketch of such a selections list follows this list.
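The beginning of such a backup selections list might look like the following; the grouping shown is illustrative, and the logical volume names simply follow the /dev/vg01/roradf1_n pattern used elsewhere in this paper:

NEW_STREAM
/dev/vg01/roradf1_1
/dev/vg01/roradf1_2
...
/dev/vg01/roradf1_12
NEW_STREAM
/dev/vg01/roradf1_13
...
NEW_STREAM
/oracle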

Setting up the storage unit

In NetBackup, the storage unit is the relationship between the host and its storage device. The storage unit is where the Superdome was selected as the host for the ESL E-Series 712e with its 16 LTO3 tape drives.

After the storage unit is created, select “Maximum concurrent write drives” and “Maximum streams per drive.” “Maximum concurrent write drives” tells the host how many drives may be used simultaneously, and “Maximum streams per drive” tells the host how many streams can be simultaneously sent to any one drive. Be sure that the maximum streams per drive setting is equal to or higher than the Media Multiplexing setting in the backup policy to make use of multiplexing. During the testing of this configuration, these values were both set to 16.

Note: This storage unit is created when using the Device Configuration wizard within NetBackup. HP recommends that the Device Configuration wizard always be used to create tape libraries and their associated devices.

Setting up the master server global attribute

In the master server host properties, the "Maximum Jobs Per Client" value was set to 65. This is the total number of active jobs that can run concurrently at any one time. Since 64 streams (plus the managing job) were running at the same time, this value had to be set to a minimum of 65.


Backup issues

During early testing, the NetBackup master server (rx4640 server) communication with the media server (Superdome) was timing out with media manager error 174. This error could be attributed to the large number of jobs that were active during the backups. Without the grouping described in the backup selections (see Setting up the ESL backup policy), there were as many as 700 active jobs for this backup, most of which would be queued until a drive became available. After configuring the job to keep the number of active jobs down, this error was avoided. Symantec was aware of this issue with other configurations and was planning to address it in a future patch.

Off-host backups

Off-host backups relieve the Oracle server of backup responsibilities by creating a Business Continuance Volume (BCV) and presenting it to a dedicated backup server, which moves the data to tape. In this case, a mirror (SVOL) of the production database on the Superdome was established on the XP, split from the mirror pair, and re-presented for backup. The SVOL was presented to the master server (rx4640 server) and mounted in the same format as the volumes on the Superdome (such as /dev/vg01/roradf1_1). Data was then backed up in the same manner as previously described; the host used for the backups (rx4640 server) was the only difference.

In this configuration, the Superdome and the rx4640 servers used the XP12000 storage array for the database primary volumes (PVOL) and copied volumes (SVOL). The Oracle load was applied to the Superdome’s PVOL at the same time that the rx4640 server was backing up the SVOL. This was done to determine if the two loads caused contention with respect to the volumes on the XP12000 storage array. The testing results showed no significant contention.

Note: When mounting the raw volumes on the master server (rx4640 server), the ownership of the files may be root/sys by default. This is important when restoring the files, as the ownership must be changed to oracle/dba when mounted for backup, or after the restore has completed. If this ownership change is not made, the database will fail to start.
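A sketch of correcting the ownership for one volume group follows; the device name pattern matches the one used elsewhere in this paper, and the exact list of files to change will differ:

# Give the Oracle user and DBA group ownership of the raw logical volume device files
chown oracle:dba /dev/vg01/roradf1_*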

Setting up restores

In all test cases, the Oracle server (the Superdome) was used to perform the restores of the data. Be sure to set the following options to ensure that the Oracle server completes the restores:

• Raw Partition Backups—Manually select Raw Partition Backups because the data that was backed up came from raw volumes.

• Redirected restores—For files backed up on one machine that are to be restored to another machine, a parameter must be set in the bp.conf file located in the /usr/openv/netbackup directory. The line to be added must be in the following format:

FORCE_RESTORE_MEDIA_SERVER = backup_server restore_server

where backup_server is the host name of the server that performed the backup and restore_server is the host name of the server that performs the restore.
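For example, to restore an image written by the rx4640 server through the Superdome, the entry might look like the following; the host names here are illustrative (bmtic236 is the Superdome as noted earlier, and rx4640-bkup stands in for the actual rx4640 host name):

FORCE_RESTORE_MEDIA_SERVER = rx4640-bkup bmtic236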


The restore went smoothly and efficiently, and NetBackup automated the process of de-multiplexing the media and mounting the required tapes. After the file type Raw Partition Backups was selected, the graphic interface displayed all of the files that had been backed up. All files were selected and the restore was started. The restore job quickly de-multiplexed the data and returned it to its original location.

Note: For the restore to be successful, the raw volume device files must exist on the server to which you are restoring.

Restore issues

With the unpatched NetBackup 6.0 code, one minor issue and one major issue occurred that did not allow the restores to complete. The minor issue happened when NetBackup was shut down unexpectedly to recover from log files that had filled the /usr file system, which left an empty *.ior file in the /usr/openv/var directory. The .ior files were moved to a different location, which caused new .ior files to be populated appropriately. Symantec was aware of this issue and, as a result of the HP discovery, provided a patch that fixed it.

During the restore testing, HP uncovered a major issue in which NetBackup tried to mount a piece of media and claimed that it was unexpected media or a piece of cleaning media, which put the mount requests into a pending state within the NetBackup Device Monitor. Symantec, again as a result of the HP discovery, released a patch to HP (special bptm, bpbkar, and tar binaries) for testing, which resolved the problem. All subsequent jobs completed without issue after the binary patch was installed.


Performance results

The testing effort successfully provided the following results:

• Successfully reduced downtime from tape backups to less than 8 hours.
• Using Business Copies, downtime was virtually eliminated.

EVA performance results

For these tests, data was transferred between the XP12000 storage array and the EVA5000 FATA drives, which provided nearline backups. The dd utility was used to obtain the I/O results; dd is a commonly used low-level UNIX® tool that does sequential I/O with a modifiable block size. An example of a dd command is:

dd if=/dev/vg03/roralog3 of=/bckp/vg03/roralog3 bs=1024k

Back up the Superdome to EVA

For this test, multiple dd processes were run in parallel, each reading the data from one logical volume and creating a corresponding file on the EVA file system. The copied table was not backed up; therefore, 4.3 TB of data was backed up.
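A minimal sketch of how such a parallel copy can be driven from the shell follows; the volume and file system names follow the patterns used elsewhere in this paper, and the actual scripts used for the tests are not reproduced here:

# Start one dd per logical volume in the background, copying each raw volume to the EVA file system
for lv in /dev/vg01/roradf1_*
do
    dd if=$lv of=/bckp/vg01/`basename $lv` bs=1024k &
done
wait    # proceed only after all of the background copies have finished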

Table 4. Backup Superdome to EVA

          Backup time            Superdome I/O   CPU (usr,sys,wio,idle)   EVA FS Util
Backup    8 hours, 37 minutes    285 MB/s        0,1,97,2                 4.3 TB
Restore   3 hours, 07 minutes    796 MB/s        0,1,99,0

Another test was performed where the data was read from the LVM volumes, compressed, and then written to the EVA file systems. In this case, both tables were backed up for a total of 8.5 TB of data.
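The paper does not state which compression tool was used; the following sketch assumes gzip simply to illustrate the kind of per-volume pipeline involved:

# Read a raw volume, compress the stream, and write the result to the EVA file system
dd if=/dev/vg01/roradf1_1 bs=1024k | gzip -1 > /bckp/vg01/roradf1_1.gz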

Table 5. Compressed data

          Backup time             Superdome I/O   CPU (usr,sys,wio,idle)   EVA FS Util
Backup    25 hours, 49 minutes    185 MB/s        81,19,0,0                5.9 TB
Restore   8 hours, 16 minutes     490 MB/s        58,42,0,0


Back up the XP12000 Business Copy to EVA

The following table displays the results of using dd to back up the XP Business Copy to the EVA. Note that the backup was of the Business Copy devices on the XP; these were first presented to the rx4640 server, where the logical volumes were imported, and then copied to the EVA. (During the test, the copied table was not included in the backup.) A total of 4.3 TB was backed up.

Table 6. Backup Business Copy to EVA

          Backup time            Superdome I/O   CPU (usr,sys,wio,idle)   EVA FS Util
Backup    8 hours, 20 minutes    288 MB/s        1,3,96,0                 4.3 TB

The following table shows the elapsed time results for the Datagen table copy when the pairs were split. Note that running the Oracle table copy workload had a minimal effect on the results.

Table 7. Datagen table copy

Oracle Load         Resync Factor   Elapsed times
None                85%             2 hours, 23 minutes
Oracle Table Copy   82%             3 hours, 34 minutes

The following table displays the results of the Oracle table scan before and during the Business Copy resync. As you can see, the load had a minimal effect on the elapsed times.

Table 8. Table scan impact

Table scan impact   Elapsed times
No active synch     21 minutes, 10 seconds
Active synch        21 minutes, 23 seconds


ESL E-Series backup and restore performance results

The following table summarizes the performance results using the 23.11-TB database.

Table 9. ESL backup and restore

                    Completion Time   Throughput                CPU Utilization
Superdome backup    8.37 hours        2.76 TB/hour (804 MB/s)   80%
rx4640 backup       11.38 hours       2.03 TB/hour (591 MB/s)   95%
Superdome restore   8.68 hours        2.66 TB/hour (775 MB/s)

Notes:
1. While the CPU utilization was very high on the rx4640 server, it did not cause any system problems. A larger server, with four CPUs instead of two, is recommended.
2. There were times when the rate was greater than 1 GB/s (1024 MB/s).


Best practices

HP used the test environment to define general best practices for using Oracle 10g with an XP12000 storage array.

Best practices for the XP12000 storage array

HP recommends the following two general practices:

• Be sure that you plan for growth and scalability.
• Replicate only the essential data.

The following best practices are specific to this configuration:

• Keep front-end CHiP processor utilization below 80%.
• Keep back-end ACP processor utilization below 75%.
• When designing for performance on a fiber port pair with a single CHiP processor, use one port for production data volumes and one for failover. For best performance, it is recommended to dedicate the processor I/O to a single fiber port.
• On large database configurations, spread the LUNs across RAID groups and ACPs.
• When planning for Business Copy splits for offline use, pay close attention to the I/O write load that is placed on the primary volumes.

Tuning ESL for best performance

Several settings were tuned to achieve the required performance. These settings should be tuned for the specific environment and, when possible, validated in a test environment first. The following is a list of the important tuning options used to achieve good performance during the backup to tape.

• Multiplex and multistream—Since there were so many files to be backed up, multiplexing and multistreaming were used to keep the number of active jobs to a manageable level within NetBackup, and to take advantage of all tape resources.

• Buffer configuration—There were three touch files that were used within NetBackup to achieve better levels of performance, with respect to the use of memory buffers. These files will typically be located in the /usr/openv/netbackup/db/config directory. A thorough document that explains buffer configuration for NetBackup can be found at http://seer.support.veritas.com/docs/183702.htm. Another recommended document regarding buffer configuration for LTO3 and NetBackup can be found at http://h71028.www7.hp.com/ERC/downloads/5982-9971EN.pdf.

Following are the settings that were used in the test environment (a sketch of putting them in place follows this list):

• NUMBER_DATA_BUFFERS—The number of buffers used by NetBackup to buffer data before sending it to the tape drives. The default value is 16; it was set to 32.
• SIZE_DATA_BUFFERS—The size of each data buffer in bytes; the total buffer memory is this value multiplied by NUMBER_DATA_BUFFERS. The default value is 65536; it was set to 262144.
• NUMBER_DATA_BUFFERS_RESTORE—The number of buffers used by NetBackup to buffer data before writing it to disk. The default value is 16; it was set to 32.
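A sketch of how those values can be put in place follows; the directory is the default location mentioned above, so adjust it if NetBackup is installed elsewhere:

# Create the NetBackup buffer touch files with the values used in this test
CFG=/usr/openv/netbackup/db/config
echo 32     > $CFG/NUMBER_DATA_BUFFERS
echo 262144 > $CFG/SIZE_DATA_BUFFERS
echo 32     > $CFG/NUMBER_DATA_BUFFERS_RESTORE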


Appendix A—Bill of materials

Oracle Server                                Quantity           Software/Firmware
Operating system                                                HP-UX 11i V2 May '05 update
Database                                                        Oracle 10g V10.1.0.4.0
Multi-path solution                                             HP StorageWorks Secure Path for HP-UX V3.0F
Backup software solution                                        Symantec NetBackup Enterprise Server 6.0
HP Integrity Superdome—32 server             1                  Cell F/W: Sys FW 2.50, PDHC 15.10; MP F/W 5.14; ED F/W 2.9; CLU F/W 15.2; PM, CIO F/W 15.0
1.5-GHz 6M CPUs                              16
RAM (GB)                                     64
12-slot PCI-X chassis                        4
PCI-X dual-channel 2-GB FC HBA (A6826A)      16 (4 per shelf)   Firmware version 3.03.150, Driver 1.42

Backup Server                                Quantity           Software/Firmware
Operating system                                                HP-UX 11i V2 May '05 update
Multi-path solution                                             HP StorageWorks Secure Path for HP-UX V3.0F
Backup software solution                                        Symantec NetBackup Enterprise Server 6.0
HP rx4640 server                             1                  MP E.03.13; BMC 03.47; EFI 03.10; System 03.11
1.5-GHz CPU                                  2
Memory (GB)                                  4
A6826A (dual-port HBA)                       6                  Firmware version 3.03.150, Driver 1.42

Storage—EVA5000                                 Quantity   Software/Firmware
EVA5000 (2C8D)                                  1          V3.020
250-GB FATA disk (ND25058238)                   50         HP01
HP StorageWorks SAN Director 2/128 switches     2          V4.4.0c
HP OpenView Storage Management Appliance III    1          V2.1


Storage—XP12000 array                        Quantity    Software/Firmware
XP12000 storage array                        1           Microcode 50-04-06
Disk control frame                           1           per Microcode
146-GB 10k rpm array group (4 disks)         128         per Microcode
146-GB 10k rpm spare disks                   6           per Microcode
Disk array frame                             2           per Microcode
High-performance FC-AL disk path             2           per Microcode
Standard-performance ACP pair                1           per Microcode
16-Port 1–2-Gbps FC SW CHIP pair             1           per Microcode
32-Port 1–2-Gbps FC SW CHIP pair             2           per Microcode
4-GB Cache Memory Module                     16          per Microcode
1-GB Shared Memory Module                    5           per Microcode
Business Copy XP                             1           per Microcode
Secure Path XP                               1           3.0F
HP StorageWorks Performance Advisor XP       1           2.1
HP StorageWorks RAID Manager XP              1           1.17.04
Fibre Channel cables                         As needed

Storage—ESL Tape Library                     Quantity    Software/Firmware
ESL E-Series 712e tape library                           f/w version 4.10
Ultrium 960 tape drives                      16          f/w version L26W
e2400-FC 2-GB Interface Controllers          4           5.6.78
Interface Manager                            1           I160
Ultrium 800-GB tape cartridge                62

Command View XP Management Station           Quantity    Software/Firmware
HP ProLiant DL380 (or other ProLiant)         1
Command View XP                               1           2.1
Command View TL                               1           1.6.00


Appendix B—Configuring RAID Manager

HP StorageWorks RAID Manager can be used to issue Business Copy commands from a host to the disk array. Business Copy allows you to create and maintain up to nine copies of data on the local disk array.

RAID Manager environment variables

Following are the environment variables that must be set in the shell environment for Business Copy operations. Use the export command to set them:

export HORCMINST=0
export HORCC_MRCF=1

All SCSI target IDs and LUN numbers must be obtained from your configuration. Do not use these examples. Use the raidscan command in RAID Manager to obtain your SCSI targets and LUN numbers.

Following is the HORCM0 config file. All of the PVOL volumes must be defined here.

HORCM0 CONFIG FILE

# Created by mkconf.sh on Mon Sep 12 08:48:40 MDT 2005
HORCM_MON
#ip_address      service   poll(10ms)   timeout(10ms)
16.112.14.236    horcm0    1000         3000

HORCM_CMD
#dev_name             dev_name              dev_name
#UnitID 0 (Serial# 10803)
/dev/rdsk/c162t15d7   /dev/rdsk/c163t15d7

HORCM_DEV
#dev_group   dev_name   port#   TargetID   LU#   MU#
#oradb       disk1      CL1-A   0          3
oradb        d1         CL1-A   0          3
oradb        d2         CL1-A   0          5
oradb        d3         CL1-A   0          2
oradb        d4         CL1-A   0          6
oradb        d5         CL1-A   1          0
oradb        d6         CL1-B   0          0

oradb        d125       CL4-G   0          7
oradb        d126       CL3-H   0          4
oradb        d127       CL3-H   0          5
oradb        d128       CL3-H   0          6
oradb        d129       CL3-H   0          7

HORCM_INST
#dev_group   ip_address      service
oradb        16.112.14.236   horcm1


Following is the HORCM1 config file. All of the SVOL volumes must be defined here.

HORCM1 CONFIG FILE

# Created by mkconf.sh on Mon Sep 12 08:48:40 MDT 2005
HORCM_MON
#ip_address      service   poll(10ms)   timeout(10ms)
16.112.14.236    horcm1    1000         3000

HORCM_CMD
#dev_name             dev_name              dev_name
#UnitID 0 (Serial# 10803)
/dev/rdsk/c162t15d7   /dev/rdsk/c163t15d7

HORCM_DEV
#dev_group   dev_name   port#   TargetID   LU#   MU#
#oradb       disk1      CL1-J   0          0
oradb        d1         CL1-J   0          0
oradb        d2         CL1-J   0          1
oradb        d3         CL1-J   0          2
oradb        d4         CL1-J   0          3
oradb        d5         CL1-J   0          4
oradb        d6         CL1-J   0          5

oradb        d125       CL4-L   1          3
oradb        d126       CL4-L   1          4
oradb        d127       CL4-L   1          5
oradb        d128       CL4-L   1          6
oradb        d129       CL4-L   1          7

HORCM_INST
#dev_group   ip_address      service
oradb        16.112.14.236   horcm0
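Once both configuration files are in place, the HORCM instances can be started and the pair status checked. The following is a minimal sketch, in which the instance numbers match the HORCMINST values above and oradb is the device group defined in the configuration files:

# Start the HORCM instances for the PVOL (0) and SVOL (1) definitions
horcmstart.sh 0 1

# Work as instance 0 and enable Business Copy (MRCF) commands in this shell
export HORCMINST=0
export HORCC_MRCF=1

# Display the status of every pair in the oradb device group
pairdisplay -g oradb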


For more information

HP

• HP StorageWorks product information

http://www.hp.com/country/us/eng/prodserv/storage.html

• HP-Oracle solutions http://www.hp.com/go/hpcft

• HP Integrity Superdome Servers http://www.hp.com/products1/servers/integrity/superdome_high_end/comparison.html

• HP Integrity rx4640 Servers http://www.hp.com/products1/servers/integrity/entry_level/rx4640/index.html

• HP-UX Operating System http://www.hp.com/products1/unix/operating/

• HP StorageWorks XP arrays

http://h18006.www1.hp.com/products/storageworks/enterprise/index.html

• HP StorageWorks Business Copy XP

http://www.hp.com/products1/storage/products/disk_arrays/xpstoragesw/business/

• HP StorageWorks ESL E-Series Tape Library

http://h18006.www1.hp.com/products/storageworks/esltapelibraries/index.html

Oracle

• For the latest Oracle product information:

http://www.oracle.com/products/index.html

• Oracle 10g Documentation Library http://www.oracle.com/technology/documentation/database10g.html

VERITAS/Symantec

• For the latest NetBackup product information:

http://www.veritas.com/Products/www?c=product&refId=2

© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Oracle is a registered US trademark of Oracle Corporation, Redwood City, California. UNIX is a registered trademark of The Open Group.

4AA0-4661ENW, Rev. 1, April 2006
