CMSO Server Patching Procedures - AWS


CMSO Patching Procedures for Lower Environment Servers
Version 56
August 5, 2021

CMSO Patching Procedures

CMSO Patching Procedures i Document Date: August 5, 2021

Revision History

Date        Revision  Description
1/18/2017   1.0       Original version
3/16/2017   2.0       Added ATLAS lower environment
8/10/2017   3.0       Added step for troubleshooting when /var is 100% to CMECF section
1/4/2018    4.0       Added alternative method for subscribing CMECF servers to CMSO’s CMECF patching channels received from David Agbomola
7/24/2018   5.0       Added section for BSA & populated with initial info from 7/24/18 call with BSA Team
9/17/2018   6.0       Added post-patching steps/communications
10/19/2018  7.0       Added section to add servers to BSA using ELLIS tool
10/25/2018  8.0       Formatted BSA section
11/1/2018   9.0       Added steps for automated patching of PPS Linux boxes including PACTS (added to PACTS section)
12/7/2018   10.0      - Added steps for updating IP/server name in BSA
                      - Added section to remove a server from BSA
1/17/2019   11.0      Updated “post patching” steps
1/22/2019   11.1      Updated the step “To send patching notifications to CMSO” to exclude filter “AO_Branch = NPHB”
2/14/2019   11.2      Under BSA, renamed section entitled “To send patching notifications to CMSO” as “To generate list of servers to be patched from BSA” and moved to its own section
3/26/2019   11.3      Added procedure to BSA for requesting access to BSA system
4/3/2019    12.0      - Added section for Atlassian & CMECF Wiki/Tools server patching
                      - Added notes to “PPS Lower Environment Linux” patching section about PACTS kernel lock
                      - Added instructions for starting databases on PACTS Informix grid node servers
4/10/2019   13        - Clarified section on starting databases on PACTS grid servers
                      - Updated ATLAS patching steps
7/24/2019   14        Added specific kernel lock info to PACTS and ATLAS sections
9/19/2019   15        Added possible causes of patching failure to automated patching section
9/30/2019   16        Removed old CMECF patching procedure that used satellite server. Started updating procedure, added a comment with links to instructions and other things that should be incorporated into the text or referenced
12/5/2019   17        Added CMSO SQL patching section
12/18/2019  18        Added “Creating the Monthly Patch Schedule” section
12/26/2019  19        Updated ansible server info – server name and IP
1/8/2020    20        - Updated list of packages that should be excluded from patching on Atlassian servers (from B. Meena 1/8/2020 IM)
                      - Added troubleshooting section and consolidated info from other sections
2/13/2020   21        Added start/stop instructions for Atlassian databases & app services
2/20/2020   22        Added detail to Troubleshooting section
2/21/2020   23        Created separate “pre-requisites” section under BSA
2/24/2020   24        Added step to Atlassian section to start Apache, how to enter cert passphrase if prompted
2/26/2020   25        Added detail to Atlassian patching
3/18/2020   26        Added check for bamboo remote agents after bamboo server is patched – Atlassian section
5/15/2020   27        Updated Troubleshooting section
6/12/2020   28        Revised ATLAS patching section
6/15/2020   29        Added section for “Other Lower Environment Linux Servers in the CMSO vRA/Cloud” and added vrautil01 & vrautil02 servers
7/9/2020    30        Edited Atlassian section
7/22/2020   31        Edited Atlassian section
7/23/2020   32        Atlassian patching – added step to take snapshot of servers prior to patching, added link to instructions for stopping Bamboo jobs, added step to verify Bamboo jobs have started, added note about what to do if server doesn’t come back up, added step to delete snapshots
9/25/2020   33        Added first info related to RHEL8: some yum-* packages are being deprecated in RHEL8, use “dnf” instead where applicable. Added to “Lower Environment Linux Servers” section and “Troubleshooting” section
9/28/2020   34        Updated CMECF “post-patching” steps – added distro list to use if there are issues patching and we need to notify Dev Teams of reboot
10/7/2020   35        Added small group notification to “After Patching” steps
10/22/2020  36        Updated Windows patching section
10/26/2020  37        Added patching prep section to Atlassian patching – snapshot Bamboo server on the Thursday before patching
11/4/2020   38        Added section on special considerations for the metacentertest server
11/19/2020  39        Added step to Atlassian section to take snapshot of the Bamboo server the Thursday before patching
12/23/2020  40        Revised CMECF patching procedure
1/11/2021   41        Updated BSA pre-requisite sections – BSA access & BSA documentation page/ELLIS access
1/25/2021   42        Tweaked CMECF patching section and re-formatted
1/28/2021   43        - Added steps for patching via Windows PowerShell
                      - Updated Atlassian Prod server patching exclusions
2/3/2021    44        Updated BSA – Adding servers to ELLIS section (added table for mapping Environment values from iTop to ELLIS)
3/2/2021    45        Tweaks to CMECF post-remediation distro lists
3/11/2021   46        - Updated patching exclusions list in Atlassian section
                      - Updated instructions for accessing ELLIS (in BSA section)
4/15/2021   47        Added Mac Pro section
4/19/2021   48        Added steps to PPS Linux patching about commenting out cron after patching
4/23/2021   49        Added actions to perform before and after patching the Mac Pro server
4/23/2021   50        Added Backup & Restoring section
5/17/2021   51        Added Troubleshooting section to Mac Pro section
6/4/2021    52        Added patch availability list, misc. minor edits
6/17/2021   53        Added instructions for requesting TSSA (formerly BSA) Console installation
6/22/2021   54        Added Cocker to patching exclusion list on automationajta server under Atlassian
7/16/2021   55        Minor tweaks to BMC Server Automation section under BSA section
7/19/2021   56        Updated patching exclusion info for Atlassian servers


Table of Contents

REVISION HISTORY
1 DEFINITIONS AND ACRONYMS
2 PATCHING WINDOW & NEW PATCH AVAILABILITY
3 REFERENCES
4 PRE-REQUISITES
5 RESTORES
6 CM/ECF LOWER ENVIRONMENT SERVER OS PATCHING
    PATCH SCHEDULE FOR CM/ECF SERVERS
    ADD SERVERS TO BSA AND PATCH SCHEDULE AND SEND PATCHING ANNOUNCEMENT
    PATCHING PREP
    PATCHING REMEDIATION
    AFTER REMEDIATION
    MANAGING SERVICES (OPTIONAL)
7 PPS LOWER ENVIRONMENT LINUX SERVERS IN THE CMSO VRA/CLOUD (EXCEPT ATLAS)
    BEFORE PATCHING
    PATCHING STEPS
        7.2 Automated Patching
        7.2 Manual Patching (Backup Method)
    AFTER PATCHING
    SPECIAL CONSIDERATIONS FOR VRAUTIL01 SERVER
    SPECIAL CONSIDERATIONS FOR VRAUTIL02 SERVER
    SPECIAL CONSIDERATIONS FOR METACENTERTEST SERVER
8 ATLAS LOWER ENVIRONMENT LINUX SERVERS IN THE CMSO VRA/CLOUD
    BEFORE PATCHING
    PATCHING STEPS
    AFTER PATCHING
9 LOWER ENVIRONMENT LINUX SERVERS HOSTED BY CTHO
10 TROUBLESHOOTING LINUX SERVERS
    TROUBLESHOOTING PACTS & ATLAS SERVERS
    TROUBLESHOOTING CMECF SERVERS
    TROUBLESHOOTING RHEL8 SERVERS
11 WINDOWS PATCHING IN THE LOWER ENVIRONMENTS
    ASSUMPTIONS
    PROCEDURE OVERVIEW
    PATCHING WINDOWS SERVERS VIA POWERSHELL UTILITY
    PATCHING WINDOWS SERVERS MANUALLY VIA WINDOWS UPDATE
    AFTER PATCHING
12 MAC PRO
    BEFORE PATCHING
    PATCHING STEPS
    AFTER PATCHING
    TROUBLESHOOTING
13 ATLASSIAN SERVERS
    PATCHING PREP
    STEPS
    TROUBLESHOOTING
14 SQL SERVER PATCHING
    BAIMS SQL SERVER PATCHING
    OTHER SQL SERVER PATCHING
15 PATCHING VIA BLADELOGIC SERVER AUTOMATION (BSA) PORTAL
    BSA POCS/CONTACT INFO
    URLS/APP
    PRE-REQUISITES
        15.3 Access to the servers to be patched
        15.3 Access to the BSA Portal
        15.3 Access to the BSA Console
        15.3 Access to the BMC Server Automation page (BSA documentation & ELLIS)
        15.3 Access to the CMSO Patch Schedule library
    ADDING SERVERS TO BSA
    UPDATE IP/SERVER NAME IN BSA
    REMOVE A SERVER FROM BSA
    TO CREATE THE MONTHLY PATCH SCHEDULE
    PATCHING VIA BSA


1 Definitions and Acronyms

The following terms are used in this document.

Term     Definition
CM/ECF   Case Management/Electronic Case Files
CTHO     Cloud Technology Hosting Office
PACTS    Probation and Pretrial Services Automated Case Tracking System
PPS      Probation and Pretrial Services

2 Patching Window & New Patch Availability

The patching window starts on the Wednesday following Patch Tuesday (the second Tuesday of the month) and ends on the second Sunday after Patch Tuesday. Patches become available in the enterprise repos per the schedule below:
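As an illustration of the window rule only (not of the repository availability schedule), the sketch below derives the window dates for a given month. The function names are hypothetical and the day-of-week arithmetic uses Sakamoto's method; year and month are assumed to be plain integers without leading zeros.

```shell
#!/bin/sh
# Day of week for a Gregorian date: 0=Sunday .. 6=Saturday (Sakamoto's method).
# Each function body runs in a subshell, so its variables stay local.
day_of_week() (
    y=$1; m=$2; d=$3
    case $m in
        1) t=0 ;; 2) t=3 ;; 3) t=2 ;; 4) t=5 ;; 5) t=0 ;; 6) t=3 ;;
        7) t=5 ;; 8) t=1 ;; 9) t=4 ;; 10) t=6 ;; 11) t=2 ;; 12) t=4 ;;
    esac
    if [ "$m" -lt 3 ]; then y=$((y - 1)); fi
    echo $(( (y + y/4 - y/100 + y/400 + t + d) % 7 ))
)

# Patch Tuesday is the second Tuesday, which always falls on day 8..14.
patch_tuesday() (
    y=$1; m=$2; d=8
    while [ "$(day_of_week "$y" "$m" "$d")" -ne 2 ]; do
        d=$((d + 1))
    done
    echo "$d"
)

# Window start = the Wednesday after Patch Tuesday (+1 day);
# window end   = the second Sunday after Patch Tuesday (+12 days).
patch_window() (
    y=$1; m=$2
    pt=$(patch_tuesday "$y" "$m")
    printf '%04d-%02d-%02d %04d-%02d-%02d\n' \
        "$y" "$m" $((pt + 1)) "$y" "$m" $((pt + 12))
)

patch_window 2020 10   # prints: 2020-10-14 2020-10-25
```

Because Patch Tuesday is never later than the 14th, both dates stay inside the same calendar month.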


3 References

The monthly Patch Schedules are posted on SharePoint at https://fedcourts.sharepoint.com/sites/xDPS-CMSO/Patch%20Schedules. The monthly Jira tracking ticket has a summary of the schedule and sub-tasks that contain instructions (example: ESPT-30912).

4 Pre-Requisites

1. A JIRA ticket must be created under the ESPT project for tracking purposes (e.g., ESPT-30912).

2. CMSO Security must validate the list of CMECF and PPS patches before the patches are deployed. For ATLAS servers, the ATLAS Team (not CMSO Security) must validate. Details are in the sections below.

3. A general communication must be sent to all relevant parties prior to Patch Week listing all servers to be patched, the patching window, and the system administrators assigned to each task (use the Patch Schedule at http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/_layouts/15/start.aspx#/Patch%20Schedules/Forms/AllItems.aspx). The Environment Team Lead will send the communication. Send the communication to the following:

   To: /DCA/AO/USCOURTS
   Cc: AOml_DPS-CMSO Office Contractors, AOml_DPS-CMSO Office Staff, AOTXml_CMECF_HelpDesk, AOTXml_Comm, AOTXml_HSD_CMHB, AOTXml_HSD_NPHB, AOTXml_SD_PX

4. A specific communication must be sent just prior to each patching window (e.g., Monday AM) listing the servers to be patched (see the Patch Schedule at http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/_layouts/15/start.aspx#/Patch%20Schedules/Forms/AllItems.aspx). Send the communication to the following:

   To: AOml_DPS-CMSO Office Contractors, AOml_DPS-CMSO Office Staff, AOTXml_CMECF_HelpDesk, AOTXml_Comm, AOTXml_HSD_CMHB, AOTXml_HSD_NPHB, AOTXml_SD_PX

   Also send to the PACTS-specific distro lists identified in the Patch Schedule for each group of servers (e.g., [email protected]).


5 Restores

The following applies to VMs and files/directories in the CMSO vRA/cloud. If a restore is needed, instructions for restoring individual files/directories using vRA self-service are on the Backup wiki page at https://wiki.opps.gtwy.dcn/pages/viewpage.action?pageId=570851492#BackingUp&RestoringvRAServers(UsingDellSystem)-Restoringindividualfiles/directories. Instructions for restoring a full VM are at https://wiki.opps.gtwy.dcn/pages/viewpage.action?pageId=570851492#BackingUp&RestoringvRAServers(UsingDellSystem)-RestoringawholeVM. After-hours support is available (see here). Other instructions and information about backups are on the Backup Wiki page. Training info, including a video demo, is here.


6 CM/ECF Lower Environment Server OS Patching

CMECF servers are patched via the enterprise BladeLogic (BSA) system. CHNO creates patch analysis jobs every month containing the packages to be deployed on the servers (one job for RHEL6 and one for RHEL7). CMECF servers are patched two months behind, so, for example, patching done in October 2020 would use the “RedHat 6 Analysis 20200825” job.

CMSO then runs a patch analysis operation to associate the job with the servers. The operation also identifies servers that failed to patch (or to patch fully) and need to be fixed prior to actual patching. A remediation operation is scheduled to do the actual patching and reboot. The patch analysis operation is then run again to verify the servers were patched. If a server has an issue, the cause is fixed and the remediation operation is run again, followed by the patch analysis operation, until the server shows as 100% successful.

Servers are added to BSA via the ELLIS portal. For more information and procedures related to BSA, see Section 15, Patching via BladeLogic Server Automation (BSA) Portal. More information on the patching process and procedure can be found in the monthly patching tickets in Jira (e.g., search on December 2020 monthly patching).
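The "two months behind" rule can be sketched as a small date calculation. The function name below is hypothetical; it returns only the YYYYMM prefix to look for in the analysis job name, since the day portion of the job name (25 in the example above) is set when the job is created and cannot be derived.

```shell
#!/bin/sh
# Given the year and month in which patching is performed, print the
# YYYYMM prefix of the patch analysis job to select (two months earlier).
# Arguments are plain integers without leading zeros.
analysis_job_month() (
    y=$1; m=$2
    m=$((m - 2))
    if [ "$m" -lt 1 ]; then
        # Wrap around into the previous year (January -> November, etc.)
        m=$((m + 12))
        y=$((y - 1))
    fi
    printf '%04d%02d\n' "$y" "$m"
)

analysis_job_month 2020 10   # prints: 202008
```

For October 2020 this gives 202008, which matches the “RedHat 6 Analysis 20200825” example.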

Patch Schedule for CM/ECF Servers

CM/ECF servers are patched on the 4th Saturday of the month. The Patch Schedules are posted on SharePoint at https://fedcourts.sharepoint.com/sites/xDPS-CMSO/Patch%20Schedules.
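When drafting the schedule, the 4th Saturday can be computed rather than read off a calendar. This is a minimal sketch with a hypothetical function name; it takes the year and month as plain integers and uses Sakamoto's day-of-week method.

```shell
#!/bin/sh
# Print the date of the 4th Saturday of the given month as YYYY-MM-DD.
fourth_saturday() (
    y=$1; m=$2
    case $m in
        1) t=0 ;; 2) t=3 ;; 3) t=2 ;; 4) t=5 ;; 5) t=0 ;; 6) t=3 ;;
        7) t=5 ;; 8) t=1 ;; 9) t=4 ;; 10) t=6 ;; 11) t=2 ;; 12) t=4 ;;
    esac
    yy=$y
    if [ "$m" -lt 3 ]; then yy=$((y - 1)); fi
    # Day of week of the 1st of the month: 0=Sunday .. 6=Saturday
    dow1=$(( (yy + yy/4 - yy/100 + yy/400 + t + 1) % 7 ))
    # Days from the 1st to the first Saturday, then add three more weeks
    first_sat=$(( 1 + (6 - dow1 + 7) % 7 ))
    printf '%04d-%02d-%02d\n' "$y" "$m" $((first_sat + 21))
)

fourth_saturday 2020 10   # prints: 2020-10-24
```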

Add Servers to BSA and Patch Schedule and Send Patching Announcement

1. Sys admin or Enviro Coordinator assigns a Jira ticket to the Safe Manager (primary), (backup) to add the server to the BSA system via the ELLIS portal. The Safe Manager assigns the server to the correct patching group.

2. Safe Manager re-assigns the ticket to the sys admin, who verifies that the server appears in the correct BSA smart group (CMSO CMECF RHEL6 or CMSO CMECF RHEL7).

3. Sys admin exports a report from the BSA system listing the patch groups and the servers in each group (RHEL6 and RHEL7) and copies it into the Patch Schedule.

4. Sys admin updates the Patch Schedule as needed (patch analysis job to be used, dates, Jira ticket) and posts it on SharePoint.

5. Sys admin sends the maintenance announcement on the Tuesday before patching.

Patching Prep

User setup pre-requisites:
• Access to the TSSA (BSA) system and other user setup pre-requisites per Section 15 (Pre-Requisites)
• User should have viewed the training and read the job aids referenced in Section 15
• User should have shadowed a sys admin at least once

1. Run patch analysis operation to associate this month’s patch analysis job to the servers.

a. Log in to BSA at https://bsaportal.ao.dcn using adu-a credentials (obtain password for your adu-a account from CyberArk).

b. Run the operation following the steps in the BSA Portal Analysis Operation Creation doc. When prompted to select the patch analysis job, remember that CMECF servers are patched two months behind; for example, patching done in October 2020 would use the “RedHat 6 Analysis 20200825” job. See example below of the path and sample analysis jobs to choose from.

c. Navigate to the CMSO smart groups that contain the CMSO CMECF servers. Go to Browse and follow the path below:

d. Select All CMSO Cloud:

e. Click on the checkmark next to the RHEL6 or RHEL7 group as appropriate:

f. Click Finish.

g. Your home page will be displayed. Click Run. The operation takes approximately 30 minutes.

h. When the operation is complete, click on View Results.

i. Check whether the operation passed, failed, or is missing patches (see instructions in the BSA Log Viewing doc).

2. Troubleshoot any servers that failed. The log in BSA will indicate whether the cause is a space issue, an unreachable server, or another error. See Section 10, Troubleshooting CMECF Servers. If you are still unable to fix the issue, reach out to the BSA Team for help.

Note: When troubleshooting, any reboots done outside the monthly patching maintenance window need to be coordinated with the Development Teams.

3. After fixing the issue, go back into BSA and run the patch analysis job again on that particular server.

4. After all the servers show as “pass” on the patch analysis operation, schedule the remediation operation. This operation applies the patches and reboots the servers, so be careful with the scheduling. Remediation operations should be set to start at 12:01 AM on the day of patching.

a. Click on “Actions” -> “Remediate All Patches For All Targets:”


b. The “Selected Deploy Template” screen comes up.

c. Select “Only Reboot at End.”


Patching Remediation

1. After the patching remediation operation is complete, run another patch analysis operation to verify patching was successful. See below for desired result (100% successful).

2. Add screenshot(s) of the results (including any failures/missing patches) to the main CMECF sub-task in the monthly master patching ticket in Jira. To find the monthly master patching ticket, search for “December 2020” or “December 2020 patching” (for example). Then look for the “CMECF server patching” sub-task (this is the main CMECF sub-task). Provide screenshots for both RHEL6 and RHEL7 and identify them.

3. If any servers failed or are missing patches, remediate and run the patch analysis operation again on the server(s) that failed. Repeat until each server shows as 100% successful or a determination is made that the server(s) will require additional work during business hours.

4. Add screenshot(s) of the results after remediation to the main CMECF sub-task in the monthly master patching ticket, covering both the servers that are 100% fixed and those still having issues. Do this for RHEL6 and RHEL7.


After Remediation

1. Reply to the original maintenance announcement that work is complete, and list any servers having issues that could not be fixed during the maintenance window. Note that we are looking into the issues and will update separately.

2. If there are servers having issues that need to be addressed on the next business day, then:

   a. Create sub-task(s) for the servers having issues under the main monthly patching ticket. Note the issue and fix for each server in the sub-task(s).

   b. Once the issues are fixed, schedule patching/reboot for 12:01 AM ET and notify the Dev Teams: email AOml_DPS-CMSO_CMECF_Scrum_Masters and AOml_CMECF_CM and state that patching/reboot is scheduled for 12:01 AM ET. Copy the Engineering Team and CMSO CMECF ISSO. If the server is cmkbdb.cmkb.aocms.gtwy.dcn, include the Test Team Lead (cmkbdb is the only vRA test automation server that has CMECF installed).

   c. The next morning, validate that patching and reboot were successful and reply to the small-group announcement that work is complete. If there are continuing issues, list the servers still having issues and return to Step 2b.

   d. Add screenshot(s) of the fixed servers showing as 100% successfully patched to the main CMECF sub-task in the monthly master patching ticket.

Managing Services (Optional)

After applying the patches and rebooting the server, verify the following services are running on the server:

1. ECF-live/test/dev service, depending on the application type (live/test/dev)
2. ECF-tomcat7/8 service
3. fastServe service
4. httpd service
5. ECF-beam-live/test/dev service
6. FTS-live service, if the service is required to run
7. oninit process, for the Informix database on the inside server
8. Check and make sure that RunSyncDocuments is turned off:
   # chkconfig --list RunSyncDocuments
   Even if it shows as off on reboot, run a stop command on the service:
   # service RunSyncDocuments stop
   Confirm that the cron entry is commented out:
   # cat /etc/cron.d/SyncDocumentsCron
   Then run this command across the db servers:
   # service RunPacerSendDaemon stop; service RunPacerSendDaemon.bap stop; service RunSyncDocuments stop; service RunSyncDocuments.bap stop
   Expected state:
   RunSyncDocuments.bap == disabled
   RunSyncDocuments == disabled

To find out whether a service is running, grep for the service name:
# ps -ef | grep -i <service name>
The output will show the status of the service. If the service is not running, start it:
# service <service name> start
For Informix, the following command starts the Informix process:
# oninit -v
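The per-service grep check can be repeated for several names in one pass. The helper below is a hypothetical sketch, not part of the documented procedure: it reads `ps -ef` style output on standard input and reports, for each name passed as an argument, whether a matching process line is present. Lines that contain "grep" are ignored so that the check does not match itself.

```shell
#!/bin/sh
# Report whether each named service appears in a process listing on stdin.
report_services() (
    snapshot=$(cat)
    for name in "$@"; do
        # -F: fixed string, -i: case-insensitive, -q: quiet (status only)
        if printf '%s\n' "$snapshot" | grep -v 'grep' | grep -qiF -- "$name"; then
            echo "$name: running"
        else
            echo "$name: NOT running"
        fi
    done
)

# Usage on a server (names taken from the list above):
#   ps -ef | report_services httpd oninit fastServe
```

Because matching is by substring, a short name such as `httpd` will also match any other command line that happens to contain it, so treat the output as a first check only.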


7 PPS Lower Environment Linux Servers in the CMSO vRA/Cloud (except ATLAS)

PACTS servers run on Linux.

Note: ATLAS has special patching considerations; see the separate section for ATLAS.

Before Patching

Perform the steps below the day before patching:

1. Log on to vrautil02.opps.gtwy.dcn as the ansibleadmin user.

2. Open cron jobs (crontab -e).

3. Navigate to the line that starts with “Patching script:”

4. Change the patching day on the first line (see red box below) to match the announced PPS Patch Schedule for that month. Start time should be 12:00 AM on the date of patching.

   Note: Normally patch day is the Saturday after Patch Tuesday but may be changed if, for example, CHNO plans to conduct maintenance on the vRA that day.

5. Go into “vra_pps_linux.patching.yml” (cron job that kicks off patching) and verify that it is still pointing to the correct patching script:

6. Check that the servers are reachable through the ansibleadmin user by running:
   ansible -s -m command -a 'yum check-update' 'vra_pps_linux'
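When reading the results of the reachability check, note that `yum check-update` exits with status 100 when updates are available, 0 when none are pending, and 1 on error. Ansible's command module treats any non-zero status as a failed task, so a host with pending updates may be reported as failed with rc=100 even though it was reached successfully. The helper below is a hypothetical sketch that turns the status into a readable message.

```shell
#!/bin/sh
# Translate the exit status of `yum check-update` into a short message.
describe_check_update() (
    case $1 in
        0)   echo "reachable, no updates pending" ;;
        100) echo "reachable, updates available" ;;
        *)   echo "check failed (rc=$1)" ;;
    esac
)

# Usage on a single host:
#   yum check-update >/dev/null 2>&1; describe_check_update $?
```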

Patching Steps

7.2 Automated Patching

The automated process was implemented in September 2017 and is the preferred method for patching PPS Linux servers including PACTS. Some servers are EXCLUDED from the automated process:

• Bamboo servers - excluded due to specific patching issues on these servers
• ATLAS servers - excluded due to additional steps that must be done for these servers (see Section 8, ATLAS Lower Environment Linux Servers in the CMSO vRA/Cloud)

PACTS servers are kernel locked due to incompatibility of the updated kernel with the current version of Zing used by PACTS. The script excludes the kernel, and on some newer PACTS lower-environment servers (those in the vRA), the kernel is locked in the config file.

• As of September 2019, PACTS Prod servers have been and continue to be kernel locked at 2.6.32-696.16.1.el6.x86_64 (RHEL 6.10).
• PACTS lower-environment servers try to match Prod; see iTop for specific servers/versions.

The patching script is located on the Enviro Team Ansible server: vrautil02.opps.gtwy.dcn (10.34.120.158).
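To confirm a kernel lock before a run, two quick checks may help. This is a sketch under assumptions: the text does not say which config file holds the lock, so the example assumes an `exclude=` line in /etc/yum.conf, and both function names are hypothetical.

```shell
#!/bin/sh
# Succeeds if the yum configuration text on stdin has an exclude= line
# that names a kernel pattern (for example "exclude=kernel*").
kernel_excluded() (
    grep -Eq '^[[:space:]]*exclude[[:space:]]*=.*kernel'
)

# Succeeds if the running kernel ($1) equals the expected locked version ($2).
kernel_matches() (
    [ "$1" = "$2" ]
)

# Usage (assumes the lock lives in /etc/yum.conf):
#   kernel_excluded < /etc/yum.conf && echo "kernel updates are excluded"
#   kernel_matches "$(uname -r)" 2.6.32-696.16.1.el6.x86_64 || echo "kernel differs from lock"
```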

1. The job will be triggered at 12:01 AM per the published Patch Schedule posted at http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/_layouts/15/start.aspx#/Patch%20Schedules/Forms/AllItems.aspx.

2. After completion, the server will be rebooted automatically. Note: Services should start. If there is an issue, it will show up during validation and should be fixed at that time.

3. Two log files will be created:

   • Patching log called “patching.DAY.log” located in the destination server’s /tmp directory
   • Ansible log called “patching.DAY.log” located on the Ansible server (10.34.120.158) under the /tmp directory

   Log files are identified by day (Thursday, Friday, etc.). Example: a patching batch run on Thursday will generate the logs below:
   Destination server: /tmp/patching.THU.log
   Ansible server vrautil02.opps.gtwy.dcn: /tmp/patching.THU.log

4. Go through the logs for all the PPS Linux servers and validate and remediate as needed.

   a. Verify the filesystems are mounted:
      # df -h
   b. Run cat /var/log/yum.log (shows status of patches that were applied).

If there are issues patching, see Section 10, Troubleshooting Linux Servers.
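For the validation step, the two helpers below sketch how to locate the day-stamped log and summarize /var/log/yum.log. Both function names are hypothetical. The summary assumes the usual yum.log layout, in which the fourth whitespace-separated field is the action (`Installed:`, `Updated:`, or `Erased:`).

```shell
#!/bin/sh
# Build the log path for the day a batch ran, e.g. "thu" -> /tmp/patching.THU.log
patch_log_path() (
    day=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    echo "/tmp/patching.$day.log"
)

# Count yum.log entries on stdin by action and print one line per action.
summarize_yum_log() (
    awk '$4 ~ /^(Installed|Updated|Erased):$/ { count[$4]++ }
         END { for (k in count) print k, count[k] }' | sort
)

# Usage on a patched server:
#   patch_log_path "$(LC_ALL=C date +%a)"
#   summarize_yum_log < /var/log/yum.log
```

The summary counts every entry in the file, so filter by date first (for example with grep on the month and day) when only the latest run matters.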

7.2 Manual Patching (Backup Method)

PPS Linux patching (except ATLAS) should use the automated procedure; however, the manual steps are provided below in case they are needed. For steps for patching ATLAS, see the separate section.

Note: PACTS servers are kernel locked until further notice. On some newer PACTS lower-environment servers (those in the vRA), the kernel is locked in the config file (see iTop).

1. Put the servers in the blackout/monitoring tool (Zabbix) and disable the monitoring so that alerts won’t be triggered when you reboot the servers.

2. Log in to the server scheduled to be patched.

3. Capture the output of the “df -h” command to record which file systems are mounted.
   # df -h

4. Check the services running on the server using the command below (adjust the pattern for the services on that server):
   # ps -ef | grep -iE 'jboss|informix|mysqld'

5. Check whether it is an Application server or DB server by listing the /opt directory:
   # cd /opt
   # ls -rlt (shows: Informix, jboss, zabbix, mysqld, etc.)

6. Stop the Application/DB services. If it is a BO Application Server (oppsbodev1, oppsbodev2, oppsbodev3, oppsbotest1, oppsbotest2), stop the application before the reboot with these commands:
   # cd /opt/BO3_1/bobje
   # ./tomcatshutdown.sh
   # ./stopservers
   # ./mysqlshutdown.sh

7. Once the services are stopped, run the command below to see the available packages to be applied.
   # yum check-update
   (yum update also shows the transaction list of packages to be applied.)

8. Verify that the packages related to the Application(s)/Database(s) are excluded as shown below:
   # yum update --exclude='zabbix*' --exclude='mysql*' --exclude='bo*' --exclude='postgres*' --exclude='zing*' --exclude='jboss*' --exclude='informix*' -y

   (Technically, the Application/DB related packages to be excluded should be provided by the App/DB team, because we cannot simply update all the packages available in the repository.)

9. The above command lists the packages to be updated. Because -y is given, yum proceeds without prompting; omit -y if you want to review the list and confirm it before the update starts.

10. Once the packages are installed successfully, reboot the server.
    # shutdown -Fr now

11. When the server comes back online, start the validation (see validation steps in the Automated Patching section above).

    BO Application Start:
    # cd /opt/BO3_1/bobje
    # ./mysqlstartup.sh
    # ./startservers
    # ./tomcatstartup.sh

If there are issues patching, see Section 10, Troubleshooting Linux Servers.
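Since the exclusion list differs by server, it can help to generate the command from a list of package prefixes rather than edit it by hand. The function below is a hypothetical sketch that only prints the command for review; it does not run yum.

```shell
#!/bin/sh
# Print a yum update command that excludes every package whose name starts
# with one of the given prefixes. The globs are quoted so the shell passes
# them to yum unchanged.
build_yum_update() (
    cmd="yum update"
    for prefix in "$@"; do
        cmd="$cmd --exclude='${prefix}*'"
    done
    echo "$cmd -y"
)

# Prefixes taken from the manual patching steps above
build_yum_update zabbix mysql bo postgres zing jboss informix
```

Printing the command first gives a chance to compare it with the exclusion list supplied by the App/DB team before anything is installed.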

After Patching

1. Validate that patching and reboot have been successful.

2. Reply back to the main (large-group) maintenance announcement that work is complete, and list any servers having issues that were not able to be fixed during the maintenance window. Note that we are looking into the issues and will update separately.

3. If there are servers having issues that need to be addressed on the next business day, then:

a. Create sub-task(s) for the servers having issues under the main monthly patching ticket. Note the issue and fix for each server in the sub-task(s).

b. Once the issues are fixed, schedule patching/reboot and reply back to the maintenance announcement, but only to the small group:

1. Remove all distro lists from the email.
2. Add the relevant Dev Team email distro list (see PPS POCs) and state when the patching/reboot is scheduled for. If there are test automation servers, include the Test Team as well.
3. Copy the Engineering Team.

c. The next morning, validate that patching and reboot were successful and reply back to the small-group announcement that work is complete. If there are continuing issues, list the servers having issues and return to Step 3b.

4. Turn off cron job.

a. Log on to vrautil02.opps.gtwy.dcn as ansibleadmin user.
b. Navigate to the line that starts with “Patching script” and comment out the cron job.

Note: It is important to do this step so that the cron job doesn’t kick off automatically on the same day next month.
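The cron step above can be sketched as a text filter. This is a minimal sketch under a stated assumption: the crontab contains a comment line beginning with "# Patching script" and the job to disable is the next non-blank line. The real layout on vrautil02 may differ, so check the output before installing it. The function name and the sample paths in the comments are hypothetical.

```shell
# Sketch only: reads crontab text on stdin and writes it to stdout with the
# first non-blank line after a "# Patching script" marker commented out.
# ASSUMPTION: the marker is a comment line and the job is the line after it.
comment_out_patching_job() {
    awk '
        /^# *Patching script/ { armed = 1; print; next }
        armed && NF > 0 {
            if ($0 !~ /^#/) $0 = "#" $0   # comment out only if still active
            armed = 0
        }
        { print }
    '
}

# Usage as ansibleadmin (review the middle step before installing it):
#   crontab -l > /tmp/cron.before
#   comment_out_patching_job < /tmp/cron.before > /tmp/cron.after
#   diff /tmp/cron.before /tmp/cron.after && crontab /tmp/cron.after
```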

Special considerations for vrautil01 server

vrautil01.opps.gtwy.dcn (10.34.120.22) is the CMSO Engineering Team tool server used for server build/provisioning. Exclude Python from patching.

Special considerations for vrautil02 server

vrautil02.opps.gtwy.dcn (10.34.120.158) is the CMSO Engineering Team tool server used for system administration tasks (account management, patching, etc.). Exclude Python from patching. Since this server is used to patch all the other servers, patch it manually after all the other servers.

Special considerations for metacentertest server

Metacenter is an application under the DSS Development Team. The Metacenter Test server, metacentertest.opps.gtwy.dcn (10.34.120.106), is patched manually. Services must be stopped prior to patching per the below instructions, then started again after patching/rebooting.

STOP SERVICES BEFORE PATCHING METACENTER:

Stop Tomcat:
cd /opt/metacenter/Tomcat85_MC/bin
sudo ./shutdown.sh

Stop Search and Content Server:
ps -ef|grep search | awk '{print $2}'
ps -ef|grep content | awk '{print $2}'
sudo kill -9 xxxxxx (where xxxxxx are the process IDs that display when the search and content commands above are run)

START SERVICES AFTER PATCHING/REBOOTING METACENTER:

Make sure the firewall is not running.

Start Search Server:
cd /opt/metacenter/metacenter_home/config/search_server/bin
sudo nohup ./runConsole.sh &
Verify: ps -ef|grep search

Start Content Server:
cd /opt/metacenter/metacenter_home/config/content_server/bin
sudo nohup ./runConsole.sh &
Verify: ps -ef|grep cont

Start Tomcat:
cd /opt/metacenter/Tomcat85_MC/bin
sudo ./startup.sh
Verify: ps -ef|grep -i tomcat

Validate that the Metacenter app is up and running:
http://metacentertest.opps.gtwy.dcn:8085/portal
http://metacentertest.opps.gtwy.dcn:8983/solr
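Note that the ps | grep | awk commands above also print the PID of the grep process itself, which no longer exists by the time kill runs. The sketch below filters ps -ef output so that only the matching service PIDs are listed. It is an illustration only; pids_matching is a hypothetical helper, and it assumes the standard ps -ef column layout with the command starting in column 8.

```shell
# Sketch only: prints the PID column of `ps -ef` output (read from stdin)
# for processes whose command line matches the given pattern, skipping the
# grep process itself. pids_matching is a hypothetical helper name.
pids_matching() {
    awk -v pat="$1" '
        $2 == "PID"  { next }                # skip the ps header line
        $8 == "grep" { next }                # skip the grep that ran the search
        {
            cmd = ""
            for (i = 8; i <= NF; i++) cmd = cmd " " $i
            if (cmd ~ pat) print $2
        }
    '
}

# Usage on the server (review the PIDs before killing anything):
#   ps -ef | pids_matching search
#   ps -ef | pids_matching content
```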


8 ATLAS Lower Environment Linux Servers in the CMSO vRA/Cloud

The ATLAS Linux Test servers are RHEL7 servers in the vRA and are patched manually:

• aoatlas-e-webt1.opps.gtwy.dcn 10.34.120.172
• aoatlas-e-wast1.opps.gtwy.dcn 10.34.120.174
• aoatlas-e-ldapt.opps.gtwy.dcn 10.34.120.175
• aoatlas-e-db2t1.opps.gtwy.dcn 10.34.120.180

The servers are patched the Friday of the first week of patching. Contact info for the ATLAS Tech Lead:

FYI: The Prod servers, which are managed and patched by CHNO, are release- and kernel-locked. To see the specific version, go into iTop and, under the “Server” configuration item, search on Project = ATLAS, Organization = CTHO, and Environment = Prod, or search on the word “kernel”. (However, it is best to check on the server directly.) Kernels are locked at 2.6.32-754.3.5.el6.x86_64 (RHEL 6.10, Santiago) or 2.6.32-754.6.3.

Before Patching

1. Obtain the list of available updates from Test server aoatlas-e-webt1.opps.gtwy.dcn 10.34.120.172:

a. Log on to the server.
b. Run "yum check-update".
c. Copy the output into Notepad.

2. By the Tuesday before Patch Week, email the file to (ATLAS Tech Lead) and copy (ATLAS PM), requesting that they validate the list of patches and enter a comment in the Jira ticket if any changes are needed. Provide a link to the Jira ticket.

3. Attach the Notepad file to the ticket.
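When capturing the update list in step 1, it helps to know that yum check-update signals its result through its exit status: 0 when nothing is pending, 100 when updates are available, and 1 on error. The sketch below turns that status into a readable message. The function name and the output file path in the usage comment are assumptions for illustration.

```shell
# Sketch only: translates the exit status of `yum check-update` into a
# short message. describe_check_update_status is a hypothetical helper.
describe_check_update_status() {
    case "$1" in
        0)   echo "no updates available" ;;
        100) echo "updates available" ;;
        *)   echo "yum reported an error" ;;
    esac
}

# Usage on the Test server (the output file name is an example only):
#   yum check-update > /tmp/atlas-updates.txt
#   describe_check_update_status $?
```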

Patching Steps

1. Putty into the server.
2. Run “yum update” excluding any packages indicated by the Development Team in the “Before Patching” step above.
3. Once patching is successful, reboot the server.
4. Verify that the server is up and running.

If there are issues patching, see Section 9 Linux Patching Troubleshooting.

After Patching

As ATLAS patching is combined with PPS patching, see Section 6.3 After Patching.


9 Lower Environment Linux Servers hosted by CTHO

Servers that are hosted by CHNO are patched by CHNO staff following their own procedures.

10 Troubleshooting Linux Servers

Troubleshooting PACTS & ATLAS Servers

If patching or reboot fails, it could be due to one of the following causes:

1. Space issues on the server or log is full (e.g., /var, audit/logd)
o Note: for audit/logd, fix by starting in single user mode, then go into audit/logd and clean it out (run /dev/null <specify file>)

2. ansibleadmin password has expired (expires every 60 days per CMSO Security requirement)

3. Cannot SSH
o Wrong fingerprint/corruption on SSH authorized key file – remove and add back
o If it’s a vRA server, try logging on to vRA console and troubleshoot

4. The repo is slow and there is a timeout issue
o Disable repos that aren’t needed

5. Server is down
o Could be decommissioned
o In single user mode (rare)

Failed patch: Log into the server as root, do a quick check to identify the errata and packages that failed to update, and do a manual yum update/install. As root, run

# yum update <package>

where <package> is the package you intend to update (e.g., firefox), or run

# yum update --skip-broken --exclude=ao*,kernel*,java*,httpd*,zabbix*

This command updates the system but excludes the ao- packages, kernel upgrades, httpd, Java, and Zabbix.

Failed reboot: Try bouncing the server from the vRA console. If that doesn’t work, log into the server as root and run a manual reboot:

# shutdown -r now
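Because a full /var is the first cause listed above, a quick pre-check before patching can save a failed run. The sketch below extracts the usage percentage from df -P output. The helper name and the 90% threshold in the usage comment are illustrative assumptions, not CMSO standards.

```shell
# Sketch only: reads `df -P <path>` output on stdin and prints the usage
# percentage of the filesystem without the % sign.
# usage_percent is a hypothetical helper name.
usage_percent() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Usage on a server (the 90 threshold is an example only):
#   pct="$(df -P /var | usage_percent)"
#   [ "$pct" -ge 90 ] && echo "/var is ${pct}% full; free space before patching"
```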

Troubleshooting CMECF Servers

CMECF servers are patched via BSA (see Section 12 Patching via BladeLogic Server Automation (BSA) Portal for details). Pre-checks can be done prior to patching to determine if patching might fail on a server and correct the issue in advance. Specifically, run the patch analysis job as part of prepping for patching. If the job fails, this will be indicated in the BSA log for each server. For those showing as failed, click on the log, and use the information that appears to troubleshoot issues (e.g., BSA agent isn’t running). More information on remediating patch analysis jobs is at https://chno.jdc.ao.dcn/Projects/BMCServerAutomation/Documentation/BSA%20Portal%20Patch%20Analysis%20Remediation%20Log%20Viewing.pdf

If patching or reboot fails, it could be due to one of the causes below.

1. Space issues on the server or log is full (e.g., /var, audit/logd)
o Note: Since CMECF patching is done via BSA, running the patch analysis job that is done as part of patching prep will catch space issues.

Example: If /var is 100% full, the ecf directory is a frequent cause on CMECF servers. Go to /var/log/tomcat7 and look for files by size:

ls -l | sort -k5nr | sed 20q

Then gzip any that are old and big. Leave the current files alone (catalina.out & cmecf.log), but the ones with dates are safe to zip.

2. Cannot SSH
o Wrong fingerprint/corruption on SSH authorized key file – remove and add back
o If it’s a vRA server, try logging on to vRA console and troubleshoot
o If it’s a hosted server, contact CHNO CMHB Hosting Branch - AOTXml_HSD_CMHB_LIN.

3. The repo is slow and there is a timeout issue
o Disable repos that aren’t needed

4. Server is down
o Could be decommissioned
o In single user mode (may happen if space is full, e.g., auditd. Log on in single user mode and free up space)

5. BSA agent isn’t running (rscd)
o Start the service

6. Issue with the BSA setup (e.g., not communicating)
o Run through the list of issues above. If there is still an issue, contact the BSA Team - (primary), (federal lead). Team distro list - AOTXml_HSD_BL_Admins

7. CMECF services didn’t start [in particular Apache/httpd, fastServ (thttpd), and cmecf-beam-live or cmecf-beam-test]
o Start the service
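The log cleanup described under cause 1 above can be sketched as a small helper that compresses old files while leaving the live logs alone. This is a minimal sketch under assumptions: the function name and the 7-day default age are illustrative, and "old" is judged by modification time rather than by the date in the file name.

```shell
# Sketch only: gzips regular files in a log directory that are older than
# N days, skipping the live logs (catalina.out, cmecf.log) and anything
# already compressed. compress_old_logs is a hypothetical helper name.
compress_old_logs() {
    local dir="$1"
    local days="${2:-7}"          # ASSUMPTION: 7 days is an example default
    find "$dir" -maxdepth 1 -type f \
        ! -name 'catalina.out' ! -name 'cmecf.log' ! -name '*.gz' \
        -mtime +"$days" -exec gzip {} +
}

# Usage as root on a CMECF server:
#   compress_old_logs /var/log/tomcat7 7
#   df -h /var
```

Selecting by modification time avoids touching a dated file that is still being written to on the day of patching.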

Troubleshooting RHEL8 Servers Some of the yum-* packages were deprecated in RHEL8. Use “dnf” instead of “yum” to install the required packages where applicable.
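For scripts that run against both RHEL7 and RHEL8 servers, one option is to detect which package manager is present rather than hard-coding it. The sketch below is an illustration only; pick_package_manager is a hypothetical helper and is not part of the CMSO patching scripts.

```shell
# Sketch only: prints "dnf" when it is installed, otherwise "yum",
# otherwise "none" with a non-zero status.
# pick_package_manager is a hypothetical helper name.
pick_package_manager() {
    if command -v dnf >/dev/null 2>&1; then
        echo dnf
    elif command -v yum >/dev/null 2>&1; then
        echo yum
    else
        echo none
        return 1
    fi
}

# Usage as root:
#   pm="$(pick_package_manager)" && "$pm" check-update
```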

11 Windows Patching in the Lower Environments This section outlines the Case Management Systems Office’s (CMSO) Application Support Division’s procedures for performing Windows patching in the lower environments (everything except Prod, Train, and Demo).

Assumptions

This document assumes the user has access to, and knowledge of, the following items:

• Access to Windows servers in the lower environment
• Administrative privileges to install Microsoft patches and updates in the lower environment
• Administrative privileges to reboot Windows servers in the lower environment

Procedure Overview All Windows servers in the CMSO VRA must download and install the latest critical and security updates provided by Microsoft on a monthly basis. The purpose of these updates is to prevent or fix problems, enhance the security of the computer, or improve the computer’s performance.


The monthly patch schedule announcement is sent out via email. The patching is performed on the Saturday after Patch Tuesday from 7:00AM to 3:00PM. Engineers do not need to send patch notifications out since Deborah usually does it days/weeks before the scheduled patch weekend. Once an engineer has completed patching on the scheduled day, the engineer must send a completion email, responding to the original patch schedule announcement and removing any distro lists except the two distro lists below (keep these):

AOml_DPS-CMSO Office Staff
AOml_DPS-CMSO Office Contractors

Currently Windows patching is done using a PowerShell tool, but this will change soon: the CMSO Engineering Team will start using the TSSA (BSA) system for patching.

Patching Windows Servers via PowerShell Utility

WUU2-master PowerShell utility: This is a PowerShell utility used for downloading and installing Windows updates.

Steps for patching using the utility:

1. RDP to Aoenv-e-windev1.tadu.dcn and log in with tadu-a account.
2. Run PowerShell as admin.
3. Type in cd E:\WUU2-master\ (or cd to the directory where the WUU2-master file is located).
4. Type in .\WUU.ps1 and hit ENTER (don’t forget the leading dot).
5. The PowerShell Windows Update Utility box will appear.
6. Save the servers in a .txt file (you can use the server IPs instead of the hostnames).
7. Go back to the PowerShell Windows Update Utility:

a. Click File.
b. Click Add computers from file.
c. Select the file that contains the list of servers.
d. WUU will connect to the servers in the list and check for updates. (Be patient, as this can take several minutes.)
e. Once it’s done, it will display the results in a sortable table, color-coded based on update status (green = up to date, orange = reboot required, red = updates needed).
f. Select the ones in orange (hold Ctrl to select multiple). Right click, “Restart Computer”.
g. For the ones in red, if the Available column doesn’t match the Downloaded column, right click them and select “Download Updates”.
h. Once all available updates are downloaded, select the servers and choose “Install Updates”.
i. Repeat the above steps until all servers are green and show “All updates installed”.

Patching Windows Servers Manually via Windows Update

The below steps should only be used if patching manually.

1. Before starting to patch, make sure a communication email has been sent as noted in Section 3 Pre-Requisites.
2. From the Administrator computer, launch Remote Desktop Connection.
3. Input the IP of the server you would like to patch and hit Connect.
4. The Windows Security prompt will ask for credentials. Input the password and hit OK.
5. Launch Windows Update by clicking Start, All Programs, Windows Update.
a. On Windows Server 2016 and up, select the Start button, and then go to Settings > Update & Security > Windows Update.
6. Windows Update will open. On the left-hand side of the pane, click Check for updates.
7. If new updates are found, a message will appear notifying you. Click Install Updates to download and install the updates.
8. Once new updates are installed, Windows Update will require that you reboot the computer. Do so by clicking Reboot Now.
9. Once the computer comes back up, log in to ensure that all patches were installed and configured correctly.
10. Once you are logged back in, run another Windows Update check to ensure that the computer has received the latest patches. If Windows is up to date, a message saying so will be displayed.


Important Note:

1. In some rare cases, you will find a server that is not able to download patches automatically. If this occurs, you will need to download the patches directly from Microsoft, drop the patches on the server, and install them. Specifically, on aoevtsd-e-app2.tadu.dcn, the Background Intelligent Transfer Service (BITS) is corrupt/jammed, causing Windows patching to fail. When patching this server, download the patches from Microsoft and install them manually. Reference ESPT-29477 regarding the specific issue.

2. Before patching, we need to take a snapshot of these two TFS servers (aotfsdev-e-app1.tadu.dcn 10.34.120.155 and aotfsdev-e-sql1.tadu.dcn 10.34.120.115). The snapshots can be deleted after 72 hours or after the DEV team has tested the server. After patching, reach out to Troy Munford for testing. Generally, we need to create a HEAT ticket for the vRA cloud team to take a snapshot of aotfsdev-e-sql1.tadu.dcn 10.34.120.115. Due to the size of the drives, the snapshot keeps failing on our end. (Example: ESPT-29354)

After Patching

Follow the steps in Section 6.3 After Patching.


12 Mac Pro

Overview

The Mac Pro (name: CMSOs-Mac-Pro) is patched the Monday (Tuesday if holiday) after Patch Tuesday between 7am-9am ET. Even if there are no patches to be installed, the Mac Pro is rebooted during this time to optimize performance. The Mac Pro is the build device for most of CMSO’s mobile applications. More information about the Mac Pro is in the Data Center Admin Guide.

Before Patching

Perform these steps before the day of patching:

1. Log on to CMSOs-Mac-Pro to check if there are any available updates to be installed:

a. Click on the Apple logo at the top left corner and click System Preferences.
b. Click Software Update.
c. Here you’ll find all the updates available to be installed.


Caution: As you can see above, there is an OS upgrade available to be installed. Make sure to confirm all OS upgrades with the Mac Pro users before installing. Currently, they don’t want to upgrade the current OS to MacOS Big Sur.

2. Confirm Data Center staff is available to stand by in case hands-on support is needed (e.g., manual machine re-start).

Contact information

Ashburn (ASHVA) On-site engineer, Site ID 1605
AT&T/AOUSC c/o Evoque
21571 Beaumeade Circle, Site ID 1605, Ashburn, VA 20147
Cell Phone: 202-706-0198
24x7 Tech Phone #: 571-799-9950
Email: (doesn’t seem to be using ATT email address)
AT&T staff hours: Mon-Fri, 8-5
After hours/weekend support: Submit ticket to NMF per the Data Center Admin Guide (see above). Provide Evoque ticket # which will be issued after submitting HEAT ticket. will interface with Evoque on-site staff.

3. Email the changes to be implemented (patching, Nessus upgrade, no patches just reboot, etc.) and ask if they’re available to test the Mac Pro for verification any time after 9:00AM.


Patching Steps

1. Log on to the system.
2. Go to the top left, click on the apple.
3. Click on System Preferences.
4. Click on Software Updates.
5. Click on More info.
6. Go to the list of patches and uncheck the ones you don’t wish to apply.

Note: Exclude anything that looks like it would patch the hard drive. Patching the hard drive could cause major issues.

7. Click Upgrade Now.
a. Do not perform OS upgrade without confirmation from

After Patching

1. Verify the server is up and running.
a. Make sure you can access and log into the server.
2. Email to verify systems are running and applications are working fine.
3. Close Jira ticket with comment that verification was received.

Troubleshooting

See the Data Center Admin Guide for more info on the Mac Pro including troubleshooting info.

13 Atlassian Servers Patching of Atlassian servers is generally done on the 4th Sunday of the month.

Note: Weekdays are not an option because Bamboo jobs for automated deployments run until 9 pm, and server backups start at about 11 pm.

Prior to patching, it is important to review the Atlassian Admin Guide, in particular sections on Key Links, Points of Contact (in particular the CHNO section), Bamboo, and Escalation Procedure. Instructions for restoring individual files/directories using vRA self-service are at https://wiki.opps.gtwy.dcn/pages/viewpage.action?pageId=570851492#BackingUp&RestoringvRAServers(UsingDellSystem)-Restoringindividualfiles/directories. See also Section 5 Backups and Restoring Servers and Files/Directories in the CMSO vra/Cloud.

Patching Prep

1. Send patching schedule (Environment Team Coordinator sends for all CMSO servers the week before Patch Tuesday).

2. Create Jira ticket to track the work and assign to Atlassian Administrator (Environment Team Coordinator will do this as part of setting up the master ticket for monthly patching).

3. Verify application is functioning as expected after lower environment servers are patched (generally done the 2nd Saturday of the month).

4. On the Thursday before patching, delete any snapshots of the servers scheduled to be patched. If an existing snapshot is less than three days old, contact the owner of the snapshot.

5. On the Thursday before patching, take a snapshot of the Bamboo, storage, and Nexus servers.

This needs to be done during business hours because they are large VMs and may need Cloud Team support to create space on the datastore to be able to take the snapshot.

If there is an issue taking a snapshot, submit a HEAT ticket to the Cloud Team and call the National Service Desk to escalate – see instructions at https://wiki.opps.gtwy.dcn/display/VRA/VRA+Portal#VRAPortal-CHNOCloudTeam(includesvRABackupsupport.

6. Send patching reminder to the CMSO groups below (do not include CHNO groups) the Friday before Atlassian patching. Copy the text out of the main monthly patch schedule sent earlier in the month by the Environment Coordinator.

AOml_DPS-CMSO Office Staff
AOml_DPS-CMSO Office Contractors

Steps

1. Email CMSO that maintenance is starting. Reply back to patching reminder that was sent the Friday before patching.

2. Atlassian Admin performs application sanity checks.

a. Jira
i. Log into the Jira application as administrator
ii. Administration → System → Integrity checker
iii. Select all → Check
iv. Click on option for application links and make sure all the applications are connected

b. Other apps
i. Log into the application as administrator
ii. Go to Troubleshooting & Support section
iii. Kick off the application’s troubleshooter.

3. Atlassian Admin stops all Bamboo jobs. See instructions in the Atlassian Admin Guide.

4. Atlassian Admin stops services on all the Atlassian application servers. Switch to “atlassian” user account (do NOT use root). However, Nexus and SVN server require root user to shut down services.

sudo su - atlassian
jira: /data/atlassian/jira/bin -> ./stop-jira.sh
bamboo: /data/atlassian/bamboo/bin -> ./stop-bamboo.sh
kill -9 (process id's)
bitbucket: /data/atlassian/bitbucket/5.9.0/bin -> ./stop-bitbucket.sh
Confluence (wiki): /data/atlassian/confluence/bin -> ./stop-confluence.sh
fisheye-crucible (codereview): /data/atlassian/fisheye-crucible/bin -> ./stop.sh
Nexus: switch to root -> sudo su -
/data/nexus/app/bin -> ./nexus stop
svn: switch to root -> sudo su -
service httpd stop
service httpd status – checks the status of http service

5. Atlassian Admin stops the five database nodes. Use root account.

The database servers are behind a firewall, so they can only be accessed via vrautil02.opps.gtwy.dcn using ansibleadmin:

[ansibleadmin@vrautil02 ~]$ ssh mysqlnode-01.opps.gtwy.dcn
[ansibleadmin@vrautil02 ~]$ ssh mysqlnode-02.opps.gtwy.dcn
[ansibleadmin@vrautil02 ~]$ ssh mysqlnode-03.opps.gtwy.dcn
[ansibleadmin@vrautil02 ~]$ ssh mysqlnode-04.opps.gtwy.dcn
[ansibleadmin@vrautil02 ~]$ ssh mysqlnode-05.opps.gtwy.dcn

become root -> sudo su -
service mysqld stop

6. Take snapshots of the servers. Note: Snapshot of the large servers (Bamboo, storage, Nexus) should have been taken the Thursday before patching as part of Section 11.1 Patching Prep.

Note: Until such time as space issues in the vRA are resolved, take snapshots on a rolling basis; that is, take a snapshot of the first server, patch the server, and if the server comes back online with no issues, delete the snapshot. Then move on to the next server.

7. Sys Admin patches servers as per normal procedure for PPS Linux boxes (i.e., use yum update) except for the special considerations below:

a. Exclude the following package from patching on server as indicated (- yum update -x):

kernel* on bamboo.opps.gtwy.dcn
docker* on automationajta.aocms.gtwy.dcn

Note: 7/16/2021 Kernel lock implemented on bamboo server via yum.conf and version lock file; locked at v3.10.0-693.5.2. The docker exclusion on automationajta is in place via yum.conf.

b. Reboot servers in the following sequence. Make sure each server comes back up completely before proceeding to the next server:

storage.opps.gtwy.dcn
mysqlnode-01.opps.gtwy.dcn
mysqlnode-03.opps.gtwy.dcn
mysqlnode-04.opps.gtwy.dcn
mysqlnode-05.opps.gtwy.dcn

c. On the SQL nodes make sure the shared file system from storage.opps.gtwy.dcn is mounted.

8. Atlassian Admin starts the databases. Use root account.

Run “systemctl status mysqld” to see whether the mysql node is running. If so, proceed to next step. If not, then run on all mysql servers 01,02,03,04,05:

become root -> sudo su -
systemctl start mysqld

9. Atlassian Admin starts services on the application servers. Switch to “atlassian” user account (do NOT use root). However, Nexus and SVN server require root user to start services. Startup commands are below.

sudo su - atlassian
jira: /data/atlassian/jira/bin -> ./start-jira.sh
bamboo: /data/atlassian/bamboo/bin -> ./start-bamboo.sh
bitbucket: /data/atlassian/bitbucket/5.9.0/bin -> ./start-bitbucket.sh
Confluence (wiki): /data/atlassian/confluence/bin -> ./start-confluence.sh
fisheye-crucible (codereview): /data/atlassian/fisheye-crucible/bin -> ./start.sh

Nexus: switch to root -> sudo su -
/data/nexus/app/bin -> ./nexus start
On nexus.opps.gtwy.dcn, make sure the nexus process is actively running: Run “#service nexus status”


Note: This check is done because PACTS deployments to lower environments and ESB deployments to all environments including Production are dependent on Nexus. The Nexus server has a startup script to make sure that http://usppsappdev-nexus.opps.gtwy.dcn:8081/nexus/index.html is accessible.

svn: switch to root -> sudo su -
service httpd start
service httpd status – checks the status of http service

10. Atlassian Admin starts Apache on all the application servers.

a. Run systemctl restart httpd

b. If prompted for certificate passphrase, run command “openssl req -in file.csr -noout -text” and look for “challenge passphrase.”

Note: The challenge passphrase is in the CSR file which is located in /etc/pki/tls/certs on each server (e.g., /etc/pki/tls/certs/jira.opps.gtwy.dcn.csr).

Note: You should only get the challenge on the Jira & Wiki servers.

11. Atlassian Admin performs sanity checks:

a. Jira:
i. Log into the Jira application as administrator.
ii. Administration → System → Integrity checker (left panel)
iii. Select all → Check.
iv. Click on option for application links and make sure all the applications are connected.

b. Other applications:
i. Log into the application as administrator.
ii. Go to Troubleshooting & Support section.
iii. Kick off the application’s troubleshooter.

c. Verify Bamboo remote agents are running:
i. Go to https://bamboo.opps.gtwy.dcn/agent/viewAgents.action > Online remote agents tab, and verify that 25 remote agents are online and of these the three agents on cm9adb are disabled.
ii. If needed, follow the steps in the Bamboo section of the Atlassian Admin Guide to re-start the Linux and Windows agents below which are known to have issues re-starting:
1. Linux – agents on pv4adb-dev server
2. Windows - aoenv-e-winbld1.tadu.dcn, Remote Agent 041, and Remote Agent 042

d. Verify Bamboo jobs have started.

The jobs that run on the weekend are automated and should start on their own once the Bamboo service and remote agents have started. To check that jobs have started, go to the remote agent status page above and look on the “Online remote agents” tab, or see instructions in the Atlassian Admin Guide.

12. Reply back to maintenance announcement that work is complete and resolve Jira ticket with comment “patching complete.” If there is a continuing issue with a server that will go beyond the maintenance window, state in the announcement that work is complete except for <list server(s)> and that we are looking into it.

a. If escalation is needed, start escalation procedure.
b. Reply back to maintenance announcement when issue is fixed.
c. Enter issue, fix, and root cause in the patching Jira ticket.

13. Sys admin deletes the snapshots of the Bamboo, Nexus, and storage servers by COB of the Monday after patching. The snapshots of all the other servers should have been deleted on a rolling basis during patching.
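Step 7b above requires each server to be fully back before the next one is rebooted. The sketch below shows one way to wait for a single host. It is a minimal sketch, not the team's tooling: wait_until_up and ssh_port_open are hypothetical helpers, the retry defaults are examples, and the SSH check assumes nc is installed on vrautil02. An answering SSH port does not prove that services or the shared file system are ready, so the manual checks in steps 7c and 8 still apply.

```shell
# Sketch only: polls a host with the given check command until it succeeds
# or the retry limit is reached. MAX_TRIES and RETRY_SECONDS are optional
# environment overrides; their defaults here are illustrative.
wait_until_up() {
    local check="$1" host="$2" tries=0
    until "$check" "$host"; do
        tries=$((tries + 1))
        [ "$tries" -ge "${MAX_TRIES:-60}" ] && return 1
        sleep "${RETRY_SECONDS:-10}"
    done
    echo "$host is back"
}

# Example check: succeeds once the SSH port answers (assumes nc is installed)
ssh_port_open() { nc -z -w 5 "$1" 22; }

# Usage from vrautil02, after issuing the reboot and allowing the host time
# to go down first:
#   wait_until_up ssh_port_open storage.opps.gtwy.dcn || echo "check vRA console"
```

Passing the check as an argument keeps the waiting logic separate from how "up" is defined, so a stricter test (for example, one that also confirms mysqld is active) can be swapped in later.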

Troubleshooting

Top issues encountered during Atlassian patching & fixes:

• Package dependencies – Fix: search for dependent packages in the repo & apply
• Shared file system issue with storage.opps.gtwy.dcn (Atlassian storage server) – Fix based on nature of the issue
• When Atlassian Admin does sanity checks, the app appears to be up & running but some process is holding it – Fix: kill the zombie process and re-start the application
• Failed to resolve dependency for xyz package (dependency resolution conflict) – see ticket below for examples and fixes and general way to troubleshoot this kind of issue:
• bamboo.opps.gtwy.dcn - ESPT-25174
• bambooagent-01.opps.gtwy.dcn - ESPT-25174

Note: If a server is powered off or having trouble starting, you cannot watch it reboot from the vRA console. If the server can’t be restarted, restore to snapshot and work with Environment team leads to schedule maintenance during business hours when the Cloud Team is available.

14 SQL Server Patching CMSO is responsible for patching SQL Server database software on its systems in the vRA.

BAIMS SQL Server Patching

1. The monthly patching announcement is sent the 2nd Tuesday of the month (Patch Tuesday). A sample announcement email including groups it should be sent to is included below.

2. Servers are patched the Sunday following “Patch Tuesday” of each month.

3. When patching is completed, the same email groups are notified and asked to verify the application is working as expected.

From:
Sent: Tuesday, November 12, 2019 3:12 PM
To: AOml_BAIMS_AdminSupport; AOml_DPS-CMSO Office Contractors; AOml_DPS-CMSO Office Staff
Cc: AOml_DPS-CMSO-ENV-DBTeam
Subject: Scheduled SQL Server Patching of BAIMS Non-Production Systems Sunday, November 17 (CMSO)

This is a reminder for the SQL patching of non-production servers taking place 11/17/2019.

Action: Installation of SQL Server Cumulative Update 10 for Service Pack 2 (KB4524334)

a) Environment(s): Non - Production MS SQL Servers.
b) Who:
c) Date/Time: Sunday, November 17, 2019 between 6 AM to 9 AM EST
d) Outage: Yes

aobaimdev-e-db.tadu.dcn (10.34.120.137)
aobaimtst-e-db.tadu.dcn (10.34.120.151)
aobaimstg-e-db.tadu.dcn (10.34.120.156)


e) Impact: Servers will be unavailable while systems reboot. Please ensure someone on your team verifies that all systems are back up and running normally once the patching has been completed. A confirmation of successful patches and updates is not always indicative of a healthy server.

Other SQL Server Patching TBD


15 Patching via BladeLogic Server Automation (BSA) Portal Currently only CMSO CMECF servers in the vRA are patched via BSA. CMSO is responsible for patching the servers. PPS Linux and Windows servers are expected to be added in 2021.

BSA POCs/Contact info

• POCs - (Team Lead, primary for user access to BSA system),
• (primary for troubleshooting server connection/patching issues),
• Distro list - AOTXml_HSD_CMHB_BL_Admins
• On-call POC for day-of patching – 210-428-2475

URLs/App

As of June 2021, we are patching CMECF and PPS Linux servers via the Console rather than the Portal (web-based client) as it has significantly more flexibility and functionality to patch and manage servers. We still use the Portal to obtain the rollup report as it is preferred by management.

• BSA Portal - https://bsaportal.ao.dcn/dcaportal/site/dcaportal.login.loginpage
• BSA Console – Go to Search bar on your laptop and type “automation console” (the Console is installed locally).
• BMC Server Automation SharePoint site - use to access ELLIS (tool for adding servers to BSA) and user documentation - https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation See below for instructions to request access.

Pre-requisites Below user setup pre-requisites are needed to patch servers via BSA and to add/remove servers.

15.3 Access to the servers to be patched Submit a JIRA ticket to the CMSO Engineering Team. Use Project = ESPT, Issue Type = Account Administration.

15.3 Access to the BSA Portal

1. Log into the HEAT Service Portal at https://nsms.ao.dcn/HEAT/Default.aspx using your network credentials.
2. Click on Service Catalog in the menu bar.
3. Enter “BMC” in the search field.
4. Select the “BMC BladeLogic Server Automation (BSA) Access Request” ticket.
5. Some data will auto-populate.
6. For “BSA Role Name,” enter “CMSO_Linux_Admin” or “CMSO_Windows_Admin” as appropriate.
7. Submit ticket.
8. Once access is granted, verify your access.

If you have issues, email the BSA Team – see POCs section above.

15.3 Access to the BSA Console

Pre-requisite: You will need the full computer name of your workstation (Start > right-click This PC > More > Properties).

1. Email the request to [email protected], copy (BSA Team Lead). Include the info below:

Justification: I will be using TSSA to patch and manage CMSO servers in our vRA/cloud.
Username: adu\<network login username> (e.g., adu\)
Workstation: <full computer name of your workstation> (e.g., usc.ao.dcn)

2. Download the TSSA installer from https://aoenv-e-windev1.tadu.dcn:444/TSSA/TSSA%20Console/TSSACONSOLE89-SP4-P3-WIN64.exe and save it to your temp folder – e.g., C:\Temp\TSSACONSOLE89-SP4-P3-WIN64.exe

3. One of the BSA Admins will be in touch to do a screen share to configure the app. If you don’t hear from someone, email AOTXml_HSD_CMHB_BL_Admins.
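
The three request fields in step 1 can be assembled with a short script. This is only a sketch, not part of the BSA Team's process: the function name and the sample values in the usage note are hypothetical, and it assumes that socket.getfqdn() returns the same full computer name shown under This PC > Properties, which should be confirmed before sending.

```python
import socket
from typing import Optional

# Justification text taken from the request template above.
JUSTIFICATION = (
    "I will be using TSSA to patch and manage CMSO servers in our vRA/cloud."
)


def build_console_access_request(username: str,
                                 workstation: Optional[str] = None) -> str:
    """Return the three fields the access request email should include.

    username is the network login name without the domain prefix.
    workstation is the full computer name; when omitted, socket.getfqdn()
    is used as a best guess (an assumption, verify it manually).
    """
    if workstation is None:
        workstation = socket.getfqdn()
    return (
        f"Justification: {JUSTIFICATION}\n"
        f"Username: adu\\{username}\n"
        f"Workstation: {workstation}\n"
    )
```

For example, build_console_access_request("jdoe", "pc01.example.dcn") returns text that can be pasted into the email body; both arguments here are made-up placeholders.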

15.3 Access to the BMC Server Automation page (BSA documentation & ELLIS)

The BMC Server Automation page has links to job aids and other information. ELLIS is the system used to add/remove servers from BSA via an Excel spreadsheet front end.

To obtain access:

1. Email the BSA Team at “AOTXml_HSD_CMHB_BL_Admins” requesting access to the https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation page, copy Team Lead. State that it is for purposes of patching CMSO servers and adding/removing servers from BSA.

2. Once access is granted, verify access by going to https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation and logging in using your network credentials.

3. Verify ability to reach documentation via the links on the right side of the page as shown in the screenshot below.

4. Verify ability to get to the ELLIS spreadsheet/system highlighted in yellow below.

a. Click on ELLIS_v12.xlsm link to launch the spreadsheet.

b. Click on ‘Enable content’

c. If a Run-time 1004 error appears, go to the ELLIS How-to Guide and follow the steps in the “Run-time error 1004” section to fix the issue.

d. Relaunch the spreadsheet. The BSA login page will appear. Log in with your network credentials.

e. If prompted to enter role, select appropriate one (select CMSO admin if that’s an option).

If you have an issue, email the BSA Team – see POCs section.

15.3 Access to the CMSO Patch Schedule library

Sys admins need Edit access to the Patch Schedule library to be able to view and update patch schedules. Email Team Lead and ask to be given Edit rights on the CMSO Zone > Patch Schedules library for purposes of updating patch schedules. Access the library as follows:

1. Go into MS Teams.
2. Go to the xDPS-CMSO-ENVMgmt Team (see screenshot below).
3. Go to the Environment Support channel (see screenshot below).
4. Select “Patch Schedules” from the menu bar at the top of the page (if you don’t see it, use the “more” drop-down menu - see screenshot below).
5. Click on the schedule you wish to edit or click on the ellipsis (…) and select “Open in SharePoint” to see all schedules for current and prior years organized by year.

Note: All CMSO users and some CHNO sys admins have read-only access to the library.

Adding servers to BSA

1. Go to BSA documentation page - https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation
2. Click on ELLIS_v10.xlsm link to launch the spreadsheet.
3. Click on ‘Enable content’
4. If a Run-time 1004 error appears, go to the ELLIS How-to Guide and follow the steps in the “Run-time error 1004” section to fix the issue.
5. Relaunch the spreadsheet. The BSA login page will appear. Log in with your network credentials.
6. If prompted to enter role, select appropriate one (select CMSO admin if that’s an option).
7. Populate the data – sample is below (check with Team Lead on whether values need to be updated)
http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/TeamDocuments/ELLIS_v10_7_16_2018.xlsm
Sample BSA server smart group – http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/TeamDocuments/BSA%20Server%20Smart%20Group%20-%20CMECF%20Servers_CMSO_SG_to_add_list.xlsx
Use the values below for the fields listed:
o Program – Primary application running on the server (CMECF, PACTS, DSS, etc.)
o Service name – PO Cloud
o Service team – CMSO (i.e., who’s responsible for managing/patching server)
o Org code – DPS_CMSO
o For the Environment field, use the table below to map values from iTop to ELLIS:

Environment in iTop    Environment in ELLIS
CMSO-Prod              Production
Demo                   Demo
Dev                    Development
Integration            Development
Preprod                Test
Prod                   Production
Sandbox                Development
Stage                  Stage
Test                   Test
Test Automation        Test
Train                  Train

8. After populating the values, click on ‘Export Server List’ at the top of the spreadsheet. Then, select one of the radio buttons to ‘Add new servers to BSA’ or ‘Update existing servers in BSA’.

9. Enter information on the next screen and click ‘OK’

10. You will receive an email listing servers that may already exist in BSA, servers that were successfully added to BSA, servers that failed to update, etc. Refer to the sample email - http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/TeamDocuments/Sample_Email__Received_ELLIS%20BSA%20Server%20Add%20Results.pdf
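
The field values and the iTop-to-ELLIS environment mapping from step 7 can be expressed as a small lookup, which may help when preparing many rows before pasting them into the spreadsheet. This is a sketch only: the function name and the "Server name" key are hypothetical, and the actual ELLIS column headings should be taken from the spreadsheet itself.

```python
# Mapping copied from the iTop/ELLIS environment table in step 7.
ITOP_TO_ELLIS_ENV = {
    "CMSO-Prod": "Production",
    "Demo": "Demo",
    "Dev": "Development",
    "Integration": "Development",
    "Preprod": "Test",
    "Prod": "Production",
    "Sandbox": "Development",
    "Stage": "Stage",
    "Test": "Test",
    "Test Automation": "Test",
    "Train": "Train",
}

# Values that step 7 gives as the same for every CMSO-managed server.
FIXED_FIELDS = {
    "Service name": "PO Cloud",
    "Service team": "CMSO",
    "Org code": "DPS_CMSO",
}


def ellis_row(server_name: str, program: str, itop_environment: str) -> dict:
    """Build one row of ELLIS values for a server.

    Raises ValueError when the iTop environment is not in the table, so an
    unmapped value is flagged rather than silently guessed.
    """
    key = itop_environment.strip()
    if key not in ITOP_TO_ELLIS_ENV:
        raise ValueError(f"No ELLIS mapping for iTop environment {key!r}")
    row = {"Server name": server_name, "Program": program}  # hypothetical keys
    row.update(FIXED_FIELDS)
    row["Environment"] = ITOP_TO_ELLIS_ENV[key]
    return row
```

Failing on an unknown environment is a deliberate choice here: a new iTop value is a signal to check with the Team Lead rather than pick the nearest match.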

Update IP/Server name in BSA

Go to BSA documentation page - https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation/Documentation/Forms/DocType.aspx

To update IP – no changes are needed in BSA. If a server is listed in the BSA inventory by hostname, BSA references whatever is listed in DNS. So if you run nslookup on the DNS name and it returns a certain IP address, that same IP address will show up in BSA.

To update server name:

1. Go to ‘BSA Portal NSH Script Operation Creation’ - https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation/Documentation/BSA%20Portal%20NSH%20Script%20Operation%20Creation.pdf
2. Create an operation for renaming a server.
3. Go to the Parameters tab and click on ‘New Target Name’.
4. Edit the value field to the new server name and save.
5. Go back to the ‘rename server’ operation and, from the ‘Actions’ drop-down, select ‘Run Now’.
6. Select a target by going to ‘Browse’ or searching for it.
7. Select the row and click on ‘Execute’.
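
Because BSA follows DNS for servers listed by hostname, an IP change can be confirmed with a lookup equivalent to nslookup. The sketch below uses Python's standard socket.gethostbyname; the function name is hypothetical, and it assumes the workstation queries the same DNS that BSA uses.

```python
import socket
from typing import Callable


def dns_returns_ip(hostname: str,
                   expected_ip: str,
                   resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True when DNS resolves hostname to expected_ip.

    The resolver argument defaults to a real DNS lookup; it can be swapped
    for a stub when no network is available.
    """
    try:
        return resolver(hostname) == expected_ip
    except socket.gaierror:
        # The name did not resolve at all, so BSA would not find it either.
        return False
```

If this returns False after an IP change, the likely fix is on the DNS side, since the text above notes that no change is needed in BSA itself.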

Remove a server from BSA

Note: Although you can do multiple servers at a time, this tool is a little buggy so it’s recommended to do one server at a time.

1. Stop the BSA Agent running on the server.
2. Log on to the BSA console.
3. Navigate to this location: BladeLogic > Jobs > AO Enterprise Jobs > Production > Toolbox > Decommission Server.
4. Click on ‘Create Operation’ and select NSH script.
5. Enter Name ‘Decommission Server’ and Description and click Next.

6. Click Next on the next few screens and click ‘Finish’. The job should be created. If you don’t have permissions, reach out to BSA Team.

7. After the job has been created, go to the ‘Decommission Server’ job, click on ‘Actions’ and then ‘Run Now’. Browse/Search for the server that you want to remove from BSA and click ‘Execute’.

8. Verify that the server has been removed - go to the Inventory tab and enter the server name in the Search field; the output should be “No target found.”

You can also unregister a server by logging into the BSA system and executing this job against the server:

[/BladeLogic/Jobs/AO Enterprise Jobs/Production/Toolbox//Server Decommission/Decommission from BSA only]

To create the Monthly Patch Schedule

1. Create Word document:

   1. Go to the most recent CMECF Patch Schedule at http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/Patch%20Schedules/Forms/AllItems.aspx and check it out.
   2. Go to File > Save As, enter new date in the file name, and click on “Save to current folder” in the upper right. When prompted click “Discard checkout” of the old schedule. The new schedule is now saved in the Patch Schedule folder on SharePoint.

2. Generate the list of servers to be patched:

   1. Go to BSA on SharePoint - https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation
   2. Scroll down to ‘Agent Status Summary’ widget.
   3. Next to “FILE” click on ‘Open in Excel.’
   4. Save file locally and open the file.
   5. Navigate to the right-most tab called ‘Server Inventory Detail Report’ (use arrows on bottom-left of the sheet).
   6. Filter by ‘AO_Program’ and select the groups starting with ‘CMSO’

   7. In the ‘AO_Branch’ column, exclude CMHB, NPHB, and any groups other than CMSO.

3. Re-order columns in the spreadsheet to match the Patch Schedule.
4. Turn on “Filter” in the spreadsheet and, filtering by “AO_Program,” copy/paste into the relevant section of the Patch Schedule.
5. Update the other info in the Schedule: month, patch analysis jobs to be used, dates, assignee doing the work.
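
The AO_Program and AO_Branch filters from step 2 can also be applied outside Excel. The sketch below assumes the ‘Server Inventory Detail Report’ tab has been saved as CSV with AO_Program and AO_Branch column headers; the function names are hypothetical, and the excluded set only covers the two branches named above, so any other non-CMSO branches present in the report would need to be added to it.

```python
import csv
import io

# Branches named explicitly in the procedure; extend with any other
# non-CMSO branches that appear in the report.
EXCLUDED_BRANCHES = frozenset({"CMHB", "NPHB"})


def load_report(csv_text: str) -> list:
    """Parse the exported report (CSV text) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def cmso_servers(rows, excluded_branches=EXCLUDED_BRANCHES) -> list:
    """Keep rows whose AO_Program starts with CMSO and whose AO_Branch
    is not in the excluded set."""
    kept = []
    for row in rows:
        program = (row.get("AO_Program") or "").strip()
        branch = (row.get("AO_Branch") or "").strip()
        if program.startswith("CMSO") and branch not in excluded_branches:
            kept.append(row)
    return kept
```

The result still needs its columns re-ordered to match the Patch Schedule, as described in step 3.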

Patching via BSA

The information below is from the 7/24/2018 call with the BSA Team.

To log on to the BSA Portal, go to https://bsaportal.ao.dcn and log on using your JENIE (network) credentials. Select the Site and Authentication Method options shown below:

See WebEx recording of training session for CMSO: http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/TeamDocuments/Webex_BSA_CMECF_OS_patch_training.mp4

See “Using BSA” doc posted at http://cmso.jdcwin.jdc.ao.dcn/DIVISIONS/App_Support/environment/enviroteam/TeamDocuments/Using_BSA_for_Patching_CMECF_Servers.docx

From the BSA SharePoint site:

RSCD Agent Troubleshooting:

https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation/Documentation/RSCD%20Agent%20Troubleshooting-20171004.pdf?Web=1

Patch Analysis Operation Creation:

https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation/Documentation/BSA%20Portal%20Patch%20Analysis%20Operation%20Creation.pdf?Web=1

Patch Analysis Remediation Operation Creation:

https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation/Documentation/BSA%20Portal%20Patch%20Analysis%20Remediation%20Operation%20Creation.pdf?Web=1

Patch Remediation Log Viewing:

https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation/Documentation/BSA%20Portal%20Patch%20Analysis%20Remediation%20Log%20Viewing.pdf?Web=1

Reboot Operation creation:

https://ctho.jdc.ao.dcn/Projects/BMCServerAutomation/Documentation/BSA%20Portal%20Reboot%20Operation%20Creation.pdf?Web=1