Workload Scheduler Basics Troubleshooting Links

Workload Scheduler Basics

Troubleshooting Links

2

Objective

▪ Objective

▪ Demonstrate common steps necessary to identify the cause of common problems

▪ Elements covered in this session:

▪ FTA link issues

▪ Dynamic Agent link issues

▪ Open

3

Fault Tolerant Agent Link problem

▪ What does it mean to be linked?

▪ Messages addressed to FTA will be sent from its domain manager

▪ FTA will send messages to its domain manager (even if addressed to another workstation.)

▪ Link states – conman sc - LTI JW -> linked and messages can be sent both ways

▪ L -> Linked

▪ T -> TCP/IP

▪ I -> Initialized

▪ J -> Jobman running

▪ W -> Writer running

▪ Common Problem States

▪ “LT “ -> One way communication

▪ The local netman has acknowledgement that the remote netman can be reached

▪ The remote FTA cannot connect back to the local workstation

▪ Tests need to be run on the remote FTA:
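The flag letters listed above can be decoded mechanically when scanning 'conman sc' output. The following is a minimal sketch only: the function names decode_link_state and missing_flags are hypothetical helpers, not part of conman, and the letter meanings are taken from the FTA list on this slide.

```shell
# Decode the link-state flags of a fault-tolerant agent, one flag per line.
# Flags outside the list on this slide are reported as unknown, not guessed.
decode_link_state() {
    printf '%s\n' "$1" | fold -w 1 | while IFS= read -r flag; do
        case $flag in
            L) echo "L: linked" ;;
            T) echo "T: TCP/IP" ;;
            I) echo "I: initialized" ;;
            J) echo "J: jobman running" ;;
            W) echo "W: writer running" ;;
            ' '|'') ;;                        # blank column: flag not set
            *) echo "$flag: unknown flag" ;;
        esac
    done
}

# Print the flags that are absent from a state string such as "LT".
missing_flags() {
    missing=
    for flag in L T I J W; do
        case $1 in
            *"$flag"*) ;;                     # flag present
            *) missing="$missing$flag" ;;     # flag absent
        esac
    done
    echo "$missing"
}

# Example: missing_flags "LT" reports the flags a one-way link lacks.
```

A state that is missing I, J, and W matches the one-way "LT" case described above, which is the cue to run the tests on the remote FTA.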

4

FTA Link problem continued

▪ Tests on remote FTA host ( for “LT “ state.)

▪ cd <TWSHome>; ls -l Sinfonia

▪ Exists? Current date?

▪ cpuinfo <Domain Manager> -> shows NODE and PORT

▪ Name resolution tests:

▪ ping <NODE>

▪ nslookup <NODE>

▪ Connection tests

▪ ftp

▪ ftp> open <NODE> <PORT>

▪ telnet <NODE> <PORT>

▪ ssh -v -p <PORT> <NODE>

▪ tcpclient -server <NODE> <PORT>

▪ Tests on the local DM if there is no state at all: run the same tests as above, and on the FTA host…

▪ Confirm netman is running and using the correct port

▪ ps -ef | grep netman

▪ grep "nm port" localopts

▪ netstat -an | grep <port>
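The last three checks can be tied together so that the port read from localopts is the one searched for in the netstat output. This is a rough sketch under assumptions: nm_port and is_listening are hypothetical helper names, and localopts is assumed to hold the setting on a line of the form "nm port =<number>".

```shell
# Print the netman port configured in a localopts file.
# $1 = path to localopts
nm_port() {
    sed -n 's/^[[:space:]]*nm port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" |
        head -n 1
}

# Read 'netstat -an' output on stdin and succeed if something is
# listening on the given port. Accepts both "addr:port" and "addr.port".
# $1 = port number
is_listening() {
    grep LISTEN | grep -q "[.:]$1[[:space:]]"
}

# Example, run from <TWSHome> on the host being checked:
#   port=$(nm_port localopts)
#   netstat -an | is_listening "$port" && echo "netman port $port is listening"
```

If the port printed by nm_port differs from the PORT shown by cpuinfo for that workstation, the link cannot come up even though netman is running.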

5

FTA Link – No STATE

From MASTER: conman sc

Focus on F94 …

No RUN

No DATE TIME

No STATE

Step 1: Link:

conman "link F94" → No change in 'conman sc' output.

6

FTA Link – No STATE

Inspect Master’s NETMAN and TWSMERGE log files:

The NETMAN.log file has no messages related to the FTA or its host information.

TWSMERGE.log shows us that Mailman cannot link to the FTA

7

FTA Link – No STATE cont.

Name resolution and connection tests from MASTER (DM):

…looks okay.

8

FTA Link – No STATE cont.

Connection tests from MASTER (DM):

…not able to connect to the port.

9

FTA Link – No STATE cont.

Question the IP address and hostname.

Should be fta.com rather than ftaa.com.

To test temporarily, update /etc/hosts:

The same connection tests as before still fail, but they now show 10.134.34.123 being used.
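Before and after a temporary /etc/hosts change, it helps to confirm which address a name maps to in that file. The sketch below assumes the standard hosts layout (address first, then names and aliases); hosts_lookup is a hypothetical helper name.

```shell
# Print the address that a hosts-format file maps to a given name.
# Prints nothing if the name is not present.
# $1 = hosts file, $2 = hostname or alias
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                       # ignore comments
        {
            for (i = 2; i <= NF; i++)            # field 1 is the address
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}

# Example:
#   hosts_lookup /etc/hosts fta.com
```

An empty result for the name used in the workstation definition means the hosts file is not what resolves it, so the misspelled name still has to be corrected at its source.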

Tests on FTA host:

…netman is not running: issue StartUp

…but netman is then listening on 39411, not 31111

10

FTA Link – No STATE

Look in the TWS/localopts file to see the “nm port” value:

Need to change the port value to 31111 to match the workstation definition:

Stop TWS processes completely (including netman) and restart so that localopts is re-read:

$ conman "stop;wait"; conman "shut;wait"; StartUp
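The port change itself can be scripted so that a backup of localopts is kept. This is only a sketch: set_nm_port is a hypothetical helper, and it assumes the setting sits on a single line beginning with "nm port".

```shell
# Rewrite the "nm port" line of a localopts file, keeping a .bak copy.
# Other settings, including "nm ssl port", are left untouched.
# $1 = path to localopts, $2 = new port number
set_nm_port() {
    cp "$1" "$1.bak" &&
    sed "s/^\([[:space:]]*nm port[[:space:]]*=\).*/\1$2/" "$1.bak" > "$1"
}

# Example, run from <TWSHome> before the stop/shut/StartUp sequence:
#   set_nm_port localopts 31111
#   grep "nm port" localopts
```

The edit has no effect on a running netman; the stop, shut, and StartUp sequence above is still required.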

11

FTA Link – No STATE cont.

Rerun resolution and connection tests from MASTER:

…something still wrong.

12

FTA Link – No STATE cont.

Check for local firewall settings on FTA host:

Able to connect to port 31111 from FTA host terminal.

Used firewall-cmd to open port 31111

…something still wrong.
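For reference, opening a TCP port with firewall-cmd usually takes a permanent rule plus a reload. The slide does not show the exact commands that were used, so the following is an assumption; open_port and the DRY_RUN switch are hypothetical conveniences for previewing the commands.

```shell
# Open a TCP port in firewalld and reload the rules.
# With DRY_RUN set, the commands are printed instead of executed.
# $1 = port number
open_port() {
    port=$1
    run=${DRY_RUN:+echo}
    $run firewall-cmd --permanent --add-port="$port/tcp" &&
    $run firewall-cmd --reload
}

# Example (preview only):
#   DRY_RUN=1; open_port 31111
# Afterwards the rule can be confirmed with:
#   firewall-cmd --query-port=31111/tcp
```

Running the real commands needs root privileges on the FTA host.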

13

FTA Link – “LT” STATE

Rerun connection tests from MASTER:

Now, link to FTA: $ conman "link F94"; conman sc F94

State is “LT” … something still wrong … FTA cannot call back.

14

FTA Link – “LT” STATE cont.

Test from FTA to MASTER:

New Sinfonia received and Symphony is compiled:

F94 is not reaching M94 – There is no “LT” for the M94 row.

15

FTA Link – “LT” STATE cont.

Inspect the callback information provided in the Symphony:

Should be “master.com” in /etc/hosts. Update /etc/hosts and link:

16

FTA Link – “LT” STATE cont.

Still from the FTA:

$ conman link M94; conman sc

Looks good from the FTA.

From the MDM, M94:

$ conman link F94; conman sc F94

Linked satisfactorily.

17

Dynamic Agent Link

▪ Dynamic Agent link states:

▪ LBI J

▪ L -> Linked

▪ B -> Broker

▪ I -> Initialized

▪ J -> JobManager running.

▪ Dynamic Agent sends messages directly to its Resource Advisor

▪ Think of a Dynamic Agent as an xAgent of the Broker Agent. The Broker Agent is the HOST for all Dynamic Agents.

▪ Problem states

▪ “ “ – The Broker Agent is not linked

▪ conman link <DWB>

▪ brokerApplicationStatus.sh

▪ conman "stopbroker;wait"; conman "startbroker;wait"

▪ In SystemOut.log: TWSAgent

18

Dynamic Agent Link – ITDWB Status

https://hostname:31124/ibm/console

Applications -> Application Types -> WebSphere enterprise applications

19

Dynamic Agent Link – ITDWB Status

<TWAHome>/WAS/TWSProfile/logs/server1/SystemOut.log

Broker Server application is starting:

Broker Server application is stopped.
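SystemOut.log can be large, so it is convenient to pull out only the most recent broker-related entries. The sketch below searches for the string TWSAgent mentioned earlier; broker_log_tail is a hypothetical helper, and the exact message text to look for is not shown in this transcript.

```shell
# Show the most recent lines of a log that mention a search string.
# $1 = path to SystemOut.log
# $2 = search string (default: TWSAgent)
# $3 = number of lines to show (default: 5)
broker_log_tail() {
    grep -- "${2:-TWSAgent}" "$1" | tail -n "${3:-5}"
}

# Example:
#   broker_log_tail <TWAHome>/WAS/TWSProfile/logs/server1/SystemOut.log
```

The last matching line indicates whether the broker application most recently started or stopped.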

20

Dynamic Agent Link

▪ Problem State: “LBI “ ( no J ) – JobManager not reported

Broker Agent (DWB) is linked – not a local issue

▪ On DA host:

Inspect: <TWAHome>/TWS/stdlist/JM/JobManager_message.log

AWSITA081E The agent can not send the resource information to "https://hostname:31116/JobManagerRESTWeb/JobScheduler/resource". The error is: "AWSITA245E An error occurred getting the response of the HTTP request. The error is "CURL error 18".".

… could be DWB is not running.

Determine if DA is running and listening:

$ ps -ef | grep JobManager

$ grep ssl_port <TWAHome>/TWS/ITA/cpa/ita/ita.ini

$ netstat -an | grep <ssl_port>

$ ShutDownLwa; StartUpLwa

Confirm DA host connection to Resource Advisor (RA) on MDM or DDM:

Same tests as with FTA to DM

Default https port: 31116

ssh -v -p 31116 <MDMHost>
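The ssl_port lookup and the netstat check above can be chained so that the port value is not copied by hand. A minimal sketch, assuming ita.ini holds the setting as a "key = value" line; ini_value is a hypothetical helper name.

```shell
# Print the value of the first "key = value" line for a key in an ini file.
# $1 = path to the ini file, $2 = key name
ini_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Example, on the DA host:
#   port=$(ini_value <TWAHome>/TWS/ITA/cpa/ita/ita.ini ssl_port)
#   netstat -an | grep "$port"
```

The helper ignores section headers, so it is only suitable for keys that appear once in the file.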

21

Dynamic Agent Link

Trace packets …

$ mkdir /tmp/DebugDir

$ vim <TWAHome>/TWS/ITA/cpa/config/JobManager.ini

In [ITA] section, add the following line:

DebugDir = /tmp/DebugDir

$ ShutDownLwa; StartUpLwa

$ ls -l /tmp/DebugDir/

-rw-r--r-- 1 m94 m94 585 Feb 19 18:00 1550620851956_snd.dmp
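The manual vim edit can also be done non-interactively, which is handy when the same trace has to be enabled on several agents. This is a sketch only: add_debug_dir is a hypothetical helper, and it assumes the [ITA] section header starts at the beginning of a line.

```shell
# Insert a DebugDir line directly after the [ITA] section header,
# keeping a .bak copy of the original file.
# $1 = path to JobManager.ini, $2 = directory that will receive the dumps
add_debug_dir() {
    cp "$1" "$1.bak" &&
    awk -v dir="$2" '
        { print }
        /^\[ITA\]/ { print "DebugDir = " dir }
    ' "$1.bak" > "$1"
}

# Example:
#   mkdir /tmp/DebugDir
#   add_debug_dir <TWAHome>/TWS/ITA/cpa/config/JobManager.ini /tmp/DebugDir
#   ShutDownLwa; StartUpLwa
```

The helper does not check for an existing DebugDir entry, so running it twice adds the line twice; restoring the .bak copy removes the trace setting again.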

22

Dynamic Agent Link

Dynamic Agents use a secure connection:

$ openssl s_client -connect mdm.com:31116

$ openssl s_client -connect da.com:31114

$ <TWAHome>/TWS/JavaExt/jre/jre/bin/ikeyman

$ <TWAHome>/TWS/ITA/cpa/ita/cert/TWSClientKeyStore.kdb

$ <TWAHome>/WAS/TWSProfile/etc/TWSServerKeyFile.jks
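The s_client commands above print the whole handshake. When the question is only which certificate the other side presents, the output can be reduced to subject, issuer, and validity dates. A minimal sketch, assuming the openssl command-line tool is installed; summarize_cert and peer_cert are hypothetical helper names.

```shell
# Read a PEM certificate on stdin and print subject, issuer, and dates.
summarize_cert() {
    openssl x509 -noout -subject -issuer -dates
}

# Connect to host:port and summarize the certificate the peer presents.
# $1 = host, $2 = port
peer_cert() {
    openssl s_client -connect "$1:$2" </dev/null 2>/dev/null | summarize_cert
}

# Examples, using the hosts and ports from this slide:
#   peer_cert mdm.com 31116     # run on the DA host
#   peer_cert da.com 31114      # run on the MDM
```

The issuer shown here is what has to appear as a signer in the key store on the connecting side, which can then be checked with ikeyman.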

23

Dynamic Agent Link

Opening TWSClientKeyStore.kdb using ikeyman – type CMS

Dynamic Agent’s personal certificate tests signer certificate from MDM connection:

Signer certificate in DA kdb file allows DA to connect securely to MDM / DDM

24

Dynamic Agent Link

Opening TWSServerKeyFile.jks. DA connections must sign the MDM’s personal certificate.

25

Dynamic Agent Link

Opening TWSServerTrustFile.jks. Signers for connections to MDM and to DA from MDM:

26

Dynamic Agent Link problem cont.

Live Demo: Troubleshooting Dynamic Agent link issue.

MDM: 29

DA: 123

*Note: After a plan extension, Dynamic Agents are shown with a good link status for a grace period of 10 minutes by default. If the DA is not able to send resource information to the RA in that time, the “J” is removed from the DA’s workstation link status.

27

Troubleshooting

Questions??