Workload Scheduler Basics Troubleshooting Links
2
Objective
▪ Objective
▪ Demonstrate the steps needed to identify the cause of common problems
▪ Elements covered in this session:
▪ FTA link issues
▪ Dynamic Agent link issues
▪ Open
3
Fault Tolerant Agent Link problem
▪ What does it mean to be linked?
▪ Messages addressed to FTA will be sent from its domain manager
▪ FTA will send messages to its domain manager (even if addressed to another workstation.)
▪ Link states (shown by conman sc): “LTI JW” -> linked, and messages can be sent both ways
▪ L -> Linked
▪ T -> TCP/IP
▪ I -> Initialized
▪ J -> Jobman running
▪ W -> Writer running
▪ Common Problem States
▪ “LT “ -> One way communication
▪ The local netman has confirmed that the remote netman can be reached
▪ The remote FTA cannot connect back to the local workstation
▪ Tests need to be run on the remote FTA:
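As a quick aid for reading the STATE column, the flag meanings above can be turned into a small classifier. This is only a sketch: the function name `describe_fta_state` and the labels it prints are illustrative assumptions, not part of conman, and it looks for the presence of each letter rather than its column position.

```shell
# Classify an FTA link state string taken from the STATE column of `conman sc`.
# Hypothetical helper: the labels below are for illustration only.
describe_fta_state() {
    state=${1-}
    # Without an L the workstation is not linked at all.
    case $state in
        *L*) ;;
        *) echo "no-link"; return 0 ;;
    esac
    # Collect the remaining flags that are absent from the state string.
    missing=""
    for flag in T I J W; do
        case $state in
            *"$flag"*) ;;
            *) missing="$missing$flag" ;;
        esac
    done
    if [ -z "$missing" ]; then
        echo "linked"                       # all of L, T, I, J, W are present
    elif [ "$missing" = "IJW" ]; then
        echo "one-way"                      # the "LT" problem state
    else
        echo "partial (missing: $missing)"
    fi
}
```

For example, `describe_fta_state "LT"` reports the one-way case, which is the cue to run the tests from the remote FTA.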
4
FTA Link problem continued
▪ Tests on remote FTA host ( for “LT “ state.)
▪ cd <TWSHome>; ls -l Sinfonia
▪ Exists? Current date?
▪ cpuinfo <Domain Manager> -> shows NODE and PORT
▪ Name resolution tests:
▪ ping <NODE>
▪ nslookup <NODE>
▪ Connection tests
▪ ftp
▪ ftp> open <NODE> <PORT>
▪ telnet <NODE> <PORT>
▪ ssh -v -p <PORT> <NODE>
▪ tcpclient -server <NODE> <PORT>
▪ Tests on Local DM if no state at all – Same tests as above and on FTA host…
▪ Confirm netman is running and using the correct port
▪ ps -ef | grep netman
▪ grep "nm port" localopts
▪ netstat -an | grep <port>
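When none of the clients listed above is installed, the connection test can be approximated with bash's `/dev/tcp` redirection. The sketch below assumes that `bash` and the coreutils `timeout` command are available on the host. The function name `tcp_probe` and the 5-second limit are illustrative choices.

```shell
# Try to open a TCP connection to <NODE> <PORT>, as the ftp/telnet/ssh tests do.
# Hypothetical helper; returns 0 if the port accepts a connection,
# 1 if it does not, and 2 if the arguments are unusable.
tcp_probe() {
    host=${1-}
    port=${2-}
    case $port in
        ''|*[!0-9]*)
            echo "usage: tcp_probe <NODE> <PORT>" >&2
            return 2
            ;;
    esac
    if [ -z "$host" ]; then
        echo "usage: tcp_probe <NODE> <PORT>" >&2
        return 2
    fi
    # bash opens a socket when redirecting to /dev/tcp/<host>/<port>;
    # timeout caps the wait when a firewall silently drops the packets.
    if timeout 5 bash -c 'exec 3<>/dev/tcp/$0/$1' "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed or unreachable"
        return 1
    fi
}
```

Run it with the NODE and PORT values reported by cpuinfo, from both ends of the link.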
5
FTA Link – No STATE
From MASTER: conman sc
Focus on F94 …
No RUN
No DATE TIME
No STATE
Step 1: Link:
conman "link F94" → No change in 'conman sc' output.
6
FTA Link – No STATE
Inspect Master’s NETMAN and TWSMERGE log files:
The NETMAN.log file has no messages related to the FTA or its host information.
TWSMERGE.log shows us that Mailman cannot link to the FTA
9
FTA Link – No STATE cont.
Question the IP address and hostname.
Should be fta.com rather than ftaa.com.
To test temporarily, update /etc/hosts:
The same connection tests as before still fail, but now show 10.134.34.123 being used.
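To confirm which address a temporary /etc/hosts entry maps a name to, the file can be read directly. This is a minimal sketch. The function name `hosts_lookup` is hypothetical, and it only reads a hosts-format file, so it says nothing about what DNS would return.

```shell
# Print the IP address that a hosts-format file assigns to a host name.
# Hypothetical helper: $1 is the file (for example /etc/hosts), $2 the name.
hosts_lookup() {
    awk -v name="$2" '{
        sub(/#.*/, "")                  # drop comments
        for (i = 2; i <= NF; i++)       # field 1 is the IP, the rest are names
            if ($i == name) { print $1; exit }
    }' "$1"
}
```

For example, `hosts_lookup /etc/hosts fta.com` prints the mapped address, and an empty result means the name is not defined there.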
Tests on FTA host:
…netman is not running, so run StartUp
…but netman is listening on 39411 – not 31111
10
FTA Link – No STATE
Look in the TWS/localopts file to see the “nm port” value:
Need to change the port value to 31111 to match the workstation definition:
Stop the TWS processes completely (including netman) and restart them so that localopts is re-read:
$ conman "stop;wait"; conman "shut;wait"; StartUp
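The comparison between the localopts value and the workstation definition can be scripted. The sketch below assumes the localopts entry has the form `nm port =<number>`. The function names `nm_port` and `check_nm_port` are illustrative, and the expected port has to be supplied by hand from the workstation definition.

```shell
# Print the "nm port" value from a localopts file (first uncommented match).
nm_port() {
    sed -n 's/^[[:space:]]*nm port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" |
        head -n 1
}

# Compare the localopts value ($1 = file) with the expected port ($2).
check_nm_port() {
    actual=$(nm_port "$1")
    if [ "$actual" = "$2" ]; then
        echo "match: $actual"
    else
        echo "mismatch: localopts has ${actual:-no value}, expected $2"
        return 1
    fi
}
```

In the case shown here, `check_nm_port <TWSHome>/localopts 31111` would report a mismatch until the value is corrected.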
11
FTA Link – No STATE cont.
Rerun resolution and connection tests from MASTER:
…something still wrong.
12
FTA Link – No STATE cont.
Check for local firewall settings on FTA host:
Able to connect to port 31111 from FTA host terminal.
Used firewall-cmd to open port 31111
…something still wrong.
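The exact firewall-cmd invocation is not shown on the slide. As a sketch, assuming firewalld and a TCP port, the port can be opened and then verified as follows. The function names are hypothetical, and the check only matches a single-port entry such as `31111/tcp`, not a port range.

```shell
# Open a TCP port permanently and reload firewalld (run as root).
open_tcp_port() {
    firewall-cmd --permanent --add-port="$1/tcp" && firewall-cmd --reload
}

# Return 0 if <port>/tcp appears in the output of `firewall-cmd --list-ports`.
# $1: the list-ports output, $2: the port number.
port_is_listed() {
    for entry in $1; do
        if [ "$entry" = "$2/tcp" ]; then
            return 0
        fi
    done
    return 1
}
```

A typical check after opening the port would be `port_is_listed "$(firewall-cmd --list-ports)" 31111`.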
13
FTA Link – “LT” STATE
Rerun connection tests from MASTER:
Now, link to FTA: $ conman "link F94"; conman sc F94
State is “LT” … something still wrong … FTA cannot call back.
14
FTA Link – “LT” STATE cont.
Test from FTA to MASTER:
New Sinfonia received and Symphony is compiled:
F94 is not reaching M94 – There is no “LT” for the M94 row.
15
FTA Link – “LT” STATE cont.
Inspect the callback information provided in the Symphony:
Should be “master.com” in /etc/hosts. Update /etc/hosts and link:
16
FTA Link – “LT” STATE cont.
Still from the FTA:
$ conman link M94; conman sc
Looks good from the FTA.
From the MDM, M94:
$ conman link F94; conman sc F94
Linked satisfactorily.
17
Dynamic Agent Link
▪ Dynamic Agent link states:
▪ LBI J
▪ L -> Linked
▪ B -> Broker
▪ I -> Initialized
▪ J -> JobManager running.
▪ Dynamic Agent sends messages directly to its Resource Advisor
▪ Think of a Dynamic Agent as an xAgent of the Broker Agent. The Broker Agent is the HOST for all Dynamic Agents.
▪ Problem states
▪ “ “ – The Broker Agent is not linked
▪ conman link <DWB>
▪ brokerApplicationStatus.sh
▪ conman "stopbroker;wait"; conman "startbroker;wait"
▪ In SystemOut.log: TWSAgent
18
Dynamic Agent Link – ITDWB Status
https://hostname:31124/ibm/console
Applications -> Application Types -> WebSphere enterprise applications
19
Dynamic Agent Link – ITDWB Status
<TWAHome>/WAS/TWSProfile/logs/server1/SystemOut.log
Broker Server application is starting:
Broker Server application is stopped.
20
Dynamic Agent Link
▪ Problem State: “LBI “ ( no J ) – JobManager not reported
Broker Agent (DWB) is linked – not a local issue
▪ On DA host:
Inspect: <TWAHome>/TWS/stdlist/JM/JobManager_message.log
AWSITA081E The agent can not send the resource information to "https://hostname:31116/JobManagerRESTWeb/JobScheduler/resource". The error is: "AWSITA245E An error occurred getting the response of the HTTP request. The error is "CURL error 18"."
… possibly because the DWB is not running.
Determine if DA is running and listening:
$ ps -ef | grep JobManager
$ grep ssl_port <TWAHome>/TWS/ITA/cpa/ita/ita.ini
$ netstat -an | grep <ssl_port>
$ ShutDownLwa; StartUpLwa
Confirm DA host connection to Resource Advisor (RA) on MDM or DDM:
Same tests as with FTA to DM
Default https port: 31116
ssh -v -p 31116 <MDMHost>
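The host and port to test do not have to be guessed, because the AWSITA081E message already names the Resource Advisor URL. The sketch below pulls both out of JobManager_message.log. It assumes the message sits on a single line in the log, and the function name `ra_endpoint` is hypothetical.

```shell
# Print "<host> <port>" of the Resource Advisor URL from the most recent
# AWSITA081E message in the given log file.
ra_endpoint() {
    sed -n 's|.*AWSITA081E.*"https://\([^:/"]*\):\([0-9][0-9]*\)/.*|\1 \2|p' "$1" |
        tail -n 1
}
```

The two values can then be fed to the same connection tests used for the FTA.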
21
Dynamic Agent Link
Trace packets …
$ mkdir /tmp/DebugDir
$ vim <TWAHome>/TWS/ITA/cpa/config/JobManager.ini
In [ITA] section, add the following line:
DebugDir = /tmp/DebugDir
$ ShutDownLwa; StartUpLwa
$ ls -l /tmp/DebugDir/
-rw-r--r-- 1 m94 m94 585 Feb 19 18:00 1550620851956_snd.dmp
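The numeric prefix of the dump file name looks like a Unix timestamp in milliseconds. That is an inference from the listing above, not something the slide states. Under that assumption, a small helper (hypothetical name `dump_epoch_seconds`) can turn the name into epoch seconds so that dumps can be matched against log entries.

```shell
# Print the epoch-seconds part of a dump file name such as
# 1550620851956_snd.dmp, assuming the prefix is milliseconds since the epoch.
dump_epoch_seconds() {
    base=${1##*/}            # strip any directory part
    ms=${base%%_*}           # digits before the first underscore
    case $ms in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "${#ms}" -gt 3 ] || return 1
    echo "${ms%???}"         # drop the last three digits (milliseconds)
}
```

With GNU date, `date -u -d "@$(dump_epoch_seconds 1550620851956_snd.dmp)"` shows the time in UTC. For this file that is 2019-02-20 00:00:51 UTC, which agrees with the Feb 19 18:00 listing if the host clock is six hours behind UTC.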
22
Dynamic Agent Link
Dynamic Agents use a secure connection:
$ openssl s_client -connect mdm.com:31116
$ openssl s_client -connect da.com:31114
$ <TWAHome>/TWS/JavaExt/jre/jre/bin/ikeyman
Key stores to open with ikeyman:
<TWAHome>/TWS/ITA/cpa/ita/cert/TWSClientKeyStore.kdb
<TWAHome>/WAS/TWSProfile/etc/TWSServerKeyFile.jks
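The s_client output is long, so it can help to reduce it to the certificate fields that matter for the link. This is a sketch: the function name `show_peer_cert` is hypothetical, and it only prints what the peer presents. It does not verify the certificate against the key stores.

```shell
# Print subject, issuer and expiry date of the certificate presented by
# <host>:<port>. Returns 2 when called with the wrong number of arguments.
show_peer_cert() {
    if [ "$#" -ne 2 ]; then
        echo "usage: show_peer_cert <host> <port>" >&2
        return 2
    fi
    # Closing stdin makes s_client exit after the handshake;
    # x509 then decodes the first certificate in its output.
    openssl s_client -connect "$1:$2" </dev/null 2>/dev/null |
        openssl x509 -noout -subject -issuer -enddate
}
```

For example, `show_peer_cert mdm.com 31116` from the DA host and `show_peer_cert da.com 31114` from the MDM give the issuer names to look for among the signer certificates in the key stores.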
23
Dynamic Agent Link
Opening TWSClientKeyStore.kdb using ikeyman – type CMS
Dynamic Agent’s personal certificate tests signer certificate from MDM connection:
Signer certificate in DA kdb file allows DA to connect securely to MDM / DDM
24
Dynamic Agent Link
Opening TWSServerKeyFile.jks. DA connections must sign the MDM’s personal certificate.
25
Dynamic Agent Link
Opening TWSServerTrustFile.jks. Signers for connections to MDM and to DA from MDM:
26
Dynamic Agent Link problem cont.
Live Demo: Troubleshooting Dynamic Agent link issue.
MDM: 29
DA: 123
*Note: After a plan extension, Dynamic Agents keep a good link status for a grace period of 10 minutes by default. If the DA is not able to send resource information to the RA in that time, the “J” is removed from the DA’s workstation link status.