Post on 10-Mar-2023
Linux on IBM Z and LinuxONE: How to troubleshoot
July 16, 2020
Sa Liu
Linux on IBM Z and LinuxONE
Service & Support
Trademarks
Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
This information provides only general descriptions of the types and portions of workloads that are eligible for execution on Specialty Engines (e.g., zIIPs, zAAPs, and IFLs) ("SEs"). IBM authorizes customers to use IBM SE only to execute the processing of Eligible Workloads of specific Programs expressly authorized by IBM as specified in the “Authorized Use Table for IBM Machines” provided at www.ibm.com/systems/support/machine_warranties/machine_code/aut.html (“AUT”). No other workload processing is authorized for execution on an SE. IBM offers SE at a lower price than General Processors/Central Processors because customers are authorized to use SEs only to process certain types and/or amounts of workloads as specified by IBM in the AUT.
* Registered trademarks of IBM Corporation
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a Registered Trade Mark of AXELOS Limited. ITIL is a Registered Trade Mark of AXELOS Limited. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. UNIX is a registered trademark of The Open Group in the United States and other countries. VMware, the VMware logo, VMware Cloud Foundation, VMware Cloud Foundation Service, VMware vCenter Server, and VMware vSphere are registered trademarks or trademarks of VMware, Inc. or its subsidiaries in the United States and/or other jurisdictions. Other product and service names might be trademarks of IBM or other companies.
CICS*, Cognos*, DataStage*, DB2*, GDPS
Global Business Services*, IBM*, IBM (logo)*, InfoSphere, Maximo*
MQ*, Parallel Sysplex*, QualityStage, Rational*, Smarter Cities
XIV*, zEnterprise*, z/OS*, z Systems*, z/VM*
SPSS*, System Storage*, System x*, Tivoli*, WebSphere*
z/VSE*
Agenda
INTRODUCTION
DATA COLLECTION
STORAGE TROUBLESHOOTING
NETWORK TROUBLESHOOTING
PERFORMANCE TROUBLESHOOTING
CUSTOMER CASES
Linux on Z troubleshooting
§ Understand the problem
§ What are the symptoms of the problem?
§ Where / When / Under which condition does the problem occur?
§ Can the problem be reproduced?
§ What do you need to do?
§ Collect data before recovery
§ Collect data right after the problem occurs
§ Collect data from a healthy system
§ Keep track of the system setup and the latest changes
Linux on Z troubleshooting
§ How to do it?
§ Get the system prepared for data collection
§ Install packages for Linux tools (s390-tools, sysstat, perf …)
§ Enable Linux kdump / have disks ready for standalone dump
§ Enable regular system activity monitoring
§ Learn to use the tools for data analysis
§ dbginfo.sh – a script that collects data for debugging Linux on Z (requires root authority)
§ It collects:
§ System information and generic configuration data
§ List of devices and their configurations
§ System logs / trace data
§ s390 debug buffer
§ z/VM or KVM basic data (if Linux runs under z/VM or KVM)
Distribution   Package name
RHEL           s390utils
SLES           s390-tools
Ubuntu         s390-tools
Run before reboot!
Collect System Data
dbginfo.sh output
root@system: # dbginfo.sh
dbginfo.sh: Debug information script version 2.11.0-7.27
Copyright IBM Corp. 2002, 2018
Hardware platform = s390x
Kernel version = 5.3.18 (5.3.18-22-default)
Runtime environment = LPAR
1 of 13: Collecting command output
2 of 13: Collecting z/VM command
3 of 13: Collecting procfs
4 of 13: Collecting sysfs
5 of 13: Collecting log files
6 of 13: Collecting config files
7 of 13: Collecting osa oat output
8 of 13: Collecting ethtool output
9 of 13: Collecting OpenVSwitch output skipped
10 of 13: Collecting domain xml files skipped
11a of 13: Collecting docker container output skipped
11b of 13: Collecting docker network output skipped
12 of 13: Collecting nvme output
13 of 13: Postprocessing
Finalizing: Creating archive with collected data
Collected data was saved to:
>> /tmp/DBGINFO-2020-07-15-08-44-18-test-1CA1E7.tgz <<
Review the collected data before sending to your service organization.
root@system # cd /tmp/DBGINFO-2020-07-15-08-44-18-test-1CA1E7/
s8315028:/tmp/DBGINFO-2020-07-15-08-44-18-test-1CA1E7
root@system # ls -t
dbginfo.log        etc    sysfs.tgz
osa_oat_eth3.raw   usr    proc
osa_oat.out        lib    zvm_runtime.out
osa_oat_eth1.raw   boot   runtime.out
osa_oat_eth2.raw   run    journalctl.out
osa_oat_eth0.raw   var
Slide callouts: runtime.out = Linux commands output, zvm_runtime.out = z/VM commands output
Make sure you have enough disk space under /tmp
Use “dbginfo.sh -d <directory>” to specify another location for the tarball
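The disk-space advice above can be turned into a small pre-flight check. The following is a minimal sketch and not part of dbginfo.sh itself: the helper names `free_mb` and `pick_dump_dir` and the 500 MB threshold in the usage comment are assumptions chosen for illustration; only the `-d <directory>` option comes from the slide.

```shell
#!/bin/sh
# free_mb DIR: print the free space of the file system holding DIR, in MB.
# Hypothetical helper; relies on the POSIX output format of "df -P".
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# pick_dump_dir MIN_MB DIR...: print the first existing DIR that has at
# least MIN_MB of free space; return non-zero if none qualifies.
pick_dump_dir() {
    min_mb=$1
    shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        if [ "$(free_mb "$dir")" -ge "$min_mb" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Usage (assumed threshold of 500 MB; adjust to the size of your system):
#   target=$(pick_dump_dir 500 /tmp /var/tmp) && dbginfo.sh -d "$target"
```

The check only looks at free space at the moment it runs, so it is a guard against obvious failures rather than a guarantee that the archive will fit.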
Collect Performance Data
§ sadc – system activity data collector
§ perf – performance analysis tool
§ iostat – monitors I/O device load and the CPU utilization
§ dasdstat – display DASD performance data
§ ziomon / ziorep – collect FCP performance data and generate reports
§ z/VM MONWRITE – collects CP *MONITOR data
§ hyptop – dynamic real-time view of hypervisor
For more details refer to the book Troubleshooting Guide
Collect Performance Data
§ sadc – system activity data collector (sysstat package)
§ Collect all counters at a 10-second interval to a binary output file
§ sar – system activity report
§ Convert the binary output file to a plain text report
§ Start sysstat service as a permanent service (recommended)
§ sadc default configuration at 10-minute intervals (/var/log/sa/)
§ Under z/VM, collect MONWRITE data in the same time period and same interval as sadc data!
root@system: # /usr/lib64/sa/sadc -S XALL 10 sadc_output
root@system: # sar -A -f sadc_output > sar_output
Collect Performance Data
§ Recommended data collection process
§ 1) run dbginfo.sh
§ 2) start sadc at a 5-second interval
§ 3) run the test with workload
§ 4) stop sadc with killall
§ 5) run dbginfo.sh again
§ 6) convert the sadc output file to a report
# /sbin/dbginfo.sh
# /usr/lib64/sa/sadc -S XALL 5 /tmp/server_sadc.out &
# killall sadc
# /sbin/dbginfo.sh
# sar -A -f /tmp/server_sadc.out > server_sar
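The six steps can be wrapped in one function so that the order is always the same. This is only a sketch under stated assumptions: the function name `collect_perf_data`, the `SADC` and `OUT` variables, and the `.sar` report suffix are made up for illustration, and the sadc path is the one shown on the slide, which may differ on your distribution. Instead of `killall`, the sketch stops the sadc process it started by its PID.

```shell
#!/bin/sh
# Sketch of the six-step sequence; SADC and OUT default to the slide's paths.
SADC=${SADC:-/usr/lib64/sa/sadc}
OUT=${OUT:-/tmp/server_sadc.out}

# collect_perf_data CMD [ARGS...]: wrap one workload run with data collection.
# CMD is whatever starts your test workload (hypothetical placeholder).
collect_perf_data() {
    dbginfo.sh || return 1                     # 1) snapshot before the test
    "$SADC" -S XALL 5 "$OUT" &                 # 2) sadc at a 5-second interval
    sadc_pid=$!
    "$@" && rc=0 || rc=$?                      # 3) run the workload
    kill "$sadc_pid" 2>/dev/null || true       # 4) stop the sadc started above
    wait "$sadc_pid" 2>/dev/null || true
    dbginfo.sh                                 # 5) snapshot after the test
    sar -A -f "$OUT" > "${OUT%.out}.sar"       # 6) convert to a text report
    return "$rc"
}

# Usage (my_workload.sh is a placeholder for your own test driver):
#   collect_perf_data ./my_workload.sh
```

The function returns the exit status of the workload, so a calling script can tell a failed test run from a failed collection step.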
Collect System Dump
For more details refer to the book Using the Dump Tools
kdump – Kernel Crash Dump
§ To boot a new instance of the kernel in a pre-reserved memory section
§ To copy the existing memory untouched to storage or via network
§ kexec tool
§ crashkernel=auto (minimum amount of memory required: 4GB)
Configure the kdump on RHEL
§ Install kdump service
# yum install kexec-tools
§ Configure kdump memory usage by editing the kernel command line parameters in /etc/zipl.conf
crashkernel=auto
§ Configure the kdump target and the core collector by editing /etc/kdump.conf
ext4 /dev/mapper/vg00-varcrashvol
path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31
§ dracut issue with RHEL 8 kdump to SCSI disk
§ Workaround: add zfcp.allow_lun_scan=0 to the kernel parameters
Configure the kdump on SLES
§ Required packages
§ kexec-tools
§ makedumpfile
§ yast2-kdump
§ kdump configuration in /etc/sysconfig/kdump, add kernel parameters to KDUMP_CMDLINE_APPEND=
§ Use cio_ignore to ignore devices currently not in use and lower the amount of memory required for crashkernel
§ zfcp.allow_lun_scan=0 is default in the kdump command line
Configure the kdump on Ubuntu
§ kdump enabled by default (16.04 and later)
§ Installation
# apt install linux-crashdump
§ Configuration in the /etc/default/kdump-tools file
§ Use the kdump-config command to configure kdump, check status, or save a vmcore file
Storage Troubleshooting
§ DASD (Direct Access Storage Device)
§ zFCP/SCSI (Small Computer System Interface)
§ LVM (Logical Volume Manager)
DASD
§ Check the status of DASDs
§ lscss – list channel subsystem devices
§ lsdasd – list channel attached DASDs
§ Device 0.0.eab0 is online, but not active (status n/f: not formatted)
# lscss
Device   Subchan.  DevType CU Type Use  PIM PAM POM  CHPIDs
----------------------------------------------------------------------
0.0.eaae 0.0.000f  3390/0c 3990/e9 yes  f0  f0  ff   34353233 00000000
0.0.eaaf 0.0.0021  3390/0c 3990/e9 yes  f0  f0  ff   34353233 00000000
0.0.eab0 0.0.0022  3390/0c 3990/e9 yes  f0  f0  ff   34353233 00000000
# lsdasd
Bus-ID     Status   Name   Device  Type  BlkSz  Size      Blocks
==============================================================================
0.0.eaae   active   dasda  94:0    ECKD  4096   21129MB   5409180
0.0.eaaf   active   dasdb  94:4    ECKD  4096   21129MB   5409180
0.0.eab0   n/f      dasdc  94:8    ECKD
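When many DASDs are attached, the Status column can be filtered rather than read by eye. The sketch below assumes the column layout shown in the lsdasd sample above (bus-ID first, status second); the helper name `inactive_dasds` is hypothetical.

```shell
#!/bin/sh
# inactive_dasds: read lsdasd output on stdin and print the bus-ID and status
# of every DASD whose Status column is not "active". Hypothetical helper name.
inactive_dasds() {
    awk '$1 ~ /^[0-9a-f]+\.[0-9a-f]+\.[0-9a-f]+$/ && $2 != "active" { print $1, $2 }'
}

# Usage on a live system:
#   lsdasd | inactive_dasds
```

For the sample output above this would report 0.0.eab0 with status n/f, the device that still needs formatting.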
DASD
§ dasdfmt – format ECKD type DASD
§ fdasd – partitioning tool
§ dasdview – display DASD and VTOC information
# dasdview -s 16 /dev/dasdc
+----------------------------------------+------------------+------------------+
| HEXADECIMAL                            | EBCDIC           | ASCII            |
| 01....04 05....08 09....12 13....16    | 1.............16 | 1.............16 |
+----------------------------------------+------------------+------------------+
| C9D7D3F1 000A0000 0000000F 03000000    | IPL1............ | ????............ |
+----------------------------------------+------------------+------------------+
# dasdview -t info /dev/dasdc
--- VTOC info -----------------------------------------------------------------
The VTOC contains:
1 format 1 label(s)
1 format 4 label(s)
1 format 5 label(s)
1 format 7 label(s)
0 format 8 label(s)
0 format 9 label(s)
Other S/390 and zSeries operating systems would see the following data sets:
+----------------------------------------------+--------------+--------------+
| data set                                     | start        | end          |
+----------------------------------------------+--------------+--------------+
| LINUX.V0XEAB0.PART0001.NATIVE                |          trk |          trk |
| data set serial number : '0XEAB0'            |            2 |       450764 |
| system code : 'IBM LINUX '                   |      cyl/trk |      cyl/trk |
| creation date : year 2019, day 137           |         0/ 2 |    30050/ 14 |
+----------------------------------------------+--------------+--------------+
SCSI over zFCP
§ Check the status of zFCP and SCSI
§ lszfcp – list information about zfcp adapters, ports, and units
§ lsscsi – list SCSI devices
§ ziorep_config – configuration report of the ziomon framework
§ -A adapter
# ziorep_config -A
Host: host0
CHPID: 60
Adapter: 0.0.191c
Sub-Ch.: 0.0.001f
Name: 0xc05076ffd68018c0
P-Name: 0xc05076ffd6801981
Version: 0x0007
LIC: 0x00001716
Type: NPIV VPORT
Speed: 16 Gbit
State: Online
Host: host1
CHPID: 61
Adapter: 0.0.195c
Sub-Ch.: 0.0.0020
Name: 0xc05076ffd6801f30
P-Name: 0xc05076ffd6801991
Version: 0x0007
LIC: 0x00001716
Type: NPIV VPORT
Speed: 16 Gbit
State: Online
SCSI over zFCP
§ ziorep_config
§ -D device
§ -M mapper
# ziorep_config -D
0.0.191c 0x50050763070845e3 0x4082402a00000000 host0 /dev/sg0 /dev/sda 8:0   Disk 2107900 IBM 0:0:0:1076510850
0.0.191c 0x50050763070845e3 0x4083402a00000000 host0 /dev/sg1 /dev/sdb 8:16  Disk 2107900 IBM 0:0:0:1076510851
0.0.191c 0x50050763070845e3 0x4084402a00000000 host0 /dev/sg2 /dev/sdc 8:32  Disk 2107900 IBM 0:0:0:1076510852
0.0.191c 0x50050763070845e3 0x4085402a00000000 host0 /dev/sg3 /dev/sdd 8:48  Disk 2107900 IBM 0:0:0:1076510853
0.0.195c 0x50050763071845e3 0x4082402a00000000 host1 /dev/sg4 /dev/sde 8:64  Disk 2107900 IBM 1:0:0:1076510850
0.0.195c 0x50050763071845e3 0x4083402a00000000 host1 /dev/sg5 /dev/sdf 8:80  Disk 2107900 IBM 1:0:0:1076510851
0.0.195c 0x50050763071845e3 0x4084402a00000000 host1 /dev/sg6 /dev/sdg 8:96  Disk 2107900 IBM 1:0:0:1076510852
0.0.195c 0x50050763071845e3 0x4085402a00000000 host1 /dev/sg7 /dev/sdh 8:112 Disk 2107900 IBM 1:0:0:1076510853
# ziorep_config -M
0.0.191c 0x50050763070845e3 /dev/sda /dev/mapper/mpatha
0.0.195c 0x50050763071845e3 /dev/sde /dev/mapper/mpatha
0.0.191c 0x50050763070845e3 /dev/sdc /dev/mapper/mpathb
0.0.195c 0x50050763071845e3 /dev/sdg /dev/mapper/mpathb
0.0.191c 0x50050763070845e3 /dev/sdb /dev/mapper/mpathc
0.0.195c 0x50050763071845e3 /dev/sdf /dev/mapper/mpathc
0.0.191c 0x50050763070845e3 /dev/sdd /dev/mapper/mpathd
0.0.195c 0x50050763071845e3 /dev/sdh /dev/mapper/mpathd
For more details refer to the presentation: FCP with Linux on IBM Z and LinuxONE: SCSI over Fibre Channel – Best Practices
LVM
§ Check the status of LVM
§ pvscan – scan all disks for physical volumes
§ lvscan – scan all disks for logical volumes
§ vgscan – scan all disks for volume groups
§ vgdisplay – display attributes of volume groups
§ dmsetup – low-level logical volume management
§ dmsetup ls --tree
§ dmsetup table
§ dmsetup status
§ /boot should NOT be LVM (normal partition)
# dmsetup ls --tree
my_volgroup-LV2 (254:5)
 |-mpathc (254:2)
 |  |- (8:80)
 |  `- (8:16)
 `-mpathd (254:3)
    |- (8:112)
    `- (8:48)
my_volgroup-LV1 (254:4)
 |-mpathc (254:2)
 |  |- (8:80)
 |  `- (8:16)
 |-mpathb (254:1)
 |  |- (8:96)
 |  `- (8:32)
 `-mpatha (254:0)
    |- (8:64)
    `- (8:0)
Network options on Linux on Z
[Figure: network options for Linux guests and z/OS running under z/VM, under KVM, and in LPARs. Labeled elements: OSA Express, VSWITCH, GuestLAN, HiperSockets (IQD), bond, virtio, ovswitch, RoCE (RNIC), SMCD, SMCR.]
qeth device driver
§ Supports
§ OSA Express
§ HiperSockets
§ GuestLAN
§ VSWITCH
§ Primary network driver for Linux on Z
Useful tools
§ General tools
§ ping
§ ip -s link
§ ss
§ traceroute
§ tcpdump
§ ethtool
§ net-tools-deprecated (ifconfig, netstat, route…) → replaced by iproute2 (SLES15)
§ Linux on Z specific tools
§ lscss – list channel subsystem devices
§ lsqeth – list qeth-based network devices
§ qetharp – query and modify ARP data (only layer 3 devices)
§ qethqoat – query the OSA address table
§ znetconf – list and configure network devices (on the fly)
§ lszdev / chzdev – display or configure Z specific devices (persistent)
Commands example
# lscss
Device   Subchan.  DevType CU Type Use  PIM PAM POM  CHPIDs
----------------------------------------------------------------------
0.0.b130 0.0.0019  1732/01 1731/01 yes  80  80  ff   8a000000 00000000
0.0.b131 0.0.001a  1732/01 1731/01 yes  80  80  ff   8a000000 00000000
0.0.b132 0.0.001b  1732/01 1731/01 yes  80  80  ff   8a000000 00000000
0.0.b0e0 0.0.001c  1732/01 1731/01 yes  80  80  ff   89000000 00000000
0.0.b0e1 0.0.001d  1732/01 1731/01 yes  80  80  ff   89000000 00000000
0.0.b0e2 0.0.001e  1732/01 1731/01 yes  80  80  ff   89000000 00000000
# lsqeth
Device name : eth0
-------------------------------------------------------
card_type : OSD_1000
cdev0 : 0.0.b130
cdev1 : 0.0.b131
cdev2 : 0.0.b132
chpid : 8A
online : 1
portname : no portname required
portno : 0
state : UP (LAN ONLINE)
priority_queueing : always queue 0
buffer_count : 64
layer2 : 1
isolation : none
bridge_role : none
bridge_state : inactive
bridge_hostnotify : 0
bridge_reflect_promisc : none
switch_attrs : unknown
# znetconf -c
Device IDs                 Type    Card Type     CHPID Drv. Name    State
-------------------------------------------------------------------------------------
0.0.b130,0.0.b131,0.0.b132 1731/01 OSD_1000      8A    qeth eth0    online
0.0.bdf0,0.0.bdf1,0.0.bdf2 1731/01 Virt.NIC QDIO 02    qeth encbdf0 online
# znetconf -r b130
Remove network device 0.0.b130 (0.0.b130,0.0.b131,0.0.b132)?
Warning: this may affect network connectivity!
Do you want to continue (y/n)?y
Successfully removed device 0.0.b130 (eth0)
# lszdev
TYPE        ID                                              ON  PERS NAMES
dasd-eckd   0.0.eaae                                        yes no   dasda
dasd-eckd   0.0.eaaf                                        yes no   dasdb
dasd-eckd   0.0.eab0                                        yes no   dasdc
zfcp-host   0.0.191c                                        yes no
zfcp-host   0.0.195c                                        yes no
zfcp-lun    0.0.191c:0x50050763070845e3:0x4082402a00000000  yes no   sda sg0
zfcp-lun    0.0.191c:0x50050763070845e3:0x4083402a00000000  yes no   sdb sg1
zfcp-lun    0.0.191c:0x50050763070845e3:0x4084402a00000000  yes no   sdc sg2
zfcp-lun    0.0.191c:0x50050763070845e3:0x4085402a00000000  yes no   sdd sg3
zfcp-lun    0.0.195c:0x50050763071845e3:0x4082402a00000000  yes no   sde sg4
zfcp-lun    0.0.195c:0x50050763071845e3:0x4083402a00000000  yes no   sdf sg5
zfcp-lun    0.0.195c:0x50050763071845e3:0x4084402a00000000  yes no   sdg sg6
zfcp-lun    0.0.195c:0x50050763071845e3:0x4085402a00000000  yes no   sdh sg7
qeth        0.0.b0e0:0.0.b0e1:0.0.b0e2                      yes no   eth4
qeth        0.0.b130:0.0.b131:0.0.b132                      yes no   eth0
qeth        0.0.b230:0.0.b231:0.0.b232                      yes no   eth5
qeth        0.0.b2a0:0.0.b2a1:0.0.b2a2                      yes no   eth6
qeth        0.0.bdf0:0.0.bdf1:0.0.bdf2                      yes no   encbdf0
qeth        0.0.e030:0.0.e031:0.0.e032                      yes no   eth1
generic-ccw 0.0.0009                                        yes no
# ip -s link show dev eth0
16: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 02:00:00:d4:29:02 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    11184566   40204    0       24      0       0
    TX: bytes  packets  errors  dropped carrier collsns
    308        4        0       0       0       0
Configuration files and s390dbf
§ Network configuration files
§ /etc/sysconfig/network-scripts/ifcfg-*
§ /etc/sysconfig/network/ifcfg-*
§ /etc/netplan/*.yaml
§ s390dbf
§ /sys/kernel/debug/s390dbf/qdio_<device>/
§ /sys/kernel/debug/s390dbf/qeth_msg/
§ /sys/kernel/debug/s390dbf/qeth_setup/
z/VM VSWITCH
§ Useful commands
§ QUERY VIRTUAL NIC – query virtual NIC
§ QUERY VIRTUAL OSA – display status of virtual OSA
§ QUERY VMLAN – determine the status of Guest LAN activity
§ QUERY VSWITCH DETAILS – show the details of VSWITCH
z/VM VSWITCH
# vmcp query vswitch details
VSWITCH SYSTEM VSW15G Type: QDIO Connected: 5 Maxconn: INFINITE
  PERSISTENT RESTRICTED ETHERNET Accounting: OFF
  USERBASED LOCAL
  VLAN Unaware
  MAC address: 02-46-0F-00-00-01 MAC Protection: Unspecified
  IPTimeout: 5 QueueStorage: 8
  Isolation Status: OFF VEPA Status: OFF
 Uplink Port:
  State: Ready
  PMTUD setting: EXTERNAL PMTUD value: 9000 Trace Pages: 8
  RDEV: BD03.P00 VDEV: 0600 Controller: DTCVSW2 ACTIVE
   Adapter ID: 3906000DA1E7.01B0
   Uplink Port Connection:
    RX Packets: 30138781 Discarded: 0 Errors: 0
    TX Packets: 204692591 Discarded: 0 Errors: 0
    RX Bytes: 12276525748 TX Bytes: 306684926983
    Device: 0600 Unit: 000 Role: DATA Port: 2049
   Partner Switch Capabilities: No_Reflective_Relay
 Adapter Connections: Connected: 5
  Adapter Owner: TEST0008 NIC: BDF0.P00 Name: HYD1G1 Type: QDIO
    RX Packets: 4585514 Discarded: 0 Errors: 0
    TX Packets: 4654 Discarded: 0 Errors: 0
    RX Bytes: 309679608 TX Bytes: 306677
    Device: BDF2 Unit: 002 Role: DATA Port: 2178
    Options: Ethernet Broadcast
    Unicast MAC Addresses:
      02-46-0F-00-00-08 IP: 172.18.73.8
    Multicast MAC Addresses:
      01-00-5E-00-00-01
      33-33-00-00-00-01
      33-33-FF-00-00-08
Slide callouts on the output above:
§ ETHERNET: Layer 2; NOROUTER: Layer 3
§ Adapter Owner: VM guest
§ NIC: virtual NIC
§ PMTUD value: acceptable maximum transmission size (MTU)
Tuning hints and tips
For more details refer to IBM Knowledge Center: Performance tuning hints and tips
Network tuning
§ General tuning parameters
§ buffer_count = 128 (default 64) – use chzdev to configure it persistently
§ MTU size 8992 if the application is able to send chunks > 1460 bytes
§ Set larger device transmission queue
# ip link set <interface_name> txqueuelen 3000
§ Enable RPS (Receive Packet Steering)
# echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus
§ OSA recommendations
§ TCP Segmentation Offload (TSO)
§ Outbound (TX) checksumming
§ Scatter Gather (SG)
# ethtool -K <interface_name> tx on sg on tso on
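The value ff written to rps_cpus in the RPS example is a hexadecimal CPU bitmask: bit n set means CPU n may process received packets, so ff selects CPUs 0 to 7. The sketch below derives such a mask for the first N CPUs. The helper name `rps_mask` is hypothetical, and the sketch is limited to N <= 32 because, to my understanding, larger masks have to be written as comma-separated 32-bit groups.

```shell
#!/bin/sh
# rps_mask N: print the hex bitmask that selects CPUs 0..N-1.
# Hypothetical helper; intended for 1 <= N <= 32.
rps_mask() {
    printf '%x\n' "$(( (1 << $1) - 1 ))"
}

# Usage (assumes interface eth0 with a single receive queue rx-0):
#   rps_mask 8 > /sys/class/net/eth0/queues/rx-0/rps_cpus
```

Whether to spread receive processing over all CPUs or only a subset depends on the workload, so treat the CPU count as a tuning parameter rather than a fixed recommendation.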
Customer Cases
§ RHEL 8 installation
§ Channel bonding failover
§ EP11 card missing master key
§ Low network performance
Customer case: RHEL 8 installation
§ Problem: network device cannot be recognized during installation
§ Analysis:
§ Network configuration in the parmfile
ip=100.125.81.112::100.125.81.254:255.255.255.0:lxabc001:enccw0.0.0600:none
§ Look for the kernel message in journalctl
kernel: qeth 0.0.0600: MAC address 02:14:00:00:00:23 successfully registered on device eth0
kernel: qeth 0.0.0600: Device is a Virtual NIC QDIO card (level: V642) with link type Virt.NIC QDIO.
kernel: qeth 0.0.0600 enc600: renamed from eth0
Customer case: RHEL 8 installation (cont'd)
§ Reason: Predictable network device names
<pf> <type> <bus_id>
<pf>: prefix for device network type (en = Ethernet)
<type>: device type (c = ccw)
<bus_id>: bus-ID (0.0.0600)
e.g. en c 600 → it omits leading 0s!!!
§ Solution: Adapt the parmfile configuration to the predictable network device name
ip=100.125.81.112::100.125.81.254:255.255.255.0:lxabc001:enc600:none
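The naming rule can be checked with a few lines of shell before editing a parmfile. This is a sketch of the rule as described on the slide (prefix en, type c, bus-ID without leading zeros and dots), not a replacement for the name the system actually assigns; the helper name `ccw_ifname` is hypothetical.

```shell
#!/bin/sh
# ccw_ifname BUS_ID: print the predictable name for a CCW Ethernet device by
# prefixing "enc" and dropping the leading zeros and dots of the bus-ID.
# Hypothetical helper that mirrors the rule on the slide.
ccw_ifname() {
    bus_id=$1
    lead=${bus_id%%[!.0]*}          # longest leading run of "." and "0"
    printf 'enc%s\n' "${bus_id#"$lead"}"
}

# Usage:
#   ccw_ifname 0.0.0600
```

For bus-ID 0.0.0600 the function prints enc600, the name seen in the kernel message of this case; for 0.0.bdf0 it prints encbdf0, which matches the znetconf sample earlier in the deck.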
Customer case: Channel bonding failover
§ Problem: The customer set up a channel bond in mode balance-xor on RHEL 8.1 and tested failover using a NetworkManager command
§ Analysis: The nmcli connection down <NIC> command releases the NIC from the bond, so failover to the other slave does not work
[Diagram: Bond0 with slaves eth1 and eth2]
# nmcli connection down eth1
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 1000
Down Delay (ms): 1000
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 02:a2:0f:00:00:21
Slave queue ID: 0
Customer case: Channel bonding failover (cont'd)
§ Solution: Use the ip link set dev <NIC> down command to simulate a link failure
[Diagram: Bond0 with slaves eth1 and eth2]
# ip link set dev eth1 down
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 1000
Down Delay (ms): 1000
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 02:a2:0f:00:00:21
Slave queue ID: 0
Slave Interface: eth1
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 02:a2:0f:00:00:25
Slave queue ID: 0
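The difference between the two tests is visible in the slave list: after nmcli connection down, eth1 is gone from the bond, while after ip link set dev eth1 down it is still a slave with MII status down. A short filter makes that easy to compare. This is a sketch that assumes the /proc/net/bonding layout shown above; the helper name `bond_slaves` is hypothetical.

```shell
#!/bin/sh
# bond_slaves: read /proc/net/bonding/<bond> content on stdin and print one
# "<slave> <MII status>" line per slave. Hypothetical helper name.
bond_slaves() {
    awk '
        /^Slave Interface:/ { slave = $3 }
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    '
}

# Usage:
#   bond_slaves < /proc/net/bonding/bond0
```

The bond-level "MII Status" line is skipped because it appears before the first "Slave Interface" line.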
Customer case: EP11 card missing master key
§ Problem: The customer set up two active EP11 domains on an Ubuntu system. When executing the pkcsconf command to check the token, the master key is recognized only every second time.
§ Analysis: The syslog shows the error message: no master key set.
§ Solution: Set both EP11 domains with the same master key
Customer case: Low network performance
§ Problem: The customer had streaming data transfer between two Linux on Z servers (SLES12 SP3) 500 km apart. The bandwidth is 650 Mb/s. With a single connection, the channel is loaded on average at only 250 Mb/s.
§ Analysis: Network performance measurement and analysis with iperf3
§ Solution: System upgrade to SLES12 SP4 with network tuning
§ TCP Segmentation Offload (TSO)
# ethtool -K NIC_NAME tx on sg on tso on
§ Improved throughput to 390 Mbit/s
§ TCP congestion control: BBR (not a general recommendation; depends on workload; better throughput with large queue size)
net.ipv4.tcp_congestion_control = bbr
§ Improved throughput to ~500 Mbit/s
Questions?
Sa Liu
Certified Technical Specialist
Linux on IBM Z and LinuxONE
Service & Support
IBM Systems
Schoenaicher Strasse 220
D-71032 Boeblingen
Mail: Postfach 1380
D-71003 Boeblingen
Phone (+49)-7031-16-3104
saliu@de.ibm.com
References
§ Troubleshooting Guide https://www.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_svcnt.html
§ Using the Dump Tools https://www.ibm.com/support/knowledgecenter/linuxonibm/liaaf/dumptools_container.html
§ Virtual Server Management https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaaf/lnz_r_va.html
§ How to use FC-attached SCSI devices with Linux on System z https://www.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_ts.html
§ SCSI over Fibre Channel – Best Practices http://public.dhe.ibm.com/software/dw/linux390/lvc/zFCP_Best_Practices-BB-Webcast_201805.pdf?cm_sp=dw-dwtv-_-linuxonz-_-presentation-PDF