Red Hat Customer Convergence


Transcript of Red Hat Customer Convergence


RED HAT ENTERPRISE LINUX:

PERFORMANCE ENGINEERING

PERFORMANCE UPDATE RHEL 6/7

Douglas Shakshober, Senior Consulting Software Engineer, February 6, 2014


Red Hat Performance Engineering

Benchmarks – code path coverage:

● CPU – linpack, lmbench

● Memory – lmbench, McCalpin STREAM

● Disk I/O – IOzone, aiostress – SCSI, FC, iSCSI

● Filesystem – IOzone, postmark – ext3/4, xfs, gfs2, gluster

● Network – netperf – 10 Gbit, 40 Gbit IB, PCIe 3

● Bare metal, RHEL6/7 KVM

● White-box AMD/Intel, with our OEM partners


Red Hat Performance Engineering

Application Performance

● Linpack MPI, SPECcpu (OMP) – single systems, clusters

● AIM 7 – single systems, large SMP

● Databases – DB2, Oracle 11g, Sybase 15.x, MySQL, Postgres, Mongo

● OLTP – metal/KVM/RHEV-M clusters – TPC-C/virt

● DSS – metal/KVM/RHEV-M, IQ, TPC-H/virt

● SPECsfs NFS, Postmark

● SAP – SLCS, SD

● STAC – FSI trading: AMQP, Reuters, Tibco, etc.


Red Hat Performance R7 beta vs R6.5

● RHEL7 partner beta

− Intel intel_idle driver – control C-state to 1 or 0

− NUMA (numa_balance), scheduler w/ large memory – 12 TB

● Testing:

− CPU performance – Linpack/STREAM; Java – SPECjbb
− IOzone performance w/ various filesystems +/- 3%; EXT4 write issue

− Databases (Oracle, Sybase, DB2, MySQL, Postgres), SAP

● Advanced performance tools

− Tuna / Tuned / Perf
− ISV support/requests

● KVM new virtualization features


RHEL NUMA Scheduler

● RHEL6
● numactl, numastat enhancements
● numad – user-mode tool; dynamically monitors, auto-tunes

● RHEL7 beta – numabalance
● 3.10-35, checked in by Rik van Riel

● Derived from work by Andrea Arcangeli, Mel Gorman, Peter Zijlstra, Ingo Molnar

● Enable / disable:
● echo NUMA > /sys/kernel/debug/sched_features
● echo NO_NUMA > /sys/kernel/debug/sched_features


Non-Uniform Memory Access - NUMA

● The Linux system scheduler is very good at maintaining responsiveness and optimizing for CPU utilization

● It tries to use idle CPUs, regardless of where process memory is located... and using remote memory degrades performance!

● Red Hat is working with the upstream community to increase NUMA awareness of the scheduler and to implement automatic NUMA balancing.

● Remote memory latency matters most for long-running, significant processes, e.g., HPTC, VMs, etc.


How to manage NUMA manually - Checklist

● Research NUMA topology of each system

● Make a resource plan for each system

● Bind both CPUs and memory
● Might also consider devices and IRQs

● Use numactl for native jobs:
● numactl -N <nodes> -m <nodes> <workload>

● Use numatune for libvirt-started guests; edit the XML:
● <numatune> <memory mode="strict" nodeset="1-2"/> </numatune>

● Use cgroups w/ apps to bind CPU/memory to NUMA nodes (see the sketch below)
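
A minimal sketch of the last two native options, assuming a two-node box; the node numbers, cpuset mount point, and group name are illustrative, not prescriptive:

# numactl --hardware                            (inspect the topology first)
# numactl -N 1 -m 1 <workload>                  (CPUs and memory from node 1 only)
# mkdir /cgroup/cpuset/dbgrp                    (cpuset-cgroup alternative)
# echo 8-15 > /cgroup/cpuset/dbgrp/cpuset.cpus
# echo 1 > /cgroup/cpuset/dbgrp/cpuset.mems
# echo $$ > /cgroup/cpuset/dbgrp/tasks          (this shell and its children now bound)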


Know Your Hardware (hwloc)

[lstopo diagram: system topology showing where the Solarflare SFN6322 NIC attaches]
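
The diagram above comes from hwloc; a quick way to reproduce this kind of topology map on your own system, assuming the hwloc package is installed:

# lstopo                      (sockets, caches, NUMA nodes, and PCI devices)
# lstopo topology.png         (same map written to an image file)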


NUMA Performance – SPECjbb

[Chart: multi-instance Java peak SPECjbb2005, 1-4 instances each fitting within one node; total bops and % gain vs. no-auto for 3.10-54 no-NUMA, 3.10-54 NUMA, and numactl]


Use numastat to see memory layout

● Rewritten for RHEL to show per-node system and process memory information

● 100% compatible with the prior version by default, displaying /sys...node<n>/numastat memory allocation statistics

● Any command options invoke the new functionality
● -m for per-node system memory info
● <pattern> for per-node process memory info

● See numastat(8)


numastat - java processes w/NUMA-balance on

# numastat -c java      (default scheduler – non-optimal)

Per-node process memory usage (in MBs)
PID           Node 0  Node 1  Node 2  Node 3  Total
------------  ------  ------  ------  ------  -----
57501 (java)     755    1121     480     698   3054
57502 (java)    1068     702     573     723   3067
57503 (java)     649    1129     687     606   3071
57504 (java)    1202     678    1043     150   3073
------------  ------  ------  ------  ------  -----
Total           3674    3630    2783    2177  12265

# numastat -c java      (numabalance – close to optimal)

Per-node process memory usage (in MBs)
PID           Node 0  Node 1  Node 2  Node 3  Total
------------  ------  ------  ------  ------  -----
56918 (java)      49    2791      56      37   2933
56919 (java)    2769      76      55      32   2932
56920 (java)      19      55      77    2780   2932
56921 (java)      97      65    2727      47   2936
------------  ------  ------  ------  ------  -----
Total           2935    2987    2916    2896  11734


NUMA Performance – Database Single Large DB

[Chart: Postgres Sysbench OLTP trans/sec vs. thread count (10/20/30) on a 2-socket Westmere EP, 24p/48 GB – 3.10-54 base, 3.10-54 NUMA, and NumaD %]


NUMA Performance – Single Oracle Database

[Chart: RHEL7 vs. RHEL6 Oracle OLTP performance – minimize impact on a large single app; series: RHEL6.4, RHEL6.4 w/ numad, 3.10-54 NUMA, 3.10-54 no NUMA]


RHEL7 beta Performance Tuning

● RHEL 7 beta potential tuning
● tuned-adm profile throughput-performance
● tuned-adm profile latency-performance (to lock cstate=1)

● Disable the NUMA-balance scheduler via:
● echo NO_NUMA > /sys/kernel/debug/sched_features

● Adjust dirty ratios back to the RHEL6 values of 40 and 10 (sketch below)
● vm.dirty_ratio = 40
● vm.dirty_background_ratio = 10
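
A sketch of applying the dirty-ratio change at runtime; persist the values in /etc/sysctl.conf if they help your workload:

# sysctl -w vm.dirty_ratio=40
# sysctl -w vm.dirty_background_ratio=10
# sysctl -a | grep dirty          (verify the active values)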


RHEL7 Network Features

• Overview of new Networking Features in RHEL7

• Adaptive Tickless (dynticks) Patchset

• BUSY_POLL Socket Option

• Power Management

• Tunable Workqueues


RHEL7 Networks 1/3

● IPv4 routing cache, bye-bye
− Reduces overhead for route lookups

● Socket BUSY_POLL (aka low-latency sockets)
− Performance numbers later; sysctl sketch below

● 127/8 is (optionally) routable now – for cloud stuff
● 40G NIC support; the bottleneck moves back to the CPU :-)
● RFS, aRFS, XPS, etc.
● ipset is included; accelerates complex iptables rules
● netsniff-ng included ... ifpps awesome
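
A sketch of the global BUSY_POLL knobs; the values are microseconds to busy-wait and are illustrative (applications can also opt in per-socket via SO_BUSY_POLL):

# sysctl -w net.core.busy_read=50        (busy-poll blocking reads)
# sysctl -w net.core.busy_poll=50        (busy-poll in select()/poll())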


RHEL7 Networks 2/3

● SO_REUSEPORT socket option
− Multiple sockets listen on the same port, TCP & UDP

● Bufferbloat avoidance – non-LAN-latency situations
− TCP Small Queues (tcp_limit_output_bytes)

− CoDel and FQ CoDel packet schedulers

● TCP Proportional Rate Reduction (PRR)
− Improves reaction time of window scaling, in the 3-10% range

● TCP connection repair
− To support LXC: stop a TCP connection and restart it on another host


RHEL7 Networks 3/3

● Performance Co-Pilot support
− pmatop awesome; also pmcollectl

● Per-cgroup TCP buffer limits
− Memory-pressure controls for TCP

● Stacked VLANs – 802.1ad QinQ support
− Frame header includes > 1 VLAN tag

● PTP full support in 6.5 and 7.0
− Requires NIC driver enablement

● Chrony offered instead of ntpd (ntpd still included)


New Networking Features in RHEL7

● Linux Containers (LXC) network namespaces
− Per-namespace sysctl tunables

● TCP Fast Open socket option
− Combines the first two steps of the handshake (sysctl sketch below)

● TCP Tail Loss Probe
− Reduces the impact of lost packets (RTO ~ 15%)
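
A sketch of enabling TCP Fast Open system-wide; the sysctl is a bitmask (1 = client, 2 = server, 3 = both), and a server application must additionally opt in with the TCP_FASTOPEN socket option:

# sysctl -w net.ipv4.tcp_fastopen=3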


RHEL “tuned” package

# yum install tune*
# tuned-adm profile latency-performance
# tuned-adm list
Available profiles:
- latency-performance
- default
- enterprise-storage
- virtual-guest
- throughput-performance
- virtual-host
Current active profile: latency-performance
# tuned-adm profile default          (to disable)


“tuned” Profile Summary

Tunable                              default   enterprise-  virtual-host/  latency-     throughput-
                                               storage      virtual-guest  performance  performance
-----------------------------------  --------  -----------  -------------  -----------  -----------
kernel.sched_min_granularity_ns      4 ms      10 ms        10 ms          10 ms        10 ms
kernel.sched_wakeup_granularity_ns   4 ms      15 ms        15 ms          15 ms        15 ms
vm.dirty_ratio                       20% RAM   40%          10%            40%          40%
vm.dirty_background_ratio            10% RAM   5%           –              –            –
vm.swappiness                        60        10           30             –            –
I/O scheduler (elevator)             CFQ       deadline     deadline       deadline     deadline
Filesystem barriers                  On        Off          Off            Off          –
CPU governor                         ondemand  performance  performance    performance  –
Disk read-ahead                      –         4x           –              –            –
Disable THP                          –         –            –              Yes          –
CPU C-states                         –         –            –              locked @ 1   –

(“–” = the profile leaves the default.)


Impact of Power Management on Latency and High Context-Switching Workloads (storage/network)

[Chart: latency in microseconds vs. processor C-state (C6, C3, C1, C0)]

Current status: network off by +/- 3%, storage by +/- 5%. Future plans; impact on customers.

[Chart: netperf request/response trans/sec – R6 vs. R7, UDP and TCP, baseline vs. the latency-performance profile]


Adaptive Tickless (DynTicks) Patchset

● The goal of this patchset is to stop interrupting userspace when nr_running=1 (see /proc/sched_debug)

● The idea: if the runqueue depth is 1, the scheduler should have nothing to do on that core

● Move all timekeeping to non-latency-sensitive cores

● Mark certain cores as full_nohz cores

● In addition to the cmdline options full_nohz and rcu_nocbs
− Also need to move the RCU threads yourself (pgrep, taskset, tuna) – sketch below
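
A sketch of parking the RCU offload threads on a housekeeping core after booting with the cmdline options above; using core 0 as the housekeeping core is an assumption:

# for p in $(pgrep rcu); do taskset -pc 0 $p; done      (pin all RCU threads to core 0)
# tuna -c 1 -i                                          (then isolate the latency-critical core)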


Precision Time Protocol (IEEE-1588v2)

● Tech Preview in RHEL 6.4, full support in 6.5
− Limited driver enablement in 6.4
− 6.5: bnx2x, tg3, e1000e, igb, ixgbe, and sfc

● Improved synchronization accuracy over NTP
− PTP hardware timestamping is the most accurate (linuxptp sketch below)

● Query your NIC's PTP capabilities: ethtool -T p1p1

● Improve time sync by disabling the tickless kernel
− nohz=off
− Increased power consumption
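
A minimal linuxptp session as a sketch, assuming ethtool -T reported hardware timestamping support on p1p1:

# ptp4l -i p1p1 -m            (sync the NIC's hardware clock to the PTP master)
# phc2sys -s p1p1 -w -m       (then steer the system clock from the NIC's clock)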


Precision Time Protocol (IEEE-1588v2)

[Charts: PTP clock offset with nohz=off vs. nohz=on]


Adaptive Tickless (DynTicks) Patchset

● Reading:

−https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt

−http://lwn.net/Articles/549580/

−http://www.youtube.com/watch?v=G3jHP9kNjwc


Timeline of a tick... RHEL5

[Diagram: the timer interrupt fires at every jiffy, always interrupting the userspace task]


Timeline of a tick... RHEL6 and 7, CONFIG_NO_HZ

[Diagram: the timer interrupt stops while the CPU is idle]


Timeline of a tick... RHEL7, CONFIG_NO_HZ_FULL

[Diagram: the timer interrupt also stops while a userspace task runs – tickless doesn't require idle]


Examining the tick 1/3

# egrep 'CPU|LOC' /proc/interrupts

# perf list|grep local_timer

irq_vectors:local_timer_entry [Tracepoint event]

irq_vectors:local_timer_exit [Tracepoint event]


Examining the tick 2/3

# perf stat -C 1 -e irq_vectors:local_timer_entry sleep 1

9 irq_vectors:local_timer_entry

# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 /root/pig -s 1

1,002 irq_vectors:local_timer_entry

Reboot with full_nohz=1 rcu_nocbs=1

# tuna -c 1 -i ; tuna -q \* -c 1 -i

# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 /root/pig -s 1

5 irq_vectors:local_timer_entry


Examining the tick 3/3 (debugfs)

# mount -t debugfs nodev /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo 1 > events/irq_vectors/enable
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 432/432   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [007] dNh. 22793.558298: reschedule_entry: vector=253
          <idle>-0     [007] dNh. 22793.558299: reschedule_exit: vector=253
          <idle>-0     [000] d.h. 22793.558969: local_timer_entry: vector=239
          <idle>-0     [000] d.h. 22793.558977: local_timer_exit: vector=239
          <idle>-0     [000] d.H. 22793.558980: irq_work_entry: vector=246
          <idle>-0     [000] dNH. 22793.558983: irq_work_exit: vector=246
          <idle>-0     [000] d.h. 22793.559970: local_timer_entry: vector=239
          <idle>-0     [000] d.h. 22793.559977: local_timer_exit: vector=239
...


NUMA Topology and PCI Bus

● Servers may have more than one PCI bus.

● Install adapters “close” to the CPU that will run the performance critical application.

● When BIOS reports locality, irqbalance handles NUMA/IRQ affinity automatically.

42:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

# cat /sys/devices/pci0000\:40/0000\:40\:03.0/0000\:42\:00.0/local_cpulist

1,3,5,7,9,11,13,15

# dmesg | grep "NUMA node"

pci_bus 0000:00: on NUMA node 0 (pxm 1)

pci_bus 0000:40: on NUMA node 1 (pxm 2)

pci_bus 0000:3f: on NUMA node 0 (pxm 1)

pci_bus 0000:7f: on NUMA node 1 (pxm 2)
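
Two quicker checks as a sketch (the interface name is illustrative): sysfs also exposes each network device's NUMA node and nearby CPUs directly:

# cat /sys/class/net/p1p1/device/numa_node
# cat /sys/class/net/p1p1/device/local_cpulist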


Performance Projects / Tooling

● RHEL6.5: “numad”, “tuna”, and “tuned”

● Tuna used to bind IRQs / real-time-like isolation

● Profiling challenges

− Data-address profiling (cache-to-cache contention detection), providing:
• the hottest contended cachelines

• the process names, addresses, PIDs, and TIDs causing that contention

• the CPUs they ran on,

• and how the cacheline is being accessed (read or write)


“tuned” Profile Summary

(Same profile summary table as shown earlier in this transcript.)


IOzone Performance – Effect of tuned

[Chart: RHEL6.4 file-system in-cache performance, Intel large-file I/O (IOzone) – throughput in MB/sec for ext3, ext4, xfs, gfs2; not tuned vs. tuned]

[Chart: RHEL6.4 file-system out-of-cache performance, Intel large-file I/O (IOzone) – throughput in MB/sec for ext3, ext4, xfs, gfs2; not tuned vs. tuned]


System Tuning Tool - tuna

• Tool for fine grained control

• Display applications / processes

• Displays CPU enumeration

• Socket (useful for NUMA tuning)

• Dynamic control of tuning

• Process affinity

• Parent & threads

• Scheduling policy

• Device IRQ priorities, etc


Tuna (RHEL6.4/ RHEL7)

[Screenshot: Tuna GUI, three annotated panes]


Network Tuning: IRQ Affinity

● irqbalance for the common case – disable it to hand-tune

● New irqbalance automates NUMA affinity for IRQs

● Flow-steering technologies

● Move 'p1p1*' IRQs to socket 1:

# service irqbalance stop

# tuna -q p1p1* -S1 -m -x

# tuna -Q | grep p1p1

● Manual IRQ pinning for the last X percent / determinism (sketch below)

● Guide on Red Hat Customer Portal
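
A manual pinning sketch; the IRQ number and CPU choice are illustrative, so find the real numbers in /proc/interrupts first:

# grep p1p1 /proc/interrupts                 (list the NIC queue IRQs)
# echo 2 > /proc/irq/113/smp_affinity_list   (steer IRQ 113 to CPU 2; smp_affinity takes a bitmask instead)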


[Screenshot: Tuna IRQ/CPU affinity context menus – CPU affinity for IRQs and PIDs, scheduler policy, scheduler priority]


RHEL6.5 and RHEL7 Virt Performance

RHEL 6.5
● virtio data plane, 4 TB memory limit

RHEL 7
● NUMA balance code
● KVM pvticketed spinlocks, APICv

Large guest performance
● NUMA in a guest, APICv, new 4 TB memory limit

RHEV 3.3 (based on RHEL 6.5)
● New memory overcommit manager – MOM
● Network QoS, native Gluster (libgfapi)


RHEL7 w/ Ticketed Spinlocks – 3.10.0-12.el7 pvticketlocks.x86_64 (note R6 unfair locks)

[Chart: Linpack NxN 20000x20016 gflops on a 12-core Westmere, 64 GB – bare metal vs. non-ticketed vs. pvticketed at 1, 2, 4, 8, and 12 vCPUs, with % difference]


RH 3.10 OLTP Performance

[Chart: R7 / F17 OLTP TPM w/ spinlock backoff (perf74, 4-socket, 512 GB, 2 FC CLARiiON) – RHEL6.3 all-nodes, 3.6.0-0.24.autonuma28fast.test.x86_64, and 3.6.10-2.tlw16upstream.fc17.x86_64 kernels at 80U and 100U loads, with delta]


RH/IBM Top virtualized benchmarks

● SPECvirt2010/2012

● IBM SAP SD two-tier bare-metal / virtualized results
− IBM System x3850 X5, 4-socket, 40-core, 80-thread system
− Bare metal: 12,560 SD users; KVM (80-vCPU guest): 10,700
− 85% of bare metal

● IBM TPC-C – world record w/ DB2


Virtualization Benchmarks

SPECvirt_sc2013

− Increased workload injection rates

− Multi-vCPU guests
• All guests were single-vCPU in SPECvirt_sc2010

− Up to four tiles use the same database VM

TPC-VMS

− Three independent TPC-C, TPC-H, TPC-E, or TPC-DS benchmarks running simultaneously

− Metric is the lowest of the three scores

− Large vCPU-count guests

− Large disk I/O requirements


SPECvirt2010: RHEL 6 KVM Posts Industry-Leading Results

http://www.spec.org/virt_sc2010/results/

[Diagram: client hardware driving the System Under Test (SUT); virtualization layer and hardware, with blue = disk I/O and green = network I/O; > 1 SPECvirt tile/core]

Key enablers: SR-IOV, huge pages, NUMA, node binding


Best SPECvirt_sc2010 Scores by CPU Cores (as of May 30, 2013)

Hypervisor / System              Cores  VMs  SPECvirt_sc2010 score
VMware ESX 4.1, HP DL380 G7         12   78  1,221
RHEL 6 (KVM), IBM HS22V             12   84  1,367
VMware ESXi 5.0, HP DL385 G7        16  102  1,570
RHEV 3.1, HP DL380p Gen8            16  150  2,442
VMware ESXi 4.1, HP BL620c G7       20  120  1,878
RHEL 6 (KVM), IBM HX5 w/ MAX5       20  132  2,144
VMware ESXi 4.1, HP DL380 G7        12  168  2,742
VMware ESXi 4.1, IBM x3850 X5       40  234  3,824
RHEL 6 (KVM), HP DL580 G7           40  288  4,682
RHEL 6 (KVM), IBM x3850 X5          64  336  5,467
RHEL 6 (KVM), HP DL980 G7           80  552  8,956

Systems group into 2-socket (12, 16, and 20 cores), 4-socket (40 cores), and 8-socket (64/80 cores) configurations.

Comparison based on best-performing Red Hat and VMware solutions by CPU core count published at www.spec.org as of May 17, 2013. SPEC® and the benchmark name SPECvirt_sc® are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPECvirt_sc2010, see www.spec.org/virt_sc2010/.


KVM / RHS Tuning

● gluster volume set <volume> group virt

● XFS: mkfs -n size=8192; mount w/ inode64, noatime

● RHS server: tuned-adm profile rhs-virtualization
● Increases read-ahead, lowers dirty ratios

● KVM host: tuned-adm profile virtual-host

● For better response time, shrink the guest block-device queue (sketch below)
● /sys/block/vda/queue/nr_requests (16 or 8)

● For best sequential read throughput, raise VM read-ahead
● /sys/block/vda/queue/read_ahead_kb (4096/8192)
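
A sketch of both guest-side knobs; the device name vda comes from the slide, and the values are the suggested ranges:

# echo 16 > /sys/block/vda/queue/nr_requests        (smaller queue – better response time)
# echo 4096 > /sys/block/vda/queue/read_ahead_kb    (larger read-ahead – faster sequential reads)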


IOzone Performance Comparison – RHS 2.1/XFS w/ RHEV

[Chart: IOzone throughput for random write, random read, sequential write, and sequential read – out-of-the-box vs. tuned rhs-virtualization]


RHEL6 Performance Tuning Summary

● Use “tuned”, “numad”, and “tuna” in RHEL6.x
● tuned selects the deadline I/O elevator

● Power-savings mode off (performance profiles), C-states locked (latency profile)

● Transparent hugepages for anon memory (monitor it)

● Multi-instance: consider numad

● Virtualization: virtio drivers; consider SR-IOV

● Manually tune
● NUMA – via numactl; monitor w/ numastat -c <pid>

● Huge pages – static hugepages for pinned shared memory (sketch below)

● Managing VM: dirty-ratio and swappiness tuning

● Use cgroups for further access control

● Perf and Tuna examples in appendix
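
A static-hugepages sketch; the page count is illustrative, so size it to the pinned shared memory the application actually needs:

# sysctl -w vm.nr_hugepages=1024        (reserve 1024 x 2 MB hugepages)
# grep Huge /proc/meminfo               (verify HugePages_Total / HugePages_Free)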


Helpful Links

● Red Hat Low Latency Performance Tuning Guide

● Optimizing RHEL Performance by Tuning IRQ Affinity

● Red Hat Performance Tuning Guide

● Red Hat Virtualization Tuning Guide

● STAC Network I/O SIG

● Finteligent Low Latency Tuning w/KVM