KBC UNICODE: Train is on the Rails - IDUG


Transcript of KBC UNICODE: Train is on the Rails - IDUG

1

15 October 2008 • 16:15 – 17:15 • Platform: DB2 for z/OS

Jan Tielemans, KBC Global Services NV

Session: A12

KBC Unicode Train is on the Rails

KBC Group is expanding its business into the Eastern European landscape. The ICT division had the challenge of building a common application portfolio for Poland and Belgium, with the option that daughter companies in other countries could also join this application portfolio… this was the call for UNICODE applications. Since our data resides on DB2 for z/OS, we were confronted with converting EBCDIC data to UNICODE as well as building new applications for accessing UNICODE data. After a short intro on what Unicode is, we will talk about our choice for UTF-16 and discuss in more detail our experience with the various DB2 – Unicode – z/OS Conversion Services items/problems we got involved with.

2

Presentation Objectives

• Why UNICODE is important for our company
• Considerations for choosing Unicode UTF-16
• DB2 performance impact for Unicode applications
• Design guidelines for Unicode applications
• General recommendations and experiences with Unicode UTF-16

These are the five bullet points on which this presentation was (maybe) selected to be presented here at IDUG… However, between the time I submitted the abstract and the time I wrote this presentation… things changed. But in general these bullet points will be addressed.

What I will not talk about are items like:
a) What is UNICODE
b) How does UNICODE work
c) How do you plan – budget – organize such an implementation
d) Aspects of multi-national applications: translation (language expressions) – cultural formats (some symbols have different meanings in different countries) – encoding

3

KBC Bank and Insurance Holding – Unique Multi-Channel Distribution Platform in Belgium

Products: Traditional Retail/SME Banking – Leasing/Factoring – Merchant Banking – Capital Markets/Trading – Asset Mgt./Private Banking – Insurance/Re-insurance – Stock Brokerage

Distribution: 892 retail branches – 19 corporate branches – 24 private banking branches – 618 tied insurance agents – 794 Centea bank agents – Internet/electronic channels

Market: 2.7 million individuals + 15 555 corporates + 15 406 high net worth individuals

4

The KBC Group is expanding … in Central, Eastern Europe and Russia

KBC is expanding its business area via acquisitions of (and partnerships with) banks and insurance companies in Central, Eastern Europe and Russia. Through these acquisitions the IT departments also need to be integrated… Some aspects of this are joint development of applications and making applications available to the KBC Group rather than to an individual country. These two aspects were our call for UNICODE!

5

KBC in the world

New York – Los Angeles – Atlanta – Mexico – Mumbai – Nanjing – Shanghai – Taipei – Kaohsiung – Taichung – Manila – Hong Kong – Shenzhen – Labuan – Kuala Lumpur – Singapore – Teheran

6

ICT Infrastructure

• 3 z9 machines (IBM 2094/720 – 716 – 712)
  • Two run the production workload – one is for development and acceptance
• DB2 subsystems (V8 FF)
  • Data sharing (2-way)
  • Non data sharing
• IMS as trx manager (V9)
  • 13M trx/day (DB2) – 5M (IMS DB)
  • 600 trx/sec (500 + 100)
• z/OS 1.7 … migrated to 1.9
• 3 Sysplex environments
• DASD boxes: DS8300 (85TB) – DS8300T (88TB) – ESS800T (22TB)
  • 14K – 20K I/O per second at 0.03 – 0.05 msec
  • PPRC mirrored between the two datacenters

Headquarter: Brussel – Exploitation IT datacenter: Mechelen – Headquarter IT datacenter: Leuven (40 km between the datacenters)

The KBC datacenter is one of the largest in Belgium; the slide shows some IT information about our datacenter. Notice that the distance between our two datacenters is 40 km.

7

3-Tier Architecture Model

Web browser (intranet) / Web browser (internet) / ATM
  -> Solaris Webserver: Application + Connector
  -> z/OS: IMS Connect -> IMS Transaction (Application) -> DB2

We currently have a 3-tier model in place.

8

Application Infrastructure

• Network centric: head office applications, KBC-Online…
• 3-Tier: Client – Midtier – Backend
  • Client: “browser based”, limited, no business logic
  • Midtier: webserver, application presentation logic… (Unix, Intel)
  • Backend: application business logic, data (Mainframe)
• Applications are written in:
  • APS
  • VAG
  • RAD/EGL
  (all of them generate COBOL)
• Data in EBCDIC (codepage 500):
  • DB2 (since V1.2)
  • IMS DB

We have used various application generators … to generate COBOL. The data is stored in DB2 (started with version 1.2) and has always been in CODEPAGE 500. We only have one codepage !

9

Why Unicode: Business case

• Claims application in Poland and Belgium must be redesigned
• Businesses in Poland and Belgium see synergies
• ICT will build a common application portfolio for Poland and Belgium, with the option that other insurance daughters can join the collaboration
• The challenge is building multi-national applications

As the previous slides show, KBC is expanding into Central, Eastern Europe and Russia: joint development and joint application exploitation between different countries.

10

The Challenge

• Building new applications – data models in Unicode
• Converting existing applications – data models to Unicode
• Minimal impact on ‘Mainframe’ IT cost
  • Backend legacy systems are touched/changed
  • DB2 and the application
    • CPU usage
    • Storage
    • Memory
  • Development cost
  • …

The biggest challenge: the legacy systems were designed in the early 90s, when performance and resource usage were key items in the design. The new systems are designed by new/other people who have a different performance, resource usage and design attitude…

KBC management: we do accept a ‘small’ increase in ‘Mainframe IT cost’… We now run at a ‘good’ CPU utilization – workload – throughput… We do NOT want to buy additional MIPS to run UNICODE applications…

11

What to choose?

Web browser -> Solaris Webserver (Application + Connector) -> z/OS: IMS Connect -> IMS Transaction (Application) -> DB2

- UTF-8 or UTF-16 or a combination
- Column based or table based Unicode
- Function based or table based Unicode

Minimal impact on:
• Performance
• Development cost
• Resource usage

One of the aspects of going to UNICODE is making decisions: UTF-8, UTF-16, or both – all character columns in UNICODE, or a special design in which we separate ‘possible’ UNICODE columns from traditional character columns. For example, an Employee table is divided into two parts that are linked by the employee number. The old employee table contains the non-UNICODE character columns (like ZIP code, gender code,…) and a new employee table is created with only the UNICODE columns (like name, address,…).

For each decision, we must take into consideration: what is the impact on performance (CPU, I/O,…) – development cost,…

12

Function based or Table based Unicode

(Diagram: existing EBCDIC data models – Finance, Marketing, Human Resource, Stock – with relations between them, and the new Claims application.)

This picture gives a simple overview of the different EBCDIC data models (applications) that are defined in DB2 and shows that there are relations between them…

Adding a new application (CLAIMS) in UNICODE introduces:

a) a potential performance problem in joining EBCDIC with UNICODE tables

b) complexity for development: knowing which table is in UNICODE and which is in EBCDIC

13

Column or Table based Unicode

• Pro column based
  • Controlled overhead (DASD, CPU)
• Cons column based
  • Difficult programming, management, understanding,…

• Pro table based
  • Ease of programming, management, understanding,…
• Cons table based
  • Overhead (DASD, CPU,…)
  • Only 5%-10% of the character based columns can functionally contain UNICODE data… how many of these will actually contain ‘special characters’?

Table based design (CCSID UNICODE):
  Table EMP: Name Char(30) … Zipcode Char(6)

Column based design (CCSID EBCDIC, plus a UNICODE side table):
  Table EMP: Name Char(30), Name_U smallint (0: no unicode name, 1: unicode name), … Zipcode Char(6)
  Table EMP_U (CCSID UNICODE): Name Char(30) …

If you look at your existing data models… how many CHAR/VARCHAR columns do you have… and how many of them can potentially contain ‘special characters’? The answer for us was… very few! Is this a good reason for splitting a table into two parts (EBCDIC – UNICODE data)? If you answer this from a performance point of view, the answer is YES, but if you look at it from a development – maintenance point of view the answer is NO… We decided to put all character data in a table in UNICODE.

14

Unicode UTF-8 or UTF-16

• Pro UTF-8
  • Optimal for space
  • In line with DB2 V8
• Con UTF-8
  • More programming effort
    • Each character can be 1 or 2 bytes
    • Substr vs Substring

• Pro UTF-16
  • Ease of programming
    • Each character is 2 bytes
    • In line with the programming language COBOL, Pic N()
• Con UTF-16
  • Wasted space, more DASD space?
  • Bigger bufferpools?
  • More CPU consumption?
  • More I/Os – GetPages?

Go for UTF-8 or UTF-16? Same considerations as on the previous slide… but since we use RAD/EGL, which generates COBOL… and COBOL only supports UTF-16 correctly, the choice to go for UTF-16 was easier to make.

Konstantin Tadenev of UPS

As you know, Enterprise COBOL has limited support for UTF-8, yet our measurements indicate dramatic performance and capacity advantages of UTF-8 over UTF-16 in DB2, especially when dealing with UTF-8 input data (which is the case in most distributed computing scenarios). We discovered that encoding conversion may cost up to 155% of additional CPU time overhead for a process performing DB2 INSERTs. That is more than 2.5 times in CPU capacity! The IBM position on UTF-8 support in Enterprise COBOL is as follows (ETR Record 88297,180,000): Question: Does the Enterprise COBOL compiler support usage of PIC X for data items containing UTF-8 data? If so, what are the limitations? If not, are there any plans to address this gap in functionality? Answer: Yes, the Enterprise COBOL Programming Guide SC27-1412-05 hits on the topic in Chapter 7, Processing data in an international environment, in the section titled 'Processing UTF-8 data', where it states:

When you need to process UTF-8 data, first convert the data to UTF-16 in a national data item. After processing the national data, convert it back to UTF-8 for output. For the conversions, use the intrinsic functions NATIONAL-OF and DISPLAY-OF, respectively. Use code page 1208 for UTF-8 data.

As you can see, IBM states that the functionality for UTF-8 support is there, but offers this functionality at a price of increased CPU consumption, which in turn may prove to be cost-prohibitive.
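A minimal sketch of what the manual describes, with hypothetical field names and lengths:

       WORKING-STORAGE SECTION.
      * UTF-8 input held in an alphanumeric item, processing done
      * in a national (UTF-16) item, output written back as UTF-8.
       01  WS-UTF8-IN          PIC X(100).
       01  WS-UTF16-WORK       PIC N(100) USAGE NATIONAL.
       01  WS-UTF8-OUT         PIC X(100).

       PROCEDURE DIVISION.
      * Convert the UTF-8 (CCSID 1208) data to UTF-16 for processing.
           MOVE FUNCTION NATIONAL-OF(WS-UTF8-IN, 1208)
             TO WS-UTF16-WORK
      * ... process the national data ...
      * Convert back to UTF-8 for output.
           MOVE FUNCTION DISPLAY-OF(WS-UTF16-WORK, 1208)
             TO WS-UTF8-OUT

Each of those intrinsic-function calls is an extra conversion, which is exactly the CPU overhead being discussed above.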

15

KBC Unicode Policy

• UNICODE UTF-16 will be used
• All new applications will be developed in UNICODE UTF-16
• All character columns in a table will be in UNICODE UTF-16
• All tables in a ‘function’ will be converted to UNICODE
• No (or minimal) joining between EBCDIC and UNICODE tables

Knowing that we have chosen a solution that has a (possible) impact on performance, we will do testing and research on how we and IBM can reduce this overhead. We are working very closely with IBM development (DB2 Chris Crone) on this…

16

‘Enhanced 3-Tier Architecture Model’

Solaris Webservers in Belgium and Poland: Application + Connector, internal UTF-16, send UTF-8
z/OS: IMS Connect -> IMS Transaction (input: converts UTF-8 to UTF-16; output: converts UTF-16 to UTF-8) -> DB2 (data stored in UTF-16)

Network capacity: UTF-16 <-> UTF-8: +40% gain + compression

Because our data is now in UTF-16, we will use twice as much space… also on the network (bandwidth). Since we put everything in UTF-16 and we send ‘big’ messages to the WAS, we ‘expect’ bandwidth problems on external connections. That is why we convert all outgoing data to UTF-8, and the connectors (which send the message to the mainframe) convert UTF-16 to UTF-8. Tests have shown that with this method we gain more than 40% in network capacity.

17

DB2 Support for Unicode

• Migrated to DB2 V8 for ‘better Unicode support’:
  • Catalog and directory converted to UTF-8 (ENFM mode)
    • Still in EBCDIC: SYSCOPY, SYSEBCDC, SCT02, DBD01, SYSLGRNX, SYSUTILX
  • Precompiler/DBRM converted to Unicode UTF-8 (NFM mode)
    • Controlled by the NEWFUN parameter in DSNHDECP
  • All SQL parsing is done in UTF-8
  • Allows mixed codepages in a SQL join statement
  • An enhanced set of Unicode SQL functions
    • E.g.: SUBSTRING

When we started thinking about going to UNICODE, we needed DB2 V8 ! Full UNICODE support ….

18

Can be handy…

BROWSE DB2TS.LIB.DSN810.SDSNDBRM(DSN@CCC4)   Line 00000000 Col 001 080
Command ===>                                 Scroll ===> CSR

********************************* Top of Data **********************************DBRM...µW98COMP DSNACCC4.ÄÄd.Âé\..4.......................................1..ØLL.. ..............DBRM...m.......¡.......Ìàáä< êá.!íèíëê..äíêë!ê.ã!ê.ëá<áäè.àñëèñ+äè.ëèêñ&...àâ+ (á...è.........ãê!(.ëßëñâ(...ëßëè â<á& êè.ïçáêá.ëè!êèß&á....á.. +à.îä è+ (á............... +à.àâ+ (á.....àë+àâ....í+ñ!+.ëá<áäè.àñëèñ+äè.ëèêñ&...â...àâ+ (á...è.........ãê!(.ëßëñâ(...ëßëñ+àáì& êè. ...ëßëñâ(...ëßëñ+àáìáë.â.ïçáêá.ëè!êèß&á....á..+à. ...ñì+ (á...â...+ (á. +à. ...îä è+ (á...............ã!ê.ãáèäç.!+<ß.ïñèç.íê.

Display ccsid 1208 or display utf8

BROWSE DB2TS.LIB.DSN810.SDSNDBRM(DSN@CCC4) Converted data shownCommand ===> Scroll ===> CSR

********************************* Top of Data **********************************.......Ö.@..Ãô.cc..bQ.......................................... ..@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@...................................xDECLARE OUTUSR1 CURSOR FOR SELECT DISTINCT STRIP ( DBNAM E , T , ' ' ) FROM SYSIBM . SYSTABLEPART WHERE STORTYPE = 'E' AND VCATNAME <> '00000001' AND DBNAME <> 'DSNDB07' UNION SELECT DISTINCT STRIP ( B . DBNAME , T , ' ' ) FROM SYSIBM . SYSINDEXPART A , SYSIBM . SYSINDEXES B WHERE STORTYPE = 'E' AND A . IXNAME = B . NAME AND A . VCATNAME <> '00000001' FOR FETCH ONLY WITH UR ....@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Via an ISPF browse you can make UTF-8 or UTF-16 data readable: just enter a DISPLAY UTF8/UTF16 command, or DISPLAY CCSID 1208 (for UTF-8) / DISPLAY CCSID 1200 (for UTF-16).

z/OS 1.9 usability: support for editing ASCII data. The ISPF editor supports the display and manipulation of ASCII data, so there is no need for users to download ASCII data files to a workstation to view or change the data. The SOURCE ASCII command converts the data from ASCII to the CCSID of the terminal using z/OS Unicode Services; the data is converted back to ASCII when saved. A new edit LF primary and macro command restructures the data display using the ASCII linefeed character (x'0A') as record delimiter.

19

DB2 Support for Unicode

• Supports UTF-8 and UTF-16 equally
• Defined on the table(space)
  • CHAR/VARCHAR/CLOB for UTF-8
  • GRAPHIC/VARGRAPHIC/DBCLOB for UTF-16

CREATE TABLE ………
( Col1 GRAPHIC(18) NOT NULL        -- 36 bytes
, Col2 GRAPHIC(4) NOT NULL
, Col3 GRAPHIC(6) NOT NULL
, Col4 VARGRAPHIC(100)             -- 200+2 bytes
, Col5 DATE
, Col6 CHAR(10)                    -- 10 bytes
, Col7 SMALLINT
, Col8 INTEGER
) IN DBI.TSI CCSID UNICODE NOT VOLATILE

No mixed CCSID in a table; within Unicode, UTF-8 and UTF-16 are allowed.

You can mix UTF-8 and UTF-16 data in one table: CHAR/VARCHAR/CLOB are for UTF-8, while GRAPHIC/VARGRAPHIC/DBCLOB are for UTF-16. Please note that the length field for UTF-8 columns defines the number of bytes and NOT the number of characters (MBCS). For UTF-16 you have twice as many bytes since it is DBCS. Although this is not strictly correct, in our environment UTF-16 can handle all our special characters in 2 bytes and will never need 4 bytes for a character!

20

Myths or Reality

• UTF-16:
  • CPU increase
    • Conversion costs (EBCDIC vs UNICODE)
    • Bytes moved from DB2 to the application times 2
    • It takes twice the time/cost to do the compare/move
    • Twice as many bytes to sort
  • Data space is doubled, need bigger bufferpools
  • Sortspace (DSNDB07) times 2, bigger sort bufferpool

Select name, adress, departnm
From UEmp a, UDept b
Where a.departcode = b.departcode
Order by adress, departnm

The increase in CPU is not only because of the conversion cost… Remember the rule that you should only select the columns your application needs, because DB2 returns data on a column-by-column basis and each column requires a cross-memory movement of bytes to the application. That movement of bytes costs CPU cycles… by going to UTF-16, will these CPU cycles be doubled? The same question arises when DB2 compares two fields: comparing two times 30 bytes vs two times 60 bytes >> is there a CPU increase?

What about the DB2 bufferpools? Rows get longer (beyond 4K). Buffer data must be assigned to 8K or 16K bufferpools. These new bufferpools need to be defined and monitored as well…

More bytes to sort in Unicode: more space, more CPU?

21

Conversion Cost: Mainframe Program

(Diagram: a local application does SELECT / INSERT with host variable HV1 in CCSID 500 <-> DB2 (DBM1, class 2 CPU time) <-> GRAPHIC column in a table in Unicode.)

Conversions are done automatically by DB2. In this example a mainframe application inserts EBCDIC data into a UNICODE column; DBM1 detects the codepage difference and requests the conversion (z/OS Conversion Services). Since DBM1 does the request, it gets the CPU accounted.

22

Conversion Cost: DRDA Protocol

(Diagram: on AIX, a C program does INSERT / SELECT with host variable HV1 in CCSID 1253 <-> DB2 Connect (DRDA common layer, conversion routines) <-> z/OS: DDF – DBM1 – GRAPHIC column in a table in Unicode.)

• DISABLEUNICODE=1: data is sent in ASCII
• DISABLEUNICODE=0: data is sent in UNICODE

Receiver makes right!

In a distributed process, things are different. For example, with the DRDA protocol the receiver makes right. So if a client program needs data in an ASCII field and retrieves data from a DB2 Unicode column, the receiver (in this case DB2 Connect) will do the conversion.

If you are sending non-UNICODE data to DB2 on z/OS using DB2 Connect, you can decide where the conversion will take place.

23

z/OS Conversion Services
• What is this:
  • Central repository (conversion tables) for the z/OS system, used by:
    • ODBC driver
    • COBOL
    • DB2
  • High performance conversion method:
    • Uses HW instructions available in zSeries 800, 900, 990, z9, z10
      • CU12 (alias for CUTFU): convert from UTF-8 to UTF-16
      • CU21 (alias for CUUTF): convert from UTF-16 to UTF-8
      • TROO: translate one byte to another one byte (used to convert single-byte codes)
      • TRTO: translate two bytes to one byte (used to convert from UTF-16 to EBCDIC)
      • TROT: translate one byte to two bytes (used to convert from EBCDIC to UTF-16)
      • TRTT: translate two bytes to two bytes (used to convert between UTF-16 and double-byte EBCDIC)
• See also the article in z/Journal, April/May 2006: DB2 for z/OS Version 8: Improved Unicode Conversion, by Laxminarayan Sriram

z/OS Conversion Services can do conversions to and from ASCII, UNICODE and EBCDIC!

24

Character Conversion support in DB2 V8
• DB2 catalog table SYSIBM.SYSSTRINGS
  • For EBCDIC <-> ASCII conversions
  • For Unicode conversions DB2 uses z/OS Conversion Services

• To ensure that conversions between EBCDIC and UTF-8 are as fast as possible, in some cases DB2 V8 performs so-called "inline conversion" instead of calling the z/OS Conversion Services. As a general rule, inline conversion can be done when a string consists of single-byte characters in UTF-8. This conversion enhancement is not available in V7, nor is it available for conversions between EBCDIC and UTF-16 and vice versa.

• In addition, z/OS V1.4 has improved the performance of EBCDIC to UTF-8 (and vice versa) conversions by streamlining the conversion process, and V1.4 dramatically outperforms z/OS 1.3 conversions.

• On top of that, zSeries machines have hardware instructions that assist CCSID conversion. These instructions were first implemented on the z900, and have been enhanced on the z990 machines.

Although z/OS Conversion Services offers high speed conversion performance to/from ASCII and EBCDIC, DB2 will still use its own SYSSTRINGS conversion method… There are no plans to change this.

25

z/OS support for Unicode

• Implication of the HW instruction set is…
  • Conversion from EBCDIC to UTF-8:
    • Step 1: EBCDIC to UTF-16 (via TROT)
    • Step 2: UTF-16 to UTF-8 (via CU21)
  • All access to the DB2 catalog!
    • Spufi: Select * From SYSIBM.SYSTABLES
    • DB2 V8 performs so-called "inline conversion"

Since there is no HW instruction for converting EBCDIC directly to UTF-8, an extra HW conversion is needed: TROT + CU21. Conversion between EBCDIC and UTF-8 is two-stage, but that's because it is the most efficient way to do it (at least generically): you can use tables to do the conversion, and table-driven conversion will always beat algorithmic conversion. So doing the conversion in two stages is actually faster because we can use the HW (for both conversions).

26

Conversions requested by DB2

• Bind-out process
  • Moves data out of the DB2 address space into the application address space
  • Row by row – column by column
• The setup cost dominates the conversion cost (less size dependent)
• For each row sent to the application, column conversions are done!
  • 2 columns – 1000 rows => 2000 conversions!

(Diagram: QMF issues SELECT c1, c2, c3, … cn FROM a UNICODE table WHERE …; DB2 produces the result set and binds out 1000 rows, one row at a time, to an application that wants the data in EBCDIC. Each column goes through a column based data conversion consisting of (a) a setup cost (fixed cost) and (b) a conversion cost that, like many other instructions, increases in proportion to the string length.)

Since DB2 moves each column separately to the application, each column needs to be converted. The conversion cost consists of two components: the setup cost (a) and the actual conversion cost (b). As this slide shows, the setup cost is the biggest part of the cost! The actual conversion cost is not that much impacted by the length of the string.

So if your application requests 2 columns and 1000 rows… DB2 does (requests) 2000 calls to Conversion Services…

27

z/OS 1.7 Unicode Conversion Services – TS scan, result in EBCDIC

(Chart: CPU seconds, scale 0–16, for 7 runs with an increasing number of selected columns; one series for the Unicode table and one for the EBCDIC table.)

DSNTIAUL job: unload 45120 rows, with 66 Graphic columns, 6 date columns

We did some performance tests… We set up two identical tables, one in EBCDIC (Char columns) and one in UNICODE (Graphic columns), with the same indexes, Reorg and Runstats. We used DSNTIAUL to unload the data to a sequential file. We ran 7 jobs for each codepage, and for each run we increased the number of columns in the select statement.

The purple blocks are the unload from the EBCDIC table: we see a small CPU increase as we increase the number of columns… The blue blocks are the unload from the UNICODE table: a spectacular CPU increase as we increase the number of columns… >>> Call IBM

28

z/OS 1.7 Unicode Conversion Services

• APAR OA21903: Unicode Services (z/OS 1.7)
• It is still about 2 times more CPU than the EBCDIC version!

(Chart: TS scan on the UNICODE table, result in EBCDIC; CPU seconds, scale 0–18, for 4 runs.)

First a USER FIX(?) was sent to us, which reduced the conversion overhead significantly… but it is still twice as much as the EBCDIC job!

29

z/OS 1.9 Unicode Conversion Services

• Then came z/OS 1.9…
  • CPU increase in Conversion Services!!
  • APAR fix OA23705

(Chart: TS scan, result in EBCDIC; CPU seconds, scale 0.00–5.00, comparing EBCDIC and Unicode unloads on z/OS 1.7 and z/OS 1.9.)

DSNTIAUL job: unload 45120 rows, with 66 Graphic columns, 6 date columns

The same test for z/OS 1.9; the red circle marks the same test as on z/OS 1.7. Again an increase, and again a USER FIX was sent.

30

z/OS 1.9 Unicode Conversion Services

• After OA23705
  • Better, but still a 6% increase
• Found a ‘problem’
  • With varying string sizes there is a difference in the amount of time it takes to perform the conversion
  • Fix not yet available (29/08/2008)
• z/OS Conversion Services is now part of the z/OS performance benchmark

Slide Change

The APAR fix did not close the CPU gap completely. The lab did find a possible problem… and is testing the fix… Hopefully I can give you the APAR number and the results at this presentation at IDUG.

31

z/OS Unicode Conversion Services

** PROGRAM SECTION USAGE SUMMARY **

MODULE    SECTION    16M   SECT     FUNCTION                % CPU TIME
NAME      NAME       <,>   SIZE                             SOLO    TOTAL
SYSTEM    .DB2                      DB2 SYSTEM SERVICES     62.78   64.44
SYSTEM    .SVC                      SUPERVISOR CONTROL       2.22    2.22
SYSTEM    .XMEMORY                  CROSS MEMORY             1.11    1.11
                                                            -----   -----
SYSTEM TOTALS                       SYSTEM SERVICES         66.11   67.77
CUNMUNI              >     230392                           31.67   32.22
                                                            -----   -----
PROGRAM IKJEFT01 TOTALS                                     97.78  100.00

Some OEM report…

How did we know that the CPU increase was in z/OS Conversion Services? An OEM report like the one above shows the CUNMUNI section accounting for roughly a third of the CPU time.

32

Conversions Requested by DB2

EXEC SQL DECLARE PX01 CURSOR FOR
     SELECT KLN_NR                      -- graphic(7)
     FROM UNICODE_TABLE_UTF16
     WHERE KLN_NR > :KLNR-START         -- Pic X(7)
END-EXEC

EXEC SQL FETCH PX01 INTO :KLNR-FETCH    -- Pic X(7)
END-EXEC

SQLCODE -330 (rc 16) at runtime: the host variables must be defined as Pic N(7).
If not correctly defined: CONVERSIONS!

EXEC SQL DECLARE PX01 CURSOR FOR
     SELECT U.KLN_NR                    -- graphic(7)
     FROM UNICODE_TABLE_UTF16 U, EBCDIC_TABLE E
     WHERE U.KLN_NR = E.KLN_NR
     AND U.KLN_NR > :KLNR-START         -- Pic N(7)
END-EXEC

EXEC SQL FETCH PX01 INTO :KLNR-FETCH    -- Pic N(7)
END-EXEC

CONVERSIONS!

• For each row of the outer table, conversion of the column needs to be done
• The more columns are joined, the higher the conversion cost will be
• Cost for unnecessary join criteria!

Where does DB2 call for conversions?

On the first SQL statement you get a runtime error when the host variable is defined as Pic X(7); it must be a PIC N(). And if a host variable is not correctly defined => conversion!

The second SQL statement shows a join between a UNICODE and an EBCDIC column. If you define all host variables correctly you will not get a conversion for them. However, the compare of a UNICODE and an EBCDIC column does imply a conversion from the EBCDIC value to a UNICODE value! So the rule here is: do not code unnecessary joins between a UNICODE column and an EBCDIC column. For each row in the outer table a conversion needs to be done, so hopefully DB2 will choose a good outer table… A correctly coded host-variable definition is sketched below.
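A minimal sketch of the matching COBOL definitions (hypothetical names, assuming a GRAPHIC(7) column in a UTF-16 table):

       WORKING-STORAGE SECTION.
      * National (UTF-16) host variables for a GRAPHIC(7) column,
      * so DB2 does not have to request any conversion at bind-out.
       01  KLNR-START          PIC N(7) USAGE NATIONAL.
       01  KLNR-FETCH          PIC N(7) USAGE NATIONAL.

       PROCEDURE DIVISION.
           EXEC SQL DECLARE PX01 CURSOR FOR
                SELECT KLN_NR
                  FROM UNICODE_TABLE_UTF16
                 WHERE KLN_NR > :KLNR-START
           END-EXEC
           EXEC SQL OPEN PX01 END-EXEC
           EXEC SQL FETCH PX01 INTO :KLNR-FETCH END-EXEC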

33

Conversions Requested by DB2

• Test the performance of Unicode Conversion Services after z/OS maintenance or an upgrade!
• Conversions are CPU expensive in DB2; when possible, do the conversion in your application

01 GRP-EBCDIC
   03 NM-E    PIC X(120)
   ..
   03 ADRS-E  PIC X(60)

01 GRP-UNICODE
   03 NM-U    PIC N(120)
   ..
   03 ADRS-U  PIC N(60)

If you need to convert, do the conversion in the application and do it with a group move => one conversion for x columns in the group (a sketch follows below).
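A minimal sketch of that group-level conversion, assuming Enterprise COBOL, the group layouts above with no other fields in between, and a REDEFINES item we add purely for illustration:

      * Elementary national view over the whole Unicode group
      * (120 + 60 national characters), coded immediately after
      * the GRP-UNICODE group definition.
       01  GRP-UNICODE-ALL REDEFINES GRP-UNICODE
                               PIC N(180) USAGE NATIONAL.

      * One EBCDIC (CCSID 500) to UTF-16 conversion for all columns
      * in the group, instead of one conversion per column; each
      * single-byte source character maps one-for-one onto NM-U
      * and ADRS-U.
           MOVE FUNCTION NATIONAL-OF(GRP-EBCDIC, 500)
             TO GRP-UNICODE-ALL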

34

Date/Time fields and Unicode

• In a UNICODE table, DB2 represents Date/Time columns as UTF-8
  • DCLGEN generates PIC X(10) for these fields
    • MR open to generate N(10)
  • By choosing UTF-16 in the application, we will have a conversion cost
• V8 bind out
  • From internal format to a UTF-8 date/time/timestamp value: 1.05x
  • From internal format to a UTF-16 date/time/timestamp value: 2x
• V9 bind out
  • From internal format to UTF-16 date/time/timestamp reduced to 1.10x
  • DB2 does the CU12 itself, rather than using Conversion Services

Select a.Uni_date_end
From Unicode_table a
Where a.Uni_date_start = :hv            -- in the application defined as Pic N(10)

For date and time fields, DB2 internally represents these in UTF-8. Since we decided that ALL character data will be in UTF-16, we need to modify the result of the DCLGEN (a marketing requirement is open to make this available in the DCLGEN function), but we will have an extra conversion cost (see the sketch below)! This conversion cost will be smaller in the V9 release.
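A minimal sketch of the manual change to the DCLGEN output that this implies (hypothetical field name):

      * Generated by DCLGEN for a DATE column (UTF-8 oriented):
      *    10 UNI-DATE-END          PIC X(10).
      * Changed by hand for our UTF-16 standard:
           10 UNI-DATE-END          PIC N(10) USAGE NATIONAL.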

35

Date/Time fields and Unicode

• Unload – Load utility only supports date/time and timestamps in UTF-8
• ‘Upcoming’ V9 implementations:
  • Unload utility: unload date/time and timestamps to UTF-16
  • Load utility: load date/time and timestamps in UTF-16

The UNLOAD and LOAD utilities use UTF-8 as the data type for date/time fields. This will be changed in the near future… to also support UTF-16.

36

Date/Time fields and Unicode

• Any local date (dd/mm/yyyy) – local time installations?
  • DSNXVDTA ASCII exit (must exist)
  • DSNXVDTU Unicode exit (must exist)
  • DSNXVDTX EBCDIC exit

*
* Check day value UNICODE
*
         LA    3,LDAG
         LA    9,L'LDAG
DAGFM    CLI   0(3),X'30'          Is 0 for Unicode
         BL    FORMATR
         CLI   0(3),X'39'          Is 9 for Unicode
         BH    FORMATR
         LA    3,1(3)
         BCT   9,DAGFM
SLASH1   CLI   LSLASH1,X'2F'       Is a / in Unicode
         BNE   FORMATR

*
* Check day value EBCDIC
*
         LA    3,LDAG
         LA    9,L'LDAG
DAGFM    CLI   0(3),C'0'
         BL    FORMATR
         CLI   0(3),C'9'
         BH    FORMATR
         LA    3,1(3)
         BCT   9,DAGFM
SLASH1   CLI   LSLASH1,C'/'
         BNE   FORMATR

If you use LOCAL date, don’t forget to write the UNICODE date/time exit !

37

Standards

• COBOL (and PL/I) is UTF-16 oriented: PIC N()
• DB2 is (re)designed for UTF-8
  • SQL parsing is done in UTF-8
  • Catalog/directory in UTF-8
  • DBRMs in UTF-8
  • Date and time columns are defined in UTF-8
  • Columns can be defined in UTF-8 or UTF-16
• z/OS & HW conversion instruction set is both UTF-8 and UTF-16 oriented

The fact that z/OS Conversion Services cannot convert directly from EBCDIC to UTF-8 does not mean that it is not UTF-8 oriented. Conversion between EBCDIC and UTF-8 is two-stage, but that's because it is the most efficient way to do it (at least generically): you can use tables to do the conversion, and table-driven conversion will always beat algorithmic conversion. So doing the conversion in two stages is actually faster because we can use the HW (for both conversions).

38

UTF-16 and DASD Space

Note: the tablespace is COMPRESSed; one table, all columns Graphic or Date.

                    EB_IDX1    UN_IDX1    increase
FIRSTKEYCARD         440094     440094
FULLKEYCARD         3985333    3985333
NLEAF                 39556      70100     77.2%
NLEVELS                   3          3
CLUSTERRATIO            100        100
SPACEF               173520     291600     68.0%
AVGKEYLEN                22         44    100.0%

                    EB_IDX2    UN_IDX2
FIRSTKEYCARD        2540258    2540258
FULLKEYCARD         3985333    3985333
NLEAF                 44780      79707     78.0%
NLEVELS                   4          4
CLUSTERRATIO             68         68
SPACEF               173520     360000    107.5%
AVGKEYLEN                22         44    100.0%

                    EB_IDX3    UN_IDX3
FIRSTKEYCARD           1016       1016
FULLKEYCARD          595289     595289
NLEAF                 11215      13808     23.1%
NLEVELS                   3          3
CLUSTERRATIO             99         99
SPACEF                50400      57600     14.3%
AVGKEYLEN                11         22    100.0%

Index levels ‘might’ increase!

This slide shows a space increase for UNICODE, even though compression is used! If you have character based indexes, the space increase can be big. Note that we do not have V9 (yet), so we could not do any tests with index compression… We do have a lot of character based indexes in our data models!

The ‘small’ space increase for IDX3 is because this index can contain DUPLICATES.

39

UTF-16 and DASD Space
• Less impact on random I/O applications
• Bigger impact for sequential applications (batch jobs)
  • Consider the use of 8K, 16K, 32K bufferpools
    • Data ONLY, indexes always in 4K bufferpools
    • More rows per page, but the MAX is still 255 rows per page
  • Use NOT PADDED if you have indexes on variable length columns!
• Does V9 with index compression help?
  • The index can be defined in a larger BP size with compression
    • Only in the storage area
    • Index pages are expanded in the bufferpool

We do not expect a big I/O increase for the online transactions, since these are mostly random I/O operations. But for the batch jobs we do have an increase. On a case by case basis we will investigate whether we benefit from putting 4K data in an 8K or 16K bufferpool. By putting data in a larger bufferpool you could get an I/O advantage: two I/Os for a 4K page can become one I/O for an 8K page… in theory… It depends mostly on the compressed row size (only 255 rows on a page…).

40

UNICODE UTF-8 & Unload Utility

Column unicol defined as char(1) -> UTF-8

select hex(unicol) from unicode_table_UTF8 WHERE unicol = '0'
---------+---------+------
3030303030

select hex(unicol) from unicode_table_UTF8 WHERE unicol = X'30'
---------+---------+------
3030303030

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF8
HEADER NONE WHEN (unicol1 = '0')
SHRLEVEL CHANGE ISOLATION UR
HIGHEST RETURN CODE=0

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF8
HEADER NONE WHEN (unicol1 = X'30')
SHRLEVEL CHANGE ISOLATION UR
HIGHEST RETURN CODE=0

The next slides handle the UTF-16 support in the UNLOAD and LOAD utilities. With a simple test case we see here that UNLOAD fully supports UTF-8.

41

UNICODE UTF-16 & UNLOAD Utility

Column unicol defined as graphic(1) -> UTF-16
UX: hexadecimal Unicode string, UTF-16 only

select hex(unicol) from unicode_table_UTF16 WHERE unicol = '0'
---------+---------+------
00300030003000300030

select hex(unicol) from unicode_table_UTF16 WHERE unicol = X'0030'
---------+---------+------
DSNE610I NUMBER OF ROWS DISPLAYED IS 0

select hex(unicol) from unicode_table_UTF16 WHERE unicol = UX'0030'
---------+---------+------
00300030003000300030

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF16
HEADER NONE WHEN (unicol = '0')
SHRLEVEL CHANGE ISOLATION UR
INVALID OPERAND '0' FOR KEYWORD 'unicol'

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF16
HEADER NONE WHEN (unicol = X'0030')
SHRLEVEL CHANGE ISOLATION UR
UTILITY EXECUTION COMPLETE, HIGHEST RETURN CODE=0

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF16
HEADER NONE WHEN (unicol = UX'0030')
SHRLEVEL CHANGE ISOLATION UR
INVALID OPERAND ' ' FOR KEYWORD 'unicol'

MR0228083923 opened, and accepted

The same test for a UTF-16 column gives some strange results… The second SELECT is syntactically correct, but does not give any results?! The third SELECT uses the hexadecimal UNICODE string UX'…'. When we use the WHEN selection in the UNLOAD we also get strange results, as this slide shows… only the second UNLOAD (with X'0030') gives results, although the corresponding SELECT did not return any rows. But this is not useable in production jobs… write your unload selection string in HEX! An MR was opened and accepted… to make it possible to use a 'normal' WHEN selection on UNICODE columns.

42

UNICODE UTF-8 & UNLOAD Utility

• Convert Unicode UTF-8 to EBCDIC

UNLOAD DATA
  FROM TABLE unicode_table_UTF8
  UNLDDN UNLFILE EBCDIC CCSID(id1,id2,id3)

EBCDIC: specifies that all output data of the character type is to be in EBCDIC.
CCSID(id1,id2,id3): specifies up to three coded character set identifiers (SBCS, MBCS, DBCS) that are to be used for the data of character type in the output records, including data that is unloaded in the external character formats.

UNLDDN UNLFILE EBCDIC CCSID(,500,)
UTILITY EXECUTION COMPLETE, HIGHEST RETURN CODE=0

Be aware of possible EBCDIC substitution characters (X'3F') in the output file.

Since our data warehouse still remains in EBCDIC, we need to convert UTF-16 to EBCDIC via the UNLOAD utility. According to the manuals this is possible… We accept the fact that not all UNICODE characters can be converted to an EBCDIC character, and that therefore some columns can contain the EBCDIC substitution character (X'3F').

This slide shows that this is possible for a UTF-8 column/table.

43

UNICODE UTF-16 & UNLOAD Utility

• Convert Unicode UTF-16 to EBCDIC

UNLOAD DATA
  FROM TABLE unicode_table_UTF16
  UNLDDN UNLFILE EBCDIC CCSID(id1,id2,id3)

EBCDIC: specifies that all output data of the character type is to be in EBCDIC.
CCSID(id1,id2,id3): specifies up to three coded character set identifiers (SBCS, MBCS, DBCS) that are to be used for the data of character type in the output records, including data that is unloaded in the external character formats.

UNLDDN UNLFILE EBCDIC CCSID(,,500)
ERROR IN CCSID TRANSLATION FOR "DBCS PAD CHAR", FROM CCSID 1200 TO 500

IBM response -> HPUnload supports this ($)… MR1218066147

But it does not work for UTF-16 data! After reporting this to IBM, the answer was… buy HPUnload… MR closed!

44

UNICODE & DB2 UTILITIES

• DSNTIAUL would cause too much conversion overhead…
• Our solution for the UNICODE (UTF-16) <-> EBCDIC data conversions:

IBM UNLOAD of the UNICODE table -> UTF-16 unload DSN + IBM LOAD cards -> conversion PGM -> EBCDIC DSN + IBM LOAD cards -> IBM LOAD into the EBCDIC table

We built a special program that reads the UNICODE unload file together with the load cards; the program takes care of the conversion (one move statement to do all the UNICODE to EBCDIC conversions, sketched below). The appropriate LOAD cards are generated as well.
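A minimal sketch of the core of such a conversion program, assuming hypothetical record layouts and that the character part of the unloaded record is one contiguous UTF-16 area:

      * Hypothetical layouts: the UTF-16 part of the unloaded record
      * and its EBCDIC (CCSID 500) counterpart for the output file.
       01  REC-UTF16-PART      PIC N(200) USAGE NATIONAL.
       01  REC-EBCDIC-PART     PIC X(200).

      * One conversion (one move) for the whole character part of
      * the record, instead of one conversion per column.
           MOVE FUNCTION DISPLAY-OF(REC-UTF16-PART, 500)
             TO REC-EBCDIC-PART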

45

Nice to know

• REXX does not support UTF-16
• No RI possible between a Unicode table and an EBCDIC table

46

To summarize our UTF-16 experiences
• Unicode overhead:
  • CPU increase
    • Eliminate conversions as much as possible…
    • Improved date/time/timestamp to UTF-16 conversion in V9
    • V9 at the end of this year, production in Q2 2009
  • Index space increase
• DB2 does not support UTF-8 and UTF-16 equally
  • From an SQL point of view it does
  • From an UNLOAD/LOAD point of view it does NOT
• More usage of BP8K and BP16K bufferpools
  • Need more memory
• The majority of our Unicode problems (performance,…) came from:
  • RAD/EGL not generating/supporting the correct COBOL
  • COBOL compiler
  • LE runtime environment

47

Take your time

• Implementing Unicode is not a ‘normal’ project
  • Multiple skills are needed (architecture, application development tools, application design, database administration, z/OS technology, DB2 technology,…)
  • It is worth a thorough study
  • A learning curve of a couple of years is not unusual
• The difficulty lies in introducing Unicode in an existing application portfolio
• Take your time for the first implementation projects
  • Building applications will take more time (+50% or more?)
  • Pay attention to testing (+100%?)
  • Try as much as possible to limit the impact on other applications

48

Maturity of technology

• Unicode support is still in evolution
  • Unicode is not new on the mainframe (e.g. DB2 has supported mixed data since V2.3)
  • Unicode in development tools hasn't existed that long
  • Still functional and performance enhancements
  • OEM vendors ‘may’ not support it fully
    • How can I browse/edit Unicode UTF-16 data through ISPF?
    • What about output archiving tools?
    • Can debugger tools and file browsing tools show the representable UTF-16 characters?
• Mainframe perception is different compared to open systems
  • The mainframe has a longer tradition of measuring resource usage at the application level
  • Mainframe hardware runs at a higher CPU busy %

49

KBC Global Services [email protected]

Session: A12
KBC Unicode Train is on the Rails

Questions ?