KBC UNICODE: Train is on the Rails - IDUG
Transcript of KBC UNICODE: Train is on the Rails - IDUG
1
15 October 2008 • 16:15 – 17:15Platform: DB2 for z/OS
Jan TielemansKBC Global Services NV
Session: A12
KBC Unicode Train is on the Rails
KBC Group is expanding its business into the Eastern European landscape. The ICT division had the challenge of building a common application portfolio for Poland and Belgium, with the option that subsidiaries in other countries could also join this application portfolio… this was the call for UNICODE applications. Since our data resides on DB2 for z/OS, we were confronted with converting EBCDIC data to UNICODE as well as building new applications for accessing UNICODE data. After a short intro on what Unicode is, we will talk about our choice for UTF-16 and discuss in more detail our experience with the various DB2 – Unicode – z/OS conversion services items/problems in which we got involved.
2
Presentation Objectives
• Why UNICODE is important for our company
• Considerations for choosing Unicode UTF-16
• DB2 performance impact for Unicode applications
• Design guidelines for Unicode applications
• General recommendations and experiences with Unicode UTF-16
These are the five bullet points on which this presentation was (maybe) selected to be presented here at IDUG…. However, between the time when I submitted the abstract and when I wrote this presentation… things changed. But in general these bullet points will be addressed.
What I will not talk about are items like:
a) What is UNICODE
b) How does UNICODE work
c) How do you plan – budget – organize such an implementation
d) Aspects of Multi-National Applications: Translation (language expressions) – Cultural Formats (some symbols have different meanings in different countries) – Encoding
3
[Slide: Unique Multi-Channel Distribution Platform in Belgium — KBC Bank and Insurance Holding. Products: Traditional Retail/SME Banking, Leasing/Factoring, Merchant Banking, Capital Markets/Trading, Asset Mgt./Private Banking, Insurance/Re-insurance, Stock Brokerage. Distribution: 892 retail branches, 19 corporate branches, 24 private banking branches, 618 tied insurance agents, 794 Centea bank agents, Internet/electronic channels. Market: 2.7 million individuals + 15 555 corporates + 15 406 high net worth individuals.]
4
The KBC Group is expanding …. in Central, Eastern Europe and Russia
KBC is expanding its business area via acquisitions (and partnerships) of banks and insurance companies in Central, Eastern Europe and Russia. With the acquisitions, IT departments also need to be integrated…. Aspects of this are joint development of applications and making applications available to the KBC Group rather than to an individual country. These two aspects were our call for UNICODE!
5
KBC in the world
[Slide: world map — New York, Los Angeles, Atlanta, Mexico, Mumbai, Nanjing, Shanghai, Taipei, Kaohsiung, Taichung, Manila, Hong Kong, Shenzhen, Labuan, Kuala Lumpur, Singapore, Teheran]
6
ICT Infrastructure
• 3 z9 Machines (IBM 2094/720 – 716 – 712)
  • Two to run production workload – one for development and acceptance
• DB2 Subsystems (V8 FF)
  • Data sharing (2-way)
  • Non data sharing
• IMS as trx manager (V9)
  • 13M trx/day (DB2) – 5M (IMS DB)
  • 600 trx/sec (500 + 100)
• z/OS 1.7 …. migrated to 1.9
• 3 Sysplex environments
• DASD Boxes: DS8300 (85TB) – DS8300T (88TB) – ESS800T (22TB)
  • 14K–20K I/O sec with 0.03–0.05 msec
  • PPRC mirrored between the two datacenters
[Slide: map — Headquarter Brussel; Exploitation IT datacenter Mechelen; Headquarter IT datacenter Leuven; 40 km apart]
The KBC datacenter is one of the largest in Belgium; the slide shows some IT information about our datacenter. Notice that the distance between our two datacenters is 40 km.
7
3-Tier Architecture Model
[Diagram: Web browser (intranet), Web browser (internet) and ATM → Solaris Webserver (Application, Connector) → z/OS: IMS Connect → IMS Transaction (Application) → DB2]
We currently have a 3-tier model in place.
8
Application Infrastructure
• Network Centric: Head office applications, KBC-Online…
• 3-Tier: Client – Midtier – Backend
  • Client: “browser based”, limited, no business logic
  • Midtier: webserver, application presentation logic… (Unix, Intel)
  • Backend: application business logic, data (Mainframe)
• Applications are written in (generating COBOL):
  • APS
  • VAG
  • RAD/EGL
• Data in EBCDIC (codepage 500):
  • DB2 (since V1.2)
  • IMS DB
We have used various application generators … to generate COBOL. The data is stored in DB2 (started with version 1.2) and has always been in CODEPAGE 500. We only have one codepage !
9
Why Unicode: Business case
• Claims applications in Poland and Belgium must be redesigned
• Businesses in Poland and Belgium see synergies
• ICT will build a common application portfolio for Poland and Belgium, with the option that other insurance daughters can join the collaboration
• The challenge is building multi-national applications

As the previous slides show, KBC is expanding to Central, Eastern Europe and Russia, with joint development and joint application exploitation between different countries.
10
The Challenge
• Building new applications – data models in Unicode
• Converting existing applications – data models to Unicode
• Minimal impact on ‘Mainframe’ IT cost
  • Backend legacy systems are touched/changed
  • DB2 and the application
    • CPU usage
    • Storage
    • Memory
  • Development cost
  • ….
The biggest challenge: the legacy systems were designed in the early 90s, when performance and resource usage were key items in the design. The new systems are designed by new/other people who have a different attitude towards performance, resource usage and design….
KBC management: we do accept a ‘small’ increase in ‘Mainframe IT cost’… We now run at ‘good’ CPU utilization – workload – throughput…. We do NOT want to buy additional MIPS to run UNICODE applications…..
11
What to choose ?
[Diagram: Solaris Webserver (Application, Connector) → z/OS: IMS Connect → IMS Transaction (Application) → DB2]
- UTF-8 or UTF-16 or a combination
- Column or Table based Unicode
- Function based or Table based Unicode
Minimal impact on:
• Performance
• Development cost
• Resource usage
One of the aspects of going to UNICODE is making decisions between UTF-8 or UTF-16 or both – all columns in UNICODE, or a special design in which we separate ‘possible’ UNICODE columns from traditional character columns. For example, an Employee table is divided into two parts linked by the employee number: the old employee table contains the non-UNICODE character columns (like ZIPcode, Gendercode,…) and a new employee table is created with only UNICODE columns (like name, address,….).
For each decision, we must take into consideration: what is the impact on performance (CPU, I/O,…) – development cost,…
12
Function based or Table based Unicode
[Diagram: EBCDIC data models in DB2 – Finance, Marketing, Human Resources, Stock – with relations between them, plus a new Claims application]
This picture gives a simple overview of the different EBCDIC data models (applications) that are defined in DB2 and shows that there are relations between them…..
Adding a new application (CLAIMS) in UNICODE introduces
a) a potential performance problem in joining EBCDIC with UNICODE tables
b) complexity for development: which table is in UNICODE and which in EBCDIC
13
Column or Table based Unicode
• Pro Column based
  • Controlled overhead (DASD, CPU)
• Cons Column based
  • Difficult programming, management, understanding,…
• Pro Table based
  • Ease of programming, management, understanding,…
• Cons Table based
  • Overhead (DASD, CPU,..)
  • Only 5%-10% of the character based columns can functionally contain UNICODE data ….. how many of these will actually contain ‘special characters’?

Table Emp (CCSID UNICODE): Name Char(30) … Zipcode Char(6)
Table Emp (CCSID EBCDIC): Name Char(30), Name_U smallint (0: no unicode name, 1: unicode name) … Zipcode Char(6)
Table Emp_U (CCSID UNICODE): Name Char(30) …
If you look at your existing data models… how many CHAR/VARCHAR columns do you have, and how many of them can potentially contain ‘special characters’? The answer for us was… very few! Is this a good reason for splitting up a table into two parts (EBCDIC – UNICODE data)? From a performance view the answer is YES, but from a development/maintenance point of view the answer is NO…. We decided to put all character data in a table in UNICODE.
14
Unicode UTF-8 or UTF-16
• Pro UTF-8
  • Optimal for space
  • In line with DB2 V8
• Con UTF-8
  • More programming effort
    • Each character can be 1 or 2 bytes
    • Substr vs Substring
• Pro UTF-16
  • Ease of programming
    • Each character is 2 bytes
  • In line with programming language COBOL, Pic N()
• Con UTF-16
  • Wasted space, more DASD space?
  • Bigger bufferpools?
  • More CPU consumption?
  • More I/O’s – GetPages?
Go for UTF-8 or UTF-16? Same considerations as the previous slide… but since we use RAD/EGL, which generates COBOL… and COBOL only supports UTF-16 correctly, the choice for UTF-16 was easier to discuss.
Konstantin Tadenev of UPS
As you know, Enterprise COBOL has limited support for UTF-8, yet our measurements indicate dramatic performance and capacity advantages of UTF-8 over UTF-16 in DB2, especially when dealing with UTF-8 input data (which is the case in most distributed computing scenarios). We discovered that encoding conversion may cost up to 155% of additional CPU time overhead for a process performing DB2 INSERTs. That is more than 2.5 times in CPU capacity!
The IBM position on UTF-8 support in Enterprise COBOL is as follows (ETR Record 88297,180,000):
Question: Does the Enterprise COBOL compiler support usage of PIC X for the data items containing UTF-8 data? If so, what are the limitations? If not, are there any plans to address this gap in functionality?
Answer: Yes, the Enterprise COBOL Programming Guide SC27-1412-05 hits on the topic in Chapter 7, Processing data in an international environment, in the section titled ‘Processing UTF-8 data’, where it states:
When you need to process UTF-8 data, first convert the data to UTF-16 in a national data item. After processing the national data, convert it back to UTF-8 for output. For the conversions, use the intrinsic functions NATIONAL-OF and DISPLAY-OF, respectively. Use code page 1208 for UTF-8 data.
As you can see, IBM states that the functionality for UTF-8 support is there, but offers this functionality at a price of increased CPU consumption, which in turn may prove to be cost-prohibitive.
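The storage trade-off behind the UTF-8 vs UTF-16 debate can be sketched in a few lines. This is an illustrative Python sketch, not from the presentation; the sample strings are assumptions chosen to show the effect.

```python
# Sketch (not from the presentation): compare storage sizes of the same
# strings in UTF-8 and UTF-16. Sample strings are made-up illustrations.
samples = {
    "plain Latin": "Jan Tielemans",
    "Polish":      "Łódź Gdańsk",     # Latin letters with diacritics
}

for label, text in samples.items():
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-be")  # big-endian, no BOM
    print(f"{label}: {len(text)} chars -> "
          f"UTF-8 {len(utf8)} bytes, UTF-16 {len(utf16)} bytes")
```

Plain Latin text doubles in size in UTF-16, while accented characters narrow the gap because they already take 2 bytes in UTF-8 — which is why UTF-8 wins on space for mostly-Latin data, as the measurements quoted above suggest.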
15
KBC Unicode Policy
• UNICODE UTF-16 will be used
• All new applications will be developed in UNICODE UTF-16
• All character columns in a table will be in UNICODE UTF-16
• All tables in a ‘function’ will be converted to UNICODE
• No (or minimal) joining between EBCDIC and UNICODE tables
Knowing that we have chosen a solution that has a (possible) impact on performance, we will do testing and research on how we and IBM can reduce this overhead. We are working very closely with IBM development (DB2 Chris Crone) on this…
16
‘Enhanced 3-Tier Architecture Model’
[Diagram: Solaris webservers in Belgium and Poland (Application, Connector – internal UTF-16) send UTF-8 → z/OS: IMS Connect → IMS Transaction (input: converts UTF-8 to UTF-16; output: converts UTF-16 to UTF-8) → DB2, data stored in UTF-16]
Network capacity: UTF-16 <-> UTF-8: +40% gain + compression
Because our data is now in UTF-16, we will use twice as much space… also on the network (bandwidth). Since we put everything in UTF-16 and we send ‘big’ messages to the WAS, we ‘expect’ bandwidth problems with external connections. That is why we convert all outgoing data to UTF-8, and the connectors (which send the message to the mainframe) convert UTF-16 to UTF-8. Tests have shown that via this method we gain more than 40% in network capacity.
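The wire-format trick above — UTF-16 inside the application, UTF-8 on the network — can be sketched as follows. The message content is an assumed example; for pure Latin text the saving approaches 50%, consistent with the "more than 40%" measured.

```python
# Sketch (assumed message content): send UTF-16 application data as UTF-8
# on the wire and convert back at the receiving end, as the connectors do.
message = "Claim 12345: Jan Tielemans, Brusselsesteenweg 100, Leuven" * 10

wire_utf16 = message.encode("utf-16-be")
wire_utf8 = message.encode("utf-8")

saving = 1 - len(wire_utf8) / len(wire_utf16)
print(f"UTF-16 on the wire: {len(wire_utf16)} bytes")
print(f"UTF-8 on the wire:  {len(wire_utf8)} bytes ({saving:.0%} saved)")

# The receiver converts back to UTF-16 for the application; the round
# trip is lossless.
assert wire_utf8.decode("utf-8").encode("utf-16-be") == wire_utf16
```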
17
DB2 Support for Unicode
• Migrated to DB2 V8 for ‘better Unicode support’:
  • Catalog and Directory converted to UTF-8 (ENFM mode)
    • Still in EBCDIC: SYSCOPY, SYSEBCDC, SCT02, DBD01, SYSLGRNX, SYSUTILX
  • Precompiler/DBRM converted to Unicode UTF-8 (NFM mode)
    • Controlled by NEWFUNC parameter in DSNHDECP
  • All SQL parsing is done in UTF-8
  • Allows mixed codepages in a SQL join statement
  • An enhanced set of Unicode SQL functions
    • E.g.: substring
When we started thinking about going to UNICODE, we needed DB2 V8 ! Full UNICODE support ….
18
Can be handy….
BROWSE DB2TS.LIB.DSN810.SDSNDBRM(DSN@CCC4) Line 00000000 Col 001 080 Command ===> Scroll ===> CSR
********************************* Top of Data **********************************DBRM...µW98COMP DSNACCC4.ÄÄd.Âé\..4.......................................1..ØLL.. ..............DBRM...m.......¡.......Ìàáä< êá.!íèíëê..äíêë!ê.ã!ê.ëá<áäè.àñëèñ+äè.ëèêñ&...àâ+ (á...è.........ãê!(.ëßëñâ(...ëßëè â<á& êè.ïçáêá.ëè!êèß&á....á.. +à.îä è+ (á............... +à.àâ+ (á.....àë+àâ....í+ñ!+.ëá<áäè.àñëèñ+äè.ëèêñ&...â...àâ+ (á...è.........ãê!(.ëßëñâ(...ëßëñ+àáì& êè. ...ëßëñâ(...ëßëñ+àáìáë.â.ïçáêá.ëè!êèß&á....á..+à. ...ñì+ (á...â...+ (á. +à. ...îä è+ (á...............ã!ê.ãáèäç.!+<ß.ïñèç.íê.
Display ccsid 1208 or display utf8
BROWSE DB2TS.LIB.DSN810.SDSNDBRM(DSN@CCC4) Converted data shownCommand ===> Scroll ===> CSR
********************************* Top of Data **********************************.......Ö.@..Ãô.cc..bQ.......................................... ..@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@...................................xDECLARE OUTUSR1 CURSOR FOR SELECT DISTINCT STRIP ( DBNAM E , T , ' ' ) FROM SYSIBM . SYSTABLEPART WHERE STORTYPE = 'E' AND VCATNAME <> '00000001' AND DBNAME <> 'DSNDB07' UNION SELECT DISTINCT STRIP ( B . DBNAME , T , ' ' ) FROM SYSIBM . SYSINDEXPART A , SYSIBM . SYSINDEXES B WHERE STORTYPE = 'E' AND A . IXNAME = B . NAME AND A . VCATNAME <> '00000001' FOR FETCH ONLY WITH UR ....@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Via an ISPF browse you can make UTF-8 or UTF-16 data readable: just enter a DISPLAY UTF8/UTF16 command, or DISPLAY CCSID 1208 (for UTF-8) / DISPLAY CCSID 1200 (for UTF-16).
z/OS 1.9 usability: support for editing ASCII data. The ISPF editor supports the display and manipulation of ASCII data, so there is no need for users to download ASCII data files to a workstation to view or change the data. The SOURCE ASCII command converts the data from ASCII to the CCSID of the terminal using z/OS Unicode Services; data is converted back to ASCII when saved. A new edit LF primary and macro command restructures the data display using the ASCII linefeed character (x’0A’) as record delimiter.
19
DB2 Support for Unicode
• Supports UTF-8 and UTF-16 equally
• Defined on the table(space)
  • Char/VarChar/CLOB for UTF-8
  • Graphic/VarGraphic/DBCLOB for UTF-16

CREATE TABLE ………
( Col1 GRAPHIC(18) NOT NULL        -- 36 bytes
 ,Col2 GRAPHIC(4) NOT NULL
 ,Col3 GRAPHIC(6) NOT NULL
 ,Col4 VARGRAPHIC(100)             -- 200+2 bytes
 ,Col5 DATE
 ,Col6 CHAR(10)                    -- 10 bytes
 ,Col7 SMALLINT
 ,Col8 INTEGER
) IN DBI.TSI CCSID UNICODE NOT VOLATILE

No mixed CCSID in a table
Unicode: UTF-8 and UTF-16 are allowed
You can mix UTF-8 and UTF-16 data in one table: CHAR/VARCHAR/CLOB are for UTF-8, while GRAPHIC/VARGRAPHIC/DBCLOB are for UTF-16. Please note that the length field for UTF-8 columns defines the number of bytes and NOT the number of characters (MBCS). For UTF-16 you have twice as many bytes, since it is a DBCS. Although it is not strictly guaranteed, in our environment UTF-16 can handle all our special characters in 2 bytes and will never need 4 bytes for a character!
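The bytes-vs-characters point can be made concrete with a small sketch. The helper names and sample value are assumptions for illustration: CHAR(n) in UTF-8 holds n bytes, while GRAPHIC(n) in UTF-16 holds n two-byte code units (one BMP character each).

```python
# Sketch: does a value fit a CHAR(n) UTF-8 column (n BYTES) vs a
# GRAPHIC(n) UTF-16 column (n two-byte code units)?
def fits_char_utf8(value: str, n: int) -> bool:
    return len(value.encode("utf-8")) <= n            # byte count

def fits_graphic_utf16(value: str, n: int) -> bool:
    return len(value.encode("utf-16-be")) // 2 <= n   # 2-byte code units

name = "Łukasz"                     # 6 characters, but 7 bytes in UTF-8
print(fits_char_utf8(name, 6))      # False: would need CHAR(7)
print(fits_graphic_utf16(name, 6))  # True: fits GRAPHIC(6)
```

This matches the note above: in UTF-8 the declared length is a byte budget that multi-byte characters eat into, while in UTF-16 every BMP character costs exactly one 2-byte unit (surrogate pairs aside, which the presenters say their data never needs).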
20
Myths or Reality
• UTF-16:
  • CPU increase
    • Conversion costs (EBCDIC vs UNICODE)
    • Bytes moved from DB2 to the application times 2
    • It takes twice the time/cost to do the compare/move
    • Twice as many bytes to sort
  • Data space is doubled, need bigger bufferpools
  • Sortspace (DSNDB07) times 2, bigger sort bufferpool

Select name, adress, departnm
From UEmp a, UDept b
Where a.departcode = b.departcode
Order by adress, departnm
The increase in CPU is not only because of the conversion cost… Remember the rule that you should select only the columns your application needs, because DB2 returns data on a column-by-column basis and each column requires a cross-memory movement of bytes to the application. The movement of bytes costs CPU cycles… by going to UTF-16, will these CPU cycles be doubled? The same question applies if DB2 compares two fields: comparing two times 30 bytes vs two times 60 bytes >> is there a CPU increase?
What about the DB2 bufferpools? Rows get longer (beyond 4K). Buffer data must be assigned to 8K or 16K bufferpools. These new bufferpools need to be defined and monitored as well…
More bytes to sort in Unicode: more space, more CPU?
21
Conversion Cost : Mainframe Program
[Diagram: a local application does a SELECT and an INSERT with host variable HV1 in codepage 500 against graphic columns of a table in Unicode; the conversion shows up as DBM1 class 2 CPU time]
Conversions are automatically done by DB2. In this example a mainframe application inserts EBCDIC data into a UNICODE column; DBM1 detects the codepage difference and requests the conversion (z/OS conversion services). Since DBM1 makes the request, it gets the CPU accounted.
22
Conversion Cost : DRDA Protocol
[Diagram: a C program on AIX with DB2 Connect does INSERT/SELECT with host variable HV1 in codepage 1253 against graphic columns of a table in Unicode on z/OS (DDF, DBM1). With DISABLEUNICODE=1 DB2 Connect sends the data in ASCII; with DISABLEUNICODE=0 it sends the data in UNICODE. The conversion routines sit in the DRDA common layer. Receiver makes right!]
In a distributed process, things are different. For example: with the DRDA protocol, the receiver makes right. So if a client program needs data in an ASCII field and retrieves data from a DB2 Unicode column, the receiver (in this case DB2 Connect) will do the conversion.
If you are sending non-UNICODE data to DB2 on z/OS using DB2 Connect, you have the possibility to decide where the conversion will take place..
23
z/OS Conversion Services
• What is this:
  • Central repository (conversion tables) for the z/OS system, used by:
    • ODBC Driver
    • COBOL
    • DB2
• High performance conversion method:
  • Uses HW instructions available in zSeries 800, 900, 990, z9, z10
    • CU12 (alias for CUTFU): convert from UTF-8 to UTF-16
    • CU21 (alias for CUUTF): convert from UTF-16 to UTF-8
    • TROO: translate one byte to another one byte (used to convert single-byte codes)
    • TRTO: translate two bytes to one byte (used to convert from UTF-16 to EBCDIC)
    • TROT: translate one byte to two bytes (used to convert from EBCDIC to UTF-16)
    • TRTT: translate two bytes to two bytes (used to convert between UTF-16 and double-byte EBCDIC)
• See also the article in z/Journal, April/May 2006: DB2 for z/OS Version 8: Improved Unicode Conversion, by Laxminarayan Sriram
z/OS Conversion Services can do conversions to and from ASCII, UNICODE and EBCDIC!
24
Character Conversion support in DB2 V8
• DB2 catalog table SYSIBM.SYSSTRINGS
  • For EBCDIC <-> ASCII conversions
• For Unicode conversions DB2 uses z/OS conversion services
• To ensure that conversions between EBCDIC and UTF-8 are as fast as possible, in some cases DB2 V8 performs so-called "inline conversion" instead of calling the z/OS Conversion Services. As a general rule, inline conversion can be done when a string consists of single-byte characters in UTF-8. This conversion enhancement is not available in V7, nor is it available for conversions between EBCDIC and UTF-16 and vice versa.
• In addition, z/OS V1.4 has improved the performance of EBCDIC to UTF-8 (and vice versa) conversions by streamlining the conversion process, and V1.4 dramatically outperforms z/OS 1.3 conversions.
• On top of that, zSeries machines have hardware instructions that assist CCSID conversion. These instructions were first implemented on the z900, and have been enhanced on the z990 machines.
Although z/OS conversion services offer high-speed conversion performance to/from ASCII – EBCDIC, DB2 will still use its own SYSSTRINGS conversion method… There are no plans to change this.
25
z/OS support for Unicode
• The implication of the HW instruction set is….
  • Conversion from EBCDIC to UTF-8:
    • Step 1: EBCDIC to UTF-16 (via TROT)
    • Step 2: UTF-16 to UTF-8 (via CU21)
  • All access to the DB2 catalog!
    • Spufi: Select * From SYSIBM.SYSTABLES
    • DB2 V8 performs so-called "inline conversion"
Since there is no HW instruction for converting EBCDIC directly to UTF-8, an extra HW conversion is needed: TROT + CU21. Conversion between EBCDIC and UTF-8 is two-stage, but that's because it is the most efficient way to do it (at least generically): you can use tables to do the conversion, and table-driven conversion will always beat algorithmic conversion. So doing the conversion as two stages is actually faster, because the HW can be used for both conversions.
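The two-stage path can be sketched in Python, with the standard codecs standing in for the hardware instructions (cp500 is codepage 500 EBCDIC). This is an illustrative analogue, not the actual z/OS mechanism.

```python
# Sketch of the two-stage conversion the HW performs: EBCDIC -> UTF-16
# (table driven, like TROT), then UTF-16 -> UTF-8 (like CU21). Python's
# codecs stand in for the hardware instructions.
ebcdic_bytes = "SELECT * FROM SYSIBM.SYSTABLES".encode("cp500")

# Stage 1: EBCDIC -> UTF-16 (one table lookup per input byte)
utf16_bytes = ebcdic_bytes.decode("cp500").encode("utf-16-be")

# Stage 2: UTF-16 -> UTF-8
utf8_bytes = utf16_bytes.decode("utf-16-be").encode("utf-8")

# The two-stage result is identical to a direct conversion
assert utf8_bytes == ebcdic_bytes.decode("cp500").encode("utf-8")
print(utf8_bytes.decode("utf-8"))
```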
26
Conversions requested by DB2
• Bind-out process
  • Moves data out of the DB2 address space into the application address space
  • Row by row – column by column
• Setup cost dominates the conversion cost (less size dependent)
• For each row sent to the application, column conversions are done!
  • 2 columns – 1000 rows
    • 2000 conversions!

[Diagram: QMF issues "Select c1, c2, c3, … cn from UNICODE Table Where …" for 1000 rows; DB2 produces the result set and binds out one row at a time; the application wants the data in EBCDIC, so column-based data conversions are done. (a) setup cost (fixed); (b) conversion cost – like many other instructions, the CPU time increases in proportion to the string length]
Since DB2 moves each column separately to the application, each column needs to be converted. The conversion cost consists of two components: the setup cost (a) and the actual conversion cost (b). As this slide shows, the setup cost is the biggest part of the cost! The actual conversion cost is not that much impacted by the length of the string.
So if your application requests 2 columns and 1000 rows….. DB2 does (requests) 2000 calls to conversion services …..
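The cost model above can be sketched as a few lines of arithmetic. The setup and per-byte cost figures are illustrative assumptions, not measured values; only the call count (columns × rows) comes from the slide.

```python
# Sketch of the bind-out cost model: one conversion request per column
# per row, each paying a fixed setup cost plus a small per-byte cost.
# The cost figures below are illustrative assumptions, not measurements.
def conversion_calls(columns: int, rows: int) -> int:
    return columns * rows          # column by column, row by row

def total_cost(columns, rows, setup=1.0, per_byte=0.01, avg_bytes=30):
    calls = conversion_calls(columns, rows)
    return calls * (setup + per_byte * avg_bytes)

print(conversion_calls(2, 1000))   # the slide's example: 2000 calls

# With these assumed figures ~77% of the total is setup cost, which is
# why the call COUNT, not the string length, drives the CPU increase.
cost = total_cost(2, 1000)
print(f"setup share: {conversion_calls(2, 1000) * 1.0 / cost:.0%}")
```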
27
z/OS 1.7 Unicode Conversion Services
[Chart: TS scan, result in EBCDIC – CPU seconds (0–16) for 7 runs; Unicode vs EBCDIC series]
DSNTIAUL job: unload 45120 rows, with 66 graphic columns, 6 date columns
We did some performance tests… We set up 2 identical tables, one in EBCDIC (Char columns) and one in UNICODE (Graphic columns) (indexes, Reorg, Runstats). We used DSNTIAUL to unload the data to a sequential file. We ran 7 jobs for each codepage, and for each run we increased the number of columns in the select statement.
The purple blocks are the unload from the EBCDIC table: we see a small CPU increase as we increase the number of columns…. The blue blocks are the unload from the UNICODE table: a spectacular CPU increase as we increase the number of columns…. >>> Call IBM
28
z/OS 1.7 Unicode Conversion Services
• APAR OA21903: Unicode services (z/OS 1.7)
• It is still 2+ times more CPU vs the EBCDIC version!
[Chart: TS scan on UNICODE table, result in EBCDIC – CPU seconds (0–18) for 4 runs]
First a USER FIX(?) was sent to us, which reduced the conversion overhead significantly… but it is still twice as much as the EBCDIC job!
29
z/OS 1.9 Unicode Conversion Services
• Then came z/OS 1.9…..
  • CPU increase in conversion services!!
  • APAR fix CA23705 (APAR OA23705)
[Chart: TS scan, result in EBCDIC – CPU seconds (0.00–5.00) for EBCDIC and Unicode on z/OS 1.7 vs z/OS 1.9]
DSNTIAUL job: unload 45120 rows, with 66 graphic columns, 6 date columns
The same test for z/OS 1.9; the red circle is the same test as for z/OS 1.7. Again an increase, and again a USER FIX was sent.
30
z/OS 1.9 Unicode Conversion Services
• After OA23705
  • Better, but still a 6% increase
• Found a ‘problem’
  • With varying string sizes, a difference was seen in the amount of time it took to perform the conversion
• Fix not yet available (29/08/2008)
• z/OS conversion services is now part of the z/OS performance benchmark
The APAR fix did not close the CPU gap completely. The lab did find a possible problem… and is testing the fix… Hopefully I can give you the APAR number and the results at this presentation at IDUG.
31
z/OS Unicode Conversion Services
** PROGRAM SECTION USAGE SUMMARY **
MODULE    SECTION    FUNCTION                % CPU TIME
NAME      NAME                               SOLO    TOTAL
SYSTEM    .DB2       DB2 SYSTEM SERVICES     62.78   64.44
SYSTEM    .SVC       SUPERVISOR CONTROL       2.22    2.22
SYSTEM    .XMEMORY   CROSS MEMORY             1.11    1.11
                                             -----   -----
SYSTEM TOTALS        SYSTEM SERVICES         66.11   67.77
CUNMUNI   (>16M, size 230392)                31.67   32.22
                                             -----   -----
PROGRAM IKJEFT01 TOTALS                      97.78  100.00
Some OEM Report …….
How did we know that the CPU increase was in the z/OS conversion services ?
32
Conversions Requested by DB2

EXEC SQL DECLARE PX01 CURSOR FOR
  SELECT KLN_NR                        -- graphic(7)
  FROM UNICODE_TABLE_UTF16
  WHERE KLN_NR > :KLNR-START           -- Pic X(7)
END-EXEC

EXEC SQL FETCH PX01 INTO :KLNR-FETCH   -- Pic X(7)
END-EXEC

SQLCODE -330 rc16 at runtime – must be defined as Pic N(7)
If not correctly defined: CONVERSIONS!

EXEC SQL DECLARE PX01 CURSOR FOR
  SELECT U.KLN_NR                      -- graphic(7)
  FROM UNICODE_TABLE_UTF16 U, EBCDIC_TABLE E
  WHERE U.KLN_NR = E.KLN_NR
    AND U.KLN_NR > :KLNR-START         -- Pic N(7)

EXEC SQL FETCH PX01 INTO :KLNR-FETCH   -- Pic N(7)
END-EXEC

CONVERSIONS!
• For each row of the outer table, conversion of the column needs to be done
• The more columns are joined, the higher the conversion cost will be
• Cost for unnecessary join criteria!
Where does DB2 call for conversions ?
On the first SQL statement you get a runtime error when you define the host variable as Pic X(7); it must be a Pic N(7). And if a host variable is not correctly defined => conversion!
The second SQL statement shows a join between a UNICODE and an EBCDIC column. If you define all host variables correctly, you will not get conversion for them. However, the compare of a UNICODE and an EBCDIC column does imply conversion of the EBCDIC value to a UNICODE value! So the rule here is: do not code unnecessary joins between a UNICODE column and an EBCDIC column. For each row in the outer table, a conversion needs to be done. So hopefully DB2 will choose a good outer table…
33
Conversions Requested by DB2
• Test the performance of Unicode Conversion services after z/OS maintenance or Upgrade !
• Conversions are CPU expensive in DB2, when possible do it in your application
01 GRP-EBCDIC.
   03 NM-E   PIC X(120).
   ..
   03 ADRS-E PIC X(60).

01 GRP-UNICODE.
   03 NM-U   PIC N(120).
   ..
   03 ADRS-U PIC N(60).

If you need to convert, do the conversion in the application and do it with a group move => one conversion for x columns in the group.
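The group-move idea can be sketched as follows. The field layout is a hypothetical analogue of the COBOL group above: converting field by field pays the setup cost once per field, while converting the concatenated group pays it once.

```python
# Sketch: per-field conversion pays one setup cost per field; a group
# move converts the concatenated record in a single call. The field
# layout is an assumed analogue of the COBOL group items.
fields = {"NM": "A" * 120, "ADRS": "B" * 60}   # fixed-length group items

# Per-field: one conversion call (one setup cost) per column
per_field = [f.encode("utf-16-be") for f in fields.values()]   # 2 calls

# Group move: concatenate first, convert once (one setup cost)
group = "".join(fields.values()).encode("utf-16-be")           # 1 call

assert b"".join(per_field) == group   # same bytes, fewer conversion calls
print(len(group))                     # 360 bytes for the 180-char group
```

Because fixed-length group items are contiguous, one conversion over the whole group produces exactly the same bytes as converting each field separately, which is why the slide recommends the group move.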
34
Date/Time fields and Unicode
• In a UNICODE table, DB2 represents Date/Time columns as UTF-8
  • DCLGEN generates PIC X(10) for these fields
    • MR open to generate N(10)
  • By choosing UTF-16 in the application, we will have a conversion cost
• V8 Bind Out
  • From internal format to UTF-8 date/time/timestamp value was 1.05X
  • From internal format to UTF-16 date/time/timestamp value was 2X
• V9 Bind Out
  • From internal format to UTF-16 date/time/timestamp reduced to 1.10X
  • DB2 is doing the CU12, rather than using Conversion Services

Select a.Uni_date_end
From Unicode_table a
Where a.Uni_date_start = :hv   -- in the application defined as Pic N(10)
For date and time fields, DB2 internally transforms these to UTF-8. Since we decided that ALL character data will be in UTF-16, we need to modify the result of the DCLGEN (marketing request open to make this available in the DCLGEN function). But we will have an extra conversion cost! This conversion cost will be less in the V9 release.
35
Date/Time fields and Unicode
• The Unload – Load utility only supports date/time and timestamps in UTF-8
• “Upcoming” V9 implementations:
  • Unload utility: unload date/time and timestamps to UTF-16
  • Load utility: load date/time and timestamps in UTF-16
The UNLOAD and LOAD utilities will use UTF-8 as the datatype for date/time fields. This will be changed in the near future… to also support UTF-16.
36
Date/Time fields and Unicode
• Any Local Date (dd/mm/yyyy) – Local Time installations?
  • DSNXVDTA ASCII exit (must exist)
  • DSNXVDTU Unicode exit (must exist)
  • DSNXVDTX EBCDIC exit

*
* Check day value UNICODE
*
         LA    3,LDAG
         LA    9,L'LDAG
DAGFM    CLI   0(3),X'30'      Is 0 for Unicode
         BL    FORMATR
         CLI   0(3),X'39'      Is 9 for Unicode
         BH    FORMATR
         LA    3,1(3)
         BCT   9,DAGFM
SLASH1   CLI   LSLASH1,X'2F'   Is a / in Unicode
         BNE   FORMATR

*
* Check day value EBCDIC
*
         LA    3,LDAG
         LA    9,L'LDAG
DAGFM    CLI   0(3),C'0'
         BL    FORMATR
         CLI   0(3),C'9'
         BH    FORMATR
         LA    3,1(3)
         BCT   9,DAGFM
SLASH1   CLI   LSLASH1,C'/'
         BNE   FORMATR
If you use LOCAL date, don’t forget to write the UNICODE date/time exit !
37
Standards
• COBOL (and PL/I) is UTF-16 oriented, PIC N()
• DB2 is (re)designed for UTF-8
  • SQL parsing is done in UTF-8
  • Catalog/Directory in UTF-8
  • DBRMs in UTF-8
  • Date and Time columns are defined in UTF-8
  • Columns can be defined in UTF-8 or UTF-16
• z/OS & HW conversion instruction set is both UTF-8 and UTF-16 oriented

That the z/OS conversion services cannot convert directly from EBCDIC to UTF-8 does not mean that they are not UTF-8 oriented. Conversion between EBCDIC and UTF-8 is two-stage, but that's because it is the most efficient way to do it (at least generically): you can use tables to do the conversion, and table-driven conversion will always beat algorithmic conversion. So doing the conversion as two stages is actually faster, because the HW can be used for both conversions.
38
UTF-16 and DASD Space
Note: tablespace is COMPRESSed; one table, all columns graphic or Date

                 EB_IDX1   UN_IDX1
FIRSTKEYCARD      440094    440094
FULLKEYCARD      3985333   3985333
NLEAF              39556     70100   +77,2%
NLEVELS                3         3
CLUSTERRATIO         100       100
SPACEF            173520    291600   +68,0%
AVGKEYLEN             22        44   +100,0%

                 EB_IDX2   UN_IDX2
FIRSTKEYCARD     2540258   2540258
FULLKEYCARD      3985333   3985333
NLEAF              44780     79707   +78,0%
NLEVELS                4         4
CLUSTERRATIO          68        68
SPACEF            173520    360000   +107,5%
AVGKEYLEN             22        44   +100,0%

                 EB_IDX3   UN_IDX3
FIRSTKEYCARD        1016      1016
FULLKEYCARD       595289    595289
NLEAF              11215     13808   +23,1%
NLEVELS                3         3
CLUSTERRATIO          99        99
SPACEF             50400     57600   +14,3%
AVGKEYLEN             11        22   +100,0%

Index levels ‘might’ increase!
This slide shows a space increase for UNICODE, even if compression is used! If you have character based indexes, the space increase can be big. Note that we do not have V9 (yet), so we could not do any tests with index compression… We do have a lot of character based indexes in our data models!
The ‘small’ space increase for IDX3 is because this index can contain DUPLICATES.
39
UTF-16 and DASD Space
• Less impact on random I/O applications
• Bigger impact for sequential applications (batch jobs)
  • Consider the use of 8K, 16K, 32K bufferpools
    • Data ONLY; indexes are always in 4K bufferpools
    • More rows per page, but the MAX is still 255 rows per page
• Use NOT PADDED if you have indexes on variable length columns!
• Does V9 with index compression help?
  • The index can be defined in a larger BP size with compression
  • Only in the storage area
  • Index pages are expanded in the bufferpool
We do not expect a big I/O increase for the online transactions, since these are mostly random I/O operations. But for the batch jobs we have an increase. On a case-by-case basis we will investigate whether we benefit from putting 4K data in an 8K or 16K bufferpool. By putting data in a larger bufferpool, you could get an I/O advantage: two I/Os for a 4K buffer can become one I/O for an 8K buffer… in theory… It depends mainly on the compressed row size (only 255 rows on a page…).
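The rows-per-page arithmetic behind that caveat can be sketched quickly. The compressed row length is an assumed figure; only the 255-rows-per-page cap comes from the slide.

```python
# Sketch of the rows-per-page arithmetic behind the bufferpool note.
# Usable page bytes are approximated by the raw page size; DB2 caps a
# page at 255 rows. The 40-byte compressed row is an assumed figure.
def rows_per_page(row_len: int, page_bytes: int, max_rows: int = 255) -> int:
    return min(page_bytes // row_len, max_rows)

compressed_row = 40
for page in (4096, 8192, 16384):
    print(page, rows_per_page(compressed_row, page))
```

With very short compressed rows the 255-row cap, not the page size, limits how many rows a bigger page can hold, so the I/O benefit of 8K/16K pages has to be checked case by case, as the note says.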
40
UNICODE UTF-8 & Unload Utility

select hex(unicol) from unicode_table_UTF8 WHERE unicol = '0'
---------+---------+------
3030303030

select hex(unicol) from unicode_table_UTF8 WHERE unicol = X'30'
---------+---------+------
3030303030

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF8
HEADER NONE WHEN (unicol1 = '0')
SHRLEVEL CHANGE ISOLATION UR
HIGHEST RETURN CODE=0

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF8
HEADER NONE WHEN (unicol1 = X'30')
SHRLEVEL CHANGE ISOLATION UR
HIGHEST RETURN CODE=0

Column unicol defined as char(1) -> UTF-8
The next slides handle the UTF-16 support in the UNLOAD and LOAD utilities. A simple test case shows here that the UNLOAD utility fully supports UTF-8.
41
UNICODE UTF-16 & UNLOAD Utility

select hex(unicol) from unicode_table_UTF16 WHERE unicol = '0'
---------+---------+------
00300030003000300030

select hex(unicol) from unicode_table_UTF16 WHERE unicol = UX'0030'
---------+---------+------
00300030003000300030

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF16
HEADER NONE WHEN (unicol = '0')
SHRLEVEL CHANGE ISOLATION UR

INVALID OPERAND '0' FOR KEYWORD 'unicol'

select hex(unicol) from unicode_table_UTF16 WHERE unicol = X'0030'
---------+---------+------
DSNE610I NUMBER OF ROWS DISPLAYED IS 0

INVALID OPERAND ' ' FOR KEYWORD 'unicol'

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF16
HEADER NONE WHEN (unicol = UX'0030')
SHRLEVEL CHANGE ISOLATION UR

UNLOAD DATA FROM TABLE UNICODE_TABLE_UTF16
HEADER NONE WHEN (unicol = X'0030')
SHRLEVEL CHANGE ISOLATION UR

UTILITY EXECUTION COMPLETE, HIGHEST RETURN CODE=0

Column unicol defined as graphic(1) -> UTF-16

MR0228083923 opened and accepted

UX: hexadecimal Unicode string, UTF-16 only
The same test for a UTF-16 column gives some strange results… The second SELECT is syntactically correct, but does not return any rows?! The third SELECT uses the hexadecimal Unicode string UX'…'. When we use the same condition in the WHEN clause of the UNLOAD, we get the strange results this slide shows… only the second UNLOAD returns data, although the corresponding SELECT did not return any rows… This is not usable in production jobs… write your UNLOAD selection string in hex! An MR was opened and accepted… to make it possible to use a 'normal' WHEN selection on Unicode columns.
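The hex values in these test cases are easy to reproduce outside DB2 (sketched here in Python for illustration): the character '0' is X'30' in UTF-8 but X'0030' in big-endian UTF-16, which is exactly what the UX'0030' literal expresses.

```python
# Why the UNLOAD WHEN clause needs a hex string for UTF-16 columns:
# the same character has a different byte representation per encoding.
zero = '0'
print(zero.encode('utf-8').hex())      # '30'   -> matches X'30' for UTF-8
print(zero.encode('utf-16-be').hex())  # '0030' -> matches UX'0030' for UTF-16
print(zero.encode('cp500').hex())      # 'f0'   -> the EBCDIC code point
```

So the literal '0' in a WHEN clause is interpreted in the job's EBCDIC code page (X'F0'), which never matches the X'0030' bytes stored in the GRAPHIC column.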
42
UNICODE UTF-8 & UNLOAD Utility

• Convert Unicode UTF-8 to EBCDIC

UNLOAD DATA
FROM TABLE unicode_table_UTF8
UNLDDN UNLFILE EBCDIC CCSID(id1,id2,id3)

EBCDIC: Specifies that all output data of the character type is to be in …..
CCSID: Specifies up to three coded character set identifiers (SBCS, MBCS, DBCS) that are to be used for the data of character type in the output records, including data that is unloaded in the external character formats

UNLDDN UNLFILE EBCDIC CCSID(,500,)

UTILITY EXECUTION COMPLETE, HIGHEST RETURN CODE=0

Be aware of possible EBCDIC substitution characters (X'3F') in the output file
Since our data warehouse remains in EBCDIC, we need to convert UTF-16 to EBCDIC via the UNLOAD utility. According to the manuals this is possible… We accept the fact that not every Unicode character can be converted to an EBCDIC character, and therefore some columns can contain the EBCDIC substitution character (X'3F'). This slide shows that this works for a UTF-8 column/table.
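The substitution behaviour is easy to mimic (in Python, for illustration; cp500 is the EBCDIC International code page named in the CCSID(,500,) example, and the X'3F' fallback mirrors what the utility does for unmappable characters):

```python
# Convert a Unicode string to EBCDIC CCSID 500, substituting X'3F'
# for any character that has no EBCDIC equivalent -- the same
# substitution character to watch for in the unloaded file.
SUB = b'\x3f'

def to_ebcdic_500(s: str) -> bytes:
    out = bytearray()
    for ch in s:
        try:
            out += ch.encode('cp500')
        except UnicodeEncodeError:
            out += SUB          # unmappable: EBCDIC substitution char
    return bytes(out)

print(to_ebcdic_500('0').hex())    # 'f0' -- digit zero maps fine
print(to_ebcdic_500('ł').hex())    # '3f' -- Polish l-stroke cannot map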
43
UNICODE UTF-16 & UNLOAD Utility

• Convert Unicode UTF-16 to EBCDIC

UNLOAD DATA
FROM TABLE unicode_table_UTF16
UNLDDN UNLFILE EBCDIC CCSID(id1,id2,id3)

EBCDIC: Specifies that all output data of the character type is to be in …..
CCSID: Specifies up to three coded character set identifiers (SBCS, MBCS, DBCS) that are to be used for the data of character type in the output records, including data that is unloaded in the external character formats

UNLDDN UNLFILE EBCDIC CCSID(,,500)

ERROR IN CCSID TRANSLATION FOR "DBCS PAD CHAR", FROM CCSID 1200 TO 500

IBM response -> HPUnload supports this ($)….. MR1218066147
But it does not work for UTF-16 data! After reporting this to IBM, the answer was… buy HPUnload… MR closed!
44
UNICODE & DB2 UTILITIES

• DSNTIAUL would cause too much conversion overhead….
• Our solution for the UNICODE (UTF-16) <-> EBCDIC data conversions:

[Diagram: UNICODE table -> IBM UNLOAD -> UTF-16 DSN -> conversion PGM (using the LOAD cards) -> EBCDIC DSN + generated LOAD cards -> IBM LOAD -> EBCDIC table]
We built a special program that reads the Unicode unload file together with the LOAD cards; the program takes care of the conversion (one move statement does all the Unicode-to-EBCDIC conversions). The appropriate LOAD cards are generated as well.
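The idea behind such a conversion step can be sketched as follows (in Python, for illustration only; the real program is a COBOL move, and the column layout here is hypothetical — in practice offsets and lengths would be derived from the generated LOAD cards):

```python
# Hypothetical sketch: convert one fixed-length UTF-16 unload record
# to EBCDIC in a single pass, column by column, as described above.
COLUMNS = [(0, 20), (20, 10)]   # hypothetical (offset, length) in UTF-16 chars
SUB = b'\x3f'                   # EBCDIC substitution character

def convert_record(rec: bytes) -> bytes:
    out = bytearray()
    for off, nchars in COLUMNS:
        # 2 bytes per UTF-16 code unit (BMP characters assumed here)
        raw = rec[off * 2:(off + nchars) * 2]
        for ch in raw.decode('utf-16-be'):
            try:
                out += ch.encode('cp500')      # EBCDIC CCSID 500
            except UnicodeEncodeError:
                out += SUB                     # unmappable character
    return bytes(out)
```

Each UTF-16 record thus shrinks to half its byte length (minus any multi-byte characters that collapse to X'3F'), and the generated LOAD cards describe the resulting EBCDIC layout.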
45
Nice to know

• REXX does not support UTF-16
• No RI (referential integrity) possible between a Unicode table and an EBCDIC table
46
To summarize our UTF-16 experiences

• Unicode overhead:
  • CPU increase:
    • Eliminate conversions as much as possible…
    • Improved date/time/timestamp-to-UTF-16 conversion in V9
    • V9 at the end of this year, Q2 2009 in production
  • Index space increase
• DB2 does not support UTF-8 and UTF-16 equally:
  • From an SQL point of view it does
  • From an UNLOAD/LOAD point of view it does NOT
• More usage of BP8K and BP16K bufferpools:
  • Need more memory
• Majority of Unicode problems (performance, …):
  • RAD/EGL not generating/supporting the correct COBOL
  • COBOL compiler
  • LE runtime environment
47
Take your time

• Implementing Unicode is not a 'normal' project
  • Multiple skills are needed (Architecture, Application Development Tools, Application Design, Database Administration, z/OS Technology, DB2 Technology, …)
• It is worth a thorough study
  • A learning curve of a couple of years is not unusual
• The difficulty comes when introducing Unicode into an existing application portfolio
• Take your time for the first implementation projects
  • Building applications will take more time (+50% or more?)
  • Pay extra attention to testing (+100%?)
  • Try as much as possible to limit the impact on other applications
48
Maturity of technology

• Unicode support is still evolving
  • Unicode is not new on the mainframe (e.g. DB2 has supported mixed data since V2.3)
  • Unicode in development tools has not existed for as long
  • There are still functional and performance enhancements
  • OEM vendors 'may' not support it fully
    • How can I browse/edit Unicode UTF-16 data through ISPF?
    • What about output archiving tools?
    • Can debugger tools and file browsing tools show the representable UTF-16 characters?
• The mainframe perception is different compared to open systems
  • The mainframe has a longer tradition of measuring resource usage at the application level
  • Mainframe hardware runs at a higher CPU-busy %
49
KBC Global Services [email protected]
Session: A12KBC Unicode Train is on the Rails
Questions ?