Design of A Transcoding Proxy Server for Mobile Web Browsing


Fan Peng Kong

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering
University of Toronto

© Copyright by Fan Peng Kong, 2000

National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.



Abstract

A transcoding proxy server is a middleware interface between the server and the mobile client: it decreases the size of the web content by transcoding, thus reducing the downloading time dramatically.

Previous transcoding proxy systems can cope only with single content items, e.g., transcoding the images in the web page one by one rather than processing the entire page within one connection. This research introduces a method of web page transcoding, which transcodes the whole web page as one object at the proxy server. The method makes possible many new processing techniques that enable a wide range of client requirements regarding the processing of a web page, for example, web page re-arrangement, searching and filtering, and dynamic browsing. An adaptive transcoding policy is designed for the web page transcoding, and overall optimization is achieved for the first time.

This research designs a new transcoding proxy server system and implements it in Java. The results show that the new software proxy system can decrease the downloading time by 25 times and still make the transcoded web page recognizable.

Acknowledgements

I would like to express my most sincere gratitude to my thesis supervisor Professor Venetsanopoulos for his guidance, advice, ideas, and encouragement during the course of my MASc degree.

I would also like to thank Dr. Adriana Dumitras for her numerous comments and careful reading of the manuscript.

Finally, I thank my family and friends for their love and support.

Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 Motivation of Research
1.2 Previous Work
1.2.1 Research Work
1.2.2 Commercial Implementations
1.3 Research Objective and Contributions
1.3.1 A Method of Web Page Transcoding
1.3.2 An Adaptive Transcoding Policy
1.3.3 A System Design and Java Implementation
1.3.4 An Experimental Evaluation
1.4 Thesis Structure

2 Background
2.1 Web Proxy Servers
2.2 The HTTP Protocol
2.3 HTTP Proxy Servers
2.4 TCP Sockets
2.5 Glossary
2.6 Summary

3 System Design of Proposed Transcoding Proxy Server
3.1 Design Fundamentals
3.2 System Design for Proposed Proxy System
3.3 Main Flow Chart of Proposed Proxy Software System
3.4 Single File Process Thread
3.4.1 Request Processing
3.4.2 Virtual Client
3.4.3 Data Processing
3.5 Web Page Process Thread
3.5.1 Request Processing
3.5.2 Virtual Client
3.5.3 Data Processing
3.6 Summary

4 Web Page Transcoding
4.1 Data Processing Flow Chart
4.2 Image Content Analysis
4.3 Adaptive Transcoding Policy
4.3.1 Prediction of the Transcoding Time Delay
4.3.2 Size Reduction Decisions
4.3.3 Transcoding Parameters Decisions
4.4 Transcoding and Reconstruction
4.5 Summary

5 Implementation and Experimental Results
5.1 Java Implementation of Proposed Transcoding System
5.1.1 RTTI
5.1.2 Nested Multiple Thread Structure
5.1.3 TCP Sockets: ServerSocket, Socket class
5.1.4 Connection Control and Resource Management
5.1.5 HTML Parser
5.1.6 Protocol Handler
5.1.7 User Preference Definition
5.2 Experimental Results
5.2.1 Transcoding of single images
5.2.2 Transcoding of web page with ETP
5.2.3 Transcoding of web page with ESRR
5.3 Summary

6 Conclusions and Future Work

A Test Images & Web page

B Transcoded web page

Bibliography

List of Tables

1.1 Implementation features of existing systems
4.1 Comparison of image properties for different image types
4.2 Transcoding time delay for images with different size
4.3 Transcoding time delay for different transcoding parameters
4.4 Linear time delay estimation results for different transcoding parameters
4.5 Sizes of transcoded images for different transcoding parameters (bytes)
4.6 Linear size reduction estimation for different images
Definition of preference[0]
Definition of preference[1] & [2]
Definition of preference[3]
Definition of preference[4] & [5]. The notations CON, L-IXF, INF, MAP, RCL, BUL, DEC, and ADV have been explained in Chapter 4
Definition of preference[7]
Definition of pref
Web page transcoding and transcoding parameters evaluated by ESRR

List of Figures

1.1 An Internet connection via proxy [6]
2.1 Layer correspondence between OSI and IP
3.1 Proxy server system design
3.2 Main flowchart of the software proxy server system
3.3 Single file process thread
3.4 Web page process service thread
3.5 Web page processing
4.1 Data Processing module for web page transcoding
4.2 Image content decision tree
4.3 Transcoding time delay vs. no. of pixels (R=0.375, C=4bit)
4.4 Transcoding time delay vs. no. of pixels (R=0.5, C=4bit)
4.5 Transcoding time delay vs. no. of pixels (R=0.625, C=4bit)
4.6 Transcoding time delay vs. no. of pixels (R=0.75, C=4bit)
4.7 Transcoding time delay vs. no. of pixels (all samples)
4.8 Mean transcoding time delay vs. no. of pixels
4.9 Mapping of relative importance to size reduction ratio
4.10 Size reduction vs. equivalent pixel no. for announcer.gif
4.11 Size reduction vs. equivalent pixel no. for baboon.gif
4.12 Size reduction vs. equivalent pixel no. for commhead2.gif
4.13 Size reduction vs. equivalent pixel no. for uoft.gif
4.14 Size reduction vs. equivalent pixel no. for all images
4.15 Adaptive transcoding policy
5.1 Image downloading time at 28.8 kbps (with transcoding)
5.2 Image downloading time comparison at 28.8 kbps
5.3 Image downloading time at 14.4 kbps (with transcoding)
5.4 Image downloading time comparison at 14.4 kbps
5.5 Image downloading time at 9.6 kbps (with transcoding)
5.6 Image downloading time comparison at 9.6 kbps
5.7 Web page downloading time at 28.8 kbps (ETP transcoding)
5.8 Web page downloading time comparison at 28.8 kbps
5.9 Web page downloading time at 14.4 kbps (ETP transcoding)
5.10 Web page downloading time comparison at 14.4 kbps
5.11 Web page downloading time at 9.6 kbps (ETP transcoding)
5.12 Web page downloading time comparison at 9.6 kbps
5.13 Web page downloading time at 28.8 kbps (ESRR transcoding)
5.14 Web page downloading time comparison at 28.8 kbps
A.1 adv-l-1Zyahoo.gif
A.2 advL-li.5;vahoo.gif
A.3 adv-miadora-yahoo.gif
A.4 adv-monster.gif
A.5 adv-ntap-yahoo.gif
A.6 announcer128.gif
A.7 commhead2.gif
A.8 anemone.gif (scaled by 0.5 x 0.5)
A.9 gold.gif (scaled by 0.5 x 0.5)
A.10 announcer.gif
A.11 baboon.gif
A.12 cwheel.gif (scaled by 0.5 x 0.5)
A.13 guanyin.gif
A.14 inf-bag-yahoo.gif
A.15 inf-billpay-yahoo.gif
A.16 inf_mail_yahoo.gif
A.17 inf-pts-yahoo.gif
A.18 uoft.gif
A.19 jordan-poster2.gif (scaled by 0.8 x 0.8)
A.20 jordan-poster3.gif
A.21 kids.gif
A.22 map.gif
A.23 splash.gif
A.24 stock.gif
A.25 tree.gif (scaled by 0.7 x 0.7)
A.26 Original web page (scaled by 0.4 x 0.4)
B.1 Transcoded web page (ETP) (scaled by 0.8 x 0.8) (for "CON": R=0.375, C=4; for other content: R=0.625, C=4)
B.2 Transcoded web page (ETP) (scaled by 0.8 x 0.8) (colored; for "CON": R=0.375, C=4; for other content: R=0.625, C=4)
B.3 Transcoded web page (ESRR with 10s) (scaled by 0.8 x 0.8)
B.4 Transcoded web page (ESRR with 15s) (scaled by 0.8 x 0.8)
B.5 Transcoded web page (ESRR with 20s) (scaled by 0.8 x 0.8)
B.6 Transcoded web page (ESRR with 25s) (scaled by 0.8 x 0.8)
B.7 Transcoded web page (ESRR with 30s) (scaled by 0.8 x 0.8)
B.8 Transcoded web page (ESRR with 35s) (scaled by 0.8 x 0.8)

Chapter 1

Introduction

1.1 Motivation of Research

Nowadays, a rapidly growing number of users are connected to the Internet for various purposes, all of which are related to the exchange of multimedia content of web documents.

The bandwidth of different access to the Internet varies greatly [1]. Professional Internet users use high bit-rate links such as primary rate ISDN (2 Mbps) or ATM (up to 2 Gbps). Average consumers access the Internet through low bit-rate connections, such as V.34 modems (up to 33.6 kbps) or newly released V.90 modems (up to 56 kbps downstream, up to 31.2 kbps upstream). Some users use cable modems (around 1.5 Mbps downstream and 300 kbps upstream) or ADSL modems (around 3 Mbps downstream and 1 Mbps upstream). For mobile users using cellular phone links, hand-held computers (HHC) and personal digital assistants (PDA) [2], this communication rate is even lower, usually only 14.4 kbps or less. Furthermore, the actual transmission rate depends on the current user load and the infrastructure of the part of the Internet in use. From a client's point of view, it may vary unpredictably between zero and the maximum bit rate of its network modem.

From the point of view of the content provider (the web server), by default, it does not consider which kind of network connection is between itself and the client. It would just return a response to a request from the client, with the web page that might be full of images, sound files, applets, JavaScript, and texts.

Thus the problems for mobile web browsing are:

• Limited bandwidth: The very low bandwidth plus the high disconnection rate and high error rate of mobile links can easily "kill" the process of mobile web browsing because of the long downloading time.

• High cost: The cost of mobile connections is very high. For example, the connection cost for Palm VII is one dollar per 5 or 6 kilobytes of downloaded data [3]. Therefore, a mobile user will pay 4 dollars to view a 20 KB JPEG [4] image.

• Limited computation ability: Certain types of content cannot be handled by the mobile users. This is due to their limited computability and memory. Examples are active content like Java applets, and audio content.

• Limited display: One of the practical problems for mobile web browsing is that different mobile devices may have different and limited display requirements when displaying web documents. For example, a PDA has a screen of only 320 x 200 pixels with 8 gray levels. Thus a colorful web page suitable to be displayed on a PC will not be displayed properly on a mobile device.
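The bandwidth and cost figures above can be checked with a little arithmetic. The following is a minimal Java sketch, not part of the thesis implementation: the class and method names are illustrative, the cost uses the 5 KB-per-dollar end of the quoted range, 1 KB is taken as 1000 bytes, and latency, protocol overhead, and retransmissions are ignored.

```java
/** Back-of-the-envelope estimates for mobile downloads (illustrative names only). */
final class MobileDownloadEstimate {

    /** Ideal transfer time in seconds: payload bits divided by the link rate in bits per second. */
    static double downloadSeconds(long sizeBytes, double linkKbps) {
        return (sizeBytes * 8.0) / (linkKbps * 1000.0);
    }

    /** Cost in dollars, assuming a flat price per block of kilobytes (1 KB = 1000 bytes here). */
    static double costDollars(long sizeBytes, double dollarsPerBlock, double kilobytesPerBlock) {
        return (sizeBytes / 1000.0) / kilobytesPerBlock * dollarsPerBlock;
    }

    public static void main(String[] args) {
        long jpegBytes = 20_000; // the 20 KB JPEG image mentioned in the text
        System.out.printf("14.4 kbps: %.1f s%n", downloadSeconds(jpegBytes, 14.4));
        System.out.printf("28.8 kbps: %.1f s%n", downloadSeconds(jpegBytes, 28.8));
        System.out.printf("cost at $1 per 5 KB: $%.2f%n", costDollars(jpegBytes, 1.0, 5.0));
    }
}
```

Under these assumptions the 20 KB image needs roughly 11 seconds at 14.4 kbps even on an ideal link, and costs 4 dollars, which agrees with the figure quoted above.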

One solution to the problems mentioned above is to put a "transcoding proxy server" as an interface between the Web server and the client, as indicated in Figure 1.1 [6].

Figure 1.1: An Internet connection via proxy [6] (diagram labels: Wireless LAN, Wireless Modem, ISDN, Transcoding Proxy, Internet, Multimedia PC)

A transcoding proxy server is a piece of middleware that acts as an agent between the server and the mobile client. When the data stream passes through, the proxy server tries to cut its amount to such a level that the mobile client can browse the Internet with a relatively convenient speed. The method is aggressive compression, mainly of the images inside a web page. Thus a transcoding proxy server can achieve:

• Shortening the downloading time

• Reducing the cost

• Filtering unwanted content

• Adjusting images for different display requirements to be directly viewable

Transcoding is effective in that images form up to 70% of the Internet data traffic [5]. It becomes even more significant when network congestion and data error rates are considered. Packets get dropped when the network is congested. Some experiments have shown that loss rates in the 5% range are quite common and in the worst case, the network can drop up to 25% of all packets [7]. When high error rates occur, the data has to be sent repeatedly for some time until the correct data is accepted by the end user. Therefore a much smaller web document will be much more desirable for downloading because it further reduces the number of dropped packets and the amount of erroneous data at the same congestion rate and error rate, and thus makes the entire downloading even shorter. This is also true when the problem of frequent disconnection is taken into consideration for mobile web browsing.
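The effect of loss on traffic volume can be illustrated with a simple model. The sketch below assumes that each transmission attempt is lost independently with a fixed probability and is repeated until it succeeds. This is an assumed textbook model, not one taken from the thesis or from [7], and the class name is hypothetical.

```java
/** Sketch: expected traffic under independent packet loss with retransmission (assumed model). */
final class LossOverhead {

    /** Expected number of transmissions per packet when each attempt is lost with probability lossRate. */
    static double expectedTransmissions(double lossRate) {
        if (lossRate < 0.0 || lossRate >= 1.0) {
            throw new IllegalArgumentException("lossRate must be in [0, 1)");
        }
        return 1.0 / (1.0 - lossRate);
    }

    /** Expected number of bytes put on the link in order to deliver sizeBytes successfully. */
    static double expectedBytesSent(long sizeBytes, double lossRate) {
        return sizeBytes * expectedTransmissions(lossRate);
    }
}
```

With a 25% loss rate each packet is sent 1.33 times on average, so a 100 KB page causes about 33 KB of repeated traffic while a 10 KB transcoded page causes only about 3.3 KB. The relative overhead is the same, but the absolute saving grows with the size reduction.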

Transcoding at a proxy server is more favorable than at the content provider's server or the client's side. Compared with transcoding at web servers, transcoding at proxy servers has the following advantages:

• Flexibility: If we do transcoding at the server's side, every web server in the Internet has to be modified to have the ability to do transcoding. Instead, if we do transcoding via a proxy server, none of the web servers needs any modifications. At times when functions need to be added for transcoding, there will be no pain of modifying every web server in the Internet. The only thing to modify is the proxy server.

• Enhanced security: From a security perspective, it is more beneficial to separate the origin servers and proxy servers. For example, when separated, origin web servers do not make connections to the internal network, which makes it possible for firewall proxy servers to be set up to block any connections initiated by the web server. This protects the internal network if the web server machine becomes compromised, so even if an intruder gains access to the web server host, he is still unable to connect to the hosts inside the firewall [23].

• Ease of administration: Separating the origin web server and proxy server functionality makes the administration easier, as origin web server and proxy server features are clearly separated into different administration interfaces. This reduces the risk of misconfiguration. For example, access control might be incorrectly set up so that it applies to the origin server and not the proxy server, or vice versa [23].

• Modularization of development: From a software developer's point of view, separating these two functionalities makes development easier. Web servers and proxy servers, which indeed share some functionality, are quite different from each other, and fairly complicated on their own. Separation makes development, stabilization and testing easier as the size of the software becomes smaller [23].

As for transcoding on the client's side, it is definitely not a good choice. The reason is that this transcoding becomes useless because it won't save any time: the big images are already downloaded. Furthermore, with the client's limited computability and memory, it will cause more delay.

1.2 Previous work

Previous transcoding systems can be classified into two big categories: research work and commercial implementations.

1.2.1 Research Work

Theoretical Analysis

Theoretical research on transcoding proxies has been carried out by IBM [2, 20] & [6]. The research work focuses on analyzing only the transcoding part of the proxy system and the processing of one image per operation. The work in [2, 20] is on the analysis of content-based transcoding, where the content of the images in the Web document is detected and the transcoding policy is based on the image content. This means that as long as two images belong to the same content group, they will be transcoded by the same transcoding parameters. Also, those transcoding parameters must be pre-set by the proxy server or by the client.

The work in [6] consists of some theoretical analysis on the adaptation policy of image transcoding, where algorithms are analyzed to decide whether the transcoding should be performed according to the connection bandwidth and the user's display requirement. The important thing is that the algorithms are based on fixed transcoding parameters. Therefore, no adaptation of transcoding parameters can be obtained through them. That is the reason why in the real implementation shown in [6], the transcoding policy uses none of their prediction algorithms; instead it uses simple "if, then" switches to set the transcoding parameters (for example, if the user is "Palm", then the color depth is reduced to gray scale at a fixed bit depth and the scaling ratio is fixed, which must be told by the client in advance).
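A policy of "if, then" switches with fixed parameters can be pictured as a lookup on the device type. The sketch below is only an illustration of that idea: the device names, the `Params` fields, and every numeric value are hypothetical and are not taken from [6].

```java
/** Sketch of a fixed "if, then" transcoding policy. Device names and values are hypothetical. */
final class FixedSwitchPolicy {

    /** Transcoding parameters chosen for a whole class of devices. */
    static final class Params {
        final double scalingRatio;   // linear scaling factor in (0, 1]
        final int colorDepthBits;    // bits per pixel after color reduction
        final boolean grayscale;     // true if color is converted to gray

        Params(double scalingRatio, int colorDepthBits, boolean grayscale) {
            this.scalingRatio = scalingRatio;
            this.colorDepthBits = colorDepthBits;
            this.grayscale = grayscale;
        }
    }

    /** Returns preset parameters for a device type announced by the client in advance. */
    static Params paramsFor(String deviceType) {
        if ("palm".equalsIgnoreCase(deviceType)) {
            return new Params(0.5, 2, true);      // hypothetical preset
        } else if ("hhc".equalsIgnoreCase(deviceType)) {
            return new Params(0.75, 8, false);    // hypothetical preset
        }
        return new Params(1.0, 24, false);        // unknown device: leave the image unchanged
    }
}
```

Note that neither the connection bandwidth nor the content of the page enters the decision, which is the limitation the text describes.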

IBM also presents a content adaptation system in [Hl that adapts multimedia web content to optimally match the resources and capabilities of diverse client devices. This transcoding system is an extension to a web server, and the content adaptation is performed by selecting from a number of different possible transcoded versions of the content. The selection is based on a so-called "subjective measure of fidelity", whose computational mechanism still cannot be determined and whose value must be pre-set for each content.

Research with implementation

Some research work with implementations on transcoding proxy systems has been carried out by different universities: University of California at Berkeley, University of Maryland, and University of Helsinki, resulting in the GloMop, Mowser, and Mowgli systems respectively.

In the GloMop model described in [10], the proxy performs "distillation" of the image received from the server before sending it to the client. Distillation is defined as highly lossy, real-time, datatype-specific compression that preserves most of the semantic content of an image. The transcoding policy is based on prediction of the transcoding time, using fixed parameters, which is similar to [6].

The Mowser model [8] designs the HTTP proxy server to run on a Mobile Support Station (MSS) and communicate with the Mobile Host (MH). It has a preference lookup table which uses the IP address as index. The approach to transcoding image files is to find out the size of the image and, if it is bigger than can be handled by the Mobile Host, the image is scaled down, reduced in color, or both, without losing the semantics. The proxy does transcoding not only for the response stream from the server, but also for the request to the server. It also performs so-called HTML transcoding by HTML parsing and active content filtering. HTML parsing is based on the fact that different content in a web page can be identified by its tag [30, 31].
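The idea that content can be identified by its tag can be sketched in a few lines of Java. The example below is an illustrative simplification, not Mowser's parser: it uses a regular expression to collect the `src` attribute of `<img>` tags, handles only quoted attribute values, and would be replaced by a proper HTML parser in a real proxy. The class name is hypothetical.

```java
/** Sketch: locating embedded images in HTML by their tag (a simplification of real HTML parsing). */
final class ImageTagScanner {

    // Matches <img ... src="..."> or src='...', case-insensitively. Unquoted values are not handled.
    private static final java.util.regex.Pattern IMG_SRC = java.util.regex.Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            java.util.regex.Pattern.CASE_INSENSITIVE | java.util.regex.Pattern.DOTALL);

    /** Returns the image URLs referenced by the page, in document order. */
    static java.util.List<String> imageSources(String html) {
        java.util.List<String> sources = new java.util.ArrayList<>();
        java.util.regex.Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            sources.add(matcher.group(2)); // group 2 is the text between the quotes
        }
        return sources;
    }
}
```

A proxy that has this list can decide per image whether to fetch, transcode, or filter it before the client ever requests it.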

The Mowgli model [11, 12] consists of two mediators, the Mowgli Agent and the Mowgli Proxy, located on the Mobile Host and the mobile-connection host respectively. The Mowgli Proxy performs GIF [9] to JPEG conversion, and large embedded images are not transferred at all to the mobile node. The Mowgli Agent and Mowgli Proxy use a special protocol, called Mowgli HTTP protocol, to communicate with each other. They reduce the data transfer over the wireless link in three ways: data compression, caching and filtering.

A summary of the previous implementations of the proxy systems is given in Table 1.1.

Implementation Features    Mowser    Mowgli    GloMop

Communication protocol    HTTP    Mowgli HTTP    HTTP

Simple threshold of size and color    Simple threshold of size and color    Statistical model for single image

Transcoding objective    Single content    Single content    Single content

Image transcoding method    Reduction in size and color    Reduction in size and color    Fixed

HTML parsing

Active content filtering

Video stream transcoding

Transcoding in both directions

Coding language

Adaptive policy to evaluate transcoding parameters

Image content analysis

Proxy system design

Yes

Yes

Yes

Yes

Perl

No

No

No

Yes

No

No

No

Not Available

Yes

No

Yes

No

Perl

No

No

No

Table 1.1: Implementation features of existing systems

1.2.2 Commercial Implementations

Intel's Quick Web technology [13] resides on Internet Service Provider (ISP) servers. It compresses images by selectively dropping bits or pixels out of an image and caches data to overcome the problem of bandwidth bulge. The technology is only used when the access to the Internet is through ISPs. Similarly, Spyglass' Prism works on ISPs to carry out the transcoding functions [14].

IBM's Web Express [15] consists of two components: the ARTour (Advanced Radio Communication on Tour) Gateway and the ARTour Client. The gateway provides secure, compressed data across the selected network with authentication. It can automatically retrieve Web requests in the background while mobile users are performing other tasks.

IBM also developed "Web Intermediaries" (WBI) [16], entities that can be positioned anywhere along the HTTP stream and are programmed to tailor, customize, personalize, or otherwise enhance data as they flow along the stream. WBI of "WebSphere Transcoding Publisher" [17] is server-side software that adapts, reformats, and filters the existing content on the server, to make the data optimally formatted for the destination environment. This means that, first, it is a server-side transcoding scheme; second, there is no need to download any content and the only concern for the transcoding policy becomes optimal formatting. WBI of "Web Browser Intelligence" [18] resides on the client's side and modifies content by observing the user's past activity. WBI of "KidDesk Internet Safe" [19] also resides on the client's side and, together with the browser, keeps kids safe from inappropriate content in the Internet. Those two transcoding schemes can be viewed as client-side transcoding, basically not aimed at mobile users.

Since commercial implementations hide the details of the technologies they use, we don't make much comment toward them.

1.3 Research Objective and Contributions

Previous research work on transcoding proxy systems focuses on the transcoding of individual content inside a web page. This means that the web page is not regarded as one integrated document but rather as a series of separate data items such as texts, image files, sound files and active content. The problem behind this is that the transcoding proxy cannot control the downloading process of the entire web page. For example, if the client wants to download the web page within 10 seconds, it would be impossible for previous proxy systems to find an answer, because they don't see the web page and thus cannot find a transcoding policy to do it.
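The 10-second example shows why the whole page must be visible to the proxy: the required reduction depends on the sum of all item sizes. The following minimal sketch computes the byte budget for a deadline and the overall size ratio the page would need. It is not the thesis's adaptive policy (that is developed in Chapter 4); the names are hypothetical, and latency, protocol overhead, and the transcoding delay itself are ignored.

```java
/** Sketch: overall size ratio a page needs to meet a download deadline (hypothetical helper). */
final class PageDeadlineBudget {

    /** Bytes that fit through the link in the given time, ignoring latency and protocol overhead. */
    static long byteBudget(double deadlineSeconds, double linkKbps) {
        return (long) Math.floor(deadlineSeconds * linkKbps * 1000.0 / 8.0);
    }

    /**
     * Ratio (transcoded size / original size) required for the whole page,
     * capped at 1.0 when the page already fits within the deadline.
     */
    static double requiredSizeRatio(long[] itemSizesBytes, double deadlineSeconds, double linkKbps) {
        long total = 0;
        for (long size : itemSizesBytes) {
            total += size; // the sum is only known to a proxy that sees the entire page
        }
        if (total == 0) {
            return 1.0;
        }
        return Math.min(1.0, (double) byteBudget(deadlineSeconds, linkKbps) / total);
    }
}
```

For instance, a page whose HTML and images total 90 KB must shrink to about 20% of its size to arrive within 10 seconds at 14.4 kbps. A proxy that handles each image in isolation has no way to compute this figure.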

No previous transcoding proxy systems consider the fact that mobile users may browse the Internet for very specific purposes. Since the cost is high, people tend to know what they want before going to the Internet. For example, some may only want to check e-mails, some may focus their interest on the stock market with some simple figures showing the trend of a certain stock, and some may want their browsing completely without the distraction from advertisements. Thus all other information, yet taking time to download, becomes useless from the user's point of view. Thus dynamic browsing, by which the users can define their browsing requirements with every connection, is highly desired.

All previous transcoding proxy systems use fixed transcoding parameters to transcode. No adaptation to the user's bandwidth or the user's requirement is obtained. The existing transcoding policies are basically "if then" switches for a fixed number of situations.

Thus in this thesis, the goal is to:

• Propose a method to transcode the entire web page within each connection, which optimizes the downloading of the entire web page according to the user's requirements, including filtering, transcoding, and web page re-arranging.

• Design an adaptive transcoding policy that can evaluate transcoding parameters adaptively and dynamically for each connection.

• Implement the proposed transcoding proxy server in Java, which is completely portable to all major operating systems.

• Compare the performance and speed of the proposed transcoding system vs. downloading without transcoding.

By achieving these goals, the main contributions of the thesis are:

1. A method of Web Page Transcoding at a proxy server

2. An adaptive transcoding policy for web page transcoding that can evaluate transcoding parameters adaptively for all the images in a web page

3. A system design and Java implementation of the proposed transcoding system

4. An experimental evaluation of the performance and efficiency of the proposed transcoding system

1.3.1 A Method of Web Page Transcoding

In this thesis, a method of "web page transcoding" is introduced to process the entire web page as one object at the proxy server within one connection. This method gives the proxy server the power to decide how to process the entire web page according to the user's requirement. Thus it has total control of the downloading of the web page and complete knowledge of the web page, and consequently is capable of finding a solution for any requirement made by the client regarding the web page.

Web page transcoding makes possible all the new techniques of transcoding:

• It can optimize the downloading time according to a transcoding policy applied to the whole web page.

• It can reconstruct a new web page according to the user's preference. Unwanted or unimportant images or other content can be filtered or compressed at a much higher ratio.

• The user does not have to send all those requests whose responses will be filtered by the proxy, so the overall information exchange becomes more efficient.

• It can provide the new web page with the correct information about the images, so that browsers like the Netscape browser can start displaying the correct layout as soon as the new HTML web page is received.

• It supports the dynamic web browsing required by the mobile user. For example, the user can set the parameters for browsing and change them dynamically with every connection. For one connection, the user may want a text-only web page. If something inside this web page seems interesting, the user may click it, creating a new connection with new parameters that will bring a more viewable version of this content.

• More efficient caching can be expected for the proxy system. Processing the web page as a whole will make the system easily manageable and fast to access. This is because the system only needs to recognize the web page link; all the links inside the web page are automatically associated with it. Thus the system knows right away where to find a bunch of images with only one link (the link of the web page) as index.

1.3.2 An Adaptive Transcoding Policy

The essential meaning of the adaptive transcoding policy for web page transcoding is to decide whether to transcode and how to transcode. The theoretical analysis of adaptive transcoding policy in [6] is based on the transcoding of a single image file, and it uses fixed transcoding parameters to decide whether to transcode. The research and implementation of [10] are in the same situation. But in fact, how to transcode (changing the transcoding parameters) will affect the decision of whether to transcode. And in terms of web page transcoding, the new transcoding policy also needs to consider the different importance of different content, for example, different images.
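To illustrate how the choice of parameters interacts with the decision of whether to transcode, the following Java sketch compares the delivery time of an untouched image against several candidate scaling ratios. It is not the policy developed in this thesis. The class name TranscodeDecision, the fixed transcoding delay, and the assumption that the image size shrinks with the square of the scaling ratio are hypothetical simplifications chosen only for illustration.

```java
/** Hypothetical sketch: is transcoding worthwhile, and at which scaling ratio? */
final class TranscodeDecision {

    /** Seconds needed to send the untouched image over the client link. */
    static double directTime(long bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond;
    }

    /**
     * Seconds needed to transcode the image and then send the smaller result.
     * Assumption: the size shrinks with the square of the scaling ratio.
     */
    static double transcodedTime(long bytes, double ratio,
                                 double bytesPerSecond, double transcodeSeconds) {
        return transcodeSeconds + (bytes * ratio * ratio) / bytesPerSecond;
    }

    /** Returns the fastest candidate ratio, or 1.0 when not transcoding is fastest. */
    static double bestRatio(long bytes, double bytesPerSecond,
                            double transcodeSeconds, double[] candidateRatios) {
        double best = 1.0;
        double bestTime = directTime(bytes, bytesPerSecond);
        for (double ratio : candidateRatios) {
            double time = transcodedTime(bytes, ratio, bytesPerSecond, transcodeSeconds);
            if (time < bestTime) {
                bestTime = time;
                best = ratio;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] ratios = {0.75, 0.5, 0.25};
        // Slow link: shrinking the image can pay for the transcoding delay.
        System.out.println(bestRatio(100_000, 5_000, 2.0, ratios));
        // Fast link: the transcoding delay dominates, so the image is sent as is.
        System.out.println(bestRatio(100_000, 1_000_000, 2.0, ratios));
    }
}
```

With a fixed set of parameters only one of the candidate ratios would ever be tested, which is why the answer to "whether to transcode" can change once the parameters are allowed to vary.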

This research introduces a new adaptive transcoding policy that decides the transcoding parameters adaptively for all content inside a web page with different importance, and simultaneously decides whether to transcode with varying transcoding parameters.

This research also improves the way of image content analysis by designing a new image content decision tree. The proposed transcoding policy treats images differently according to their content.

1.3.3 A System Design and Java implementation

A system design is given for the proposed transcoding proxy server. From the concept of web page transcoding, the system design requires the design of adaptive transcoding policy, filtering, and HTML parsing and reconstructing. From the point of view of a software proxy system, it further requires the monitoring and control of multiple simultaneous downloading and processing of content within one web page. Furthermore, some important system design features regarding the software server are considered, for example, the system design for self-extensible multiple services, multiple user connections, and multiple protocol support. The proxy system is designed to use TCP sockets, with a multi-thread structure.

The proxy system is implemented in Java, which is completely portable to most operating systems. The implementation uses object-oriented programming techniques, which make it very easy to modify and add function classes.

1.3.4 An Experimental Evaluation

An experimental evaluation of the performance and efficiency of the proposed transcoding system is given. The experiments include the following. Tests of image content analysis are run to evaluate the new image content decision tree. Tests of the proposed transcoding policy are run to evaluate transcoding parameters for different images in a web page adaptively. Tests of the transcoding proxy server are run to compare the downloading time with transcoding vs. without transcoding. Tests of downloading time components are run for different transcoding scenarios. Finally, transcoded web pages are evaluated for visual effect for different transcoding scenarios.

1.4 Thesis Structure

There are 6 Chapters in the thesis:

• Chapter 1 is the "Introduction" to the research. It gives a description of the research problem, the literature survey, the objective and the contributions of this research.

• Chapter 2 gives a background of the proxy server design. It discusses the functionality of proxy servers and describes the HTTP protocol and the definition of the HTTP proxy server. It also gives a brief description of TCP sockets.

• Chapter 3 gives the system design of the proposed transcoding proxy system with the design of web page transcoding. Functions of each module are described in detail. It also introduces the services that the proposed transcoding proxy server can provide.

• Chapter 4 designs an adaptive transcoding policy for web page transcoding. The proposed transcoding policy can evaluate transcoding parameters for each image differently and adaptively in the sense of optimal downloading for the entire web page.

• Chapter 5 describes how the proposed transcoding proxy server system is implemented. Different experiments are run to compare the performance and efficiency of the transcoding system vs. downloading without transcoding.

• Chapter 6 gives the conclusion of the research and suggests some future work.

Chapter 2

Background

In this Chapter, we present the background knowledge for our research. The content includes:

• Web Proxy Servers

• The HTTP Protocol

• HTTP Proxy Servers

• TCP Sockets

2.1 Web Proxy Servers

In the beginning of web history in 1990, proxy servers were originally referred to as gateways. The first such generic WWW gateway was written by the WWW team at CERN (a European high-energy particle physics research center in Switzerland), headed by the inventor of the World Wide Web, Tim Berners-Lee [23].

The term "gateway" has traditionally been used to refer to devices that forward packets between networks, sometimes converting between protocols. In 1993, the term web proxy server was chosen as the preferred term for these web gateways, to make a better distinction between Internet/firewall gateways ("proxies"), which allow web-related traffic to enter secured intranets, and information gateways ("gateways"), which interface third-party information systems to the web. The Internet proxy server was given the name proxy server to better reflect the fact that it acts on behalf of the client. Information gateways, on the other hand, act on behalf of the server. Therefore they are sometimes referred to as reverse proxies.

There are several types of proxy servers, as shown below.

• Generic firewall proxies: The generic proxy server is the most common type of proxy server. It can handle traffic with different protocols, including HTTP, FTP, and Gopher. Generic proxy servers are able to provide access control, filtering, logging, and caching features.

• Departmental proxies: Departmental proxy servers are generic firewall proxies, except that their base is narrower: a single department of a large corporation or institution. Departmental proxy servers are daisy-chained to firewall proxy servers, probably with different restrictive access control, to construct two layers of proxies.

• Personal proxies: Personal proxy servers are trimmed-down proxy servers intended for individual users only. They typically run on the same host as the client program. Features provided by personal proxies include local caching, active cache updates, polling for changes and notification about them, personal hot list management, and local searches.

• Specialized proxies: Specialized proxy servers are a diverse group, which perform specialized actions appropriate for the target environment. An example is a proxy server serving client software running on a palmtop device. The proxy may reduce the image quality and the number of colors used, and convert the image to a format understood by the palmtop computer.

The general properties of proxy servers are:

• Transparency to client: aside from any filtering performed on proxies, they do not affect the end result. Users will get the same response, whether the connection was direct or through a proxy server.

• Flexibility: the client determines whether to use a proxy or not.

• Transparency to server: the destination server is unaffected by any intermediate proxy servers and, often, completely unaware of them.

2.2 The HTTP Protocol

HTTP is short for HyperText Transfer Protocol. While the World Wide Web consists of, and is built on top of, a number of different protocols, HTTP is the primary protocol used for transferring web documents. HTTP is a request/response protocol. The client sends a request to the server, and the server sends back a response. There are no multiple-step handshakes in the beginning as with some other protocols, such as FTP.

In the presence of proxy servers, the client may send a request to the proxy server, and the proxy server will forward the request to the server, or to another proxy. Thus a request chain is formed, and the response will follow the same chain in reverse order.

An HTTP request consists of a method, a target URL, a protocol version identifier, and a set of headers. The method specifies the type of operation, among which GET is the most common one. Headers contain additional information about the request. An HTTP response consists of a protocol version identifier, a status code, a human-readable response status line, response headers, and the requested resource content.
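The request structure described above can be sketched in Java as follows. This is a minimal illustration, not the parser used in the proposed proxy: the class name HttpRequestHead and its fields are hypothetical, and the sketch handles only the request line and simple "Name: value" header lines.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical holder for the head of an HTTP request (request line + headers). */
final class HttpRequestHead {
    final String method;
    final String target;
    final String version;
    final Map<String, String> headers = new LinkedHashMap<>();

    private HttpRequestHead(String method, String target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    /** Parses "METHOD target VERSION" followed by "Name: value" lines. */
    static HttpRequestHead parse(String raw) {
        String[] lines = raw.split("\r?\n");
        String[] parts = lines[0].trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + lines[0]);
        }
        HttpRequestHead head = new HttpRequestHead(parts[0], parts[1], parts[2]);
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Header names are case-insensitive, so they are stored in lower case.
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                head.headers.put(name, line.substring(colon + 1).trim());
            }
        }
        return head;
    }
}
```

For a request such as "GET /faculty.html HTTP/1.0" followed by a "User-agent: tesla" line, the sketch yields the method GET, the target /faculty.html, the version HTTP/1.0, and one header stored under the key user-agent.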

The first version of HTTP, referred to as HTTP/0.9, supported only the "GET" method when requesting. The first version was sufficient for retrieving documents, but it provided no authentication or access control features other than those based on the IP address and the DNS host and domain names of the requesting client. The HTTP/0.9 response contained only the requested document, with no additional information.

The HTTP/1.0 protocol is documented in the Informational RFC 1945 [24], which introduced an extended format for requests and responses, allowing more data to be passed in both directions. After the actual request, a set of header fields follows to provide additional information, such as authentication credentials, to be passed to the server. The HTTP header section is similar to Multipurpose Internet Mail Extensions (MIME).

Correspondingly, the response also includes a status line and its own header section in addition to the document. The header section in the response may contain information such as the type of the document and its length.

Some major improvements that the HTTP/1.1 protocol makes are [XI]:

• Persistent connection: HTTP/1.1 allows connections to remain open over several requests, for the server implicitly knows its hostname, port number, and the protocol it uses.

• Request pipelining: Used in conjunction with persistent connections, request pipelining reduces latency between requests and responses and delivers better perceived performance.

• Cache control: Cache control was one of the biggest missing features in HTTP/1.0. HTTP/1.1 introduces a variety of directives that can be used to control caching on proxies and clients.

• Formalized validation model (conditional requests): HTTP/1.0 only supported conditional GET features to perform up-to-date checks. HTTP/1.1 formalizes the HTTP validation model and provides validators, instead of just the last modification date and time used by HTTP/1.0.

• Content variants: HTTP/1.1 provides the basic utilities for associating multiple representations of a resource under a single URL, which is very useful when providing one resource in multiple languages or different formats.

• Protocol tracing: HTTP/1.1 specifies a new request method, TRACE, which is useful in debugging proxy chains (more than one proxy chained together between client and origin server).

2.3 HTTP Proxy Servers

The proxy server defined in HTTP/1.0 is an intermediary program that acts both as a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them, with possible translation, to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it.

When HTTP/1.0 is used to communicate between client and proxy servers, as well as between proxies, it is very different from that used between the client and the origin server. For a request made through a proxy, the requested URL is used in full form, including the protocol prefix, hostname, and the optional port number, while they are omitted when the request is sent directly to the origin server from the client. For example, a request for the URL http://www.comm.utoronto.ca/faculty.html from a client to a proxy would be like:


    GET http://www.comm.utoronto.ca/faculty.html HTTP/1.0
    User-agent: tesla
    Accept: text/html, image/gif, image/jpeg

But when forwarded to the origin server by the proxy, the request is rewritten to include only the URL path part:

    GET /faculty.html HTTP/1.0
    User-agent: tesla
    Accept: text/html, image/gif, image/jpeg
    Forwarded: by http://myproxy.comm.utoronto.com:8080

The "Forwarded:" header is added to indicate that the request received was passed on by a proxy server. We note that in HTTP/1.1 the header Via: is used for the same purpose.

We also note that it is actually not desirable for the request to be written in the short form, because of the quick development of web technology. For example, many companies and individuals might share their web server addresses with others, since they cannot afford, or do not want, to spend money on dedicated web server hardware. Two solutions can cope with this problem: the Host: header (to distinguish among different web sites) and the full URL in the request.
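The rewriting step described above can be sketched in Java with the standard java.net.URI class. The class name ProxyRequestRewriter and its two methods are hypothetical and only illustrate the transformation: the first produces the short request line sent to the origin server, the second the value a Host: header would carry so that the hostname is not lost.

```java
import java.net.URI;

/** Hypothetical sketch of how a proxy shortens a full-URL request line. */
final class ProxyRequestRewriter {

    /** Splits "METHOD target VERSION" and rejects anything else. */
    private static String[] split(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }
        return parts;
    }

    /** "GET http://host/path HTTP/1.0" becomes "GET /path HTTP/1.0". */
    static String toOriginForm(String requestLine) {
        String[] parts = split(requestLine);
        URI uri = URI.create(parts[1]);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // a URL without a path refers to the root resource
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return parts[0] + " " + path + " " + parts[2];
    }

    /** Value for a Host: header, keeping the port only when one was given. */
    static String hostHeaderValue(String requestLine) {
        URI uri = URI.create(split(requestLine)[1]);
        return uri.getPort() == -1 ? uri.getHost() : uri.getHost() + ":" + uri.getPort();
    }
}
```

Applied to the example request above, toOriginForm drops the protocol prefix and hostname from the request line, while hostHeaderValue preserves the hostname that the short form would otherwise discard.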

2.4 TCP sockets

The innovation of sockets allows the programmer to treat a network connection as another stream that bytes can be written onto or read from. The socket is the programming interface between the upper three layers (the "application") and the transport layer.
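As a small illustration of the stream view of a connection, the following Java sketch opens a TCP connection to an echo thread on the local machine, writes one line to the socket's output stream, and reads the reply from its input stream. The class name LoopbackEcho and the echo behaviour are assumptions made for the example; the sketch uses only standard java.net and java.io classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: a TCP connection used as a pair of byte streams. */
final class LoopbackEcho {

    /** Wraps the socket's input stream so that whole lines can be read. */
    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
    }

    /** Wraps the socket's output stream; autoflush sends each println at once. */
    private static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.US_ASCII), true);
    }

    /** Sends one line over a loopback connection and returns the echoed line. */
    static String roundTrip(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = reader(peer);
                     PrintWriter out = writer(peer)) {
                    out.println(in.readLine()); // send back what was received
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(message);
                return in.readLine();
            } finally {
                echo.join(2000); // do not wait forever if the echo side failed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Neither side of the sketch deals with packets, sequence numbers or retransmission; both simply read and write streams, which is the separation of concerns the socket interface is meant to provide.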

In the Internet protocol suite, the network layer protocol is IPv4 or IPv6. The transport layer protocols that can be chosen are TCP and UDP, corresponding to TCP sockets and UDP sockets. Note that there is a gap between UDP and TCP in Figure 2.1; this gap indicates that it is possible for an application to bypass the transport layer and use IPv4 or IPv6 directly. This direct access corresponds to a raw socket [26].

[Figure 2.1 shows the seven layers of the OSI model (Application, Presentation, Session, Transport, Network, Datalink, Physical) beside the Internet protocol suite (Application; UDP and TCP; IPv4 and IPv6; device driver and hardware). The upper layers are marked as application details and the lower layers as communication details.]

Figure 2.1: Layer correspondence between OSI and IP

The upper three layers of the OSI model are merged into a single layer called the application in the Internet protocol suite. This can be the web client (browser), a Telnet client, the web server, the FTP server, or any other application, e.g., in this thesis, the transcoding proxy server. With the Internet protocols there is rarely any distinction between the upper three layers of the OSI model.

Data is transmitted across the Internet in packets of finite size called datagrams. Each datagram contains a header and a payload. The header contains the address and port the packet is going to, the address and port the packet comes from, and various other information used to ensure reliable transmission. The payload contains the data itself. Packets can be lost or corrupted in transit and need to be retransmitted, or packets may arrive out of order. Therefore keeping track of this (splitting the data into packets, generating headers, parsing the headers of incoming packets, keeping track of the sequence of the packets, etc.) is a lot of work and requires a lot of intricate software [27].

Thus there are two reasons for the socket design. First, the upper three layers handle all the details of the application (FTP, Telnet, or HTTP, for example) and know little about the communication details. The lower four layers, on the other hand, know little about the application but handle all the communication details: sending data, waiting for acknowledgement, sequencing data that arrives out of order, calculating and verifying checksums, and so on. The second reason is that the upper three layers often form what is called a user process, while the lower four layers are normally provided as part of the operating system kernel. Many contemporary operating systems provide this separation, for example, Unix, Windows, and the Macintosh.

2.5 Glossary

• Resource: A file, HTML document, image, applet, or any other object addressable by a single URL.

• URL: Uniform Resource Locator, a World Wide Web resource address, for example, http://www.comm.utoronto.ca/faculty.html. It contains three parts: the protocol (http), the DNS (domain name system) name of the machine on which the web page is located (www.comm.utoronto.ca), and the file name (faculty.html).

• Client: The client side of a request-response transaction: the client side makes the request, and the server side responds. The client may be the web navigation software program, such as Netscape Navigator or Internet Explorer. However, a proxy server acting as a client may also be referred to as a "client".

• Server: A program accepting and servicing requests from clients; a server may be an origin server (content provider) or a proxy server.

• Proxy Server: An intermediary server that accepts requests from clients and forwards them to other proxy servers or to the origin server, or services the request from its own cache. A proxy acts both as a server and as a client: the proxy is a server to the client connecting to it and a client to the servers that it connects to.

• Host: A physical computer, running client, server, proxy, or other software.

2.6 Summary

This Chapter gives a basic description of what a proxy server does and why the proxy server, instead of the original server or the client itself, is used to do transcoding. For the purpose of transcoding in the application layer, the HTTP protocol and the definition of the HTTP proxy server are also analyzed. TCP sockets are used for the client, the proxy, and the server to communicate at the level of the application layer.

Chapter 3

System Design of Proposed

Transcoding Proxy Server

In this Chapter, we present the design of the proposed transcoding proxy server. The design of the proxy server system requires both network programming and the design of web page transcoding. The content includes:

• Design Fundamentals

• System Design for Proposed Proxy System

• Main Flow Chart

• Single File Process Thread

• Web Page Process Thread

3.1 Design Fundamentals

The "Transcoding Proxy Server" is in the application layer of the Internet protocol suite [26], as indicated in Figure (2.1).

The "Transcoding Proxy Server" uses TCP sockets to write robust client and server programs that are integrated in the proxy. The reason is basically that the service provided by TCP [28] is connection-oriented. This means that TCP provides connections between clients and servers. A TCP client establishes a connection with a given server, exchanges data with the server across the connection, and then terminates the connection. Besides:

• TCP provides reliability. When TCP sends data to the other end, it requires an acknowledgement in return. If no acknowledgement is returned, TCP automatically retransmits the data and waits a longer amount of time. This retransmission will repeat until the acknowledgement is received or TCP decides to give up. TCP contains algorithms to estimate the round-trip time (RTT) between a client and server dynamically, so that it knows how long to wait for the acknowledgement.

• TCP sequences the data by associating a sequence number with every byte that it sends.

• TCP provides flow control. TCP always tells its peer exactly how many bytes of data it is willing to accept. This is called the advertised window. At any time, the window is the amount of room available in the receiving buffer, guaranteeing that the sender cannot overflow the receiver's buffer.

• A TCP connection is full-duplex. This means that an application can send and receive data in both directions on a given connection at any time. TCP keeps track of state information such as sequence numbers and window sizes for each direction of data flow: both sending and receiving.

As the interface between the server and the client, the proxy behaves as both server and client. That is, from the point of view of the server, the proxy is acting like a client; and from the point of view of the client, the proxy is acting like a server. On the other hand, the proxy also performs certain types of data processing, for example, web page transcoding. The type of data processing depends on which type of service the client user wants. In other words, the proxy will provide multiple services on different ports, and it is the client's choice to decide which service it needs. To be able to accept new services created by users, the proxy is designed to have extensible services at run time, which means that the proxy does not have to know any particular service before it can provide this service. Furthermore, the proxy is able to handle many users at the same time for a certain service, and the proxy is thus designed to utilize multiple threads for processing the requests.

We also design the communication protocol between the client and the proxy so that it will always be HTTP. No matter which particular protocol the client really wants to use, the HTTP request sent by the client via the proxy will wrap the information of the non-HTTP URL of the server inside the HTTP request. For example, an FTP URL, ftp://somesite/somefile, is requested from the proxy server as:

    GET ftp://somesite/somefile HTTP/1.0
    User-agent: tesla
    Accept: text/html, image/gif, image/jpeg

Then it is the proxy server's responsibility to parse the request and contact the FTP server in the corresponding protocol through the protocol handler module.
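One way to picture this dispatch step is the Java sketch below, which reads the protocol from the wrapped URL and hands the connection details to whichever handler was registered for it. The names ProtocolDispatch, Handler, register and dispatch are hypothetical and are not taken from the thesis implementation; handlers here simply return a description instead of opening a real connection.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: choose a protocol handler from the URL in an HTTP request. */
final class ProtocolDispatch {

    /** A handler knows how to reach an origin server with one protocol. */
    interface Handler {
        String connect(String host, int port, String path);
    }

    private final Map<String, Handler> handlers = new HashMap<>();
    private final Map<String, Integer> defaultPorts = new HashMap<>();

    /** Handlers can be added while the proxy is running. */
    void register(String scheme, int defaultPort, Handler handler) {
        String key = scheme.toLowerCase(Locale.ROOT);
        handlers.put(key, handler);
        defaultPorts.put(key, defaultPort);
    }

    /** Reads the target URL from "METHOD url VERSION" and invokes its handler. */
    String dispatch(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }
        URI target = URI.create(parts[1]);
        String scheme = target.getScheme() == null
                ? "" : target.getScheme().toLowerCase(Locale.ROOT);
        Handler handler = handlers.get(scheme);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for: " + scheme);
        }
        // Fall back to the protocol's default port when the URL names none.
        int port = target.getPort() != -1 ? target.getPort() : defaultPorts.get(scheme);
        return handler.connect(target.getHost(), port, target.getRawPath());
    }
}
```

Because handlers are looked up by name at request time, a new protocol only needs a call to register, which mirrors the run-time extensibility the design asks for.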

3.2 System Design for Proposed Proxy System

We design the new transcoding proxy server as indicated in Figure (3.1). The design of the new system includes two major parts: the network programming part and the web page transcoding part. To make sure that the client, the proxy server and the server are able to communicate with each other, network programming is required, and this basically includes the "Virtual Server" module and the "Virtual Client" module. To be capable of transcoding, the "Request Processing" module and the "Data Processing" module (based on that proposed in [6]) are designed.

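[Figure 3.1 shows the transcoding proxy server between the client and the HTTP server. The labels that survive are Virtual Client and Request Processing inside the proxy, a request arriving from the client, and a transcoded web page returned to the client.]

Figure 3.1: Proxy server system design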

The modules inside Figure (3.1) are designed to perform the following functions:

Virtual Server

Virtual Server listens to some particular ports (corresponding to different services) and receives requests from different clients. It then creates a socket to cope with each request.

Request Processing

Request Processing reads the request from the socket created by Virtual Server and parses the request. It has two functions:

• Request Parsing: Request parsing finds out the URL of the original request, including the protocol, host and the file name. Then it will pass this information to the Protocol Handler module of Virtual Client to make a connection to the original server.

• Preference Parsing: The user will send a special line indicating its preference in the HTTP request header. This preference will be stored as the reference for this connection and used by the Transcoding Policy Manager module of Data Processing to decide the transcoding policy for images in the web document.
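The format of the special preference line is not specified at this point, so the following Java sketch assumes one purely for illustration: a header named X-Transcode-Preference carrying "key=value" pairs separated by semicolons. The header name, the keys, and the class PreferenceParser are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: extract user preferences from one request header line. */
final class PreferenceParser {

    /** Assumed header name; the real proxy may use a different line format. */
    static final String HEADER = "x-transcode-preference";

    /** Returns the key/value pairs of the preference header, if present. */
    static Map<String, String> parse(String rawHeaders) {
        Map<String, String> preferences = new LinkedHashMap<>();
        for (String line : rawHeaders.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (!name.equals(HEADER)) {
                continue; // some other header
            }
            for (String pair : line.substring(colon + 1).split(";")) {
                int equals = pair.indexOf('=');
                if (equals > 0) {
                    preferences.put(pair.substring(0, equals).trim(),
                                    pair.substring(equals + 1).trim());
                }
            }
        }
        return preferences;
    }
}
```

The resulting map could then be kept with the connection and consulted when the transcoding policy is chosen, for example to look up a maximum download time or a request to filter advertisements.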

Virtual Client

Virtual Client gets the parsed request and sends a corresponding connection request to the original server. Then it receives the response and saves it locally at the proxy. Virtual Client has the following functions:

• Protocol Handler: The protocol handler makes the proxy capable of handling different protocols. Thus it allows the user to use any existing protocol, or even define its own protocol, by incorporating the corresponding protocol class at run time, which makes the proxy both flexible and extensible.

• Client Socket: While server sockets always wait for a connection, client sockets actually initiate connections. Each socket has two values to identify the connection endpoint, an IP address and a port number. After the protocol handler finds out the protocol that the client wishes to use to connect to the server, it opens this client socket with the host address and port number and starts the connection with the corresponding protocol.

• HTML Parser: The HTML parser is used to find out the linked content of the web document, for example, images, text, active classes, etc. Whenever a new link to some content is found, it will require the MultiConnection Manager module to create a new connection for downloading that content.

• MultiConnection Manager: The MultiConnection manager gets the link and the host, and opens a new thread to handle the download of this link. The thread will consequently activate the protocol handler to create a new client socket. The MultiConnection manager monitors the downloading of each web content and then invokes the Content Handler for handling different content of the web document.

• Content Handler: Different content of the web document will be treated differently. Basically, the text/html part of the document will be sent to the client without modification, and the active class part of the document will be filtered. The images in the document will be transcoded based on their content and the transcoding policy given by the Transcoding Policy Manager module in Data Processing. Furthermore, for images of different formats, for example, GIF, a corresponding decoder should be used to decode in the content handler after the image is downloaded in the MultiConnection Manager module. The decoded image can then be sent for content analyzing and transcoding.
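The cooperation between the HTML parser and the MultiConnection manager can be sketched in Java as below. The sketch is an assumption-laden simplification: it finds only img tags with a quoted src attribute by regular expression rather than with a full HTML parser, it uses a small thread pool in place of one thread per link, and the download itself is passed in as a function so that no real network access is needed. The class name LinkedContentFetcher and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find image links in a page and download them in parallel. */
final class LinkedContentFetcher {

    // Matches <img ... src="..."> or <img ... src='...'>, ignoring case.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the src values of all image tags, in document order. */
    static List<String> imageLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    /** Downloads every link on its own worker thread and waits for all of them. */
    static Map<String, byte[]> fetchAll(List<String> links, Function<String, byte[]> fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Math.min(links.size(), 4)));
        try {
            Map<String, Future<byte[]>> pending = new LinkedHashMap<>();
            for (String link : links) {
                Callable<byte[]> task = () -> fetcher.apply(link);
                pending.put(link, pool.submit(task));
            }
            Map<String, byte[]> finished = new LinkedHashMap<>();
            for (Map.Entry<String, Future<byte[]>> entry : pending.entrySet()) {
                try {
                    finished.put(entry.getKey(), entry.getValue().get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return finished;
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping the downloaded content keyed by its link gives a later stage, such as a content handler, a direct way to look up each image when the page is processed as a whole.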

Data Processing

Data Processing processes the response saved by Virtual Client. It can perform single file transcoding as well as web page transcoding. Data Processing has the following functions:

• Transcoding Policy Manager: The transcoding policy manager is responsible for deciding the transcoding policy for each image in the web page adaptively. The decision is based on the user's preference as well as the image's content. The policy sets two parameters for each image: the down scaling ratio (R) and the color reduction (in bits, C). If the scaling ratio becomes zero, it actually means that the image is filtered out.

• Content Analyzer: The content analyzer classifies the content of an image by its functionality, for example, whether the image is for decoration, for clicking, for information, for advertisement, etc.

• Image Transcoding: Image transcoding actually has two parts. The first part is to transcode the image according to the corresponding transcoding parameters for this type of image. The second part will invoke an encoder to encode the image back to its original format; for example, for GIF format, a GIF encoder will be called.

• HTML Writer: The HTML writer will compose the web document again, after the unwanted or unimportant content has been filtered and the wanted images transcoded. The resulting HTML file will also have all its links rewritten as full URLs, instead of relative URLs, pointing to the transcoded images at the proxy server.
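The effect of the two policy parameters on a decoded image can be sketched with the standard java.awt.image.BufferedImage class. The interpretation used here is an assumption: R is taken as the factor by which width and height are multiplied (so R = 0 removes the image), and C is taken as the number of bits kept per color channel. The class name ImageTranscoder and the nearest-neighbor sampling are likewise illustrative choices, and re-encoding to GIF or JPEG is left out.

```java
import java.awt.image.BufferedImage;

/** Hypothetical sketch: apply a scaling ratio and a color-bit reduction to an image. */
final class ImageTranscoder {

    /** Clears the low-order bits of one 8-bit color channel. */
    private static int quantize(int channel, int bitsKept) {
        int shift = 8 - bitsKept;
        return (channel >> shift) << shift;
    }

    /**
     * Scales the image by the given ratio and keeps bitsKept bits per channel.
     * Returns null when the ratio is zero, meaning the image is filtered out.
     */
    static BufferedImage transcode(BufferedImage source, double ratio, int bitsKept) {
        if (ratio < 0.0 || ratio > 1.0 || bitsKept < 1 || bitsKept > 8) {
            throw new IllegalArgumentException("ratio must be in [0,1], bitsKept in [1,8]");
        }
        if (ratio == 0.0) {
            return null; // R = 0: the image is dropped from the page
        }
        int width = Math.max(1, (int) Math.round(source.getWidth() * ratio));
        int height = Math.max(1, (int) Math.round(source.getHeight() * ratio));
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            // Nearest-neighbor sampling: pick the source pixel this one maps back to.
            int sourceY = Math.min(source.getHeight() - 1, (int) (y / ratio));
            for (int x = 0; x < width; x++) {
                int sourceX = Math.min(source.getWidth() - 1, (int) (x / ratio));
                int rgb = source.getRGB(sourceX, sourceY);
                int red = quantize((rgb >> 16) & 0xFF, bitsKept);
                int green = quantize((rgb >> 8) & 0xFF, bitsKept);
                int blue = quantize(rgb & 0xFF, bitsKept);
                result.setRGB(x, y, (red << 16) | (green << 8) | blue);
            }
        }
        return result;
    }
}
```

A policy manager in this picture would only have to choose the pair (R, C) for each image; the transcoding step itself stays the same for every image type.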

3.3 Main Flow Chart of Proposed Proxy Software

System

The main Row chart of the prosy software system is iridicated in Figure (3.2).

The proposed proxy system has a nested multithread structure. Each time the Virtual Server of the proxy identifies a service, it first checks whether the communication port that this service is going to use is already occupied by some other service. If so, the service cannot be added to the requested port. If no other service uses the port, in other words the port is available, the Virtual Server creates a thread that has a server socket to listen to that port. This is the first layer of threads, used for running multiple services. Whenever a request for a certain type of service is "heard", the corresponding server socket creates a service thread. Each service thread handles the communication between the proxy and the client and the communication between the proxy and the server. It also handles the processing of the web content (or web page). When multiple users send multiple requests for a service, multiple service threads, the second layer of threads, run at the same time, as indicated in Figure (3.2). Inside each service thread, a third layer of threads may be used. For example, the MultiConnection Manager creates multiple threads to download the multiple content items of a web page inside a proxy service thread.

[Figure 3.2: Main flowchart of the software proxy server system. Flow: Proxy Starts; Proxy Initialization; HTTP Proxy Service (looping) and Service #2 ...; each service creates a Proxy Service Thread for Client #1, Client #2, ..., Client #n (multiple threads for multiple clients).]
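The port check and the first two thread layers described above can be sketched as follows. This is a simplified Python model, not the thesis code: real listening sockets are left out so that only the control flow is shown, and the class name `VirtualServer` with its methods `add_service` and `serve` is hypothetical.

```python
import threading


class VirtualServer:
    """Toy model of the port check and of one service thread per request."""

    def __init__(self):
        self._services = {}            # port -> service name
        self._lock = threading.Lock()

    def add_service(self, name, port):
        """Register a service; refuse it if the port is already occupied."""
        with self._lock:
            if port in self._services:
                return False
            self._services[port] = name
            return True

    def serve(self, port, requests, handler):
        """Second layer: run one service thread per client request."""
        if port not in self._services:
            raise KeyError(port)
        results = [None] * len(requests)

        def service_thread(index, request):
            # Each thread writes only to its own slot, so no lock is needed.
            results[index] = handler(request)

        threads = [threading.Thread(target=service_thread, args=(i, r))
                   for i, r in enumerate(requests)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
```

In a real proxy, `serve` would instead loop on a server socket's accept call and start a thread for each accepted connection; the third layer of threads would be started from inside `handler`.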

As a server, the proxy can offer any service that a standard server has. For the purpose of transcoding, two designated classes of service are provided by the proposed transcoding proxy server. Each of them can carry out transcoding; the difference is whether it is for a single file (same as previous research) or for the entire web page (new concept). The first service, "single file process", is designed for the proxy to transcode one content item within one connection. The requested file may be a text/html file or an image file with an extension of .jpg, .jpeg, or .gif. The second service, "web page process", is designed for the proxy to transcode the entire web page within one connection.

Each service can be put into the nested multithread flowchart, as indicated in Figure (3.2), as a thread: the proxy service thread. For "single file process", the corresponding proxy service thread is the single file process thread; for "web page process", the corresponding proxy service thread is the web page process thread. The proxy service thread controls the processing of both the request from the mobile client and the response from the server. Thus it activates and monitors Request Processing, Virtual Client, and Data Processing. It is also responsible for detecting any error and sending the corresponding error message to the client during the entire process.

3.4 Single File Process Thread

The "single file process" thread is the thread that transcodes a single file within each connection. It is equivalent to the service that previous proxy systems provide. The single file process thread goes through the following steps to perform transcoding:

1. Request Processing: parsing the HTTP request

2. Virtual Client: downloading the single content

3. Data Processing: transcoding the single content

4. Single file process thread: sending the transcoded content

3.4.1 Request Processing

Request Processing reads the MIME header of the request sent by the client and parses the URL of the HTTP request made via the proxy. Such a request has the form: GET someprotocol://somesite/somefile HTTP/1.0. Request Processing then extracts the original URL, which is someprotocol://somesite/somefile, and sends this information to the Virtual Client.
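Parsing the proxy request line can be sketched as below. This is an illustrative Python sketch, not the thesis implementation; the function name `parse_proxy_request` and the returned dictionary layout are hypothetical, and treating every non-image file as text/html is an assumption based on the two file kinds named in Section 3.3.

```python
import posixpath
from urllib.parse import urlsplit

# Image extensions handled by the "single file process" service.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".gif"}


def parse_proxy_request(request_line):
    """Split 'GET someprotocol://somesite/somefile HTTP/1.0' into its parts."""
    method, url, version = request_line.split()
    parts = urlsplit(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    # Assumption of this sketch: anything that is not an image is text/html.
    kind = "image" if extension in IMAGE_EXTENSIONS else "text/html"
    return {
        "method": method,
        "url": url,
        "host": parts.netloc,
        "filename": posixpath.basename(parts.path),
        "kind": kind,
        "version": version,
    }
```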

3.4.2 Virtual Client

From Request Processing, the type of the file requested by the client is known. If the file is an image, transcoding is needed, and the image has to be saved at the proxy first. Thus, after the Virtual Client opens a connection with the original server, it uses a client socket (at the proxy) to download the image and store it locally at the proxy server.

If the file is text/HTML, no transcoding is needed for the moment (for simplicity, we do not consider here the case in which transcoding has to filter out unwanted content inside the HTML file). Thus the proxy server jumps directly to the last step and sends the text/HTML as is. It opens a pair of threads inside the proxy service thread: one communicates with the server, while the other communicates with the client. These two threads return when one of them finishes the data exchange and notifies the other.

[Figure 3.3: Single file process thread. Flow: Thread Initialization; Request Processing (parse the URL, find and save the filename, find and save the preference, find the file's content type); Virtual Client (create URLConnection by protocol handler; if the URLConnection is not valid or the file cannot be found, send the client error information, close the connection, and finish the service); for text/HTML, send the MIME header, create two threads, one communicating with the server and one with the client, and wait for the threads to end; for an image, save the image to the proxy, then Data Processing (decide transcoding policy, image content analysis, image transcoding), send the MIME header, and send the transcoded image to the client; finally close the connections with client and server and finish the service.]

3.4.3 Data Processing

A simple method of setting the transcoding parameters and transcoding is used for the single downloaded image, just as in previous systems. The next step, sending the transcoded image, is carried out by the single file process thread itself after it finds that the image has been successfully transcoded.

3.5 Web Page Process Thread

The "web page process" thread is the thread that transcodes the entire web page within each connection, as indicated in Figures (3.4) and (3.5). Note that when a mobile client requests a single image through the "web page process" service, the service behaves the same as the "single file process" service, because both go through exactly the same procedure. This means that "web page process" can completely substitute for "single file process" even when only a single image file is required.

In order to perform web page transcoding, the proxy server needs a thorough understanding of the web page and total control over downloading and processing it. This requirement shapes the design of the Request Processing, Virtual Client, and Data Processing modules and of the web page process thread itself. The web page process thread goes through the following steps to perform web page transcoding:

1. Request Processing: parsing the HTTP request and user preference

2. Virtual Client: downloading the HTML file

3. Virtual Client: parsing the HTML file

4. Virtual Client: downloading the multiple content items linked by the HTML file

5. Data Processing: analyzing image content

6. Data Processing: evaluating transcoding parameters adaptively

7. Data Processing: transcoding all images with the corresponding transcoding parameters

8. Data Processing: reconstructing the new web page

9. Web page process thread: sending the transcoded web page to the client

3.5.1 Request Processing

In order to transcode the web page, the proxy needs to know how the client wants the web page transcoded. Thus Request Processing needs to parse not only the HTTP request but also the preference sent by the client. The detailed definition of user preference is given in Chapter 5, the implementation part.

3.5.2 Virtual Client

In order to transcode the web page, the Virtual Client has to carry out several tasks.

The first step: the Virtual Client makes a connection to the original server, using the URL that was parsed by Request Processing, as indicated in Figure (3.4).

The second step: the Virtual Client parses the HTML file (the web page), as indicated in the starting part of Figure (3.5). The original web page may contain URL tags that link to content such as images, active content, sound files, etc. The HTML parser detects all these tags and finds the corresponding file types. The tags that link to active content and sound files are not saved for later downloading by the MultiConnection Manager, since most mobile users would not want this content when browsing the Internet. All the tags for images are saved for later downloading by the MultiConnection Manager.

The third step: the Virtual Client downloads the multiple content items using its MultiConnection Manager. The MultiConnection Manager gets all the URL links saved by the HTML parser, changes all relative URLs into absolute URLs, and checks whether each URL is valid. If a URL is valid, the MultiConnection Manager creates a sub-thread to handle it.
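The second and third steps (collecting image links, skipping active content, and turning relative URLs into absolute ones) can be sketched with Python's standard HTML parser. This is an illustration only, not the thesis parser; the class name `ImageLinkCollector`, the list of skipped tags, and the validity rule in `is_valid` are assumptions made for the sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImageLinkCollector(HTMLParser):
    """Collect absolute image URLs and note the tags that are skipped."""

    # Assumed examples of tags that carry active content or sound.
    SKIPPED = ("applet", "embed", "bgsound")

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.image_urls = []      # saved for later downloading
        self.skipped_tags = []    # active content and sound, not downloaded

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.image_urls.append(urljoin(self.page_url, attributes["src"]))
        elif tag == "body" and attributes.get("background"):
            self.image_urls.append(
                urljoin(self.page_url, attributes["background"]))
        elif tag in self.SKIPPED:
            self.skipped_tags.append(tag)


def is_valid(url):
    """Assumed validity rule: an HTTP(S) URL that names a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```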

[Figure 3.4: Web page process service thread. Flow: Thread Initialization; Request Processing (parse the URL, find and save the filename, find and save the preference, find the file's content type); if the URL is not valid, send the client error information, close the connection with the client, and finish the service; Virtual Client (create URLConnection by protocol handler; if the file cannot be found, send the client error information); for a single image, save the image to the proxy, then Data Processing (decide transcoding policy, image content analysis, image transcoding), send the MIME header, and send the transcoded image to the client; for a web page, continue with web page processing (Figure 3.5); finally close the connections with client and server and finish the service.]

Inside each downloading sub-thread, the protocol handler is invoked first to create a connection to the corresponding server. Once the connection is set up, the image file requested by the URL is downloaded and saved. After downloading is finished, the sub-thread returns.

[Figure 3.5: Web page processing. Flow: Virtual Client (save the web page as an HTML file; parse the HTML file, find tags with URL links, save tags and URL links, find out the image types); MultiConnection Manager (create a downloading thread for each valid URL, each URLConnection created by the protocol handler; each downloading thread saves its image file; wait for all threads to return; save the downloaded images); Content handler (add pre-processing functions for saved images; return the downloaded images); send the MIME header and send the transcoded web page to the client; on error, send the client error information, close the connection with the client, and finish the service.]

The MultiConnection Manager monitors all the sub-threads and adds a time limit for some of them if necessary. It waits for all sub-threads to end and checks whether each file has been saved correctly. If an image file is saved correctly, the manager saves the new filename as a string. If not, it stores an empty string to indicate that the file is not available.
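The behaviour of the MultiConnection Manager (one sub-thread per URL, a time limit, and an empty string for an unavailable file) can be sketched as below. This is a Python illustration under stated assumptions, not the thesis code: `download_all` is a hypothetical name, and the actual download is delegated to a caller-supplied `fetch` function that returns the saved filename or raises an exception.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, time_limit=5.0, max_workers=8):
    """Download every URL in its own worker thread.

    Returns one entry per URL: the saved filename, or "" when the download
    failed or did not finish within `time_limit` seconds of being waited on.
    """
    if not urls:
        return []
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(fetch, url) for url in urls]
    filenames = []
    for future in futures:
        try:
            filenames.append(future.result(timeout=time_limit))
        except Exception:
            # Timeout or download error: mark the file as not available.
            filenames.append("")
    # Do not block on stragglers; they finish (or fail) in the background.
    pool.shutdown(wait=False)
    return filenames
```

The time limit here applies to each wait in turn, which is a simplification; a stricter manager would track a single deadline for the whole page.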

3.5.3 Data Processing

Data Processing performs the web page transcoding. Again, it has to carry out several tasks.

The first step: Data Processing finds out the content of all the images that have been successfully downloaded. The content, along with the user preference, is used as the basic criterion for assigning a different importance to each image.

The second step: Data Processing evaluates the transcoding parameters adaptively. Different images are given different importance, and optimization of the overall downloading time for the entire web page is used as the criterion to decide how much each image should be transcoded.

The third step: Data Processing transcodes all the images with the corresponding evaluated transcoding parameters.

The fourth step: Data Processing uses all the transcoded images and the original HTML to reconstruct a new HTML file, with links modified to point to the transcoded images at the proxy server.

Details of Data Processing are given in Chapter 4, web page transcoding. After the transcoded web page is obtained, the next step, sending the transcoded web page, is carried out by the web page process thread itself.

3.6 Summary

This Chapter presents the system design of the proposed transcoding proxy server. The system has not only the network-related features of multiple extensible services, multi-user support, protocol handler, and content handler, but also the transcoding-related features of web page transcoding, adaptive transcoding policy, content analysis, HTML parsing, active content filtering, and HTML writer. This Chapter also introduces the two services provided by the proposed transcoding proxy server. It is important to note that both services are carried out in threads, which means the proxy server is capable of serving multiple users at the same time.

Chapter 4

Web Page Transcoding

In this Chapter, we present the details of how web page transcoding is carried out in the Data Processing module. This includes how to analyze the individual image content, how to decide the transcoding policy for the web page, how to transcode, and how to put all the transcoded elements back together into a new transcoded web page. The content of this Chapter includes:

• Data Processing Flow Chart

• Image Content Analysis

• Adaptive Transcoding Policy

• Transcoding and Reconstruction

4.1 Data Processing Flow Chart

Usually a web page contains many links to different content. For example, a web page may include text, images, applets, sound files, JavaScript, etc. For web page transcoding, the entire page is transcoded as one object, with different content treated differently. Active content such as Java applets is filtered out. Audio files may be filtered out or kept, as the user requires. Text and HTML content pass through directly. Images are classified by their purposes. The images that the user does not want to see are filtered out after their content is identified. Images allowed to pass through are processed by an adaptive transcoding policy with different importance, decided by image content analysis and the user's preference. With the transcoded images and the text/html content, a new web page is constructed and sent to the mobile user.

The main flow chart of the Data Processing module for web page transcoding is shown in Figure (4.1).

4.2 Image Content Analysis

Image content analysis determines what content an image has, so that a corresponding importance can be applied to it according to the user's preference. The content of images is classified into 8 groups according to the purpose of an image [2]. The 8 groups of images are:

• ADV: advertisement

• DEC: decoration, i.e., background image

• BUL: bullets, points, balls, dots

• RUL: rules, lines, separators

• MAP: maps, i.e., images with clicking focus

• INF: information, i.e., icons, logos

• L-INF: linked information, i.e., scaled images linked with original size images

• CON: content-related images, the most important image group for web browsing, i.e., photos, stock graphics

[Figure 4.1: Data Processing module for web page transcoding. Flow: find the content of each image until the content of all the images in the web page has been found; for each image, perform image decoding, scaling, color reducing, and encoding, and return the transcoded image filename, until all images in the web page are processed; replace the tags with the transcoded images and remove the tags whose content is filtered, including images, applets, etc.]

In order to decide which group a certain image belongs to, a step-by-step classification is designed. The first classification is carried out by the "HTML Parser" of the "Virtual Client" module. At this step, images are divided into 4 groups according to the type of HTML tag they have. A detailed description of these four types of tags is given in Chapter 5. The 4 types of tags are:

• BAK: background, i.e., <body background = . . . >

• ISMAP: ismap, i.e., <img src = . . . ismap>

• LIN: linked, i.e., <a href = . . . > <img src = . . . >

• INL: inline, i.e., <img src = . . . >
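The four tag types listed above can be recognized with a small parser, sketched here in Python. This is an illustration and not the thesis HTML Parser; the class name `TagTypeClassifier` is hypothetical, and giving ISMAP priority over LIN when an ismap image sits inside a link is an assumption of this sketch.

```python
from html.parser import HTMLParser


class TagTypeClassifier(HTMLParser):
    """Label each image URL with BAK, ISMAP, LIN, or INL."""

    def __init__(self):
        super().__init__()
        self.open_links = 0    # number of <a href=...> elements currently open
        self.images = []       # list of (url, tag type)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.open_links += 1
        elif tag == "body" and attributes.get("background"):
            self.images.append((attributes["background"], "BAK"))
        elif tag == "img" and attributes.get("src"):
            if "ismap" in attributes:
                kind = "ISMAP"          # assumed to take priority over LIN
            elif self.open_links > 0:
                kind = "LIN"            # image enclosed by a link
            else:
                kind = "INL"            # plain inline image
            self.images.append((attributes["src"], kind))

    def handle_endtag(self, tag):
        if tag == "a" and self.open_links > 0:
            self.open_links -= 1
```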

The second step is to find out some characteristics of an image: for example, whether the image is color or non-color, how many colors or gray levels it has, what size (bytes) and dimensions it has, and whether the image is a photo or graphics.

To find the color characteristics of an image, a different approach is used here instead of the complex and time-consuming mathematical formulas given in [?]. The new approach uses the default ColorModel [32] in Java AWT, which uses 8 bits per pixel each for red, green, and blue, along with another 8 bits for the alpha (transparency) level, to determine the color information of an image. Since an image can be constructed by giving Java AWT all its pixel values and the default ColorModel, our software first uses Java AWT to load the image from its string name, then gets each pixel value of the image to form a pixel array. After the default ColorModel representation of the image is obtained, it is used to find out whether the image is non-color or color, and the number of colors or gray levels the image has is counted.
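Counting colors and gray levels from such a pixel array can be sketched as follows. The sketch is in Python rather than Java and is not the thesis code. It assumes pixels packed as 0xAARRGGBB, which is the layout of the default Java AWT ColorModel; the function name `color_stats` and the luminance weights used to convert a color to a gray level are assumptions of this sketch.

```python
def color_stats(argb_pixels):
    """Summarize an array of 0xAARRGGBB pixel values."""
    colors = set()
    gray_levels = set()
    is_color = False
    for pixel in argb_pixels:
        red = (pixel >> 16) & 0xFF
        green = (pixel >> 8) & 0xFF
        blue = pixel & 0xFF
        colors.add((red, green, blue))
        # A pixel is gray only when all three channels are equal.
        if not (red == green == blue):
            is_color = True
        # Assumed gray conversion: integer luminance with 0.299/0.587/0.114.
        gray_levels.add((299 * red + 587 * green + 114 * blue) // 1000)
    return {
        "is_color": is_color,
        "num_colors": len(colors),
        "num_gray_levels": len(gray_levels),
    }
```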

To find whether an image is a photo or graphics, the intensity switch of each image is calculated ([2]). The intensity switch is the minimum of the horizontal switch and the vertical switch. Horizontal switch is defined as:

A test of the color characteristics and intensity switch of different images is shown in Table (4.1).

Image                   Tag type      Size (bytes)  No. of colors (gray levels*)  Width x height (pixels)  Intensity switch
adv-1472-yahoo.gif      LIN           2,009         177 (71)                      88 x 31                  0.3900
adv-1475-yahoo.gif      (illegible)   4,499         255 (128)                     105 x 60                 0.4883
adv-miadora-yahoo.gif   LIN           2,481         204 (104)                     (illegible)              (illegible)
adv-ntap-yahoo.gif      LIN           2,694         64 (57)                       105 x 60                 0.3271
adv-monster.gif         (illegible)   8,980         128 (76)                      468 x 60                 0.3988
inf-uoft.gif            INL           (illegible)   32 (32)                       120 x 132                0.3939
inf-billpay-yahoo.gif   INL           496           31 (13)                       27 x 25                  0.5822
inf-mail-yahoo.gif      INL           371           28 (25)                       25 x 20                  0.58
inf-pts-yahoo.gif       INL           457           32 (27)                       28 x 28                  0.4987
inf-bag-yahoo.gif       INL           2,456         7 (7)                         (illegible)              (illegible)
stock.gif               INL           14,285        16 (15)                       500 x 285                0.1545
guanyin.gif             INL           73,048        16 (14)                       403 x 550                0.5704
tuibei.jpg              INL           36,538        256 gray levels               308 x 502                0.4759

* Color images are changed into gray images.

Table 4.1: Comparison of image properties for different image types

The third step is to design an image content decision tree, as indicated in Figure (4.2), to find out the content of an image. Note that the distinguishing criterion in this new decision tree treats the image file size as the main concern, instead of whether the image is graphics or a photo ([2]). As an example of how image size is more important than the photo/graphics property, consider a big image showing a stock trend, which is graphics. If we treat the photo/graphics property as the more important issue, then since the image has a tag of "LIN", we will decide the image is "ADV". But when image size is the main concern, as long as the image has a size greater than 10 KB, it is considered "L-INF". Our decision tree does not try to find key words such as "texture", "map", "logo", "icon", etc. This is because, first, doing so is time consuming and time is essential for a transcoding proxy server; second, not every image has an explanatory key word near its image tag.

[Figure 4.2: Image content decision tree. The tree first parses the tags of the images into BAK, LIN, ISMAP, and INL. BAK leads to a "DEC" image and ISMAP to a "MAP" image. LIN and INL images are further split by image size (thresholds of 5 KB and 10 KB appear in the figure), by a graphics-or-photo test, and by user preference, leading to "ADV", "L-INF", "CON", "INF", "BUL", or "RUL" images. Legend: w: width; h: height; r: aspect ratio (width/height); #: threshold from [2].]

Thus the important concept underlying the improved decision tree is that, when an image has a relatively small size, it becomes less important to decide its exact content. This is because the objective of finding the images' content is to decide their "importance" to the entire web page. When an image has a smaller size relative to other images in the web page, it is considered less important. Even for the same images in the same web page, their "importance" can change from time to time according to the user's preference. For example, if the user really wants to download the entire web page fast, all the images other than "CON" can be put into one big group with the same "importance" level. In this case, it makes no difference whether an image belongs to "ADV" or "L-INF". Furthermore, if the time requirement from the client is very tight, the proxy can further decide that small "CON" images are "INF" and thus less important, so that only big "CON" images remain "CON" images.

4.3 Adaptive Transcoding Policy

The purpose of the new adaptive transcoding policy is to decide not only when to transcode, but also how. In fact, how to transcode affects the decision of when to transcode. The research work in [6] gives a prediction method based on fixed transcoding parameters, so no adaptation of the transcoding parameters is obtained. Therefore, our goal is to design a new adaptive transcoding policy that evaluates the transcoding parameters adaptively for each image. In other words, all the unknowns in the decision of when to transcode should be predicted with unknown transcoding parameters, which are eventually evaluated as how to transcode. Furthermore, images with different content are treated differently so that the entire web page is transcoded with optimization. The new transcoding policy also considers the quality of each transcoded image; for example, the image should be big enough to be recognized.

The proposed transcoding policy for web page transcoding finds how to transcode from the condition of when to transcode, which is based on time evaluation. The condition of when to transcode is whether, with transcoding, the client can download the web page faster than without transcoding. The inequality is represented as:

$$T_{tw} + \frac{S_w - \Delta S_w}{B_{cl}} < \frac{S_w}{B_{cl}} \qquad (4.2)$$

where $T_{tw}$ is the transcoding time delay of the web page, $S_w$ is the size of the web page, $\Delta S_w$ is the size reduction of the web page, and $B_{cl}$ is the mobile connection bandwidth.

In inequality (4.2), the size of the web page, $S_w$, is known. The mobile connection bandwidth is also known from preference[0] sent by the client (defined in Chapter 5). So two unknowns need to be evaluated: the transcoding time delay of the web page, $T_{tw}$, and the size reduction of the web page, $\Delta S_w$. The right side of the inequality is actually a time threshold: the time for the mobile client to download the web page at its original size. This threshold itself, $T$, can be adjusted by the user through preference[7], defined in Chapter 5. Preference[7] is the relative downloading time ratio ($\beta$), used to give the mobile user the flexibility of sacrificing resolution to make the downloading time shorter. This is done by a linear interpolation given by:

Thus inequality (4.2) becomes:

Inequality (4.4) is the basic criterion for deciding the transcoding policy. The goal now is to evaluate the transcoding parameters of scaling ratio and color reduction for each image component of the web page adaptively. The procedure to get these transcoding parameters is designed as three steps:

• Prediction of the transcoding time delay, $T_{tw}$

• Decisions about size reduction for each image from $\Delta S_w$

• Decisions about transcoding parameters for each image
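The basic "when to transcode" condition, before the user-adjustable threshold is applied, can be checked numerically with a short sketch. It follows the verbal definition in this section (transcoding delay plus the time to send the reduced page must be less than the time to send the original page). The function names are hypothetical, and the units (seconds, bytes, bytes per second) are an assumption of the sketch.

```python
def should_transcode(t_tw, s_w, delta_s_w, b_cl):
    """True when the page arrives sooner with transcoding than without.

    t_tw: transcoding time delay of the web page (seconds)
    s_w: size of the web page (bytes)
    delta_s_w: size reduction of the web page (bytes)
    b_cl: mobile connection bandwidth (bytes per second)
    """
    time_with = t_tw + (s_w - delta_s_w) / b_cl
    time_without = s_w / b_cl
    return time_with < time_without


def break_even_reduction(t_tw, b_cl):
    """Size reduction at which both times are equal: delta_s_w = t_tw * b_cl.

    Transcoding pays off only for reductions strictly above this value.
    """
    return t_tw * b_cl
```

For example, with a 100,000-byte page, a 2,400 bytes-per-second link, and a 2-second transcoding delay, the page must shrink by more than 4,800 bytes for transcoding to be worthwhile.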

4.3.1 Prediction of the Transcoding Time Delay

The time delay for web page transcoding, $T_{tw}$, can be represented in summation form as:

$$T_{tw} = \sum_{i=1}^{N} T_{timage(i)} \qquad (4.5)$$

where $T_{timage(i)}$ is the transcoding time delay of the $i$th image in the web page and $N$ is the number of images.

Thus if we find the individual transcoding time delay for each image in the web page, $T_{tw}$ can be obtained by summation. Once $T_{tw}$ is known, we can decide the transcoding requirement for $\Delta S_w$.

The transcoding time delay for an image depends on many factors, including image size (bytes), image dimension (width x height), image content (simple or complicated), coding algorithm, system speed, number of users sharing the CPU, etc. A group of images (listed in Appendix A) is tested for transcoding time delay, as shown in Table (4.2).

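Note that in Table (4.2), the transcoding parameters, the scaling factor and the color reduction, are set as constants. The scaling factor is set to the scaling ratio from the dimension of a PC (1024 x 768) to that of an HHC (640 x 480), which is 0.625. The color reduction is set according to the HHC's display capability: 256 gray levels. Of course, these parameters are just one case for testing the transcoding time delay. In order to predict the time delay, we have to see what happens to it when different transcoding parameters are applied (the actual transcoding parameters are still unknown at this moment). Table (4.3) shows the transcoding time delay with changing transcoding parameters. Since the same group of images is tested, they are identified using the same number sequence as in Table (4.2) instead of listing their names again.

No.  Image                   Image dimension  No. of pixels  Image size (bytes)  No. of colors or gray levels
1    adv-1472-yahoo.gif      88 x 31          2,728          2,009               177 colors
2    adv-1475-yahoo.gif      105 x 60         6,300          4,499               255 colors
3    adv-miadora-yahoo.gif   88 x 31          2,728          2,481               204 colors
4    adv-monster.gif         (illegible)      (illegible)    (illegible)         128 colors
5    adv-ntap-yahoo.gif      (illegible)      (illegible)    (illegible)         64 colors
6    anemone.gif             (illegible)      (illegible)    (illegible)         256 colors
7    announcer.gif           (illegible)      (illegible)    (illegible)         256 colors
8    announcer128.gif        (illegible)      (illegible)    (illegible)         252 colors
9    baboon.gif              (illegible)      (illegible)    (illegible)         256 colors
10   commhead2.gif           (illegible)      (illegible)    (illegible)         14 colors
11   cwheel.gif              (illegible)      (illegible)    (illegible)         255 colors
12   gold.gif                (illegible)      (illegible)    (illegible)         256 colors
13   guanyin.gif             (illegible)      (illegible)    (illegible)         16 colors
14   inf-bag-yahoo.gif       (illegible)      (illegible)    (illegible)         7 colors
15   inf-billpay-yahoo.gif   (illegible)      (illegible)    (illegible)         31 colors
16   inf-mail-yahoo.gif      (illegible)      (illegible)    (illegible)         28 colors
17   inf-pts-yahoo.gif       (illegible)      (illegible)    (illegible)         32 colors
18   UofT.gif                (illegible)      (illegible)    (illegible)         32 colors
19   jordanposter2.gif       (illegible)      (illegible)    (illegible)         256 gray levels
20   jordanposter3.gif       (illegible)      (illegible)    (illegible)         256 colors
21   kids.gif                (illegible)      (illegible)    (illegible)         245 gray levels
22   map.gif                 (illegible)      (illegible)    (illegible)         256 colors
23   splash.gif              (illegible)      (illegible)    (illegible)         (illegible)
24   stock.gif               (illegible)      (illegible)    (illegible)         (illegible)

The transcoding time delay column is illegible. The values 40,479, 68,555, 126,(illegible), and 56,928 appear next to rows 19 to 22; the column they belong to is unclear.

*1 Mobile device as HHC, using fixed transcoding parameters to evaluate transcoding time delay: scaling 0.625 = sqrt((640x480)/(1024x768)) and 256 gray levels.
*2 Transcoding delay time includes image content analysis and transcoding.

Table 4.2: Transcoding time delay for images with different size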
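The fixed scaling factor used for Table (4.2) follows from the ratio of display areas, as a short sketch shows. The function names and the rounding of scaled dimensions to whole pixels are assumptions of this sketch, not taken from the thesis.

```python
import math


def scaling_ratio(source_display, target_display):
    """Linear scaling ratio that maps the source display area to the target."""
    source_area = source_display[0] * source_display[1]
    target_area = target_display[0] * target_display[1]
    return math.sqrt(target_area / source_area)


def scaled_dimension(width, height, ratio):
    """Scale both sides by `ratio`, keeping at least one pixel per side."""
    return max(1, round(width * ratio)), max(1, round(height * ratio))
```

For a PC display of 1024 x 768 and an HHC display of 640 x 480, `scaling_ratio` gives 0.625, and an 88 x 31 banner is scaled to 55 x 19 pixels.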

Image                      R=0.75              R=0.625                   R=0.5               R=0.375             R=0.25
                           C=256  C=16  C=4    C=256        C=16         C=4   C=256  C=16  C=4   C=256  C=16  C=4   C=256  C=16  C=4
1 adv-1472-yahoo.gif       110    60    110    50           60           110   110    100   50    110    110   110   110    110   110
2 adv-1475-yahoo.gif       60     110   110    110          (illegible)  60    110    50    50    110    110   110   50     110   110
3 adv-miadora-yahoo.gif    60     60    50     50           60           110   60     50    60    60     110   60    110    110   110
baboon.gif                 930    610   550    (illegible)  550          500   600    440   540   550    390   390   380    380   380

The rows for the remaining images of Table (4.2) are illegible.

R: Scaling ratio. C: No. of color bits for new image.

Table 4.3: Transcoding time delay for different transcoding parameters

The results in Table (4.3) show that, for the same image, the transcoding time delay does vary with the transcoding parameters. If we plot the transcoding time delay as a function of the size and of the number of pixels of an image, for each case of transcoding parameters, a stronger linear correlation is found between time delay and number of pixels, as shown in Table (4.4).

[Table 4.4: Linear time delay estimation results for different transcoding parameters. For the linear estimation Y = a X + b, the table reports correlation coefficients and the gradient a for each parameter pair (R = 0.75, 0.625, 0.5, 0.375, 0.25, each with C = 256, 16, 4), for all cases together, and for the mean. R: scaling ratio. C: No. of color bits for new image. The numeric entries are illegible.]

We can see that when a particular parameter pair (R & C) is used, the corresponding correlation coefficient is very good; see, for example, Figure (4.3), Figure (4.4), Figure (4.5), and Figure (4.6). But when the parameter pair R & C is unknown and all the samples are estimated together, a poorer correlation coefficient is obtained in Table (4.4), as indicated in Figure (4.7). As a result, the mean transcoding time delay is calculated over all scenarios, and when we estimate this mean time delay from the number of pixels of an image, a better correlation is found in Table (4.4), as indicated in Figure (4.8).

Thus whenever an image's dimension is known, we can roughly predict how long the transcoding time delay will be for this image, using the linear estimation results from the estimation of the mean transcoding time delay. And by summing the transcoding time delays of all images, we can predict the transcoding time delay for the web page.
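The linear estimation and the page-level prediction can be sketched as an ordinary least-squares fit of time delay against number of pixels. This is a generic illustration in Python, not the thesis code; the function names are hypothetical, and any sample data fed to it are placeholders rather than measurements from Table (4.3).

```python
import math


def fit_line(pixels, delays):
    """Least-squares fit delay = a * pixels + b; also returns correlation r."""
    n = len(pixels)
    mean_x = sum(pixels) / n
    mean_y = sum(delays) / n
    sxx = sum((x - mean_x) ** 2 for x in pixels)
    syy = sum((y - mean_y) ** 2 for y in delays)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pixels, delays))
    a = sxy / sxx
    b = mean_y - a * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def predict_page_delay(image_dimensions, a, b):
    """Sum the predicted per-image delays over all (width, height) pairs."""
    return sum(a * (width * height) + b for width, height in image_dimensions)
```

Once `a` and `b` are fitted from the mean delays, only the image dimensions are needed to predict the page delay, which is why the estimate can be made before any transcoding parameters are chosen.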

4.3.2 Size Reduction Decisions

Once the transcoding time delay of the web page is predicted, the only unknown in inequality (4.4) is the size reduction of the web page, $\Delta S_w$. Since our ultimate goal is to predict the transcoding parameter pair for each image, we need to find out how big the size reduction is for each image; in other words, how the size reduction of the web page is constituted, or shared, by each image.

Again, $\Delta S_w$ can be represented in summation form as:

$$\Delta S_w = \sum_{i=1}^{N} \Delta S_{image(i)} \qquad (4.6)$$

where $\Delta S_{image(i)}$ is the size reduction of the $i$th image in the web page.

Now if there are N images inside a web page, we have N unknowns and only one equation, (4.6). To solve for the N unknowns, we need another N-1 equations. Thus we design two rules to obtain the N-1 extra equations:

Equal Transcoding Parameter: By this rule ("ETP"), we let all the images of the same content group have the same transcoding parameters, R & C.

Figure 4.3: Transcoding time delay vs. No. of pixels (R=0.375, C=4bit)

Figure 4.4: Transcoding time delay vs. No. of pixels (R=0.5, C=4bit)

Figure 4.5: Transcoding time delay vs. No. of pixels (R=0.625, C=4bit)

Figure 4.6: Transcoding time delay vs. No. of pixels (R=0.75, C=4bit)

Figure 4.7: Transcoding time delay vs. No. of pixels (all samples)

Figure 4.8: Mean transcoding time delay vs. No. of pixels

Equal Size (Byte) Reduction Ratio: By this rule ("ESRR"), we let all the images of the same content group have the same size reduction ratio (in bytes), $\alpha$.

Note that the size reduction ratio, $\alpha$, is not the same as the scaling ratio R. The size reduction ratio for the $i$th image is defined as:

$$\alpha(i) = \frac{\Delta S_{image(i)}}{S_{image(i)}} \qquad (4.7)$$

When the first rule is applied, we find the "equivalent pixel No." (defined in (4.9)) and then use equation (4.8) to find the size reduction for each image. Then we can substitute all the $\Delta S_{image(i)}$ in equation (4.6) and get a second-order equation for the unknown R. This equation is still not difficult to solve if all the images inside a web page have the same content type. The difficulty comes when different importance is applied to images with different content, which means that less important images are compressed more. Furthermore, when image quality is taken into consideration, which means R also depends on whether the transcoded images can be recognized by the client, the entire decision of R & C would be overly complicated in terms of the time needed to find the results. So for this situation, the transcoding policy avoids solving the equation and sets parameters for the different groups of images with different content according to previous knowledge acquired for similar scenarios. The same approach applies when a client wants to set the size reduction ratio and color as the downloading criterion instead of the downloading time in inequality (4.4). The results of this type of web page transcoding are shown in Chapter 3.

When the second rule is applied, every image within the same content group has the same α. A relative importance is decided according to preference[5], defined in Chapter 5. Usually the importance of "CON" images equals 1, and the relative importance of other content will be less than or equal to 1. Then we can find the size reduction ratio, α(i), for other images depending on their individual importance relative to "CON" images by the mapping indicated in Figure (4.9).

Figure 4.9: Mapping of relative importance to size reduction ratio (vertical axis: size reduction ratio; horizontal axis: relative importance, value/15)

Now if we substitute all the ΔS_image(i) in equation (4.6) with the product of S_image(i) and α(i) (expressed in terms of the α of "CON"), the α of "CON" images can be solved from a first-order equation. Then, using α(i) and S_image(i), the individual ΔS_image(i) for each image is solved.
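The ESRR substitution above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the mapping of Figure (4.9) is not reproduced here, so each image is given a hypothetical multiplier `factor[i]` such that α(i) = factor[i] · α_CON (1.0 for "CON" images). The class and parameter names are invented for the example.

```java
/** Sketch of the ESRR step: solve the first-order equation for the alpha of CON images. */
final class EsrrSolver {
    /**
     * sizes[i]       original size S_image(i) in bytes
     * factor[i]      hypothetical multiplier with alpha(i) = factor[i] * alphaCon
     *                (stands in for the mapping of Figure 4.9, which is assumed here)
     * totalReduction size reduction required for the whole page, in bytes (left side of 4.6)
     * returns        deltaS[i] = sizes[i] * alpha(i) for every image
     */
    static double[] solve(double[] sizes, double[] factor, double totalReduction) {
        double weighted = 0.0;
        for (int i = 0; i < sizes.length; i++) {
            weighted += sizes[i] * factor[i];
        }
        if (weighted <= 0.0) {
            throw new IllegalArgumentException("no reducible image data");
        }
        // totalReduction = alphaCon * sum(sizes[i] * factor[i])  =>  first-order in alphaCon
        double alphaCon = totalReduction / weighted;
        double[] deltaS = new double[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            deltaS[i] = sizes[i] * factor[i] * alphaCon;
        }
        return deltaS;
    }
}
```

For instance, a 10,000-byte "CON" image (factor 1.0) and a 5,000-byte image with factor 2.0 that must lose 4,000 bytes together give α_CON = 0.2 and a reduction of 2,000 bytes each. The sketch does not cap α(i) at 1, which a real policy would need to do.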

4.3.3 Transcoding Parameters Decisions

Once we know ΔS_image(i) for the ith image, the next step is to get the transcoding parameter pair for this image. To do this, we have to find the relationship between ΔS_image(i) and the transcoding parameter pair. Again the same group of images is tested with different transcoding parameters to find the sizes of the transcoded images. Table (4.5) shows the result.

Table 4.5: Sizes of transcoded images for different transcoding parameters (bytes). R: scaling ratio; C: No. of color bits for the new image.

Based on Table (4.5), again we can use linear estimation, but this time it is between ΔS_image(i) and the transcoding parameters. Since we need to consider two parameters (R & C) at the same time and also need to consider the specialty of each individual image, we define a variable that combines all these factors, the equivalent pixel No.:

$$\text{Equivalent pixel No.} = (\text{No. of pixels}) \times \left(1 - R^{2}\,\frac{C_{new}}{C_{old}}\right) \qquad (4.8)$$

where R is the scaling ratio; C_new is the color depth in bits of the transcoded image, which is C; and C_old is the color depth in bits of the original image.

If we use linear estimation to find the correlation between ΔS_image(i) for the ith image and the equivalent pixel No. of the ith image, a strong linear correlation is found for all 25 images of the testing group; see, for example, Figure (4.10), Figure (4.11), Figure (4.12) and Figure (4.13).

But if we put all the samples of equivalent pixel No. into one estimation, a lower correlation coefficient results, as indicated in Figure (4.14). This is due to the fact that different images have different characteristics (complexity, color depth, size, and dimension) and their correlation patterns with the transcoding parameters are different.

Thus we need a way to find each set of a, b for the linear estimation before we can make the estimation between ΔS_image(i) and the equivalent pixel No. accurate enough. Consider two extreme situations:

• R = 0 means the transcoded image will have a dimension of 0. Consequently the transcoded image will have a size of 0 bytes and the equivalent pixel No. will be the original No. of pixels (equation (4.8)). Thus when the equivalent pixel No. is the original No. of pixels, the size reduction should be the original size of the image: the point [original No. of pixels, original size] should be on the line.

• R = 1 with the color bits remaining the same for the transcoded image means no change at all. Consequently the transcoded image will have the size (bytes) of the original image and the equivalent pixel No. will be 0 (equation (4.8)). Thus when the equivalent pixel No. is 0, the size reduction should also be 0: the point [0, 0] should be on the line.

From these two reference points, the estimated a* and b* of the line equation for image(i) should be:

$$a^{*} = \frac{S_{image(i)}}{\text{No. of pixels}(i)}, \qquad b^{*} = 0 \qquad (4.9)$$

Figure 4.10: Size reduction vs. Equivalent pixel No. for announcer.gif

Figure 4.11: Size reduction vs. Equivalent pixel No. for baboon.gif

Figure 4.12: Size reduction vs. Equivalent pixel No. for commhead2.gif

Figure 4.13: Size reduction vs. Equivalent pixel No. for uoft.gif

Figure 4.14: Size reduction vs. Equivalent pixel No. for all images

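Therefore, ΔS_image(i) is actually proportional to the equivalent pixel No.:

$$\Delta S_{image(i)} = \frac{S_{image(i)}}{\text{No. of pixels}(i)} \times \text{Equivalent pixel No.}(i) \qquad (4.10)$$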

If we go one step further, we find that if two images have the same content and color depth, the estimated transcoding parameters for them will be the same whether the "ETP" rule or the "ESRR" rule is applied. This is the first comparison between the "ETP" rule and the "ESRR" rule. Another comparison will be shown in Chapter 5 after we get the web pages transcoded by these two rules.

The linear estimation for the different images, and the relative error caused by substituting a* and b* from equation (4.9), are shown in Table (4.6).

Table 4.6: Linear size reduction estimation for different images
[Columns: image; No. of pixels (N); size in bytes (S); linear estimation of size reduction vs. equivalent No. of pixels (y = ax + b) with its correlation; y = a*x + b*; relative error. Images tested: adv-1472-yahoo.gif, aav-tJT5-yaiioo.gif, adv-miadora-yahoo.gif, adv-monster.gif, adv-ntap-yahoo.gif, anemone.gif, announcer.gif, announcer128.gif, baboon.gif, commhead2.gif, cwneet.gif, gold.gif, guanyin.gif, inf-bag-yahoo.gif, inf-billpay-yahoo.gif, inf-mail-yahoo.gif, inf-gts-yahoo.gif, uoft.gif, lordangosteR.gif, lordangoster3.gif, hds.gif, -P.gif, splash.gif, stock.gif, tree.gif.]

From (4.10), given ΔS_image(i), the equivalent pixel No. for each image can be solved right away. And from the definition of the equivalent pixel No., given the original pixel No. of the image, the scaling ratio R and the color bits of the new image C can be solved easily.

The detailed procedure for deciding the transcoding policy is indicated in Figure (4.15). A more complicated one can be obtained by adding the following functions (shown in Figure (4.15) as bold dotted blocks).

• Real time adaptation: A parallel module can be added to record the resulting transcoding time delay and re-estimate the coefficients for the linear prediction of transcoding time delay dynamically.

• Recursive evaluation control: The resulting transcoding parameters can be put back into equation (4.6) to re-adjust α and evaluate the transcoding parameters again to pursue higher precision. This procedure can be performed recursively until the precision of equation (4.6) is found to be within a given error.

The resulting downloading time will be more precise at the expense of a longer transcoding time delay.
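The last step of the policy, going from a required ΔS_image(i) back to a scaling ratio, can be sketched as follows. The sketch assumes that the equivalent pixel No. has the form N · (1 − R² · C_new / C_old); this form is an assumption chosen to agree with the two extreme cases discussed in Section 4.3.3 (R = 0 gives N; R = 1 with unchanged color depth gives 0). It also assumes that C has already been fixed, for example from the client's color requirement. All names are illustrative.

```java
/** Sketch: from a required size reduction to the scaling ratio R (C chosen beforehand). */
final class EquivalentPixels {
    /** Assumed form of the equivalent pixel No.: N * (1 - R^2 * cNew / cOld). */
    static double equivalentPixels(double pixels, double r, int cNew, int cOld) {
        return pixels * (1.0 - r * r * cNew / cOld);
    }

    /** Proportionality deltaS = (size / pixels) * equivalentPixels, solved for equivalentPixels. */
    static double fromSizeReduction(double deltaS, double size, double pixels) {
        return deltaS * pixels / size;
    }

    /** Inverts the assumed definition for R; the result is clamped to [0, 1]. */
    static double scalingRatio(double eqPixels, double pixels, int cNew, int cOld) {
        double rSquared = (1.0 - eqPixels / pixels) * cOld / cNew;
        return Math.sqrt(Math.max(0.0, Math.min(1.0, rSquared)));
    }
}
```

As a check of the round trip: an 8,000-byte image of 10,000 pixels with 8-bit color, transcoded at R = 0.5 and C = 4, has 8,750 equivalent pixels and a predicted reduction of 7,000 bytes; feeding 7,000 bytes back in returns R = 0.5.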

4.4 Transcoding and Reconstruction

After the transcoding policy is decided, the rest of the work of the Data Processing module, as indicated in Figure (4.1), is to use the evaluated transcoding parameters to carry out the transcoding for each individual image in the web page. The "HTMLWriter" uses these transcoded images along with the original saved HTML file (the original web page) to form a new HTML file (the transcoded web page), with all the link tags modified accordingly. When the Data Processing module returns, it gives the transcoded web page to the web page process service thread, as indicated in Figure (3.4), and the web page process service thread sends this new web page back to the client and finishes the service.

Figure 4.15: Adaptive transcoding policy
[Flow chart blocks: set default ratio default_R (from preference[1] & [2]); set the client's color requirement (from preference[3]); set min_dim (min_width & min_height) for image quality (from preference[6]); find the total No. of pixels and total size; predict the time needed by transcoding; predict the size reduction needed for the web page; find α for CON; for each content group (ADV, DEC, BUL, RUL, MAP, INF, L-INF, CON): find the relative importance, find the size reduction ratio α(i), find the scaling ratio R and No. of color bits C from α(i); R = min_dim / image_dim; set transcoding parameters for image(i); real time adaptation.]

Therefore, the web page transcoding is transparent to both the client and the server. The client sends one request and receives the whole transcoded web page.

4.5 Summary

In this Chapter we propose the design of web page transcoding in the Data Processing module of the proxy server system. An improved way of image content analysis is introduced and the performance of the new image content decision tree is evaluated. An adaptive policy is designed for web page transcoding. Details are given on how to decide the transcoding parameters adaptively, according to the user's different requirements (display requirements and browsing requirements) and according to the different importance of different images.

Chapter 5

Implementation and Experimental Results

In this Chapter, we describe the Java implementation and the experimental results of the proposed transcoding proxy server system. The content includes two parts:

• Java Implementation of Proposed Transcoding System

• Experimental Results

5.1 Java Implementation of Proposed Transcoding

System

The proxy server system is implemented in Java. As described before, it includes both the network programming part and the transcoding part. The source code exceeds 3300 lines and contains the core parts of a transcoding proxy server. One can add more functions to the server, for example, server security and maintenance, caching and searching, and resource management. Since the system is designed using object-oriented programming techniques, new functions can be added conveniently.

5.1.1 RTTI

RTTI stands for run-time type identification, which uncovers a whole plethora of interesting object-oriented design and raises fundamental questions of how to structure the program. In brief, RTTI allows the program to discover information about objects and classes at run-time, which means the program can handle classes that are unknown at compile time. The technology makes the program self-extensible in the sense that any class can be added to the program group and be recognized at run-time. The proxy software system uses RTTI to give the proxy an extensible multi-service capability, which enables services to be added at run time [34].
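A minimal sketch of this idea, using Java's reflection API, is shown below. The `ProxyService` interface, the `EchoService` class and the `ServiceRegistry` helper are hypothetical names invented for the illustration and are not taken from the thesis code; the sketch only shows how a service class named at run time can be located and instantiated with `Class.forName`.

```java
/** Hypothetical service contract; the real proxy's service interface is not shown in the text. */
interface ProxyService {
    String name();
}

/** A sample service that the loader below only knows by its class name. */
final class EchoService implements ProxyService {
    public EchoService() { }

    @Override
    public String name() {
        return "echo";
    }
}

final class ServiceRegistry {
    /** Loads and instantiates a service whose class name is supplied at run time. */
    static ProxyService load(String className) {
        try {
            Class<? extends ProxyService> cls =
                    Class.forName(className).asSubclass(ProxyService.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load service " + className, e);
        }
    }
}
```

A new service can then be added by putting its class on the class path and passing its name to `ServiceRegistry.load`, without recompiling the loader.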

5.1.2 Nested Multiple Thread Structure

As seen in the main flow chart of the proxy software system, a nested multiple thread structure is used. The first level of multiple threads is used for providing multiple services; the second level is used for providing one service to multiple clients. Inside each service thread, a third level of threads may be used when necessary. For example, for the single file process thread, two sub-threads are used to receive from the server and send to the client a single text/html file simultaneously. For the web page service thread, in Virtual Client, a third level of multiple threads is used to download the different content in the web page.
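The two sub-threads of the single file process thread can be illustrated with the sketch below, in which one thread receives from the server side while another sends to the client side, connected by a pipe. This is an assumption about how such a relay might look, not the thesis implementation; `StreamRelay` and its method names are invented, and plain streams stand in for the socket streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

/** Sketch: one sub-thread receives from the server while another sends to the client. */
final class StreamRelay {
    static void relay(InputStream fromServer, OutputStream toClient) {
        try {
            PipedOutputStream pipeOut = new PipedOutputStream();
            PipedInputStream pipeIn = new PipedInputStream(pipeOut);
            // The receiver closes the pipe when the server stream ends, which signals EOF to the sender.
            Thread receiver = new Thread(() -> copy(fromServer, pipeOut, true));
            Thread sender = new Thread(() -> copy(pipeIn, toClient, false));
            receiver.start();
            sender.start();
            receiver.join();
            sender.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void copy(InputStream in, OutputStream out, boolean closeOut) {
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (closeOut) {
                try {
                    out.close();
                } catch (IOException ignored) {
                    // nothing useful can be done if closing the pipe fails
                }
            }
        }
    }
}
```

In a proxy, `fromServer` would be the input stream of the socket to the original server and `toClient` the output stream of the client socket.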

5.1.3 TCP Sockets: ServerSocket, Socket class

Two classes are invoked to realize the TCP sockets used by the proxy server [29]. The ServerSocket "listens to" the requests from the client. The Socket "speaks to" the original server to get the response and "speaks to" the client to send the transcoded response.

1. public class ServerSocket extends Object: This class implements server sockets. A server socket waits for requests to come in over the network. It performs some operation based on that request, and then possibly returns a result to the requester. The actual work of the server socket is performed by an instance of the SocketImpl class. An application can change the socket factory that creates the socket implementation to configure itself to create sockets appropriate to the local firewall.

2. public class Socket extends Object: This class implements client sockets (also called just "sockets"). A socket is an endpoint for communication between two machines. The actual work of the socket is performed by an instance of the SocketImpl class. An application, by changing the socket factory that creates the socket implementation, can configure itself to create sockets appropriate to the local firewall.

5.1.4 Connection Control and Resource Management

A semaphore can be used to realize the connection control, and it can be programmed as a separate thread that runs simultaneously. A semaphore is important when multiple clients share the same resources at the proxy server.
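As an illustration of semaphore-based connection control, the sketch below limits the number of simultaneously served clients. It uses `java.util.concurrent.Semaphore`, which was added to Java after this work was written, so it should be read as an assumption about how the same idea could be expressed today rather than as the original code; `ConnectionGate` is an invented name.

```java
import java.util.concurrent.Semaphore;

/** Sketch: admit at most maxConnections clients at a time. */
final class ConnectionGate {
    private final Semaphore permits;

    ConnectionGate(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /** Returns true if the client may be served now; false if the proxy is full. */
    boolean tryEnter() {
        return permits.tryAcquire();
    }

    /** Must be called once for every successful tryEnter(), when the client is done. */
    void leave() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}
```

A service thread would call `tryEnter()` right after accepting a connection and `leave()` in a `finally` block once the response has been sent.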

Resource management includes many details of server programming, including file operations, use of file and record locking, querying and modifying process attributes and resource limits, etc. [35]. At the moment, we implement only simple considerations for file operations. So that identical filenames from different URL links do not conflict locally at the transcoding proxy server, a filename administration is introduced. This is realized by adding the URL to the saved filenames.

For example, a file at the URL someprotocol://somesite/somefile will be saved under a local filename built from someprotocol, somesite and somefile. Note that ":" and "/" are replaced by a substitute character in the new filename.
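The filename administration can be sketched as a simple character substitution. The substitute character is passed in as a parameter because it is an assumption of this example, not a value taken from the implementation; `LocalFileNames` is likewise an invented name.

```java
/** Sketch: derive a local filename from a URL by replacing ':' and '/'. */
final class LocalFileNames {
    /** substitute is a caller-chosen replacement character (hypothetical, e.g. '_'). */
    static String toLocalName(String url, char substitute) {
        return url.replace(':', substitute).replace('/', substitute);
    }
}
```

With '_' as the substitute, `someprotocol://somesite/somefile` becomes `someprotocol___somesite_somefile`, so files with the same name from different sites no longer collide.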

5.1.5 HTML Parser

The HTML Parser parses the web page and saves the link tags for all images inside this web page. In an HTML file, there are basically four types of tags that are linked to images [30, 31]:

• Background image: defined within the body tag, <body background = URL ... >; the URL specifies an image to be tiled in the document background.

• Inline images: defined in the tags for images, <img src = URL ... >; the URL points to the address from which the image should be downloaded.

• Ismap images: defined in the tags for images ending with "ismap", <img src = URL ... ismap>.

• Linked images: defined in the tags for images right after a hyperlink tag, <a href = ... > <img src = ... > </a>; the image can be clicked, and after clicking the browser will use the address indicated in the hyperlink to find the linked content.

Another tag can point to an image when there is a hyperlink tag <a href = URL ... > and the URL points to an image. But since in this case the image will not be shown on the present web page and is only downloaded after being clicked by the user later, the image is not counted for downloading.
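The collection of image links described above can be approximated with a regular expression. The sketch below is not the thesis parser: it is a simplified scanner, under the assumption that image URLs appear only in `src` attributes of `img` tags and in the `background` attribute of the `body` tag, and it deliberately ignores `<a href>` targets as the text prescribes. The class name is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: collect image URLs from <img src=...> and <body background=...> tags. */
final class ImageLinkScanner {
    // Matches the attribute value in double quotes, single quotes, or unquoted form.
    private static final Pattern IMAGE_ATTR = Pattern.compile(
            "<\\s*(?:img\\b[^>]*?\\bsrc|body\\b[^>]*?\\bbackground)\\s*=\\s*"
                    + "(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);

    static List<String> imageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMAGE_ATTR.matcher(html);
        while (m.find()) {
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    urls.add(m.group(g));
                    break;
                }
            }
        }
        return urls;
    }
}
```

Inline, ismap and linked images are all caught through their `img` tag, so the scanner does not need to distinguish them; a hyperlink whose target is an image is skipped because it is not an `img` tag.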

5.1.6 Protocol Handler

The protocol handler of Virtual Client is responsible for setting up the connection between the proxy server and the original server using the protocol given in the URL. It makes the proxy server capable of coping with different application layer protocols, which means a new protocol can be written by the user and recognized by the protocol handler without modifying the rest of the software system.

The protocol handler mechanism is implemented through four different classes: URL, URLStreamHandler, URLConnection, and URLStreamHandlerFactory [27]. Among them, only the URL class is concrete. URLStreamHandler and URLConnection are abstract classes, and URLStreamHandlerFactory is an interface. That means that to write a protocol, one has to write concrete subclasses of URLStreamHandler and URLConnection. There are three steps to realize the protocol handler.

1. To use these classes, URLStreamHandlerFactory is used to take the protocol and locate an appropriate subclass of URLStreamHandler for the protocol.

2. This URLStreamHandler then parses the string representation of the URL into its separate parts and creates a corresponding URLConnection.

3. The new URLConnection is responsible for the interaction with the server: it converts anything the server sends into an InputStream, and converts anything the proxy sends into an OutputStream.

In most cases, an instance of a URLStreamHandler subclass is not created directly by an application. Rather, the first time a protocol name is encountered when constructing a URL, the appropriate stream protocol handler is automatically loaded.
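The three steps can be sketched with a toy protocol. Everything named `Demo...` below, including the `demo` protocol itself and its canned response, is hypothetical and exists only for illustration. The handler is passed to the `URL` constructor directly instead of being registered with `URL.setURLStreamHandlerFactory`, because that registration can be done only once per JVM; note also that this constructor is deprecated in recent Java releases.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

/** Step 3: the connection turns the (here simulated) server reply into an InputStream. */
final class DemoConnection extends URLConnection {
    DemoConnection(URL url) {
        super(url);
    }

    @Override
    public void connect() {
        connected = true; // a real handler would open a socket to the server here
    }

    @Override
    public InputStream getInputStream() {
        connect();
        String body = "demo response for " + url.getPath();
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    }
}

/** Step 2: the handler parses the URL (inherited default) and creates the connection. */
final class DemoHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) {
        return new DemoConnection(u);
    }
}

/** Step 1: the factory maps a protocol name to its handler. */
final class DemoHandlerFactory implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        return "demo".equals(protocol) ? new DemoHandler() : null;
    }
}

final class ProtocolDemo {
    static String fetch(String spec) {
        try {
            String protocol = spec.substring(0, spec.indexOf(':'));
            URLStreamHandler handler = new DemoHandlerFactory().createURLStreamHandler(protocol);
            URL url = new URL(null, spec, handler);
            try (InputStream in = url.openConnection().getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Adding another protocol then means adding one handler and one connection class and extending the factory, while the code that calls `fetch` stays unchanged.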

5.1.7 User Preference Definition

The preference from the client contains the information on mobile link bandwidth, display requirements, and browsing requirements. There is no particular reason why the definition in our implementation is designed in the following way. Actually the client can define its preference in any format, as long as it contains the relevant information needed.

In our implementation, the preference is placed in the HTTP header right after the "GET" request and it uses two 32-bit integers separated by a space:

Preference: pref1 pref2

Pref1 and pref2 are defined, by our implementation, as two 32-bit integers. They are interpreted by Request Processing and translated into preference[], which is an internal integer array. The size of the array is eight and we define it as follows:

• Preference[0]: Mobile connection bandwidth (Bm). 4 bits, bits 0-3 of pref1. The definition is detailed in Table (5.1).

• Preference[1] & [2]: Mobile device display size requirement. 4 bits, bits 4-7 of pref1. The definition is detailed in Table (5.2).

• Preference[3]: Mobile device display color requirement. 4 bits, bits 8-11 of pref1. The definition is detailed in Table (5.3).

• Preference[4] & [5]: Mobile user image filter, used to indicate whether a certain type of image is wanted by the mobile user. 8 bits, bits 12-19 of pref1. When a bit is set to "1", the corresponding image will be transcoded at the proxy and sent to the mobile user. If the bit is "0", the image will be filtered out by the proxy. Also, each type of image is assigned an importance relative to the "CON" type image (images are classified by their content into 8 groups, shown in Table (5.4) and detailed in Chapter 4). This is used for deciding the adaptive transcoding policy. Basically a more important image type will be assigned a longer transcoding time. This image preference is defined as preference[5], or integer pref2. Each image type takes four bits out of the 32-bit integer, so the value ranges from 0 to 15. For "CON", the four bits will always be 15, which is the relative base. Other image types take a value from 0 to 15, which gives the relative importance: value/15. For example, if the 4-bit value for "INF" images is 3, then they have a relative importance of 0.2 compared with "CON". This basically means that, given the same transcoding time, an "INF" image will be transcoded so that the transcoded image is 0.2 times the size of the transcoded image from "CON".

• Preference[6]: Image quality parameter, used to give the image quality loss tolerance of the mobile user.

• Preference[7]: Relative downloading time ratio, used to give the mobile user the flexibility of sacrificing resolution to make the downloading time shorter. The usage of this ratio has been introduced in Chapter 4. It also takes 4 bits, bits 24-27 of pref1. The definition is detailed in Table (5.5).

The overall construction of pref1 is given in Table (5.6).

Table 5.1: Definition of preference[0]

Mobile connection bandwidth (Bm)    preference[0]    Bits 0-3 of pref1
Bm <= 9.6 kbps                      0                0000
9.6 kbps < Bm <= 14.4 kbps          1                0001
14.4 kbps < Bm <= 28.8 kbps         2                0010
28.8 kbps < Bm <= 33.6 kbps         3                0011
33.6 kbps < Bm <= 56 kbps           4                0100
56 kbps < Bm                        5                0101

Table 5.2: Definition of preference[1] & [2]

Device      Width          Height        Bits 4-7 of pref1
...
HHC         640 pixels     480 pixels    0100
...
Color PC    1020 pixels    768 pixels    1100

Table 5.3: Definition of preference[3]

Display color         preference[3]    Bits 8-11 of pref1
Gray (Bit 11 = 0):
B/W, 2 gray levels    0                0000
4 gray levels         1                0001
8 gray levels         2                0010
16 gray levels        3                0011
256 gray levels       4                0100
Color (Bit 11 = 1):
8 colors              10               1010
16 colors             11               1011
256 colors            12               1100
24 bit color          15               1111

Table 5.4: Definition of preference[4] & [5]. The notations CON, L-INF, INF, MAP, RUL, BUL, DEC, and ADV have been explained in Chapter 4.

Image type                                      CON      L-INF    INF      MAP      RUL      BUL     DEC    ADV
Image filter, preference[4] (bit of pref1)      19       18       17       16       15       14      13     12
Image preference, preference[5] (bits of pref2) 28-31    24-27    20-23    16-19    12-15    8-11    4-7    0-3

Table 5.5: Definition of preference[7]

Table 5.6: Definition of pref1

pref1 (bits 0-31)
preference[7]    preference[6]    preference[4]    preference[3]    preference[1] & [2]    preference[0]
Bits 24-27       Bits 20-23       Bits 12-19       Bits 8-11        Bits 4-7               Bits 0-3
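The bit layout of pref1 and pref2 can be decoded with a few shifts and masks. The sketch below follows the field positions listed in Section 5.1.7 and in the pref1 layout table; bits 20-23 for the image quality parameter are read from that layout table. Mapping the 4-bit codes to actual bandwidths, display sizes or colors is left out, and the class name, method names and the group numbering (0 = ADV up to 7 = CON) are assumptions made for this example.

```java
/** Sketch: extract the preference fields from the two 32-bit header integers. */
final class PreferenceCodec {
    /** Returns the width-bit field of word that starts at bit lowBit. */
    static int field(int word, int lowBit, int width) {
        return (word >>> lowBit) & ((1 << width) - 1);
    }

    static int bandwidthCode(int pref1) { return field(pref1, 0, 4); }   // preference[0]
    static int displayCode(int pref1)   { return field(pref1, 4, 4); }   // preference[1] & [2]
    static int colorCode(int pref1)     { return field(pref1, 8, 4); }   // preference[3]
    static int imageFilter(int pref1)   { return field(pref1, 12, 8); }  // preference[4]
    static int qualityCode(int pref1)   { return field(pref1, 20, 4); }  // preference[6]
    static int timeRatioCode(int pref1) { return field(pref1, 24, 4); }  // preference[7]

    /** group: 0=ADV, 1=DEC, 2=BUL, 3=RUL, 4=MAP, 5=INF, 6=L-INF, 7=CON. */
    static boolean wanted(int pref1, int group) {
        return ((imageFilter(pref1) >>> group) & 1) == 1;
    }

    /** Relative importance of an image group: its 4-bit value in pref2 divided by 15. */
    static double importance(int pref2, int group) {
        return field(pref2, 4 * group, 4) / 15.0;
    }
}
```

For example, a pref2 whose INF nibble (bits 20-23) is 3 and whose CON nibble (bits 28-31) is 15 yields an importance of 0.2 for "INF" and 1.0 for "CON", matching the example in the text.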

5.2 Experimental Results

In order to test the proxy system, an HTTPServer and an HTTPClient are also coded. The HTTPServer is a server that waits for HTTP requests from a client at port 80. The HTTPClient initiates a connection by sending an HTTP request, which is actually sent to the proxy server. The proxy server then sends this request to the HTTPServer, gets the response, transcodes the web page and sends it back to the HTTPClient.

We test our system in three scenarios:

1. Transcoding of single images

2. Transcoding of web pages with the "ETP" rule

3. Transcoding of web pages with the "ESRR" rule

5.2.1 Transcoding of single images

Different images are tested for the effect of downloading with transcoding vs. without transcoding. The tests are repeated for mobile links of different bandwidths: 28.8 kbps, 14.4 kbps, and 9.6 kbps. For example, when announcer.gif is requested by the client via the proxy, Figure (5.1) shows how the total downloading time (with transcoding) is constituted and Figure (5.2) shows the comparison of downloading time with transcoding and without transcoding, for different transcoding parameter pairs, at a bandwidth of 28.8 kbps for the mobile link. Figures (5.3) and (5.4) show the results at a bandwidth of 14.4 kbps. Figures (5.5) and (5.6) show the results at a bandwidth of 9.6 kbps. Since we do not know the bandwidth between the proxy server and the HTTPServer, a 1 Mbps connection is assumed in order to calculate the "Image Saving Time" (in Figures (5.1), (5.3), (5.5)) needed by the HTTPServer to send the image to the proxy server. In a real situation, this connection rate can always be monitored and used for the calculation.

From Figures (5.1)-(5.6), we see:

• As the scaling ratio (R) or the color depth of the transcoded image (C) becomes smaller, the size (bytes) of the transcoded image becomes smaller. Thus the "Transmission Time of Transcoded Image" (in Figures (5.1), (5.3), (5.5)) becomes shorter and consequently the total downloading time (with transcoding) becomes shorter.

• As the bandwidth of the mobile link becomes smaller, the "Transmission Time of Transcoded Image" becomes longer. Thus its proportion of the total downloading time (with transcoding) becomes larger.

• As the bandwidth of the mobile link becomes smaller, the downloading time (without transcoding) increases by the same ratio as the "Transmission Time" does. Thus for the cases when the "Transmission Time" is the dominant factor of the total downloading time (with transcoding), the relative ratio between the downloading time with transcoding and that without transcoding remains almost unchanged (in Figures (5.2), (5.4), (5.6)).

Figure 5.1: Image downloading time at 28.8 kbps (with transcoding)

Figure 5.2: Image downloading time comparison at 28.8 kbps

Figure 5.3: Image downloading time at 14.4 kbps (with transcoding)

Figure 5.4: Image downloading time comparison at 14.4 kbps

5.2.2 Transcoding of web page with ETP

With the "ETP" rule, we tested the web page shown in Appendix A, Figure (A.26). Figure (5.7) shows how the entire downloading time (with transcoding) is constituted and Figure (5.8) shows the comparison of downloading time with transcoding and without transcoding; again there are different situations with different transcoding parameter pairs, and the bandwidth of the mobile link is 28.8 kbps. Figures (5.9) and (5.10) show the results at a bandwidth of 14.4 kbps. Figures (5.11) and (5.12) show the results at a bandwidth of 9.6 kbps. All the parameter pairs shown are for "CON" images; for other images, the transcoding parameter pair is set to R = 0.625, C = 4. The same connection bandwidth of 1 Mbps between the proxy server and the HTTPServer is assumed.

Similarly, from Figures (5.7)-(5.12), we see:

• The same scaling ratio (R) and color depth (C) apply to all the images with the same content. For different content, different R & C are decided according to previous knowledge of transcoding, the content, the importance given by the user's preference, the client's display requirement and the quality requirement for the transcoded image (for example, the smallest R that makes the transcoded image still recognizable).

• As the scaling ratio (R) or color depth (C) gets smaller, the size (bytes) of the transcoded web page becomes smaller. Thus the "Transmission Time of Transcoded Web Page" (in Figures (5.7), (5.9), (5.11)) becomes shorter and consequently the total downloading time (with transcoding) becomes shorter.

• As the bandwidth of the mobile link becomes smaller, the "Transmission Time of Transcoded Web Page" becomes longer. Thus its proportion of the total downloading time (with transcoding) becomes larger.

• As the bandwidth of the mobile link becomes smaller, the downloading time (without transcoding) increases by the same ratio as the "Transmission Time of Transcoded Web Page" does. Thus for the cases when the "Transmission Time of Transcoded Web Page" is the dominant factor of the total downloading time (with transcoding), the relative ratio between the downloading time with transcoding and that without transcoding remains almost unchanged (in Figures (5.8), (5.10), (5.12)).

From previous knowledge of transcoding, along with the delay time components analysis and the downloading time comparison analysis, when the client gives the proxy a requirement that includes the same scaling ratio for images of the same content and an amount of time for downloading, the proxy server can decide which parameter pair meets the client's requirement. An example consisting of two transcoded web pages (using ETP) is shown in Appendix B. Figure (B.1) shows the non-color version of ETP web page transcoding, and Figure (B.2) shows the color version.

Figure 5.9: Web page downloading time at 14.4 kbps (ETP transcoding)

Figure 5.10: Web page downloading time comparison at 14.4 kbps (web page downloading time with transcoding vs. without transcoding, for different R and C)

Figure 5.11: Web page downloading time at 9.6 kbps (ETP transcoding)

Figure 5.12: Web page downloading time comparison at 9.6 kbps

5.2.3 Transcoding of web page with ESRR

With the "ESRR" rule, we tested the web page shown in Appendix A, Figure (A.26). Now the different situations are due to the different time thresholds given by the client; for example, the client wants to get the web page downloaded within 10 s, etc. The bandwidth of the mobile link is 28.8 kbps (the same tests can be run for bandwidths of 14.4 kbps and 9.6 kbps). The estimated transcoding parameter pairs are shown in Table (5.7).

Table 5.7: Web page transcoding and transcoding parameters evaluated by ESRR
[Rows: downloading time threshold (s); total size before transcoding (bytes); total size after transcoding (bytes); web page saving time (s); transcoding time (s); transmission time of transcoded web page (s); total downloading time (s); evaluated (R, C) pair for each image. For the 15 s threshold: 774,278 bytes before transcoding, 22,707 bytes after transcoding, saving time 6.19 s, transcoding time 2.64 s, transmission time 6.31 s, total downloading time 15.14 s.]
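The composition of the total downloading time can be checked with a few lines of arithmetic. The sketch below assumes, as in Section 5.2.1, that the total is the sum of the saving time, the transcoding time and the transmission time of the transcoded data, and that a transfer time is simply size × 8 / bandwidth. The sample figures in the usage note (774,278 bytes and 22,707 bytes, 2.64 s of transcoding, a 1 Mbps server link and a 28.8 kbps mobile link) are read from the 15 s case of Table (5.7); the class and method names are invented.

```java
/** Sketch: compose the total downloading time from its three parts. */
final class DownloadTime {
    /** Time in seconds to move the given number of bytes over a link of bitsPerSecond. */
    static double transferSeconds(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    /** Saving time + transcoding time + transmission time of the transcoded data. */
    static double total(long originalBytes, double serverBps,
                        double transcodingSeconds,
                        long transcodedBytes, double mobileBps) {
        return transferSeconds(originalBytes, serverBps)
                + transcodingSeconds
                + transferSeconds(transcodedBytes, mobileBps);
    }
}
```

With those figures, the saving time is about 6.19 s, the transmission time about 6.31 s, and the total about 15.14 s, in line with the 15 s threshold.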

Another comparison between the ETP transcoding and the ESRR transcoding can be observed, with the same downloading time requirement of 25 s (at 28.8 kbps). If

Figure 5.14: Web page downloading time comparison at 28.8 kbps (web page downloading time with transcoding vs. without transcoding)

• As the downloading time requirement becomes longer, the evaluated transcoding parameters for the images inside the web page become larger (the same trend for all images, but not necessarily by the same amount). Thus the transcoded images have larger dimensions and more color depth (in Table (5.7)).

• For small images, including the "ADV" images and "INF" images, the scaling ratio R & color depth C are actually much more affected by the image quality, i.e., the smallest image size that can be recognized by the client. This is because the evaluated transcoding parameters make the image too small to be recognized. Thus the finally decided transcoding parameter pair remains the same for the different time requirement scenarios (in Table (5.7)). But for big images like "CON", each time the adaptive policy decides the optimal parameters using ESRR, and each image may be different if the images originally have different color depths (in bits).

• The transcoded web pages have images clear enough to be recognized, even for the 10 s requirement case. 10 s of downloading time with transcoding is actually 5% of that without transcoding, which is 215 s (in Figures (B.3)-(B.4)). Also it is possible to put a link in the "HTML writer" for each scaled image so that if the client wants a clearer view, the image can be clicked to be shown bigger. At this time, the proxy server can send back a bigger version without downloading the image again from the original server.

• Results in the "Total downloading time" line in Table (5.7) show that the adaptive
transcoding policy works rather precisely. For example, for the 25 second requirement,
the resulting downloading time is 23.1 seconds, giving a relative error of
7.6%.

• From Figure (5.14), we see that the user can choose to download a web page within
"any" amount of time. Of course, the time should not be shorter than that needed to download
the text-only version of the web page, and this is guaranteed by equation (5.3).
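The two numerical points above can be checked with a few lines of Java. The sketch below is illustrative only: the class and method names are hypothetical and are not taken from the thesis implementation. Equation (5.3) is not reproduced in this section, so the lower bound used here (text size divided by link bandwidth) is an assumption about its intent, and the 3600-byte page is an invented example value.

```java
import java.util.Locale;

// Illustrative helper; class and method names are hypothetical, not from the thesis code.
final class DownloadTimeCheck {

    // Relative error between the requested and the achieved downloading time.
    public static double relativeError(double targetSeconds, double achievedSeconds) {
        return Math.abs(targetSeconds - achievedSeconds) / targetSeconds;
    }

    // Assumed lower bound on the target: time to send the text-only page over the link.
    public static double textOnlySeconds(long textBytes, double bitsPerSecond) {
        return textBytes * 8.0 / bitsPerSecond;
    }

    // A requested time is accepted only if it is not below the text-only bound.
    public static boolean isFeasible(double targetSeconds, long textBytes, double bitsPerSecond) {
        return targetSeconds >= textOnlySeconds(textBytes, bitsPerSecond);
    }

    public static void main(String[] args) {
        // Values from the text: 25 s requested, 23.1 s measured.
        double err = relativeError(25.0, 23.1);
        System.out.println(String.format(Locale.ROOT, "relative error = %.1f%%", err * 100.0));

        // Hypothetical page: 3600 bytes of text on a 28.8 kbps link (1 s lower bound).
        System.out.println(isFeasible(10.0, 3600L, 28_800.0)); // true
        System.out.println(isFeasible(0.5, 3600L, 28_800.0));  // false
    }
}
```

With the figures quoted in the text, |25 - 23.1| / 25 = 0.076, which matches the reported 7.6%.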

Summary

In this chapter we present some technical considerations for the Java implementation of the
proposed transcoding proxy system. Then some experimental test results are shown for
different transcoding scenarios: tests of the proposed transcoding policy to evaluate transcoding
parameters for different images in a web page adaptively; tests of the transcoding proxy
server to compare the downloading time with transcoding vs. without transcoding. The
test results show that the downloading time and resolution of a web page can be controlled
completely by the new proxy system. They also show an excellent performance
and efficiency of the proposed transcoding system; for example, the client can reduce the
downloading time with transcoding to 5% of that without transcoding.

Chapter 6

Conclusions and Future Work

This thesis has addressed the design of a transcoding proxy server, which consists of four
parts: the design of web page transcoding, the design of the adaptive transcoding policy,
the Java implementation, and experimental tests. Each part contributes to the proposed
transcoding proxy system and makes it superior to the existing transcoding proxy
systems. Following are some conclusions:

• The method of web page transcoding makes the proposed proxy system capable
of transcoding the entire web page within one connection. The method makes
possible many new processing techniques that enable a wide range of transcoding
requirements, for example, overall downloading optimization (the user can give a
time boundary for downloading the web page), searching and filtering unimportant
or unwanted images, and dynamic browsing (the user can set the parameters for
browsing with each connection). It also reduces the request delay for unwanted
content, and will render a better caching management.

• The proposed adaptive transcoding policy can evaluate transcoding parameters for
each image inside a web page adaptively. The policy takes into consideration
image content, decided through a new image content decision tree, and transcoded
image quality. By the proposed adaptive policy for web page transcoding, the
optimization of the downloading of the entire web page is achieved for the first time.
Also the resulting transcoding parameters are precise in that the error between the
resulting downloading time and the arbitrary time threshold given by the mobile
user is small.

• The research gives a system design of the proposed transcoding system as well as a
Java implementation. The design and implementation not only have the network-related
features of multiple extensible services, multiple users' supporting, protocol
handler, and content handler, but also have the transcoding-related features of web
page transcoding, adaptive transcoding policy, content analysis, HTML parsing,
active content filtering and HTML writer. The implementation also features object-oriented
programming technique and portability for all major operating systems.

• Experiments are carried out and the results show that web page transcoding has
an excellent performance and efficiency (as fast as 5% of the original downloading
time), with the value added that the transcoded web page still allows the client to
recognize all the images well.

• Some limitations exist for the proposed transcoding proxy system: the system
needs more server functions, including server security and
maintenance, caching and searching capability, and resource management; the system
needs coding and decoding functions to cope with JPEG images; the
system needs video stream transcoding functions so that the mobile user
can view video content inside a web page.

Some future work can be done to improve the proxy server system:


• Simulation of the server, the proxy server, and the client running on different hosts
can be tested to measure the transcoding time delay. Now the server, the proxy
server, and the client are running on the same host to do the simulation, which means
they are sharing the entire CPU time. Also time estimation can be carried out in
an adaptive way itself. This means the system can "learn" the best estimation
factors after doing some transcoding and recording the time. Also further
experiments on the time sharing of multiple clients should be carried out.

• Some server functions can be added, for example, server security and maintenance,
caching and searching capability, and resource management. Now
that the core parts of the transcoding proxy server are already set up, one has
the option to implement these functions with all the existing parts working properly.

• The ability to cope with other image formats can be added; for example, coping
with JPEG images can be added without much difficulty to the proposed proxy
system. Another interesting issue is to make the proxy capable of coping with video
streams.
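The idea of letting the system "learn" its estimation factors from recorded transcoding times can be sketched as follows. The thesis does not specify a learning rule, so this example swaps in an exponentially weighted moving average as an assumption. The class name, the per-pixel factor and the smoothing weight `alpha` are all hypothetical and are not part of the thesis implementation.

```java
// Illustrative sketch; the class name, the per-pixel factor and the update rule are assumptions.
final class AdaptiveTimeEstimator {
    private final double alpha;      // weight given to the newest measurement, in (0, 1]
    private double secondsPerPixel;  // current estimation factor

    public AdaptiveTimeEstimator(double alpha, double initialSecondsPerPixel) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
        this.secondsPerPixel = initialSecondsPerPixel;
    }

    // Record one finished transcoding job and move the factor toward the observed rate.
    public void record(long pixels, double measuredSeconds) {
        if (pixels <= 0) {
            return; // nothing to learn from an empty image
        }
        double observed = measuredSeconds / pixels;
        secondsPerPixel = (1.0 - alpha) * secondsPerPixel + alpha * observed;
    }

    // Predict the transcoding time of an image with the given pixel count.
    public double estimate(long pixels) {
        return secondsPerPixel * pixels;
    }

    public static void main(String[] args) {
        AdaptiveTimeEstimator est = new AdaptiveTimeEstimator(0.5, 2e-6);
        est.record(100_000L, 0.4);                  // observed 4e-6 s per pixel
        System.out.println(est.estimate(200_000L)); // about 0.6 s after one update
    }
}
```

A larger `alpha` makes the estimate follow recent load changes more quickly, while a smaller one smooths out noisy measurements. Which trade-off suits a shared proxy host would need the multi-client experiments mentioned above.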

Appendix A

Test Images & Web page

Figure A.2: atlv-i4Xyalioo.gif

Figure A.9: gold.gif (scaled by 0.3 x 0.3)

Figure A.10: announcer.gif

Figure A.13: guanyin.gif

Figure A.14: inlbag-yalicio.gif

Figure A.20: jordan-poster3.gif

Figure A.21: kids.gif

Figure A.22: map.gif

Figure A.24: stock.gif

Figure A.26: Original web page (scaled by 0.4 x 0.4)

Appendix B

Transcoded web page

This is Charles!

Department of Electrical and Computer Engineering

Welcome

The Communications Group is one of several research groups in the Department of Electrical & Computer Engineering at the University of Toronto. We undertake research and graduate study in the areas of telecommunications and signal processing. We hope that this page will provide information about our Group's activities.

Michael Jordan The greatest basketball player ever, may someday use a transcoding proxy server to browse the Internet. Isn't that fantastic?

Figure B.1: Transcoded web page (ETP) (scaled by 0.8 x 0.8) (for "CON": R=0.375 C=4; for other content: R=0.625 C=4)

Figure B.2: Transcoded web page (ETP) (scaled by 0.8 x 0.8) (colored; for "CON": R=0.375 C=4; for other content: R=0.625 C=4)

Figure B.3: Transcoded web page (ESRR with 10s) (scaled by 0.8 x 0.8)

Figure B.4: Transcoded web page (ESRR with 15s) (scaled by 0.8 x 0.8)

Figure B.5: Transcoded web page (ESRR with 20s) (scaled by 0.8 x 0.8)

Figure B.6: Transcoded web page (ESRR with 25s) (scaled by 0.8 x 0.8)

Figure B.7: Transcoded web page (ESRR with 30s) (scaled by 0.8 x 0.8)

Figure B.8: Transcoded web page (ESRR with 35s) (scaled by 0.8 x 0.8)
